Step 1: Recall the variance decomposition for cluster sampling. The total population variance splits into a between-cluster component and a within-cluster component.
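As a sketch of this decomposition, assume $N$ clusters of equal size $M$, with $y_{ij}$ the value of unit $j$ in cluster $i$, $\bar{Y}_i$ the mean of cluster $i$, and $\bar{Y}$ the population mean. This notation and the equal-size assumption are introduced here for illustration only:
\[\sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y})^2 = M\sum_{i=1}^{N}(\bar{Y}_i-\bar{Y})^2 + \sum_{i=1}^{N}\sum_{j=1}^{M}(y_{ij}-\bar{Y}_i)^2\]
The left side is fixed by the population, so whatever goes into the within-cluster term on the right is taken out of the between-cluster term.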
Step 2: The sampling variance of the cluster-sample mean is driven by the between-cluster variance, since only some clusters are chosen and each chosen cluster is observed completely, so the only thing that varies from sample to sample is which cluster means are drawn. The total variance of the population is fixed, so if clusters are internally heterogeneous (large within-cluster spread), little variance is left for the between-cluster component: the cluster means lie close to each other and to the population mean.
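To make this dependence explicit, here is the standard result under assumptions not stated in the original: $n$ clusters drawn by simple random sampling without replacement from $N$ clusters of equal size, with $\bar{Y}_i$ the mean of cluster $i$ and $\bar{Y}$ the population mean. The variance of the cluster-sample mean $\bar{y}_{cl}$ is then
\[\operatorname{Var}(\bar{y}_{cl}) = \left(1-\frac{n}{N}\right)\frac{S_b^2}{n}, \qquad S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{Y})^2\]
Only the spread of the cluster means, $S_b^2$, appears; the within-cluster spread does not enter directly.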
Step 3: A small between-cluster variance directly lowers the sampling error of the cluster-sample estimator, making the design more efficient. So, where the design allows, clusters should be formed so that units differ as much as possible within each cluster, ideally making each cluster a miniature of the population.
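The effect can be checked on a toy population. The sketch below is only an illustration under assumed values: the population is the integers 1 to 12, grouped into four clusters of three in two different ways, and two clusters are drawn by simple random sampling without replacement. The function name `cluster_mean_variance` and both groupings are hypothetical. It enumerates every possible sample and computes the exact variance of the cluster-sample mean.

```python
from itertools import combinations
from statistics import mean


def cluster_mean_variance(clusters, n):
    """Exact sampling variance of the cluster-sample mean.

    Enumerates every equally likely sample of n whole clusters and
    measures the spread of the resulting sample means.
    """
    estimates = [
        mean(y for cluster in sample for y in cluster)
        for sample in combinations(clusters, n)
    ]
    center = mean(estimates)
    return sum((e - center) ** 2 for e in estimates) / len(estimates)


# Same 12 population values, grouped two ways (illustrative data).
# Internally homogeneous clusters: similar units sit together.
homogeneous = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
# Internally heterogeneous clusters: each spans the whole range.
heterogeneous = [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

var_hom = cluster_mean_variance(homogeneous, n=2)
var_het = cluster_mean_variance(heterogeneous, n=2)
print(var_hom, var_het)
```

With these assumed values the homogeneous grouping gives a variance of 3.75, while the heterogeneous grouping gives 5/12 (about 0.42), even though the population and the sample size are identical.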
This confirms that efficiency improves when there is high internal (within-cluster) variation.
\[\boxed{\text{More variation among units within a cluster}}\]