abstract:
In empirical work in economics it is common to report standard errors that account for clustering of units. Typically, the motivation given for the clustering adjustments is that unobserved components for units within clusters are correlated. This motivation makes it difficult to justify clustering by some partitionings of the data but not by others, such as age cohorts or gender.
We argue that clustering is in essence a design problem. It can be a sampling design issue, where sampling follows a two-stage process: in the first stage, a subset of clusters is sampled randomly from a population of clusters, and in the second stage, units are sampled randomly from the sampled clusters. We argue that the design perspective on clustering clarifies the role of clustering adjustments to standard errors and aids in deciding whether to cluster and, if so, at what level.
For example, we show that, contrary to common wisdom, correlations between residuals within clusters are neither necessary nor sufficient for cluster adjustments to matter. Similarly, correlations between regressors within clusters are neither necessary nor sufficient for cluster adjustments to matter. In fact, we show that cluster adjustments can matter, and substantially so, even when both residuals and regressors are uncorrelated within clusters. Moreover, we show that the question of whether, and at what level, to adjust standard errors for clustering is a substantive question that cannot be informed solely by the data.