How does the likelihood of moving across US regions vary with changes in household characteristics, and how does the risk of a change in status vary given a move? Statistics aimed at these questions are calculated for households who earned formal market income in the US, 2001–2015, totaling about 1.7 billion observations with 82.7 million long-distance moves, and covering statuses such as income, school enrollment, age, number of children, local cost of living, and retirement or marital status. The key theoretical result of this article shows that the Cochran–Mantel–Haenszel statistic is the unique aggregate risk ratio, within a broad class of such statistics, that has the “subset stability” property: If a statistic has value \(s_1\) for one subset and \(s_2\) for another, then the statistic for the union of the two sets is between \(s_1\) and \(s_2\). A sequence of pseudo-experiments generates a wealth of tests regarding the relationship between moving and a broad range of household characteristics, for the full population and salient subsets, with some focus on the characteristics of the 44.2% of movers who see negative income returns relative to the counterfactual of staying.
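A minimal Python sketch can make the subset stability property concrete. It assumes the standard Mantel–Haenszel form of the risk ratio, \(\mathrm{RR}_{MH} = \sum_k (a_k n_{0k}/N_k) \big/ \sum_k (c_k n_{1k}/N_k)\), where cell \(k\) has \(a_k\) events among \(n_{1k}\) exposed households, \(c_k\) events among \(n_{0k}\) unexposed households, and \(N_k = n_{0k} + n_{1k}\); the article's exact formulation may differ. The names `Stratum`, `mh_sums`, and `mh_risk_ratio` and all counts are hypothetical. Because the numerator and denominator are each sums over cells, the value for the union of two disjoint sets of cells is the mediant of the two subset values, so it must lie between them.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Stratum:
    """One ceteris paribus cell (hypothetical layout): event counts and totals
    for the exposed and unexposed groups within the cell."""
    exposed_events: int
    exposed_total: int
    unexposed_events: int
    unexposed_total: int


def mh_sums(strata):
    """Return the Mantel-Haenszel numerator and denominator sums as exact fractions."""
    num = Fraction(0)
    den = Fraction(0)
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        num += Fraction(s.exposed_events * s.unexposed_total, n)
        den += Fraction(s.unexposed_events * s.exposed_total, n)
    return num, den


def mh_risk_ratio(strata):
    """Mantel-Haenszel risk ratio: ratio of the two cell-weighted sums."""
    num, den = mh_sums(strata)
    if den == 0:
        raise ValueError("no unexposed events in any stratum; ratio undefined")
    return num / den


# Two disjoint subsets of cells, with made-up counts.
subset_1 = [Stratum(30, 100, 10, 100), Stratum(5, 50, 4, 200)]
subset_2 = [Stratum(2, 40, 1, 60)]

s1 = mh_risk_ratio(subset_1)
s2 = mh_risk_ratio(subset_2)
s_union = mh_risk_ratio(subset_1 + subset_2)

# Subset stability: the union's value lies between the two subset values.
assert min(s1, s2) <= s_union <= max(s1, s2)
print(float(s1), float(s2), float(s_union))
```

With these counts the two subsets give 95/29 (about 3.28) and 3, and their union gives 101/31 (about 3.26), inside the interval. Exact fractions are used only to keep the comparison free of rounding.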

Due to IRS restrictions, the data cannot be made public, but will be made available upon request to the IRS Statistics of Income division, after the appropriate clearance under 26 USC §6103.
Portions not containing IRS-restricted information are available upon request.
It is not always clear that a truly random experiment is desirable. Military moves due to redeployment perhaps approximate a truly random allocation [7], but individuals who chose to be in military families may have unobservable characteristics systematically different from those of individuals who did not. Whether these results would apply to families where one member is randomly drafted into the military is unknown. Similarly, a randomized trial hoping to describe outcomes for future movers would first find households who chose to move of their own volition, then apply a randomized intervention to some subset of that subpopulation. This may be impossible using general population surveys or administrative records.
It also features exposure to climate change; McLeman et al. [44] discuss the resulting out-migration.
As discussed in the appendix, 80 km is also the definition of a move used by the US Internal Revenue Service. These are not small moves: the IRS Statistics of Income division estimates $3.5 billion in moving expenses claimed by those moving over 80 km in 2016.
Alternatives to strict adherence to a controlled pseudo-experiment, such as relying on household history, create more difficulties than they resolve. Classifying movers by their full pattern of moves is error-prone (what if a household moves twice in the same year?) and requires arbitrary decisions about how to treat different sequences of moves. Is a household that moves in years 1, 2, and 4 comparable to one that moves in years 1, 3, and 4? Discarding moving households after they move again creates a sample that answers the question “what is the outcome from moving once and never moving again, relative to the counterfactual of never moving?”, but this is a biased measure of any activity among the full population. Specific questions about chain migrants versus once-in-a-lifetime movers are reserved for future research.
The more common version of the CMH statistic is an odds ratio, not a risk ratio. Odds are the count of occurrences of an event divided by the count of non-occurrences; risk is the same occurrence count divided by the full count of the population [54]. An odds ratio or risk ratio is the ratio of two odds or two risks so defined.
This article relies on the risk ratio. Colloquial references to the chance, likelihood, and typically even the odds of an event refer to the risk, not the odds as defined here. The odds ratio is symmetric: for example, the odds ratio for moving among retirees versus non-retirees equals the odds ratio for retiring among movers versus stayers. The risk ratio gives distinct values for the two, which can better inform causal inquiries.
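The definitions of odds and risk, and the symmetry of the odds ratio, can be checked on a single 2×2 table. The following Python sketch uses made-up counts of retirees and non-retirees who moved or stayed; the helper names `risk` and `odds` are hypothetical. It computes both ratios in both directions: the chance of moving by retirement status, and the chance of retiring by moving status.

```python
from fractions import Fraction


def risk(events, total):
    """Occurrences divided by the full count of the population."""
    return Fraction(events, total)


def odds(events, non_events):
    """Occurrences divided by non-occurrences."""
    return Fraction(events, non_events)


# Hypothetical 2x2 table (made-up counts):
#                 moved   stayed
# retired          a=20    b=80
# not retired      c=100   d=800
a, b, c, d = 20, 80, 100, 800

# Direction 1: chance of moving, retirees versus non-retirees.
rr_move = risk(a, a + b) / risk(c, c + d)
or_move = odds(a, b) / odds(c, d)

# Direction 2: chance of retiring, movers versus stayers.
rr_retire = risk(a, a + c) / risk(b, b + d)
or_retire = odds(a, c) / odds(b, d)

print("odds ratios:", or_move, or_retire)   # identical in both directions
print("risk ratios:", rr_move, rr_retire)   # differ by direction
```

Here both odds ratios equal 2, while the risk ratio is 9/5 for moving among retirees versus non-retirees and 11/6 for retiring among movers versus stayers.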
For relatively unlikely events, such as a health condition in a typical medical study, the odds ratio approximates the risk ratio, but as the likelihood of the event grows, the odds ratio increasingly exaggerates the risk ratio, to the point of being almost unusable for discussing the relative chance that an event will occur [62].
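The divergence between the two ratios as events become common can be shown by holding the risk ratio fixed and varying the baseline risk. This Python sketch assumes a fixed risk ratio of 2 and a handful of illustrative baseline risks; the function names are hypothetical.

```python
from fractions import Fraction


def odds_from_risk(p):
    """Convert a risk (probability) to odds."""
    return p / (1 - p)


def odds_ratio_for(risk_ratio, baseline_risk):
    """Odds ratio implied by a fixed risk ratio at a given baseline (unexposed) risk."""
    exposed_risk = risk_ratio * baseline_risk
    if not 0 < exposed_risk < 1:
        raise ValueError("exposed risk must lie strictly between 0 and 1")
    return odds_from_risk(exposed_risk) / odds_from_risk(baseline_risk)


RISK_RATIO = Fraction(2)  # held fixed; the odds ratio drifts away from it
for baseline in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 4),
                 Fraction(2, 5), Fraction(9, 20)):
    ratio = odds_ratio_for(RISK_RATIO, baseline)
    print(f"baseline risk {float(baseline):.2f}: risk ratio 2, odds ratio {float(ratio):.2f}")
```

With the risk ratio fixed at 2, the implied odds ratio is 99/49 (about 2.02) at a 1% baseline, 9/4 at 10%, 3 at 25%, 6 at 40%, and 11 at 45%.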
In medical studies, when subjects are selected ex ante and split into ceteris paribus cells based on observed covariates, the measured odds or risk ratios, and therefore the CMH statistic, are biased to the extent that the controlled covariates correlate with the outcome [12, 14, 58]. That is not the situation in typical administrative record or commercial data sets, which have a defined universe of observations and no subject selection. Multiple testing issues in mereological methods [17] are not a concern for purely descriptive studies, and where they do matter they can be addressed via methods such as Bonferroni corrections.
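The text names Bonferroni corrections without specifying how they would be applied here; the following is a minimal Python sketch of the textbook Bonferroni procedure, with a hypothetical function name and made-up p-values. Each of \(m\) tests is rejected only when its p-value is at most \(\alpha/m\), which bounds the family-wise error rate at \(\alpha\).

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m tests.

    Returns (reject flags, adjusted p-values): test i is rejected when
    p_i <= alpha / m; the adjusted p-value is min(1, m * p_i).
    """
    m = len(p_values)
    if m == 0:
        return [], []
    threshold = alpha / m
    rejects = [p <= threshold for p in p_values]
    adjusted = [min(1.0, m * p) for p in p_values]
    return rejects, adjusted


# Hypothetical p-values from four subset tests.
p_vals = [0.001, 0.02, 0.004, 0.3]
rejects, adjusted = bonferroni(p_vals)
print(rejects)
print(adjusted)
```

With four tests at \(\alpha = 0.05\) the per-test threshold is 0.0125, so only the first and third tests are rejected. Holm's step-down procedure is a common alternative that rejects at least as many hypotheses under the same family-wise guarantee.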
Via https://apps.bea.gov/iTable/index_regional.cfm, accessed April 2021.
This article was written in the course of business by a US Treasury employee, as part of a project to improve tax modeling via improvements in demographic modeling.
Sole author. Much of the data preparation work was done before and independently of this study, as acknowledged in Sect. 3.
This article is a component of a larger study on the population characteristics underlying models of the inputs to tax revenue calculations, and how they evolve over time. Thanks to David Bridgeland, Randy Capps, Adam Cole, Aaron Schumacher, Bethany DeSalvo, Robin Fisher, Chung Kim, Gray Kimbrough, Elizabeth Landau, Ithai Lurie, Nick Turner, Elizabeth Maggie Penn, Joshua Tauberer, and the compilers of the data bank, Raj Chetty, John Friedman, Emmanuel Saez, Danny Yagan, and their counterparts at IRS.
- Migration
- Administrative records
- Demographic analysis
- Relative risk
- Risk ratios
- Returns to education
- Retirement