Abstract
How does the likelihood of moving across US regions vary with changes in household characteristics, and how does the risk of a change in status vary given a move? Statistics aimed at these questions are calculated for households who earned formal market income in the US, 2001–2015, totaling about 1.7 billion observations with 82.7 million long-distance moves, and covering statuses such as income, school enrollment, age, number of children, local cost of living, and retirement or marital status. The key theoretical result of this article shows that the Cochran–Mantel–Haenszel statistic is the unique aggregate risk ratio within a broad class that has the “subset stability” property: If a statistic has value \(s_1\) for one subset and \(s_2\) for another, then the statistic for the union of the two sets is between \(s_1\) and \(s_2\). A sequence of pseudo-experiments generates a wealth of tests regarding the relationship between moving and a broad range of household characteristics, for the full population and salient subsets, with some focus on the characteristics of the 44.2% of movers who see negative income returns relative to the counterfactual of staying.
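The subset stability property can be illustrated with a minimal sketch. It assumes the standard Mantel–Haenszel pooled risk ratio and takes "subsets" to be disjoint groups of strata; the function names and all counts are hypothetical and are not drawn from the article's data or the author's code. Under these assumptions, the statistic for a union is the ratio of summed numerators to summed denominators, so it must fall between the two subset values.

```python
from fractions import Fraction


def mh_terms(strata):
    """Return (numerator, denominator) of the Mantel-Haenszel risk ratio.

    Each stratum is a tuple (a, n1, c, n0): a events among n1 treated
    units and c events among n0 untreated units.
    """
    num = Fraction(0)
    den = Fraction(0)
    for a, n1, c, n0 in strata:
        n = n1 + n0
        num += Fraction(a * n0, n)  # treated events, weighted by n0 / n
        den += Fraction(c * n1, n)  # untreated events, weighted by n1 / n
    return num, den


def mh_risk_ratio(strata):
    """Pooled risk ratio across all strata."""
    num, den = mh_terms(strata)
    return num / den


# Hypothetical counts, for illustration only.
subset_1 = [(30, 100, 20, 200), (10, 50, 10, 50)]
subset_2 = [(5, 100, 10, 100), (40, 400, 30, 100)]

s1 = mh_risk_ratio(subset_1)
s2 = mh_risk_ratio(subset_2)
s_union = mh_risk_ratio(subset_1 + subset_2)

print(float(s1), float(s2), float(s_union))
print(min(s1, s2) <= s_union <= max(s1, s2))
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.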








Availability of data and material
Due to IRS restrictions, the data cannot be made public, but will be made available upon request to the IRS Statistics of Income division, after the appropriate clearance under 26 USC §6103.
Code availability
Portions not containing IRS-restricted information are available upon request. See also the author’s Cochran–Mantel–Haenszel statistic calculator at https://github.com/b-k/cmh.py/.
Notes
It is not always clear that a truly random experiment is desirable. Military moves due to redeployment perhaps approximate a truly random allocation [7], but individuals who chose to be in military families may have unobservable characteristics systematically different from those who do not. Whether these results would apply to families where one member is randomly drafted into the military is unknown. Similarly, a randomized trial hoping to describe outcomes for future movers would first find households who chose to move of their own volition, then make a randomized interference into some subset of that subpopulation. This may be impossible using general population surveys or administrative records.
It also features exposure to climate change; McLeman and Hunter [44] discuss the resulting out-migration.
As discussed in the appendix, 80km is also the definition of a move used by the US Internal Revenue Service. These are not small moves: the IRS Statistics of Income division estimates $3.5 billion in moving expenses claimed by those moving over 80km in 2016.
Alternatives to the strict adherence to a controlled pseudo-experiment, instead relying on household history, create more difficulties than they resolve. Classifying movers by their full pattern of moves is error-prone (what if a household moves twice in the same year?), and requires arbitrary decisions about how to treat different series. Is a mover who moves in years 1, 2, and 4 comparable to one who moves in years 1, 3, and 4? Throwing out moving households after they move again creates a sample that answers the question “what is the outcome from moving once and never moving again relative to the counterfactual of never moving?”, but this is a biased measure of any activity among the full population. Specific questions about chain migrants versus once-in-a-lifetime movers are reserved for future research.
The more common version of the CMH statistic is an odds ratio, not a risk ratio. The odds of an event are the count of occurrences divided by the count of non-occurrences; the risk is the same occurrence count divided by the full count of the population [54]. An odds ratio or risk ratio is the ratio of two odds or two risks so defined.
This article relies on the risk ratio. Colloquial references to the chance, likelihood, and typically even odds of an event refer to the risk, not the odds as defined here. The odds ratio is symmetric, giving equal odds to the chance of moving among retirees versus non-retirees, and the odds of retiring among movers versus stayers, for example. The risk ratio gives distinct values for the two, which can better advise causal inquiries.
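The definitions of odds and risk, and the symmetry of the odds ratio noted above, can be checked on a single 2×2 table. The following is a small sketch with hypothetical counts of movers and stayers, retired or not; the helper names are illustrative and are not taken from the author's code.

```python
from fractions import Fraction


def risk(events, non_events):
    """Occurrences over the full count (occurrences plus non-occurrences)."""
    return Fraction(events, events + non_events)


def odds(events, non_events):
    """Occurrences over non-occurrences."""
    return Fraction(events, non_events)


# Hypothetical counts, for illustration only.
mover_retired, mover_working = 20, 80
stayer_retired, stayer_working = 100, 800

# Odds ratio of retiring, movers versus stayers.
or_retire = odds(mover_retired, mover_working) / odds(stayer_retired, stayer_working)
# Odds ratio of moving, retirees versus non-retirees.
or_move = odds(mover_retired, stayer_retired) / odds(mover_working, stayer_working)

# Risk ratio of retiring, movers versus stayers.
rr_retire = risk(mover_retired, mover_working) / risk(stayer_retired, stayer_working)
# Risk ratio of moving, retirees versus non-retirees.
rr_move = risk(mover_retired, stayer_retired) / risk(mover_working, stayer_working)

print(or_retire, or_move)   # the two odds ratios coincide
print(rr_retire, rr_move)   # the two risk ratios differ
```

With these counts both odds ratios equal 2, while the risk ratios are 9/5 and 11/6, so only the risk ratio distinguishes the two directions of comparison.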
For relatively unlikely events, such as a health condition in a typical medical study, the odds ratio approximates the risk ratio, but as the likelihood of the event grows, the odds overestimate the risk to the point of being almost unusable for discussing the relative chance that an event will occur [62].
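The size of the gap follows directly from the definitions. As a short worked derivation, write \(p\) for the risk of an event in one group, and \(p_1\) and \(p_0\) for the risks in two groups being compared (these symbols are introduced here for illustration and are not the article's notation):

```latex
\[
\text{odds} = \frac{p}{1-p},
\qquad
\frac{\text{odds}}{p} = \frac{1}{1-p}
\]
\[
\text{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)}
          = \frac{p_1}{p_0}\cdot\frac{1-p_0}{1-p_1}
          = \text{RR}\cdot\frac{1-p_0}{1-p_1}
\]
```

At \(p = 0.01\) the odds exceed the risk by about 1%, while at \(p = 0.5\) the odds are double the risk. For the ratios, \(p_1 = 0.6\) and \(p_0 = 0.3\) give a risk ratio of 2 but an odds ratio of 3.5.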
In medical studies, when subjects are selected ex ante and split into ceteris paribus cells based on observed covariates, the measured odds or risk ratios, and hence the CMH statistic, are biased to the extent that the controlled covariates correlate with the outcome [12, 14, 58]. That is not the situation in typical administrative-record or commercial data sets, which have a defined universe of observations and no subject selection. Multiple testing issues in mereological methods [17] are not a consideration for descriptive studies, or can be adjusted via methods such as Bonferroni corrections.
Via https://apps.bea.gov/iTable/index_regional.cfm, accessed April 2021.
References
Akgündüz, Y. E., Bağır, Y. K., Cılasun, S. M., & Kırdar, M. G. (2021). Consequences of a massive refugee influx on firm performance and market structure. Technical Report 21/01.
Arah, O. A. (2008). The role of causal reasoning in understanding Simpson’s paradox, Lord’s paradox, and the suppression effect: Covariate selection in the analysis of observational studies. Emerging Themes in Epidemiology, 5(1), 1–5.
Barros, A. J., & Hirakata, V. N. (2003). Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio. BMC Medical Research Methodology, 3(1), 1–13.
Benson, M., & O’Reilly, K. (2015). From lifestyle migration to lifestyle inmigration: Categories, concepts and ways of thinking. Migration Studies, 4(1), 20–37.
Borjas, G. (1998). Immigration and welfare magnets. Technical report.
Brown, D. (2008). Rural Retirement Migration. Dordrecht: Springer.
Burke, J., & Miller, A. R. (2017). The effects of job relocation on spousal careers: Evidence from military change of station moves. Economic Inquiry, 56(2), 1261–1277.
Card, D. (2001). Estimating the return to schooling: Progress on some persistent econometric problems. Econometrica, 69(5), 1127–1160.
Cebula, R. J. (1974). Interstate migration and the Tiebout hypothesis: An analysis according to race, sex and age. Journal of the American Statistical Association, 69(348), 876–879.
Chau, N. H. (1997). The pattern of migration with variable migration cost. Journal of Regional Science, 37(1), 35–54.
Clark, D. E., & Hunter, W. J. (1992). The impact of economic opportunity, amenities and fiscal factors on age-specific migration rates. Journal of Regional Science, 32(3), 349–365.
Costanza, M. (1995). Matching. Preventive Medicine, 24(5), 425–433.
Dao, M., Furceri, D., & Loungani, P. (2017). Regional labor market adjustment in the United States: Trend and cycle. The Review of Economics and Statistics, 99(2), 243–257.
Deeks, J. (1998). When can odds ratios mislead? Odds ratios should be used only in case-control studies and logistic regression analyses. BMJ, 316(7136), 989–991.
Detang-Dessendre, C., Drapier, C., & Jayet, H. (2004). The impact of migration on wages: Empirical evidence from French youth. Journal of Regional Science, 44(4), 661–691.
DeWaard, J., Johnson, J. E., & Whitaker, S. D. (2018). Internal migration in the United States: A comparative assessment of the utility of the Consumer Credit Panel.
Dixon, D. O., & Simon, R. (1992). Bayesian subset analysis in a colorectal cancer clinical trial. Statistics in Medicine, 11(1), 13–22.
Duncan, D. T., Aldstadt, J., Whalen, J., White, K., Castro, M. C., & Williams, D. R. (2012). Space, race, and poverty: Spatial inequalities in walkable neighborhood amenities? Demographic Research, 26, 409–448.
Dyck, D. V., Cardon, G., Deforche, B., & Bourdeaudhuij, I. D. (2011). Do adults like living in high-walkable neighborhoods? Associations of walkability parameters with neighborhood satisfaction and possible mediators. Health & Place, 17(4), 971–977.
Edwards, C. (2018). Tax reform and interstate migration. Technical Report 84, Cato Institute.
Faggian, A., & McCann, P. (2009). Universities, agglomerations and graduate human capital mobility. Tijdschrift voor Economische en Sociale Geografie, 100(2), 210–223.
Fee, K., Wardrip, K., & Nelson, L. (2019). Opportunity occupations revisited: Exploring employment for sub-baccalaureate workers across metro areas and over time. Philadelphia Federal Reserve: Technical report.
Fournier, G. M., Rasmussen, D. W., & Serow, W. J. (1988). Elderly migration: For sun and money. Population Research and Policy Review, 7(2), 189–199.
Fox, W. F., Herzog, H. W., & Schlottman, A. M. (1989). Metropolitan fiscal structure and migration. Journal of Regional Science, 29(4), 523–536.
Garip, F. (2008). Social capital and migration: How do similar resources lead to divergent outcomes? Demography, 45(3), 591–617.
Greenwood, M. J. (1981). Migration and Economic Growth in the United States: National, Regional, and Metropolitan Perspectives. London: Academic Press.
Greenwood, M. J., & Sweetland, D. (1972). The determinants of migration between standard metropolitan statistical areas. Demography, 9(4), 665.
Gurak, D. T., & Kritz, M. M. (2000). The interstate migration of US immigrants: Individual and contextual determinants. Social Forces, 78(3), 1017–1039.
Hernán, M. A., Clayton, D., & Keiding, N. (2011). The Simpson’s paradox unraveled. International Journal of Epidemiology, 40(3), 780–785.
Hernández-Murillo, R., Ott, L. S., Owyang, M. T., & Whalen, D. (2011). Patterns of interstate migration in the United States from the Survey of Income and Program Participation. Federal Reserve Bank of St. Louis Review, 93(3), 169–185.
Herzog, H. W., & Schlottmann, A. M. (1986). State and local tax deductibility and metropolitan migration. National Tax Journal, 39(2), 189–200.
Hidalgo, M. D., & López-Pina, J. A. (2004). Differential item functioning detection and effect size: A comparison between logistic regression and Mantel–Haenszel procedures. Educational and Psychological Measurement, 64(6), 903–915.
Hyatt, H., McEntarfer, E., Ueda, K., & Zhang, A. (2018). Interstate migration and employer-to-employer transitions in the United States: New evidence from administrative records data. Demography, 55(6), 2161–2180.
Ihrke, D. K., & Faber, C. S. (2012). Geographical mobility: 2005–2010. Technical report: US Census Bureau.
Jackman, R., & Savouri, S. (1992). Regional migration in Britain: An analysis of gross flows using NHS central register data. The Economic Journal, 102(415), 1433.
Katz, L. F., & Blanchard, O. (1992). Regional evolutions. Technical Report 1.
Kennan, J., & Walker, J. R. (2011). The effect of expected income on individual migration decisions. Econometrica, 79(1), 211–251.
Kleven, H., Landais, C., Muñoz, M., & Stantcheva, S. (2020). Taxation and migration: Evidence and policy implications. Journal of Economic Perspectives, 34(2), 119–142.
Krieg, R. G. (1997). Occupational change, employer change, internal migration, and earnings. Regional Science and Urban Economics, 27(1), 1–15.
Lerman, K. (2017). Computational social scientist beware: Simpson’s paradox in behavioral data. Journal of Computational Social Science, 1(1), 49–58.
Mantel, N., & Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719–748.
McKinnish, T. (2005). Importing the poor. Journal of Human Resources, XL(1), 57–76.
McKinnish, T. (2007). Welfare-induced migration at state borders: New evidence from micro-data. Journal of Public Economics, 91(3–4), 437–450.
McLeman, R. A., & Hunter, L. M. (2010). Migration in the context of vulnerability and adaptation to climate change: Insights from analogues. Wiley Interdisciplinary Reviews: Climate Change, 1(3), 450–461.
Molloy, R., Smith, C. L., & Wozniak, A. (2011). Internal migration in the United States. Journal of Economic Perspectives, 25(3), 173–196.
Nelson, M. A., & Wyzan, M. L. (1989). Public policy, local labor demand, and migration in Sweden, 1979–1984. Journal of Regional Science, 29(2), 247–264.
Nunn, R., Kawano, L., & Klemens, B. (2018). Unemployment insurance and worker mobility. Urban-Brookings Tax Policy Center: Technical report.
O’Reilly, K. (2016). Lifestyle Migration. London: Routledge.
Pack, J. R. (1973). Determinants of migration to central cities. Journal of Regional Science, 13(2), 249–260.
Palloni, A., Massey, D. S., Ceballos, M., Espinosa, K., & Spittel, M. (2001). Social capital and international migration: A test using information on family networks. American Journal of Sociology, 106(5), 1262–1298.
Pearce, N. (2004). Effect measures in prevalence studies. Environmental Health Perspectives, 112(10), 1047–1050.
Preuhs, R. R. (1999). State policy components of interstate migration in the United States. Political Research Quarterly, 52(3), 527–547.
Quinn, M. A., & Rubb, S. (2005). The importance of education-occupation matching in migration decisions. Demography, 42(1), 153–167.
Ranganathan, P., Aggarwal, R., & Pramesh, C. (2015). Common pitfalls in statistical analysis: Odds versus risk. Perspectives in Clinical Research, 6(4), 222.
Rappaport, J. (2018). The faster growth of larger, less crowded locations. Economic Review, 103(4), 5–38 (Fourth Quarter).
Roback, J. (1982). Wages, rents, and the quality of life. Journal of Political Economy, 90(6), 1257–1278.
Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel–Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17(2), 105–116.
Rose, S., & van der Laan, M. J. (2009). Why match? Investigating matched case-control study designs with causal effect estimation. The International Journal of Biostatistics. https://doi.org/10.2202/1557-4679.1127.
Rothman, K. J. (2012). Modern Epidemiology. Philadelphia: LWW.
Sala, H., & Trivín, P. (2014). Labour market dynamics in Spanish regions: Evaluating asymmetries in troublesome times. SERIEs, 5(2–3), 197–221.
Sander, N., & Bell, M. (2013). Migration and retirement in the life course: An event history approach. Journal of Population Research, 31(1), 1–27.
Schmidt, C. O., & Kohlmann, T. (2008). When to use the odds ratio or the relative risk? International Journal of Public Health, 53(3), 165–167.
Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society: Series B (Methodological), 13(2), 238–241.
Sjaastad, L. A. (1962). The costs and returns of human migration. Journal of Political Economy, 70(5 Part 2), 80–93.
Stark, O., & Bloom, D. E. (1985). The new economics of labor migration. The American Economic Review, 75(2), 173–178.
Tcha, M. (1995). Altruism, household size and migration. Economics Letters, 49(4), 441–445.
Tiebout, C. M. (1956). A pure theory of local expenditures. Journal of Political Economy, 64(5), 416–424.
Tunaru, R. (2001). Models of association versus causal models for contingency tables. Journal of the Royal Statistical Society Series D (The Statistician), 50(3), 257–269.
Vedder, R. (1990). Tiebout, taxes, and economic growth. Cato Journal, 10(1), 91–108.
Wacholder, S. (1986). Binomial regression in GLIM: Estimating risk ratios and risk differences. American Journal of Epidemiology, 123(1), 174–184.
Walker, K. E. (2017). The shifting destinations of metropolitan migrants in the US, 2005–2011. Growth and Change, 48(4), 532–551.
Yarnold, P. R. (1996). Characterizing and circumventing Simpson’s paradox for ordered bivariate data. Educational and Psychological Measurement, 56(3), 430–442.
Young, C., Varner, C., Lurie, I. Z., & Prisinzano, R. (2016). Millionaire migration and taxation of the elite. American Sociological Review, 81(3), 421–446.
Yule, G. U. (1903). Notes on the theory of association of attributes in statistics. Biometrika, 2(2), 121–134.
Funding
This article was written in the course of business by a US Treasury employee, as part of a project to improve tax modeling via improvements in demographic modeling.
Author information
Contributions
Sole author. Much of the data preparation work was done before and independently of this study, as acknowledged in Sect. 3.
Ethics declarations
Conflict of interest
None.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is a component of a larger study on the population characteristics underlying models of the inputs to tax revenue calculations, and how they evolve over time. Thanks to David Bridgeland, Randy Capps, Adam Cole, Aaron Schumacher, Bethany DeSalvo, Robin Fisher, Chung Kim, Gray Kimbrough, Elizabeth Landau, Ithai Lurie, Nick Turner, Elizabeth Maggie Penn, Joshua Tauberer, and the compilers of the data bank, Raj Chetty, John Friedman, Emmanuel Saez, Danny Yagan, and their counterparts at IRS.
About this article
Cite this article
Klemens, B. An analysis of US domestic migration via subset-stable measures of administrative data. J Comput Soc Sc 5, 351–382 (2022). https://doi.org/10.1007/s42001-021-00124-w
DOI: https://doi.org/10.1007/s42001-021-00124-w
Keywords
- Migration
- Administrative records
- Demographic analysis
- Relative risk
- Risk ratios
- Returns to education
- Retirement