Yale and Columbia economists are building a massive dataset to better understand the role immigrants played in transforming the United States from its rural origins into a global economic power.
The researchers will merge individual-level data from the historical U.S. Census with the records of 11 million immigrants who arrived at the port of New York between 1820 and 1892; the passenger lists of 5.5 million immigrants who departed for the United States from the port of Hamburg, Germany, between 1850 and 1934; data from the Historical Census of Manufacturers on manufacturing employment and productivity at the county level from 1860 to 1929; and records from the U.S. Patent and Trademark Office covering a similar timespan.
The resulting dataset will provide the researchers with an unprecedented trove of evidence of immigration’s impact on American prosperity, said Costas Arkolakis, professor of economics and one of the project’s principal investigators.
“This is the first project to use advances in quantitative spatial theory along with advanced big-data techniques to understand the contribution of immigrants to the process of U.S. economic growth,” he said. “The key benefit of our approach is to link modern theory with massive amounts of microeconomic data about individual immigrants, their locations and their occupations, to address questions that are extremely difficult to assess otherwise.”
Specifically, the dataset will help the researchers understand the extent to which the novel ideas and expertise immigrants brought to U.S. shores drove the nation’s emergence as an industrial and technological powerhouse, explained Michael Peters, assistant professor of economics at Yale. Peters and Sun Kyoung Lee, a Ph.D. candidate in economics at Columbia University, are the project’s other two principal investigators.
“We are particularly interested in one potential mechanism by which immigrants might have spurred economic prosperity: the transfer of new knowledge,” said Peters, who has worked on the impact of immigrants on German economic growth after World War II. “The historical immigration records and passenger lists that we have access to have one crucial feature: They include information about people’s pre-immigration occupations. This will help us chart the flows of knowledge triggered by immigration.”
The researchers will also use the dataset to better understand 19th-century immigrants’ spatial mobility — how they spread out and where they located after arriving in the United States.
“We can trace individual immigrants for multiple decades from the day they disembarked in New York,” said Lee. “We can see the extent to which pre-migration skills influenced where immigrants moved and compare their location choices to those of the native population. Our dataset will allow us to calculate wages down to the county level, which is important in understanding people’s decisions to move to one place or another.”
The researchers recently received a $1 million grant from the National Science Foundation to build the dataset.
Neither the Castle Garden immigration database nor the Hamburg Passenger Lists have previously been used for empirical research. Economist Ira Glazier built the Castle Garden database, but he died before he could conduct scholarly research using it.
“Our research will hopefully give this data a new life after it was almost entirely forgotten,” Peters said.
The story of Heinrich Engelhard Steinweg, later Henry Engelhard Steinway — founder of Steinway & Sons piano company — illustrates the project’s potential to match records of individual immigrants among the different databases and track their lives over decades, the researchers note.
Steinweg departed Hamburg for New York City on May 28, 1850, with four family members, according to the Hamburg passenger lists. The records describe his occupation in Germany as “Instrumentenmacher,” or instrument maker.
The 1860 and 1870 U.S. population censuses identify Steinweg, now Steinway, as a piano manufacturer residing in New York City. U.S. Patent and Trademark Office records include a number of patents granted to Steinway and his sons for their piano designs. Data from the 1880 U.S. Census of Manufacturers illuminates the scale of Steinway’s economic success, demonstrating that his piano manufacturing plant in Queens was one of the most capital-intensive operations in New York City, representing more than $1.5 million in capital and about $500,000 in sales.
The researchers expect that their efforts to match individual records will enable them to track millions of individuals in a similar manner.
“Was Steinway just one extraordinary immigrant or were more systematic forces at play?” Peters said. “The dataset will provide information required to help us answer this question. In particular, the scale of the data will allow us to move beyond the selected prism of biographical historical accounts, which naturally tend to be biased towards success stories like the case of Mr. Steinway.”
Merging several massive databases and matching the records within them presents challenges, some straightforward and others more complex, note the researchers. For instance, the passenger lists were recorded in German and must be translated. Most of the datasets the researchers work with exist only on paper and must be converted from images to text. Even where datasets are machine-readable, the data are not harmonized. For example, in Steinway’s case, some sources list his occupation as “instrument maker” whereas others record it as “laborer.” Moreover, no source carries an identifier that stays with a person over time, which makes it difficult to track the same individual from one record to the next.
Many of the techniques used to connect these large datasets rely on modern machine learning tools and were developed as part of Lee’s doctoral thesis.
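To give a rough sense of what matching records without a shared identifier involves, here is a minimal sketch in Python. It is not the researchers' method, which relies on machine learning tools described in Lee's thesis. It swaps in a much simpler stand-in: plain string similarity on names, plus a tolerance on birth year. The function names, field names, thresholds, and sample records are all hypothetical and made up for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(passengers, census, min_score=0.6, max_year_gap=2):
    """Pair each passenger with the most similar census record, if any.

    Candidates whose birth year differs by more than max_year_gap are
    skipped, and a pair is kept only if its name similarity reaches
    min_score. Both thresholds are illustrative assumptions.
    """
    links = []
    for p in passengers:
        best, best_score = None, min_score
        for c in census:
            if abs(p["birth_year"] - c["birth_year"]) > max_year_gap:
                continue  # implausible age difference: not the same person
            score = name_similarity(p["name"], c["name"])
            if score >= best_score:
                best, best_score = c, score
        if best is not None:
            links.append((p["id"], best["id"], round(best_score, 2)))
    return links


# Made-up records for illustration only (not taken from the real archives).
passengers = [{"id": "p1", "name": "Heinrich Steinweg", "birth_year": 1797}]
census = [
    {"id": "c1", "name": "Henry Steinway", "birth_year": 1797},
    {"id": "c2", "name": "Henry Steinway", "birth_year": 1831},
    {"id": "c3", "name": "John Miller", "birth_year": 1797},
]

links = link_records(passengers, census)
print(links)
```

In this toy example the anglicized name still links to the passenger record, while the candidate with the same name but a distant birth year and the candidate with an unrelated name are both set aside. A real pipeline would have to weigh many more fields, such as occupation and place of residence, and cope with transcription errors at a far larger scale.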
“We just have to keep finding creative ways to deal with all kinds of oddities in the data,” Lee said.
The dataset could eventually be expanded to include modern datasets containing post-1940 information, as well as data on economic activity in sectors other than manufacturing. Those efforts are already underway, according to the researchers.
“Aside from increasing our understanding of how immigration affected America’s economic growth in the late 19th century, the dataset will enhance the study of an entire range of aspects of the lifecycle of individuals in the United States, including their jobs, family characteristics, and all kinds of comparisons with people born in the United States,” Arkolakis said.
Arkolakis and Lee are also working on a separate project with economist Rodrigo Adao of the University of Chicago Booth School of Business that investigates historical intergenerational mobility in the United States and its main determinants.
The matched census data could provide insights into other questions concerning people’s spatial mobility and urbanization, both during the 19th century and today, Arkolakis said.
“Our 19th-century data could provide important lessons and open all sorts of research possibilities across a number of fields,” he said.