Migration of European elites across the centuries
Author: Jasper Tjaden (University of Potsdam; Germany)
Ever wondered where Einstein was born and died? A few months ago, I was on a train reading a magazine. On the last few pages, they featured an obituary. As a migration scholar, I thought that obituaries could be an interesting source of data because they contain info on notable people, their place of birth and place of death. If people die in a country other than the one they were born in, this could be a decent approximation of migration (unless they died during a holiday).
Brilliant. So what if I had access to a digital library of all obituaries from newspapers and magazines that have been around for a long time to create a dataset of all notable people worldwide!?
Long story short… It didn’t really work. Maybe there is one, but after some googling around, it seemed too complicated to get access.
Instead, I discovered that Wikipedia has lists of people from different countries (e.g. “people from Portugal”). These lists compile famous people with a Wikipedia entry (which most famous people, unlike me, do have).
I web-scraped these lists, then scraped each person’s year and place of birth and death and the type of famous person they are (art, sports, science, business, state/church). I started with Germany. Then I went down a rabbit hole and spent an unjustifiably large amount of time compiling, in total, 25 European countries (EU + Norway, UK, Switzerland).
So, this is how I got here. I now have information on the migration status of 10548 dead famous people. Academically speaking, I am not sure whether this is relevant at all or new. I simply found it interesting and it turned into a little fun side project that allowed me to toy around with web scraping. With more time/resources, many missing countries and people could be added (please help!) to the database. We could turn this into a global database, heck, we could add data on economic development, conflict, culture, etc. and look at the effects on migration of elites. We could do….
Well, for now, here are some key results.
Results: Migration among famous (European) people
Across all countries in the data (EU + Norway, Switzerland, England (not UK)) and all available centuries, the migration rate among famous people was 27.9%. This is pretty high given that currently the percentage of migrants among the world’s population is around 3%. So, the migration rate among famous people in these data is roughly 9 to 10 times higher. This is maybe not surprising given that famous people have resources, and people with resources are more mobile. Maybe famous people are famous partly because they were more mobile, so more people got to know them.
“Hold on!”, you may say, “What about countries which used to be one country, territory or empire?”. Good point. If I exclude all modern countries which used to be one at some point, the overall migration rate is 26.8%, so not a huge difference. Also, if you take out countries which speak the same language, the rate is still 24.6%.
We see a slight, linear-looking increase in the migration rate from 20% to 30% between 1400 and 1950 (see blue line and y-axis on the left). This probably reflects the cumulative effects of globalization (transport, communication, trade etc.) and conflict (wars). However, I was kind of stunned by this. I mean, over 6 centuries, the migration rate remained fairly stable even though it got a lot easier to migrate over time.
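To make the calculation concrete, here is a minimal sketch of how such a migration rate and the "formerly one country" check could be computed. The project itself was done in R; this is a Python illustration with made-up toy records, and the names `migration_rate` and `FORMERLY_UNITED` are hypothetical. Dropping people whose birth and death countries form an excluded pair is only one possible reading of the exclusion described above.

```python
# Toy records: (name, birth_country, death_country). Illustrative values only,
# not rows from the actual dataset.
people = [
    ("Person A", "Germany", "USA"),
    ("Person B", "France", "France"),
    ("Person C", "Austria", "Germany"),
    ("Person D", "Italy", "Italy"),
]

# Hypothetical set of country pairs that once belonged to the same territory.
FORMERLY_UNITED = {frozenset({"Austria", "Germany"})}


def migration_rate(records, excluded_pairs=frozenset()):
    """Share (in %) of people who died in a country other than their birth country.

    People whose birth/death country pair is in excluded_pairs are dropped
    before the rate is computed (an assumption about how the check works).
    """
    kept = [
        (birth, death)
        for _, birth, death in records
        if frozenset({birth, death}) not in excluded_pairs
    ]
    if not kept:
        return float("nan")
    movers = sum(1 for birth, death in kept if birth != death)
    return 100.0 * movers / len(kept)


overall = migration_rate(people)                  # all four toy records
robust = migration_rate(people, FORMERLY_UNITED)  # Person C is dropped
```

On the toy data, two of four people count as migrants overall, and one of three once the Austria/Germany pair is excluded.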
We also see that famous people traveled longer distances (to their death) over time (see red line, y-axis on the right). The average distance between birth and death increased from approx. 500km to 700km between 1400 and 1500. Why the 15th century? I am no historian. Renaissance maybe? More economic relations across European countries?
The real jump in distance between the country of birth and death happens after 1750. Again, I am no historian, but this is very likely due to the fact that more and more Europeans emigrated to the US and Latin America, first in search of opportunities and later to flee from conflicts.
Migration among famous Europeans across the centuries
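For readers who want to play with the distance idea, here is a rough Python sketch. Note the swap: the distances in this post come from a country-to-country distance table (dist_cepii), whereas the sketch below computes great-circle (haversine) distances between two coordinate pairs. The function names, the grouping by century of birth, and the toy records are all assumptions for illustration.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_by_century(records):
    """records: iterable of (birth_year, birth_lat, birth_lon, death_lat, death_lon).

    Groups by century of birth (an assumption) and returns {century: mean km}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for year, blat, blon, dlat, dlon in records:
        century = (year // 100) * 100
        totals[century][0] += haversine_km(blat, blon, dlat, dlon)
        totals[century][1] += 1
    return {c: s / n for c, (s, n) in sorted(totals.items())}


# Einstein was born in Ulm and died in Princeton; coordinates are approximate.
einstein_km = haversine_km(48.40, 9.99, 40.35, -74.66)

# Made-up records: one person who never moved, one who moved a quarter of the equator.
toy = [(1450, 0.0, 0.0, 0.0, 0.0), (1460, 0.0, 0.0, 0.0, 90.0)]
toy_means = mean_distance_by_century(toy)
```

A country-level table is coarser than point-to-point distances, so numbers from this sketch would not match the figure exactly.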
This interpretation is consistent with the top 3 destinations across the centuries. The table below shows the 3 countries that most Europeans in the sample migrated to (i.e. died in). France is in the top 3 across all centuries (congratulations). We see that Italy is still popular in the 1400s and 1500s but less so since (sorry). Starting in the 1900s, the USA enters the scene. This probably drives the jump in distances (see above).
Top 3 destination countries across the centuries
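A ranking like this can be produced with a simple count per century. The sketch below is in Python rather than the R used for the project, uses made-up records, and assumes the century is taken from the year of death; `top_destinations` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def top_destinations(records, n=3):
    """records: iterable of (birth_country, death_country, death_year).

    Counts death countries among migrants only and returns
    {century: [up to n most frequent destination countries]}.
    """
    by_century = defaultdict(Counter)
    for birth_country, death_country, death_year in records:
        if birth_country != death_country:  # only people who died abroad
            century = (death_year // 100) * 100
            by_century[century][death_country] += 1
    return {
        century: [country for country, _ in counts.most_common(n)]
        for century, counts in sorted(by_century.items())
    }


# Made-up records for illustration, not rows from the actual dataset.
toy_records = [
    ("Italy", "France", 1519),
    ("Germany", "Italy", 1528),
    ("Spain", "France", 1560),
    ("France", "France", 1650),   # not a migrant, so not counted
    ("Poland", "France", 1934),
    ("Austria", "USA", 1940),
    ("Germany", "USA", 1955),
]
ranking = top_destinations(toy_records)
```

Centuries without any migrants (the 1600s in the toy data) simply do not appear in the result.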
I expected major differences between categories such as artists, athletes, business (wo)men, as well as states(wo)men. I thought, for example, politicians, royalty, soldiers, bishops etc. (who I subsumed under “state/church”) would be far less likely to die abroad because they held a national office. Then again, European royalty was often born in one country, then married off to another royal family in another country. I somehow thought that scientists and artists would be the most mobile. All the painters hanging out in Florence or Paris.
Well, the data shows that the average migration rate appears fairly similar across categories. (Note that I don’t have the category information for roughly 30% of people. There could be some systematic missingness here that I am not aware of yet.)
Migration by category of fame
Now, let’s look at the top countries of emigration, that is, the countries that, in relative terms, more famous people left. The migration rate varies pretty dramatically across countries. We see 10-15% in several Scandinavian countries, and also in France and Italy. Looks like famous people have always been pretty happy there. Among the top “leavers” are many Eastern European countries as well as Austria and Ireland. Spain and Germany are at around the average of 30%.
Migration among famous Europeans by country of birth
If this has generated some appetite for the data, you now have the opportunity to browse through it yourself. This is somewhat self-serving because I would like you to spot mistakes and missing information. I did my fair bit of cleaning, but lots more could be done. So, please, look up your favorite famous person and check whether the info below is correct. If not, please enter correct info here.
Find your favorite famous person: The full searchable database
The data is based on 10548 persons overall, on average 421.92 per country, with a minimum of 131 (Latvia) and a maximum of 1236 (Italy).
The list only includes famous people who have already died and are famous enough that Google Search knows where they died. The dataset is far from complete. There are approx. 4600 people without any info on birth and death year, so I cannot know whether they are dead.
But who are these people anyway? What constitutes “fame”? Well, I don’t know. People in this data are famous enough that random internet users created a Wikipedia entry for them, and famous enough that Wikipedia created a list of them by country of birth. And famous enough that you immediately find where they died via Google Search. I manually checked about 50 people who came to mind and they were all there.
I suppose we could call them “elites”? Elites sound more “academic” but at the same time, I am not sure all the famous people in my data are part of the “elite”. Although many are. Most have money or power or both. But the common denominator is that many people know them and somehow acknowledge that they are important. Then again, Finland had many ice hockey players in the list, and their impact on global affairs seems limited.
For 7647 people (approx. 70% of entries), I managed to get info on the category of famous person. I distinguish the following categories:
- Art (Da Vinci, van Gogh, Picasso etc.)
- Science (Einstein, Newton etc.)
- Business (Gucci, Steinway etc.)
- Sports (Max Schmeling, the famous German boxer)
- State/church (prime ministers, bishops, Popes, kings, soldiers etc.)
So how did I collect the data?
This whole project was implemented using R and RStudio. The R community is amazing. I ran into many issues during the process, but the answer was only a few clicks away. The main tools I used:
- rvest by Hadley Wickham & RStudio
- RSelenium by John Harrison & Ju Yeong Kim
- The geocode function from the ggmap package by David Kahle, Hadley Wickham, Scott Jackson, and Mikko Korpela, and the countrycode package by Vincent Arel-Bundock
- dist_cepii by Mayer, T. & Zignago (2011), to calculate distances between countries
- ggplot2 by Hadley Wickham, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington
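To give a feel for the list-scraping step, here is a small, self-contained sketch. It is not the code used for this project (that was written in R with rvest and RSelenium); it is a Python illustration using only the standard library, run on a made-up HTML snippet instead of a live page. The class name `ListLinkParser`, the snippet, and the rule "take the first /wiki/ link in each list item" are all assumptions.

```python
from html.parser import HTMLParser

# Made-up snippet that mimics the shape of a "list of people" page.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Person_One">Person One</a>, painter
      (<a href="/wiki/Some_City">Some City</a>)</li>
  <li><a href="/wiki/Person_Two">Person Two</a>, physicist</li>
  <li><a href="https://example.org/">External link</a></li>
</ul>
"""


class ListLinkParser(HTMLParser):
    """Collects (link text, href) for the first /wiki/ link inside each <li>."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._in_li = False
        self._captured = False   # already took a link from the current <li>?
        self._href = None        # href of the link currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._captured = False
        elif tag == "a" and self._in_li and not self._captured:
            href = dict(attrs).get("href")
            if href and href.startswith("/wiki/"):
                self._href = href
                self._text = []
                self._captured = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.entries.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "li":
            self._in_li = False


parser = ListLinkParser()
parser.feed(SAMPLE_HTML)
entries = parser.entries
```

Each collected link would then be the starting point for looking up that person's year and place of birth and death.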
I will make the code and data available. First, I want to get some collective action going by improving and troubleshooting this data. Whoever helps me expand and improve these data will get preferential access. Let’s add more countries, let’s add famous people, let’s correct potential mistakes in the data. When the data is in good-enough shape, I will make it publicly available.
Lots! Where to start…?
There might be disagreement on who is famous or elite. There can be famous people who are not listed on Wikipedia. There can be people on the list that you have never heard of. However, let’s trust the swarm intelligence of Wikipedia users to get most famous people right. If you can’t find an obvious one, please add it here.
Borders of countries change over time. I have info on modern-day countries that used to be one, but I am sure it is not perfect and borders often remain contested.
All Wikipedia lists which I used were in English. It is possible that there are more famous people in the respective language of each country. However, you could argue that English is a good threshold. If somebody does not appear on the English list, maybe they weren’t that famous after all.
There might be death locations of people that Google (or anyone, in fact) does not know. Think of some strange duke from the Middle Ages.
Just because somebody died in some country, does not mean they actually “migrated” there. They could die on their vacation or a business trip, for example.
When geo-coding, a recurring issue is that places in different countries can share the same name. Think “Amsterdam” in the Netherlands or the US. I always use the bigger place.
There may be some people who are somewhat famous and share the same name. Google then does not know who you mean.
There is obvious selection in the data. People who are considered famous in one country may be less likely to be migrants. They are probably famous in that country because they were there for a long time and did not migrate somewhere else. Whoever stays has higher chances of becoming famous in the country where they stayed, is what I am saying. Then there are some obvious categories of people who are naturally less likely to die outside their “home” country, like royals, politicians etc. Staying in your country kind of comes with the job as a queen or duchess or prime minister. However, we might expect artists and athletes to be a bit more mobile. As we have seen above, strangely, that does not seem to be the case.
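The "bigger place" rule can be sketched in a few lines. This is a Python illustration, not the R code used here; it assumes a geocoder returns several candidate matches with a population figure attached, which is a hypothetical data shape. The population numbers below are rough placeholders, not official figures.

```python
def pick_largest(candidates):
    """Return the candidate place with the largest population, or None if empty.

    candidates: list of dicts with (hypothetical) keys "name", "country"
    and "population".
    """
    if not candidates:
        return None
    return max(candidates, key=lambda place: place["population"])


# Placeholder candidates for the ambiguous name "Amsterdam".
amsterdam_matches = [
    {"name": "Amsterdam", "country": "USA", "population": 20_000},
    {"name": "Amsterdam", "country": "Netherlands", "population": 900_000},
]
chosen = pick_largest(amsterdam_matches)
```

This rule is a heuristic: it will be wrong for anyone who really did die in the smaller place.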
Let’s clean this more, add people, add more info on year and place of birth and death. If you notice errors, please mark them here.
Let’s add countries, heck, let’s make this global. If you are willing to help out and have a couple of days’ worth of time on your hands, I will share the code with you and it should be easy to scale this up.
I hope you enjoyed it. I certainly had fun with it. Let’s see where it goes. Feel free to reach out and/or share this on Twitter.