I have been playing around with Chart.js for a few projects and thought using real data would help me learn it better. I stumbled across the Social Security Administration's baby names data, which covers births from 1880 through 2018. Obviously the SSA has not been around that long, so its figures come from people who applied for a Social Security card and recorded their date of birth, first name, and gender. You can find more information about the data on this page, but supposedly it includes every single person who has ever had a Social Security Number in the United States. (One caveat: for privacy, the SSA leaves out any name given to fewer than five babies of one sex in a given year.) This is all publicly accessible information, and no actual Social Security Numbers are provided. Let's dive in!
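If you want to follow along, the first step is getting the raw files into memory. As I understand the national download, it is one text file per year named `yobYYYY.txt`, where each row is `Name,Sex,Count`. Here is a minimal loader sketch under that assumption; `NameRecord`, `parse_yob_lines`, and `load_names` are my own hypothetical names, not part of any SSA tooling.

```python
import csv
import re
from pathlib import Path
from typing import Iterable, NamedTuple


class NameRecord(NamedTuple):
    year: int
    name: str
    sex: str    # "F" or "M", as written in the files
    count: int  # number of babies given this name (and sex) that year


def parse_yob_lines(year: int, lines: Iterable[str]) -> list[NameRecord]:
    """Parse the Name,Sex,Count rows of one yearly file (format assumed)."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, sex, count = row
        records.append(NameRecord(year, name, sex, int(count)))
    return records


def load_names(directory: str) -> list[NameRecord]:
    """Load every yobYYYY.txt file in `directory` into one flat list."""
    records: list[NameRecord] = []
    for path in sorted(Path(directory).glob("yob*.txt")):
        match = re.fullmatch(r"yob(\d{4})\.txt", path.name)
        if match is None:
            continue  # skip files that don't follow the yobYYYY.txt pattern
        with path.open(newline="") as f:
            records.extend(parse_yob_lines(int(match.group(1)), f))
    return records
```

Keeping everything as flat `(year, name, sex, count)` tuples makes each chart below a simple group-and-count over one list.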
This first chart shows the total number of people born each year, broken down by male, female, and total. Pro tip: you can click an item in the chart's legend to show or hide it, for example hiding the Total line to see more detail in the other two.
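For anyone curious how a chart like this might be fed, here is a sketch that sums the counts per year by sex and builds a Chart.js line-chart config with one dataset per series. It assumes records are `(year, name, sex, count)` tuples; `births_per_year` and `chartjs_line_config` are hypothetical helpers, not how I necessarily built the chart above.

```python
from collections import defaultdict
from typing import Iterable


def births_per_year(
    records: Iterable[tuple[int, str, str, int]],
) -> dict[int, dict[str, int]]:
    """Sum counts per year, split by sex ("F"/"M"), plus a combined "Total"."""
    totals: dict[int, dict[str, int]] = defaultdict(
        lambda: {"F": 0, "M": 0, "Total": 0}
    )
    for year, _name, sex, count in records:
        totals[year][sex] += count
        totals[year]["Total"] += count
    return dict(totals)


def chartjs_line_config(totals: dict[int, dict[str, int]]) -> dict:
    """Build a Chart.js line-chart config: years as labels, one dataset per series."""
    years = sorted(totals)
    series = [("Female", "F"), ("Male", "M"), ("Total", "Total")]
    return {
        "type": "line",
        "data": {
            "labels": years,
            "datasets": [
                {"label": label, "data": [totals[y][key] for y in years]}
                for label, key in series
            ],
        },
    }
```

Serialize the result with `json.dumps` and pass it to `new Chart(ctx, config)` on the page. Chart.js toggles a dataset when you click its legend entry by default, which is what makes the hide-the-Total trick work without extra code.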
I find it interesting that, according to this data, more females than males used to be born, up through the Great Depression of the 1930s. Heading into World War II the two were roughly even, and then, as soon as WWII was over and the survivors made it home, we see more males than females from 1946 onward.
Starting in 1911 and continuing through World War I (1914-1918) there is a population boom. It is interesting that this boom builds up before the war and continues through it, whereas WWII saw a spike just before it, a leveling off, and then a boom starting in 1946 with all those appropriately named baby boomers.
There appears to be no impact on population growth during the Korean War, but there was a significant drop during American involvement in the Vietnam War. Could this be participation bias or some other bias, since we are looking at SSA data and, until 1986, you were not required to get a card unless you were working? Then again, it does take two to tango, so the observations do make sense. Hey, this isn't a scientific study.
Next we look at the number of unique names per year, again broken down by gender, with a bonus series for names used for both men and women. While the first chart counts every single person, this second chart counts each name only once per year: even if 50,000 people were named James, James is counted once. Apparently Americans like to give girls a wider variety of names than boys. It is also interesting that the number of unique names per year rose steadily from the end of WWII through 2008.
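Counting each name once per year boils down to collecting names into sets. Here is a sketch, again assuming `(year, name, sex, count)` tuples; `unique_names_per_year` is a hypothetical helper, and treating "used for both" as "appears under F and under M in the same year" is my own assumption about how that bonus series could be defined.

```python
from collections import defaultdict
from typing import Iterable


def unique_names_per_year(
    records: Iterable[tuple[int, str, str, int]],
) -> dict[int, dict[str, int]]:
    """Count distinct names per year for F, M, and names given to both sexes."""
    names: dict[int, dict[str, set[str]]] = defaultdict(
        lambda: {"F": set(), "M": set()}
    )
    for year, name, sex, _count in records:
        # The count is ignored on purpose: each name contributes once per year.
        names[year][sex].add(name)
    return {
        year: {
            "F": len(by_sex["F"]),
            "M": len(by_sex["M"]),
            # Names present in both the F and M sets for that year.
            "Both": len(by_sex["F"] & by_sex["M"]),
        }
        for year, by_sex in sorted(names.items())
    }
```

Sets make the "only once" rule automatic, and the set intersection gives the both-genders count for free.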
Chart #3 digs further into unique names. I ran a script over the data to find the first year each name was used, and this chart shows the count of new, never-before-used names per year by gender. I ignored all names from 1880, since that is where the data starts and every name would count as new. This was fun to look at: our creativity seemed to spike in 1915 with names like Veston, Rayman, Bethany, and Debora, and that level of new names was not reached again until 1970 for females (Katrina, Lashannon, Natalya, Torii) and 1992 for males (Patryck, Daryon, Konor, Tiberius). Apparently the Great Recession damaged our creativity, as the number of new names created each year has been in sharp decline since then.
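A first-appearance script like the one described could look roughly like this. It is a sketch, not my original script: `first_appearances` and `new_names_per_year` are hypothetical names, and keying "new" on the `(name, sex)` pair is an assumption (keying on the name alone is an equally reasonable choice and would give different numbers).

```python
from collections import defaultdict
from typing import Iterable


def first_appearances(
    records: Iterable[tuple[int, str, str, int]],
) -> dict[tuple[str, str], int]:
    """Map each (name, sex) pair to the earliest year it appears."""
    first_seen: dict[tuple[str, str], int] = {}
    for year, name, sex, _count in records:
        key = (name, sex)
        # Works regardless of record order: keep the smallest year seen.
        if year < first_seen.get(key, year + 1):
            first_seen[key] = year
    return first_seen


def new_names_per_year(
    first_seen: dict[tuple[str, str], int], ignore_year: int
) -> dict[int, dict[str, int]]:
    """Count brand-new names per year and sex, skipping the first data year."""
    counts: dict[int, dict[str, int]] = defaultdict(lambda: {"F": 0, "M": 0})
    for (_name, sex), year in first_seen.items():
        if year != ignore_year:  # every name is "new" in the first year of data
            counts[year][sex] += 1
    return dict(sorted(counts.items()))
```

Usage would be `new_names_per_year(first_appearances(records), ignore_year=1880)`, and the example names for a given year fall out of `sorted(n for (n, s), y in first_seen.items() if y == 1915)`.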
Of course, everyone probably only cares about the top names. For 1880 through 2018 the most used male name is James and the most used female name is Mary. The top 5 names, from most to least used, are James, John, Robert, Michael, and Mary. This last chart follows the top 15 names of all time, showing their usage per year over the whole period. It's interesting how most names spike in popularity, level off, and then fall out of favor.
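The top-names chart needs two steps: rank names by their all-time total, then pull a per-year series for each winner. A sketch under the same `(year, name, sex, count)` assumption; `top_names` and `usage_series` are hypothetical helpers, and ranking `(name, sex)` pairs rather than bare names is my assumption.

```python
from collections import Counter, defaultdict
from typing import Sequence


def top_names(
    records: Sequence[tuple[int, str, str, int]], n: int = 15
) -> list[tuple[str, str]]:
    """Rank (name, sex) pairs by total count across all years."""
    totals: Counter = Counter()
    for _year, name, sex, count in records:
        totals[(name, sex)] += count
    return [key for key, _total in totals.most_common(n)]


def usage_series(
    records: Sequence[tuple[int, str, str, int]], keys: list[tuple[str, str]]
) -> tuple[list[int], dict[tuple[str, str], list[int]]]:
    """Per-year counts for each selected (name, sex), with 0 for missing years.

    `records` is iterated twice, so pass a list rather than a generator.
    """
    wanted = set(keys)
    years = sorted({year for year, *_ in records})
    per_key = {key: defaultdict(int) for key in keys}
    for year, name, sex, count in records:
        if (name, sex) in wanted:
            per_key[(name, sex)][year] += count
    return years, {key: [per_key[key][y] for y in years] for key in keys}
```

Filling missing years with 0 keeps every series the same length as the shared `labels` array, which is what a Chart.js line chart expects.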
This type of post is not my norm, but it was a way for me to play around with Chart.js. Thanks to the Chart.js folks for writing and providing a great toolset. I might do some more analysis on the dataset and post it if there is interest.
Written by Eric Wamsley
Posted: May 8th, 2020 2:02pm