05-23-2021, 04:05 PM
cbbl
All Star Starter
Join Date: Apr 2003
Location: Massachusetts
Posts: 1,179
CBBL's 1890 Start Modifications

See this thread for an introduction to the various mods I'll be posting in this and other subforums.

First, names.

As my league starts in 1890 and aims to be era-appropriate, I needed to update both the first_names_english.txt and names_english.txt files for the era.

This involved pulling data from namecensus.com and its former URL, names.mongabay.com, under which some data files still reside.

First -- first names

Namecensus.com has historical name databases going back to 1880. The male first-name databases contain anywhere from 1,800 to over 8,000 names. The database does not distinguish the ethnicity of first names (it does for surnames), so using the data poses some challenges, particularly as you get deeper into the 20th century.

Normally, when I use modified name files, I target name sets from 20 years prior to the draft year, as that's a good representation of what the players being drafted would be called. For my 1890 start date I can't do this, because the oldest data is from 1880, so I'll have to compromise a bit. The upside is that I won't have to re-import the first names database for 20 years, when my universe gets closer to the year 1910.
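As a rough illustration of the 20-year rule, the hypothetical helper below picks which decade's file to load for a given draft year, clamped to the 1880 through 1920 files attached to this post. The function name and the round-down-to-the-decade choice are my own assumptions, not anything OOTP does itself.

```python
# Hypothetical helper: decades covered by the attached first-name files.
AVAILABLE_DECADES = (1880, 1890, 1900, 1910, 1920)


def target_name_decade(draft_year: int, lag: int = 20) -> int:
    """Return the decade of name data to use for players drafted in draft_year."""
    target = draft_year - lag
    decade = (target // 10) * 10  # round down to the start of the decade
    # Clamp to the files that actually exist: nothing older than 1880 or newer than 1920.
    return max(AVAILABLE_DECADES[0], min(decade, AVAILABLE_DECADES[-1]))


for year in (1890, 1905, 1910, 1935, 1950):
    print(year, target_name_decade(year))
```

With this rounding, draft years 1890 through 1909 all map to the 1880 file, and the switch to the 1890 file happens at 1910.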

Another thing to note: because the data comes primarily from the U.S. Census, it contains errors, for example females who were recorded as males (Betty, Lucy, Margaret, Sue, etc.). So some editing is necessary, and I will constantly need to tinker with the file to remove these obvious errors. The same goes for obviously non-Caucasian names. Lastly, there are numerous abbreviations and spelling errors that may need adjustment (e.g., Chas for Charles). I usually discover these once the file is imported into OOTP and then fix them in the original file for future universes.
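Part of that cleanup can be scripted. This is a minimal sketch, assuming the census data has already been parsed into (name, count) pairs; the blocklist and abbreviation map below only hold the examples mentioned above and would need to grow as you find more errors. Merging an expanded abbreviation into the full name's count (rather than keeping it as a separate entry) is my own design choice.

```python
from collections import Counter

# Example entries only; a real list would be much longer.
FEMALE_NAMES = {"Betty", "Lucy", "Margaret", "Sue"}
ABBREVIATIONS = {"Chas": "Charles"}  # hypothetical map: abbreviation -> full name


def clean_names(rows):
    """Drop known female names, expand abbreviations, and merge duplicate counts.

    rows: iterable of (name, count) pairs parsed from a census name list.
    Returns a Counter of name -> total count.
    """
    cleaned = Counter()
    for name, count in rows:
        name = name.strip().title()  # normalize case, e.g. "JOHN" -> "John"
        if name in FEMALE_NAMES:
            continue  # census error: female name recorded as male
        name = ABBREVIATIONS.get(name, name)  # "Chas" -> "Charles"
        cleaned[name] += count  # merge into any existing entry
    return cleaned


rows = [("JOHN", 900), ("Chas", 40), ("Charles", 500), ("Margaret", 12)]
print(clean_names(rows))
```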

To keep the frequencies in a range OOTP can handle, I rescale them so that the most popular name has a frequency of 50000. I don't know about OOTP 22, but with earlier versions I found that if the frequencies were too large, OOTP wouldn't load the database correctly. I may experiment with larger numbers to see.
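The rescaling is a simple proportional scale against the most popular name. A sketch, assuming the counts are held in a dict of name to raw census count; keeping every name at a minimum frequency of 1 is my own choice, so that rare names aren't rounded away to zero.

```python
def rescale(counts, top=50000):
    """Scale raw counts so the most common name gets a frequency of `top`."""
    peak = max(counts.values())
    # Proportional scaling, rounded to whole numbers, never below 1.
    return {name: max(1, round(count * top / peak)) for name, count in counts.items()}


print(rescale({"John": 9000, "William": 4500, "Zebulon": 1}))
```

Lowering or raising `top` is an easy way to run the experiment with larger numbers without touching the relative frequencies.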

SO....

Attached are the first name files for the decades 1880 through 1920. The files only replace ethnicity id 0. Many female names and misspellings have been removed, but I haven't touched non-European names (Hispanic, Asian, etc.). Some female names and other errors may still remain.
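If you'd rather splice new ethnicity 0 entries into an existing name file yourself, something like the following would do it. Note that the line format used here (name,ethnicity_id,frequency) is an assumption for illustration only; check it against your own OOTP name files before relying on it.

```python
def replace_ethnicity(original_lines, new_rows, ethnicity_id="0"):
    """Swap out one ethnicity's entries in a name file, leaving the rest intact.

    ASSUMED (hypothetical) line format: name,ethnicity_id,frequency
    Lines that don't parse as three comma-separated fields (comments,
    headers) are passed through unchanged.
    """
    kept = []
    for line in original_lines:
        line = line.rstrip("\n")
        fields = line.split(",")
        if len(fields) == 3 and fields[1].strip() == ethnicity_id:
            continue  # drop the old entries for the ethnicity being replaced
        kept.append(line)
    # Append the new entries for that ethnicity at the end.
    kept.extend(f"{name},{ethnicity_id},{freq}" for name, freq in new_rows)
    return kept


original = ["John,0,50000", "Giovanni,3,1200", "William,0,30000"]
print(replace_ethnicity(original, [("Charles", 50000), ("George", 41000)]))
```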

I'll post another message about the last names modifications after this one.

Also -- I am happy to share the Excel files that I use to create and work with these .txt files. Let me know and I can share them with you.
Attached Files
first_names_english_1880.txt (302.4 KB)
first_names_english_1890.txt (305.0 KB)
first_names_english_1900.txt (310.1 KB)
first_names_english_1910.txt (363.0 KB)
first_names_english_1920.txt (376.3 KB)