Thursday, December 23, 2010

Making Sense of the Soundex

Soundex codes are four-character representations based on the way names sound rather than the way they are spelled. They were used extensively by the U.S. Work Projects Administration (WPA) crews working in the 1930s to organize Federal Census data from 1880 to 1920. (Pictured: WPA Census Project Historical Records Survey workers wearing masks while inventorying and surveying records in sub-cellar below river level, NY, NY.)

Soundex is also very popular in genealogy software and databases. If you search many genealogical records, sooner or later you will need to use it because you can often find a person by his or her code, even when the name has been misspelled.

It isn't difficult to learn. Every code consists of a letter and three numbers. The letter is always the first letter of the surname and the numbers are assigned to the remaining letters of the surname according to the following guide. When necessary, zeros are added at the end to produce a four-character code.

EACH NUMBER REPRESENTS LETTERS

1 = B, F, P & V
2 = C, G, J, K, Q, S, X & Z
3 = D & T
4 = L
5 = M & N
6 = R

NO CODE

A, E, I, O, U, H, W & Y
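For readers who keep their research in a script or spreadsheet macro, the guide above can be written as a lookup table. This is only a sketch in Python; the names SOUNDEX_GROUPS, LETTER_TO_DIGIT and UNCODED are made up for illustration and do not come from any genealogy program.

```python
# Letter groups copied from the coding guide above: digit -> letters it represents.
SOUNDEX_GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the groups into a per-letter lookup, e.g. LETTER_TO_DIGIT["T"] is "3".
LETTER_TO_DIGIT = {
    letter: digit
    for digit, letters in SOUNDEX_GROUPS.items()
    for letter in letters
}

# Letters that receive no code.
UNCODED = set("AEIOUHWY")

print(LETTER_TO_DIGIT["T"], LETTER_TO_DIGIT["K"], LETTER_TO_DIGIT["R"])
```

Between them, the coded and uncoded letters cover the whole alphabet, so every letter of a surname falls into exactly one group.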

MORE COMPLEX RULES
  • If the surname has any double letters, they should be treated as one letter. (For example: Gutierrez is coded G-362 - G, 3 for the T, 6 for the first R, second R is ignored, 2 for the Z.)
  • If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter. (Examples: Pfister is coded as P-236 - P, F is ignored, 2 for the S, 3 for the T, 6 for the R; Jackson is coded as J-250 - J, 2 for the C, K is ignored, S is ignored, 5 for the N, 0 is added; Tymczak is coded T-522 - T, 5 for the M, 2 for the C, Z is ignored, 2 for the K. Since the vowel "A" separates the Z & K, the K is coded.)
  • If a surname has a prefix such as Van, Con, De, Di, La or Le, code it both with and without the prefix because the surname might be listed under either code. Note that Mc and Mac are NOT considered prefixes. (For example, VanDeusen might be coded two ways: V-532 - V, 5 for the N, 3 for the D, 2 for the S OR D-250 - D, 2 for the S, 5 for the N, 0 is added.)
  • If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. (Example: Tymczak is coded as T-522 - T, 5 for the M, 2 for the C, Z is ignored & 2 for the K. Since the vowel "A" separates the Z and K, the K is coded.)
  • If "H" or "W" separates two consonants that have the same soundex code, the consonant to the right of the "H" or "W" is not coded. (Example: Ashcraft is coded A-261 - A, 2 for the S, C is ignored, 6 for the R, 1 for the F. It is not coded A-226.)
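
The rules above can be sketched as a short Python function. This is one possible reading of the rules, not the code behind any particular soundex generator. The function names soundex and soundex_variants are made up for illustration, and two choices are assumptions: "Y" is treated like a vowel (it separates consonants), and prefixes are detected simply by checking how the name starts, so a name like "Lee" will also produce a harmless extra code.

```python
# Digit for each coded letter, built from the coding guide.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit

# Prefixes named in the rules above. Mc and Mac are deliberately left out.
PREFIXES = ("VAN", "CON", "DE", "DI", "LA", "LE")


def soundex(surname):
    """Return a code such as 'T-522' for a surname."""
    letters = [ch for ch in surname.upper() if "A" <= ch <= "Z"]
    if not letters:
        raise ValueError("surname contains no letters")
    first = letters[0]
    digits = []
    # The first letter's own code can cancel the letter after it (Pfister).
    previous = CODES.get(first)
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate: the previous code stays in force
        code = CODES.get(ch)
        if code is None:
            previous = None  # a vowel (or Y, by assumption) separates
            continue
        if code != previous:
            digits.append(code)  # double letters and same-code neighbors are skipped
        previous = code
    # Keep three digits at most and pad with zeros when there are fewer.
    return first + "-" + "".join(digits)[:3].ljust(3, "0")


def soundex_variants(surname):
    """Return the code for the full name plus codes with a leading prefix removed."""
    name = "".join(ch for ch in surname.upper() if "A" <= ch <= "Z")
    variants = [soundex(name)]
    for prefix in PREFIXES:
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest:
            code = soundex(rest)
            if code not in variants:
                variants.append(code)
    return variants


for name in ("Gutierrez", "Pfister", "Jackson", "Tymczak", "Ashcraft"):
    print(name, soundex(name))
print("VanDeusen", soundex_variants("VanDeusen"))
```

Run on the examples from the rules, the sketch gives G-362, P-236, J-250, T-522 and A-261, and both V-532 and D-250 for VanDeusen.
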
AMERICAN INDIAN & ASIAN NAMES

A phonetically-spelled American Indian or Asian name was sometimes coded as if it were one continuous name. If a distinguishable surname was given, the name may have been coded in the normal manner.

MAKING SENSE OF IT ALL

To figure out a surname's code, follow these simple steps:
  • Keep the first letter, then eliminate A, E, I, O, U, W, Y & H from the rest of the name
  • Write the first letter as is, followed by the codes found in the table above for the letters that remain
No matter how long or short the surname is, the soundex code is always the first letter of the name followed by three numbers. If you have coded the first letter and three numbers, but still have more letters in the name, ignore them! If you have run out of letters in the name before you have three numbers, then add zeroes. Here are more examples:

JOHNSON = JNSN = J525 (ignored the vowels)
WASHINGTON = WSNGTN = W252 (ignored the ending TN)
KUHN = KN = K500 (add zeroes to the end)
LEE = L = L000 (zeroes added)
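
The "ignore the extra letters, or add zeroes" step can be sketched on its own in Python. The helper name finish_code is made up for illustration; it assumes you have already worked out the first letter and the string of numbers for the rest of the name.

```python
def finish_code(first_letter, digits):
    """Trim or zero-pad the digits so the code is one letter plus three numbers."""
    return first_letter.upper() + digits[:3].ljust(3, "0")


print(finish_code("J", "525"))    # JOHNSON: exactly three numbers, nothing to change
print(finish_code("W", "25235"))  # WASHINGTON: numbers for the ending TN are dropped
print(finish_code("K", "5"))      # KUHN: two zeroes are added
print(finish_code("L", ""))       # LEE: no coded letters after the L, so all zeroes
```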

Think through the possible variant spellings, misspellings and misreadings of the surname you are researching before concluding that it can't be found in the soundex listing. If all else fails, you'll find a Soundex Code Generator at http://www.progenealogists.com/soundex.htm.