Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
SOUNDEX converts an alphanumeric string to a four-character code to find similar-sounding words or names. The first character of the code is the first character of character_expression and the second through fourth characters of the code are numbers that represent the letters in the expression. Vowels in character_expression are ignored unless they are the first letter of the string. Zeroes are added at the end if necessary to produce a four-character code.
For example, the SOUNDEX code for the expression 'Washington' is W252: W, 2 for the S, 5 for the N, 2 for the G. The remaining letters are disregarded.
A typical use is in a query filter that matches names sounding like a given value:
WHERE SOUNDEX(NM_NAME) = SOUNDEX('POLLY')
The following table defines the numbers that represent the various letters.
Number | Represents the Letters |
---|---|
1 | B, F, P, V |
2 | C, G, J, K, Q, S, X, Z |
3 | D, T |
4 | L |
5 | M, N |
6 | R |
Ignored | A, E, I, O, U, H, W, and Y |
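As a rough illustration, the table can be held as a per-letter lookup. This is a minimal Python sketch; the names `GROUPS`, `LETTER_TO_DIGIT`, and `digit_for` are hypothetical and chosen only for this example.

```python
from typing import Optional

# Digit groups copied from the table above; every other letter is ignored.
GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}

# Invert the table into a per-letter lookup, e.g. {"B": "1", "F": "1", ...}.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in GROUPS.items() for letter in letters
}


def digit_for(letter: str) -> Optional[str]:
    """Return the digit for a letter, or None if the letter is ignored."""
    return LETTER_TO_DIGIT.get(letter.upper())
```

With this lookup, `digit_for("s")` gives `"2"`, while `digit_for("A")` and `digit_for("H")` give `None` because those letters are in the ignored row.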
- Names With Double Letters
If the surname has any double letters, they should be treated as one letter. For example:
- Gutierrez is coded G-362 (G, 3 for the T, 6 for the first R, second R ignored, 2 for the Z).
- Names with Letters Side-by-Side that have the Same Soundex Code Number
If the surname has different letters side-by-side that have the same
number in the soundex coding guide, they should be treated as one
letter. Examples:
- Pfister is coded as P-236 (P, F ignored, 2 for the S, 3 for the T, 6 for the R).
- Jackson is coded as J-250 (J, 2 for the C, K ignored, S ignored, 5 for the N, 0 added).
- Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z
ignored, 2 for the K). Since the vowel "A" separates the Z and K, the K
is coded.
- Names with Prefixes
If a surname has a prefix, such as Van, Con, De, Di, La, or Le, code
both with and without the prefix because the surname might be listed
under either code. Note, however, that Mc and Mac are not considered prefixes.
For example, VanDeusen might be coded two ways: V-532 (V, 5 for the N, 3 for the D, 2 for the S) or D-250 (D, 2 for the S, 5 for the N, 0 added).
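The "code both with and without the prefix" step could be sketched as a helper that returns every spelling worth encoding. This is only an illustration in Python: `PREFIXES` and `prefix_variants` are hypothetical names, and the check is naive, since it cannot tell a real prefix from a surname that merely starts with the same letters.

```python
from typing import List

# Prefixes named in the text; Mc and Mac are deliberately absent.
PREFIXES = ("Van", "Con", "De", "Di", "La", "Le")


def prefix_variants(surname: str) -> List[str]:
    """Return the surname and, if it starts with a listed prefix, the remainder."""
    variants = [surname]
    for prefix in PREFIXES:
        rest = surname[len(prefix):]
        # Case-insensitive match; require something to be left after the prefix.
        if surname.lower().startswith(prefix.lower()) and rest:
            variants.append(rest)
            break  # assumption: strip at most one prefix
    return variants
```

For instance, `prefix_variants("VanDeusen")` returns `["VanDeusen", "Deusen"]`, and each entry would then be encoded separately. A name such as `"McDonald"` comes back unchanged.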
- Consonant Separators
If a vowel (A, E, I, O, U) separates two consonants that have the
same soundex code, the consonant to the right of the vowel is coded.
Example:
Tymczak is coded as T-522 (T, 5 for the M, 2 for the C, Z ignored (see "Side-by-Side" rule above), 2 for the K). Since the vowel "A" separates the Z and K, the K is coded.
If "H" or "W" separates two consonants that have the same soundex code, the consonant to the right of the "H" or "W" is not coded. Example: Ashcraft is coded A-261 (A, 2 for the S, C ignored, 6 for the R, 1 for the F). It is not coded A-226.
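The rules above (double letters, side-by-side letters, vowel separators, and the H/W exception) can be combined into one small function. The following is a minimal Python sketch of those rules as stated in this post, not necessarily identical to any database's built-in SOUNDEX; the names `CODES` and `soundex` are chosen for this example, and the result is returned without the hyphen (W252 rather than W-252).

```python
# Build the letter-to-digit lookup from the table: group 1 is BFPV, and so on.
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Encode a name as one letter plus three digits, following the rules above."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    # Digit of the most recent consonant; the first letter counts too,
    # which is why the F in Pfister is ignored.
    previous = CODES.get(letters[0])
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate same-coded consonants
        digit = CODES.get(ch)
        if digit is None:
            previous = None  # a vowel (or Y) separates consonants
            continue
        if digit != previous:
            code += digit
        previous = digit
    # Pad with zeroes, then keep only four characters.
    return (code + "000")[:4]


for name in ("Washington", "Gutierrez", "Pfister", "Jackson", "Tymczak", "Ashcraft"):
    print(name, soundex(name))
```

Run as written, this prints W252, G362, P236, J250, T522, and A261, matching the worked examples in this post.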