TITLE: Assignment of Unique Identifiers to Water Bodies in the High Sierras of California
 
AUTHOR: Kristina White
 
Geography 26, GIS Data Acquisition, Spring 2001
 
American River Community College
 
Sacramento, California
 
ABSTRACT: The importance of assigning unique identifiers (IDs) to spatial features is often overlooked when creating new data layers in a GIS. The following is an examination of three solutions to the problems associated with assignment of unique identifiers. This project will focus on creating unique IDs for water bodies in the High Sierras of California.
 
INTRODUCTION: The issue of unique IDs for identifying spatial features has not been fully addressed in the GIS community. There is little to no information on the web, through ESRI, or in textbooks addressing the importance of giving features unique IDs. This lack of information not only leaves GIS users unaware of the importance of meaningful unique IDs, but also contributes to the lack of established procedures for assigning them.
 
The tendency to use ARC/INFO-assigned unique IDs has taken precedence over devising more meaningful and useful IDs. While there is nothing wrong with randomly assigning IDs, IDs that carry no meaning tend to be ignored and eventually abandoned. When an ID is spatially connected to a feature and contains information on its location, the ID has a better chance of being used indefinitely. By allocating a specific and unchangeable ID to a spatial feature, similar to the name of a lake, that feature can be researched to its fullest extent. Without a stable ID, there is no way to query information on a given spatial feature across a range of different resources. For example, the Forest Service may have information on the same lake as the Department of Fish and Game; but without that link in place, the information stays in separate locations, with neither department knowing the other has it. With unchanging IDs in place, agencies would be able to maximize their work by not having to duplicate research that already exists.
 
BACKGROUND: As of March 2001, California’s resource agencies lacked a complete coverage of High Sierra water bodies. Several incomplete versions, at varying scales, were in existence; the criteria for which water bodies were digitized were not apparent. Using ArcView, a new water body layer, Lakes_24k.shp, was digitized. Lakes_24k was later merged with two preexisting layers, Lakes and Sn_lakes, to produce a complete layer of all water bodies in the High Sierras. This new layer, fpb_lakes, contains unique identifiers for all water bodies in the High Sierras. Lakes was digitized from 1:100,000 United States Geological Survey (USGS) 7.5’ DRG topographic quads. Sn_lakes and Lakes_24k were digitized from 1:24,000 USGS 7.5’ DRG topographic quads. With a finalized and complete layer containing all water bodies in the High Sierras, the problem of assigning unique identifiers arose. The two preexisting layers each had their own set of identifiers, and the newly digitized layer had none. Table 1 shows the fields present in Lakes.dbf.
 
 
| AREA | PERIMETER | LAKES_ | LAKES_ID | WATER | NAME | WRCBLAKES | GNIS_ID | 
| 51181.734 | 1346.143 | 2 | 1 | 1 |   | 0 |   | 
| 3138397.000 | 16535.139 | 3 | 2 | 1 | Copco Lake | 3582 | 06070991 | 
| 1439049.875 | 6781.860 | 4 | 372 | 1 | Miller Lake | 0 | 06073458 | 
| 18621.189 | 538.039 | 5 | 3 | 1 | Azalea Lake | 3532 | 06001081 | 
| 4540013.000 | 53515.043 | 6 | 4 | 1 | Sheepy Lake | 3528 | 06029224 | 
| 1801812.500 | 6563.040 | 7 | 5 | 1 | Indian Tom Lake | 3639 | 06015924 | 
| 4420350.000 | 24950.043 | 8 | 6 | 1 | Lower Klamath Lake | 3650 | 06019579 | 
| 2773841.000 | 14671.305 | 9 | 7 | 1 | White Lake | 0 | 06038265 | 
| 3698.470 | 242.426 | 10 | 8 | 1 |   | 0 |   | 
| 15823.263 | 482.540 | 11 | 9 | 1 |   | 0 |   | 
| 4806.873 | 294.153 | 12 | 10 | 1 |   | 0 |   | 
 
Lakes.dbf has four water body identifiers: Lakes_, Lakes_id, Wrcblakes, and Gnis_id. Lakes_ and Lakes_id are ARC/INFO-assigned identifiers; when polygons are digitized in ARC/INFO, they are automatically assigned unique IDs. Wrcblakes is an identifier “containing information on water quality and fisheries management” (lakes.txt). Gnis_id is the “USGS Geographic Names Information System (GNIS) code uniquely identifying the instance of the given lake name” (lakes.txt).
 
Table 2 shows the fields present in Sn_lakes.dbf.
 
 
| AREA | PERIMETER | LAKES_ | LAKES_ID | LCODE | CFF_CODE | WATER | NAME | 
| 689.55021 | 126.72091 | 12795 | 25 | 1 | 412 | 1 |   | 
| 5221.96251 | 398.35090 | 12796 | 27 | 1 | 410 | 1 |   | 
| 119940.49676 | 1897.45907 | 12797 | 28 | 1 | 410 | 1 |   | 
| 269.04632 | 62.71005 | 12798 | 29 | 1 | 410 | 1 |   | 
| 33144.42936 | 1239.66887 | 12799 | 30 | 1 | 412 | 1 |   | 
| 349.54098 | 68.17613 | 12800 | 31 | 1 | 410 | 1 |   | 
| 463.58258 | 82.40878 | 12801 | 32 | 1 | 410 | 1 |   | 
| 404.04723 | 77.08140 | 12802 | 33 | 1 | 410 | 1 |   | 
| 1513.65958 | 312.51002 | 12803 | 34 | 1 | 410 | 1 |   | 
| 890.06391 | 127.22250 | 12804 | 38 | 1 | 412 | 1 |   | 
| 522.97880 | 99.04505 | 12805 | 39 | 1 | 412 | 1 |   | 
 
Sn_lakes.dbf has three water body identifiers: Lakes_, Lakes_id, and Cff_code. Lakes_ and Lakes_id are ARC/INFO-assigned identifiers. Cff_code was “populated by creating a relate between poly# and lpoly#” (sn_lakes.txt).
 
In principle, the USGS Geographic Names Information System (GNIS) should have provided unique IDs for all water bodies in California. GNIS states, “The Federally recognized name of each feature described in the data base is identified, and references are made to a feature's location by State, county, and geographic coordinates” (http://mapping.usgs.gov/www/gnis/). If every water body feature had been given a GNIS ID, the current problem of finding ways to uniquely identify water body features would not have arisen. In practice, GNIS IDs are not adequate on their own, because GNIS allows the same name to be used by multiple water bodies, and some polygons in the Lakes layer carry no GNIS ID at all (Table 3).
 
 
| AREA | PERIMETER | LAKES_ | LAKES_ID | WATER | NAME | WRCBLAKES | GNIS_ID | 
| 2004292.625 | 13338.780 | 1477 | 913 | 1 | Almanor, Lake | 0 |   | 
|   | 80098.250 | 1488 | 921 | 1 | Almanor, Lake | 2965 | 06000442 | 
| 243629.422 | 8489.172 | 1495 | 928 | 1 | Almanor, Lake | 0 |   | 
| 349050.906 | 2804.329 | 355 | 160 | 1 | Bass Lake | 0 | 06073793 | 
| 268497.969 | 2819.887 | 2925 | 2758 | 1 | Bass Lake | 268 | 06001575 | 
| 45216.406 | 929.125 | 4248 | 4352 | 1 | Bass Lake | 2157 | 06030274 | 
| 4169629.750 | 21768.623 | 4970 | 5424 | 1 | Bass Lake | 1949 | 06030275 | 
| 9307.176 | 486.897 | 5296 | 5915 | 1 | Bass Lake | 2262 | 06001574 | 
| 810.033 | 102.889 | 1264 | 764 | 1 | Hidden Lake | 0 | 06014641 | 
| 10666.720 | 410.675 | 1531 | 963 | 1 | Hidden Lake | 0 | 06014646 | 
| 28102.094 | 831.617 | 2504 | 2129 | 1 | Hidden Lake | 0 | 06014644 | 
| 22013.289 | 626.512 | 4499 | 4648 | 1 | Hidden Lake | 2195 | 06014639 | 
| 40862.984 | 1179.892 | 5729 | 6415 | 1 | Hidden Lake | 0 | 06014648 | 
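
The duplication visible in Table 3 can also be checked programmatically. The following is a minimal sketch in Python, assuming the NAME and GNIS_ID columns have already been read into a list of pairs. Only the rows of Table 3 are used, and the function name summarize_names is illustrative rather than part of any GIS package.

```python
from collections import defaultdict

# (NAME, GNIS_ID) pairs copied from Table 3; an empty string means no GNIS ID.
TABLE_3 = [
    ("Almanor, Lake", ""),
    ("Almanor, Lake", "06000442"),
    ("Almanor, Lake", ""),
    ("Bass Lake", "06073793"),
    ("Bass Lake", "06001575"),
    ("Bass Lake", "06030274"),
    ("Bass Lake", "06030275"),
    ("Bass Lake", "06001574"),
    ("Hidden Lake", "06014641"),
    ("Hidden Lake", "06014646"),
    ("Hidden Lake", "06014644"),
    ("Hidden Lake", "06014639"),
    ("Hidden Lake", "06014648"),
]


def summarize_names(rows):
    """Return {name: (polygon_count, polygons_without_gnis_id)}."""
    counts = defaultdict(lambda: [0, 0])
    for name, gnis_id in rows:
        counts[name][0] += 1          # every row is one polygon with this name
        if not gnis_id:
            counts[name][1] += 1      # polygon has no GNIS ID recorded
    return {name: tuple(pair) for name, pair in counts.items()}


summary = summarize_names(TABLE_3)
for name, (total, missing) in summary.items():
    print(f"{name}: {total} polygons, {missing} without a GNIS ID")
```

A name shared by several polygons cannot serve as a key, and a GNIS_ID column with blanks cannot either, which is the motivation for assigning a new identifier to every polygon.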
 
OBJECTIVE: The significance of unique identifiers is often overlooked by the layperson, but to scientists working in the field, they are of the utmost importance. Imagine that you are a scientist gathering information on a particular water body in the High Sierras. The lake has no name on a topographic map, and you are unsure how to enter your information into your GPS so that it is associated with the correct water body. With each water body carrying its own unique ID, the problem is solved.
The unique IDs will be displayed on topographic maps, which will be used by biologists out in the field (See Illustration 1).
 
 

 
My objective is to create unique IDs for all of the water bodies that are not too lengthy. If the unique ID is too long, it becomes a hindrance in the field rather than a convenient way to identify water body features. A biologist wants to be able to enter the ID into a GPS quickly, not key in a lengthy, cumbersome number. It is therefore my objective to create an ID that is no longer than 10 digits. The unique IDs will be numeric rather than character strings. This is to avoid duplications and to allow for easier post-processing, since a computer handles numbers more readily than strings.
 
METHODOLOGY: I have approached the problem of assigning unique identifiers in several different ways. Three solutions to creating unique IDs were contemplated. The first was a random assignment, the second was based on the latitude and longitude of the water body, and the third solution was based on the United States Social Security Number.
 
Initially, the water bodies were given arbitrary identifiers, assigned chronologically, with no duplicates. An example of this can be seen in Illustration 1. These random identifiers are currently being used, although a change to a more efficient identifying system is being contemplated. The main problem with this system is that no locational information of any sort is tied to the IDs. If a biologist wanted to find a water body occurring in a specific watershed, they would have to search through roughly 24,000 entries. The benefit of this system is that the number is no longer than five digits, allowing for easy use in the field. The overall reason for switching to a better system is that the IDs carry no meaning. Tying information to the ID would make its overall use more effective.
 
Assigning IDs to water bodies based on their latitude and longitude was briefly considered. This would allow the ID not only to identify the lake but also to carry meaning. The main problem with this system of identification is the length of the ID. For example, a given water body has a latitude of 38 degrees 42’27” and a longitude of 120 degrees 15’53”. Concatenated, these values produce a thirteen-digit ID (3842271201553). An ID of this length would be difficult to use in the field.
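
The concatenation described above can be sketched in a few lines of Python. The function names dms_to_digits and latlon_id are illustrative, and zero-padding minutes and seconds to two digits is an assumption, since the example coordinates do not show how single-digit values would be written.

```python
def dms_to_digits(degrees, minutes, seconds):
    """Concatenate degrees, minutes, and seconds into one digit string.

    Minutes and seconds are zero-padded to two digits (an assumption).
    """
    return f"{degrees}{minutes:02d}{seconds:02d}"


def latlon_id(lat_dms, lon_dms):
    """Build the latitude/longitude ID by joining the two digit strings."""
    return dms_to_digits(*lat_dms) + dms_to_digits(*lon_dms)


# The example from the text: 38 degrees 42'27" N, 120 degrees 15'53" W.
example_id = latlon_id((38, 42, 27), (120, 15, 53))
print(example_id, len(example_id))
```

For the example coordinates this yields 3842271201553, which is 13 digits and therefore exceeds the 10-digit limit set in the objective.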
 
The unique IDs that I found most effective are based on the concept of the United States Social Security Number. “A Social Security Number (SSN) consists of nine digits, commonly written as three fields separated by hyphens: AAA-GG-SSSS. The first three-digit field is called the ‘area number’. The central, two-digit field is called the ‘group number’. The final, four-digit field is called the ‘serial number’” (http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html). The first three digits of an individual's Social Security Number indicate the state in which the number was issued. For example, a Social Security Number that starts with “565” was issued in California (Table 4).
 
 
  001-003 NH    400-407 KY    530     NV
  004-007 ME    408-415 TN    531-539 WA
  008-009 VT    416-424 AL    540-544 OR
  010-034 MA    425-428 MS    545-573 CA
  035-039 RI    429-432 AR    574     AK
  040-049 CT    433-439 LA    575-576 HI
  050-134 NY    440-448 OK    577-579 DC
  135-158 NJ    449-467 TX    580     VI Virgin Islands
  159-211 PA    468-477 MN    581-584 PR Puerto Rico
  212-220 MD    478-485 IA    585     NM
  221-222 DE    486-500 MO    586     PI Pacific Islands*
  223-231 VA    501-502 ND    587-588 MS
  232-236 WV    503-504 SD    589-595 FL
  237-246 NC    505-508 NE    596-599 PR Puerto Rico
  247-251 SC    509-515 KS    600-601 AZ
  252-260 GA    516-517 MT    602-626 CA
  261-267 FL    518-519 ID    627-645 TX
  268-302 OH    520     WY    646-647 UT
  303-317 IN    521-524 CO    648-649 NM
  318-361 IL    525     NM
  362-386 MI    526-527 AZ
  387-399 WI    528-529 UT

  *Guam, American Samoa, Philippine Islands, Northern Mariana Islands

  650-699 unassigned, for future use
  700-728 Railroad workers through 1963, then discontinued
  729-799 unassigned, for future use
  800-999 not valid SSNs.  Some sources have claimed that numbers
          above 900 were used when some state programs were converted
          to federal control, but current SSA documents claim no
          numbers above 799 have ever been used.
 
 
The group number (the central two digits) “is not related to geography but rather to the order in which SSNs are issued for a particular area”. The last four digits “are assigned in chronological order within each area and group number as the applications are processed”.
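
The three-field structure and the area-number lookup can be illustrated with a short Python sketch. It uses only an excerpt of Table 4 (a few western states), the function names split_ssn and area_to_state are illustrative, and the number 565-00-0000 is a placeholder chosen only because its area number falls in the California range.

```python
# Excerpt of Table 4: inclusive area-number ranges and the state each maps to.
AREA_RANGES = [
    (530, 530, "NV"),
    (531, 539, "WA"),
    (540, 544, "OR"),
    (545, 573, "CA"),
    (602, 626, "CA"),
]


def split_ssn(ssn):
    """Split 'AAA-GG-SSSS' into (area, group, serial) strings."""
    area, group, serial = ssn.split("-")
    if (len(area), len(group), len(serial)) != (3, 2, 4):
        raise ValueError("expected the form AAA-GG-SSSS")
    return area, group, serial


def area_to_state(area):
    """Look up the state for an area number; None if not in the excerpt."""
    number = int(area)
    for low, high, state in AREA_RANGES:
        if low <= number <= high:
            return state
    return None


area, group, serial = split_ssn("565-00-0000")  # placeholder number
print(area, area_to_state(area))
```

The point of the sketch is that the leading field alone is enough to narrow a record to a geographic area, which is the property borrowed for the water body IDs.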
 
RESULTS: The unique IDs were assigned values based on the principles of the Social Security Number, minus the hyphens separating the fields. The first four digits identify the watershed in which the water body is contained. The fifth digit identifies the section (quadrant) of the watershed in which the water body is located. The last three digits are a randomly assigned value.
 
 

Figure 1. Polygons with green borders represent watersheds. Water bodies are shown in blue. Black lines divide each watershed into quarters.
 
In the case of Figure 1, a water body within this watershed would be assigned an ID such as 35734020. "3573" is the watershed ID, "4" is the quadrant in which the water body is located, and "020" is the randomly assigned value.
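
The composition of the ID can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, quadrants are assumed to be numbered 1 through 4, and the last three digits are passed in rather than generated, since the text does not say how the random value is drawn.

```python
def make_water_body_id(watershed, quadrant, serial):
    """Combine a 4-digit watershed ID, a quadrant, and a 3-digit value."""
    if not 0 <= watershed <= 9999:
        raise ValueError("watershed ID must fit in four digits")
    if not 1 <= quadrant <= 4:          # assumed numbering of the quarters
        raise ValueError("quadrant must be 1, 2, 3, or 4")
    if not 0 <= serial <= 999:
        raise ValueError("serial value must fit in three digits")
    return int(f"{watershed:04d}{quadrant}{serial:03d}")


def parse_water_body_id(water_body_id):
    """Split an ID back into (watershed, quadrant, serial)."""
    text = f"{water_body_id:08d}"
    if len(text) != 8:
        raise ValueError("ID must have at most eight digits")
    return int(text[:4]), int(text[4]), int(text[5:])


example = make_water_body_id(3573, 4, 20)
print(example, parse_water_body_id(example))
```

The resulting number is eight digits long, within the 10-digit limit set in the objective, and a query for every water body in watershed 3573 reduces to a range check on the numeric ID.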
ANALYSIS OF INFORMATION:  
 
Following the assignment of unique IDs to all water bodies in the High Sierras, analysis can be performed efficiently. Unique IDs provide the foundation on which all data can be referenced. Whether it is data on amphibians, fish, water quality, or anything else pertaining to a lake, all information can be tied together using the unique ID. These unique IDs allow biologists to increase their analytical capabilities for any given lake in the High Sierras. By pulling together several different biological studies, a better understanding of any given water body can be derived.
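
As a rough illustration of how a shared ID ties separate data sets together, the Python sketch below merges two sets of records keyed by water body ID. The records, field names, and values are invented purely for illustration and do not come from any agency data set; the function name combine_by_id is likewise hypothetical.

```python
# Hypothetical records keyed by water body ID (all values are invented).
fish_survey = {
    35734020: {"fish_present": True},
}
water_quality = {
    35734020: {"ph": 7.1},
    35734021: {"ph": 6.8},
}


def combine_by_id(*sources):
    """Merge attribute dictionaries from several sources on the shared ID."""
    combined = {}
    for source in sources:
        for water_body_id, attributes in source.items():
            combined.setdefault(water_body_id, {}).update(attributes)
    return combined


combined = combine_by_id(fish_survey, water_quality)
print(combined[35734020])
```

Without a common key, the same merge would have to rely on lake names or coordinates, both of which are ambiguous for the reasons discussed earlier.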
 
CONCLUSION:
 
Water bodies in California do not have a standardized identification system in place. By implementing a statewide identification system for natural features, departmental resources could be used to their fullest potential. Currently, randomly assigned unique identifiers are in place for water bodies, although only one department uses them. The ultimate goal would be the creation and distribution of a meaningful identification system, allowing departments to share their information with the rest of the community in a timely and efficient manner. To date, the most efficient and meaningful way to uniquely identify water bodies in California is the identification system based on the Social Security Number.
 
REFERENCES:
 
Lakes.txt. Lakes metadata. Department of Fish and Game, Fisheries Programs Branch, May 3, 2001.
Sn_Lakes.txt. Sn_lakes metadata. Department of Fish and Game, Fisheries Programs Branch, May 3, 2001.
http://mapping.usgs.gov/www/gnis/. May 3, 2001.
http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html. May 3, 2001.
 
DATA SOURCE:
 
Watersheds: California Department of Fish and Game, May 2001.
 
Lakes, Sn_Lakes, fpb_lakes: California Department of Fish and Game, May 2001.
 
Social Security Area Number data. http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html