TITLE: Assignment of Unique Identifiers to Water Bodies in the High Sierras of California

AUTHOR: Kristina White

Geography 26, GIS Data Acquisition, Spring 2001

American River Community College

Sacramento, California

ABSTRACT: The importance of assigning unique identifiers (IDs) to spatial features is often overlooked when creating new data layers in a GIS. The following is an examination of three solutions to the problems associated with assignment of unique identifiers. This project will focus on creating unique IDs for water bodies in the High Sierras of California.

INTRODUCTION: The issue of unique IDs for identifying spatial features has not been fully addressed in the GIS community. There is little to no information on the web, through ESRI, or in textbooks addressing the tremendous importance of features containing unique IDs. This lack of information not only leaves GIS users ignorant of the importance of meaningful unique IDs, but contributes to the lack of procedural ways to assign IDs.

The tendency to rely on ARC/INFO-assigned unique IDs has taken precedence over devising more meaningful and useful IDs. While there is nothing wrong with randomly assigned IDs in themselves, their lack of meaning makes them easy to disregard and, eventually, to abandon. When an ID is spatially connected to a feature and contains information on its location, the ID has a better chance of being used indefinitely. By allocating a specific and unchangeable ID to a spatial feature, similar to the name of a lake, that feature can be researched to its fullest extent. Without a stable ID, there is no way to query information on a given spatial feature across different resources. For example, the Forest Service may have information on the same lake as the Department of Fish and Game; but without that link in place, the information stays in separate locations, with neither department knowing what the other holds. With unchanging IDs in place, agencies would be able to maximize their work by not having to duplicate research that already exists.

BACKGROUND: As of March 2001, California’s resource agencies lacked a complete coverage of High Sierra water bodies. Several incomplete versions existed at varying scales, and the criteria for which water bodies had been digitized were not apparent. Using ArcView, a new water body layer, Lakes_24k.shp, was digitized. Lakes_24k was later merged with two preexisting layers, Lakes and Sn_lakes, to produce a complete layer of all water bodies in the High Sierras. This new layer, fpb_lakes, contains unique identifiers for all water bodies in the High Sierras. Lakes was digitized from 1:100,000 United States Geological Survey (USGS) 7.5’ DRG topographic quads. Sn_lakes and Lakes_24k were digitized from 1:24,000 USGS 7.5’ DRG topographic quads. With a finalized and complete layer containing ALL water bodies in the High Sierras, the problem of assigning unique identifiers arose. The two preexisting layers each had their own set of identifiers, and the newly digitized layer had none. Table 1 shows the fields present in Lakes.dbf.

Table 1

AREA	PERIMETER	LAKES_	LAKES_ID	WATER	NAME	WRCBLAKES	GNIS_ID
51181.734	1346.143	2	1	1		0
3138397.000	16535.139	3	2	1	Copco Lake	3582	06070991
1439049.875	6781.860	4	372	1	Miller Lake	0	06073458
18621.189	538.039	5	3	1	Azalea Lake	3532	06001081
4540013.000	53515.043	6	4	1	Sheepy Lake	3528	06029224
1801812.500	6563.040	7	5	1	Indian Tom Lake	3639	06015924
4420350.000	24950.043	8	6	1	Lower Klamath Lake	3650	06019579
2773841.000	14671.305	9	7	1	White Lake	0	06038265
3698.470	242.426	10	8	1		0
15823.263	482.540	11	9	1		0
4806.873	294.153	12	10	1		0

Lakes.dbf has four water body identifiers: Lakes_, Lakes_id, Wrcblakes, and Gnis_id. Lakes_ and Lakes_id are ARC/INFO-assigned identifiers; when polygons are digitized in ARC/INFO, they are automatically assigned unique IDs. Wrcblakes is an identifier “containing information on water quality and fisheries management” (lakes.txt). Gnis_id is the “USGS Geographic Names Information System (GNIS) code uniquely identifying the instance of the given lake name” (lakes.txt).

Table 2 shows the fields present in Sn_lakes.dbf.

Table 2

AREA	PERIMETER	LAKES_	LAKES_ID	LCODE	CFF_CODE	WATER	NAME
689.55021	126.72091	12795	25	1	412	1
5221.96251	398.35090	12796	27	1	410	1
119940.49676	1897.45907	12797	28	1	410	1
269.04632	62.71005	12798	29	1	410	1
33144.42936	1239.66887	12799	30	1	412	1
349.54098	68.17613	12800	31	1	410	1
463.58258	82.40878	12801	32	1	410	1
404.04723	77.08140	12802	33	1	410	1
1513.65958	312.51002	12803	34	1	410	1
890.06391	127.22250	12804	38	1	412	1
522.97880	99.04505	12805	39	1	412	1

Sn_lakes.dbf has three water body identifiers: Lakes_, Lakes_id, and Cff_code. Lakes_ and Lakes_id are ARC/INFO-assigned identifiers. Cff_code was “populated by creating a relate between poly# and lpoly#” (sn_lakes.txt).

In principle, the USGS Geographic Names Information System (GNIS) should have provided unique IDs for all water bodies in California. GNIS states, “The Federally recognized name of each feature described in the data base is identified, and references are made to a feature's location by State, county, and geographic coordinates” (http://mapping.usgs.gov/www/gnis/). If every water body feature had been given a GNIS ID, the current problem of finding ways to uniquely identify water body features would not have arisen. In practice, the GNIS data are not adequate on their own: the same name is shared by several different water bodies, and some polygons carry no GNIS ID at all (Table 3).

Table 3

AREA	PERIMETER	LAKES_	LAKES_ID	WATER	NAME	WRCBLAKES	GNIS_ID
2004292.625	13338.780	1477	913	1	Almanor, Lake	0
	80098.250	1488	921	1	Almanor, Lake	2965	06000442
243629.422	8489.172	1495	928	1	Almanor, Lake	0
349050.906	2804.329	355	160	1	Bass Lake	0	06073793
268497.969	2819.887	2925	2758	1	Bass Lake	268	06001575
45216.406	929.125	4248	4352	1	Bass Lake	2157	06030274
4169629.750	21768.623	4970	5424	1	Bass Lake	1949	06030275
9307.176	486.897	5296	5915	1	Bass Lake	2262	06001574
810.033	102.889	1264	764	1	Hidden Lake	0	06014641
10666.720	410.675	1531	963	1	Hidden Lake	0	06014646
28102.094	831.617	2504	2129	1	Hidden Lake	0	06014644
22013.289	626.512	4499	4648	1	Hidden Lake	2195	06014639
40862.984	1179.892	5729	6415	1	Hidden Lake	0	06014648
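The pattern in Table 3 can be checked mechanically. The following is a minimal sketch in Python, not part of the original workflow: it copies the NAME and GNIS_ID values from Table 3 by hand, and the function names are hypothetical. It counts how many polygons share each name and how many have no GNIS ID.

```python
from collections import defaultdict

# (NAME, GNIS_ID) pairs copied by hand from Table 3; None marks a blank GNIS_ID.
TABLE_3 = [
    ("Almanor, Lake", None), ("Almanor, Lake", "06000442"), ("Almanor, Lake", None),
    ("Bass Lake", "06073793"), ("Bass Lake", "06001575"), ("Bass Lake", "06030274"),
    ("Bass Lake", "06030275"), ("Bass Lake", "06001574"),
    ("Hidden Lake", "06014641"), ("Hidden Lake", "06014646"), ("Hidden Lake", "06014644"),
    ("Hidden Lake", "06014639"), ("Hidden Lake", "06014648"),
]

def polygons_per_name(records):
    """Return {name: list of GNIS_ID values} so that shared names stand out."""
    grouped = defaultdict(list)
    for name, gnis_id in records:
        grouped[name].append(gnis_id)
    return dict(grouped)

def count_missing_ids(records):
    """Count polygons that have no GNIS_ID at all."""
    return sum(1 for _, gnis_id in records if gnis_id is None)

grouped = polygons_per_name(TABLE_3)
shared = {name: ids for name, ids in grouped.items() if len(ids) > 1}
missing = count_missing_ids(TABLE_3)

for name, ids in shared.items():
    print(f"{name}: {len(ids)} polygons")
print(f"polygons without a GNIS_ID: {missing}")
```

A name alone therefore cannot serve as a key, and GNIS_ID cannot either while some polygons lack one.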

OBJECTIVE: The significance of unique identifiers is often overlooked by the layperson, but to scientists working in the field they are of the utmost importance. Imagine that you are a scientist gathering information on a particular water body in the High Sierras. The lake has no name on the topographic map, and you are unsure how to enter your information into your GPS so that it is associated with the correct water body. If each water body has its own unique ID, the problem is solved.

The unique IDs will be displayed on topographic maps, which will be used by biologists in the field (see Illustration 1).

Illustration 1

My objective is to create unique IDs for all of the water bodies that are not too lengthy. If a unique ID is too long, it becomes a hindrance in the field rather than a convenient way to identify water body features. A biologist wants to be able to enter the ID into a GPS quickly, not key in a long, cumbersome number. It is therefore my objective to create an ID that is no longer than 10 digits. The unique IDs will be numeric rather than character strings. This is to avoid duplications and to allow for easier post-processing, since a computer is much more adept at dealing with numbers than with strings.

METHODOLOGY: I have approached the problem of assigning unique identifiers in several different ways. Three solutions to creating unique IDs were contemplated. The first was a random assignment, the second was based on the latitude and longitude of the water body, and the third solution was based on the United States Social Security Number.

Initially the water bodies were given a random identifier, assigned chronologically and with no duplicates. An example of this can be seen in Illustration 1. These random identifiers are currently in use, although a change to a more efficient identification system is being contemplated. The main problem with this system is that no locational information of any sort is tied to the IDs. A biologist who wanted to find a water body in a specific watershed would have to search through roughly 24,000 entries. The benefit of this system is that no ID is longer than five digits, allowing for easy use in the field. The overall reason for switching to a more effective system is that the IDs carry no meaning. Tying information to the ID would make it more useful overall.

Assigning IDs to water bodies based on their latitude and longitude was briefly considered. This would allow the ID not only to identify the lake but also to carry meaning. The main problem with this system is the length of the ID. For example, a given water body has a latitude of 38 degrees 42’27” and a longitude of 120 degrees 15’53”. Concatenated, these would produce a thirteen-digit ID (3842271201553). An ID of this length would be difficult to use in the field.
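The length problem can be illustrated with a short Python sketch. It is an illustration only: the function name and the zero-padding widths (two digits for latitude degrees, three for longitude degrees) are assumptions, chosen to match the example coordinates above.

```python
MAX_DIGITS = 10  # upper limit on ID length stated in the objective

def dms_to_id(lat_dms, lon_dms):
    """Concatenate (degrees, minutes, seconds) of latitude and longitude into a digit string."""
    lat_d, lat_m, lat_s = lat_dms
    lon_d, lon_m, lon_s = lon_dms
    # Assumed padding: 2 digits for latitude degrees, 3 for longitude degrees.
    return f"{lat_d:02d}{lat_m:02d}{lat_s:02d}{lon_d:03d}{lon_m:02d}{lon_s:02d}"

candidate = dms_to_id((38, 42, 27), (120, 15, 53))
too_long = len(candidate) > MAX_DIGITS
print(candidate, len(candidate), too_long)
```

Dropping the seconds would shorten the ID to nine digits, but water bodies less than a minute apart could then receive the same ID, so the approach trades length against uniqueness.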

The unique IDs that I found most effective are based on the concept of the United States Social Security Number. “A Social Security Number (SSN) consists of nine digits, commonly written as three fields separated by hyphens: AAA-GG-SSSS. The first three-digit field is called the ‘area number’. The central, two-digit field is called the ‘group number’. The final, four-digit field is called the ‘serial number’” (http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html). The first three digits of an individual’s Social Security Number indicate the state in which the number was issued, which is not necessarily the state in which the person was born. For example, a Social Security Number that starts with “565” was issued in California (Table 4).

Table 4

AREA NUMBER	STATE	AREA NUMBER	STATE	AREA NUMBER	STATE
001-003	NH	400-407	KY	530	NV
004-007	ME	408-415	TN	531-539	WA
008-009	VT	416-424	AL	540-544	OR
010-034	MA	425-428	MS	545-573	CA
035-039	RI	429-432	AR	574	AK
040-049	CT	433-439	LA	575-576	HI
050-134	NY	440-448	OK	577-579	DC
135-158	NJ	449-467	TX	580	VI Virgin Islands
159-211	PA	468-477	MN	581-584	PR Puerto Rico
212-220	MD	478-485	IA	585	NM
221-222	DE	486-500	MO	586	PI Pacific Islands*
223-231	VA	501-502	ND	587-588	MS
232-236	WV	503-504	SD	589-595	FL
237-246	NC	505-508	NE	596-599	PR Puerto Rico
247-251	SC	509-515	KS	600-601	AZ
252-260	GA	516-517	MT	602-626	CA
261-267	FL	518-519	ID	627-645	TX
268-302	OH	520	WY	646-647	UT
303-317	IN	521-524	CO	648-649	NM
318-361	IL	525	NM
362-386	MI	526-527	AZ
387-399	WI	528-529	UT

*Guam, American Samoa, Philippine Islands, Northern Mariana Islands

650-699	unassigned, for future use
700-728	Railroad workers through 1963, then discontinued
729-799	unassigned, for future use
800-999	not valid SSNs. Some sources have claimed that numbers above 900 were used when some state programs were converted to federal control, but current SSA documents claim no numbers above 799 have ever been used.

The group number “is not related to geography but rather to the order in which SSNs are issued for a particular area”. The last four digits “are assigned in chronological order within each area and group number as the applications are processed”.
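The three-field structure can be sketched in Python as follows. The function names are hypothetical, only a handful of the area ranges from Table 4 are included, and the number used is an obviously made-up placeholder rather than a real SSN.

```python
# A few area-number ranges copied from Table 4 (not the full table).
AREA_RANGES = [
    (530, 530, "NV"),
    (531, 539, "WA"),
    (540, 544, "OR"),
    (545, 573, "CA"),
    (602, 626, "CA"),
]

def split_ssn(ssn):
    """Split an 'AAA-GG-SSSS' string into its (area, group, serial) fields."""
    parts = ssn.split("-")
    if [len(p) for p in parts] != [3, 2, 4] or not all(p.isdigit() for p in parts):
        raise ValueError(f"not in AAA-GG-SSSS form: {ssn!r}")
    area, group, serial = parts
    return area, group, serial

def area_to_state(area):
    """Look up the state for an area number, or None if it is not in AREA_RANGES."""
    number = int(area)
    for low, high, state in AREA_RANGES:
        if low <= number <= high:
            return state
    return None

# Placeholder value for illustration only; it is not a real SSN.
area, group, serial = split_ssn("565-00-0000")
print(area, group, serial, area_to_state(area))
```

Only the first field carries geographic meaning; the other two exist to keep numbers unique within an area. That division of labor is the idea borrowed for the water body IDs.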

RESULTS: The unique IDs were assigned values based on the principles of the Social Security Number, minus the hyphens separating the fields. The first four digits represent the watershed that contains the water body. The central digit identifies the quarter of the watershed in which the water body is located. Finally, the last three digits are randomly assigned.

Figure 1

Polygons with green borders represent watersheds. Water bodies are shown in blue. Black lines divide each watershed into quarters.

In the case of Figure 1, a water body within this watershed would be assigned an ID such as 35734020: "3573" is the watershed ID, "4" is the quadrant in which the water body is located, and "020" is the randomly assigned value.
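The composition of such an ID can be sketched in Python. This is an illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, quadrants are assumed to be numbered 1 through 4, and the ID is assumed to be stored as an integer (so a watershed ID with a leading zero would print with fewer than eight digits).

```python
def make_water_body_id(watershed, quadrant, serial):
    """Build an 8-digit ID: 4-digit watershed, 1-digit quadrant, 3-digit serial."""
    if not 0 <= watershed <= 9999:
        raise ValueError("watershed ID must fit in four digits")
    if not 1 <= quadrant <= 4:  # assumed numbering of the quarters
        raise ValueError("quadrant must be 1, 2, 3, or 4")
    if not 0 <= serial <= 999:
        raise ValueError("serial must fit in three digits")
    return int(f"{watershed:04d}{quadrant:d}{serial:03d}")

def split_water_body_id(water_body_id):
    """Recover (watershed, quadrant, serial) from an ID built by make_water_body_id."""
    text = f"{water_body_id:08d}"  # restore any leading zeros before slicing
    return int(text[:4]), int(text[4]), int(text[5:])

example_id = make_water_body_id(3573, 4, 20)
print(example_id)                       # 35734020
print(split_water_body_id(example_id))  # (3573, 4, 20)
```

At eight digits the ID stays within the ten-digit limit set in the objective, and every water body in a watershed shares the same four leading digits, so a search by watershed reduces to a prefix match.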

ANALYSIS OF INFORMATION:

Following the assignment of unique IDs to all water bodies in the High Sierras, analysis can be performed efficiently. Unique IDs provide the foundation on which all data can be referenced. Whether the data collected concern amphibians, fish, water quality, or anything else pertaining to a lake, all of the information can be tied together using the unique ID. These unique IDs allow biologists to increase their analytical capabilities for any given lake in the High Sierras. By pulling together several different biological studies, a fuller understanding of any given water body can be derived.
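As a rough illustration of tying separate studies together by ID, consider the Python sketch below. Every record in it is a made-up placeholder, and the field names, source names, and function name are hypothetical; only the ID 35734020 comes from the example in the Results section.

```python
from collections import defaultdict

# Placeholder records standing in for separate studies; none of this is real data.
amphibian_surveys = [{"lake_id": 35734020, "note": "amphibian survey placeholder"}]
fish_surveys = [{"lake_id": 35734020, "note": "fish survey placeholder"}]
water_quality = [
    {"lake_id": 35734020, "note": "water quality placeholder"},
    {"lake_id": 35731007, "note": "water quality placeholder"},  # hypothetical second lake
]

def gather_by_id(**sources):
    """Merge records from several sources into {lake_id: {source_name: note}}."""
    combined = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            combined[record["lake_id"]][source_name] = record["note"]
    return dict(combined)

combined = gather_by_id(
    amphibians=amphibian_surveys, fish=fish_surveys, water_quality=water_quality
)
print(sorted(combined[35734020]))
```

The merge works only because each source uses the same stable key for the same lake, which is the link the introduction describes as missing between agencies.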

CONCLUSION:

Water bodies in California do not have a standardized identification system in place. By implementing a statewide identification system for natural features, departmental resources could be used to their fullest potential. Currently, randomly assigned unique identifiers are in place for water bodies, although only one department uses them. The ultimate goal would be the creation and distribution of a meaningful identification system, allowing departments to share their information with the rest of the community in a timely and efficient manner. To date, the most efficient and meaningful way to uniquely identify water bodies in California is the identification system based on the Social Security Number.

REFERENCES:

Lakes.txt. Lakes metadata. Department of Fish and Game, Fisheries Programs Branch, May 3, 2001.

Sn_Lakes.txt. Sn_lakes metadata. Department of Fish and Game, Fisheries Programs Branch, May 3, 2001.

http://mapping.usgs.gov/www/gnis/. May 3, 2001.

http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html. May 3, 2001.

DATA SOURCE:

Watersheds: California Department of Fish and Game, May 2001.

Lakes, Sn_Lakes, fpb_lakes: California Department of Fish and Game, May 2001.

Social Security Area Number data. http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html