TITLE:  Assignment of Unique Identifiers to Water Bodies in the High Sierras of California

 

AUTHOR:  Kristina White

 

Geography 26, GIS Data Acquisition, Spring 2001

 

American River Community College

 

Sacramento, California

 

ABSTRACT:  The importance of assigning unique identifiers (IDs) to spatial features is often overlooked when creating new data layers in a GIS.  The following is an examination of three solutions to the problems associated with assignment of unique identifiers.  This project will focus on creating unique IDs for water bodies in the High Sierras of California. 

 

INTRODUCTION:  The issue of unique IDs for identifying spatial features has not been fully addressed in the GIS community.  There is little to no information on the web, through ESRI, or in textbooks addressing the tremendous importance of features containing unique IDs.  This lack of information not only leaves GIS users ignorant of the importance of meaningful unique IDs, but contributes to the lack of procedural ways to assign IDs. 

 

The tendency to use ARC/INFO-assigned unique IDs has taken precedence over devising more meaningful and useful IDs.  While there is nothing wrong with randomly assigning IDs, IDs that carry no meaning tend to fall out of use.  When an ID is spatially connected to a feature and contains information on its location, the ID has a better chance of being used indefinitely.  By allocating a specific and unchangeable ID to a spatial feature, similar to the name of a lake, that feature can be researched to its fullest extent.  Without a stable ID, there is no way to query information on a given spatial feature across different resources.  For example, the Forest Service may have information on the same lake as the Department of Fish and Game; but without that link in place, the information stays in separate locations, with neither department knowing what the other holds.  With unchanging IDs in place, agencies would be able to make the most of their work by not duplicating research that already exists.

 

BACKGROUND:  As of March 2001, California’s resource agencies lacked a complete coverage of High Sierra water bodies.  Several incomplete versions, at varying scales, were in existence; the criteria for which water bodies were digitized were not apparent.  Using ArcView, a newly digitized water body layer, Lakes_24k.shp, was created.  Lakes_24k was later merged with two preexisting layers, Lakes and Sn_lakes, to produce a complete layer of all water bodies in the High Sierras.  This new layer, fpb_lakes, contains unique identifiers for all water bodies in the High Sierras.  Lakes was digitized from 1:100,000 United States Geological Survey (USGS) 7.5’ DRG topographic quads.  Sn_lakes and Lakes_24k were digitized from 1:24,000 USGS 7.5’ DRG topographic quads.  With a finalized and complete layer containing all water bodies in the High Sierras, the problem of assigning unique identifiers arose.  The two preexisting layers had their own sets of identifiers and the newly digitized layer had none.  Table 1 shows the fields present in Lakes.dbf.

 

Table 1

AREA          PERIMETER   LAKES_  LAKES_ID  WATER  NAME                WRCBLAKES  GNIS_ID
51181.734     1346.143    2       1         1                          0
3138397.000   16535.139   3       2         1      Copco Lake          3582       06070991
1439049.875   6781.860    4       372       1      Miller Lake         0          06073458
18621.189     538.039     5       3         1      Azalea Lake         3532       06001081
4540013.000   53515.043   6       4         1      Sheepy Lake         3528       06029224
1801812.500   6563.040    7       5         1      Indian Tom Lake     3639       06015924
4420350.000   24950.043   8       6         1      Lower Klamath Lake  3650       06019579
2773841.000   14671.305   9       7         1      White Lake          0          06038265
3698.470      242.426     10      8         1                          0
15823.263     482.540     11      9         1                          0
4806.873      294.153     12      10        1                          0

 

Lakes.dbf has four water body identifiers: Lakes_, Lakes_id, Wrcblakes, and Gnis_id.  Lakes_ and Lakes_id are ARC/INFO-assigned identifiers; when polygons are digitized in ARC/INFO, they are automatically assigned unique IDs.  Wrcblakes is an identifier “containing information on water quality and fisheries management” (lakes.txt).  Gnis_id is the “USGS Geographic Names Information System (GNIS) code uniquely identifying the instance of the given lake name” (lakes.txt).

 

Table 2 shows the fields present in Sn_lakes.dbf.

 

Table 2

AREA          PERIMETER   LAKES_  LAKES_ID  LCODE  CFF_CODE  WATER  NAME
689.55021     126.72091   12795   25        1      412       1
5221.96251    398.35090   12796   27        1      410       1
119940.49676  1897.45907  12797   28        1      410       1
269.04632     62.71005    12798   29        1      410       1
33144.42936   1239.66887  12799   30        1      412       1
349.54098     68.17613    12800   31        1      410       1
463.58258     82.40878    12801   32        1      410       1
404.04723     77.08140    12802   33        1      410       1
1513.65958    312.51002   12803   34        1      410       1
890.06391     127.22250   12804   38        1      412       1
522.97880     99.04505    12805   39        1      412       1

 

Sn_lakes.dbf has three water body identifiers: Lakes_, Lakes_id, and Cff_code.  Lakes_ and Lakes_id are ARC/INFO-assigned identifiers.  Cff_code was “populated by creating a relate between poly# and lpoly#” (sn_lakes.txt).

 

In principle, the USGS Geographic Names Information System (GNIS) should have provided unique IDs for all water bodies in California.  GNIS states, “The Federally recognized name of each feature described in the data base is identified, and references are made to a feature's location by State, county, and geographic coordinates” (http://mapping.usgs.gov/www/gnis/).  If every water body feature had been given a GNIS ID, the current problem of finding ways to uniquely identify water body features would not have arisen.  In practice, GNIS is not an adequate source of IDs: the same name is used by several different water bodies, and some features have no GNIS ID at all (Table 3).

 

Table 3

AREA          PERIMETER   LAKES_  LAKES_ID  WATER  NAME                WRCBLAKES  GNIS_ID
2004292.625   13338.780   1477    913       1      Almanor, Lake       0
              80098.250   1488    921       1      Almanor, Lake       2965       06000442
243629.422    8489.172    1495    928       1      Almanor, Lake       0
349050.906    2804.329    355     160       1      Bass Lake           0          06073793
268497.969    2819.887    2925    2758      1      Bass Lake           268        06001575
45216.406     929.125     4248    4352      1      Bass Lake           2157       06030274
4169629.750   21768.623   4970    5424      1      Bass Lake           1949       06030275
9307.176      486.897     5296    5915      1      Bass Lake           2262       06001574
810.033       102.889     1264    764       1      Hidden Lake         0          06014641
10666.720     410.675     1531    963       1      Hidden Lake         0          06014646
28102.094     831.617     2504    2129      1      Hidden Lake         0          06014644
22013.289     626.512     4499    4648      1      Hidden Lake         2195       06014639
40862.984     1179.892    5729    6415      1      Hidden Lake         0          06014648
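
The duplication visible in Table 3 can be checked mechanically.  The following is a minimal Python sketch, not part of the original project, that groups the NAME and GNIS_ID values copied from Table 3 and reports which names are shared by more than one polygon.  The function names are hypothetical, and a blank GNIS_ID is represented here by an empty string.

```python
from collections import defaultdict

# (NAME, GNIS_ID) pairs copied from Table 3; "" marks a blank GNIS_ID.
TABLE_3 = [
    ("Almanor, Lake", ""),
    ("Almanor, Lake", "06000442"),
    ("Almanor, Lake", ""),
    ("Bass Lake", "06073793"),
    ("Bass Lake", "06001575"),
    ("Bass Lake", "06030274"),
    ("Bass Lake", "06030275"),
    ("Bass Lake", "06001574"),
    ("Hidden Lake", "06014641"),
    ("Hidden Lake", "06014646"),
    ("Hidden Lake", "06014644"),
    ("Hidden Lake", "06014639"),
    ("Hidden Lake", "06014648"),
]


def names_with_multiple_features(rows):
    """Map each name used by more than one polygon to its list of GNIS IDs."""
    by_name = defaultdict(list)
    for name, gnis_id in rows:
        by_name[name].append(gnis_id)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


def count_missing_ids(rows):
    """Count the rows whose GNIS_ID is blank."""
    return sum(1 for _, gnis_id in rows if not gnis_id)


for name, ids in names_with_multiple_features(TABLE_3).items():
    print(name, "is used by", len(ids), "polygons")
print(count_missing_ids(TABLE_3), "rows have no GNIS ID")
```

A name alone therefore cannot serve as a key for these records, and the GNIS_ID cannot serve as one wherever it is blank.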

 

OBJECTIVE:  The significance of unique identifiers is often overlooked by the layperson, but to scientists working in the field, they are of the utmost importance.  Imagine that you are a scientist gathering information on a particular water body in the High Sierras.  The lake has no name on a topographic map, and you are unsure how to enter your information into your GPS so that it is associated with the correct water body.  If each water body has its own unique ID, the problem is solved.

The unique IDs will be displayed on topographic maps, which will be used by biologists out in the field (see Illustration 1).

 

Illustration 1

 

 

My objective is to create unique IDs for all of the water bodies that are not too lengthy.  If a unique ID is too long, it becomes a hindrance in the field rather than a convenient way to identify water body features.  A biologist wants to be able to enter the ID into a GPS quickly, not key in a long, cumbersome number.  It is therefore my objective to create an ID that is no longer than 10 digits.  The unique IDs will be numeric rather than character strings.  This is to avoid duplications and to allow for easier post-processing, since numeric values are generally simpler for software to sort and compare than free-form strings.

 

METHODOLOGY:  I have approached the problem of assigning unique identifiers in several different ways.  Three solutions to creating unique IDs were contemplated.  The first was a random assignment, the second was based on the latitude and longitude of the water body, and the third solution was based on the United States Social Security Number. 

 

Initially the water bodies were given randomly assigned identifiers with no duplicates.  An example of this can be seen in Illustration 1.  These random identifiers are currently in use, although a change to a more efficient identification system is being contemplated.  The main problem with this system is that no locational information of any sort is tied to the IDs.  If a biologist wanted to find a water body occurring in a specific watershed, they would have to search through roughly 24,000 entries.  The benefit of this system is that the number is no longer than five digits, allowing for easy use in the field.  The overall reason for switching to a more effective system is that the IDs carry no meaning.  Allowing information to be tied to the ID would make it more useful.
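
As an illustration of duplicate-free random assignment, the following Python sketch draws distinct five-digit numbers.  It is only an assumption about how such IDs could be generated, not a record of how the existing IDs were produced; the range 10000 to 99999, the feature count of 24,000 (the approximate figure given above), and the function name are all illustrative.

```python
import random


def assign_random_ids(n_features, low=10000, high=99999, seed=None):
    """Return n_features distinct random integers between low and high inclusive."""
    pool = range(low, high + 1)
    if n_features > len(pool):
        raise ValueError("not enough IDs in the requested range")
    rng = random.Random(seed)
    # sample() draws without replacement, so no ID can be issued twice.
    return rng.sample(pool, n_features)


ids = assign_random_ids(24000, seed=2001)
print(len(ids), "IDs assigned,", len(set(ids)), "of them distinct")
```

Nothing in such a number indicates where the water body lies, which is the limitation described above.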

 

Assigning IDs to water bodies based on their latitude and longitude was briefly considered.  This would allow the ID not only to identify the lake but also to carry meaning.  The main problem with this system of identification is the length of the ID.  For example, a given water body has the latitude 38 degrees 42’27” and the longitude 120 degrees 15’53”.  Concatenated, these values produce a thirteen-digit ID (3842271201553).  An ID of this length would be difficult to use in the field.
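
The concatenation can be sketched in Python as follows.  The function name is hypothetical, and the sketch assumes whole seconds, two digits for degrees of latitude, three digits for degrees of longitude, and no hemisphere indicator.

```python
def latlon_id(lat_dms, lon_dms):
    """Concatenate (degrees, minutes, seconds) tuples into one digit string.

    Latitude degrees are written with two digits and longitude degrees
    with three, so the result always has 13 digits.
    """
    lat_d, lat_m, lat_s = lat_dms
    lon_d, lon_m, lon_s = lon_dms
    return f"{lat_d:02d}{lat_m:02d}{lat_s:02d}{lon_d:03d}{lon_m:02d}{lon_s:02d}"


example = latlon_id((38, 42, 27), (120, 15, 53))
print(example, len(example))  # 3842271201553 13
```

The result exceeds the ten-digit limit set in the objective by three digits.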

 

The unique IDs that I found most effective are based on the concept of the United States Social Security Number.  “A Social Security Number (SSN) consists of nine digits, commonly written as three fields separated by hyphens: AAA-GG-SSSS. The first three-digit field is called the ‘area number’. The central, two-digit field is called the ‘group number’. The final, four-digit field is called the ‘serial number’” (http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html).  The first three digits of an individual’s Social Security Number indicate the state in which the number was issued.  For example, a Social Security Number that starts with “565” was issued in California (Table 4).

 

Table 4

 

  001-003 NH    400-407 KY    530     NV

  004-007 ME    408-415 TN    531-539 WA

  008-009 VT    416-424 AL    540-544 OR

  010-034 MA    425-428 MS    545-573 CA

  035-039 RI    429-432 AR    574     AK

  040-049 CT    433-439 LA    575-576 HI

  050-134 NY    440-448 OK    577-579 DC

  135-158 NJ    449-467 TX    580     VI Virgin Islands

  159-211 PA    468-477 MN    581-584 PR Puerto Rico

  212-220 MD    478-485 IA    585     NM

  221-222 DE    486-500 MO    586     PI Pacific Islands*

  223-231 VA    501-502 ND    587-588 MS

  232-236 WV    503-504 SD    589-595 FL

  237-246 NC    505-508 NE    596-599 PR Puerto Rico

  247-251 SC    509-515 KS    600-601 AZ

  252-260 GA    516-517 MT    602-626 CA

  261-267 FL    518-519 ID    627-645 TX

  268-302 OH    520     WY    646-647 UT

  303-317 IN    521-524 CO    648-649 NM

  318-361 IL    525     NM    *Guam, American Samoa,

  362-386 MI    526-527 AZ     Philippine Islands,

  387-399 WI    528-529 UT     Northern Mariana Islands

 

  650-699 unassigned, for future use

  700-728 Railroad workers through 1963, then discontinued

  729-799 unassigned, for future use

  800-999 not valid SSNs.  Some sources have claimed that numbers

          above 900 were used when some state programs were converted

          to federal control, but current SSA documents claim no

          numbers above 799 have ever been used.

 

 

The central two-digit group number “is not related to geography but rather to the order in which SSNs are issued for a particular area”.  The last four digits “are assigned in chronological order within each area and group number as the applications are processed”.
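
To make the three-field structure concrete, the following Python sketch splits a number of the form AAA-GG-SSSS and looks up the area number in a few ranges copied from Table 4.  The function names are hypothetical, only a subset of Table 4 is included, and the number used is a placeholder rather than a real SSN.

```python
# Area-number ranges copied from Table 4 (a subset, for illustration only).
AREA_RANGES = [
    (1, 3, "NH"),
    (50, 134, "NY"),
    (530, 530, "NV"),
    (540, 544, "OR"),
    (545, 573, "CA"),
    (602, 626, "CA"),
]


def split_ssn(ssn):
    """Split 'AAA-GG-SSSS' into (area, group, serial) strings."""
    parts = ssn.split("-")
    if [len(p) for p in parts] != [3, 2, 4] or not all(p.isdigit() for p in parts):
        raise ValueError("expected the form AAA-GG-SSSS")
    return tuple(parts)


def state_for_area(area):
    """Return the state for an area number, or None if it is not in AREA_RANGES."""
    number = int(area)
    for low, high, state in AREA_RANGES:
        if low <= number <= high:
            return state
    return None


area, group, serial = split_ssn("565-00-0000")  # placeholder, not a real SSN
print(area, group, serial, state_for_area(area))
```

Only the first field carries geographic meaning; this is the property the water body IDs borrow.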

 

RESULTS:  The unique IDs were assigned values based on the principles of the Social Security Number, minus the hyphens separating the fields.  The first four digits represent the watershed the water body is contained in.  The fifth digit identifies the section (quarter) of the watershed in which the water body is located.  The last three digits are randomly assigned values.

 

Figure 1

 

Polygons with green borders represent watersheds.  Water bodies are shown in blue.  Black lines divide each watershed into quarters.

 

In the case of Figure 1, a water body within this watershed would be assigned an ID such as 35734020, where "3573" is the watershed ID, "4" is the quadrant in which the water body is located, and "020" is the randomly assigned value.
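
The composition of the ID can be sketched in Python as follows.  The function names are hypothetical, and the sketch assumes that watershed IDs are four-digit numbers that do not begin with zero, that quadrants are numbered 1 through 4, and that the last three digits run from 000 to 999.

```python
def make_water_body_id(watershed, quadrant, serial):
    """Build an 8-digit numeric ID laid out as WWWW Q SSS."""
    if not 1000 <= watershed <= 9999:
        raise ValueError("watershed ID must have four digits")
    if quadrant not in (1, 2, 3, 4):
        raise ValueError("quadrant must be 1, 2, 3, or 4")
    if not 0 <= serial <= 999:
        raise ValueError("serial must fit in three digits")
    return watershed * 10000 + quadrant * 1000 + serial


def split_water_body_id(water_body_id):
    """Recover (watershed, quadrant, serial) from an ID."""
    watershed, rest = divmod(water_body_id, 10000)
    quadrant, serial = divmod(rest, 1000)
    return watershed, quadrant, serial


wb_id = make_water_body_id(3573, 4, 20)
print(wb_id)                       # 35734020
print(split_water_body_id(wb_id))  # (3573, 4, 20)
```

Because the watershed occupies the leading digits, all water bodies in watershed 3573 can be selected with integer division (`wb_id // 10000 == 3573`) rather than by searching every entry.  Under these assumptions, three serial digits allow at most 1,000 IDs per quadrant.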


ANALYSIS OF INFORMATION: 

 

Following the assignment of unique IDs to all water bodies in the High Sierras, analysis can be performed efficiently.  Unique IDs provide the foundation for all data to be referenced.  Whether it is data on amphibians, fish, water quality, or anything else pertaining to a lake, all information can be tied together using the unique ID.  These unique IDs allow biologists to increase their analytical capabilities for any given lake in the High Sierras.  By pulling together several different biological studies, a better understanding of any given water body can be developed.
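
How a shared ID ties separate studies together can be sketched in Python as follows.  The records are invented placeholders used only to show the mechanics; they are not project data, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical records from two separate surveys, keyed by water body ID.
# The values are invented for illustration and are not project data.
fish_survey = [
    (35734020, "fish", "trout present"),
]
water_quality = [
    (35734020, "water quality", "pH 6.8"),
    (35734021, "water quality", "pH 7.1"),
]


def combine_by_id(*datasets):
    """Group (id, topic, observation) records from any number of sources by ID."""
    combined = defaultdict(list)
    for dataset in datasets:
        for water_body_id, topic, observation in dataset:
            combined[water_body_id].append((topic, observation))
    return dict(combined)


combined = combine_by_id(fish_survey, water_quality)
print(combined[35734020])
```

Without a shared key, the two record sets would have to be matched by name or location, which Table 3 shows to be unreliable for names.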

 

CONCLUSION: 

 

Water bodies in California do not have a standardized identification system in place.  By implementing a statewide identification system for natural features, departmental resources could be used to their fullest potential.  Currently, randomly assigned unique identifiers are in place for water bodies, although only one department uses them.  The ultimate goal would be the creation and distribution of a meaningful identification system, allowing departments to share their information with the rest of the community in a timely and efficient manner.  To date, the most efficient and meaningful way to uniquely identify water bodies in California is the identification system modeled on the Social Security Number.

 

REFERENCES:

 

Lakes.txt.  Lakes metadata.  Department of Fish and Game, Fisheries Programs Branch, May 3, 2001.

Sn_Lakes.txt.  Sn_lakes metadata.  Department of Fish and Game, Fisheries Programs Branch, May 3, 2001.

http://mapping.usgs.gov/www/gnis/.  May 3, 2001. 

http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html.  May 3, 2001. 

 

DATA SOURCE: 

 

Watersheds:  California Department of Fish and Game, May 2001.

 

Lakes, Sn_Lakes, fpb_lakes:  California Department of Fish and Game, May 2001.

 

Social Security Area Number data. http://www.cpsr.org/cpsr/privacy/ssn/ssn.structure.html