TITLE

CATEGORY:


METADATA


TITLE:


ORGANIZING DATA SOURCES
FOR A FOCUSED STUDY OF AQUATIC BIRDS:
A WEB-BASED UTILITY


PRESENTER:


Anne Millington
For Geography 26, Data Acquisition
American River College
Paul Veisze, Instructor
December 6, 1999


ABSTRACT

In the course of developing an information base for a study of influences on two species of aquatic birds, a tool to facilitate data collection was designed. To support the initial data acquisition phase of this study, a web-based utility was developed that allows the acquisition, evaluation, classification, storage and reporting of highly disparate information resources.

Following an iterative development model, methods were developed for recording and transmitting information about discovered resources into a central database customized for the project, and for categorizing and evaluating both spatial and non-spatial reference material from external sources.

The data collection instrument is customizable and modifiable, accommodates successive refinements of search questions, and will serve further functions as the study matures.

Possibilities for further refinement of the utility are identified.

INTRODUCTION

One of the more creative phases of a project, whether a simple term paper or a complex collaborative investigation, is the initial period of open-minded evaluation, collection and synthesis of the relevant ideas of others. Articulation of a topic for investigation serves as a kernel around which are elaborated not only the specific findings that it spawns, but also the depth and breadth of the work of others who have pursued similar questions.

Further, the accumulation and evaluation of external data resources need not be a solo activity; the exchange of ideas both within and among research groups is valuable and should be facilitated. The search net is best cast widely, at least in the initial data gathering period. Of course, not all data are created equal; some sources are comprehensive and of great value, while others will contribute only an idea, a stimulus to thought.

The question guiding development of this utility was this: how, in the excitement of the chase, do you enable what is essentially a free-roaming right-brain activity, while constraining it in a left-brain sense so that the findings from your searches are preserved and accessible?

This paper describes a solution: an Internet-based utility to facilitate the collection and classification of data sources during the initial phase of a specific investigation. References from a variety of sources can be pursued, evaluated, categorized and stored systematically according to an initial categorization scheme, and later analyzed, rejected or followed as desired.

BACKGROUND

Duane Marble's rapid prototyping spiral model of software design, as described in Michael DeMers's text Fundamentals of Geographic Information Systems, provides a development paradigm for this utility. Although the scope of Marble's model far outstrips the extent of this tool, the tool did start as a very simple idea that, through successively informed modification and deployment, developed into a fairly involved utility to aid data acquisition.

The initial model for a data acquisition tool was simple: a means of collecting promising URLs obtained while surfing the Internet. The tool was refined and modified as needs developed, and as acquired data were organized and analyzed. Reciprocally, the references and data acquired were classified and organized by means of the tool, which in turn spawned a new round of data acquisition, analysis, and reporting.

The second-level refinement from an initial model to a conceptual model was guided to some extent by the Content Standards for Digital Geospatial Metadata adopted by the Federal Geographic Data Committee. Because these standards provide a comprehensive common set of definitions for documenting geospatial data, a method of incorporating them into the collected data was begun and is expected to develop further.

METHODS

A hypothetical research project was defined, for which data were sought. In an iterative fashion, methods of acquiring, categorizing, and organizing spatial, experimental, investigative and narrative data were developed and fine-tuned.

MOCK STUDY
An introduction that might serve for this mock study is as follows:

Factors influencing populations of aquatic birds range from human-engendered phenomena such as habitat loss, toxic chemical spills and poaching, to natural events such as disease, parasitism and severe weather. The relative weight of these deleterious factors will vary, depending on the biology and natural history of the species of interest, the time span in question, the duration of the events, the degree of insult, and the interaction of the factors themselves. Spatial display and analysis of these factors may demonstrate some of these interactive relationships, and suggest further relationships that have not heretofore been considered.

This preliminary study identifies sources of data for a spatial database relating to two specific populations of aquatic birds: common murres (Uria aalge) breeding off the coast of California and migratory snow geese (Anser caerulescens) wintering in the Central Valley. Data will be evaluated as to their accuracy, reliability and suitability to the development of multiple spatial data layers relevant to analysis of population pressures on these two species. These data include bird counts and other measures of population size and health, morbidity & mortality statistics and reports, oil spill and contamination incident reports, various types of site surveys and sampling data, historical reports and maps, and remotely sensed change data.

Proposed methods of modifying, interpreting and integrating the various data sources, in the service of a robust and useful set of GIS coverages, are discussed.

The common murre (Uria aalge) and the snow goose (Anser caerulescens) were chosen as representative of two basic classes of study species.


[Image: Common Murre]
Murres are long-lived and lay few eggs per clutch; geese are prolific. Many common murre populations are struggling to maintain their numbers; by contrast, the health of certain snow goose populations, and possibly of adjacent avian and human populations, may be threatened by the excessive size of those goose populations.

Snow geese are seasonally migratory; the movement of pelagic murre populations is more limited. Because the two species occupy different habitats, different collections of State, Federal and local agencies maintain the relevant maps and data and have jurisdiction over each habitat.

[Image: Snow Goose]

TECHNOLOGIES USED IN DEVELOPMENT

Technologies readily at hand were used in developing this tool:
  • Database program: Microsoft Access (Office 97)
  • Spreadsheet: Microsoft Excel (Office 97)
  • Graphics: Paint Shop Pro v 6.0
  • HTML Editor: HomeSite v 4.0
  • A free CGI script modified to transmit data to a data file
  • Web space on a commercial web server allowing Telnet and FTP access for customization of scripts
Results from initial data forays were used to modify and refine the functionality of the tool.


RESULTS

WEB-BASED INFORMATION ORGANIZER



This is a web-enabled utility that allows remote access to a data file developed to receive information for later analysis. An HTML form, accessed from a web browser, collects data that are sent to a data file on a remote server. The data from this file are periodically downloaded to a local Microsoft Access database. The Access database includes tables to which the data file is appended, more elaborate forms for local entry, reports, and links to images and to text editors that enable more elaborate description, modification or analysis of each reference.
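The form-to-data-file step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the modified CGI script actually used for the project (which is not reproduced in this paper). The field names (`url`, `title`, `category`, `notes`) and the pipe-delimited layout are assumptions made for the example.

```python
import csv
import io
from urllib.parse import parse_qs

# Hypothetical field names: the fields of the original HTML form are not listed in the paper.
FIELDS = ["url", "title", "category", "notes"]


def parse_submission(body: str) -> dict:
    """Decode a URL-encoded form body into a flat record of known fields."""
    parsed = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list of values per key; keep the first value of each field.
    return {name: parsed.get(name, [""])[0].strip() for name in FIELDS}


def append_record(record: dict, data_file) -> None:
    """Append one record as a pipe-delimited line to an open text file."""
    # csv.writer quotes any value that itself contains the delimiter.
    writer = csv.writer(data_file, delimiter="|", lineterminator="\n")
    writer.writerow([record[name] for name in FIELDS])


if __name__ == "__main__":
    # Made-up submission, as a browser would encode it.
    body = "url=http%3A%2F%2Fexample.org%2Fmurre&title=Murre+counts&category=population&notes="
    data_file = io.StringIO()  # stands in for the text data file on the server
    append_record(parse_submission(body), data_file)
    print(data_file.getvalue(), end="")
```

Writing one delimited line per submission keeps the server side simple and leaves all further structure to the local database, which matches the division of labor described above.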

COMPONENTS
  • Access database:
    • Forms
      [Screenshot: Data Entry (and Display) Form to Microsoft Access Database]
    • Table
      [Screenshot: Microsoft Access database "organizer.mdb" showing data table]
    • Queries for appending records from data file on server, and for report definition
    • Reports: metadata documents, communiques, other documentation
    • Links to images via OLE
    • Links to text documents via OLE
  • Data file residing on server, receiving output of HTML form
  • HTML form, implemented via web browser
  • Script on server
    Modified locally, installed by FTP, permissions set via Telnet
    Returns acknowledgement, emails results to specified locations and appends to text data file
    [Screenshots: Acknowledgement; Data file]
  • Instructions
  • Browser
  • FTP access
  • Graphics program for screen shots
    [Screenshot: ArcView display of ESRI World Satellite Images with overlay of lat/lon grid]
  • Word or other text editor
INFORMATION DISPLAY:

Access database form entries such as the following:

  • Veterinary Pathology Text


  • ArcView display of ESRI World Wildlife Fund Ecological Regions


  • Image obtained from Patuxent of oiled murre


Access database tables

         

FLOW
Data enter the system via direct input to the organizer's table, or via the remote HTML form.

EDITING AND REPORTING
Resources can be developed, integrated with other documents, summarized or deleted within the Microsoft Access database.

ANALYSIS

This is a useful, customizable tool for organizing references, which could be extended to assist in organizing and maintaining material through the initial analysis and evaluation phases as well.

The utility is a little cumbersome at present: appending data from the data file to the organizer table requires too many manual steps, and further work is needed to make it seamless and user-friendly. If the utility were entirely web-based, SQL requests for appending, editing and reporting from the database tables could be issued through a user interface. A PHP3/MySQL combination would be a good candidate to handle such a task inexpensively.
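As a rough sketch of what such SQL-driven appending and reporting might look like, the example below uses Python's built-in sqlite3 module as a stand-in for both Microsoft Access and the proposed PHP3/MySQL combination. The table layout, the four-column pipe-delimited data file and the function names are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def create_organizer(conn: sqlite3.Connection) -> None:
    """Create a hypothetical organizer table keyed on the resource URL."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS organizer (
               url TEXT PRIMARY KEY,
               title TEXT,
               category TEXT,
               notes TEXT)"""
    )


def append_from_data_file(conn: sqlite3.Connection, data_file) -> int:
    """Append pipe-delimited lines to the table; return the number of new rows."""
    before = conn.total_changes
    for row in csv.reader(data_file, delimiter="|"):
        if len(row) != 4:
            continue  # skip malformed lines rather than guess at their layout
        # OR IGNORE leaves already-downloaded records untouched on repeat runs.
        conn.execute(
            "INSERT OR IGNORE INTO organizer (url, title, category, notes) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
    conn.commit()
    return conn.total_changes - before


def report_by_category(conn: sqlite3.Connection, category: str) -> list:
    """Return (title, url) pairs for one category, sorted by title."""
    cursor = conn.execute(
        "SELECT title, url FROM organizer WHERE category = ? ORDER BY title",
        (category,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_organizer(conn)
    # Made-up sample records standing in for the downloaded data file.
    sample = io.StringIO(
        "http://example.org/murre|Murre counts|population|colony survey\n"
        "http://example.org/spills|Spill log|contamination|incident list\n"
    )
    print(append_from_data_file(conn, sample))
    print(report_by_category(conn, "population"))
```

Keying the table on the URL means the same data file can be appended repeatedly without creating duplicate records, which removes one of the manual steps noted above.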

Other modifications might include:
  • In addition to providing individual access to the database, the HTML form could be modified so that many individuals could use it, with each contributor's ID stored in a field of every entry written to the data file.
  • The utility or parts thereof could be password protected.
  • The complexity of the remote HTML form could be increased to collect as much information as the local form.
  • Associated FGDC Metadata Standard identifiers, stored in a field with each record or referencing a look-up table, could be used to generate reports; entire metadata documents could be autogenerated.
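
The last suggestion in the list above, generating metadata text from stored identifiers, could be sketched as follows. The look-up table is hypothetical: it maps assumed local field names to a few element names from the CSDGM (Title, Originator, Publication_Date, Online_Linkage, Abstract). The `entered_by` field illustrates the contributor ID suggested in the first item and is likewise an assumption.

```python
# Hypothetical look-up table: local field name -> CSDGM element name.
FIELD_TO_CSDGM = {
    "title": "Title",
    "originator": "Originator",
    "pub_date": "Publication_Date",
    "url": "Online_Linkage",
    "notes": "Abstract",
}


def metadata_report(record: dict) -> str:
    """Render the mapped, non-empty fields of one record as 'Element: value' lines."""
    lines = []
    for field, element in FIELD_TO_CSDGM.items():
        value = record.get(field, "").strip()
        if value:  # leave out elements for which nothing was recorded
            lines.append(f"{element}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    record = {
        "title": "Content Standard for Digital Geospatial Metadata (CSDGM)",
        "originator": "Federal Geographic Data Committee",
        "url": "http://www.fgdc.gov/metadata/contstan.html",
        "entered_by": "amillington",  # hypothetical contributor ID, not mapped to the report
    }
    print(metadata_report(record))
```

Fields with no entry in the look-up table, such as the contributor ID, stay in the database but are left out of the generated text. A full CSDGM document would also need the standard's section hierarchy, which this sketch does not attempt.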

CONCLUSION

This information-gathering tool was designed with the sometimes conflicting goals of flexibility and systematization. As such, it permits the collection of data and references that are sometimes only remotely related, yet may nonetheless have bearing on the project at hand.

However, its use appears best confined to the earlier phases of an investigation; later it would appropriately be integrated with or replaced by more formalized methods of organizing and presenting information.

REFERENCES

Atkinson, L. Core PHP Programming: Using PHP to build dynamic Web Sites. Prentice-Hall, Inc. Upper Saddle River, NJ, 1999.

Content Standard for Digital Geospatial Metadata (CSDGM). Federal Geographic Data Committee. Available: http://www.fgdc.gov/metadata/contstan.html, December 1999.

DeMers, M. Fundamentals of Geographic Information Systems. John Wiley & Sons, Inc., New York. 1997.

Schwartz, R.L. and T. Christiansen. Learning Perl, Second Edition. O'Reilly and Associates, Inc., Sebastopol, CA, 1997.

Seitz, B. BFormMail script. Available: http://www.infosheet.com/iScripts.html. November 1999.