Geography 26 Project Papers: Fall 1999
STARTING TO STUDY SUPERMARKETS
Trials and Tribulations in Bringing a GIS Project to Fruition

Bob Strickland
American River College
Geography 26 - Data Acquisition in GIS - Fall, 1999

Abstract

            A student geographic information systems project stays simple and sticks to the basics and still turns up amazing discoveries about the six supermarkets located in a defined urban area.

Foreword

           This semester I am taking two GIS courses which require projects. Early on, I settled on a supermarket study for Geography 9 (Introduction to GIS). As I proceeded to research, collect, and manipulate data for the supermarket project, it occurred to me that this project involved a great enough investment in time and resources to apply toward the requirements of both Geography 9 and Geography 26 (Data Acquisition). This seems especially useful, because it gives me the opportunity to discuss in detail the gathering and processing of data for the Geography 9 supermarket project and the problems encountered along the way. The two projects complement each other. For Geography 26, I discuss both the process and final product of data acquisition; for Geography 9, I briefly mention how I acquired the data and then set up and demonstrate queries of that data in ArcView. The forms for reporting on the two projects also led to a useful divergence. The Data Acquisition course format is HTML (web page), while Geography 9 projects are to be submitted using a computer presentation format, such as Microsoft PowerPoint, and a demonstration in ArcView.

Introduction
                Data acquisition is a necessary prerequisite to any geographic information systems (GIS) project. This paper explores the process of accumulating data for a supermarket project conducted for my Geography 9 class (Introduction to GIS); the study attempts to compare supermarkets based on the prices they charge their customers. The purpose of that project is to use the skills and knowledge learned during this, my first, semester in GIS. The general rule adhered to is summarized by the popular acronym KISS (keep it simple, stupid), with "short but meaningful" added in. The tasks involved are: 1. define the geographic area to be studied; 2. collect and process the relevant spatial data (which must be georeferenced); 3. design and gather the (nonspatial) attribute data needed for price comparisons between supermarkets; 4. draw conclusions, evaluate the study's methodology, and suggest additional steps to make the study more valid; and 5. present the project to a body of my peers for their evaluation and solicit from them suggestions which might improve the conclusions of the study. My approach to the problem is to use the data which are most easily and economically accessible and to avoid expenses which would bankrupt the project.

Background

                A guiding principle throughout this project is that "data conversion, like so many things in life, quickly becomes an exercise in balancing trade-offs. Normally, you can only have two of the three coveted conversion characteristics -- good, fast, and cheap -- and must sacrifice the third. You can have data fast and cheap, but they're not going to be the best possible quality. You can obtain good data quickly, but they will be expensive. Or, you can have good data cheaply, but you're not going to get them very fast" (Hohl, 1998:8). This is illustrated in a model.  In any project, managers must select their two highest priorities; the third will result from that choice. For lack of a better term, I will refer to this as Hohl's Rule. My experience in this project led to a corollary for Hohl's Rule: the smaller the project, the more likely all three of these will be optimized. My project involves a limited amount of time, few dollars, and tentative results! This brings up a second guiding principle: when resources are limited, limit the scale of the project accordingly. Even large projects are often preceded by sample tests.
            The greatest opportunity in a project of this kind is to be able to use the techniques and express the concepts learned during a semester's toil. The quest for map data and the creation of a working map, as well as the less satisfactory sources and attempts cast aside, are valuable. My map production and other uses of ArcView were made possible by my previous experience with that application; but, mostly, I owe my instructors in Geography 9 and 25A/B/C (Dale Van Dam and Tom Lupo) and the publishers of Getting to Know ArcView GIS (1997). Instructors Van Dam and Paul Veisze (Geography 26) are responsible for my orientation in theories, concepts, and methods of data acquisition, manipulation, and analysis. My intention was to plow through the material covered in these courses to recall useful and relevant material that relates to this project and to record it for posterity. Murphy has struck again; his law is omnipresent and demands our attention, to forever play the "what if" game. "If something can go wrong, it will. And it will go wrong at the worst possible time." Ugly. But worth being aware of and now reminding me that time has played its course. DeMers has made a theme of the travel and adventure involved in the GIS experience. And this rings a familiar bell. Many times in our travels, we are just on a reconnaissance, finding places where we are not allowed to remain for long but jotting them down in our memories as sites to be revisited on the next best occasion. So be it with the current GIS efforts!

Methods

                Limit Geographic Area of Project.  The study area is limited to the supermarkets nearest to American River College. It comprises the area bounded by and adjacent to these major streets:  Auburn Boulevard, Madison Avenue, Manzanita Avenue, Fair Oaks Boulevard, Marconi Avenue, and Watt Avenue. Six stores represent major supermarket chains within the study area: Albertson's, Safeway, Bel Air, Ralphs, Raley's, and SavMax. (During the study, "Lucky married Albertson's." The store that started out as Lucky took on the Albertson's name as a result of a merger or buy-out. For simplicity's sake, that store is referred to as Albertson's throughout this paper. Ralphs is the "new kid on the block"; it occupies the location of a former Albertson's, closed due to the impending marriage to Lucky.)
                Spatial Data -- Primary Locations.  The primary geographic entities are the supermarkets. I collected GPS (Global Positioning System) data for each of these with a Garmin 12 XL unit. Tests with this unit by our class this semester indicate that it is reliable within certain limits, especially when selective availability is corrected for by using Garmin add-ons to make use of Coast Guard beacons. My data consisted of single points recorded in the supermarket parking lots, usually closer to adjacent streets than to the store itself. I did not correct for selective availability. When displayed with georeferenced maps of the area, the GPS positions appear to be "generally" accurate (looks right; certainly never more than 30 meters off; and never on the wrong side of the street). I also collected a GPS point on the American River College campus as a central reference point (the place to start to go shopping).  In making my final map for this project I made use of the "symbol placement compromise" by changing the locations of my GPS points; I made them more inaccurate by moving them so that they did not intersect the adjacent streets.
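The "never more than 30 meters off" judgment above was made by eye against georeferenced maps. A minimal sketch of how the same check could be made numerically, assuming a reference coordinate for each spot can be read from a georeferenced map. The function names and the sample coordinates are hypothetical illustrations, not measurements from this project, and the distance uses a spherical-Earth approximation.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_tolerance(gps_point, reference_point, tolerance_m=30.0):
    """True if the GPS fix lies within tolerance_m of the reference point."""
    return haversine_m(*gps_point, *reference_point) <= tolerance_m


# Hypothetical (lat, lon) pairs: a recorded GPS fix and a map-derived reference.
gps_fix = (38.6500, -121.3500)
map_reference = (38.6501, -121.3501)
print(within_tolerance(gps_fix, map_reference))
```

With one fix per store, running this over all six supermarkets would turn "looks right" into a recorded offset per location.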
                Spatial Data -- Background.  For map background, I considered using files from the Geography 26 data bank. The Sheridan.bil image is an aerial photo which includes the project area; but it eats up a lot of computer space, comes up slowly, and I could not think of how it would be immediately useful. Sacramento.tif is a scanned map that also has a large file size and too low a resolution (large pixels). The set of scanned 1:24,000 quads at first appeared useful -- until I realized that they were not seamless along their edges. Seamless quads can be purchased locally at California Surveying and Drafting Supply (4733 Auburn Boulevard, Sacramento); they've got MapTech (all of northern California for $319.00 but probably not georeferenced), DeLorme TopoUSA (at $99 for the whole country, probably not useful for GIS), and Garmin MapSource (1:100,000 quads that load into their GPS).  The best product I've seen is Sure!MAPS RASTER (seamless and georeferenced; available from GeoWarehouse on the internet at www.geowarehouse.com).  To stay within the budget (zero dollars), I decided against buying commercial products.
                Then I decided to digitize the freeways and major streets within the project area. This produced shapefiles for the freeways and major streets, making it unnecessary to use the georeferenced quads.  This worked out well.  But I found that I could improve the map by trashing what I had at this point and starting over, because I found the TIGER files for Sacramento County to be available from the SACOG (Sacramento Area Council of Governments) web page (http://www.sacog.org). My project is projected in Teale Albers, and the SACOG TIGER files are unprojected Lat/Lon. I was able to build a Teale Albers projection of the SACOG TIGER files in ArcView. ArcView includes an extension for projecting vector files. It is not loaded in the ArcView Extensions until the user moves it (prjctr.avx) from esri\av_gis30\arcview\samples to esri\av_gis30\arcview\ext32. After projecting the county TIGER shapefile to Teale Albers,  I clipped the project area from it, using the rectangle graphics tool, the "Select Features from Graphics" button, and Theme|Convert to Shapefile in ArcView. MyClip shows the progress to this point.
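The rectangle selection described above can be approximated outside ArcView. The following is only a sketch, assuming each street is a list of projected (x, y) vertices; it keeps any feature whose bounding box overlaps the study rectangle, which is a coarser test than ArcView's own selection. The feature names, coordinates, and function names are hypothetical.

```python
def bounding_box(points):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def select_by_rectangle(features, rectangle):
    """Keep features whose bounding box overlaps the selection rectangle."""
    return [f for f in features
            if boxes_overlap(bounding_box(f["points"]), rectangle)]


# Hypothetical line features in projected meters (not real TIGER coordinates).
streets = [
    {"name": "inside", "points": [(100.0, 100.0), (200.0, 150.0)]},
    {"name": "crossing", "points": [(-50.0, 50.0), (50.0, 60.0)]},
    {"name": "outside", "points": [(900.0, 900.0), (950.0, 990.0)]},
]
study_area = (0.0, 0.0, 500.0, 500.0)  # xmin, ymin, xmax, ymax

selected = select_by_rectangle(streets, study_area)
print([f["name"] for f in selected])  # prints ['inside', 'crossing']
```

Like the selection in ArcView, this keeps whole features that cross the rectangle's edge rather than cutting them at the boundary.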
                I then worked with the TIGER clip of the project area to select the major geographic features. Within ArcView, I used the Query Builder and Theme|Convert to Shapefile to create themes for freeways, major streets, and creeks. To the view, I then added the Locations by GPS theme. A little more editing and adding symbology resulted in the final map layout for the project.
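The Query Builder step amounts to filtering the TIGER attribute table by feature class. A rough sketch of that idea follows, assuming the table carries a Census Feature Class Code in a field named "CFCC" (the field name, the choice of code prefixes for each theme, and all sample records are assumptions made for illustration, not the queries actually used in this project).

```python
# Assumed CFCC prefixes: A1 = limited-access primary highways,
# A2/A3 = other primary and secondary roads, H1 = streams.
THEME_PREFIXES = {
    "freeways": ("A1",),
    "major_streets": ("A2", "A3"),
    "creeks": ("H1",),
}


def build_themes(records, field="CFCC"):
    """Sort attribute records into themes by the prefix of their class code."""
    themes = {name: [] for name in THEME_PREFIXES}
    for record in records:
        code = record[field]
        for name, prefixes in THEME_PREFIXES.items():
            if code.startswith(prefixes):  # startswith accepts a tuple
                themes[name].append(record)
    return themes


# Hypothetical attribute records; names and codes are made up.
records = [
    {"name": "Example Freeway", "CFCC": "A15"},
    {"name": "Example Avenue", "CFCC": "A31"},
    {"name": "Example Court", "CFCC": "A41"},  # local street, left out
    {"name": "Example Creek", "CFCC": "H11"},
]

themes = build_themes(records)
for theme_name, members in themes.items():
    print(theme_name, [m["name"] for m in members])
```

Each resulting list corresponds to one theme that would then be saved with Theme|Convert to Shapefile.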
                Time Frame.  Data were collected during the period October 15 through December 12, 1999.
                Data Collection Intervals.  My intention was to revisit each store once a week during the project's lifetime; however, it immediately became obvious that such an effort would exceed project resources. This resulted in a compromise, where I collected prices from a single store (Albertson's) over a period of six weeks and of all six  stores on two occasions, at the beginning and end of the project (referred to as Slice 1 and Final Slice and coinciding with supermarket weeks including October 15 and December 3). This gives us an idea of how the six stores compare at two slices in time and how prices change over time at a single supermarket. In this report, only the first slice is illustrated in order not to overwhelm the reader with charts.
                Grocery List and Shopping Cart.  I arbitrarily selected 38 items from among four categories:  dairy, produce, grocery, and household. These included eggs, milk, cottage cheese, and yogurt; bananas, lettuce, cauliflower, and potatoes; bottled water, juice, coffee, vegetable oil, olive oil, ketchup, mayonnaise, peanut butter, flour, sugar, macaroni, spaghetti pasta, spaghetti sauce, rice, pinto and black beans, corn flakes, raisin bran, and canned tuna; and, finally, laundry detergent and toilet tissue. The choice of these particular items was mostly random, with a vague plan to cast a net somewhat widely. In some cases, specific brands were targeted (e.g., Wesson oil in 24 ounce size and 1/2 gallon size of Breyers Ice Cream). In most cases, I selected the lowest price brand that fitted the general description of the item (e.g., taking the lower price of Post or Kellogg's raisin bran).  Some items had to be dropped off the list when I found that one or more stores did not carry an equivalent item. I eventually made up a "shopping list" composed of one or more items from the grocery list. It was from this list (or cartful of items) that I compared supermarket prices. If you have Microsoft Excel, you can pull up a sample of one of the spreadsheets I used by clicking here. It shows Albertson's prices at Slice 1 and the Shopping Cart list.
                Data on Prices.  As expected, the acquisition of nonspatial data was the most time consuming. Early on I learned to empathize with and admire those valiant clerks who persevere and serve with wondrous smiles, as they ply their 8-to-9 hour work day over those hard concrete floors. To collect comparative price data, I prepared a hardcopy form with a list of 38 grocery items.  An entire day (10/15/99) was required for data collection during Week 1; subsequent pricing visits required a minimum of 30 minutes in the store (not counting travel time).  An additional 30 minutes or so was required to enter the data in three separate Excel spreadsheets for each visit.
               Attribute Table.  I brought up the View in ArcView, selected my Locations by GPS theme, and brought up the attribute table for that theme. I edited the table to include an additional field, my key field, for each of the located points.
               Related Tables.  One of my goals was to build a database which would include a number of different tables with information about the supermarkets (address, phone number, etc.) and their prices.
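The key field and related tables described above follow the usual one-to-many pattern: one row per located store, many rows of price information pointing back to it. A minimal sketch of that relationship, assuming a key field called store_id (the field names and key values are hypothetical; the cart totals are the lowest-price Shopping Cart figures reported in the Results for the stores where they are given).

```python
# Parent table: one row per GPS-located store, identified by the key field.
locations = [
    {"store_id": 1, "name": "Bel Air"},
    {"store_id": 2, "name": "Raley's"},
    {"store_id": 3, "name": "Safeway"},
    {"store_id": 4, "name": "SavMax"},
]

# Child table: many rows per store, linked through the same key field.
cart_totals = [
    {"store_id": 1, "slice": "Slice 1", "total": 101.40},
    {"store_id": 1, "slice": "Final Slice", "total": 102.00},
    {"store_id": 2, "slice": "Slice 1", "total": 90.97},
    {"store_id": 2, "slice": "Final Slice", "total": 98.62},
    {"store_id": 3, "slice": "Slice 1", "total": 90.75},
    {"store_id": 4, "slice": "Slice 1", "total": 87.67},
]


def relate(parents, children, key):
    """Attach to each parent row the list of child rows sharing its key."""
    by_key = {}
    for child in children:
        by_key.setdefault(child[key], []).append(child)
    return [dict(parent, related=by_key.get(parent[key], []))
            for parent in parents]


related = relate(locations, cart_totals, "store_id")
for row in related:
    print(row["name"], [(r["slice"], r["total"]) for r in row["related"]])
```

Keeping prices in their own table means a new pricing visit adds rows without touching the location records.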
               Charting Prices.  I constructed charts in ArcView to compare prices.
                Photographs.  I took digital photographs of each supermarket. In most cases, the photo is taken in the parking lot at the location where the supermarket GPS point was recorded.

Results

                Summary.  Slice 1 of the study shows a consistent pattern: highest prices were at Bel Air and lowest prices were at SavMax.  One surprise was that there was such a wide disparity between Shopping Cart prices for Bel Air and Raley's, two affiliated supermarkets. That led me to collect prices for a second slice at the end of the project's lifetime. To keep matters simple (usually but not always the best idea), I will stick to the shopping strategy which focuses on lowest prices (choosing store brands, along with all sale prices). In such a comparison, in Slice 1 the Shopping Cart at Bel Air totaled $101.40 versus Raley's $90.97. The Final Slice found these two stores closer together with totals of $102.00 and $98.62, respectively. Safeway was the Final Slice low price winner for folks who are willing to join their club and stock up (by buying two when the offer is "Buy One, Get One Free"). The lowest prices on the Shopping Cart for Slice 1 were SavMax ($87.67) and Safeway ($90.75); and for the Final Slice, Safeway ($88.27) and SavMax ($90.57). The Super Shopper's quest for lowest prices for the Shopping Cart was consistently just under $76.00 in both slices. For Excel spreadsheets showing Shopping Cart lowest prices due to buying store brands and discounted items, click on:  Slice 1 or Final Slice.
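The narrowing gap between the two affiliated stores can be checked directly from the totals quoted above. A small sketch using only the four stores whose lowest-price Shopping Cart totals are reported in this summary (totals for Albertson's and Ralphs are not given in the text, so they are left out); the variable and function names are illustrative.

```python
# Lowest-price Shopping Cart totals as quoted in the Summary, in dollars.
totals = {
    "Slice 1": {"Bel Air": 101.40, "Raley's": 90.97,
                "Safeway": 90.75, "SavMax": 87.67},
    "Final Slice": {"Bel Air": 102.00, "Raley's": 98.62,
                    "Safeway": 88.27, "SavMax": 90.57},
}


def gap(slice_name, store_a, store_b):
    """Dollar difference between two stores' cart totals in one slice."""
    return round(totals[slice_name][store_a] - totals[slice_name][store_b], 2)


def cheapest(slice_name):
    """Store with the lowest cart total among those listed for the slice."""
    by_store = totals[slice_name]
    return min(by_store, key=by_store.get)


for name in totals:
    print(name, gap(name, "Bel Air", "Raley's"), cheapest(name))
```

On these figures the Bel Air premium over Raley's is $10.43 in Slice 1 and $3.38 in the Final Slice, and the low-price leader switches from SavMax to Safeway.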

               Click to see a map showing the supermarket locations. (This map is a reminder only; it's the same as the "final map" above.)

               Click on the supermarkets to see a photo:  Albertson's, Bel Air, Raley's, Ralphs, Safeway, and SavMax

                Different Strokes for Different Folks.  People are different. This study tries to compare prices for people who might make different choices in the market place. Some customers will stick with national or regional brand names, steering free of store brands for the most part. We show comparisons for these people -- when some of the items are on sale and what the price would have been if the full price had been charged.  Some customers focus only on prices; we show how they fare in their quest for "lowest price." If you look closely, you will even find the "super shopper," who goes from store to store, looking for the lowest of the low. Good luck on your sanity as you navigate the Slice 1 charts which follow!

                Shopping cart total:  National and regional brand products at full price:  Compare!

                Shopping cart total:  National and regional brand products with some sale items:  Compare!

                Shopping cart total:  Any brand to get the lowest price: Compare!

                Shopping cart total:  The determined super shopper saves a bundle of dollars (probably at extreme cost to everything else in her life!) by bouncing from store to store to find the lowest overall price.  Compare!

                A second look:  The above charts emphasize differences in total price.  Here is what Slice 1 price totals would look like when we chart from zero dollars:  N-R Brands, NRB with Sale Items, Low Price Leader.

Analysis

                The analysis of supermarket prices is difficult due to the many pricing strategies used by different stores. Price variables include store brands, simple sale prices (e.g., an item on sale for $1.79), complex sale prices (e.g., "buy one, get one free" or  "get $0.50 gift certificate with purchase of two cans of soup"), and "club" prices which exclude "nonmembers" from most sale items. The project tried to reflect prices as they would affect different types of customers: (1) those who would buy only well known national or regional brands, (2) those who would shop only a single store but always choose the "low price," and (3) those shoppers who are willing to shop multiple stores for the "super low price."
In this project, the "low price" shopper is considered to be a customer who is willing to join the club, buy in quantity if necessary, or put up with anything in order to get the lower price. Coupons were not considered in the current study, because this factor is expected to apply across the board; no known instance of "double coupons" was encountered. No-label brands, overstocks, damaged containers, and close-dated perishable foods were not knowingly encountered during the project.
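The shopper types above differ only in which shelf offers they are willing to accept, so each cart total can be computed from the same price sheet. A minimal sketch under stated assumptions: every price, store label, and function name below is made up for illustration and is not data from the study, and each offer is recorded as (brand kind, full price, price on the day of the visit).

```python
# Made-up price sheet: store -> item -> list of (kind, full_price, day_price).
PRICES = {
    "Store A": {
        "milk": [("national", 2.59, 2.59), ("store", 2.29, 2.29)],
        "coffee": [("national", 4.99, 3.99), ("store", 3.29, 3.29)],
    },
    "Store B": {
        "milk": [("national", 2.49, 2.49), ("store", 2.39, 1.99)],
        "coffee": [("national", 5.29, 5.29), ("store", 3.49, 3.49)],
    },
}


def cart_total(store, kinds, use_sales):
    """Cheapest qualifying offer per item, summed over the store's cart."""
    total = 0.0
    for offers in PRICES[store].values():
        total += min(day if use_sales else full
                     for kind, full, day in offers if kind in kinds)
    return round(total, 2)


def super_shopper_total():
    """Cheapest offer per item across all stores, any brand, sales included."""
    items = next(iter(PRICES.values())).keys()
    return round(sum(min(day
                         for store in PRICES
                         for _, _, day in PRICES[store][item])
                     for item in items), 2)


for store in PRICES:
    print(store,
          cart_total(store, {"national"}, use_sales=False),  # brands, full price
          cart_total(store, {"national"}, use_sales=True),   # brands, with sales
          cart_total(store, {"national", "store"}, use_sales=True))  # low price
print("super shopper", super_shopper_total())
```

The sketch assumes every store carries every item, which mirrors the decision in this study to drop items that one or more stores did not stock.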

Conclusion

                No firm conclusions can be drawn at this time, except to note the simple, tentative comparisons above. For the most part, this project followed the advice expressed by DeMers (1997:428). Methodology furnished a framework.  Literature was reviewed. Field work was carried out properly and in a timely fashion. Computations were made. Analyses were a little flaky and can be improved upon. And the paper is written.

Disclaimer

                The conclusions above are tentative at best. The study is flawed in almost every direction. The comparisons made are only for the items selected for the study and only for the time periods mentioned above. A different shopping list would result in differences in total prices. The data were checked for accuracy; still, undetected errors are likely. In addition, price is only one of a number of factors which lead shoppers to one store instead of another. Others include proximity (convenience of being nearby or on one's travel route), felt "cleanliness" or "orderliness," choice of products, customer-clerk-manager-store relations, and "ambiance" in general. A friend of mine once said, "Please don't ask me to read the ingredients or look at the prices."
 

References

Clarke, Keith C., 1997. Getting Started with Geographic Information Systems. Upper Saddle River, New Jersey: Prentice Hall, Inc.

DeMers, Michael N., 1997. Fundamentals of Geographic Information Systems. New York: John Wiley & Sons, Inc.

ESRI, 1997. Getting to Know ArcView GIS. Redlands, California: Environmental Systems Research Institute, Inc.

Hernandez, Michael J., 1997. Database Design for Mere Mortals.  Berkeley, California: Addison-Wesley Developers Press.

Hohl, Pat, ed., 1998.  GIS Data Conversion. Santa Fe, New Mexico: OnWord Press.

Robinson, Arthur H.,  Joel L. Morrison, Phillip C. Muehrcke, A. Jon Kimerling, and Stephen C. Guptill, 1995. Elements of Cartography, 6th edition. New York: John Wiley & Sons, Inc.