http://www.dlib.org/dlib/september11/neatrour/09neatrour.html

At the University of Utah J. Willard Marriott Library, librarians wanted to enhance the metadata of many of their collections with geodata. They knew that Google had APIs that could ingest their metadata and come back with latitudes and longitudes that could then be reinserted into the XML files:

“In an effort to create more robust geographic data for the collection, we developed a three step process:

1) Use the Google Geocoding API to return latitude and longitude data based on existing place names in the metadata.

2) Create a table and scripting program to add the new latitude and longitude values to the core metadata XML file within CONTENTdm.

3) Upload links to the digital collection items with the newly compatible latitude and longitude data to GoogleMaps.”

Using PHP, they extracted a list of unique place names: “For digital libraries using software that supports import and export of collection data in XML files, the locations can be extracted easily with PHP’s preg_match function, which is a regular expression matcher used to look for the applicable xml tag, in our case ‘covspa.’” (They use Dublin Core.) Unfortunately, the IMH place name data is not properly encoded, and preg_match will yield dirty data until the encoding problem is fixed.
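To make the extraction step concrete, here is a minimal sketch in Python rather than the authors' PHP. It uses a regular expression in the same spirit as preg_match. The `<record>` wrapper and the sample values are my own assumptions for illustration; only the `covspa` tag name comes from the article.

```python
import re

# Matches the contents of each <covspa>...</covspa> element.
# DOTALL lets a value span more than one line.
COVSPA = re.compile(r"<covspa>(.*?)</covspa>", re.DOTALL)

def extract_place_fields(xml_text):
    """Return the trimmed text of every <covspa> element, in document order."""
    return [value.strip() for value in COVSPA.findall(xml_text)]

# Hypothetical export; real CONTENTdm exports will look different.
sample = """
<record><title>Interview A</title><covspa>Salt Lake City (Utah); Utah</covspa></record>
<record><title>Interview B</title><covspa>Ogden (Utah)</covspa></record>
<record><title>Interview C</title><covspa></covspa></record>
"""

# Drop records whose place field is empty.
fields = [f for f in extract_place_fields(sample) if f]
print(fields)
```

A regular expression is fine for a quick pass like this, but it will not notice encoding problems, so a file with broken encoding would still need cleaning first.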

Step 2: “The Google Geocoding API Lookup script searches for all occurrences of <covspa>, reads the metadata, and breaks it up into distinct locations based on semicolons as a separator. Each location is put into an associative array that is later output into a comma-separated values (CSV) file. This spreadsheet is then manually reviewed for errors in the metadata.” Because I had already extracted the place names from one article by hand, I was able to try the Google Geocoding API on that list.
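A rough Python sketch of the split-and-export step described in that quote might look like the following. The column names and the sample fields are assumptions of mine, not the authors' layout. A dict plays the role of the associative array so that each place name appears only once.

```python
import csv
import io

def split_places(fields):
    """Split semicolon-separated fields into trimmed, de-duplicated names."""
    seen = {}
    for field in fields:
        for name in field.split(";"):
            name = name.strip()
            if name:
                seen.setdefault(name, None)  # dicts keep first-seen order
    return list(seen)

def places_to_csv(places):
    """Render one place per row, leaving the coordinate columns blank."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["place", "latitude", "longitude"])  # assumed headers
    for place in places:
        writer.writerow([place, "", ""])
    return buf.getvalue()

places = split_places(["Salt Lake City (Utah); Utah", "Ogden (Utah); Utah"])
print(places_to_csv(places))
```

The resulting CSV can be opened in a spreadsheet for the manual review the authors mention before anything is sent to the API.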

Step 3: “The second part of our script iterates through the location list, sending locations to the Google Geocoding API one at a time. This is done with the cURL library in PHP, which provides a mechanism for the API to transmit data using a variety of protocols, including automated HTTP requests. Google sends coordinates back if it finds a match. The coordinates are saved and then used to create a table populated with both the place names for the collection and their applicable geographic coordinates.” (This is the part I’ll need help with!)
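Here is a minimal sketch of the lookup step, in Python with the standard library instead of PHP and cURL. It assumes the JSON endpoint of the Google Geocoding API and an API key of your own (`api_key` is a placeholder). The helper names are mine. The parsing function is separate from the network call so it can be tried on a canned response without contacting Google.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def build_request_url(place, api_key):
    """Build the request URL for one place name."""
    query = urllib.parse.urlencode({"address": place, "key": api_key})
    return f"{GEOCODE_URL}?{query}"

def parse_coordinates(response_text):
    """Return (lat, lng) from the first result, or None if nothing matched."""
    data = json.loads(response_text)
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

def geocode(place, api_key):
    """Send one place name to the API (requires network access and a key)."""
    url = build_request_url(place, api_key)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_coordinates(resp.read().decode("utf-8"))

# Canned response with illustrative coordinates, not real API output.
canned = ('{"status": "OK", "results": '
          '[{"geometry": {"location": {"lat": 40.76, "lng": -111.89}}}]}')
print(parse_coordinates(canned))
```

Looping over the reviewed place list and calling `geocode` once per name would mirror the one-at-a-time approach in the quote. Pausing between requests is probably wise to stay within usage limits.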

Step 4, Multiple Place Names (how they dealt with this could be very helpful to Brianna): “The metadata librarian ranked the place names and coordinates, so we were able to assign the most specific latitude and longitude coordinates to items with multiple place names in their metadata. This ranking system is necessary to get the subsequent script to update the item with the most local and accurate coordinate data. Since we have multiple place names in records separated by semicolons, the scripting program populates the latitude and longitude fields with the most specific information first. This process would not be necessary for other library collections where items have only one place name assigned. See Appendix Item 1 for the coordinate ranking system.”
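The ranking idea could be sketched like this in Python. The actual ranking system is in the article's Appendix Item 1 and is not reproduced here, so the ranks, place names, and coordinates below are hypothetical values chosen only to show the mechanics: a lower rank means a more specific place, and the most specific known name wins.

```python
# Hypothetical table: place name -> (rank, latitude, longitude).
# Lower rank = more specific. Coordinates are illustrative only.
RANKED = {
    "Salt Lake City (Utah)": (1, 40.76, -111.89),
    "Salt Lake County (Utah)": (2, 40.67, -111.92),
    "Utah": (3, 39.32, -111.09),
}

def most_specific(field, ranked=RANKED):
    """Pick the coordinates of the best-ranked name in a semicolon-separated field."""
    names = [n.strip() for n in field.split(";")]
    candidates = [ranked[n] for n in names if n in ranked]
    if not candidates:
        return None  # no place name in this field has been geocoded
    _, lat, lng = min(candidates, key=lambda entry: entry[0])
    return lat, lng

print(most_specific("Utah; Salt Lake City (Utah)"))
```

The order of names inside the field does not matter here, since the choice is driven entirely by the rank table.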

Step 5: “The second script is an XML Modification script which takes the table of coordinate pairs and collection place names returned by the Google Geocoding API lookup script and inserts them into the core descriptive metadata file for the collection.”
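As a sketch of what an XML modification script could do, here is a Python version using the standard library's ElementTree. The `<metadata>` and `<record>` wrappers and the `latitude` and `longitude` element names are assumptions for illustration. The real CONTENTdm metadata file and the authors' PHP script will differ.

```python
import xml.etree.ElementTree as ET

def add_coordinates(xml_text, coords_by_place):
    """Append latitude/longitude elements to each record with a known place."""
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        field = record.findtext("covspa") or ""
        for name in (n.strip() for n in field.split(";")):
            if name in coords_by_place:
                lat, lng = coords_by_place[name]
                ET.SubElement(record, "latitude").text = str(lat)
                ET.SubElement(record, "longitude").text = str(lng)
                break  # use the first listed name that has coordinates
    return ET.tostring(root, encoding="unicode")

# Hypothetical record and illustrative coordinates.
sample = "<metadata><record><covspa>Ogden (Utah); Utah</covspa></record></metadata>"
updated = add_coordinates(sample, {"Ogden (Utah)": (41.22, -111.97)})
print(updated)
```

This version takes the first name in the field that has coordinates. A ranking of place names by specificity could replace that rule.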

Step 6: “Once the new latitude and longitude coordinates are in the metadata for the collection, the next step is to use the updated metadata to generate a KML file that can be used in GoogleMaps applications … Google MyMaps has size limits that restrict KML file rendering.”
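For a sense of what the KML output involves, here is a minimal Python sketch that writes one Placemark per item. The item tuples and the function name are my own assumptions. One detail worth knowing: KML lists coordinates as longitude first, then latitude.

```python
from xml.sax.saxutils import escape

def to_kml(items):
    """Build a small KML document from (name, latitude, longitude) tuples."""
    placemarks = []
    for name, lat, lng in items:
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(name)}</name>\n"
            # KML expects longitude,latitude (in that order).
            f"    <Point><coordinates>{lng},{lat}</coordinates></Point>\n"
            "  </Placemark>"
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
        + "\n".join(placemarks)
        + "\n</Document>\n</kml>\n"
    )

# Illustrative item, not real collection data.
kml = to_kml([("Ogden (Utah)", 41.22, -111.97)])
print(kml)
```

Given the size limits mentioned in the quote, a large collection may need to be split across several smaller KML files.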

Step 7: “To generate thumbnails we add columns of script that, with exported item identification metadata, execute a command to generate hyperlinked thumbnails. In the formula we include additional descriptive metadata (place name, recording description) and add the persistent URL for the object in CONTENTdm. A final step involves adding a blank second row with a command to allow the Description column to exceed 256 characters.”

There are some more details to explore here, but this sounds like a great starting point for Brianna’s image collections!
