When surveying other systems, McIntosh starts with GYPSY, a system with “automatic geographical indexing of text documents that could then be searched with a spatial interface.” GYPSY creates a polygon mesh on the x, y, and z axes in which repeated mentions of a place push the mesh up along the z axis. In an example about Nevada, if Las Vegas is mentioned often, it becomes a spike in the mesh. As McIntosh puts it, “This created a skyline where it was clear which geographical areas the document(s) focused upon by the height the mesh rose out of the surface at different points on the mesh.”
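To make the "skyline" idea concrete, here is a minimal sketch of how mention counts could be turned into heights on a grid. It is not GYPSY's actual implementation: the gazetteer, its approximate coordinates, the function names, and the one-degree cell size are all assumptions made up for illustration, and place names are matched by naive substring search.

```python
import math
from collections import Counter

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Las Vegas": (36.17, -115.14),
    "Reno": (39.53, -119.81),
    "Carson City": (39.16, -119.77),
}

def mention_heights(text, gazetteer=GAZETTEER):
    """Count how often each gazetteer name occurs in the text (naive substring match)."""
    counts = Counter()
    for name in gazetteer:
        n = text.count(name)
        if n:
            counts[name] = n
    return counts

def height_grid(counts, gazetteer=GAZETTEER, cell_deg=1.0):
    """Add each place's mention count to the grid cell that contains it.

    The resulting value per cell plays the role of the z axis: cells with
    many mentions rise higher than cells with few.
    """
    grid = Counter()
    for name, n in counts.items():
        lat, lon = gazetteer[name]
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        grid[cell] += n
    return grid

if __name__ == "__main__":
    sample = "Las Vegas hosted the fair. Las Vegas drew crowds, and Reno sent visitors."
    print(height_grid(mention_heights(sample)))
```

A real system would need proper tokenisation and a much larger gazetteer, but the principle is the same: the more often a place is mentioned, the higher its cell rises.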
Pre-NewsExplorer is more interested in ranking places according to importance as it tags them: “The first technique used the concept of the ‘importance’ of a place. Each place in their database is given a value from 1 to 6 that denotes how ‘important’ that particular place is. For example, 1 means that this place is the capital city of a country and 6 means that the place is a small town or village.” A second version of this system, NewsExplorer, improves on the visualizations, including by using Google Earth’s KML.
The Informedia Digital Video Library has done some amazing work:
To utilise this information they began the development of a system that could automatically extract this information from the narrative of the video (obtained through the use of the Carnegie Mellon University Sphinx speech-recognition engine [HAHR93]). The system also extracted any words that had been shown on the screen through the use of OCR and checked these for place names. This information could then be used to provide spatial video searching and display map footage relative to a video in sync with the content… (14)
So far it has become clear that each project used different methods to disambiguate place names, something we will have to think about with the IMH if the TGN is not detailed enough for nuanced locales in Indiana. McIntosh observed the following strategies in his survey:
There were several methods of disambiguation used by the different systems, these included: methods based on linguistic rules (e.g., understanding “Cambridge, England” to mean Cambridge in England); methods based on other heuristics such as minimum geographic distance, population comparison, the importance of a place (i.e., a capital city is more important than other cities), examining the local context (i.e., other surrounding place names) and score-based methods. (24)
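As one illustration of the heuristics listed above, the following is a rough sketch of the minimum-geographic-distance idea: among several candidates for an ambiguous name, pick the one closest to the other places mentioned nearby. The function names and the approximate coordinates are my own assumptions, not taken from any of the surveyed systems.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def pick_nearest(candidates, context_points):
    """Return the (label, (lat, lon)) candidate with the smallest total distance
    to the already-resolved places in context_points.

    If context_points is empty, every candidate scores 0 and the first one wins,
    so a caller would want another heuristic as a fallback.
    """
    return min(
        candidates,
        key=lambda c: sum(haversine_km(c[1], p) for p in context_points),
    )

if __name__ == "__main__":
    cambridges = [
        ("Cambridge, England", (52.21, 0.12)),
        ("Cambridge, Massachusetts", (42.37, -71.11)),
    ]
    london = (51.51, -0.13)  # an unambiguous place mentioned in the same text
    print(pick_nearest(cambridges, [london])[0])
```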
GYPSY's eventual approach, limiting its gazetteer so that there are fewer chances of overlapping place names, is clearly not an option for us, as I expect many of the articles to focus heavily on a small geographical area. However, by using the weighting system employed by NewsExplorer and weighting Indiana instances of a name more heavily, perhaps we can overcome false positives for places like Princeton, a town name in several states. I will have to read more of the articles to be sure that this kind of skewing is appropriate, or whether the geographical bent is not as Indiana-specific as I am expecting.
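A minimal sketch of what such Indiana-weighted scoring might look like follows. It borrows the 1-to-6 importance scale quoted earlier, but the scoring formula, the `home_boost` parameter, and the importance values assigned to the two Princetons are assumptions of mine for illustration, not NewsExplorer's actual method or real gazetteer data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    state: str
    importance: int  # 1 = most important (e.g. a capital) ... 6 = small town or village

def score(candidate, home_state="Indiana", home_boost=3.0):
    """Higher is better: invert the 1-6 importance scale, then add a bonus
    when the candidate lies in the state the collection is assumed to favour."""
    base = 7 - candidate.importance  # importance 1 -> 6 points, importance 6 -> 1 point
    bonus = home_boost if candidate.state == home_state else 0.0
    return base + bonus

def disambiguate(candidates, **kwargs):
    """Pick the highest-scoring candidate for an ambiguous place name."""
    return max(candidates, key=lambda c: score(c, **kwargs))

if __name__ == "__main__":
    # Importance values below are invented for the example.
    princetons = [
        Candidate("Princeton", "New Jersey", importance=4),
        Candidate("Princeton", "Indiana", importance=5),
    ]
    print(disambiguate(princetons).state)                  # Indiana wins with the boost
    print(disambiguate(princetons, home_boost=0.0).state)  # New Jersey wins without it
```

The size of the boost is the knob that would need tuning against real articles: too large and every ambiguous name lands in Indiana, too small and the skew has no effect.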
McIntosh also details why the project chose Google Maps (a good list to contrast against MAF, although I still dislike the way Google deals with frequency of hits):
• GWT has a Java API for Google Maps which allows for easy integration with the rest of the web application.
• Included in the Google Maps API is the functionality to call Google’s geocoder. The geocoder is a powerful tool that takes an address (e.g. “Hamilton, Waikato, New Zealand”) and attempts to return the latitude and longitude of that place.
• It has a well designed user interface with a large number of useful features.
• Google Maps is free to use for non-commercial purposes.
• Google Maps is an interface that many users will already be familiar with due to its wide spread use around the world. (41)
Much of the rest of this thesis deals with acquiring place names and rendering maps on the fly, a stage that I do not think we are quite ready to contemplate.