Moving the Unlock blog

April 30, 2010

We have proper blog hosting set up at EDINA so we’re moving the Unlock service blog to a new home:

http://unlock.blogs.edina.ac.uk/

The past contents will stay here and also be duplicated at the new blog. Thanks.


Unlock Places API — version 2.2

April 21, 2010

The Unlock Places API was recently upgraded to include Ordnance Survey’s open data. This feature-rich data from Code-Point Open, Boundary-Line and the 1:50,000 gazetteer includes placenames and locations (points, boxes and shapes) and is now open for all to use! You can just get started with the API.

We’ve also added new functionality to the service, including an HTML view for features, more feature attributes and the ability to request results in different coordinate systems, as well as the usual speed improvements and bug-fixes.

The new data and features are available from Tuesday, 20th April 2010. Please visit the example queries page to try out some of the queries.

We welcome any feedback on the new features – and if there’s anything you’d like to see in future versions of Unlock, please let us know. Alternatively, why not just get in touch to let us know how you’re using the service? We’d love to hear from you!

Full details of the changes are listed below the fold.



Linking Placename Authorities

April 9, 2010


I’m putting together a proposal for JISC call 02/10 based on a suggestion from Paul Ell at CDDA in Belfast. Why post it here? I think there’s value in working on these things in a more public way, and I’d like to know who else would find the work useful.

Summary

Generating a gazetteer of historic UK placenames, linked to documents and authority files in Linked Data form. Both working with existing placename authority files, and generating new authority files by extracting geographic names from text documents. Using the Edinburgh Geoparser to “georesolve” placenames and link them to widely-used geographic entities on the Linked Data web.

Background

GeoDigRef was a JISC project to extract references to people and places from several very large digitised collections, to make them easier to search. The Edinburgh Geoparser was adapted to extract place references from large collections.

One roadblock in this and other projects has been the lack of an open historic placename gazetteer for the UK.

Placenames in authority files, and placenames text-mined from documents, can be turned into geographic links that connect items in collections with each other and with the Linked Data web; a historic gazetteer for the UK can be built as a byproduct.

Proposal

Firstly, working with placename authority files from existing collections, starting with the existing digitised volumes from the English Place Name Survey as a basis.

Where place names are found, they can be linked to the corresponding Linked Data entity in geonames.org, the motherlode of place name links on the Linked Data web, using the georesolver component of the Edinburgh Geoparser.

Secondly, using the geoparser to extract placename references from documents and using those placenames to seed an authority file, which can then be resolved in the same way.

An open source web-based tool will help users link places to one another, remove false positives found by the geoparser, and publish the results as RDF using an open data license.

Historic names will be imported back into the Unlock place search service.
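As a rough illustration of the "publish the results as RDF" step, here is a minimal sketch that writes one place record as N-Triples. Everything specific in it is an assumption made for the example: the `example.org` place URI, the placeholder geonames.org URI, and the choice of `owl:sameAs` and `skos:altLabel` as linking properties. The proposal itself does not fix a vocabulary.

```python
# Sketch only: serialise one gazetteer entry as N-Triples.
# The URIs passed in below are hypothetical placeholders, and the
# predicates are one plausible choice, not a decision from the proposal.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def place_to_ntriples(place_uri, historic_names, geonames_uri):
    """Return N-Triples linking a place to geonames.org and listing its historic names."""
    lines = [f"<{place_uri}> <{OWL_SAME_AS}> <{geonames_uri}> ."]
    for name in historic_names:
        lines.append(f'<{place_uri}> <{SKOS_ALT_LABEL}> "{escape_literal(name)}" .')
    return "\n".join(lines)


triples = place_to_ntriples(
    "http://example.org/place/london",        # hypothetical URI for our record
    ["Londinium"],                            # historic name from the text
    "http://sws.geonames.org/GEONAMES_ID/",   # placeholder, not a real identifier
)
print(triples)
```

A real tool would more likely use an RDF library and a weaker linking property where the historic and contemporary places are not strictly the same thing.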

Context

This will leave behind a toolset for others to use, as well as creating new reference data.

Building on work done at the Open Knowledge Foundation to convert MARC/MADS bibliographic resources to RDF and add geographic links.

Re-using existing digitised resources from CDDA to help make them discoverable and provide a path in for researchers.

Geonames.org has some historic coverage, but it is hit and miss (e.g. “London” has “Londinium” as an alternate name, but at the contemporary location). The new OS OpenData sources are all contemporary.

Once a placename is found in a text, it may not be found in a gazetteer. The more places correctly located, the higher the likelihood that other places mentioned in a document will also be correctly located. More historic coverage means better georeferencing for more archival collections.
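To make the point about already-located places concrete, here is a toy sketch of proximity-based disambiguation: an ambiguous name is resolved to whichever candidate lies closest to the places already located in the same document. This is not the Edinburgh Geoparser's actual method, only a simplified illustration. The function names are invented for the example and the coordinates are approximate.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_candidate(candidates, resolved):
    """Choose the candidate with the smallest total distance to resolved places.

    candidates: dict mapping a label to a (lat, lon) pair.
    resolved:   list of (lat, lon) pairs already located in the document.
    With no resolved places every candidate scores 0 and the first one wins,
    which is exactly the case where more gazetteer coverage would help.
    """
    def total_distance(label):
        return sum(haversine_km(candidates[label], place) for place in resolved)
    return min(candidates, key=total_distance)


# Approximate coordinates, for illustration only.
resolved = [(56.46, -2.97), (55.95, -3.19)]  # Dundee and Edinburgh
candidates = {
    "Perth, Scotland": (56.40, -3.44),
    "Perth, Western Australia": (-31.95, 115.86),
}
choice = pick_candidate(candidates, resolved)
print(choice)
```

The more places a gazetteer lets us put in `resolved`, the more context there is for choosing between candidates.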


Work in progress with OS Open Data

April 2, 2010

The April 1st release of many Ordnance Survey datasets as open data is great news for us at Unlock. As hoped for, Boundary-Line (administrative boundaries), the 50K gazetteer of placenames and a modified version of Code-Point (postal locations) are now open data.

Boundary-Line for Edinburgh shown in Google Earth. Contains Ordnance Survey data © Crown copyright and database right 2010.

We’ll be putting these datasets into the open access part of Unlock Places, our place search service, and opening up Unlock Geocodes based on Code-Point Open. However, this is going to take a week or two, because we’re also adding some new features to Unlock’s search and results.

Currently, registered academic users are able to:

  • Grab shapes and bounding boxes in KML or GeoJSON – no need for GIS software, re-use in web applications
  • Search by bounding box and feature type as well as place name
  • See properties of shapes (area, perimeter, central point) – useful for statistics visualisation
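For a sense of what those shape properties involve, here is a minimal sketch that derives a bounding box, area, perimeter and central point from the outer ring of a GeoJSON polygon. It is not Unlock's implementation. The feature below is a made-up 4 × 3 rectangle in projected units, and the function name is invented for the example.

```python
import math


def ring_properties(ring):
    """Bounding box, area, perimeter and centroid of a closed planar ring.

    `ring` is a list of (x, y) pairs in a projected coordinate system,
    with the first point repeated at the end, as in a GeoJSON Polygon's
    outer ring. A ring with zero area would raise ZeroDivisionError.
    """
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    bbox = (min(xs), min(ys), max(xs), max(ys))

    twice_area = 0.0
    cx = cy = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0          # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)

    area = twice_area / 2.0                # signed; sign depends on ring direction
    centroid = (cx / (6.0 * area), cy / (6.0 * area))
    return {"bbox": bbox, "area": abs(area), "perimeter": perimeter, "centroid": centroid}


# Hypothetical feature: a 4 x 3 rectangle, not real Boundary-Line data.
feature = {
    "type": "Feature",
    "properties": {"name": "Example rectangle"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]],
    },
}
props = ring_properties([tuple(p) for p in feature["geometry"]["coordinates"][0]])
print(props)
```

The arithmetic assumes planar coordinates such as metres on a national grid. Applied directly to longitude and latitude it would give figures in square degrees, which are not meaningful areas.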

And soon we’ll be publishing these new features, currently in testing:

  • Relationships between places – cities, counties and regions containing found places – in the default results
  • Re-project points and shapes into different coordinate reference systems

These have been added so we can finally plug the Unlock Places search into EDINA’s Digimap service.
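As an illustration of what re-projecting a point means, here is a small sketch that converts WGS84 longitude and latitude to spherical Web Mercator (EPSG:3857) and back, using only the standard formulas. It says nothing about how Unlock performs the conversion or which systems it supports. The function names are invented for the example.

```python
import math

# WGS84 semi-major axis in metres, used as the sphere radius by Web Mercator.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse projection: Web Mercator metres back to longitude/latitude in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


# Approximate position of Edinburgh, for illustration only.
x, y = wgs84_to_web_mercator(-3.19, 55.95)
lon, lat = web_mercator_to_wgs84(x, y)
print(x, y, lon, lat)
```

Converting to or from the British National Grid used by Ordnance Survey data is more involved, because it needs a datum transformation as well as a projection. In practice that is usually left to a library such as pyproj.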

Having Boundary-Line shapes in our open data gazetteer will mean we can return bounding boxes or polygons through Unlock Text, which extracts placenames from documents and metadata. This will help to open up new research directions for our work with the Language Technology Group at Informatics in Edinburgh.

There are some organisations we’d love to collaborate with (almost next door, the Map Library at the National Library of Scotland and the Royal Commission on Ancient and Historical Monuments of Scotland) but have been unable to, because Unlock and its predecessor GeoCrossWalk were limited by license to academic use only. I look forward to seeing all the things the OS Open Data release has now made possible.

I’m also excited to see what re-use we and others could make of the Linked Data published by Ordnance Survey Research, and what their approach will be to connecting shapes to their administrative model.

MasterMap, the highest-detail OS dataset, wasn’t included in the open release. Academic subscribers to the Digimap Ordnance Survey Collection get access to places extracted from MasterMap, and improvements to other datasets created using MasterMap, with an Unlock Places API key.