Thoughts on Unlocking Historical Directories


Last week I talked with Evelyn Cornell, of the Historical Directories project at the University of Leicester. The directories are mostly local, trade-focused listings that pre-date telephone directories. Early ones were commercial ventures; later ones were often produced with the involvement of public records offices and postal services. The ones digitised at the library in Leicester cover England and Wales from 1750 to 1919.

This is a rich resource for historical social analysis, with lots of detail about places and what happened in them. On the surface, the directories have obvious research value for genealogy and local history. Below the surface, waiting to be mined, is location data that could serve social science and economics research and enrich other archives.

Evelyn is investigating ways to link the directories with other resources, or to make them findable by location search, so that more people can re-use them.

How can the Unlock services help realise the potential in the Historical Directories? And will Linked Data help? There are two strands here – looking at the directories as data collections, and looking at the data implicit in the collections.

Let’s get a bit technical, over the fold.

Geo-references for the directories

Right now, each directory is annotated with placenames: the names of one or more counties containing places covered by the directory. Headings or sub-sections in the document may also contain placenames. (See, for example, the sample record for a directory covering Bedfordshire.)

As well as a name, the directories could have a link identifying a place. For example, the geonames Linked Data URL for Bedfordshire. The link can be followed to get approximate coordinates for use on a map display. This provides an easy way to connect with other resources that use the same link.
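To make "follow the link to get coordinates" concrete, here is a minimal sketch of the second half of that step: pulling latitude and longitude out of an RDF/XML place description. It assumes the description uses the W3C WGS84 Basic Geo properties (`wgs84_pos:lat`, `wgs84_pos:long`), as GeoNames' RDF descriptions typically do. The sample document, its feature ID and its coordinates are placeholders, not the real GeoNames record for Bedfordshire, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C WGS84 Basic Geo vocabulary (wgs84_pos:lat / wgs84_pos:long).
WGS84 = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def coordinates_from_rdf(rdf_xml: str) -> tuple[float, float]:
    """Return (latitude, longitude) from an RDF/XML description of a place."""
    root = ET.fromstring(rdf_xml)
    lat = root.find(f".//{{{WGS84}}}lat")
    lon = root.find(f".//{{{WGS84}}}long")
    if lat is None or lon is None:
        raise ValueError("no wgs84_pos:lat / wgs84_pos:long found")
    return float(lat.text), float(lon.text)


# Illustrative document only: the feature ID and coordinates are placeholders,
# not the real GeoNames record for Bedfordshire.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:gn="http://www.geonames.org/ontology#"
         xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <gn:Feature rdf:about="http://sws.geonames.org/0000000/">
    <gn:name>Bedfordshire</gn:name>
    <wgs84_pos:lat>52.0</wgs84_pos:lat>
    <wgs84_pos:long>-0.5</wgs84_pos:long>
  </gn:Feature>
</rdf:RDF>"""

print(coordinates_from_rdf(SAMPLE))  # (52.0, -0.5)
```

In practice the RDF would first be fetched from the place's link (for instance with `urllib.request.urlopen`); that network step is left out here so the sketch runs offline.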

The directory records would also benefit from simpler, re-usable links. Right now they have complex URLs of the form lookup.asp?[lots of parameters]. To encourage re-use, it's worth composing cleaner links, more like /directory/1951/kellys_trade/. This could also help with search engine indexing, making the directories easier to find via Google. There are some Cabinet Office guidelines on URIs for the Public Sector that could be useful here.
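A minimal sketch of composing and parsing clean paths in the /directory/{year}/{slug}/ shape above. The function names and the slug rules (lower case, ASCII only, apostrophes dropped, words joined by underscores) are assumptions for illustration; mapping the existing lookup.asp parameters onto these paths would still need a lookup table and redirects on the server, which isn't shown.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lower-case, strip accents and punctuation, join words with underscores."""
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Drop apostrophes so "Kelly's" becomes "kellys" rather than "kelly_s".
    ascii_text = ascii_text.replace("'", "")
    return "_".join(re.findall(r"[a-z0-9]+", ascii_text.lower()))


def directory_path(year: int, title: str) -> str:
    """Build a clean, hypothetical path such as /directory/1951/kellys_trade/."""
    return f"/directory/{year}/{slugify(title)}/"


def parse_directory_path(path: str) -> tuple[int, str]:
    """Reverse of directory_path: recover (year, slug) from a clean path."""
    match = re.fullmatch(r"/directory/(\d{4})/([a-z0-9_]+)/", path)
    if match is None:
        raise ValueError(f"not a directory path: {path!r}")
    return int(match.group(1)), match.group(2)


print(directory_path(1951, "Kelly's Trade"))  # /directory/1951/kellys_trade/
```

Keeping the year as its own path segment means a reader (or a crawler) can trim the URL back to /directory/1951/ and reasonably expect a listing for that year.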

Linked Data for the directories

Consider making each ‘fact file’ of metadata for a given directory available in a machine-readable form, using common Dublin Core elements where possible. This could be embedded in the page, using a standard like RDFa, or published at a separate URL as an XML document describing and linking to the record.
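A minimal sketch of the "separate XML document" option, using Python's standard ElementTree and the Dublin Core element namespace. The `record` wrapper element, the function name and the field values are hypothetical placeholders, not the project's actual schema; the point is only that each fact-file field maps onto one of the 15 Dublin Core elements, and a place can be given both as a name and as a gazetteer link in `dc:coverage`.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialise a directory 'fact file' as XML using Dublin Core element names."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element, values in fields.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values, not a real catalogue record.
print(dublin_core_record({
    "title": ["Example trade directory of Bedfordshire"],
    "coverage": ["Bedfordshire", "http://sws.geonames.org/0000000/"],
    "format": ["application/pdf"],
}))
```

The same field-to-element mapping could be reused for the RDFa route, emitting the values as attributes in the existing HTML page instead of a standalone document.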

Consider a service like RCAHMS’ Scotland’s Places, which, when you visit a location page, brings together related items from the catalogues of several public records bodies in Scotland. Behind the scenes, the different archives are “cross-searched” via a web API, with records available in XML.
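A minimal sketch of the merging half of a cross-search: several archives each return XML, and the results for one place are combined into a single list. The response format (`<results><item><title/><place/></item></results>`), the archive names and the records are all invented for illustration; Scotland's Places' real APIs will differ, and a live service would send the place query to each API rather than filter pre-fetched responses.

```python
import xml.etree.ElementTree as ET


def parse_results(source: str, xml_text: str) -> list[dict[str, str]]:
    """Turn one archive's (hypothetical) XML response into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "place": item.findtext("place", default=""),
        }
        for item in root.iter("item")
    ]


def cross_search(responses: dict[str, str], place: str) -> list[dict[str, str]]:
    """Merge results from several archives, keeping records for `place`."""
    merged = []
    for source, xml_text in responses.items():
        merged.extend(
            record for record in parse_results(source, xml_text)
            if record["place"].casefold() == place.casefold()
        )
    # Stable, predictable ordering for display: by archive, then by title.
    return sorted(merged, key=lambda r: (r["source"], r["title"]))


# Invented sample responses from two hypothetical archives.
archive_a = ("<results><item><title>Example record A</title><place>Leicester</place></item>"
             "<item><title>Example record B</title><place>Bedford</place></item></results>")
archive_b = "<results><item><title>Example record C</title><place>leicester</place></item></results>"

for record in cross_search({"archive_a": archive_a, "archive_b": archive_b}, "Leicester"):
    print(record["source"], record["title"])
```

Matching on a plain placename is the weak point here; matching on a shared gazetteer link instead would avoid confusing places that share a name.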

Mining the directories

The publications on the Historical Directories site are in PDF format. OCR has been run over the scans, but the resulting text isn't published on the site; it is used internally for full-text search. (Note, though, that the transcripts, along with the scans, are available for download from the UK Data Archive.) The full-text search on the Historical Directories site works really well, highlighting matched words in the PDF results.
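For readers new to this kind of search, a minimal sketch of the idea behind it: an inverted index from words to the pages of OCR text they occur on, plus highlighting of matched words in the results. This is a generic illustration, not how the Historical Directories site actually implements its search; the page texts and function names are made up, and matches are bracketed rather than highlighted in a PDF.

```python
import re
from collections import defaultdict


def build_index(pages: list[str]) -> dict[str, set[int]]:
    """Map each lower-cased word to the set of page numbers (1-based) it occurs on."""
    index: dict[str, set[int]] = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_no)
    return index


def search(index: dict[str, set[int]], pages: list[str], query: str) -> list[tuple[int, str]]:
    """Return (page number, text with [matches] bracketed) for pages containing every query word."""
    terms = list(dict.fromkeys(re.findall(r"\w+", query.lower())))  # dedupe, keep order
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    results = []
    for page_no in sorted(hits):
        text = pages[page_no - 1]
        for t in terms:
            text = re.sub(rf"\b({re.escape(t)})\b", r"[\1]", text, flags=re.IGNORECASE)
        results.append((page_no, text))
    return results


# Made-up OCR text for three pages.
pages = [
    "John Smith, baker, High Street",
    "Mary Jones, draper, Market Place",
    "William Baker, smith, Church Lane",
]
index = build_index(pages)
for page_no, text in search(index, pages, "baker"):
    print(page_no, text)
```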

But the gold in a text-mining effort like this lies in the locations of the individual records themselves: the listings connected to street addresses and buildings. This kind of material is perfect for rapid demographic analysis. The Visualising Urban Geographies project between the National Library of Scotland and the University of Edinburgh is moving in this direction, automatically geo-coding addresses to “good enough” accuracy. Stuart Nicol has made some great teaching tools using search engine geocoders embedded in a Google Spreadsheet.
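One way to read “good enough” accuracy is as a fallback chain: try the full address, then the street without its house number, then just the town, and record which level matched. The sketch below does that against a small local gazetteer dictionary. This swaps in a simple lookup table for the search engine geocoders mentioned above; the gazetteer, its keys and its coordinates are illustrative assumptions, not surveyed data.

```python
import re
from typing import Optional

Coord = tuple[float, float]


def normalise(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def geocode(address: str, town: str, gazetteer: dict[str, Coord]) -> Optional[tuple[Coord, str]]:
    """Return (coordinates, precision) for the most precise match, or None."""
    full = normalise(address)
    street = re.sub(r"^\d+[a-z]?\s+", "", full)  # drop a leading house number
    place = normalise(town)
    attempts = [
        (f"{full} {place}", "address"),
        (f"{street} {place}", "street"),
        (place, "town"),
    ]
    for key, precision in attempts:
        if key in gazetteer:
            return gazetteer[key], precision
    return None


# Hypothetical gazetteer with illustrative coordinates.
gazetteer = {
    "high street leicester": (52.64, -1.13),
    "leicester": (52.63, -1.13),
}

print(geocode("12 High Street", "Leicester", gazetteer))  # ((52.64, -1.13), 'street')
print(geocode("3 Unknown Lane", "Leicester", gazetteer))  # ((52.63, -1.13), 'town')
```

Returning the precision alongside the point lets an analysis decide which records are accurate enough for a given question, for example street level for a demographic map and town level for a county-wide count.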

But this demands a big transition, from “raw” digitised text to structured tabular data. As Rich Gibson would say about Planet Earth, “It’s not even regularly irregular”, and this transition can’t currently be automated successfully.
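To show why the transition is hard, here is a minimal sketch that turns listing lines into rows with a regular expression and sets aside everything it can't parse for manual review. The pattern assumes one hypothetical listing shape (“Surname Forename, trade, number street”); real directories vary by publisher, year and section, which is exactly why the review pile tends to grow. The sample lines are invented.

```python
import re

# Hypothetical pattern for one common listing shape:
#   "Smith John, baker, 12 High Street"
LISTING = re.compile(
    r"^(?P<name>[^,]+),\s*(?P<trade>[^,]+),\s*(?P<address>\d+[a-z]?\s+.+)$"
)


def to_rows(lines: list[str]) -> tuple[list[dict[str, str]], list[str]]:
    """Split OCR lines into parsed rows and lines needing manual review."""
    rows, needs_review = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = LISTING.match(line)
        if match:
            rows.append({key: value.strip() for key, value in match.groupdict().items()})
        else:
            needs_review.append(line)
    return rows, needs_review


sample = [
    "Smith John, baker, 12 High Street",
    "Jones Mary, draper, 4 Market Place",
    "Brown & Co. wine merchants (see advt.) Gallowtree Gate",
    "",
]
rows, needs_review = to_rows(sample)
print(rows)
print(needs_review)
```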

Meanwhile, some of the directories have more narrative, descriptive text, interleaved with tabular data on population, trade and livestock. This material reminds me of the Statistical Accounts of Scotland.

For this kind of data there may be a useful yield from the Unlock Text geoparsing service, which extracts placenames and provides gazetteer links for them. Places mentioned in a directory will usually be clustered together geographically, so the geoparser’s techniques for ranking suggested locations and picking the most likely one should work well.
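A minimal sketch of why clustering helps with ambiguous names: for each placename, pick the candidate location with the smallest total distance to the nearest candidate of every other placename in the same text. This is a simple stand-in heuristic, not the Unlock Text geoparser's actual ranking method; the candidate lists and their approximate coordinates are illustrative.

```python
import math

Coord = tuple[float, float]


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(candidates: dict[str, dict[str, Coord]]) -> dict[str, str]:
    """For each placename, choose the candidate closest to the other names' candidates."""
    choice = {}
    for name, options in candidates.items():
        others = [opts for other, opts in candidates.items() if other != name]

        def score(coord: Coord) -> float:
            # Sum of distances to the nearest candidate of each other placename.
            return sum(min(haversine_km(coord, c) for c in opts.values()) for opts in others)

        choice[name] = min(options, key=lambda label: score(options[label]))
    return choice


# Approximate, illustrative coordinates for ambiguous placenames.
candidates = {
    "Newport": {"Newport, Shropshire": (52.77, -2.38), "Newport, Monmouthshire": (51.58, -2.99)},
    "Wellington": {"Wellington, Shropshire": (52.70, -2.52), "Wellington, Somerset": (50.98, -3.22)},
    "Shifnal": {"Shifnal, Shropshire": (52.67, -2.37)},
}
print(resolve(candidates))
```

Here the unambiguous Shifnal pulls both Newport and Wellington towards their Shropshire readings, which is the effect a directory's geographic clustering should produce.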

This only skims the surface of what could be done with historical directories, and I would really like to hear about other related efforts.


One Response to Thoughts on Unlocking Historical Directories

  1. To follow up from Jo’s post, I wanted to flag up a project that EDINA are currently preparing as a bid for funding.

    The idea would be to combine materials created as part of several existing digitisation projects (led by the National Library of Scotland) and to build on those with lots of active community participation. Initially this would be focused on the Edinburgh area, but the hope is to build a technical solution and sustainability model that could be extended to other parts of Scotland and other areas of the UK.

    The NLS have created high quality digital scans of historic Edinburgh maps which have been geocoded so that they can be combined, compared and used for mapping specific features, notes etc. in intuitive web-based mapping interfaces. These maps are also included in the Visualising Urban Geographies Project Jo mentions above.

    There is also an ongoing project to digitise a large number of (annual) Edinburgh Post Office Directories printed between the 1790s and the 1930s. Like the Leicester Directories that Jo has talked about these are rich resources with names, addresses and additional business listings and listings of duty and tax rates, Law, Church, Education and Masonic directories, detailed local ads, standard rates for Hackney carriages, etc.

    We are thinking about creating a tool that would show you a historic map and a set of names and addresses, and would let you easily place this directory information on the appropriate spot on the map. You could use the mapped information to build custom maps for a social history research project, a family history site, etc. We are also hoping to provide an API so that other projects can use this user-generated content. Since there is roughly one directory every one to two years, we think this could be an incredibly interesting tool and could build into a very rich resource for discovering local history and genealogical information.
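    One way such an API might expose user-placed directory entries is as GeoJSON, which most web mapping libraries can load directly. The minimal sketch below assumes a hypothetical `Placement` record (entry name, address, directory year, contributor and the point where the user placed it); the field names are illustrative, not EDINA's design. Note that GeoJSON (RFC 7946) orders coordinates as longitude, then latitude.

```python
import json
from dataclasses import dataclass


@dataclass
class Placement:
    """One user's placement of a directory entry on a historic map (hypothetical schema)."""
    name: str
    address: str
    directory_year: int
    lat: float
    lon: float
    contributor: str


def to_geojson(placements: list[Placement]) -> str:
    """Serialise placements as a GeoJSON FeatureCollection of points."""
    features = [
        {
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [p.lon, p.lat]},
            "properties": {
                "name": p.name,
                "address": p.address,
                "directory_year": p.directory_year,
                "contributor": p.contributor,
            },
        }
        for p in placements
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


# Invented example entry.
print(to_geojson([Placement("Example Person", "1 Example Street", 1850, 55.95, -3.19, "volunteer1")]))
```

    Keeping the directory year as a property would let a client filter one year's entries, or compare the same address across successive directories.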

    It could be an incredibly exciting project, but we are keen to gauge interest and hear any comments or views, particularly if you would find such a resource useful. We’d especially love to hear whether you would participate by mapping addresses, what kind of projects you might use this tool for (if any; you might be happy just to contribute to the mapping), and whether there are any particular groups we should speak to who could help us reach members of the community interested in contributing, if the project is funded and goes ahead. If you have any comments, please either post a response here or email me: Nicola.osborne@ed.ac.uk.

    Thank you for reading this, and thank you to Jo for letting me add this call for help here.

    – Nicola Osborne, Social Media Officer for EDINA.
