Linked Data, JISC and Access

January 8, 2010

With 2010 hindsight, I can smile at statements like:

“The Semantic Web can provide an underlying framework to allow the deployment of service architecture to support virtual organisations. This concept is now sometimes given the description the Semantic Grid.”

But that’s how it looked in the 2005 JISC report on “semantic web technologies”, which Paul Miller reviews at the start of his draft report on Linked Data Horizons.

I appreciate the new focus on fundamental raw data, the “core set of widely used identifiers” which connect topic areas and enable more of JISC’s existing investments to be linked up and re-used. JACS codes for undergraduate courses, or ISSNs for academic journals – simple things that can be made quickly and cheaply available in RDF, for open re-use.
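To give a sense of how little is involved, here is a minimal sketch that turns an ISSN into a couple of RDF statements in N-Triples form. The `http://example.org/issn/` base URI is a placeholder rather than a real service, and pairing the Bibliographic Ontology's `issn` property with Dublin Core's `title` is just one plausible choice of vocabulary, not anything the report prescribes.

```python
# Sketch only: BASE is a hypothetical namespace, not a real identifier service.
BASE = "http://example.org/issn/"
BIBO_ISSN = "http://purl.org/ontology/bibo/issn"
DCT_TITLE = "http://purl.org/dc/terms/title"


def issn_is_valid(issn: str) -> bool:
    """Check the ISSN check digit (weights 8 down to 2, modulo 11, 'X' = 10)."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal (escapes backslashes and quotes only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def journal_triples(issn: str, title: str) -> list[str]:
    """Return N-Triples lines describing one journal, keyed by its ISSN."""
    if not issn_is_valid(issn):
        raise ValueError(f"not a valid ISSN: {issn!r}")
    subject = f"<{BASE}{issn}>"
    return [
        f"{subject} <{BIBO_ISSN}> {literal(issn)} .",
        f"{subject} <{DCT_TITLE}> {literal(title)} .",
    ]


if __name__ == "__main__":
    for line in journal_triples("0028-0836", "Nature"):
        print(line)
```

The check-digit test is there because an identifier set is only useful as a linking hub if the identifiers in it are well-formed.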

It was a while after I read Paul’s draft before I clocked what is missing – a consideration of how Access Management schemes will affect the use of Linked Data in academic publishing.

Many JISC services require a user to prove their academic credentials; so do commercial publishers, public sector archives – the list is long, and growing.

URLs may have user or session identifiers in them, and accessing a URL may involve a web-browser-dependent Shibboleth login process that touches multiple sites.

Publishers support the UK Federation, and sell subscriptions to institutions. On their public sites, one can see summaries, abstracts and thumbnails, but to get at the data itself, one has to be attached to an institution that pays a subscription and is part of the Federation.

Sites can publish Linked Data in RDF about their data resources. But if publishers want their data to be linked and indexed, they have to make two URLs for each bit of content: one public, one protected. Some data services are obliged to stay entirely Shibboleth-protected for licensing reasons, because the data available there is derived from other work that is licensed for academic use only.

EDINA’s ShareGeo service has this problem – its RSS feed of new data sets published by users is public, but to look at the items in it, one has to log in to Digimap through the UK Federation.

Unfortunately this breaks with one of the four Linked Data Principles – “When someone looks up a URI, provide useful information, using the standards”.
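What “useful information” means to a crawler can be sketched roughly as follows. This assumes a client that asks for RDF via content negotiation and then judges the final response; the function names and the short list of media types are illustrative choices, not part of any standard API. A lookup that ends on an HTML login page, as happens after a redirect to a federation sign-in, fails the test even when the status code is 200.

```python
from urllib.request import Request

# Illustrative subset of RDF serialisations a Linked Data client might accept.
RDF_TYPES = ("application/rdf+xml", "text/turtle", "application/n-triples")


def build_lookup_request(uri: str) -> Request:
    """Prepare a GET request that asks for RDF rather than HTML."""
    return Request(uri, headers={"Accept": ", ".join(RDF_TYPES)})


def lookup_is_useful(status: int, content_type: str) -> bool:
    """True if the final response looks like machine-readable RDF.

    A login page served as text/html, or a 401/403, counts as not useful.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return status == 200 and media_type in RDF_TYPES
```

For example, `lookup_is_useful(200, "text/html")` is false, which is the situation a crawler ends up in when an item URL leads to a login form instead of data.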

Outwith the access barrier, non-commercial terms of use for scholarly resources don’t complement a Linked Data approach well. For example, OCLC’s WorldCat bibliography search forbids “automated information-gathering devices”, which would catch a crawler/indexer looking for RDF. As Paul tactfully puts it:

“To permit effective and widespread reuse, data must be explicitly licensed in ways that encourage third party engagement.”