Friday, May 30, 2008

Stable URIs

Since we published the "Creeps paper", Ben and I have received several queries asking us what we meant when we said "National Center for Biotechnology Information's (NCBI) current use of stable URLs for identifying much of its data". Unfortunately, the reference we gave in the manuscript wasn't very useful - it takes you to the NCBI homepage! Our apologies for that.

So, for anyone who hasn't discovered this on their own, NCBI have made much of their data (and even queries) available through "stable" URIs, as discussed here. Unfortunately, NCBI's stable links send you to an HTML representation of the record, including all of the visual paraphernalia on the page, so they aren't particularly suited to automated retrieval/scraping. (Does anyone know if I'm missing something? Is there a 'switch' I can throw in the URL that gives the record back to me as a raw flatfile, as XML, or as RDF?) In any case, that's what we were referring to. There is no indication on NCBI's page as to what they mean by "stable", so I can't comment on the actual stability of these URLs, but at least they're there and can be used in semantic-webby applications to refer to NCBI resources in a predictable and "clean" way.

Since we're on the topic, I'd be remiss if I didn't bring the UniProt efforts into the discussion, since these are (IMO) a model of how we should be approaching the task. The reference for the UniProt work is here (if anyone knows a better, more appropriate reference, please let me know). The UniProt data is available in a variety of formats from predictable URLs: one URL provides the RDF version of a record, others provide the HTML version, and another provides an XML version. All nice and predictable.
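
To make the "predictable URLs" idea concrete, here is a rough Python sketch of how a client might derive the per-format URLs for a record from nothing but its accession. Everything specific in it is an assumption for illustration: the base URL is a placeholder, the "accession plus format suffix" pattern is only one plausible convention, and the function names are made up. Check the resource's own documentation for the real pattern.

```python
from urllib.parse import quote

# Placeholder base URL (an assumption); substitute the resource's real prefix.
BASE = "https://example.org/uniprot"

# The three formats mentioned in the post. Selecting them with a file-style
# suffix is an assumed convention, not something confirmed here.
FORMATS = ("rdf", "html", "xml")


def record_url(accession: str, fmt: str, base: str = BASE) -> str:
    """Build the URL for one record in one format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Percent-encode the accession so unusual characters cannot break the path.
    return f"{base.rstrip('/')}/{quote(accession, safe='')}.{fmt}"


def all_urls(accession: str, base: str = BASE) -> dict[str, str]:
    """Map each known format to its URL for the given accession."""
    return {fmt: record_url(accession, fmt, base) for fmt in FORMATS}
```

The point of the design is that a client never has to scrape a page to find the machine-readable version: given an identifier, every representation is one string operation away.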

Anyway, all I wanted to do was put this out on my blog in case anyone is looking for the references and happens to stumble across this page.

Cheers all!