Friday, June 29, 2007

The argument for LSIDs

I posted this as a comment on Ben's blog post about LSIDs, but I want to re-post it here. It sounds like the working group is planning to contact me and Carole to discuss our use of LSIDs, so I might as well make my arguments more visible and explicit. Cartik Kothari, a post-doc in my lab, has also waded into the fray.

Here was my response to Ben's assertion that LSIDs should be abandoned:

I agree with only a part of what you [Ben] say, but think you aren't being ambitious enough. What we should be pushing for is that the LSID spec (or something very very similar to it) is re-branded and ADOPTED BY THE W3C!!

What worries me about NOT adopting a new identifier system as we move into the Semantic Web is that we start to hack and kludge our way to full functionality by adding novel behaviours on top of URLs, or start putting the "intelligence" of where to find data/metadata into redirects, PURLs, or other nasty, centralized, and IMO unsustainable architectures.

LSIDs solve a very distinct set of problems: separation of identity from location; separation of data from metadata; and multiple end-points/protocols for both data and metadata retrieval. As far as I can tell, NONE of the solutions that have been proposed in the discussions within the HCLS community have come close to addressing these three issues in anywhere near as elegant a way as the LSID spec does, and some of the proposals have been a bit worrisome (e.g. "just add a ? to the end of your URL if you want metadata"... where is THAT in the HTTP spec??). Even more odd, to me, is that all of this contorting and hand-wringing is only because people want to be able to stick a URI in their browser and see something at the end of it. Frankly, I just don't see the point of designing architectures around browsers! (I quite liked Cartik's argument that, in the heyday of AOL, you simply typed a keyword into your browser! **NOBODY** wants to type URLs (URIs) into their browser! Good Lord! The sooner we move the end-user away from the "guts" of the Web architecture the better!)
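To make the "identity, not location" point concrete, here is a minimal sketch of pulling an LSID apart. It assumes the usual URN layout `urn:lsid:authority:namespace:object` with an optional trailing revision; the `LSID` tuple, the `parse_lsid` helper, and the example identifier are all made up for illustration and are not taken from any LSID library.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The named parts of an LSID. None of them is a network location."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into its parts, or raise ValueError if it is malformed."""
    parts = text.split(":")
    # Expect urn:lsid:authority:namespace:object, optionally followed by :revision.
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
lsid = parse_lsid("urn:lsid:example.org:serviceinstance:getSequence:2")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```

Nothing in the parsed result says which host, port, or protocol serves the data or the metadata; that is left to whatever the authority reports when it is asked.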

One of the keynote talks at the WWW2007 meeting was from a Microsoft fellow (can't remember his name) who reminded us that, within the next 10 years, the interfaces into the Web will become ubiquitous in our lives. "The Browser" is going the way of the Dodo! Why are we so concerned about designing next-generation architectures around last-generation interfaces?

In the BioMoby project we use LSIDs extensively (and by the way, I have almost never found the need to plug one of them into my browser...). Here's one of the uses we have for them:

A Web Service is identified by an LSID. The Moby Central registry knows certain things about that service (its inputs, its outputs, its semantic type, its authorship), and through an hourly "ping" it knows whether that service is visible/available or not. This information is available as getMetadata from Moby Central. In addition, however, the service provider knows things about their own service. They know what example inputs and outputs might be, they know system maintenance schedules, etc. All of these things can be provided as getMetadata from the service provider. As a consumer, I want to know about a service, so I go to the LSID authority and ask "where can I get information about this service?" The authority says "you can go here (Moby) and here (provider)". I do so, and I can combine the knowledge both resources have about that service. THIS IS ALL PART OF THE LSID SPEC! No hacks, no kludges, no new consensus was required within the community.
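The flow just described (ask the authority where metadata lives, visit each place, combine the answers) can be sketched roughly as follows. This is an in-memory toy, not the real LSID resolution protocol: the two endpoint functions, the `AUTHORITY` table, `find_metadata_endpoints`, `describe`, and the example LSID and metadata values are all hypothetical stand-ins, and plain dictionaries stand in for whatever metadata format a real resolver would return.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]
Endpoint = Callable[[str], Metadata]

# Hypothetical service identifier, for illustration only.
SERVICE_LSID = "urn:lsid:example.org:serviceinstance:getSequence"


def registry_get_metadata(lsid: str) -> Metadata:
    """Stand-in for the registry: inputs, outputs, and the result of its ping."""
    return {"inputs": ["SequenceID"], "outputs": ["DNASequence"], "alive": True}


def provider_get_metadata(lsid: str) -> Metadata:
    """Stand-in for the provider: example input and maintenance schedule."""
    return {"example_input": "ABC123", "maintenance": "Sundays 02:00 UTC"}


# Stand-in for the authority: it maps an identifier to every place
# that can answer a metadata request about it.
AUTHORITY: Dict[str, List[Endpoint]] = {
    SERVICE_LSID: [registry_get_metadata, provider_get_metadata],
}


def find_metadata_endpoints(lsid: str) -> List[Endpoint]:
    """Ask the authority where metadata about this LSID can be found."""
    return AUTHORITY.get(lsid, [])


def describe(lsid: str) -> Metadata:
    """Visit every endpoint the authority lists and merge what each one knows."""
    combined: Metadata = {}
    for endpoint in find_metadata_endpoints(lsid):
        combined.update(endpoint(lsid))
    return combined


print(describe(SERVICE_LSID))
```

The consumer never hard-codes where the registry or the provider lives; it only holds the identifier, and the list of places to ask comes back from the authority at lookup time.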

I don't know about you, but as for me and my family, we are going to continue using LSIDs until someone comes up with a BETTER alternative!

1 comment:

Benjamin Good said...

Just for interest, there is a nice comparison of the LSID, DOI, Handle, and URL-based object identification strategies in this entry on the Biodiversity Information Standards (TDWG) wiki.

I want to clarify something. You say "... people want to be able to stick a URI in their browser and see something at the end of it. I just don't see the point of designing architectures around browsers!"

This clouds the issue. It's not that people are designing architectures around browsers, it's that they are designing architectures around messaging standards, namely HTTP. My issues with LSID-based dead ends were not discovered in my browser window; they were discovered in a GET call from a client I wrote myself.

As I said before, I (and I suspect other client-side developers, such as those who write Web browsers) would be happy to have my agent automatically utilize getData or getMetadata calls on the LSIDs it discovers, but until the infrastructure necessary for dependable discovery of (working) resolver services comes into being, this is essentially impossible without manual intervention (which defeats the whole point).

The development of such an infrastructure seems to me to be the role of the W3C, not of the bioinformatics community. To succeed at what the LSID initiative was trying valiantly to accomplish, we need buy-in from domains outside of life science. Since there is absolutely nothing in the LSID spec that has anything to do with life science, perhaps a start would be a simplification of the name and thus a clarification of the idea and its many possible applications.