Friday, June 29, 2007

re-post: Relational databases on the Semantic Web

This is a re-post of a rant from my old blog:

Greetings from WWW 2007!

I've been "hit" several times in the past couple of weeks with a recurring idea that seems to be gaining momentum with a wide variety of groups - the idea of exposing "traditional" relational databases through an OWL-mapping layer. To name just a few, we have:

Bio2RDF, DartGrid, ComparaGrid, and an offering from SMI.

Now, don't get me wrong! I am not criticizing any of these projects in any way, and am in fact extremely excited about their successes! But it does make me wonder...

The Semantic Web, IMO, is something more than just the exposure of relational databases on the Web (even with the hidden semantics of their relational model made fully explicit and exposed). I would argue that, because we have never had the ability to express the kinds of semantics that we can express with OWL, we have never captured the kinds of semantically rich data that we are going to want when the Semantic Web is finally established. In my own experience leading the SIRS DB project (a component of the CardioSHARE project, where we are attempting to build an RDF/OWL datastore that truly behaves in the way we envision the Semantic Web could behave), I have noticed that we are collecting far more data in this semantic database than we would ever have attempted to store in a more traditional RDB... simply because the pain of building a relational model to hold this extra data is somewhat higher than that of sticking a few extra triples into a triple store.

I understand that, from the perspective of the W3C, OWL isn't a necessary part of the Semantic Web (and I'm not entirely convinced that OWL will survive in the long term either!); however, I do think that, if the SW is going to live up to its promises... or more importantly, not disappoint the funding agencies so badly that they cut their investments after we have built up their expectations... we are going to have to do more than just expose our databases on the Web in RDF.

As Eric Neumann argued when I asked this question of the HCLS Workshop panel yesterday, this is a necessary first step, and I agree with him that it might succeed in bootstrapping a somewhat lackluster (IMO) start to the entire SWHCLS enterprise... but I hope that we aren't thinking of it as anything more than the low-hanging fruit. I fear that, if we don't take the next step and start focusing on data/metadata capture and modelling in a "true" SW manner, and encouraging others to do so by example, we may unnecessarily delay our achievement of the high expectations that we, and our funding agencies, have for the Semantic Web in Health Care and Life Sciences.
