Semantic Musings: February 2010

I just read THIS paper in Nature Biotechnology, concerning the manipulation of GO to enhance its utility for automated high-throughput annotation. I found it quite... amusing...?? ...disturbing...??

It speaks to what we have been saying for years: one of the problems with GO is that the answer to the question 'why is this protein annotated into this node' is, and can only be, 'Because I Say So'. None of the classes has a class definition beyond a human-readable description of what the class term means. As a result, it's really hard to know whether the GO represents 'reality' of any sort (biological or otherwise). The fact that they can manipulate this 'reality' in order to maximize entropy strongly suggests to me that it doesn't represent 'reality' at all, but rather something different.
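To make the distinction concrete, here is a toy Python sketch of the difference between a class whose members are simply asserted and a class that carries a machine-checkable definition. Everything in it is an assumption made up for illustration: the protein IDs, the `features` field, the term name and the membership rule are all hypothetical, the rule is deliberately naive, and none of this is how GO or any real annotation pipeline works.

```python
# Toy illustration only: all IDs, features, and rules below are invented.

# Hypothetical protein records with some observable properties.
proteins = {
    "P1": {"features": {"feature_x"}},
    "P2": {"features": {"feature_y"}},
    "P3": {"features": {"feature_x", "feature_y"}},
}

# Asserted membership: a curator placed these proteins in the class.
# The data alone cannot say why P2 is here or why P3 is not.
asserted = {"toy term": {"P1", "P2"}}

# Defined membership: the class carries a rule that can be re-checked.
# The rule itself is a made-up, deliberately simplistic assumption.
defined = {"toy term": lambda props: "feature_x" in props["features"]}


def members_by_definition(term):
    """Return every protein whose properties satisfy the class rule."""
    rule = defined[term]
    return {pid for pid, props in proteins.items() if rule(props)}


def explain(term, pid):
    """Answer 'why is this protein in this class?' for both styles."""
    in_asserted = pid in asserted[term]
    in_defined = defined[term](proteins[pid])
    return {
        "asserted": "because I say so" if in_asserted else "not asserted",
        "defined": "satisfies the class rule" if in_defined else "fails the class rule",
    }
```

With a definition in hand, disagreements between the two views become visible: here `P2` is asserted into the class but fails the rule, while `P3` satisfies the rule and was never asserted. With assertion alone there is nothing to check against.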

That isn't to say that it isn't USEFUL!! There's no question at all about the utility and power of the GO!! Whether or not it represents 'reality' is irrelevant if that's not what it is designed to represent :-) (though I think the Barry Smiths of the world would be pretty choked if we were happily building ontologies that didn't attempt to represent reality LOL!)

Nevertheless, I am increasingly convinced that we (in the Semantic Web community) need an alternative gene ontology that somehow classifies something... 'definable'.


Thursday, February 18, 2010

No basis in "reality" from the get-GO?