Tuesday, November 22, 2011

Linked Data, Semantic Web, and Web 3.0

There are three "movements" in the community that seem to be synonymous in many discussions, but are becoming quite distinct in my own mind: Linked Data, the Semantic Web, and "Web 3.0". This is my current thinking about how these movements might be more precisely defined, and how the three differ.

Linked Data is a movement to get all data on the Web exposed as Triples. There isn't much attention paid to the entity-types or relationship-types used to achieve this - the goal is just to get the data (a) published, and (b) published in a standard computationally-consumable "model". This movement is being pushed by the W3C (in particular), and I can understand why! Like HTML, publishing data as Triples has a pretty low barrier to entry, and helps fulfill the objective of getting the maximum amount of buy-in from the global community. The football of data integration is kicked down the road, to be solved at a later date, just as technologies were invented after the fact to (try to) integrate HTML-formatted data.
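To make "Triples" concrete, here is what a single statement looks like in Turtle syntax (the URIs and the predicate below are invented purely for illustration; the Linked Data movement only asks that data be exposed in this subject-predicate-object form):

```turtle
# One triple: a subject, a predicate, and an object (hypothetical URIs)
<http://example.org/protein/P04626>
    <http://example.org/vocab/interactsWith>
    <http://example.org/protein/P00533> .
```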

The Semantic Web movement spends much more time thinking about what the entities ARE, and what the relationships between them ARE (and can be). So far, this movement is being spearheaded by a relatively small number of ontology consortia; the global thought-leaders who are defining these entities and relationships are quite visible (and influential!) on various mailing lists and blogs. This "concentration of power" in the hands of a few leaders is, I believe, the result of a much higher level of difficulty in deep and accurate semantic modeling. Frankly, getting semantics right is hard! I am certainly one of the masses in this regard - when I have a semantic representation problem, I generally defer to one of these thought-leaders to tell me how to do it properly (though amusingly, not all thought-leaders agree on "properly"...)

"Web 3.0", however, in my opinion, is (should be) a completely different animal! What distinguished Web 2.0 was that the content was user-generated. Individuals could publish their own opinions and thoughts and information through straightforward interfaces, and this data ended-up (often) being produced in a form that could easily be consumed by other interfaces, and "mashed-up" into enormously useful applications. So, given the analogous 3.0 moniker, I suggest that Web 3.0 should represent the combination of Semantic Web with Web 2.0 - a Web in which individuals are producing fragments of ontologies... INDIVIDUALIZED ontologies... which can be shared, compared, mashed-up, and utilized to interpret Linked Data. Rather than the ontology being the product of consortium group-think, it is the product of an individual... perhaps representing the opinion of only the individual who published it!

This vision of Web 3.0 is what my group is pushing for, and we're trying to build the interfaces that make it (a) easy for an individual to create these personalized knowledge-fragments, (b) easy for others to use these artifacts to interpret their data "through the eyes of another" in order to promote crucial scientific discourse and disagreement, and (c) easy for anyone to compare and contrast the ideas and understand the foundation for the differences between them (and then hopefully conceive and conduct experiments to evaluate the "truth" behind these differences!)

I realize that I am imposing definitions on existing words, but... I'm not satisfied with the current (loose) definitions of these words! So, this is how I define them, for myself, in order to keep them clear in my own head :-) Certainly, I think "the holy grail" is Web 3.0, and it's the goal that I am devoting my research career to achieving!

Thursday, February 18, 2010

No basis in "reality" from the get-GO?

I just read THIS paper in Nature Biotechnology, concerning the manipulation of GO to enhance its utility for automated high-throughput annotation. I found it quite... amusing...?? ...disturbing...??

It speaks to what we have been saying for years - that one of the problems with GO is that the answer to the question 'why is this protein annotated into this node' is, and can only be, 'Because I Say So'. There is no class-definition for any of the classes beyond a human readable description of what the class term means. As a result, it's really hard to know if the GO represents 'reality' of any sort (biological or otherwise), and the fact that they can manipulate this 'reality' in order to maximize entropy strongly suggests to me that it doesn't represent 'reality' at all, but rather represents something different.

That isn't to say that it isn't USEFUL!! There's no question at all about the utility and power of the GO!! Whether or not it represents 'reality' is irrelevant if that's not what it is designed to represent :-) (though I think the Barry Smiths of the world would be pretty choked if we were happily building ontologies that didn't attempt to represent reality LOL!)

Nevertheless, I am increasingly convinced that we (in the Semantic Web community) need an alternative gene ontology that somehow classifies something... 'definable'.

Tuesday, March 3, 2009

Connotea + ED + SWAN

Anyone who is following this blog will notice that I stopped blogging some time ago. I guess I'm old enough to be "old school", and I worry about putting half-baked ideas online because they become associated with my own reputation (such as it is! LOL!). Chances are, that isn't true! I am probably looking at Blogs far too formally, and assuming that my peers read blogs in the mindset that they read peer-reviewed papers... and while I know that is not true, it's still difficult for me to get over that perception.

Nevertheless! This evening I was having a conversation with Ben Good and he convinced me that I should put the content of that conversation up on my Blog so that he could run with it, create a killer-app, take-over the world, make a fortune, claim credit, and then acknowledge me in a footnote somewhere (He's gonna KILL me for saying it that way! LOL!)

So... here, verbatim, is the content of our email conversation... and I suspect that Ben and Eddie will, in a matter of days, create the said killer-app that brings these ideas to reality!

___________________________________________

Ben: What I see when I look hard is basically that the
tags themselves are not of very high value - especially where lots of
text is available. What is likely higher value is the availability of
a publicly accessible record of scientific attention - perhaps similar
to the Pubmed query logs except that its very easily accessible to the
public. This data tells us what people think is important. Its
reminding me a lot of things I read a million years ago in cognitive
science regarding human attention - we are generally very tightly
focused on small sections of the incoming data while we also process
the bits at the edges but to a much lower degree. I suspect that if
we could plot the amount of attention within, for example, the visual
field, we might see something very similar to the plots of posts per
paper (the power laws) from our data. That could make a very
interesting analogy if it worked out. Social tagging repositories as
the record of the hive mind's track of attention.


Mark: I don't disagree with you AT ALL! to be honest, the greatest benefit that *I* get from Connotea is the ability to follow the literature that is being read by people I respect/trust! i almost never use Connotea to re-discover stuff that I already have read, except in certain cases where I need to quote something, and then I find that Google finds that paper more easily than Connotea because I can remember enough of the sentence to discover it with a text search but not enough of the paper to remember what I would have tagged it with!! ...I suspect there's something to be learned in that... further to that thought... I think that SWAN does what Connotea desperately lacks! I use Connotea to learn what I need to know, based on what others in my "peer group" are reading. While Connotea has a "comments" field, it is almost never used... but that is the ONLY purpose of SWAN! What I want to know is (a) what are you reading, but more importantly (b) what are you THINKING! Connotea misses that mark... by a LONG shot... What has always bothered me about SWAN was that it is just another Silo {Tim, I don't mean that in a disparaging way! I just mean that it isn't clear how to link-in to SWAN from other tools that I already use. I LOVE SWAN!}. But just now (in the toilet!) I realized that what we need is an ED2Connotea2SWAN! We need to over-ride both the tagging interface AND the comment box, and connect the comment-box into the SWAN infrastructure. We should look into this, and if it isn't obvious how to do it, talk to Tim about the idea...
____________________________________________

So... Ben... go rule the world!!!

Friday, May 30, 2008

Stable URIs

Since we published the "Creeps paper", Ben and I have received several queries asking us what we meant when we said "National Center for Biotechnology Information's (NCBI) current use of stable URLs for identifying much of its data". Unfortunately, the reference we gave in the manuscript wasn't very useful - it takes you to the NCBI homepage! Our apologies for that.

So, for anyone who hasn't discovered this on their own, NCBI have made much of their data (and even queries) available through "stable" URIs, as discussed here: http://view.ncbi.nlm.nih.gov/. Unfortunately, the NCBI's stable links send you to an HTML representation of the record, including all of the various visual paraphernalia on the page, so it isn't particularly suited for automated retrieval/scraping (does anyone know if I'm missing something? Is there a 'switch' I can throw in the URL that gives it back to me as a raw flatfile or as XML or as RDF?). In any case, that's what we were referring to. There is no indication on NCBI's page as to what they mean by "stable", so I can't comment on the actual stability of these URLs, but at least they're there and can be used in semantic-webby type applications to refer in a predictable and "clean" way to NCBI resources.

Since we're on the topic, I'd be remiss if I didn't bring the UniProt efforts into the discussion, since these are (IMO) a model of how we should be approaching the task. The reference for the UniProt work is here: http://dev.isb-sib.ch/projects/uniprot-rdf/intro.html (if anyone knows a better, more appropriate reference, please let me know). The UniProt data is available in a variety of formats from predictable URLs. E.g., http://beta.uniprot.org/uniprot/P04626.rdf provides the RDF version of the record, while http://beta.uniprot.org/uniprot/P04626 or http://beta.uniprot.org/uniprot/P04626.html provides the HTML version of the record, and http://beta.uniprot.org/uniprot/P04626.xml provides an XML version of the record. All nice and predictable.
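Since those UniProt URLs follow such a predictable pattern, constructing them programmatically is trivial. Here is a minimal Perl sketch (the uniprot_url helper is my own invention, not part of any UniProt library; the base URL and format suffixes are taken from the examples above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a predictable UniProt record URL for a given accession and format.
# Pass 'rdf', 'xml', or 'html' for $format, or an empty string for the
# default (HTML) representation.
sub uniprot_url {
    my ($accession, $format) = @_;
    my $url = "http://beta.uniprot.org/uniprot/$accession";
    $url .= ".$format" if $format;
    return $url;
}

print uniprot_url('P04626', 'rdf'), "\n";  # the RDF version of the record
print uniprot_url('P04626', 'xml'), "\n";  # the XML version
print uniprot_url('P04626', ''),    "\n";  # the default (HTML) version
```

A fetch of the .rdf URL could then be handed straight to an RDF parser, which is exactly the kind of "clean", predictable referencing that makes these records usable in semantic-webby applications.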

Anyway, all I wanted to do was put this out on my blog in case anyone is looking for the references and happens to stumble across this page.

Cheers all!
Mark

Monday, January 7, 2008

Adventures with a Roomba 540

Okay, Okay, I know this isn't REALLY about semantics, but I just bought a Roomba 540 on the weekend and it has been such a great adventure learning how it "thinks" that I want to share my adventures with everyone!

I have a 650 sqft apartment. This should be a piece of cake for a little vacuum robot that is advertised as being capable of managing four rooms... let's see, shall we?

Day 1: I hit "start". The little robot makes a "the robot is happy!" noise and starts to explore my apartment. It first gets tangled in my television cable wire and spins itself around a few times, but is successful in getting itself untangled - this is as advertised, so well done! It then gets tangled in the speaker wires that are (to my chagrin and frustration) lying across my living room floor because my landlord won't let me cut the carpet to bury them. It, again, untangles itself perfectly and continues on its merry way. It seems that it really does understand what's happening, because even after getting twisted and self-rotated in these wires it still "anticipates" walls (if you have never seen these robots work, they clearly have a map of the room in their heads, because they will slow down as they approach a wall such that they nudge it ever-so-gently before turning around). But alas, my Ikea furniture was the death of this little robot... I have two Pella chairs (http://www.ikea.com/PIAimages/38296_PE130209_S4.jpg) from Ikea. The base of these chairs is, effectively, three wooden planks in a 3/4 square. The robot climbed half-way over the plank, and then was helplessly unbalanced and couldn't get either wheel on the floor to extract itself. It made the "I'm an unhappy little robot" noise, and died. ...Recharge...

Day 2: Same story. Same result. My little robot was helplessly dangling on the plank of an Ikea chair, so I lifted it up and moved it 6 inches so that it could get a grip, and then restarted it. It made the "happy little robot" sound and ran around cleaning my living room; however, somehow it was unable to find its base-station again for charging. It came oh so very close! Within about 6 inches, amusingly... Recharge...

Day 3: This time it seems that it decided to visit the bathroom before fully exploring the living room... and got hung-up on the 2'X3' shag shower mat... unhappy little robot.... Recharge...

Day 4: I decide to tip the Pella chairs over so that the planks are no longer on the floor. THIS time it happily navigates around the chairs and cleans about half of the living room before deciding to wander off and explore the corridor. The corridor leads to my bedroom. It wandered into my bedroom and had a wonderful time in there dashing around under the bed cleaning all sorts of things that I really don't want to know about... but it couldn't find its way out again! Every time it came close to the exit doorway, it hesitated (as if there were a wall there) and turned around and went back in. I watched it do this 7 times!! (my bedroom carpet is now cleaner than it has ever been!). Finally I decided to teach it a lesson, and I smacked its nose every time it turned the wrong way. After a good many smacks (it was absolutely DETERMINED not to go out of the exit!! it would forcefully turn around even when it was half-way out the door!!) I finally got it back into the corridor. Once in the corridor it was happy again, and even anticipated the walls! This surprised me, since I must have totally buggered-up its map by smacking it on the nose so often... nevertheless, it navigated its way back down the corridor and into the living room. It anticipated the overturned chairs, and then went on to re-program my digital cable box, which is sitting at ground level and has all of its buttons exposed to the little robot's bumble-bee-on-a-window bounces as it explores obstacles. I was about to give up hope when, suddenly, it started to move quite aggressively around a path and directly back to its charger!! It oriented itself and drove straight into its charger - perfectly aligned - and stopped for the day!

I suspect that my experience is very similar to what new parents feel when they have to child-proof their apartments... I now know that I have to overturn my chairs (oh, and I have to lift my computer table off of the ground, because it is just a half-centimeter higher than the edges of the robot, and when it hits the underside of the table at full-speed it becomes nicely jammed underneath!)

So I have a half-clean carpet in my living room, a completely clean carpet in my bedroom, and a robot that finally navigated my house without making any "unhappy robot" noises. I think I'm close to having a clean house every day!

Thursday, November 22, 2007

A more complex LSID client

This one merges metadata from multiple endpoints, and then parses it using RDF::Core.


use strict;
use warnings;

use LS::ID;
use LS::Locator;
use RDF::Core::Model::Parser;
use RDF::Core::Storage::Memory;
use RDF::Core::Model;
use RDF::Core::Resource;

# An in-memory triple store to accumulate metadata from all endpoints
my $storage = RDF::Core::Storage::Memory->new;
my $model   = RDF::Core::Model->new(Storage => $storage);

my $id = 'urn:lsid:biomoby.org:serviceinstance:bioinfo.icapture.ubc.ca,FASTA2HighestGenericSequenceObject:2006-04-12T18-27-15Z';

# the lines below convert the LSID into a URL since that's what BioMoby
# currently returns in its metadata... sorry!
my $subject_uri = 'http://biomoby.org/RESOURCES/MOBY-S/ServiceInstances#';
$id =~ /urn:lsid:biomoby.org:serviceinstance:([^:]+)/;
$subject_uri .= $1;

my $lsid      = LS::ID->new($id);
my $locator   = LS::Locator->new();
my $authority = $locator->resolveAuthority($lsid);
my $resource  = $authority->getResource($lsid);
my $locations = $resource->getMetadataLocations;

# Fetch the metadata from every location of every endpoint,
# merging it all into the single RDF model
foreach my $endpoint (keys %$locations) {
    foreach my $loc (@{ $locations->{$endpoint} }) {

        my $data;
        eval { $data = $resource->getMetadata(location => $loc); };
        next if $@;    # skip endpoints that fail to respond

        my $response_filehandle = $data->response;
        my $RDF = join "", <$response_filehandle>;

        my $parser = RDF::Core::Model::Parser->new(
            Model       => $model,
            Source      => $RDF,
            SourceType  => 'string',
            BaseURI     => "http://www.foo.com/",
            BNodePrefix => "genid",
        );
        $parser->parse;
    }
}

print "model contains ", $model->countStmts, " statements\n\n";

# Walk all statements about our service and print them,
# skipping blank-node objects
my $subject    = RDF::Core::Resource->new($subject_uri);
my $enumerator = $model->getStmts($subject);
my $statement  = $enumerator->getFirst;
while (defined $statement) {
    my $s = $statement->getSubject->getLocalValue;
    my $o = $statement->getObject;
    $o = $o->isLiteral ? $o->getValue : $o->getLocalValue;
    my $p = $statement->getPredicate->getLocalValue;
    print "$s $p $o\n" unless $o =~ /genid/;
    $statement = $enumerator->getNext;
}
$enumerator->close;

Wednesday, November 14, 2007

LSID Perl libraries submitted to CPAN

Just a quick note to announce that my lead developer, Eddie Kawas, has recently uploaded the Perl LSID stack to CPAN. As such, it is no longer necessary to get the code from the SourceForge Subversion repository. Just start up CPAN and say "install LS".