In preparation for attending the second annual Conference on Semantic Web in Healthcare and Life Science (CSHALS 09) next week, I have been reviewing my notes from last year's conference. As I twittered yesterday, my favorite line last year was from conference chair Eric Neumann (herewith properly attributed, unconstrained by the 140-character twitter limit): “What do we mean by semantics?”
The semantic web is all about providing meaning that computers can read (or “understand”, to the level that they can infer new knowledge from old). Using one of the foundational standards, RDF, each semantic web statement is called a “triple” since it is constructed as a subject-verb-object triple (in RDF terms, subject-predicate-object). Examples would be “Venturecyclist is-a blog” and “Venturecyclist is-authored-by Richard-Dale” and “Richard-Dale is-a Venture-Capitalist.” From this, the right kind of automated inference engine could calculate “Venturecyclist is-authored-by a-venture-capitalist.” As you can see, writing knowledge in this form is verbose (lots of triples for even simple statements). When there are enough triples (millions, or even billions), automated inferencing does actually produce interesting and novel information, as reported in Scientific American, for example.
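To make the inference step concrete, here is a minimal Python sketch using the three example triples above. It is only an illustration, not how a real RDF inference engine works: the triples are plain tuples, and the function name `infer_authored_by_kind` and its single hard-coded rule are assumptions made up for this example.

```python
# Triples as (subject, predicate, object) tuples, mirroring the examples in the text.
triples = {
    ("Venturecyclist", "is-a", "blog"),
    ("Venturecyclist", "is-authored-by", "Richard-Dale"),
    ("Richard-Dale", "is-a", "Venture-Capitalist"),
}


def infer_authored_by_kind(facts):
    """Hypothetical rule: if X is-authored-by Y and Y is-a Z,
    then X is-authored-by a-Z (lower-cased, as in the text)."""
    inferred = set()
    for subject, predicate, author in facts:
        if predicate != "is-authored-by":
            continue
        # Look up every "is-a" statement made about this author.
        for other_subject, other_predicate, kind in facts:
            if other_subject == author and other_predicate == "is-a":
                inferred.add((subject, "is-authored-by", "a-" + kind.lower()))
    return inferred


new_facts = infer_authored_by_kind(triples)
for fact in sorted(new_facts):
    print(" ".join(fact))
```

With only three triples the rule yields a single new statement. The point of the scale argument is that the same mechanical matching, run over millions of triples and many rules, can surface connections nobody wrote down explicitly.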
Eric Neumann also provided a wonderful definition of the goal of the semantic web community, semantic interoperability: automated interoperating systems where a consumer of semantic web content understands all the needed context and meaning of the content, whatever the use of the content, even though the publisher knows nothing of the use case of the recipient.
My own extension to this is that the publisher should self-certify which ontologies and authorities they are using in publishing the content, and there should be an annotation system available (like link-backs) that allows anyone to add their own semantic annotations. Self-annotation might include the provenance of a triple (where it came from, its source, the current level of confidence in its truth, etc.).
In the healthcare arena, semantic web technologies allow for combining clinical data about a patient (Patient blood-sugar-level-is X) with standards of care (Blood-sugar-level-threshold is Y) and (Insulin treats blood-sugar-above-threshold). As life science researchers in the lab discover new knowledge about genes, proteins, pathways and treatments, that knowledge can be matched with clinical data about patients in semantically interoperating systems. This allows clinical data about populations to inform further research and also, as implied above, supports prescribing evidence-based treatments.
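The blood-sugar example can be sketched the same way, as a rough Python illustration of joining patient data with standards of care. Everything specific here is an assumption for the sake of the example: the patient names, the predicate names, the function `match_treatments`, and especially the numbers, which are placeholders standing in for X and Y and not clinical values.

```python
# Placeholder data: the numeric values are illustrative stand-ins for X and Y,
# not real clinical thresholds.
clinical_triples = [
    ("Patient-1", "blood-sugar-level", 180),
    ("Patient-2", "blood-sugar-level", 95),
]
standard_triples = [("blood-sugar-level", "threshold", 140)]
treatment_triples = [("Insulin", "treats", "blood-sugar-level-above-threshold")]


def match_treatments(clinical, standards, treatments):
    """Hypothetical join: flag a candidate treatment when a patient's
    measurement exceeds the published threshold for that measurement."""
    thresholds = {
        measure: limit
        for measure, predicate, limit in standards
        if predicate == "threshold"
    }
    remedies = {}
    for drug, predicate, condition in treatments:
        if predicate == "treats":
            remedies.setdefault(condition, []).append(drug)

    results = []
    for patient, measure, value in clinical:
        limit = thresholds.get(measure)
        if limit is not None and value > limit:
            for drug in remedies.get(measure + "-above-threshold", []):
                results.append((patient, "candidate-treatment", drug))
    return results


suggestions = match_treatments(clinical_triples, standard_triples, treatment_triples)
for suggestion in suggestions:
    print(" ".join(suggestion))
```

The three lists could come from three different publishers (a clinic, a standards body, a research lab), which is the interoperability point: the join works only because they agree on what "blood-sugar-level" means.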
This may all seem a little esoteric; it is, and it is still a series of commercially untested technologies (see yesterday's post on betting the bank). However, consider how easy it is to contribute photos and more to Google Earth. KML files are used by Google Earth (and other apps) to provide geographic meaning to anything from photos to facts. KML may not, strictly, be expressed in triples, but it provides an interesting example where real meaning (in this case, of geography) is available in machine-readable form, usable by any application that wants it. Imagine Google Body or Google DNA or Google Heart, where all the lab and medical knowledge can be combined in a model of our body or our DNA or our heart, the same way anyone can contribute new material for Google Earth using KML. As a triple, my conclusion would be:
--> (That-idea has value).