Wednesday, April 18, 2007

My shortcut into Taxonomy in W3C XML


Full of energy I ran at full speed into the RDF specs, and oh boy, they're solid! Even with the best of intentions they are too heavy for me, so I thought about another route up this mountain of knowledge that I've never climbed before. I realized that I needed examples that I could work with and also a tool to get my own examples going. The tool part was easy, since I already knew about Protégé (a free, open source ontology editor and knowledge-base framework) from Stanford. I downloaded a copy of the current beta, started it up and wanted to create a small taxonomy. Ugh, for a first-timer it's big, and none of the terms used carry any (relevant) special meaning for me. Properties must be part of the triple, but the classes, do I need them? Okay, I had to find some examples, and I went for a taxonomy since I figured that taxonomies are one of the simplest things in the semweb world. I found some on the UN website, but I didn't find any in RDF or anything like it. Back to searching, and then I found a nice little example by Styrheim called Taxonomies in OWL. In this article the example is based on the taxonomy of the Germanic languages, and it's shown in three variants:

When I loaded this example into Protégé it didn't look right, since 'Danish' was at the top level. After taking a second look I found an extra whitespace character in the definition of Danish: rdf:ID="Danish ".
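A stray space like that is easy to miss by eye, so here is a minimal sketch of how it could be caught automatically. It assumes a cut-down RDF/XML fragment that I made up to resemble the situation (it is not copied from the article), and the helper name find_padded_ids is hypothetical as well. It only uses Python's standard library:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ID = "{" + RDF_NS + "}ID"  # ElementTree spells namespaced names as {uri}local

# Hypothetical fragment, written only for this illustration.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="ContinentalNorthGermanic"/>
  <owl:Class rdf:ID="Danish ">
    <rdfs:subClassOf rdf:resource="#ContinentalNorthGermanic"/>
  </owl:Class>
</rdf:RDF>"""


def find_padded_ids(xml_text):
    """Return every rdf:ID value that has leading or trailing whitespace."""
    root = ET.fromstring(xml_text)
    padded = []
    for element in root.iter():
        value = element.get(RDF_ID)
        if value is not None and value != value.strip():
            padded.append(value)
    return padded


# repr() makes the trailing space visible in the output.
for value in find_padded_ids(SAMPLE):
    print("suspicious rdf:ID:", repr(value))
```

Running a check like this before loading a file into an editor would point straight at the offending identifier.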

The example is just small enough to be easy to grasp while still containing enough complexity to give an idea of how it works. I'm sure that there are tons of good examples out there, but the main point here is that I was looking for an example of a taxonomy in RDF and found that it needed RDFS/OWL. Does it have to be like that? I just thought of a statement that would sound like:

Danish is a kind of Continental North Germanic

which is a triple that could be expressed in plain RDF, but only with my own private/custom property. In that sense it's better to use a standard RDFS property and express it all in OWL, which also treats them as classes.
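To make the difference concrete, here is a rough sketch that treats triples as plain Python tuples. The example.org namespace, the class names above Continental North Germanic and the ancestors helper are my own assumptions for illustration. The one standard piece is rdfs:subClassOf, which RDFS defines as transitive, so any RDFS-aware tool may walk the chain upwards; a private "is a kind of" property would carry no such meaning for anyone but me.

```python
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/languages#"  # hypothetical namespace for the sketch

# Each triple is (subject, predicate, object). The hierarchy is illustrative only.
triples = {
    (EX + "Danish", RDFS_SUBCLASS_OF, EX + "ContinentalNorthGermanic"),
    (EX + "ContinentalNorthGermanic", RDFS_SUBCLASS_OF, EX + "NorthGermanic"),
    (EX + "NorthGermanic", RDFS_SUBCLASS_OF, EX + "Germanic"),
}


def ancestors(term, graph):
    """Collect every class reachable from term by following rdfs:subClassOf."""
    found = set()
    pending = [term]
    while pending:
        current = pending.pop()
        for subject, predicate, obj in graph:
            if (subject == current
                    and predicate == RDFS_SUBCLASS_OF
                    and obj not in found):
                found.add(obj)
                pending.append(obj)
    return found


for cls in sorted(ancestors(EX + "Danish", triples)):
    print(cls)
```

Swapping RDFS_SUBCLASS_OF for a home-made property URI would leave the data looking much the same, but a generic tool would then have no reason to treat it as a hierarchy.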


Jan Egil Kristiansen said...

Think I just got rid of that space character. Thank you!

But I have fallen into a waiting position on RDF, and only use the more pragmatic tags these days.

1) Google & Yahoo! do not read RDF in any useful way.
2) I really need URIs for real world entities, e.g. ships, but I don't like interpreting a web page's URL as the ship's URI.

Sweetxml said...

Hey Jan Egil Kristiansen

I'm the one who should be thanking you.

As you can see I'm a newbie to RDF, and regarding the URI stuff, I agree that we need some common schemes. All the examples are, well, just examples, but when I created my first example I had to choose a way to identify myself. I thought of using my LinkedIn profile, and had I been younger it might have been MySpace, but the point is that I felt like I was fumbling in the dark.
I'd also like the URIs/URLs to be pretty, since it's not just machines that are going to be using them for a long time.

Could you elaborate on your view on RDF and the missing usage? Could GRDDL change that?

Best regards

Jan Egil Kristiansen said...

I'm a total pessimist. The key to publisher usage is search engine usage. If people used RDF, Google's job would become much easier. If Google told webmasters to use RDF, 10% of sites would have some RDF within 4 weeks.

The problem is that Google does not want that, because it also makes things easier for Google's competition. Google's strength is now raw computing power, and nobody can match them there. And no other company can persuade a common webmaster to use RDF. (Similar conspiracy theory: M$ promoting bad HTML, because Opera & Mozilla have limited manpower and can't afford wasting time on code to clean up dirty markup.)

So unless some non-profit RDF search engine enters the equation (EU sponsored?), RDF will not happen on a global scale. Maybe on small domains with a limited but very enthusiastic population. GRDDL won't matter.

Thus I'm still a newbie, even if my article is a couple of years old.

styrheim said...

I don't know any GRDDL.

But I have tried to spice up my contact info with RDF. The result is an invalid XHTML document.

Now, the people who want correct syntax are the same people who want machine readable semantics, so this is a problem. Does GRDDL offer any solution to this?

I am not going to write a new XSD for each extension I make. (Left as an exercise for the student...) What I might need is validation in two steps:

1) well-formed XML
2) valid XHTML if all tags and attributes from other namespaces are ignored. (Content of foreign tags should still be validated.)

Sweetxml said...

Hi Styrheim

I read your former comment with great interest, and I'm afraid that you're all too right about your forecast. Please excuse me for not getting back to you after you answered my question.

I've only had a quick glance at GRDDL, so I can't help you that much. The idea that I got of it is that you could use some of the microformats (watch out, I'm on thin ice here) and then write an XSL stylesheet that could transform the valid XHTML into the RDF that you've embedded right now, based on the microformat structure/attributes. This is all very Tim Bray-like and not really my cup of tea, but my seeing it as a hack doesn't necessarily make it so :-).
I'm not sure what you could use it for right now, since I haven't heard of any GRDDL agents yet, but there'll probably be one as a plug-in for Firefox first. As for the validity of the XHTML, I've never really dug deep enough to find out what is valid, but if the browser (agent) processing it simply ignores the extra markup, that could be categorised as something other than validation.

Lots of loose ends to spend more time on :-)

Once more thank you for your comments.

Best regards, Brian