Microformats and the Semantic Web
Currently, the code used to construct a web page in (X)HTML contains very little meaning about the information written on the page. Although the term ‘semantic markup’ is often mentioned in relation to web standards, the only semantic information we will find on a web page relates to document structure: HTML tags identify specific portions of text as being a ‘header’, a ‘paragraph’, a ‘list-item’, etc. When we go to a search engine, this kind of meta-data is of no use. If I want to look for lawyers in Newcastle, for example, I might enter the words ‘lawyer’ and ‘newcastle’ into Google and get a mixed bag of results, including pages about jobs for lawyers in Newcastle, law schools in Newcastle, and so on. And if I want to find something more complex, such as ‘positive reviews of the Ritz hotel written by Americans last year’, then the amount of human filtering increases dramatically and using a search engine becomes a real problem.
Initiatives such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) are proposals for adding structures to a web page that enable us to encode semantic information about what we are writing. RDF is quite an established format but has a reputation for being a bit complex and tricky to implement. Microformats, on the other hand, are a way of using the (X)HTML we have today and adding some level of real-world meta-data in specific areas. One strength of microformats is that they work invisibly within the existing (X)HTML format, so we can have a web page that looks and works exactly like any other web page most of the time. Then, when a program that understands microformats comes along, it is suddenly able to extract and make use of the extra level of semantic information. Another key strength of microformats is that, being of limited scope, they are much easier to write.
Microformats rely on a standardised use of class names applied to HTML elements, allowing the content of these elements to be parsed by a robot. There are currently defined microformat standards for people and organisations (hCard), events (hCalendar) and draft specifications for types such as resumes (hResume) and reviews (hReview). As an example, if you were to mark up a review of the Ritz Hotel by George Clooney using the hReview format, you might have something like this:
<div class="hreview">
<h2 class="summary">I Love <span class="item fn">The Ritz Hotel</span></h2>
<h4><span class="rating">5</span> out of 5 stars</h4>
<p>Reviewer: <span class="reviewer fn">George Clooney</span> – <abbr title="20050228T2300-0700" class="dtreviewed">February 28, 2005</abbr></p>
<p class="description">I really liked staying here. The beds were soft and so was the butter.</p>
</div>
We might also assume that, perhaps on another webpage, there is an hCard for Mr Clooney that records his contact information, including his country of residence. Given this kind of data it becomes possible to accurately search for things like ‘positive reviews of the Ritz Hotel by Americans in 2005’ and not have to filter the results at all.
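To make the "program that understands microformats" a little more concrete, here is a minimal sketch in Python of how a robot might pull the fields out of the review above. It is not a full hReview parser: it only looks for the class names used in this example, the names `HReviewParser`, `parse_hreview` and `is_positive_2005` are made up for illustration, and the "4 or more stars counts as positive" threshold is purely an assumption.

```python
from html.parser import HTMLParser

# Class names used in the example review above (not the full hReview vocabulary).
FIELDS = {"summary", "item", "rating", "reviewer", "dtreviewed", "description"}
# Elements that never get a closing tag, so they must not be tracked as "open".
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HReviewParser(HTMLParser):
    """Collects the text of elements whose class attribute names a review field."""

    def __init__(self):
        super().__init__()
        self.review = {}
        self._open = []      # one set of field names per currently open element
        self._fixed = set()  # fields whose value came from an <abbr title="...">

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        fields = set((attrs.get("class") or "").split()) & FIELDS
        self._open.append(fields)
        for name in fields:
            if tag == "abbr" and attrs.get("title"):
                # The machine-readable value lives in the title attribute.
                self.review[name] = attrs["title"]
                self._fixed.add(name)
            else:
                self.review.setdefault(name, "")

    def handle_endtag(self, tag):
        if tag not in VOID and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a field name.
        for fields in self._open:
            for name in fields:
                if name not in self._fixed:
                    self.review[name] += data


def parse_hreview(html):
    parser = HReviewParser()
    parser.feed(html)
    parser.close()
    return {name: value.strip() for name, value in parser.review.items()}


def is_positive_2005(review):
    # Assumption for illustration: 4 or more stars counts as "positive".
    return float(review["rating"]) >= 4 and review["dtreviewed"].startswith("2005")


SAMPLE = """
<div class="hreview">
<h2 class="summary">I Love <span class="item fn">The Ritz Hotel</span></h2>
<h4><span class="rating">5</span> out of 5 stars</h4>
<p>Reviewer: <span class="reviewer fn">George Clooney</span> -
<abbr title="20050228T2300-0700" class="dtreviewed">February 28, 2005</abbr></p>
<p class="description">I really liked staying here.</p>
</div>
"""

review = parse_hreview(SAMPLE)
print(review)
print(is_positive_2005(review))
```

Checking the reviewer's nationality would need a second step along the same lines, reading the country from his hCard, which is left out here. The point is only that once the class names are standardised, the filtering that a human had to do by eye becomes a few lines of code.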
It is clear that there is a bit of a buzz about microformats at the moment. For example, you can see people wearing ‘Microformat tee-shirts’ at conferences and web-standards meet-ups. Taking the long-term view, I’m not sure exactly how they compare with other frameworks like RDF, except that they are far more limited in scope and probably a lot easier to implement. In a way, I suppose you could look at the microformat concept as a bit of a bodge, as opposed to a truly scalable solution to having ontologies for the web. But on the other hand, as long as we can parse microformats and thereby translate them into other frameworks further down the road if we need to, this should not be too much of a problem. And in the meantime we should start to see real-world applications and services that begin to harness the power of machine-readable semantic information.

It’s been pointed out to me that microformats aren’t really about improving search engines, but more about exchanging and syndicating data. This is very true and I think I got a bit carried away with the ‘Mr Clooney at the Ritz’ example (I guess I was really trying to stress the difference between human and machine-readable data with this example).
Anyway, a more practical scenario for making use of the hReview microformat, for example, would be that review sites can scour the web picking up reviews, so your own hotel review published on your own blog could then be syndicated to those sites.
Ben at 1:54 pm on 25 Aug 06
Are bots holding a discussion here? Greetings to the bots from a human!
derwaywherm at 6:02 am on 9 Oct 09