Eric Ververs

Practical use of Semantic Markup formats

 

Published on July 30, 2012

Semantic markup formats aim to extend the web with structured data, much like linked data. These formats are embedded in a page's HTML code and act as metadata for elements on a website. Search engines can use them, for example, to help determine the position of a website or to present a page in their search results using rich snippets.

Rich snippets present more information in a search result than the ordinary title, page description and website URL: for example, user reviews, geographic location, dates and images. Figure 1 presents an example of a rich snippet.

Rich snippets have grown in popularity since their introduction in 2009. Websites that present reviews, people, products, businesses, recipes, shopping, fitness, events, music, entertainment, job search and education can use semantic markup formats to attempt to get rich snippets for their pages. The major search engine Google supports three semantic markup formats at the moment, namely RDFa, Microformats and Microdata.

Figure 1: Rich snippet example

RDFa takes attributes of standard XHTML elements and applies them to all other elements, so that the entire page can be filled with semantic information. With a mapping, it is possible to extract RDF triples from the RDFa-annotated page. RDF triples are pieces of information that always contain a subject, a predicate and an object. For example, the DBpedia page about Amsterdam has a property called title, and the value of that property is Amsterdam:

  • "Amsterdam"
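To make the subject, predicate and object roles concrete, here is a minimal Python sketch that pulls triples out of a small annotated fragment. The fragment itself, the Dublin Core title URI used as the predicate, and the SimpleRDFaParser class are assumptions made for illustration. The sketch only reads the about and property attributes on well-nested markup and is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical RDFa-annotated fragment; the predicate URI is an assumption.
SNIPPET = """
<div about="http://dbpedia.org/resource/Amsterdam">
  <span property="http://purl.org/dc/terms/title">Amsterdam</span>
</div>
"""


class SimpleRDFaParser(HTMLParser):
    """Collects (subject, predicate, object) triples with literal objects.

    Assumes well-nested markup without void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subjects = []     # subject in effect for each open element
        self._predicate = None  # predicate waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._subjects[-1] if self._subjects else None
        # An element without `about` inherits the subject of its parent.
        self._subjects.append(attrs.get("about", parent))
        self._predicate = attrs.get("property")

    def handle_endtag(self, tag):
        if self._subjects:
            self._subjects.pop()
        self._predicate = None

    def handle_data(self, data):
        text = data.strip()
        if self._predicate and text:
            self.triples.append((self._subjects[-1], self._predicate, text))
            self._predicate = None


parser = SimpleRDFaParser()
parser.feed(SNIPPET)
for subject, predicate, obj in parser.triples:
    print(subject, predicate, repr(obj))
```

A real project would more likely rely on an existing RDFa library, since the full processing rules (prefixes, typeof, chaining) go well beyond this sketch.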

Microformats specify class names and rel attributes to be used within HTML code in order to give content a specified meaning. They can mark up, for example, vote links, personal information, vCards, iCal events, friendship relations and tags. When HTML5 came out, there was a discussion on how to incorporate RDFa into HTML5. The conclusion was that RDFa was not suited for this purpose, so a new format was needed to fill the gap. As a result, Microdata was introduced.

Business impact of Rich Snippets

Improving how search results are displayed can have a profound effect on businesses. Prior case studies (Boston.com and Monster.com) reported that search results showing extra semantic data in the form of rich snippets have a higher click-through rate. Rich snippets make a website stand out from other results, thus drawing more visitors to it. These visitors can in turn convert, for example into a sale, so rich snippets can greatly affect a company's sales. A rich snippet often outperforms a regular snippet that is a few positions higher in the search results, which suggests that rich snippets can have a high impact.

By incorporating semantic markup formats in a website, businesses make that extra information available not just to search engines, but also to other services that support the same standard. Semantic markup formats are relatively new, but already show great promise for the future. In 2009, Best Buy implemented RDFa on its website and noted an improved search position in Google. Traffic to Best Buy store pages (the pages that included RDFa) increased by 30%, and in the Yahoo! search engine there was a 15% higher click-through rate (CTR). In short, these results are substantial and suggest that RDFa, and other semantic markup formats that achieve the same results, have great commercial promise. More about the Best Buy case study can be found in the article "How Best Buy is using the semantic web". Google says that rich snippets can improve the efficiency of metadata harvesting and interlinking, which has great business impact. What this business impact is, and how big it is, remains unclear.

RDFa

RDFa, a standard of the W3C, has the vision of 'Bridging the Human and Data Webs.' The idea is to provide attributes and processing rules for embedding RDF (and all of its graph-based, Linked Data goodness) in HTML. With all that expressive power comes some difficulty, and implementing RDFa has proven to be overly complex for most Web developers. Google has supported RDFa in some fashion since 2009, and over that time has discovered a large error rate in the application of RDFa by webmasters.

Pros of using RDFa:

  • Publishers are independent and each website is allowed to use their own standards;
  • Supported and developed by W3C;
  • Works with the Don't Repeat Yourself (DRY) principle.

Cons of using RDFa:

  • High error rate among web developers because of its complexity;
  • Not widely used by websites;
  • Introduces new attributes, thus increasing the learning curve;
  • Is likely to be replaced entirely by Microdata in the future;
  • Produces syntactically valid code only in XHTML 1.1 and XHTML 2;
  • Unlimited number of namespaces, but the support for these namespaces varies per search engine.

In conclusion, RDFa is the oldest of the available semantic markup formats. It is very extensive and complex, but when implemented correctly it can have great benefits for companies. However, it is likely to become less supported in the future. For these reasons, we don't see great business value in it in the long run.

Microformats

Microformats is one of the earliest efforts to provide 'a general approach of using visible HTML markup to publish structured data to the Web.' Some Microformat specifications, like hCard, hCalendar, and rel-license, are in common use across the Web.
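As an illustration of how class names carry the meaning in hCard, the following Python sketch reads a few fields from a small fragment. The contact details are placeholders, and the HCardParser class is a hypothetical helper that looks only at the fn, org and url class names. It does not check that the fields sit inside a vcard root, which a real hCard parser would.

```python
from html.parser import HTMLParser

# Hypothetical hCard fragment; the name, organisation and URL are placeholders.
HCARD = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <span class="org">Example Shop</span>
  <a class="url" href="http://example.com/">website</a>
</div>
"""


class HCardParser(HTMLParser):
    """Reads text fields (fn, org) and the url field from hCard class names."""

    TEXT_FIELDS = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._field = None  # text field whose value we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # For url, the value is the href attribute, not the link text.
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]
        matches = [c for c in classes if c in self.TEXT_FIELDS]
        self._field = matches[0] if matches else None

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self.card[self._field] = text


parser = HCardParser()
parser.feed(HCARD)
print(parser.card)
```

The markup stays ordinary HTML: the same class attributes could be styled with CSS, which is part of why microformats are considered easy to adopt.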

Pros of using Microformats:

  • Easy to implement because it uses existing HTML semantics;
  • Widely supported by (X)HTML versions;
  • Widely used by web developers;
  • Works with the Don't Repeat Yourself (DRY) principle.

Cons of using Microformats:

  • Limited in the semantic meaning it can add to data compared to the other two formats, although it suffices in most cases;
  • Not standardized.

In conclusion, Microformats is a widely used and supported semantic markup format. It is relatively easy to learn. It is, however, limited in the semantic meaning it can add to a page. There is also no standardized vocabulary available for microformats, so constant review is needed to see whether the attributes in use are still valid. It can be of great value for businesses and is likely to be supported by search engines for quite some time. Because it is easy to implement, adding microformats to a website costs little. For that reason, this method is suitable for SMEs.

Microdata

HTML5 brought a lot of changes to Web authoring. One of them was Microdata. HTML5 allows web developers to define sections on a page, for example, header, footer, navigation menu, article, etc. However, it does not allow web developers to describe what the HTML5 document is about. This is where Microdata comes into play. With Microdata, it is possible to add meaning to parts of an HTML5 document.

Pros of using Microdata:

  • Simple compared to RDFa;
  • Most elaborate and practical for businesses;
  • Preferred method by Google;
  • Works with Schema.org, which is a vocabulary that all major search engines (Google, Bing, and Yahoo!) support.

Cons of using Microdata:

  • Only works with HTML5, which is not yet fully supported by all browsers;
  • Unlimited number of namespaces, but the support for these namespaces varies per search engine.

Schema.org is a collaboration project by Google, Microsoft, and Yahoo!. The goal of this project is to help webmasters use structured data that is supported by the major search engines. To use it, webmasters reference the relevant type page on the schema.org website and name the properties they want to use.

In conclusion, Microdata is the way to go for businesses. It combines the structured and elaborate vocabularies of RDFa with the simplicity of Microformats. Google advises using this format, which is a good indicator that it has a lot of future potential and will be widely supported. HTML5 is not supported entirely by web browsers, but in practice this usually affects the appearance of a website only slightly.
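The pairing of a schema.org type with named properties can be sketched as follows. The product fragment is made up, and the MicrodataParser class is a hypothetical helper that handles a single, flat item. Nested items, and properties whose values come from href or src attributes, are left out.

```python
from html.parser import HTMLParser

# Hypothetical product markup; the product details are placeholders.
PRODUCT = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Example Phone</span>
  <span itemprop="description">A placeholder product.</span>
  <meta itemprop="sku" content="ABC-123">
</div>
"""


class MicrodataParser(HTMLParser):
    """Reads the type and the properties of one flat Microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._prop = None  # property whose text value we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute, so only its presence matters.
        if "itemscope" in attrs and self.item_type is None:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop and attrs.get("content") is not None:
            # Elements such as <meta> carry their value in `content`.
            self.properties[prop] = attrs["content"]
            self._prop = None
        else:
            self._prop = prop

    def handle_endtag(self, tag):
        self._prop = None

    def handle_data(self, data):
        text = data.strip()
        if self._prop and text:
            self.properties[self._prop] = text


parser = MicrodataParser()
parser.feed(PRODUCT)
print(parser.item_type)
print(parser.properties)
```

The itemtype URL is what ties the markup to the shared vocabulary: a consumer that knows schema.org's Product type knows how to interpret name, description and sku.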

Semantic markup formats in practice

So far, we have discussed semantic markup formats only by looking at the literature. But how popular are these formats on websites in practice? To find out, we surveyed 48 price comparison websites. We chose this kind of website because it stands to gain great value from applying these formats.

Figure 2: Survey results

Out of 48 comparison websites, we found that 17% used microdata and 14.5% used microformats, while no website appeared to be using RDFa for its pricing pages (Figure 2). Only a small minority (6%) used microdata and microformats in conjunction. These were generally the most popular websites in the dataset according to Alexa Traffic Rank. We also found that in 7% of the cases, the websites using microdata did not provide a specific product page but immediately redirected the crawler to the actual web shop selling the product, which in most cases meant that there was no need to use semantic markup data.

Looking at the detailed usage statistics, we found that in almost 60% of the cases microformats were used to add rel="me" (an attribute="value" structure) so that the websites could consolidate their identities. This was always applied in combination with Facebook and Twitter links.

The microdata that was used mainly expressed reviews/rating, lowest prices, currencies and product information (url, title, image and description). This is inherently what one would expect to notice given the nature of the websites that were investigated.

RDFa usage was strictly confined to Facebook Open Graph conventions, which we excluded from the results. After cleaning up the data in this way, it became clear that RDFa was not used anywhere in the data set.
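A crawler-side check of this kind could look roughly like the Python sketch below. It is not the tool used for the survey. The detect_formats function, the short lists of microformat class names and rel values, and the rule of skipping property values that start with og: are all assumptions chosen to mirror the description above.

```python
from html.parser import HTMLParser

# Assumed shortlists; a real survey would need more complete ones.
MICROFORMAT_CLASSES = {"vcard", "vevent", "hreview", "hproduct"}
MICROFORMAT_RELS = {"me", "tag", "license"}


class FormatDetector(HTMLParser):
    """Records which semantic markup formats appear in a page."""

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs or "itemprop" in attrs:
            self.formats.add("microdata")
        classes = set((attrs.get("class") or "").split())
        rels = set((attrs.get("rel") or "").split())
        if classes & MICROFORMAT_CLASSES or rels & MICROFORMAT_RELS:
            self.formats.add("microformats")
        prop = attrs.get("property") or ""
        # Open Graph properties (og:...) are deliberately not counted as RDFa.
        if "typeof" in attrs or (prop and not prop.startswith("og:")):
            self.formats.add("rdfa")


def detect_formats(html):
    """Return the set of formats detected in an HTML string."""
    detector = FormatDetector()
    detector.feed(html)
    return detector.formats


# Hypothetical page: Open Graph metadata, a rel="me" link and a Microdata item.
page = (
    '<html><head><meta property="og:title" content="Example"></head><body>'
    '<a rel="me" href="http://twitter.com/example">Twitter</a>'
    '<div itemscope itemtype="http://schema.org/Product">'
    '<span itemprop="name">Example Phone</span></div>'
    "</body></html>"
)
print(sorted(detect_formats(page)))
```

Because the Open Graph tag is skipped, a page like this one is counted for microdata and microformats but not for RDFa, which matches how the results were cleaned.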

Having semantic markup formats does not provide a guarantee of getting picked up by the major search engines. Therefore, the crawled findings were manually checked to see whether they actually resulted in enriched search engine results while taking into account that potential recent changes might not have been picked up yet by the search engines. In order to check for rich snippets the "inurl:{domain.com} prices" query was used, where the {domain} was replaced by the actual domain.
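The check itself was manual, but the query strings are easy to generate for a list of domains. The sketch below is a convenience under stated assumptions: the function names and the example domains are hypothetical, and the search URL simply places the query in Google's q parameter.

```python
from urllib.parse import quote_plus


def rich_snippet_query(domain, keyword="prices"):
    """Build the 'inurl:{domain} prices' query for one domain."""
    return f"inurl:{domain} {keyword}"


def search_url(domain):
    """Wrap the query in a Google search URL (URL-encoded)."""
    return "https://www.google.com/search?q=" + quote_plus(rich_snippet_query(domain))


# Placeholder domains; the surveyed websites are not listed here.
for domain in ["example.com", "example.org"]:
    print(rich_snippet_query(domain))
    print(search_url(domain))
```

The result pages still have to be inspected by eye, since whether a rich snippet appears is decided by the search engine.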

Conclusion

If we look at our initial question we can clearly say that all three formats are theoretically suitable for businesses. All three formats support at least the basic semantic data that businesses want to express like products, organization information, persons and reviews. The differences start to become visible when we look at the future prospects and implementation costs of the semantic markup formats.

From that perspective, RDFa is clearly the least favorable option. It has already been replaced by Microdata from HTML5 onwards, which is why it is likely to become obsolete in the future. Combined with its complexity and the higher implementation costs that result from it, we conclude that RDFa offers the fewest opportunities for businesses.

Microformats is showing great potential, and appears to be the mainstream solution at the moment. But the limited applications and the fact that there is no standardized vocabulary could be an indicator that it is not going to last forever. It is however easy to implement, making it suitable for businesses with a low budget in combination with a website not using HTML5.

Microdata is the most favorable format available at the moment. Search engines endorse this format and it has a lot of business advantages. It is easy to implement and shows the best future potential. Earlier research shows that it is the fastest-growing format, with a significant increase in the absolute development indices of microdata application between 2010 and 2012.

In order to answer "To what degree do websites offering price comparison services use opportunities that originate from each type of format?" we have looked at the usage of semantic markup formats for search engines. Microdata, the most interesting format, was used the most, but still not widely. Even if we leave out the cases where there is no price comparison page (in 14% of the cases users are immediately redirected to a webshop), we still see a lot of websites not using the format. Considering the business improvement Best Buy achieved, there is a lot of opportunity: in only 5% of the dataset did rich snippets for microdata, the most promising semantic markup format, appear in the search engine to display price, company, review or product information. At least one of these is needed to fully benefit from using semantic markup data, now and in the future. This is especially true for price comparison services, web shops and company information pages.

Having said that, implementing rich snippets needs to become a bigger priority for businesses. They seem to regard it as a finishing touch, something to do only for the sake of perfection because its benefits are small. In fact, search engines are continuously increasing their usage of semantic markup data, and already today we see sizable benefits in successful implementation cases. Nonetheless, there have not been many measurements or good business cases to draw decision support information from, which supports our notion that more evidence-based research is needed.
