Global information company Reuters has taken a step that it hopes will leave a big footprint on the development of the semantic web.
With the debut last week of the Calais web service, and the open application programming interface for the service, Reuters’ acquisition last year of tagging platform vendor ClearForest Ltd. is bearing fruit in interesting ways.
Reuters has been working on the changes for a while. AOL Search and AltaVista veteran Gerry Campbell moved to Reuters about a year and a half ago as president of the Reuters Search and Content Technologies group, heading up a new team charged with developing ways to use content and surface it to customers.
That vision is being realized as the company extends its internal web service for content-tagging structured and unstructured data (its vast store of corporate information as well as reporters’ stories), based upon ClearForest technology, to the world at large.
“The semantic web at its core is about tagging capabilities being made available, for applications to throw in content and get it back tagged, which helps with the ability to associate other content with it, with navigation, with creating tagged clouds and all kinds of other things,” says Campbell.
With its new service, “we made what used to be a half-million or quarter-million dollar, high-end deployment of enterprise software available for free, with the idea that the more folks use it the better Reuters can associate that content back in, so our customers can see more of what is going on in the world.”
Metadata results are stored centrally and returned to publishers in industry-standard RDF format with a unique identifier.
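For readers unfamiliar with RDF, the following is a minimal sketch of how a publisher's application might read tags out of an RDF/XML response. It is not the Calais API: the sample document, the `http://example.org/tags#` vocabulary, the document identifier, and the `extract_tags` helper are all hypothetical and chosen only for illustration. Only the `rdf:` namespace is the standard W3C one.

```python
import xml.etree.ElementTree as ET

# Standard W3C RDF namespace.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample response; the "ex" vocabulary and the document
# identifier are assumptions made for this sketch, not Calais output.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ex="http://example.org/tags#">
  <rdf:Description rdf:about="http://example.org/doc/12345">
    <ex:company>Reuters</ex:company>
    <ex:company>ClearForest</ex:company>
    <ex:person>Gerry Campbell</ex:person>
  </rdf:Description>
</rdf:RDF>"""


def extract_tags(rdf_xml):
    """Return (document_id, {tag_type: [values]}) from an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    desc = root.find(f"{{{RDF_NS}}}Description")
    # The rdf:about attribute carries the unique identifier of the resource.
    doc_id = desc.get(f"{{{RDF_NS}}}about")
    tags = {}
    for child in desc:
        # Element tags look like "{namespace}localname"; keep the local name.
        tag_type = child.tag.split("}", 1)[1]
        tags.setdefault(tag_type, []).append(child.text)
    return doc_id, tags


doc_id, tags = extract_tags(SAMPLE_RDF)
print(doc_id)
print(tags)
```

In a sketch like this, the identifier in `rdf:about` is what would let a publisher's tags be matched up with other content tagged the same way.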
Reuters is careful to note that it won’t retain any outside content itself. “We’re being respectful of copyrights,” Campbell says.
Broad adoption of this service can help solve a big problem for Reuters, Campbell says.
“When information is out on the web that is of general interest and possibly of a market-moving nature, it doesn’t easily associate into the way our customers look at the world,” he notes.
As more publishers and bloggers begin generating semantic metadata, it will be easier for Reuters to connect that externally produced information to its own content. Does that raise questions of authenticity and verifiability for Reuters?
“I can tell you this is something we take very seriously,” says Campbell. “It is discussed on many dimensions and we have a pretty well thought-through stance that says we need to protect the integrity of Reuters and our partners’ validated content.
“But as long as we identify [external content] as such, we find customers benefit from broader access and more information.
“So we just have to make sure it is clear what the source is and how much our brand is attached to it.”
But Campbell says the benefits extend beyond those Reuters will experience. They will extend to the web community at large, which also profits from the interoperability of content and the development of the semantic web.
“This is a high-value service that we are giving away for free,” he says. Calais eliminates the cost, in dollars and time, of manually creating tags while giving publishers an effective means of metadata transport.
One return on investment might even be helping publishers and bloggers increase advertising revenue yield by enabling greater relevance between tagged banner or text ads and associated terms.
“We’re trying to jumpstart some of the tough parts of getting going [on the semantic web],” he says.
So far, “the technology has been unwieldy, and the applications are unclear, … [but] we’ve got this asset.”
As a step toward spurring adoption among publishers and bloggers, Reuters is sponsoring a contest for application developers to build a Calais-based plug-in for Automattic’s WordPress personal publishing platform. The plug-in will scan blog content automatically; support rich meta-tagging; create and maintain a semantic tag cloud for each blogger to post; and embed the related Calais URI.
“We expect the application developers will come up with things that are broadly deployable, that are easy for people to drop on their pages,” says Campbell. “That’s what we are aiming for.”
Campbell also says there’s a lot of room to grow, based on the extremely powerful ClearForest technology behind the service, which has been simplified for this application to the basics of text and event extraction.
He says that ClearForest’s technology was a leader in the first, and to date the most heavily developed, round of semantic technologies.
“There are a lot of companies talking about semantic web technologies,” he says, “but the benefit of [basing the service on an] older and more tried and true technology is there are so many bells and whistles, so we have a great base to extend on as developers take it forward. There is a depth and ability to adapt and explore.”
There are already plans to unveil, early this year, Release 2, which will make linking back and forth and throughout content more robust, and Release 3, which will add support for more languages.
Release 4 will give developers a richer and more interactive online community environment where they can create and share new semantic extraction capabilities.
“By a few months into this we’ll have a really good idea of what is working and not working, and we’ll throw a lot of horsepower behind that,” says Campbell. “We’ll understand its impact on community, on navigation, on monetization, all those kinds of things.”