Many will point out that much of Sentinel’s technology overlaps with existing products. Copyscape already provides algorithmic matching of text, Feedburner already helps detect RSS scraping, and Google Alerts can provide automated checking for duplicate content.
On the surface, at least, it seems that much of Sentinel’s functionality is already covered. However, the potential of Sentinel, and why I am excited about it, isn’t that it can replace those services, but that it can fill the holes they leave behind.
First off, as I discussed previously, Copyscape is ill-suited to bloggers. Its reliance on the Google database gives it only limited usefulness in the rapid-fire world of blogging, since delays in updates to that database blunt its effectiveness. Also, because many splogs and scrapers are blacklisted from Google, some of the worst offenders may not show up in Google at all.
Though Sentinel will be limited in that it will only check for plagiarism periodically, its checks will cover the content immediately available, using RSS feeds to pull the latest versions of all blogs. Copyscape searches, by contrast, are not automated and return only the top ten results. Considering the service’s high rate of false positives, that could leave the vast majority of misuse undetected unless you pay for the Copysentry service, which protects only ten pages at the most basic level.
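To make the RSS approach concrete, here is a minimal sketch of how a Sentinel-style monitor might pull the latest posts straight from a feed rather than waiting on a search engine’s index. The feed XML is an inline sample and the function name is my own invention, not anything from Blogwerx; a real monitor would fetch each feed over HTTP on a schedule.

```python
# Hypothetical sketch of pulling the latest entries from an RSS 2.0 feed.
# The sample feed is inlined here; a real monitor would fetch it over HTTP.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Blog</title>
  <item><title>Post One</title><description>Original text of post one.</description></item>
  <item><title>Post Two</title><description>Original text of post two.</description></item>
</channel></rss>"""

def latest_entries(feed_xml):
    """Return (title, body) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        entries.append((title, body))
    return entries

for title, body in latest_entries(SAMPLE_FEED):
    print(title, "->", body)
```

Because the feed always carries a blog’s newest posts, a checker built this way sees content the moment it is published, with no dependence on how quickly a search engine re-crawls the site.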
(Note: There is no Google API for its blog search tool, so no outside service can yet search through it. There is, however, a Technorati API.)
Google Alerts, while automated, shares many of the same problems. Setting up an alert for each blog entry is a time-consuming process that doesn’t mesh well with the nature of blogging, and since automatic generation of Google Alerts is prohibited at this time, there’s no way to integrate them into blogging products. Also, alerts only reliably detect full-fledged copy-and-paste jobs and have no ability to detect partial reuse or modified content.
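The partial-reuse gap is worth illustrating. Exact-phrase alerts miss a post that has been lightly reworded, but a simple word-shingle comparison catches it. This is a generic sketch of that technique, not a description of how Sentinel actually works; the function names and the shingle size are mine.

```python
# Hypothetical sketch: detecting partial or lightly modified reuse with
# word shingles, something exact-phrase matching cannot do.

def shingles(text, k=4):
    """Set of k-word windows, lowercased and punctuation-insensitive."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap_ratio(original, suspect, k=4):
    """Fraction of the original's shingles that reappear in the suspect."""
    a, b = shingles(original, k), shingles(suspect, k)
    return len(a & b) / len(a) if a else 0.0

original = "Sentinel pulls the latest posts from RSS feeds to check for plagiarism."
modified = "The tool pulls the latest posts from RSS feeds, then checks them for plagiarism."
print(round(overlap_ratio(original, modified), 2))
```

Even though the suspect text shares no single long exact phrase with the original, nearly half of the original’s four-word windows survive, which is more than enough to flag it for a human to review.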
Finally, Feedburner, though providing valuable feed statistics and some impressive tools to deal with RSS scraping, has severe limitations. Since it can only detect reuse of your feed, it’s possible for scrapers to grab from other sources, such as Technorati watchlists and your site’s original, unprotected feed, without detection. I’ve noticed at least a few sploggers scrape some or all of my content without Feedburner noticing.
In that regard, Feedburner might be seen as a complement to Sentinel. Feedburner detects most traditional scraping immediately and Sentinel, hopefully, will be able to pick up the rest.
The bottom line is that, while Sentinel may overlap existing technologies, it also fills gaping holes that they’ve left behind, offering a layer of protection unlike anything bloggers have seen before.
In the end, bloggers will have to decide whether or not they want to use Sentinel. However, since the basic version of Sentinel will be free, there will be little reason not to.
If it turns out as it appears it will, it will likely serve the merely curious, the protectors of copyright and the copyleft crowd alike. Anyone who is remotely interested, for any reason, in where their content is being reused will likely find something to smile about when using Sentinel.
But in terms of pure copyright protection, Sentinel will likely be very hard to beat, both for its brains and for its immediacy. It will be very interesting to see if and how Sentinel affects content reuse, both legitimate and illegitimate, after it is released.
Until then though, anyone who is interested in Sentinel should visit the Blogwerx site and add their email address to the list of potential beta testers.
It should be a very interesting launch.