Three Strikes & You’re A Splog!

Since Sentinel, when parsing RSS feeds, ignores all punctuation and most extremely short words, it can easily see through most simple text manipulations such as restructuring sentences and introducing false paragraph breaks. However, Blogwerx took things a step further and built a thesaurus into Sentinel’s algorithm, making it capable of detecting copies that have been rewritten in minor ways and, potentially, even articles that have been “spun” by synonymizing software.

If this works as planned, it will put Sentinel a generation ahead of other plagiarism searching techniques, most of which require the use of a “dumb” search engine that only detects exact matches.
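To make the idea concrete, here is a minimal sketch of the kind of normalization described above: strip punctuation, drop very short words and collapse synonyms to one canonical word before comparing. Blogwerx has not published Sentinel’s algorithm, so everything here is an assumption for illustration only, including the tiny `SYNONYMS` table, the three-letter cutoff and the `normalize` function name.

```python
import re

# Hypothetical, tiny thesaurus: each word maps to one canonical representative.
# A real system would need a far larger table; this is illustration only.
SYNONYMS = {
    "quick": "fast",
    "rapid": "fast",
    "speedy": "fast",
    "car": "automobile",
    "auto": "automobile",
}

MIN_WORD_LENGTH = 3  # assumed cutoff for "extremely short" words


def normalize(text):
    """Lowercase the text, ignore punctuation and line breaks, drop very
    short words, and replace each remaining word with its canonical synonym."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(w, w) for w in words if len(w) >= MIN_WORD_LENGTH]


original = "The quick car, it is fast!"
rewritten = "The rapid auto.\n\nIt is speedy"

# Both versions reduce to the same word sequence, so a comparison made on
# the normalized form treats the rewritten copy as a match.
print(normalize(original) == normalize(rewritten))
```

The point of the sketch is only that once both texts are reduced to the same canonical form, swapped synonyms and false paragraph breaks stop hiding the copy.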

A drawback to the synonym checking feature of Sentinel is that, most likely, it will not be available to users of the free product. Blogwerx’s programmers have been able to add the feature to the service without hurting speed (the latest version of the software can process up to ten million feeds per day, with the potential for many more as the service expands), but the added burden still prohibits it from being freely available at this time.

However, if early signs from Blogwerx are any indication, the paid versions of its service will begin at approximately five dollars a month, making it comparable to Feedburner and the most basic versions of Copysentry, the paid version of Copyscape.

One of the more interesting side features of Sentinel is the ability for users to mark infringing blogs as spam blogs (splogs). After three strikes of confirmed plagiarism, the blog is officially listed as a splog and moved into a database that will be publicly available via an API.

This could, potentially, be used to create applications that work to prevent scraping or aid search engines in blacklisting useless blogs. It can also make an excellent addition to other splog databases, such as SplogSpot, that work to catch all junk blogs but may not spot outright plagiarized blogs.
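The three-strikes rule itself is easy to sketch. The snippet below is not Blogwerx’s API, which has not been released; the `SplogRegistry` class, its method names and the choice to count each confirmed case only once are all hypothetical, meant only to show how such a list might be kept.

```python
from collections import defaultdict

STRIKE_LIMIT = 3  # "three strikes", per the description above


class SplogRegistry:
    """Hypothetical in-memory stand-in for a splog database."""

    def __init__(self):
        # blog URL -> set of confirmed plagiarism case IDs
        self._cases = defaultdict(set)

    def confirm_plagiarism(self, blog_url, case_id):
        """Record one confirmed case against a blog.

        Assumption: the same case reported twice counts as one strike.
        Returns True once the blog has reached the strike limit."""
        self._cases[blog_url].add(case_id)
        return self.is_splog(blog_url)

    def is_splog(self, blog_url):
        return len(self._cases.get(blog_url, ())) >= STRIKE_LIMIT

    def splogs(self):
        """All blogs currently listed as splogs, sorted by URL."""
        return sorted(url for url in self._cases if self.is_splog(url))


registry = SplogRegistry()
registry.confirm_plagiarism("scraper.example", "case-1")
registry.confirm_plagiarism("scraper.example", "case-1")  # duplicate, ignored
registry.confirm_plagiarism("scraper.example", "case-2")
listed = registry.confirm_plagiarism("scraper.example", "case-3")
print(listed, registry.splogs())
```

An application that wanted to block scrapers could then check `is_splog` before accepting a trackback or serving a feed.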

Blogwerx Sentinel

I rarely get excited about upcoming anti-plagiarism products. Most seem to be overpriced, underfeatured and virtually useless to your average blogger or Webmaster. I also rarely feel the need to tread on the territory of such notable sites as Techcrunch and Mashable by covering upcoming Web 2.0 startups.

However, Blogwerx is an exception to both rules.

Blogwerx’s main product, Sentinel, which is still under development, has the potential to forever change the way blogs detect plagiarism and content theft by automatically checking for duplicate content and reporting its findings back to the site’s creator.

It’s a powerful tool that, if it works as planned, could easily change the plagiarism game for good.

What Sentinel Does

The basic idea behind Sentinel is pretty simple. Take your RSS feed along with archived entries from it, compare that content to that of other RSS feeds available on the Web and point out any large blocks of matching text. This then allows the user, and blog owner, to investigate any similarities between his feed and others for potential misuse.

According to Blogwerx, its matching algorithm works similarly to those used by Copyscape or Turnitin and is able to match partial blocks of text. This not only helps stop plagiarists who steal only a paragraph or two of writing, but also lets you see where you’re being quoted or otherwise legitimately reused. It can also help you find other sites that share the quotes and other information you wrote about, letting you see who’s talking about similar issues in a way most search engines cannot.
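One common way to match partial blocks of text is to compare overlapping runs of words, often called shingles. The sketch below uses that technique as a stand-in; it is not confirmed to be what Sentinel, Copyscape or Turnitin actually do, and the function names and the five-word window are assumptions.

```python
import re


def shingles(text, size=5):
    """Return the set of all runs of `size` consecutive words in the text,
    ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original, candidate, size=5):
    """Fraction of the original's word runs that also appear in the candidate.

    0.0 means nothing in common, 1.0 means every run was found."""
    own = shingles(original, size)
    if not own:
        return 0.0  # original is shorter than one window
    return len(own & shingles(candidate, size)) / len(own)


post = "one two three four five six seven eight"
other_feed = "intro words here two three four five six seven and more"

# Half of the post's five-word runs turn up inside the other feed.
print(overlap(post, other_feed))
```

Because the score is computed per run of words rather than per document, a feed that lifts a single paragraph still registers, as does a legitimate quote.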

These checks will take place at regular intervals. Free users will have to wait the longest, up to two weeks between checks, while paying customers will get more frequent checks based upon their account level. After the checks are done, the user will then be able to view the results and follow up on any similarities that interest them.

In short, it’s a powerful system that has a variety of uses beyond just plagiarism fighting. However, the most interesting and potentially useful feature lies in the search algorithm itself and how the Blogwerx team gave Sentinel not just a pair of eyes, but also a brain.

More On The Copyright Dilemma….

I have continued my discourse with Jonathan Bailey on the topic of copyright issues in relation to blog content, because I believe it to be of crucial importance, especially going into the future…

Some of the comments made on this topic seem to treat the subject with disdain, but the fact remains: if you are going to put so much time and effort into creating and maintaining a blog, there sure as hell is nothing wrong with checking exactly where you stand, under existing laws, as far as your blog’s content is concerned.

With his permission, I am publishing Jonathan’s email to me:

“As I see it, there are two cases one could make in favor of keeping Zarrella’s work on the site. The first is that the blog was a work of joint authorship. However, for a work to be considered a joint work, the contributions must be inseparable. Clearly, with a blog, that isn’t the case. The fact you can remove one person’s contribution without taking down the whole site shows that.

Second, what Duncan seemed to hint at was the idea of an implied license. Though some implied licenses do exist in copyright law, they are licenses that only go so far as what is required to use the work in the manner intended. Implied licenses are rarely, if ever, indefinite and are certainly not transferable. The courts limit implied licenses as much as possible. The classic such license is the license to cache a Web page, which is considered an implied license of posting a work on the Web. Without caching, the page would be almost unusable, so the courts figure there is an implied license for the user to “copy” the page temporarily to their computer.

It would be an uphill battle trying to show that Zarrella gave an indefinite implied license that could be transferred to third parties via an agreement he was not a part of. A good contract, however, would have handled these potential problems gracefully.

It’s a similar issue that comes up when one posts a comment to a blog. The commenter effectively “owns” the comment though there is an implied license to display the comment. However, most lawyers seem to agree that if a commenter returns later and objects to the comment being there and requests its removal, that is his or her right. This would be a matter for the courts to resolve, most likely after a long, expensive and bitter legal fight, but all of this can be resolved by either

A) Requiring that one post their work under a CC license when posting or

B) Writing another TOS and requiring posters to adhere to it.

It directly relates back to the problems with guest bloggers. One thing you need to consider is what you’re paying for when you acquire a blog. If you are actively paying the blogger to write, then it’s probable that the blog entries would be considered a work for hire. If that’s the case, then copyright of the work reverts to you, the person who paid for it. However, if you aren’t paying specifically for the work and they aren’t under your ongoing employ, there could be problems.”

So, if I’m reading this correctly: if you pay someone to write posts on your site and you have a written contract in place, all is hunky-dory. But if you’re dreaming up one of those pyramid schemes based on revenue share and you do happen to hit the jackpot, then things could get very nasty indeed…

So How Do You Protect Your Blog Content?

I asked Jonathan Bailey of Plagiarism Today, given his expertise, how one actually goes about protecting online content.

For people like myself, who employ bloggers on over 50 blogs and have nothing in writing, his comments are a worry.

To check for plagiarism, I have always used Copyscape, which is quick and easy. Obviously, this only searches for online plagiarism, not offline. To protect copyright, not just for my own stuff but also on behalf of my clients, I have recommended the service at CopyrightDeposit.com. You can see a sample of one of my certs here (weird: I just noticed that page has a PR3, so I guess I got a backlink at least!). For $13 USD, they will allow you to submit all your files, be they written or graphic, and then notarize the online copyright cert, which is guaranteed to stay online for your lifetime plus 50 years (unless someone buys out their host or fails to notify them of your death!). They then allow you further monthly updates of 10 meg of content for free, as long as you maintain the described link between your website and the corresponding copyright certificate.

But Jonathan says that they “appear to be a snake oil salesman” and that “the only solution really is the U.S. Copyright Office”, as “not only does it enable you to sue in Federal court for copyright infringement, but it also provides greater proof and legal protection”. He recommends Numly.com as the best solution out there right now, although he also points to many new and exciting things on the horizon, which will hopefully make all our lives a little simpler.

As for me, I’m preparing a contract for all my bloggers to sign immediately…

“Zarrella Had The Right To Request Removal….”

The following was written by Jonathan Bailey of Plagiarism Today and is published here on JOAB with his written permission:

There is a problem here. Copyright law is very clear on this matter: it says that there can be no transfer of copyright unless “an instrument of conveyance, or a note or memorandum of the transfer, is in writing and signed by the owner of the rights conveyed or such owner’s duly authorized agent.”

This means that Zarrella, no matter the intent, holds the copyright to the posts he created (copyright automatically reverts to the author) unless there is a written contract saying otherwise.

You can read the full statute here.

I cannot say that what Zarrella did was right; I do not know the full story and don’t claim to. But without a written contract it would appear that Zarrella had the right to request removal. After all, they were his copyrighted works and, even if he allowed one person to use them in an oral contract, which is completely non-binding in the area of copyright law, that doesn’t apply to another person and it doesn’t stop him from backing out at any time.

What this whole incident really underscores is the importance of getting these matters in writing. In matters of copyright, without a written contract, there are simply too many ways for things to go sour. In truth, this was resolved rather peacefully.

Though well-versed in copyright law, I am not a lawyer. However, I agree with the decision to remove the works and would encourage you to keep it that way. Not only does it save drama, but Zarrella’s case is not as weak as one might think.