Protecting Your Blog Content

I’m a big fan of good writers. I especially admire bloggers who write exceptionally well. I mean, with all the crap out there that people try to pass off as writing, you can easily distinguish the good from the bad. Writing on a free, open platform such as a blog does not give one the license to throw grammar school lessons out the window.

Yes, some people are lazy, and write gibberish for all the world to read. Okay, they’re forgivable. But when some people are lazy and steal their way to riches, then that’s probably the time to ring the church bells and round up the people with their knives and pitchforks.

I didn’t realize blog content theft was so rampant until I visited the Stop Bitacle.Org blog.

The people behind bitacle.org steal content from others’ weblogs and place it on their own website. Their practices are criminal and/or abusive, because these people violate the holders’ copyrights on the original content. Not only are copyrights violated; licenses such as those of the Creative Commons are not respected either.

Stolen content from weblogs is placed on bitacle.org’s website, between commercial messages for which the people behind bitacle.org are being paid by advertisers. At this moment Google places commercial messages on bitacle.org, but the company is being asked to reject bitacle.org as a client because of bitacle.org’s criminal/abusive behaviour.

DMCA? They’re in Spain, for crying out loud. So a DMCA notice won’t be a silver bullet. There are other, legal ways to fight back, like writing to Google to cancel their AdSense accounts, or inserting notices on your blog posts (so people reading their site would know it’s your content). But it seems Bitacle is just the tip of the splog iceberg. Just checking my referrer stats, I stumble upon backlinks from doubtful sources. Guess what I see when I visit their sites? My content, with AdSense splattered all over.
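
The notice idea can be automated. Here is a minimal sketch, assuming your feed items are available as plain dictionaries. The `add_notice` function and the `title`, `link` and `body` field names are made up for illustration and are not part of any blogging platform’s API:

```python
from html import escape

def add_notice(item, site_name):
    """Return a copy of a feed item with an attribution notice appended.

    `item` is assumed to be a dict with 'title', 'link' and 'body' keys;
    these names are illustrative only.
    """
    notice = (
        '<p>This post, "{title}", was originally published on '
        '<a href="{link}">{site}</a>. If you are reading it elsewhere, '
        'it may have been copied without permission.</p>'
    ).format(
        title=escape(item["title"]),
        link=escape(item["link"], quote=True),
        site=escape(site_name),
    )
    updated = dict(item)  # leave the caller's item untouched
    updated["body"] = item["body"] + "\n" + notice
    return updated
```

A scraper that republishes the feed wholesale would carry the notice along with it. One that strips the trailing paragraph would not, so treat this as a deterrent rather than protection.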

Lazy people. Tsk!


Can a 3rd party issue a DMCA?


As promised, I’m following up my previous post on the DMCA, trying to make sense of the madness as I perceive it. So I asked Jonathan Bailey of Plagiarism Today a question regarding my main concern:

Are 3rd parties like this allowed to write to a host and request such drastic action without going through some sort of formal procedures? Otherwise anyone can just do this as an act of sabotage?

Here is his answer to this:

“The honest answer to your question is yes, no and maybe.

The DMCA itself only allows two groups of people to file a notice. The copyright holder or a “designated agent” to act on their behalf. That agent is usually an attorney that has a legally signed document declaring them to be an agent on file.

So yes, it is possible for a third party to file a notice- but only with a valid contract to do so. Being a designated copyright agent is a fairly big deal and not something you can assume you have.

The exception to the rule is if your site happens to be in the EU. The EU has a similar notice and takedown provision to the U.S. but there is no specific requirement as to who can file the notice. Theoretically at least, anyone, even a perfect stranger, can file the notice.

That, potentially, makes situations like this dangerous in the EU. You and I can reach a pact to allow reuse of some of my work, someone else notices the apparent infringement and then files a complaint with your host, getting the work removed. This hurts both of us.

The only way I know to guard against that is add a tag line saying that it is “used with permission from” and then give the site name.

Still, even that is no guarantee.

This is something that the EU is going to have to work out.”

For those of you with sites in the EU that this may concern, I followed up with another question to Jonathan:

When you say “in the EU”- would that mean where the site is hosted or the location of the domain registrant? And doesn’t this get confused with hosting resellers being located outside the US, while the main server is in the United States?

To which he responded:

“EU deals with the host. The domain registrant has nothing to do with it.

Ponder a scenario here. If a Russian plagiarizes an Australian author but uses a U.S. Web host, the Australian man would use a DMCA notice to get the work removed. Similarly, if the Russian chose a host located in the EU, the Australian would go through those procedures.

It’s a matter of where the data is stored physically and which country “owns” the server. It is interesting when you get to matters of colocation, which can put a single site across many different countries, but in those cases you focus on the main one.

ThePirateBay has used that rub to keep their site alive, despite multiple copyright threats.”

This answer from Jonathan did little to put my mind at rest!

The laws in the US need reviewing now and the EU has to attend to their chaotic interpretations even more urgently.

Watch this space.

DMCA Madness


I tried to resist writing this. Really, I did. But the ramifications are just too great. Most of you will know the amount of time and effort that goes into building and sustaining a website- especially a blog, and the thought that it can all be taken away by some random absurdity is too much to bear. Welcome to DMCA Madness.

This all started when we purchased JOAB from David Krug last July. There was a running dispute between him and Dan Zarrella about the site’s ownership and, more relevant to us, the content on the blog we had just purchased. The upshot was that Dan Zarrella threatened us with a DMCA if we didn’t remove all content written by him, and we did a post, “What’s the deal with a DMCA?”, exploring the consequences as far as such a threat was concerned.

One upside to that episode was gaining valuable insights from an expert in this area, Jonathan Bailey from Plagiarism Today. Apart from leaving a lot of useful comments and emailing me thorough explanations (and advising that we did indeed need to take down Dan Zarrella’s content), he also authorized us to use his content on some new anti-plagiarism software coming onto the market called Blogwerx. These included posts like “The Need For Sentinel” and were posted under Jonathan’s own username “copyspy”, which was a link back to Plagiarism Today.

All well and good. Water under the bridge. And then what? Unbelievably, out of the blue- we get an email from Blogwerx:

There are several blog posts that seem to have the exact same content as other locations. This content has been scraped and I would ask that you take it down in accordance with the DMCA.

Okay…fair enough…you’d think that they might have been pleased with the plug, but they probably don’t know that we had Jonathan Bailey’s permission, so we’ll just write back to them and let them know. No harm done.

But here is where the madness sets in. We get another email from Blogwerx, minutes later (allowing no time for the “personal request” or “warning” to be responded to), but this time it’s to our email at imandhost.com:

All of the content located on www.jackofallblogs.com has been scraped from other locations. This is an infringement of US copyright law and the DMCA. I would ask that this site be removed from hosting or the entire account for this user be removed. If you need further information please feel free to contact me.

Notice the change in language. From “several posts”, we now have “all the content”. And “locations” in the plural…

So let’s just recap. Having received permission from an author to publish his content reviewing a particular product, we receive a request from the developer of this product to take the content down as it is “scraped”. And allowing zero time for a response to clear the matter up, this third party, who is not the owner of the content and has no rights or claim to it, is writing to our host, asking not only that this blog be taken down, but that our whole user account (what, 200 sites?) be removed?

Seriously dangerous stuff as a precedent. As it turned out, the matter was resolved quickly and amicably. Jonathan Bailey confirmed with the people at Blogwerx that he had given his permission for us to publish his content. He also confirmed that he had no knowledge of their threat and that he had not instructed them to act on his behalf in such matters. And kudos to Blogwerx, they came back with a sincere and genuine apology. So no hard feelings there.

Still, it completely freaked me out that, as a precedent, a third party- who does not own or have any rights to the content in question, does not act for or have the permission of the content’s owner- can issue a DMCA threat to a server (which must, by law, be acted upon) and that this be within the boundaries of the current laws governing the internet. Where the hell is the common sense there?

So I’m going to be revisiting this matter in the coming days to clarify exactly where one stands with this DMCA madness- as I’m sure that you, like me, would like to protect yourselves not just from plagiarism, but also baseless claims against your website which can, regardless of the merits, cause you a great deal of trouble.

Thoughts On Piracy

Reading Dread Pirate Yarr’s articles here always makes me think about Pirates of the Caribbean. I loved that movie and its sequel. I dig the great action scenes, the story twists and of course that charming Keira Knightley girl (who doesn’t?). There’s something about pirates that the entertainment industry has romanticized. And to some extent, there is something appealing about being a pirate of that kind. Even one of my favorite literary characters, the Count of Monte Cristo, had dealings with pirates in his time.

But there’s another kind of piracy today that can be considered a real pain in the ass. And that’s piracy of software and piracy of content (a.k.a. copyright infringement).

Software companies are taking a hardline stance against piracy. But try as they may, pirates still find ways to work around copy protection schemes. Microsoft, for example, has tried time and again to enforce restrictions on Windows, but each attempt has been foiled by patches that can be applied in 30 seconds and serial numbers that are easily obtained from the Web. Music labels have been campaigning against music sharing, to the extent of suing everyone and his uncle for downloading “free” music online, and locking down their digital music such that people can only play it on a limited set of devices.

I hear Windows Vista will be so protected that the moment Microsoft detects you’re using a pirated copy, you’ll lose your OS’s functionality a bit at a time (like being automatically logged off after 60 minutes, or losing the ability to print a document, and the like).

The losers here in the end are the users.

Software makers keep prices high to compensate for losses, and this leads to users turning to bootlegged versions to save money. Music sites put in too many copy restrictions, and users will just find the “free” versions online so they can use the music on more than one MP3 player or computer.

I think the best way to fight piracy of this kind is to look for alternative business models, like how some companies offer their products or services for free but ad-supported, or like how websites offer applications for free online, but with some ads. Therefore, even if a piece of software or content is distributed and redistributed, the author does not lose anything. In fact, the author (and the advertiser) gain with the advertising getting better mileage.

It’s not pretty, but I think it’s a good way to go.

Is Nothing Sacred?

What’s your privacy worth to you?

Let’s start further back: does your personal information have a value? Sure it does, or you wouldn’t protect it. You wouldn’t want an unauthorized person to have copies of your social security number, your driver’s license, your home address.

How about your sex life?

How do you feel about people knowing everything about it? Some people, like me, really don’t care. But others are a trifle uncomfortable with it.

Here’s the thing: while everyone seems to get upset about the government keeping tabs on us in any way (which I don’t get too upset about – nothing to hide really), we don’t seem to be paying much attention to how fellow net-users might be able to abuse our privacy.

Last week, an unpleasant little fellow named Jason Fortuny decided that he’d have a little fun on Craigslist by posing as a woman seeking sex from men. I won’t go into more detail because it gets pretty explicit.

After getting close to two hundred replies within 24 hours, he posted each and every one, with all personal details including addresses and phone numbers and email addresses, and with the explicit pictures some men sent, on another site and advertised his remarkable feat.

Now, I really don’t care what these men wanted or thought they’d get out of their shallow relationships. I also don’t want to know who they were, what they did, or what their anatomy looked like. I worry about the fact that a lot of people loved getting this information.

I also worry that there doesn’t seem to be a law against this, certainly not a clear one.

Which brings me to a topic I’ve obsessed about for the last ten years: who owns your information?

Credit card companies can easily pass your information to Equifax and other companies that subsequently sell your information to companies interested in your credit rating.

Criss-cross directories and the Yellow Pages, hundreds of information-providing companies, your own bank and grocery store, may be in the business of selling your information to others.

In some cases, outright criminals do whatever they can to get your information.

Is it right that your personal data should be bringing others money and/or amusement?

Who really owns your data? And should you know whenever anyone receives it, and for what purpose it’s being used? Is it an invasion of your privacy if someone sells pictures of you on the beach, or your old driver’s license picture? We’re clear on public record data being accessible to all, but what about data you give with a reasonable expectation of exclusivity? Who owns that data?

I think we own our own information. And I also think anyone using it, selling it, or giving it away should be required by law to at least notify us.

This, much more than the government’s dabbling with Big Brotherism, worries me. I don’t trust the government totally. I trust corporations a whole lot less. And I really don’t trust malicious liars.

For now, our best bet: hold our personal information close. Once it’s out there, you can’t reel it back in.

Why Monetization Can Be Difficult

We all love this concept of “Web 2.0”, don’t we? Come on, even if the buzzword is overused, I am amazed by how much the Web today has empowered the user by letting individuals upload, publish, and showcase their content online.

And it’s not only about blogs, but podcasts and videocasts are starting to rise in popularity. Sure, it’s not as simple as blogging (given the costs and effort needed in producing audio and video content). But there are a multitude of tools today that let users upload their recordings and video for sharing with the rest of the world.

With most people, writing content for posting online would be all about self-expression. If one is passionate enough about writing (or speaking or making videos), then the fact that you’ve shared your art with the rest of the world is satisfaction enough. More so, if you actually get to have an audience!

However, it gets interesting when it comes to posting about your interests online. You can be a fan of a certain actor or a certain show, and you might want to post videos online, such as on YouTube. Or you can be a blogger, and you might want to republish photos, images or snippets of text from another blogger. A big issue here is copyright.

Usually, it is within fair use to repost or link to others’ work on your own blog if it’s for personal purposes. A lot of people use Creative Commons licenses, and many of these licenses allow derivative works and republishing as long as there is adequate citation. Even reposting parts of works with closed copyrights is generally still within fair use, as long as there is a citation, and as long as the original author’s work is not prejudiced.

But when it gets commercial, that’s when things get a bit messed up. When you infringe on a person’s capacity to earn from his work, or when you earn from another person’s work without giving his fair share, then definitely something is wrong. And this is one reason why you cannot always expect earning online to be easy. Once you go “pro” you would have to worry about all those copyright issues that can come about.

One example is YouTube. Right now, they don’t have a solid business model, but they’re the most popular video sharing site around. A lot of copyrighted material has been posted there, and most of the time, the copyright owners don’t mind, since it gives them exposure and potentially enhances their business. But once (or rather, when) YouTube decides to cash in, they might have a bit of trouble when it comes to intellectual property.

As Jason Calacanis puts it:

The second YouTube starts putting ads in front of content they are gonna get sued. Howard Stern was talking about YouTube today and was upset that they had his stuff up there. The reason YouTube has dodged a bullet to date is because of three factors: low quality video, lack of monetization on their part, and the 10 minute length. They put a pre-roll ad in front of Lazy Sunday or the Emmys (which I watched on YouTube exclusively!) and they are done–like done, done.

The moment you start to monetize, people will be asking for their fair share of the cake. Lawsuits (or threatening emails, at least) will start pouring in. You’ll start spiralling down DMCA hell.

Fair use? When it comes to monetization, “fair” means $$$$$.

The Need For Sentinel

Many will point out that much of Sentinel’s technology overlaps already existing products. Copyscape already provides algorithm matching of text, Feedburner already helps detect RSS scraping and Google Alerts can provide automated checking for duplicate content.

It seems, on the surface at least, that much of Sentinel’s functionality has already been filled. However, the potential for Sentinel, and why I am excited about it, isn’t because it can replace those services, but because it can fill in the holes that they leave behind.

First off, as I discussed previously, Copyscape is ill-targeted at bloggers. Its reliance on the Google database gives it only limited usefulness in the rapid-fire world of blogging. Delays in updates to the Google database blunt its effectiveness. Also, since many splogs and scrapers are blacklisted from Google, some of the worst offenders may not show up in Google at all.

Though Sentinel will be limited in that it will only check for plagiarism once every so often, its checks will be for the content immediately available, using RSS feeds to pull the latest versions of all blogs. Also, Copyscape searches are not automated and will only provide the top ten results. Considering the high rate of false positives with the service, that could leave the vast majority of misuse undetected unless you pay for the Copysentry service, which only protects ten pages at the most basic level.

(Note: There is no Google API for its blog search tool so, as of yet, no outside service can search through it. There is, however, a Technorati API.)

Google Alerts, while automated, will share many of the same problems. Also, setting up a GA for each blog entry is a time-consuming process that doesn’t mesh well with the nature of blogging. Since the automatic generation of Google Alerts is prohibited at this time, there’s no way to integrate GAs into blogging products. Also, GAs only (reliably) detect full fledged copy and paste jobs and have no ability to detect partial reuse and/or modified content.
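
To make “partial reuse” concrete, here is a minimal sketch of one common way to spot it: compare overlapping runs of words (“shingles”) between your post and a suspect page. This is only an illustration, not a description of how Sentinel, Copyscape or Google Alerts actually work, and the function names and the five-word window are my own assumptions:

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def reuse_score(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect."""
    source = shingles(original, size)
    if not source:
        return 0.0
    return len(source & shingles(suspect, size)) / len(source)
```

A score near 1.0 suggests a full copy and paste job, a score somewhere in the middle suggests that chunks were lifted, and a score of 0.0 means no five-word run is shared. Lightly reworded copies will score lower than they deserve, which is exactly the weakness described above.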

Finally, Feedburner, though providing valuable feed statistics and some impressive tools to deal with RSS scraping, has severe limitations. Since it can only detect reuse of your feed, it’s possible for scrapers to grab from other sources, such as Technorati watchlists and your site’s original, unprotected feed, without detection. I’ve noticed at least a few sploggers scrape some or all of my content without Feedburner noticing.

In that regard, Feedburner might be seen as a complement to Sentinel. Feedburner detects most traditional scraping immediately and Sentinel, hopefully, will be able to pick up the rest.

The bottom line is that, while Sentinel may overlap existing technologies, it also fills gaping holes that they’ve left behind, offering a layer of protection unlike anything bloggers have seen before.

Conclusions

In the end, bloggers will have to decide whether or not they want to use Sentinel. However, since the basic version of Sentinel will be free, there will be little reason not to.

If it turns out as it appears it will, it will likely serve the merely curious, the protectors of copyright and the copyleft crowd alike. Anyone who is remotely interested, for any reason, in where their content is being reused will likely find something to smile about when using Sentinel.

But in terms of pure copyright protection, Sentinel will likely be very hard to beat, both for its brain and for its immediacy. It will be very interesting to see if and how Sentinel affects content reuse, both legitimate and illegitimate, after it is released.

Until then though, anyone who is interested in Sentinel should visit the Blogwerx site and add their email address to the list of potential beta testers.

It should be a very interesting launch.

Kevin Rose Wants You to Avoid Using the Name “Digg”

Still on intellectual property, I didn’t know you could trademark common-sounding names and prevent other people from using variations. This is much like the variations of “Google” (which may include “Go Ogle”), the owners of which Google has apparently been suing successfully. However, in this case, it’s not some obscure made-up name. “Digg” sounds very much like “dig.” And yes, that’s a dictionary word to you illiterate scum.

Kevin Rose (poster boy for Web 2.0) doesn’t want you to use the word “digg” on your website names or domains.

We don’t want to shut anyone down (not even the clone sites), all we ask is that you avoid using the name ‘digg’ in your website names/domains. We’re looking to see if we have any other options.

No matter how nicely you try to put it, Kevin, it sounds like “lay off our name” to us.

So I guess if we trademark the name “Jack of all blogs,” we could legally prevent anyone from setting up a site named “Jack off all blogs”.