Protect images from copyright theft
If you have valuable images on your website, you may be wondering how to prevent other people from saving and using your images. Scraping, leeching and general reuse of images without permission is a problem when you've invested time and energy in creating those images.
Unfortunately, protecting website images from copyright theft is almost impossible. By its nature, the web is an open platform for sharing content. Once an image has been received by a person's browser, it exists in their browser cache and there are ways to access it.
However, there are steps you can take to make it harder for people or bots to get hold of your copyrighted images.
1. Use rate limiting to slow down scraping bots
Scraping is a technique performed by bots to go through a website page by page and save information from it. To protect your images from being scraped by a bot, identify the bots and block them from accessing your files.
A very determined scraper could try to avoid detection, but you can discourage most bots by making scraping hard for them. It's important to do this in a way that distinguishes between genuine traffic and scraping traffic, to avoid blocking real users and search engines.
Rate limiting can be configured on your server to limit the number of web pages served to IPs displaying suspicious activity. Learn more in this Guide to preventing Webscraping.
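As a rough illustration of the idea, here is a minimal sliding-window rate limiter in Python. It is only a sketch: the class name, the default limits and the injectable clock are assumptions made for this example, and in practice rate limiting is usually configured in the web server or CDN rather than written by hand.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `max_requests` per client IP within `window_seconds`.

    Hypothetical helper for illustration; the limits are arbitrary defaults.
    """

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        """Return True if the request should be served, False if throttled."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A server using something like this would call allow() with the client IP for each request and respond with HTTP 429 (Too Many Requests) when it returns False. Limits should be generous enough that real visitors and search engine crawlers are not affected.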
2. Block scraping bots
Sirv blocks all known scraping bots automatically. However, new bots can be created and existing bots can camouflage their activity. To stay ahead of the bots, check the top referral domains shown on your Analytics page to identify any IPs and user agents that are requesting an abnormally high number of files. Investigate their authenticity and report them to Sirv if you believe there may be misuse.
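If you also have your own access logs, the same check can be automated. The sketch below assumes you can export requests as (IP, user agent) pairs; the function name and the thresholds are hypothetical choices for this example, not part of Sirv's analytics.

```python
from collections import Counter
from statistics import median


def find_heavy_clients(requests, factor=10, min_requests=100):
    """Return (client, count) pairs for clients with unusually many requests.

    `requests` is an iterable of (ip, user_agent) tuples, one per file request.
    A client is flagged when it made at least `min_requests` requests and at
    least `factor` times the median request count across all clients.
    """
    counts = Counter(requests)
    if not counts:
        return []
    typical = median(counts.values())
    return [
        (client, n)
        for client, n in counts.most_common()  # highest counts first
        if n >= min_requests and n >= factor * typical
    ]
```

A flagged client is only a candidate for investigation. A legitimate search engine crawler can also produce a high request count, so check who the client is before blocking or reporting it.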
Your Top 10 referral domains table looks like this:
3. Block good bots
Bots should follow the rules in your website's robots.txt file. Bad bots won't, but good bots will.
Tell bots not to index any of your images by disallowing specific file type extensions in your robots.txt file:
User-agent: *
Disallow: *.jpg
Disallow: *.png
Disallow: *.gif
If you would like major search engines such as Google Images to index your images, you can disallow all bots by default and specifically allow certain bots, like so (Slurp is Yahoo's bot):
User-agent: Googlebot
Allow: /

User-agent: Googlebot-image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: msnbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: *
Disallow: /
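Before publishing an allow-list like this, it can help to check how a standards-following bot would read it. The sketch below uses Python's standard urllib.robotparser module; the is_allowed helper, the example.com URL and the "ExampleScraper" user agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The allow-list rules from above, one directive per line.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Googlebot-image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: msnbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    image = "https://example.com/images/photo.jpg"
    for bot in ("Googlebot", "Bingbot", "ExampleScraper"):
        print(bot, is_allowed(bot, image))
```

Note that this parser matches paths by prefix and does not interpret wildcard patterns such as *.jpg, so it is suited to checking the allow-list above rather than the file extension rules. It also only tells you what a well-behaved bot would do; bad bots ignore robots.txt entirely.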
4. Watermark your images
Brand your images by placing a visible logo or text watermark on them. This is easy with Sirv. The location, style, opacity and size are all customizable. Watermarks can be applied either via the URL or a profile (recommended if you have many images).
Credit: Will Copelake & Sidetracked
5. Invisibly watermark your images
To avoid obscuring your images with a visible watermark, you can apply a hidden watermark. Digimarc provides image watermarking that humans cannot see but software can. Sirv can automatically apply Digimarc to your images, which will then be identified by Digimarc in future, if they are served to another website. Digimarc is a paid, third-party service starting at $99/year. Create a Digimarc account.
6. Disable meta stripping
One of the many optimizations Sirv makes to serve images as fast as possible is to remove metadata. If your metadata contains copyright or other important information, you can set Sirv to keep the metadata in your optimized images. It won't stop people using images without your permission, but it can help people find out who owns the image so they can ask to license it from you.
Simply set the "Strip image metadata" option to "Disabled" in your Default profile: