Most data offered online is meant for public consumption, and scraping such data is generally legal. However, not all online data falls into that category. The law restricts access to and use of information on restricted sites, copyrighted material, and geo-specific information.
Data scraping is the automated use of scripts or software to crawl websites and extract data from them. You may worry that the information your business needs to scrape could land you in trouble, especially since some website owners find the thought of others scraping their sites unacceptable.
When Is Data Scraping Unethical but Not Illegal?
Sometimes the manner of web scraping becomes offensive, for example when the scraper sends too many requests to the site. How many are too many? If the request rate far exceeds what a normal human visitor would generate, the traffic starts to resemble a bot attack. Such traffic can overload a website and compromise its performance and security.
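If you run your own scraper, one simple way to keep its request rate closer to human browsing is to enforce a minimum pause between requests to the same site. The Python sketch below is illustrative only: the `PoliteThrottle` class and any interval you pass to it are hypothetical choices, not a legal or industry standard. The class accepts injectable clock and sleep functions so it can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Enforce a minimum interval between consecutive requests to one site.

    The interval is a hypothetical, self-chosen value; no law defines a
    fixed "reasonable" request rate.
    """

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock   # returns the current time in seconds
        self._sleep = sleep   # pauses for the given number of seconds
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Pause until the next request is allowed; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now  # record when this request was released
        return slept
```

In a scraping loop you would create one throttle per site, for example `throttle = PoliteThrottle(2.0)`, and call `throttle.wait()` immediately before each request. If a site publishes a `Crawl-delay` value in its robots.txt file, using that value as the interval is a sensible starting point.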
Data scraping lets companies procure the data they need, usually for financial gain. Website owners find it unfair that others should scrape their data and profit from it without paying for it.
Also, some websites have security and privacy measures that prohibit scraping. Some users find ways around such measures, for example by using VPNs and proxies, and acquire the information anyway. The data may not be entirely private, but the company may fear that it will be misused.
In more aggressive cases, a user can infringe copyright by accessing and downloading protected content. The caveat is that publicly available data that is not copyrighted should not be off-limits to scraping.
Nevertheless, many website owners engage in data scraping themselves; the wise approach is to understand the legal implications first.
Court Ruling on Data Scraping
Data scraping usually happens without the explicit approval of the data owner. In September 2019, the US Court of Appeals for the Ninth Circuit in San Francisco ruled that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act (CFAA), the country's main federal anti-hacking law.
The case, hiQ Labs v. LinkedIn, began after LinkedIn, which Microsoft owns, sent the data analytics company hiQ Labs a cease-and-desist letter demanding that it stop scraping public LinkedIn profiles. hiQ sued, and the court sided with hiQ, noting that the data it collected was in the public domain.
The ruling, seen as a win for an open internet, clarified that the CFAA targets hacking, meaning intrusion into a computer without permission, rather than access to pages anyone can view. The court also noted that the profile data belonged to the users who posted it, not to LinkedIn.
Illegal Data Scraping
Although there are many legitimate reasons for web scraping, some scraping practices are neither harmless nor legal.
The following practices can amount to illegal data scraping and could bring legal problems to your business:
- Copying data that a company holds copyright over and using it for commercial purposes
- Intentionally ignoring the rules laid down in the site owner's robots.txt file, or failing to ask the owner for permission to scrape the site
- Violating the CFAA, for example by accessing data without authorization, and using it for commercial gain
- Ignoring reasonable scraping rates, so that your business sends a website or server so many frequent requests that the traffic resembles a bot attack
- Crawling a site whose Terms of Service clearly prohibit it
- Accessing data in a restricted area that is not in the public domain, then using it for financial gain or republishing it
- Using an API other than the one the server owner provides, and in the process infringing copyright or damaging the website
Avoiding Legal Problems When Data Scraping
Your business needs data scraping to grow its reach, develop marketing strategies, and support daily management tasks such as managing employees and inventory.
To avoid liability, scrape only sites whose owners do not prohibit web crawling. Similarly, seek permission from the server's owner if your organization needs more than the information offered to the public.
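Python's standard library can check a site's robots.txt rules before you fetch a page. The sketch below uses `urllib.robotparser.RobotFileParser`; the robots.txt content, the `example.com` URLs, and the `ExampleResearchBot` user-agent are hypothetical, and the rules are parsed from a string so the example runs offline. Keep in mind that robots.txt only expresses the owner's crawling preferences; it does not replace reading the Terms of Service or asking for permission.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(SAMPLE_ROBOTS_TXT)
    agent = "ExampleResearchBot"  # hypothetical user-agent string
    for url in ("https://example.com/products", "https://example.com/private/data"):
        print(url, "allowed" if may_fetch(rp, agent, url) else "disallowed")
    print("crawl delay:", rp.crawl_delay(agent))
```

Against a live site you would instead call `set_url()` with the site's robots.txt address and then `read()`, which downloads and parses the file before you call `can_fetch()`.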
If you access a third party's server to spam its owner, crack passwords, or harvest email addresses, you will invite legal action.
Use VPNs and APIs only within the site's stated rules, and avoid damaging other companies' websites. If the data you collect could give your business a financial advantage, contact the owner to discuss compensation where applicable.
A lawsuit over subversive business practices damages your company's image. Moreover, success in online business depends on integrity as a sign of credibility. Play clean, since the internet never forgets.