OpenAI and Anthropic are ignoring robots.txt (businessinsider.com)
18 points by Handy-Man on June 21, 2024 | 6 comments



robots.txt is a suggestion, not a rule.


But terms of service are a rule, and robots.txt is usually a machine-readable representation of the terms of service.
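To make "machine-readable" concrete, here is a minimal sketch of how a well-behaved crawler could check robots.txt before fetching, using Python's standard urllib.robotparser. The robots.txt content, the "ExampleBot" token, and the example.com URLs are all made up for illustration; nothing here reflects what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleBot" is an illustrative token,
# not the name of any real crawler.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for url in ("https://example.com/articles/1",
                    "https://example.com/private/x"):
            print(agent, url, is_allowed(agent, url))
```

The check is entirely voluntary on the crawler's side: nothing stops a client from skipping it, which is the "suggestion" part of the argument above.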


What do you mean by "rule"? Depending on the exact circumstances, violating terms of service or ignoring robots.txt may not be a violation of criminal law or create any civil liability. In particular, scraping public data is generally legal under the CFAA regardless of robots.txt content.

https://newmedialaw.proskauer.com/2022/05/24/doj-revises-pol...

As a practical matter, if web site owners don't like particular HTTP requests then they can just ignore them or return errors or junk responses.
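A rough sketch of the "return errors" option, written as a bare WSGI app using only the Python standard library. The blocked token "examplebot" and the response bodies are assumptions for illustration. Matching on the User-Agent header only works against clients that identify themselves honestly, since the header is trivially spoofed.

```python
# Hypothetical list of crawler tokens the site owner wants to refuse.
BLOCKED_AGENT_SUBSTRINGS = ("examplebot",)


def app(environ, start_response):
    """WSGI app that returns 403 to blocked user agents, 200 otherwise."""
    user_agent = environ.get("HTTP_USER_AGENT", "").lower()
    if any(token in user_agent for token in BLOCKED_AGENT_SUBSTRINGS):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


if __name__ == "__main__":
    # Serve locally for manual testing; port 8000 is an arbitrary choice.
    from wsgiref.simple_server import make_server
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

In practice this kind of filtering is often done at the reverse proxy or CDN rather than in the application, but the logic is the same.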


That's a cool article to read. It explicitly wonders whether robots.txt is enough to revoke authorization. And it seems like the DoJ does allow itself to consider the different blocking mechanisms used (including robots.txt) when deciding whether to prosecute.

The DoJ is explicit in saying that something like a cease-and-desist letter is enough, so if, for example, the NYT found OpenAI's bot and sent one, then continued scraping would likely be prosecutable.


Title editorialized due to being too long



