A lot of news websites restrict any crawler other than Google. And this does not...

simias · on March 26, 2021

Indeed, years ago I had scripts to automatically fetch URLs from IRC, and I quickly realized that if I didn't spoof the user agent of a proper web browser, many websites would reject the request. Googlebot's UA worked just fine however.
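
For anyone curious what that looks like, here is a rough sketch using only the Python standard library. The UA string, the `build_request` and `fetch_head` names, and the 64 KiB read limit are my own illustrative choices, not what the original scripts used.

```python
import urllib.request

# Assumption: any realistic desktop browser UA string would do; this one is just an example.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"


def build_request(url, user_agent=BROWSER_UA):
    """Build a request that sends a browser-like User-Agent header
    instead of the default "Python-urllib/x.y"."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_head(url, timeout=10.0, max_bytes=65536):
    """Fetch at most max_bytes of the response body (enough to find a <title>)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read(max_bytes)
```

Only `fetch_head` touches the network; `build_request` just sets the header, so it is easy to check in isolation.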

judge2020 · on March 26, 2021

> Googlebot's UA worked just fine however

They obviously don't care enough, then - Google says you should use reverse DNS (rDNS) to verify that Googlebot crawls are real[0]. Cloudflare now does this automatically as well for customers with the WAF (Pro plan).

0: https://developers.google.com/search/docs/advanced/crawling/...
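
A minimal sketch of that rDNS check, as I understand the procedure: reverse-resolve the client IP, check that the hostname falls under a Google domain, then forward-resolve the hostname and confirm it maps back to the same IP. The function names are hypothetical, and the domain suffixes are an assumption that should be checked against Google's current documentation.

```python
import ipaddress
import socket

# Assumption: suffixes per Google's published guidance; verify against the current docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_is_google(hostname):
    """True if hostname is a subdomain of one of the expected Google domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def _reverse(ip):
    # PTR lookup; raises socket.herror (an OSError) if there is no record.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname):
    # A/AAAA lookup; returns the set of address strings the name resolves to.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-then-forward DNS check. The lookups are injectable for testing."""
    reverse_lookup = reverse_lookup or _reverse
    forward_lookup = forward_lookup or _forward
    try:
        target = ipaddress.ip_address(ip)
        hostname = reverse_lookup(ip)
        if not hostname_is_google(hostname):
            return False
        # The forward lookup must map back to the original IP,
        # otherwise anyone controlling their own PTR record could pass.
        return any(ipaddress.ip_address(a) == target for a in forward_lookup(hostname))
    except (OSError, ValueError):
        return False
```

The forward lookup is the important part: a PTR record is controlled by whoever owns the IP block, so the reverse lookup alone proves nothing. A real deployment would also want to cache results rather than doing two DNS lookups per request.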