Being Google or Microsoft (or Microsoft-affiliated) has its perks.
The laws around scraping content and using that data for derivative works are incredibly nuanced. This article is the best up-to-date overview of the state of the industry [1].
TL;DR - IANYL. If you have enough money for a legal defense and you are scraping publicly available content that is not behind a login gate, it's probably fine and defensible, but it will cost an unbelievable amount of time and money to defend.
I wonder how they licensed all those websites that had no license information, which makes them copyrighted by default.