Misinformation sites have an open-door policy for AI scrapers
Fast Company
AI models have a voracious appetite for data, and keeping the information they present to users current is a challenge. So companies at the vanguard of AI appear to have hit on an answer: crawling the web, constantly.

But website owners increasingly don’t want to give AI firms free rein. So they’re regaining control by cracking down on crawlers.

To do this, they’re using robots.txt, a file hosted on many websites that tells web crawlers how they may—or may not—scrape its content. Originally designed to signal to search engines whether a site wanted its pages indexed, it has gained new importance in the AI era as some companies allegedly flout its instructions.
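As a sketch, a robots.txt file that blocks AI crawlers while leaving the site open to everyone else might look like this. The user-agent tokens shown (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's AI training) are ones those operators have published, but this is an illustrative example, not a complete or current blocklist:

```
# Example robots.txt: disallow selected AI training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers may access everything
User-agent: *
Allow: /
```

Note that robots.txt is purely advisory: it expresses the site owner's wishes, and compliance depends entirely on the crawler choosing to honor it.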
