Misinformation sites have an open-door policy for AI scrapers
Fast Company
AI models have a voracious appetite for data, and keeping the information they present to users current is a constant challenge. So companies at the vanguard of AI appear to have hit on an answer: crawling the web, constantly.
But website owners increasingly don’t want to give AI firms free rein. So they’re regaining control by cracking down on crawlers.
To do this, they’re using robots.txt, a file hosted on many websites that tells web crawlers which content they are, or are not, permitted to scrape. Originally designed as a signal to search engines about whether a site wanted its pages indexed, the file has gained new importance in the AI era, as some companies allegedly flout its instructions.
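To see how this works in practice, here is a minimal sketch using Python's standard-library robots.txt parser. The file contents and the example.com URL are hypothetical, but GPTBot (OpenAI) and Googlebot are real, documented crawler user-agents; a rule-abiding crawler checks the file before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks OpenAI's GPTBot crawler
# while leaving the site open to everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler asks before fetching:
print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

The catch, as the article notes, is that robots.txt is purely advisory: nothing technically prevents a crawler from ignoring the answer and fetching the page anyway.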