Daily Guardian Europe

Why news publishers are blocking AI from accessing internet archives

By staff | May 1, 2026 | 4 min read

Around 245 news organisations across nine countries are attempting to block the Internet Archive’s crawlers. These are automated software bots that capture and archive web pages, which are then made available through the Internet Archive’s public-facing interface, the Wayback Machine.

The Archive holds more than one trillion web pages dating back to 1996, making it one of the largest public information resources in the world. This includes past articles from major news organisations such as CNN, The New York Times, The Guardian and USA Today.

These archived pages serve a variety of purposes: historians use them as primary sources, for example, and others use them to show how an article was changed after publication.

Several news organisations are now pushing to block the crawlers because AI companies are using the contents of the Archive to train large language models (LLMs) without fair payment or permission.

More than 20 major news organisations already block ia_archiverbot, the main web crawler the Internet Archive uses for the Wayback Machine, according to an analysis by AI-detection company Originality AI.

In total, 241 news sites worldwide block at least one of the Archive’s four crawling bots. A large share of these sites is owned by USA Today Co, the biggest newspaper publisher in the US. As a result, hundreds of local publications have effectively been removed from the historical record.

The risks of archival content being used to train AI

Archival news content provides vast quantities of high-quality text and images that can be used to train large-scale AI models to write in a more human way. This content is also accessible through URLs and an API (application programming interface), which lets different pieces of software communicate and request data, acting as a bridge between systems.

This makes it even easier for AI companies to access archived data and train models.

Another advantage for AI developers is that content in the Internet Archive is already structured, attributed and dated.

Much of the Internet Archive’s data has already turned up in key AI training datasets. That is a major vulnerability for news organisations, some of which are already suing AI companies such as Perplexity and OpenAI over alleged copyright violations.

“The issue is that Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us,” said Graham James, a spokesperson for The New York Times, as cited by The Next Web.

“The Times invests an enormous amount of resources in producing original journalism, and that work should not be used without our permission.”

Other organisations, such as The Guardian, have taken a more measured approach, limiting the Archive’s access rather than blocking it completely.

Internet Archive maintains that it is “collateral damage”

The Wayback Machine’s director, Mark Graham, has maintained that the Archive is merely “collateral damage” and that the real culprits are the AI companies that access past content through the Archive’s interfaces.

Even so, the Archive has taken measures of its own to limit such use, including preventing large downloads of some sites’ material and restricting automated extraction in certain cases.

Graham stressed that the Archive is a key means of preservation. Without it, articles that have not been archived can be edited without authorisation or accountability, whether that means changing or removing quotes, amending mistakes, or altering claims and official statements.

Currently, these changes are tracked by the Wayback Machine.

This has led some news organisations to work with the Internet Archive on acceptable compromises or workarounds that limit access rather than imposing hard blocks.

Meanwhile, the non-profit digital rights advocacy group Fight for the Future has launched a petition protesting the blocking, which has already been signed by 100 working journalists. The petition comes at a time when public records and history are increasingly contested.

© 2026 Daily Guardian Europe. All Rights Reserved.