Daily Guardian Europe
Lifestyle

Why news publishers are blocking AI from accessing internet archives

By staff | May 1, 2026 | 4 Mins Read

Around 245 news organisations across nine countries are attempting to block the Internet Archive’s crawlers. These are automated software bots that capture and archive content from web pages so that it can be displayed in the Internet Archive’s public-facing interface, the Wayback Machine.

The Archive holds over one trillion web pages dating all the way back to 1996, making it one of the biggest collective public information resources in the world. This includes past articles from major news organisations such as CNN, The New York Times, The Guardian, and USA Today.

These web pages are used for a variety of purposes, for example as primary sources for historians, or to document changes made to articles after publication.

Several news organisations are now pushing to block the crawlers because AI companies are using the contents of the Archive to train Large Language Models (LLMs) without offering fair payment or acquiring permission.

More than 20 major news organisations already block ia_archiverbot, the main web crawler the Internet Archive uses for the Wayback Machine, according to an analysis by AI-detection company Originality AI.

More broadly, 241 news sites worldwide block at least one of the Archive’s four crawling bots. A large share of these sites is owned by USA Today Co, the US’s biggest newspaper publisher. This means that hundreds of local publications have been practically removed from the historical record.
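
To illustrate what blocking a crawler can look like in practice, here is a minimal sketch in Python. It assumes the blocking is done through a site's robots.txt file, the usual way for a website to tell named bots to stay away; the article does not specify the mechanism each publisher uses. The robots.txt content and the `news.example` address are invented for illustration, the `is_blocked` helper is hypothetical, and the crawler name is the one given above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (an assumption for
# illustration, not copied from any real publisher).
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiverbot
Disallow: /

User-agent: *
Allow: /
"""

# Invented article address on a placeholder domain.
ARTICLE_URL = "https://news.example/2026/05/01/story.html"


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The named archive crawler is turned away; any other bot is allowed in.
print(is_blocked(SAMPLE_ROBOTS_TXT, "ia_archiverbot", ARTICLE_URL))
print(is_blocked(SAMPLE_ROBOTS_TXT, "SomeOtherBot", ARTICLE_URL))
```

A robots.txt rule is a request rather than a technical barrier: it only works if the crawler chooses to honour it.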

The risks of archival content being used to train AI

Archival news content provides massive quantities of high-quality text and images that can be used to train large-scale AI models to write in a more human way. The content is available through URLs and through an API, an interface that lets different software systems communicate with each other and request data, acting as a bridge between them.

This makes it even easier for AI companies to access archived data and train models.

Another advantage is that content in the Internet Archive is already structured, attributed and dated.
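
As a rough sketch of what access "through URL and API" can mean, the Python snippet below builds two addresses: a Wayback Machine snapshot URL and a query to the Internet Archive's public availability endpoint. It only constructs the addresses and makes no network request. The `news.example` page and the two helper functions are hypothetical, and the article does not say which interfaces AI companies actually use.

```python
from urllib.parse import urlencode

WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Address of the capture of page_url closest to timestamp (YYYYMMDDhhmmss)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query(page_url: str, timestamp: str) -> str:
    """Request URL asking which capture of page_url is closest to timestamp."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


# news.example is a placeholder domain, not a real publisher.
print(snapshot_url("https://news.example/story.html", "20260501000000"))
print(availability_query("news.example/story.html", "20260501"))
```

Because every capture carries its original address and a timestamp in the URL itself, a program can request dated copies of a page in bulk without any manual browsing.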

Much of the Internet Archive’s data has already been found in key AI-training datasets. This is a major concern for news organisations, which are already suing AI companies such as Perplexity and OpenAI over alleged copyright violations.

“The issue is that Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us,” Graham James, a spokesperson for The New York Times, said, as cited by The Next Web.

“The Times invests an enormous amount of resources in producing original journalism, and that work should not be used without our permission.”

Other organisations, such as The Guardian, have taken a more conservative approach by limiting, rather than completely blocking, the Archive’s access.

Internet Archive maintains that it is “collateral damage”

The Wayback Machine’s director, Mark Graham, has maintained that the Archive is merely “collateral damage” and that the real culprits are the AI companies which access past content through the Archive’s interfaces.

However, the Archive has taken measures of its own to limit this. These include preventing large downloads of some site materials and limiting automated extraction in certain cases.

Graham highlighted that the Archive functions as a key method of preservation. Without it, articles which are not archived can be edited without authorisation or accountability. Such edits can be anything from changing or removing quotes and amending mistakes to redirecting claims and official statements.

Currently, these changes are tracked by the Wayback Machine.
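
The idea of tracking changes between two captures of a page can be sketched with Python's standard `difflib` module. The two versions below are invented text, and the `changed_lines` helper is hypothetical; this shows the general principle of comparing snapshots, not how the Wayback Machine itself compares them.

```python
import difflib

# Hypothetical text of one article as captured on two different dates
# (invented for illustration, not taken from any real publication).
captured_earlier = [
    "The minister said the plan would cost 500 million euros.",
    "Officials declined to comment.",
]
captured_later = [
    "The minister said the plan would cost 700 million euros.",
    "Officials declined to comment.",
]


def changed_lines(old: list[str], new: list[str]) -> list[str]:
    """Return only the lines removed ('- ') or added ('+ ') between versions."""
    return [
        line
        for line in difflib.ndiff(old, new)
        if line.startswith(("- ", "+ "))
    ]


# Prints the old sentence marked '-' and its replacement marked '+'.
for line in changed_lines(captured_earlier, captured_later):
    print(line)
```

Without an earlier capture to compare against, a silent edit like the changed figure here would leave no visible trace.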

This has led some news organisations to try to work with the Internet Archive on acceptable compromises or workarounds which involve limiting access rather than imposing hard blocks.

Separately, non-profit digital rights advocacy group Fight for the Future has launched a petition, already signed by 100 current journalists, to protest against the blocking, which comes at a time when public records and history are increasingly contested.
