A new study suggests that vast amounts of internet content are disappearing, challenging the common perception of the web as a permanent repository of information. The research, published by the Pew Research Center, indicates that significant swathes of web pages and online content have been lost over the past decade.
According to the study, 38 percent of the web pages that existed in 2013 are no longer available. Even recent content is prone to disappearing: eight percent of pages from 2023 are already inaccessible. In most cases the loss occurs when individual pages are deleted or moved within otherwise operational websites, rather than when entire websites go offline.
The loss of content has affected various sectors, including news and government sites. Notably, 23 percent of the news pages analyzed contain at least one broken link, as do 21 percent of the government web pages. Wikipedia, a widely used reference website, also suffers from this problem, with 54 percent of its pages having at least one dead link in their reference sections.
The vanishing content is not confined to web pages alone; social media platforms are also affected. The study found that approximately 20 percent of tweets disappear from the platform within a few months of being posted.
Researchers compiled their findings by examining nearly a million web pages using Common Crawl, a service that archives significant portions of the internet. They then tracked the availability of these pages from 2013 to 2023. Their analysis revealed that 25 percent of archived pages within that timeframe are no longer accessible. That total breaks down into two groups: 16 percent of all pages are missing from websites that still exist, while 9 percent belonged to websites that are now defunct.
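The split between pages missing from live websites and pages lost with defunct websites can be illustrated with a short Python sketch. It is not the researchers' actual pipeline: the function names, the HEAD-request check, and the rule that a site counts as "existing" when its root URL still responds are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable
from urllib.parse import urlsplit
import urllib.request


def http_reachable(url: str, timeout: float = 10.0) -> bool:
    """Assumed check: a page is reachable if a HEAD request succeeds.

    urlopen raises an error for 4xx/5xx responses and for network
    failures, so any exception here is treated as "not reachable".
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        return False


def root_url(url: str) -> str:
    """Return the scheme and host of a URL, e.g. https://example.org/."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def classify_page(url: str, is_reachable: Callable[[str], bool] = http_reachable) -> str:
    """Label a page as accessible, page_missing, or domain_missing."""
    if is_reachable(url):
        return "accessible"
    # The page is gone; check whether the site it lived on still answers.
    if is_reachable(root_url(url)):
        return "page_missing"
    return "domain_missing"


def summarize(urls: Iterable[str],
              is_reachable: Callable[[str], bool] = http_reachable) -> Dict[str, float]:
    """Return the percentage of URLs that fall into each category."""
    labels = ("accessible", "page_missing", "domain_missing")
    counts = Counter(classify_page(url, is_reachable) for url in urls)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in labels}
    return {label: 100 * counts[label] / total for label in labels}
```

Under these assumptions, the study's 25 percent figure corresponds to the sum of the two failure categories (16 percent "page_missing" plus 9 percent "domain_missing"). The `is_reachable` parameter lets a caller substitute a different availability check, such as a lookup against saved crawl results, without touching the classification logic.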