Imagine a world with unlimited access to any and all kinds of information. There’d be no barriers to learning. Books, music, video, software and other media would be freely available to everyone. Brewster Kahle dreams about creating such a world. He believes it’s possible and is taking steps toward that goal.
“Universal Access to All Knowledge” was the title of Kahle’s keynote presentation at the Dodging the Memory Hole 2017 forum Nov. 15-16 at the Internet Archive in San Francisco. As the Internet Archive’s founder and digital librarian, Kahle spoke to archivists, journalists, librarians, scholars and other stakeholders who had gathered to focus on how best to save online news. Providing long-term access to all kinds of digital information, including past versions of websites, is vital to news organizations and society at large, Kahle noted.
This was the fifth event in the DTMH series, made possible by a grant from the Institute of Museum and Library Services and funding from the Donald W. Reynolds Journalism Institute. Since its beginnings in 1996, the Internet Archive has become an epicenter of digital preservation. The driving force behind the organization is Kahle’s vision to “fulfill the promise of the web.”
Kahle reflected on that ideal: “The goal of the internet that I signed onto was to try to provide universal access to all knowledge. Could we build the Library of Alexandria version two? Could we make it so that all the published works of humankind could be available to anybody curious enough to want to have access to it? The answer is — technologically — absolutely.”
That ambitious goal would include news content, and Kahle called for ideas that would strengthen journalism and the communities that depend on news organizations for accurate information. “Often the best information is not there, or if it is there you can’t find it, or if it’s there, it’s not there tomorrow,” he said.
A critical part of the infrastructure Kahle described is archiving the pages of the web, especially sites that offer news content. Since 1996, more than 308 billion online pages have been captured and made accessible on the Internet Archive’s Wayback Machine, which made its debut in 2001. In addition, the Internet Archive launched the TV News Archive in 2012 as a searchable collection of video clips from TV news programs, now numbering some 1.4 million, gathered from 60 channels and reaching back as far as 2000. Short segments of the clips can be inserted into news articles and streamed directly from the archive’s servers, making it relatively easy to share TV news online.
Universal access to information online would benefit journalism, according to Kahle. It would give reporters greater access to information and make it easier for them to cite the sources behind their research. More citations could lend greater transparency and credibility to news stories, one possible way to distinguish quality journalism from fake news.
The Wayback Machine’s snapshots of webpages can also serve journalists. These archived pages are useful as permanent historical records of an ever-changing web. For example, if a politician or government agency changes position on an important or highly charged issue, they may simply change or remove the problematic information on their own webpages, leaving no evidence of the shift. “We want to make it so that you can’t just blink things off the net and put it down the memory hole,” said Kahle.
Journalists can also use the Wayback Machine to see how online information evolves over time. For example, during the DTMH forum, New York Times writer Katie Hafner described how she used archived pages from the Wayback Machine to document ways some dermatology clinics have shifted their business model toward questionable skin cancer treatments for seniors over a period of several years.
A major impediment to Kahle’s vision of universal access to all knowledge is copyright. He calls this “getting past the 1923 problem,” a reference to the year that currently divides works in the public domain – if published prior to 1923 – from those published after that date that may still be protected under copyright law.
For the past six years, the Internet Archive’s Open Library initiative has used the “digitize and lend” model as a way to accommodate copyright while still providing access to books that it has scanned and made available online. “It’s basically trying to be respectful to the publishers, the authors and their lawyers. The idea is to wave a wand over all libraries and to turn their book collections into digital books. Then a patron can go and request either the physical book or the digital book – one or the other and one reader at a time. Since we’ve been sort of keeping it to one reader at a time, nobody has gotten mad,” said Kahle.
Even so, there’s still a long way to go before Kahle and the Internet Archive achieve the goal of universal access to all knowledge. Although the Google Books lawsuit established that scanning books is legal under the fair use doctrine of copyright law, Google can show only text snippets from the books. As of 2015, the Google Books project had digitized about 25 million out of approximately 130 million titles worldwide. Even if all those volumes were scanned, it seems highly unlikely that publishers, authors and other content creators would allow their copyrighted works to be published online for all to access for free. After all, that’s how they make their living.
On the other hand, the progress Kahle has made toward his goal is impressive. Through the Wayback Machine, the TV News Archive and the Open Library initiative, an incredible amount of information, totaling 35 petabytes and growing, is currently available from the Internet Archive at no cost. In addition, Kahle’s organization is digitizing millions of items, including radio programs, movies on VHS tapes and even old 78 rpm records, in order to make more of them publicly accessible. Kahle has also made it easy for anyone with an internet connection to contribute content to the Internet Archive collections. If people want to help, he said, “go to archive.org, find the upload button and hit it. Add something.”