Digital Dark Age: Information Explosion and Data Risks
“Old formats of documents that we’ve created or presentations may not be readable by the latest version of the software because backwards compatibility is not always guaranteed,” says Vint Cerf, a Google vice president and one of the fathers of the Internet.
The term “digital dark age” describes the concern that the rapid evolution of technology will eventually make storage formats obsolete, leaving data inaccessible to generations to come. It’s easy to assume that the data we store will somehow be preserved forever, but nothing guarantees it. Vint Cerf calls this phenomenon ‘bit rot’.
Evolution of Digital Storage
Magnetic tape was the first storage medium to revolutionize the digital industry. It was first introduced in 1928. Storage of 1024 bits of information was successfully implemented in 1948 using electrostatic cathode ray tubes called Williams tubes. Magnetic tape was first used to record computer data in 1951 on the UNIVAC I. Over the years, however, magnetic tape can deteriorate through sticky-shed syndrome, in which the binder of the tape absorbs moisture, rendering the tape unusable.
The IBM 350, introduced by IBM in 1956, was the first disk drive and was the size of a large wardrobe. It had a storage capacity of 3.75MB. In 1966, Robert H. Dennard invented Dynamic Random Access Memory (DRAM), whose memory cells store bits of information as an electric charge in a circuit. Over time, storage devices began to shrink. The world’s first 2.5-inch HDD was introduced in 1988, and the first USB flash drive followed in 2000. Though its capacity was only 8MB, within the next 13 years we had the world’s first 1TB USB flash drive. Storage capacity has been increasing exponentially.
With continuous technological development and the introduction of cloud and wireless storage, data storage and transfer have become more convenient than we could have imagined a few decades ago.
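To get a feel for what “exponentially” means here, the jump from 8MB to 1TB in 13 years can be turned into a doubling time. This is only a back-of-the-envelope sketch: it assumes binary units (1MB = 2^20 bytes, 1TB = 2^40 bytes) and steady growth between the two data points, neither of which is stated in the figures above.

```python
import math

# Assumption: binary units (1 MB = 2**20 bytes, 1 TB = 2**40 bytes).
FIRST_DRIVE_BYTES = 8 * 2**20   # first USB flash drive, 8 MB
LATER_DRIVE_BYTES = 1 * 2**40   # first 1 TB USB flash drive, 13 years later
YEARS = 13

# How many times larger the later drive is.
growth_factor = LATER_DRIVE_BYTES // FIRST_DRIVE_BYTES

# Number of capacity doublings needed to get from 8 MB to 1 TB.
doublings = math.log2(growth_factor)

# Average time per doubling, assuming steady growth over the whole period.
doubling_time_months = YEARS * 12 / doublings

print(f"growth factor: {growth_factor:,}x")
print(f"doublings: {doublings:.0f}")
print(f"doubling time: about {doubling_time_months:.1f} months")
```

Under those assumptions the capacity grew 131,072-fold, which is 17 doublings, or roughly one doubling every nine months.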
Data Storage at Risk
The digital dark age is actually not new to the storage industry. Roughly 10 years ago, for example, the Storage Networking Industry Association started talking about the notion of a “100-year archive” to address such issues, though, for the most part, such efforts have “fallen flat”.
Simon Robinson, a Research Vice President with 451 Research, says, “In this sense, it’s great that Google and others are starting to at least talk about the risks and potential paths forward.” Lynne Brindley, former chief executive of the British Library, says, “There were more than 150 websites relating to the 2000 Olympics in Sydney, but they vanished instantly at the end of the games and are stored only by the National Library of Australia. Historians of the future, citizens of the future, will find a black hole in the knowledge base of the 21st century.”
Many digital libraries and professional digital archivists, such as the British Library and the Internet Archive, are making sure that some of today’s most important pieces of data are preserved for future generations. NASA suffered a digital dark age problem with its early space records. For over a decade, magnetic tapes from the 1976 Viking Mars landing went unprocessed because the data was stored in an unknown file format. It took NASA many months to solve the problem by analyzing how the recording machine worked. The BBC’s Domesday Project of 1986, intended to record the state of the nation for posterity, was recorded on two 12-inch videodiscs. By 2000 it was obsolete and unreadable, since computers capable of reading the format had become rare. The system was eventually emulated in 2002 by a specialist team.
Deciphering Long Term Storage
As encryption and cryptographic storage mechanisms become more common, decoding stored data requires knowledge of the encryption algorithm or mechanism used, which means the encryption system must also be preserved for years to come. What kind of ‘code’ will be most easily accessible to future generations, and what technologies will they have available to help them decrypt a message from the past? If you lose the keys to your house, you can call a locksmith to open the door, or you can break a window in the back. But if you lose the encryption keys to your data, the data could be irretrievable. The danger of data becoming inaccessible even to those with clearance is quite real. A lost encryption key could bring about a disaster on the level of a failed hard drive or a corrupt database. Protecting and securing encryption keys is vital for data protection and for preservation into the future.
Big Data and Information Explosion
Two and a half quintillion bytes: that’s the amount of data generated every day across the globe. We are living in an era of data explosion, fed by social networks, wikis, blogs, emails, traffic systems, airplanes, satellites and weather sensors. The challenges we face are in the storage, processing, analysis, searching and visualization of these vast amounts of data. Because of this explosion, 90% of the data available on Earth today has been created in the last two years. This can lead to information overload, where managing and understanding so much data becomes very difficult.
As of now, there are approximately 4.4 trillion gigabytes of information in the digital universe. By 2020, this will expand to 44 trillion gigabytes or more. According to statistics collected by DOMO, the following data is generated every minute on the Internet:
- Facebook users share nearly 2.5 million pieces of content.
- Twitter users Tweet nearly 300,000 times.
- Instagram users post nearly 220,000 new photos.
- YouTube users upload 72 hours of new video content.
- Apple users download nearly 50,000 apps.
- Email users send over 200 million messages.
- Amazon generates over $80,000 in online sales.
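The per-minute figures above are easier to compare with the daily and total volumes once everything is on the same scale. The short sketch below scales each figure to a full day and converts the byte counts into larger units. It assumes decimal (SI) units and the short-scale meaning of “quintillion” (10^18), and it treats the “nearly” and “over” figures as exact, so the results are rough estimates only.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute figures quoted in the list above (treated as exact here).
per_minute = {
    "Facebook items shared": 2_500_000,
    "Tweets": 300_000,
    "Instagram photos": 220_000,
    "YouTube hours uploaded": 72,
    "Apple app downloads": 50_000,
    "Emails sent": 200_000_000,
    "Amazon sales (USD)": 80_000,
}

# Scale every per-minute figure to a full day.
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

# Assumption: decimal (SI) units and short-scale "quintillion" (10**18).
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18
BYTES_PER_ZB = 10**21

daily_bytes = 25 * 10**17                           # 2.5 quintillion bytes
universe_now_bytes = 44 * 10**11 * BYTES_PER_GB     # 4.4 trillion gigabytes
universe_2020_bytes = 44 * 10**12 * BYTES_PER_GB    # 44 trillion gigabytes

for name, count in per_day.items():
    print(f"{name}: {count:,} per day")

print(f"daily volume: {daily_bytes / BYTES_PER_EB} exabytes")
print(f"digital universe now: {universe_now_bytes / BYTES_PER_ZB} zettabytes")
print(f"digital universe by 2020: {universe_2020_bytes / BYTES_PER_ZB} zettabytes")
```

On these assumptions, 300,000 Tweets a minute works out to 432 million a day, and the projected growth from 4.4 to 44 zettabytes is a tenfold increase.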
Standards to Follow
“When you think about the quantity of documentation from our daily lives that is captured in digital form, like our interactions by email, Tweets, and all of the World Wide Web, it’s clear that we stand to lose an awful lot of our history,” says Vint Cerf. For those unsure of what VHS and Beta mean, it’s akin to CDs giving way to DVDs, the Polaroid giving way to the digital camera, and the home stereo and vinyl records morphing into the Walkman, then the iPod Touch, and finally today’s smartphones, which provide all those movie, music and photo capabilities and more. Imagine a future in which rapidly emerging technologies cause Microsoft Word documents, Excel spreadsheets and PDFs to fade away, replaced by alternative apps.
Industry experts suggest embracing open standards and open programs. The irony of digital obsolescence, of course, is that digital technology was supposed to make personal effects and memories less ephemeral and more secure.
Bit rot can occur when bits in a file are silently flipped as data passes through computer memory and is then written to disk. Tiny errors are impossible to correct in files with lots of random data, like pictures. Bit rot is insidious and can cause even backups to fail.
As our data archives become bigger, we will need to pay more attention to bit rot or end up with large numbers of corrupted files.
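One common way to notice silent corruption is to record a checksum for every file when it is archived and to recompute it later. The sketch below is a minimal illustration using Python’s standard `hashlib`; the function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical, and the approach only detects a changed file. It cannot repair one, and it cannot tell bit rot from a deliberate edit.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    damaged = []
    for name, recorded in manifest.items():
        path = root / name
        if not path.is_file() or sha256_of(path) != recorded:
            damaged.append(name)
    return damaged
```

In practice the manifest would be saved alongside the archive (and ideally in a second location) when the files are first stored, and `changed_files` would be run on a schedule so that a damaged copy can be replaced from a good backup before every copy has decayed.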
Bit rot is not the only way data gets lost. Other causes include:
- Physical media damage – Hard drives wear out and fail, and optical discs scratch and grow fungus.
- Digital rights management – Some content is licensed only, and needs a key to unlock it. If the key’s gone, then so’s the access to the content.
- Fat-fingering – That “oops” moment when you press the wrong key, or unplug the computer when it’s running, and lose everything.
Several measures can reduce these risks:
- Keep up to date with storage technology, and in the process, migrate data to the new technology. Idaho’s Ada County, for example, is just now migrating from Zip disks, which were introduced in 1995 and stopped being made in 2003, leaving the county scouring eBay and Craigslist to find replacement drives.
- While migrating data to new storage formats, save it in newer data formats as well. “A longer term option may be to store your data in the cloud with giant Internet corporations (e.g. Google, Amazon, Microsoft),” as those companies will update storage technology for you, writes Michael Zhang at PetaPixel.
- Organizations such as the Library of Congress and the Internet Archive are storing data such as Twitter posts and web pages.
- The industry is working on open data formats, rather than proprietary ones, which will make it easier to read data files even if the vendor goes out of business. The biggest problem the future may have is trying to pick the wheat from the chaff of voluminous digital files.
On the positive side, though, with a bit of effort, personal and shared digital culture could be safer than at any time in history, as there is no single point of failure. Ultimately, the solution may be what Cerf calls “digital vellum,” or projects intended to preserve data and the means to read it for a long time to come. Stored under the right conditions, vellum documents can reportedly last for more than 1,000 years. Cerf has been promoting this concept for the past year or so.
Effective digital preservation relies, to some extent, on the decisions of the creator as well as the archivist. Today those decisions include providing context, using standard and open file formats, organizing material sensibly, and making provisions for rights issues to avoid the problem of orphan works. Though we are not going to face a digital dark age soon, we should be aware of the predictions and the damage that might happen in the future. Digital preservation is, and will remain, a topic of discussion in the near future. Digital preservation communities such as The National Archives UK, The British Library, DRI, The UK Data Archive and many others have raised concerns over the digital dark age. Unfortunately, though these communities have been working tirelessly for years to preserve data, there is still a lot of data that has not been archived. The digital dark age can be prevented only through proper awareness and training among organizations and individuals in the field.