Thursday, 31 December 2015

DNA as Storage for Mankind’s Permanent Record

In this era of cloud storage and ever-recoverable user accounts, the idea of data just “disappearing” can seem downright odd. The EU has had to pass Right to be Forgotten legislation just to require companies to work to make it possible for data to go away. Yet given the sheer volume of data being generated and made available on the Internet these days, can that trend possibly persist? Tweets already pass out of easy access through search in just a few weeks’ time. The Internet is beginning to buckle under the weight of user-generated video. Can digital storage media progress fast enough to keep up with mankind’s ability to generate ones and zeroes?

Perhaps it doesn’t have to. In DNA, evolution has come up with a highly specialized form of storage: physically compact and unusually durable. DNA is nature’s hard drive, and although it’s certainly not perfect, it also has some cool features that beat even the most advanced digital technology. Recent advances could take DNA’s abilities in data storage from theory to practice, bringing molecular memory into the mosaic of technologies that let mankind store knowledge outside the brain.

THE DATA “CRISIS”
At the end of the day, it’s a good problem to have: from the Internet to genomic sequencing, too many people want to use this new world’s rich, innovative features. It’s also a potentially debilitating problem that reduces user interest in the Internet and puts the integrity of potentially important data at risk. If we have so much data to store and we can’t afford multiple redundant backups, then eventually power surges and hardware failures will lead to knowledge that fundamentally disappears.
Consider the fact that despite everything we know today, about topics ranging from nuclear fusion to black holes to genetic engineering, we still don’t know, and never will know, just what knowledge was lost in the burning of the Library of Alexandria. You can’t reinvent the thoughts of ancient people, nor can you rediscover the historical insights of unique documents and ledgers once they’ve become ash. It might seem trivial now, but if a tweet passes on to be forgotten and never recovered, isn’t that an equivalent sort of loss?
The Library of Congress tried to step up and manage the full archive of Twitter posts a few years ago, but at close to half a trillion messages, the project has stalled and may still never see the light of day. YouTube execs have claimed the video platform is putting up something like 400 new hours of video every minute, a figure that, if accurate, makes it clear why Google has struggled to make the wildly successful business even modestly profitable. With wearables enabling such detailed tracking of personal metrics, this upward trend in data generation is not going to change anytime soon.

NEXT-GENERATION DATA STORAGE
In its March 2013 issue, PC Magazine published an article on an amazing breakthrough in DNA science: Harvard University researchers had managed to store 700TB of information on just a single gram of material. It was an incredible proof of concept, and a reminder of how biology is really just genetic data given form. Yet, in the wake of that discovery, there was a surprising reaction: serious interest. It turns out that long-term storage of a whole lot of data is a more pressing concern than the researchers had anticipated. Since then, they’ve set up a commercial business based on the idea.
The basic appeal is twofold. DNA can store dizzying amounts of information in an extremely small physical volume, and it has the capacity to last longer than any magnetic or optical signal could ever hope to.
The first of these advantages is hard to overstate: DNA can hold a lot of data. That 700TB achievement is astonishing, but it is in no way the limit of what nucleic acids could achieve; in theory, one gram of DNA could hold up to 455 exabytes (EB) of information, more than all the current digital data in the world, by a huge margin. Even if we only ever achieve 1 percent of this theoretical capacity, due to inefficiencies and the necessity of having multiple redundant copies, that’s still about 4.5EB per gram, the equivalent of roughly 4.5 million 1TB hard drives.
The second advantage is longevity. This is a bit counterintuitive, as DNA is actually quite fragile and notorious for breaking while you’re trying to work with it. DNA isn’t durable in the sense of surviving rough handling, since you have to keep it in fairly peaceful conditions, but it is stable: if you do care for it properly, it could remain intact for millions of years. Fossilized bone has managed to keep samples safe for tens and even hundreds of thousands of years, so scientists working with high-quality glass and vacuum tubes should be able to come up with something as well.
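The capacity figures quoted above are easy to sanity-check. The short sketch below redoes the arithmetic, assuming decimal units (1EB = 10^18 bytes, 1TB = 10^12 bytes); the function names are made up for illustration and the inputs are simply the numbers cited in the article.

```python
# Back-of-the-envelope check of the per-gram capacity figures in the article.
# Assumption: decimal units, i.e. 1 EB = 10**18 bytes and 1 TB = 10**12 bytes.

EB = 10**18
TB = 10**12


def usable_bytes_per_gram(theoretical_eb: int, percent_usable: int) -> int:
    """Bytes per gram if only `percent_usable` percent of the theoretical
    capacity (given in exabytes) is ever achieved."""
    return theoretical_eb * EB * percent_usable // 100


def equivalent_drives(total_bytes: int, drive_bytes: int = TB) -> int:
    """Number of whole drives of size `drive_bytes` needed to hold `total_bytes`."""
    return total_bytes // drive_bytes


if __name__ == "__main__":
    # 455 EB theoretical, 1 percent achieved (both figures from the article).
    usable = usable_bytes_per_gram(455, 1)
    print(usable / EB, "EB per gram at 1 percent")
    print(equivalent_drives(usable), "one-terabyte drives")  # 4550000

    # How far the 700TB demonstration sits below the theoretical figure.
    share = (700 * TB) / (455 * EB)
    print(f"demonstrated share of theoretical capacity: {share:.2e}")
```

At 1 percent of the theoretical figure, this gives 4.55EB per gram, or about 4.5 million 1TB drives, matching the rounded numbers in the text.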
Making and replicating DNA data has also never been easier, with automated systems for creating a tailored DNA molecule from a digital code, and high-throughput replication techniques that can create thousands of copies in just an hour or two. Credit biological evolution, of course, but also the scientists who have managed to make use of biology’s highly specialized solutions.
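To give a feel for what "a DNA molecule from a digital code" might mean, here is a minimal sketch that maps every two bits of a file onto one of the four bases. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration, not the scheme the Harvard team used. Practical schemes typically also add redundancy and avoid long runs of a single base, none of which is modeled here.

```python
# Toy digital-to-DNA codec: two bits per base.
# Assumption: the mapping 00->A, 01->C, 10->G, 11->T is illustrative only.

BASES = "ACGT"  # the index of each base is its 2-bit value


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    strand = []
    for byte in data:
        # Walk the byte from its most significant bit pair to its least.
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Invert encode(); raises ValueError on malformed input."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError for anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Under this toy mapping every byte costs four bases, so a 1MB file would become a sequence of roughly four million bases before any redundancy is added.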

DNA’S DOWNSIDE
On the other hand, DNA isn’t perfect. It’s good for use as a long-term library, but not as an interactive archive to be accessed quickly and often. In the case of a Twitter archive, DNA may be able to keep us from getting into a Library of Alexandria situation, but it couldn’t keep the archive searchable. Not only would the sequencing process be too slow for modern users, but the process of reading DNA introduces some small danger to the molecule itself, and the whole point is to keep this data safe. That’s why most people are talking about DNA for use as a time capsule.
In addition, it’s recently been pointed out that DNA’s very facility with data storage could be our undoing. We didn’t invent it, after all. There’s an almost unimaginable amount of DNA data out there in the biological world, not counting anything extra we derive from analysis of that information, and sequencing more and more of it is becoming mankind’s primary source of new, raw data. Even YouTube can’t keep up with the biomedical and pure science research sectors in terms of the volume of new data created and in need of storage on a daily basis.
DNA has more than enough storage capacity to fulfill our needs for the near- and mid-term future of data science, but storage isn’t the only thing we’re interested in doing with data. DNA likely has a part to play in keeping our knowledge and history alive for the coming decades, centuries, and millennia, but you’re not going to be running your operating system off of DNA memory anytime soon.

NEW FRONTIERS
long-term storage of information with relatively low accessibility, and short-term storage of searchable, easily available data that provides admirable speed but unimpressive permanence. Nonetheless, to the people of the future, it may seem odd that we were ever willing to trust our digital heritage to the transient electrical states of silicon transistors, rather than the hard-nosed reliability of chemistry.
