In this era of cloud storage and ever-recoverable user accounts, the idea of
data just “disappearing” can seem downright odd. The EU has had to pass
Right to be Forgotten legislation just to require companies to work to make
it possible for data to go away. Yet given the sheer volume of data being
generated and made available on the Internet these days, can that trend
possibly persist?
Tweets already pass out of easy access through search in just a few weeks’
time. The Internet is beginning to buckle under the weight of user-generated
video. Can digital storage media progress fast enough to keep up with
mankind’s ability to generate ones and zeroes?
Perhaps it doesn’t have to. In DNA, evolution has come up with a highly
specialized form of storage: physically compact and unusually long-lived. DNA is
nature’s hard drive, and although it’s certainly not perfect, it also has some cool
features that beat even the most advanced digital technology. Recent advances
could take DNA’s abilities in data storage from theory to practice, bringing
molecular memory into the mosaic of technologies that let mankind store
knowledge outside the brain.
THE DATA “CRISIS”
At the end of the day, it’s a good problem to have: From the Internet to genomic
sequencing, too many people want to use this new world’s rich, innovative
features. It’s also a potentially debilitating problem that reduces user interest in
the Internet, and puts the integrity of potentially important data at risk. If we
have so much data to store and we can’t afford multiple redundant backups,
then eventually power surges and hardware failures will lead to knowledge that
fundamentally disappears.
Consider the fact that despite everything we know today—about topics
ranging from nuclear fusion to black holes to genetic engineering—we still don’t
know, and never will know, just what knowledge was lost in the burning of the
Library of Alexandria. You can’t reinvent the thoughts of ancient people, nor
can you rediscover the historical insights of unique documents and ledgers once
they’ve become ash. It might seem trivial now, but if a tweet passes on to be
forgotten and never recovered, isn’t that an equivalent sort of loss?
The Library of Congress tried to step up and manage the full archive of
Twitter posts a few years ago, but at close to half a trillion messages, the project
has stalled and may still never see the light of day.
YouTube execs have claimed the video platform is
putting up something like 400 new hours of video every
minute—a figure that, if accurate, makes it clear why
Google has struggled to make the wildly successful
business even modestly profitable. With wearables
enabling such detailed tracking of personal metrics, this
upward trend in data generation is not going to change
anytime soon.
NEXT-GENERATION DATA STORAGE
In its March 2013 issue, PC Magazine published an
article on an amazing breakthrough in DNA science:
Harvard University researchers had managed to store
700TB of information on just a single gram of material.
It was an incredible proof of concept, and a reminder of
how biology is really just genetic data given form. Yet,
in the wake of that discovery, there was a surprising
reaction: serious interest. It turns out that long-term
storage of a whole lot of data is a more pressing concern
than the researchers had anticipated. Since then,
they’ve set up a commercial business based on the idea.
The basic appeal is twofold. DNA can store dizzying
amounts of information in an extremely small physical
volume, and it has the capacity to last longer than any
magnetic or optical signal could ever hope to.
The first of these advantages is hard to overstate:
DNA can hold a lot of data. That 700TB achievement is
astonishing, but it is in no way the limit of what nucleic
acids could achieve; in theory, one gram of DNA could
hold up to 455 exabytes (EB) of information—more
than all the current digital data in the world, by a huge
margin. Even if we only ever achieve 1 percent of this
theoretical capacity, due to inefficiencies and the
necessity of having multiple redundant copies, that’s
still 4.5EB per gram, the equivalent of 4.5 million 1TB
hard drives.
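The arithmetic behind those figures is easy to check. The short Python sketch below is only a back-of-the-envelope illustration: it assumes decimal units (1EB = 10^18 bytes, 1TB = 10^12 bytes), and the variable names are made up for this example rather than taken from the research.

```python
# Back-of-the-envelope check of the capacity figures quoted in the text.
# Assumption: decimal (SI) units, i.e. 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
EB = 10**18
TB = 10**12

# Theoretical ceiling cited in the article: 455 EB per gram of DNA.
theoretical_bytes_per_gram = 455 * EB

# The article's pessimistic scenario: only 1 percent of the ceiling is reached.
achieved_percent = 1
practical_bytes_per_gram = theoretical_bytes_per_gram * achieved_percent // 100

# How many 1 TB hard drives would hold the same amount of data?
drives_1tb = practical_bytes_per_gram // TB

# For comparison: the 700 TB demonstration as a fraction of the ceiling.
demo_fraction = (700 * TB) / theoretical_bytes_per_gram

print(f"Practical capacity: {practical_bytes_per_gram / EB:.2f} EB per gram")
print(f"Equivalent 1 TB drives: {drives_1tb:,}")
print(f"700 TB demo as a share of the ceiling: {demo_fraction:.2e}")
```

Under these assumptions the 1 percent case works out to 4.55EB, or about 4.5 million 1TB drives, which matches the rounded figures above.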
The second advantage is that DNA can also be long-lived. This
is a bit counterintuitive, as DNA is actually quite fragile
and notorious for breaking while you’re trying to work
with it. DNA isn’t durable, given that you have to keep it
in fairly peaceful conditions, but it is stable, in that if
you do care for it properly it could remain intact for
millions of years. Fossilized bone has managed to keep
samples safe for tens and even hundreds of thousands
of years, so scientists working with high-quality glass
and vacuum tubes should be able to come up with
something as well.
Making and replicating DNA data has also never been
easier, with automated systems for creating a tailored
DNA molecule from a digital code, and high-throughput
replication techniques that can create
thousands of copies in just an hour or two. Credit
biological evolution, of course, but also the scientists
who have managed to make use of biology’s highly
specialized solutions.
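To make the idea of turning a digital code into a DNA sequence concrete, here is a minimal Python sketch. It assumes a simple, hypothetical mapping of two bits per base (00 to A, 01 to C, 10 to G, 11 to T). This is not presented as the scheme the researchers used; practical systems also add addressing, redundancy, and error correction, none of which appear here. The function names encode and decode are illustrative only.

```python
# Toy illustration of mapping digital data onto the four DNA bases.
# Assumption: a hypothetical 2-bits-per-base code (00=A, 01=C, 10=G, 11=T).
# Real encoding schemes differ and include error correction; this does not.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Convert bytes into a string of bases, four bases per byte."""
    strand = []
    for byte in data:
        # Walk the byte from its most significant bit pair to its least.
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Convert a string of bases back into the original bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    data = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return bytes(data)


message = b"Hi"
strand = encode(message)
print(strand)
print(decode(strand))
```

In this toy code every byte becomes exactly four bases, so a round trip through encode and decode returns the original data unchanged.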
DNA’S DOWNSIDE
On the other hand, DNA isn’t perfect. It’s good for use
as a long-term library, but not as an interactive archive
to be accessed quickly and often. In the case of a Twitter
archive, DNA may be able to keep us from getting into a
Library of Alexandria situation, but it couldn’t keep the
archive searchable. Not only would the sequencing
process be too slow for modern users, but the process of reading DNA
introduces some small danger to the molecule itself—and the whole point is to
keep this data safe. That’s why most people are talking about DNA for use as a
time capsule.
In addition, it’s recently been pointed out that DNA’s very facility with data
storage could be our undoing—we didn’t invent it, after all. There’s an almost
unimaginable amount of DNA data out there in the biological world, not
counting anything extra we derive from analysis of that information, and
sequencing more and more of it is becoming mankind’s primary source of new,
raw data. Even YouTube can’t keep up with the biomedical and pure science
research sectors in terms of the volume of new data created and in need of
storage on a daily basis.
DNA has more than enough storage capacity to fulfill our needs for the near- and
mid-term future of data science—but storage isn’t the only thing we’re
interested in doing with data. DNA likely has a part to play in keeping our
knowledge and history alive for the coming decades, centuries, and millennia,
but you’re not going to be running your operating system off of DNA memory
anytime soon.
NEW FRONTIERS
long-term storage of information with relatively low accessibility, and short-term
storage of searchable, easily available data that provides admirable speed
but unimpressive permanence. Nonetheless, to the people of the future, it may
seem odd that we were ever willing to trust our digital heritage to the transient
electrical states of silicon transistors, rather than the hard-nosed reliability
of chemistry.