From genes to GIFs: Can DNA be used as a hard drive?
Scientists recently inserted a GIF file into the DNA of living bacteria – creating the first official ‘organic GIF’ and opening the doors to a range of possibilities for the future of data storage. While this development isn’t likely to give us the ability to literally transform into walking, talking hard drives, the prospects in this realm of research are still pretty wild. But why would we want to use DNA to store our digital data?
We are now at a point where there is so much data in the world that there are literally no words to describe it. In fact, any new prefix used to quantify it needs to be approved by the General Conference on Weights and Measures. With this exponential creation of data, scientists have been searching for more reliable means of storing it all safely over a long period of time – a data equivalent of the Global Seed Vault in Norway.
The concept of using biology’s data-storage system to store our own data has been discussed for a few decades now. In terms of capacity, it’s currently understood that 1 gram of DNA has the ability to store 215 petabytes (215 million gigabytes) of data. That’s close to 10 times the amount of image data that Google Maps has – meaning that in principle, we could store all of the world’s data in one room using DNA. And given that we are able to extract DNA from animals that lived tens of thousands of years ago, longevity is certainly no issue. Never mind its molecular size. But how on earth would we convert something digital into something biological? I mean, given that a lot of current tech research is focused on integrating us more with computers, doesn’t this process seem like a bit of a race in the opposite direction?
How do you turn DNA into storage?
Much like taking inspiration from the human brain to develop machine learning algorithms, this area of study uses the traits of something biological to develop an artificial product. Prior to this recent GIF insertion, all research had been undertaken using synthetic DNA that replicates real DNA strands.
All data, whether digital images, videos, audio files or text, is reduced to strings of zeros and ones, while all DNA is built from four different bases (A, C, G and T) that make up the molecule. Despite one being a two-digit code and the other a four-letter code, the principles are very similar.
A computer is used to translate the data code into a DNA sequence by assigning the four-letter DNA code to patterns of zeros and ones. How it does so depends on the kind of data being stored, whether that is a colour in an image or a letter in a piece of text. After a given period of storage, the DNA strand is read back (sequenced) and a computer is used to decode the data.
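To make the translation step concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base (00 = A, 01 = C, 10 = G, 11 = T), which is an illustrative choice and not the scheme used in any of the experiments mentioned here. Published schemes are more involved, typically adding error correction and avoiding long runs of the same base. The names encode, decode, BITS_TO_BASE and BASE_TO_BITS are made up for this example.

```python
# Hypothetical mapping for illustration only: two bits per DNA base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of DNA bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a string of DNA bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"GIF")
print(strand)          # CACTCAGCCACG
print(decode(strand))  # b'GIF'
```

Under this mapping each byte becomes four bases, so the three letters "GIF" turn into a 12-base strand, and decoding simply runs the same table in reverse.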
The first notable effort was in 2012, when a group of Harvard University geneticists encoded a 52,000-word book in thousands of snippets of DNA. In 2013, the European Bioinformatics Institute copied 739 kilobytes of sound, images, and text, including a 26-second audio clip of Martin Luther King, Jr.’s "I Have a Dream" speech, into DNA, and in 2016 Harvard Medical School reported storing and retrieving 22 megabytes that included the French silent film ‘A Trip to the Moon’. While the concept is still very much a wild, alternative future possibility for data storage, there is already some buy-in from some of the world’s largest data owners.
The research has certainly sparked the interest of Bill Gates and co. at Microsoft, who have purchased ten million strands of DNA from biology startup Twist Bioscience. In fact, in partnership with the University of Washington, the company has successfully stored 202 megabytes of data in their new DNA – including the music video for OK Go’s ‘This Too Shall Pass’. At least they’ve got good music taste at MS HQ.
What does this mean for the future of data storage?
Unfortunately, we’re not quite at the point of tearing down the world’s data centres in favour of a snow-buried Arctic bunker. In fact, one of the greatest challenges scientists need to overcome in this space is the sheer time that it takes to write data into DNA. For example, a photo that your phone stores the instant you take it would take several hours to write to DNA. We currently produce roughly 2.5 quintillion bytes of data per day (a quintillion has 18 zeros), and data can be written to DNA at about 20,000 bytes a minute. So while the format offers the goods in terms of longevity and capacity, at its current speed we would never be able to store the entire world’s data in a single room.
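A quick back-of-the-envelope calculation shows how big the speed gap is. The sketch below uses the 20,000 bytes a minute write speed and the 215 petabytes per gram capacity quoted in this article; the 3 megabyte photo size is an assumption added purely for illustration, as are the variable and function names.

```python
WRITE_RATE_BYTES_PER_MIN = 20_000     # write speed quoted in the article
GRAM_CAPACITY_BYTES = 215 * 10**15    # 215 petabytes per gram, as quoted earlier
PHOTO_BYTES = 3 * 10**6               # assumption: a 3 MB phone photo


def minutes_to_write(num_bytes: int, rate: int = WRITE_RATE_BYTES_PER_MIN) -> float:
    """Minutes needed to write num_bytes at a given rate in bytes per minute."""
    return num_bytes / rate


photo_hours = minutes_to_write(PHOTO_BYTES) / 60
gram_years = minutes_to_write(GRAM_CAPACITY_BYTES) / (60 * 24 * 365)

print(f"One photo: {photo_hours:.1f} hours")
print(f"Filling one gram of DNA: {gram_years:,.0f} years")
```

On these figures a single photo takes about two and a half hours to write, and filling just one gram of DNA to capacity would take roughly 20 million years.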
Regardless, any attempt at preserving 21st century memes for the people of the future to enjoy and relate to is worthy of research.