Tape in the Age of Big Data Growth

Topics: Data Archive | Information Management: New Thinking

Data growth and tape capacity growth are two converging lines in 2015. In its latest EMC-sponsored Digital Universe study, International Data Corporation (IDC) projects that global data volume will grow tenfold over the next six years, from 4.4 zettabytes to 44 zettabytes. Much of this growth will be driven by the Internet of Things, which will include everything from jet engines and cars to mobile phones and dog collars.

To generate value, this data must be analyzed. Some of the analytics will run in near-real time, but some trends become visible only in longer-term studies that call for older data. One of the greatest challenges of deriving value from big data is finding a place to store it. According to IDC, data is outpacing storage: in 2013, the available storage capacity could hold just 33 percent of the digital universe; by 2020, it will be able to hold less than 15 percent.
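The IDC figures above imply a steep compound growth rate. The short Python sketch below works through that arithmetic. It assumes the six-year span quoted in this article and treats "less than 15 percent" as an upper bound; the function names are illustrative and do not come from IDC.

```python
def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end volume."""
    return (end_zb / start_zb) ** (1 / years) - 1


def storable_zb(universe_zb: float, storable_fraction: float) -> float:
    """Zettabytes that fit on available storage, given the storable share."""
    return universe_zb * storable_fraction


# Figures quoted in the article (assumption: growth spans six years).
growth = implied_annual_growth(4.4, 44.0, 6)

# 2013: storage could hold 33 percent of a 4.4 ZB digital universe.
capacity_2013 = storable_zb(4.4, 0.33)

# 2020: storage holds less than 15 percent of 44 ZB, so this is a ceiling.
capacity_2020_max = storable_zb(44.0, 0.15)

print(f"Implied annual data growth: {growth:.1%}")
print(f"Storable in 2013: {capacity_2013:.2f} ZB")
print(f"Storable in 2020: under {capacity_2020_max:.1f} ZB")
```

Under these assumptions, data volume grows by roughly 47 percent a year. Storable capacity also grows, but by 2020 it covers a much smaller share of a much larger total.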

Fortunately, there may be an answer to that challenge: tape.

Tackling Tape

2014 was a big year for tape storage. IBM (in partnership with Fujifilm) and Sony each announced new tape technology that promises to increase the uncompressed capacity of a Linear Tape-Open (LTO) tape to 155–185 terabytes per cartridge. Vendors are working with the open Linear Tape File System (LTFS) standard to advance new technologies such as Spectra Logic's BlackPearl, which will significantly decrease the load and retrieval times for smaller files from LTO libraries. IBM is integrating the General Parallel File System, which is broadly used in large-scale cloud and supercomputing, with LTFS so that entire archives can be easily retrieved around the world.

The confluence of big data and new tape technologies has experts sheepishly admitting that perhaps tape is not really dead after all. In fact, it may even be the savior of big data. They need only look at some of the largest big data analytics implementations in the world to see that most of them rely on LTO- and LTFS-based tape archives to manage their computing operations at a far lower cost than flash or spinning disk would allow.

Big Data on Tape Case Studies

Consider two examples of large supercomputing analytics users that rely on LTO- and LTFS-based storage architectures. The National Center for Supercomputing Applications (NCSA), which runs the world's largest data archive, has more than 300 petabytes of LTFS-integrated tape library-based archives. As it prepares to run simulations, the required data is moved from the archives onto spinning disk for analysis.

The NCSA's use of its tape archive for big data analytics mirrors one of the two major use cases for big data: big data at rest. The other, big data in motion, is the real-time analysis of streaming data. That use case is all about making actionable predictions about what will happen in the next hour, day or week based on what happened in the past hour, day or week. What is wonderful about LTFS is that it can leave a copy of the metadata describing the streaming data in an easily searchable flash or hard disk array. When longer-term or larger analyses are run, as in big data at rest exercises such as the NCSA's, the data can then be easily found and restored to the memory or hard disk array needed for the analysis. Indeed, IBM is now experimenting with predictive analytics linked to LTFS to help create policies about which type of data should be stored where, based on prior patterns.
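The idea of keeping searchable metadata on fast storage while the data itself sits on tape can be sketched in a few lines of Python. This is a minimal illustration only: the `CatalogEntry` and `TapeCatalog` names, the fields and the cartridge labels are all hypothetical, and nothing here is the LTFS format or any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata kept on disk or flash; the file itself lives on tape."""
    name: str
    cartridge: str       # hypothetical cartridge label
    size_gb: float
    recorded_year: int


class TapeCatalog:
    """In-memory stand-in for a searchable metadata index."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def add(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find(self, predicate) -> list[CatalogEntry]:
        # The search touches only metadata, so no tape is mounted here.
        return [e for e in self._entries.values() if predicate(e)]

    @staticmethod
    def recall_plan(entries: list[CatalogEntry]) -> dict[str, list[str]]:
        # Group files by cartridge so each tape is mounted only once.
        plan: dict[str, list[str]] = {}
        for entry in entries:
            plan.setdefault(entry.cartridge, []).append(entry.name)
        return plan


catalog = TapeCatalog()
catalog.add(CatalogEntry("run-2012-a", "TAPE001", 800.0, 2012))
catalog.add(CatalogEntry("run-2012-b", "TAPE001", 650.0, 2012))
catalog.add(CatalogEntry("run-2013-a", "TAPE002", 900.0, 2013))
catalog.add(CatalogEntry("run-2014-a", "TAPE003", 700.0, 2014))

# A longer-term study asks for everything recorded before 2014.
older = catalog.find(lambda e: e.recorded_year < 2014)
plan = TapeCatalog.recall_plan(older)
print(plan)
```

Grouping the recall by cartridge is one plausible design choice: mounting and positioning a tape is slow compared with a metadata lookup, so it pays to decide what to fetch before touching the library.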

Another example of big data growth driving the use of LTO and LTFS is the physics research center at CERN, where data is growing by more than 50 petabytes annually. CERN has more than 250 petabytes of data from physics experiments stored on its LTO tapes. When it wants to conduct big data at rest analyses, it transfers data to Hadoop for processing.
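To put these archive sizes next to the cartridge capacities announced in 2014, the Python sketch below estimates how many cartridges each archive would need. It is back-of-the-envelope arithmetic that assumes decimal units (1 petabyte = 1,000 terabytes), uncompressed data and completely full cartridges. The announced capacities describe future technology, not the tapes these sites use today.

```python
import math

TB_PER_PB = 1000  # assumption: decimal units


def cartridges_needed(archive_pb: float, cartridge_tb: float) -> int:
    """Whole cartridges required to hold an archive of the given size."""
    return math.ceil(archive_pb * TB_PER_PB / cartridge_tb)


# Archive sizes quoted in the article, in petabytes.
archives_pb = {"NCSA": 300, "CERN": 250, "CERN yearly growth": 50}

# Low and high ends of the announced uncompressed capacity, in terabytes.
for capacity_tb in (155, 185):
    for label, size_pb in archives_pb.items():
        count = cartridges_needed(size_pb, capacity_tb)
        print(f"{label}: {count} cartridges at {capacity_tb} TB each")
```

At the upper end of the announced range, the 300-petabyte archive would fit on roughly 1,600 cartridges.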

Big data is one of the most pressing issues for businesses, which is why organizations should better understand how a modern tape management architecture can help them meet their big data requirements.

Do you have questions about data management? Read additional Knowledge Center stories on this subject, or contact Iron Mountain's Data Management team. You'll be connected with a knowledgeable product and services specialist who can address your specific challenges.

