Archiving and Backup Best Practices

Introduction: Data Is Growing

According to research conducted by ESG, both primary storage and secondary/tertiary storage are growing at aggregated rates of approximately 35% year over year (see Figure 1).1 That’s an interesting phenomenon. One would typically presume that for every 1TB of active production data created, anywhere from 2.5TB to 6TB worth of secondary copies would be created in an effort to protect the information for traditional backups, snapshots, BC/DR, regulatory compliance, testing, and so on.

So not only is there more secondary storage than primary storage (though higher-performing primary storage is typically more expensive), but all of it is growing.

The fact that both “production” storage and “protection” storage have roughly the same year-over-year growth rate highlights a problem of unsustainability. Think of an IT environment that starts out with 1TB of production data and 4TB of associated secondary data. Next year, thanks to a 35% growth rate, that 1TB of production data will become 1.35TB, while the secondary data store will grow to 5.4TB. The year after that, the data center will be storing 1.82TB of production data, resulting in 7.3TB of secondary storage needing management. The situation will continue to get worse.
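The compounding in this example can be checked with a few lines of Python. This is a minimal sketch, assuming (as the example does) that secondary storage stays a fixed multiple of production storage and that both grow at the same rate; the function name `project_storage` and its parameters are illustrative, not part of any product.

```python
def project_storage(production_tb: float, secondary_ratio: float,
                    growth_rate: float, years: int) -> list[tuple[int, float, float]]:
    """Project production and secondary storage, assuming both compound at the
    same yearly rate and secondary stays a fixed multiple of production."""
    rows = []
    production = production_tb
    for year in range(years + 1):
        rows.append((year, round(production, 2), round(production * secondary_ratio, 2)))
        production *= 1 + growth_rate  # compound growth for the following year
    return rows


for year, prod, sec in project_storage(1.0, 4.0, 0.35, years=2):
    print(f"Year {year}: {prod} TB production, {sec} TB secondary")
```

With these inputs, year 1 comes out at 1.35TB of production and 5.4TB of secondary data, and year 2 at 1.82TB and 7.29TB (about 7.3TB). Because the ratio is held constant, total storage simply grows by the same 35% every year.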

Every year, ESG’s annual IT spending intentions survey finds that improving data backup and recovery is one of the most often mentioned priorities among IT organizations. Storage infrastructure (which includes backup/recovery) is also one of the rare IT areas getting budget increases. In 2015, 61% of respondents indicated they expected to see their overall storage/backup spending rise, compared with 9% of respondents expecting decreases, and the rest anticipating their storage spending would remain flat.

Unfortunately, even the increased spending activity doesn’t come close to matching the 35% year-over-year rate of storage growth. To boil it down, storage is growing a lot. IT organizations view backup and recovery as a perennial priority. But storage expenditures aren’t keeping pace with data proliferation. Accordingly, technology decision makers must pursue a different approach.

Archiving Is as Important as Backup, Period

Backup has been a fact of life for decades. It’s a task that practically all IT admins have dealt with on occasion, whether they like it or not.

Archiving, on the other hand, is often perceived not as a “need to” responsibility, but as a “should do” effort—with IT managers citing a lack of time, an insufficient budget, or some other semi-legitimate excuse to avoid the task.

But the reality is that organizations need archiving as much as they need backup. After all, archiving data can take considerable pressure off of backup operations. Specifically, when you preserve a file or record in an archive, you have protected it—simply by using a different yet complementary method. (It’s similar to the way that snapshots are different from backups, yet complementary to them.) After data is archived, you don’t need to back it up hourly or nightly anymore. And you don’t need to keep storing even more duplicates of it within your backup storage pool.

To visualize how an archival solution can benefit a backup environment, return to that fast-growing one-to-four ratio of production to protection storage (or whatever your environment’s ratio might be). Imagine a data center holding 1TB of production storage and four times that amount (4TB) of associated protection storage, 5TB in total, with each type growing at about 35% a year.

Now imagine this data center employing “grooming” as an archiving technique, whereby older or less-used data is removed from the production array and put into an archive—so, your 1TB of production storage gets smaller.

If your archiving solution groomed the 30% of production data that was old or stagnant, your organization would need only 700GB of production storage and (based on the 1:4 ratio) 2.8TB of protection storage.

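Even if both storage pools then grow at 35%, the next year will require 945GB of primary storage, 3.78TB of protection storage, and 300GB of archival storage, totaling 5.025TB. Without grooming, the same environment would need 6.75TB (1.35TB of production plus 5.4TB of protection storage). Now that is sustainable!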
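The grooming scenario can be modeled the same way. This is a hypothetical sketch, assuming, as the example does, that the groomed data sits in a static archive that neither grows nor is multiplied by the 1:4 protection ratio; `groomed_totals` is an illustrative name.

```python
def groomed_totals(production_gb: float, protection_ratio: float,
                   growth_rate: float, groom_fraction: float) -> dict[str, float]:
    """Groom a fraction of production data into a static archive, then grow the
    remaining production data (and its protection copies) for one year."""
    archive = production_gb * groom_fraction
    production = (production_gb - archive) * (1 + growth_rate)
    protection = production * protection_ratio  # protection tracks production at a fixed ratio
    return {
        "production": production,
        "protection": protection,
        "archive": archive,  # assumed not to grow and not to be backed up again
        "total": production + protection + archive,
    }


with_grooming = groomed_totals(1000, 4, 0.35, groom_fraction=0.30)
without_grooming = groomed_totals(1000, 4, 0.35, groom_fraction=0.0)
print(f"{with_grooming['total']:.0f} GB vs {without_grooming['total']:.0f} GB")
```

The printed comparison is roughly 5,025GB with grooming against 6,750GB without it. Setting `groom_fraction` to zero is what turns the same function into the no-archiving baseline.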

Why Are You Keeping the Data?

Archiving is just another way of preserving data, and the decision to back up data or archive it can boil down to motivation:

The motivation to back up data relates to achieving a positive operational impact. For example, an organization would be able to retrieve data from its backups to resume operations fast in the event of disaster.

The motivation to archive data often is “nobler.” For example, an organization keeps inactive data because it believes the data itself has intrinsic long-term strategic value, as determined by the organization or some outside regulatory body.

Any IT decision maker or architect who is serving in a data protection oversight role should be able to explain why his or her organization is holding onto data, including the business requirements that are satisfied by doing so. Maybe a backup copy would help them restore a server after a bad patch was applied, or recover an important database. The trick is to match the requirement to the solution.

Recovery matches up with backup. When an active data set loses viability due to accidental deletion, corruption, inaccuracy, etc., the IT organization will need to restore a previous version. And a previous version, by definition, means a backup.

Retention matches up with archival preservation. The organization may retain data because the data has inherent long-term relevance, whether internally or to external auditors. This situation calls for archival preservation, in which data is stored for a long time instead of being backed up for a long time.

Reclamation matches up with archival grooming. In this case, the requirement is to remove stagnant data from expensive primary storage to permit more efficient use of those expensive arrays. The stagnant data is then held only within the archiving platform. Moving data, rather than copying it, is the common approach to long-term data retention. Nearly two-thirds of the organizations that retain some portion of their data for at least three years say they move data off primary production storage onto secondary or tertiary resources to reduce consumption of those expensive resources, according to ESG research.4

The task for IT is to figure out which of those three “Rs” (recovery, retention, or reclamation) is the goal, and then pick the relevant tool set. In other words, start with the desired outcome, then determine what “nuts and bolts” will make it happen.
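To make the reclamation case concrete, here is a minimal sketch of grooming as a move rather than a copy. It assumes a plain file tree and treats “stagnant” as “not modified within N days” (last-modified time is used because access times are often not maintained precisely on production file systems). The function name `groom` and its parameters are illustrative; a real archiving platform would also index and catalog what it moves so the data stays searchable.

```python
import shutil
import time
from pathlib import Path


def groom(primary: Path, archive: Path, max_age_days: float) -> list[Path]:
    """Move files not modified within max_age_days from primary into archive,
    preserving their relative paths. Returns the new archive paths."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    # Materialize the listing first so moving files does not disturb the walk.
    for path in list(primary.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            dest = archive / path.relative_to(primary)
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Move, not copy: the file leaves primary storage entirely.
            shutil.move(str(path), str(dest))
            moved.append(dest)
    return moved
```

In practice, the age threshold and the scope of the walk would come from the application owners and business unit managers discussed below, since they are the ones who know which data is actually stagnant.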

Who Is Doing What?

Determining the goal and tying it to a tool set to use is one factor. Another consideration: Who will be doing the backup and/or archiving?

Backup is almost always done by IT operations. Thus, the same people who are responsible for the production servers and production storage are also the people with a vested interest in making sure that they can recover those production assets. Many enterprise IT operations personnel tend to focus more on the physical boxes than on the electronic “zeroes and ones” inside them. Their job is to keep an eye on the metal boxes with the blinky lights that need to be kept running.

Conversely, application owners and business unit leaders tend to be more like archivists. They think more about the digital assets within those boxes—about the information for its own sake. This group has a different motive: While the backup admin’s primary motivation is to ensure resiliency or recoverability of the boxes, the app owner/business unit manager is motivated to preserve the data to support the compliance, legal, or business functions of the organization.

In reality, the backup specialists and the archiving-oriented application owners or business unit leaders need to work together. The backup specialist may be frustrated that stagnant data is sucking up capacity. He wants it gone, but he can’t exactly clean it out himself because he does not know what it is. So, he must partner with the application owner and business unit manager—i.e., someone who actually understands the data. Together, they can decide what to send to the archive.

Again, the motive dictates “backup versus archive” or “grooming versus preservation.” That motive is defined by people. So the question of “who wants to protect the data” drives the practical matter of “how it will be done.”

How Long Will You Keep It?

At a typical company, it isn’t often that a server has to be rebuilt and restored using backups from a whole year ago. In fact, that scenario may never have occurred. However, needing data from years ago is commonplace. So it’s important to use the right kind of tool to retain it. If an employee needs to access a copy of data from one, two, or seven years ago—to reference an old version of a spreadsheet, for instance—then an archive, not a backup, is the appropriate technology.

But confusion over what constitutes a backup versus an archive abounds. A backup is created as a recovery tool. But as time goes on, that copy may remain the only copy available. Even after five or seven years, that backup remains the place to go to get an older reference copy of data. And because of that situation, people tend to treat backups as archives.

The problem with a backup-as-archive approach is that a lot of data contained on those five-year-old tapes has no reference-able value. Unlike the files held in a properly indexed, searchable, state-of-the-art archive, “backup pools” are large chunks of typically deduplicated data designed primarily for backup software platforms, not workers, to read.

Ideally, organizations should be able to enjoy the best of both worlds: archive an appropriate subset of production data and maintain it longer with much greater efficiency, so that their backups can expire sooner, saving space and money. Remember, backups just don’t have the intrinsic long-term value that archives do.

Using Backups as Your Archive

Although one might presume that the answer is to invest in a dedicated archival solution, many IT environments don’t have that latitude for myriad reasons. Instead, as ESG has found, a combined 83% of IT organizations are using archival features within their backup products as part of, if not their whole, archive solution.

This is not to say that they are “treating a backup as an archive.” It means they are using various legitimate archiving features that their backup vendors included in their backup products. Those features are often a subset of an archival product’s feature set, positioned as a good-enough archival solution inside a backup technology.

Managed Backup Archival Services OR Managing Archival Backups

While using modern backup software with built-in archival features can help organizations deploying new solutions today, the daunting task of reining in previous backup and archive media requires a different approach. A “different take” on backups-as-an-archive does exist: It centers on using long-term backup tapes as the storage medium while leveraging a comprehensive suite of services to ingest, catalog, and manage those tapes so that you can actually locate and restore the data when you need it. Archival backup tapes can give an organization a new (and perhaps dramatically more effective) way to take advantage of the advanced data management capabilities that exist today.

One solution that offers this approach is the Restoration Assurance Program from Iron Mountain, a managed service that makes data stored on long-term backups and archives easily accessible by managing all tape system configurations, tape catalogs, and backups, and by restoring data on demand in an efficient, repeatable manner.

According to Iron Mountain, the solution can help organizations:

  • Reduce license/maintenance costs and reclaim data center space by decommissioning on-premises tape backup systems.
  • Manage legal/compliance information requests effectively with guaranteed restoration capability for all of your tape formats.
  • Protect information long term (in a secure, environmentally controlled facility).
  • Prioritize IT skillsets on forward-looking strategic initiatives.

What to Consider When Planning Your Data Protection Strategy

Organizations protect business data to accomplish a variety of goals:

  • Resiliency—creating a durable infrastructure so that end-users rarely or never experience a disruption.
  • Recovery—if a data-loss crisis occurs, a secondary copy is available to roll back a server or array to a previous point in time representing a known good state.
  • Restoration—similar to recovery, restoration converts selected data back to a previous point in time.
  • Retention—preserving data for long periods to comply with regulatory mandates or satisfy long-term operational objectives.
  • Reclamation—removing (grooming out) stagnant data from primary storage so those expensive assets can be better utilized. Reclamation can also boost a production environment’s I/O performance.

If the goal is:

  • Resiliency, then it’s best achieved using availability technologies.
  • Recovery and near-immediate resumption of functionality, then it’s best achieved through snapshotting of primary storage.
  • Restoration, then it’s best achieved through backup to a secondary system.
  • Retention, then it’s best achieved through archiving by copying data for long-term preservation.
  • Reclamation, then it’s also best achieved via archiving, but in this case, it involves not copying, but rather moving data off primary storage and into a protection silo (a.k.a. grooming the primary arrays).
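The goal-to-technology pairing above is essentially a lookup table. The sketch below simply encodes that list so that a planning script or checklist could reuse it; the `Goal` enum and `recommend` function are hypothetical names, and the approach strings paraphrase the list rather than name products.

```python
from enum import Enum


class Goal(Enum):
    RESILIENCY = "resiliency"
    RECOVERY = "recovery"
    RESTORATION = "restoration"
    RETENTION = "retention"
    RECLAMATION = "reclamation"


# Goal -> recommended approach, paraphrasing the list above.
APPROACH = {
    Goal.RESILIENCY: "availability technologies",
    Goal.RECOVERY: "snapshots of primary storage",
    Goal.RESTORATION: "backup to a secondary system",
    Goal.RETENTION: "archiving (copy data for long-term preservation)",
    Goal.RECLAMATION: "archiving (move data off primary storage, i.e., grooming)",
}


def recommend(goals: list[str]) -> dict[str, str]:
    """Map each stated goal (case-insensitive) to its approach.
    Raises ValueError for a goal that is not one of the five listed."""
    return {g: APPROACH[Goal(g.strip().lower())] for g in goals}


print(recommend(["Retention", "reclamation"]))
```

Keeping retention and reclamation as separate entries preserves the distinction the list draws: both rely on archiving, but one copies data and the other moves it.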

Quantification

It’s important to recognize the financial risk of not pursuing the data protection processes listed. The status quo isn’t free. Ignoring those measures will incur a cost. The burden is on IT to be willing to invest pennies in a service or platform today to avoid spending thousands of dollars later. After all, most of the “R” activities described are not explicitly forecast as IT budget line items. The cost of apathy does not appear as a budget line item either. But bear in mind that:

  • Failure to address resiliency (using availability technologies), recovery (using snapshotting mechanisms), or restoration (through backup) will lengthen the amount of time an organization is down due to data-related or system-state-related issues. Downtime translates into lost productivity, and lost productivity becomes lost profitability, increased hard costs, and damaged end-user satisfaction.
  • Failure to deploy archiving technologies to manage data retention in an agile, cost-effective way will drive costs up because the organization will experience higher labor- and process-centric costs during any e-discovery or regulatory exercise. Financial penalties also may result if the data isn’t even locatable at all.
  • Failure to pursue reclamation by grooming stagnant data from production arrays will result in inefficient resource utilization and may even slow down production storage and server performance.

Those activities might appear as intermixed or interdependent, depending on your shop. The trick is to (1) financially quantify the cost of failing to implement them, and (2) calculate the CapEx and OpEx savings each activity would contribute to the bottom line when properly pursued.
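As a starting point for that quantification, the comparison can be reduced to simple annualized arithmetic. The sketch below is a hypothetical model, not an ESG or vendor methodology: an activity’s net benefit is the cost of not pursuing it plus any savings it frees up, minus its own CapEx and OpEx. The class and function names are illustrative, and every number in the example is a placeholder to replace with your own estimates.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Annualized figures for one data protection activity. All values are
    estimates the organization supplies; none come from this article."""
    name: str
    cost_of_inaction: float  # expected yearly cost if the activity is NOT pursued
    capex: float             # yearly amortized capital spend to pursue it
    opex: float              # yearly operating spend (labor, services, maintenance)
    savings: float = 0.0     # yearly spend it avoids elsewhere (e.g., reclaimed primary storage)

    def net_benefit(self) -> float:
        """Positive means pursuing the activity costs less than ignoring it."""
        return self.cost_of_inaction + self.savings - self.capex - self.opex


def downtime_cost(outages_per_year: float, hours_per_outage: float, cost_per_hour: float) -> float:
    """Expected yearly downtime cost: frequency x duration x hourly impact."""
    return outages_per_year * hours_per_outage * cost_per_hour


# Placeholder inputs for illustration only.
# Cost of inaction for backup = extra downtime when restores take longer without it.
extra_downtime = downtime_cost(2, 24, 5_000) - downtime_cost(2, 4, 5_000)
activities = [
    Activity("restoration (backup)", extra_downtime, capex=20_000, opex=10_000),
    Activity("reclamation (grooming)", 0.0, capex=5_000, opex=5_000, savings=25_000),
]
for a in sorted(activities, key=Activity.net_benefit, reverse=True):
    print(f"{a.name}: net benefit {a.net_benefit():,.0f} per year")
```

Modeling the cost of inaction as a difference (downtime without the activity minus downtime with it) mirrors the point above that skipping backup lengthens outages rather than causing them.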

The Bigger Truth

There are many reasons to have copies of data, but not all of those reasons are best served by “backup.” Snapshotting, availability, and archiving are all legitimate ways to create and use copies of data. An IT organization should think carefully about why it is keeping particular data sets, and who will use the copies. That way, managers will be in a better position to decide which tools to deploy, and to understand the strategic ramifications, as they adopt a broader backup-plus-archiving IT strategy.

Yes, IT professionals must grasp the “why” and the “whom” of backup and archiving. But they also need to translate a lack of action into quantifiable business impact. If they do not, they just may be confining their organization to a “backup-only” world. Instead:

  1. Start by involving the business owners, application/data stakeholders, legal/compliance team, records management group, and IT staff in a shared conversation about how the people who use or depend on the data decide what should be kept.
  2. Next, quantify the ramifications of doing nothing (e.g., unfettered storage growth and the inability to retrieve data in response to regulatory requests), and assess your current IT capabilities.

Only after you complete those two tasks (broader people conversations and honest capability assessments) are you ready to define a broader data protection and management strategy. That strategy will almost certainly include both archival and backup technologies, along with on-premises recovery and off-premises retention mechanisms. It will help the organization moving forward, but you’ll still have the burden of previous backups and archives to rein in, which is where solutions like the Restoration Assurance Program from Iron Mountain come in.

