Step-by-step: How to size a virtual tape library (VTL)


George Crump, Contributor
05.07.2009




In our last article we discussed how to determine whether your client's environment can drive an enterprise-class virtual tape library (VTL). The next issue to consider is sizing. To determine the scalability needs for a customer's VTL, you need to know how to forecast the capacity demands of the backup environment. In general, the more scalable the VTL, or any storage system for that matter, the more expensive it is, so it's important to be as accurate as possible in determining scalability needs. But several variables make an accurate calculation tricky: the size of the initial backup data set, the change rate, the retention period, and the data reduction rates from deduplication and compression.

The use of compression and dedupe, in particular, really complicates the sizing calculation. The data reduction rates from those techniques will be different at different stages of the backup process and on different types of data. Beyond that, decreasing disk costs, plus these data reduction techniques, mean that customers expect to keep data on disk for a significantly longer time than in the past; those expectations also need to be considered when determining the optimum VTL size.

Here's a process to follow to determine how big your customer's virtual tape library needs to be:

  1. Determine how big the existing data set intended for backup is. For the sake of this exercise, we'll assume a 10 TB data set to be backed up.
  2. Determine what percentage of that data is made up of databases and messaging environments and what percentage is made up of other types of files. Databases have to be treated specially. Even though most backup applications can back databases up "hot," with most, the entire database is still backed up every night, so there is a lot of redundancy within backup sets; beyond that, databases, as well as messaging systems, compress really well.
  3. Determine the weekly change rate. You should be able to get this figure from the customer's backup application. The simplest way is to have the customer run a differential backup job the day before the next full job starts. In most backup applications, a differential is a backup of all the data that has changed since the last full; running such a job the day before a full provides a fairly accurate estimate of what has changed during the week. Our example uses a 10% weekly data change rate.
  4. Based on the types of data to be backed up and deduplication/compression, calculate the amount of data resulting from the first full backup. The first full backup won't see significant gains from deduplication, but there will be some. For example, if your customer is backing up 20 Windows servers, the core OS files are likely to be similar across those servers. Using a relatively small deduplication factor, 2X, is generally a safe starting point. All the data is eligible for compression, but not all data compresses to the same degree; for example, databases and text files can be compressed by 90% or more, whereas JPEG images and Office 2007 documents may not compress at all. A good rule of thumb: Figure on a compression rate of 50% across the data set. The goal is to be relatively conservative with your calculations. While you don't want to oversize the solution, undersizing it is worse. With most systems, compression happens before deduplication, so using the above numbers, a 10 TB full backup would compress down to 5 TB and then deduplicate down to 2.5 TB.
  5. Calculate the amount of data resulting from the daily incremental backups. With the big exception of databases and messaging system data, the data in this job will mostly be net-new data and, like the initial full, will not deduplicate at a high rate, so plan on about 2X reduction with deduplication. Compression will follow the same guidelines as above, but be aware that in many cases much of this data will be Office 2007 documents and as a result may not compress well. Databases and messaging systems are a different animal and need to be treated separately. Most backup applications, while they can do hot backups, still write a full copy of the database/messaging system to the enterprise VTL. Not only does all this data compress very well, beyond our 50% number above, it also deduplicates very well. The majority of a database is identical to the previously backed-up copy, so the level of redundancy is very high. Take guidance from the customer, but in general, database growth is relatively small on a daily basis. The data resulting from daily incremental backups is covered in the step below, since the weekly backup is a roll-up of the dailies.
  6. Calculate the amount of data resulting from the weekly backups. Using the example of a 10% weekly change rate with a 10 TB data set, we'd have 1 TB from the weekly backup. Of this 1 TB, it's not uncommon for at least 250 GB to be from databases, and that portion can be reduced by about 90%, leaving 25 GB of real net-new database growth. The remaining 750 GB will likely compress by 50%, down to 375 GB, and then, with the same 2X deduplication rate as the initial full, come down to 187.5 GB of net-new data per week. Adding the 25 GB of database growth gives a total of 212.5 GB of data from the weekly backup in our example.
  7. Calculate the amount of data resulting from subsequent fulls. Subsequent full backups will have a very high level of redundancy with the backups already run; 90% of the full backup didn't change by definition, and the 10% weekly change was already picked up by the daily backup jobs. As a result, the full backup will contain very few changes and should be calculated like an additional weekly job. In our example, that's another 212.5 GB per week in a worst-case scenario.
  8. Determine how long the customer intends to keep the data on disk. The longer the customer keeps data on disk the more the deduplication ratio should improve: There will be more full backups and therefore a greater chance of duplicate data.
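  9. Add everything up. Our example factors are 2.5 TB from the first full backup, 212.5 GB per week from the weekly backups and 212.5 GB per week from the subsequent fulls, or 425 GB per week in total. By straight arithmetic, the 17.5 TB left on a 20 TB VTL after the first full holds about 41 weeks of backups at that rate, so a 20 TB VTL should comfortably store about 8 months' worth of backups on disk, assuming no abnormal data growth.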
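
The arithmetic in steps 4 through 9 can be sketched in a few lines of Python so the factors are easy to change for a particular customer. This is only an illustration under the article's example assumptions (50% compression, 2X deduplication, 90% reduction on database data, 1 TB counted as 1,000 GB); the names `SizingInputs`, `first_full_gb`, `weekly_gb` and `weeks_of_retention` are hypothetical and do not come from any backup product.

```python
from dataclasses import dataclass


@dataclass
class SizingInputs:
    """Example factors from the steps above; every name here is illustrative."""
    data_set_gb: float = 10_000.0      # step 1: 10 TB data set (1 TB = 1,000 GB)
    weekly_change_rate: float = 0.10   # step 3: 10% weekly change rate
    database_change_gb: float = 250.0  # step 6: database share of the weekly change
    database_reduction: float = 0.90   # step 6: database data reduced by about 90%
    compression_rate: float = 0.50     # step 4: 50% compression rule of thumb
    dedupe_factor: float = 2.0         # step 4: conservative 2X deduplication


def first_full_gb(s: SizingInputs) -> float:
    """Step 4: data stored by the first full after compression and deduplication."""
    compressed = s.data_set_gb * (1 - s.compression_rate)
    return compressed / s.dedupe_factor


def weekly_gb(s: SizingInputs) -> float:
    """Step 6: data stored per week by the roll-up of the daily incrementals."""
    change = s.data_set_gb * s.weekly_change_rate
    database = s.database_change_gb * (1 - s.database_reduction)
    other = change - s.database_change_gb
    other_stored = other * (1 - s.compression_rate) / s.dedupe_factor
    return database + other_stored


def weeks_of_retention(s: SizingInputs, vtl_capacity_gb: float) -> float:
    """Step 9: weeks of backups that fit after the first full is stored.

    Each week is counted twice: once for the weekly roll-up (step 6) and
    once for the subsequent full (step 7, worst case).
    """
    per_week = 2 * weekly_gb(s)
    return (vtl_capacity_gb - first_full_gb(s)) / per_week


if __name__ == "__main__":
    inputs = SizingInputs()
    print(f"First full:  {first_full_gb(inputs):.1f} GB")
    print(f"Weekly:      {weekly_gb(inputs):.1f} GB")
    print(f"Retention:   {weeks_of_retention(inputs, 20_000):.1f} weeks on a 20 TB VTL")
```

With the default values this reproduces the 2.5 TB first full and the 212.5 GB weekly figure from the example. The sketch deliberately ignores the improvement in deduplication ratio that step 8 describes for longer retention, so it should be read as a worst-case estimate.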

Once you've determined the sizing requirements for your customer, look for a VTL that can scale easily without interrupting the backup process. Essentially, this means choosing a virtual tape library that can be configured at an initial size based on your forecast, can scale in granular increments and can quickly take advantage of new drive technologies as they become available.

About the author

George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here.











All Rights Reserved, Copyright 2006 - 2009, TechTarget | Read our Privacy Policy