Five questions to ask in a data deduplication project

29 May 2008 | SearchStorageChannel.com

By Martha Young, Contributor

Service provider takeaway: Service providers should explore five important questions with customers interested in implementing data deduplication, to help gain a better understanding of project scope.

Data deduplication is as much a business consideration as it is a technical concern. From a business perspective, deduplicating data adds value in three ways: it improves in-line performance and data integrity, adding intelligence to a business's intellectual property; it reduces the time required for backup and recovery, an important consideration for customers looking at business continuity and disaster recovery solutions; and it reduces the cost of physical storage, including hardware acquisition, management and administration, and energy consumption.
With technology budgets coming under intense scrutiny, data deduplication is an obvious area worth investing in and implementing for a near-term return on investment.

There are several considerations your customers must take into account when first investigating data deduplication options. Here are the questions to be explored.

What types of files need to be stored?

In today's business world, users are generating vast amounts of intellectual property across a wide variety of mediums. Firms need to address the unique file storage requirements for voice, video, data, electronic mail, instant messaging, mobile computing and other types of files. File type is important in the data deduplication equation because it can indicate differences in file size. For instance, a streaming video file would require substantially more storage and, consequently, bandwidth to transfer to storage than email documents. If a service provider is supporting a lot of video, a localized solution will make more economic sense.
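The bandwidth point can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the file sizes and the 10 Mbit/s link speed are assumptions chosen for the example, not figures from any customer environment, and the calculation ignores protocol overhead and link contention.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Idealized time to move a file over a link (no overhead, no contention)."""
    return size_bytes * 8 / link_bits_per_second


# Hypothetical sizes and link speed, chosen only to show the scale difference.
VIDEO_BYTES = 2 * 10**9      # a 2 GB video file
EMAIL_BYTES = 100 * 10**3    # a 100 KB email message
WAN_LINK_BPS = 10 * 10**6    # a 10 Mbit/s WAN link

video_time = transfer_seconds(VIDEO_BYTES, WAN_LINK_BPS)
email_time = transfer_seconds(EMAIL_BYTES, WAN_LINK_BPS)

print(f"video: {video_time:.0f} s, email: {email_time:.2f} s")
```

Under these assumptions the video takes roughly 20,000 times longer to transfer than the email, which is why a video-heavy workload tends to favor keeping storage local.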

How long do the files need to be stored?

The answer to this question depends on the regulations your customer needs to comply with. Data storage and accessibility regulations include the Sarbanes-Oxley (SOX) Act, the Health Insurance Portability and Accountability Act (HIPAA) and the Gramm-Leach-Bliley Act (GLBA). In general, there is a mountain of regulations requiring data backup, recovery, accessibility and security. Each regulation has its own framework and objectives that your customers must be able to meet. If all of the varieties of communication need to be stored in excess of 50 years, then data deduplication is mandatory, if only from a manageability and retrieval perspective.

Where will data deduplication be conducted?

There are only two places where deduplication can be conducted: at the source or in a storage appliance. Data deduplication at the source offers the key benefits of reducing the amount of disk space needed to store the backups and reducing the impact on network bandwidth required to back up a given set of data. The drawback to deduplication at the source is the impact on the server. It takes a significant number of compute cycles on each server.
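As a rough illustration of deduplication at the source, the sketch below splits files into fixed-size chunks and transmits only chunks whose SHA-256 digest the backup target has not seen before. Everything here is an assumption made for the example: the function names, the 4 KB chunk size and the in-memory dictionary standing in for the backup target. Commercial products use their own, typically more sophisticated, chunking and indexing schemes.

```python
import hashlib


def backup_with_dedup(files, store, chunk_size=4096):
    """Back up files, sending only chunks the store does not already hold.

    files: mapping of file name -> bytes (stand-in for data read from disk)
    store: mapping of SHA-256 hex digest -> chunk (stand-in for the backup target)
    Returns (manifest, bytes_sent); manifest maps each file to its chunk digests.
    """
    manifest = {}
    bytes_sent = 0
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            # Hashing every chunk is the CPU cost the source server pays.
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:
                store[digest] = chunk
                bytes_sent += len(chunk)  # only new chunks cross the network
            digests.append(digest)
        manifest[name] = digests
    return manifest, bytes_sent


def restore(name, manifest, store):
    """Rebuild a file from its list of chunk digests."""
    return b"".join(store[d] for d in manifest[name])


# Two hypothetical documents that share most of their content.
files = {
    "a.doc": b"A" * 8192,
    "b.doc": b"A" * 8192 + b"B" * 4096,
}
store = {}
manifest, bytes_sent = backup_with_dedup(files, store)
raw_bytes = sum(len(data) for data in files.values())
print(f"raw: {raw_bytes} bytes, sent after deduplication: {bytes_sent} bytes")
```

The per-chunk hashing inside the loop is where the server-side compute cost comes from, while the `bytes_sent` counter reflects the disk and bandwidth savings.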

Some companies have opted to transfer the compute cycle requirements to a storage appliance and conduct their data deduplication at the appliance. This eliminates the agent footprint on the storage server and CPU cycle impact, but it does add another device or set of devices to the network that will need to be monitored, maintained and managed.

When deciding where deduplication should take place, it's important to consider the geographical distribution of the company. For a customer with numerous branch offices, it makes economic sense to deduplicate on a local level and reduce the overall impact on the WAN. For a customer that leverages a data center, deduplicating within an appliance makes sense because it lets the customer continue using existing backup methods and procedures while reducing the performance impact on servers.

Which deduplication approach is preferred: software-based or hardware-based?

Data deduplication can be performed using either a software-based solution or a hardware-based solution. A software-based solution enables companies to eliminate data redundancy directly at the source. As noted, a software-based solution does carry the burden of installing an agent on each server, as well as a substantial CPU cycle impact. Software-based solutions are relatively inexpensive to deploy compared with hardware solutions, but they do require ongoing maintenance to keep the clients and agents up to date. A software-based solution would be ideal in small and medium-sized businesses (SMBs), as well as within large enterprises that are geographically distributed.

Deduplication appliances, on the other hand, are ideal for a data center environment. An appliance solution offloads the transactional processing, and the resulting CPU impact, from the server. Deduplication appliances have a reputation for high performance and scalability, but companies considering an appliance-based solution need to weigh the bigger-picture impact on bandwidth utilization as well as increased network complexity. A hardware-based data deduplication solution is optimized for the data center environment: In addition to offloading server CPU cycles, an appliance in the data center can be integrated with other storage platforms to maximize storage usage.

Will data be encrypted and, if so, when?

When it comes to encryption, compression and data deduplication, the order of execution is critical. Compression eliminates redundancy within files (thereby reducing file size). Deduplication eliminates redundant files. Encryption converts data into what looks like a random data stream. If a company encrypts its data prior to transmission, it may become impossible to compress or deduplicate that data afterward, which would unnecessarily inflate the amount of storage required, as well as the associated costs. To optimize your customer's storage infrastructure, advise them to compress, deduplicate and then encrypt their files. Following this order means that, when data is encrypted before transmission, compression and deduplication must take place at the server, and the data is then encrypted prior to being transmitted.
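The effect of ordering can be demonstrated with a small experiment. The sketch below is a hedged illustration only: `toy_encrypt` is a hypothetical, insecure stand-in for a real cipher (a SHA-256-based keystream with a random nonce), used solely because it produces random-looking output the way real encryption does. The key and the sample document are likewise made up for the example.

```python
import hashlib
import os
import zlib


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher for illustration only. NOT secure; do not use for real data."""
    nonce = os.urandom(16)  # fresh randomness on every call, as real ciphers use
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def fingerprint(data: bytes) -> str:
    """Content hash of the kind a deduplication system might compare."""
    return hashlib.sha256(data).hexdigest()


key = b"example-key"                      # hypothetical key
document = b"quarterly report " * 1000    # highly redundant sample file
duplicate = bytes(document)               # a second, identical copy

# Before encryption: duplicates are detectable and the data compresses well.
detectable_before = fingerprint(document) == fingerprint(duplicate)
plain_compressed = len(zlib.compress(document))

# After encryption: the two copies no longer match and barely compress.
cipher_a = toy_encrypt(document, key)
cipher_b = toy_encrypt(duplicate, key)
detectable_after = fingerprint(cipher_a) == fingerprint(cipher_b)
cipher_compressed = len(zlib.compress(cipher_a))

# Compressing first and encrypting last keeps the stored object small.
stored = toy_encrypt(zlib.compress(document), key)

print("duplicates detectable before encryption:", detectable_before)
print("duplicates detectable after encryption: ", detectable_after)
print("original size:", len(document))
print("compressed plaintext size:", plain_compressed)
print("compressed ciphertext size:", cipher_compressed)
print("compress-then-encrypt size:", len(stored))
```

In this experiment the encrypted copies neither match each other nor shrink under compression, while the data that is compressed first and encrypted last stays small.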

As companies seek to achieve data storage and retrieval regulatory compliance at the lowest possible cost, these five questions should be addressed during the data deduplication decision process. And once a solution is chosen, you should help your clients evaluate whether the implementation will meet their business goals and objectives.

About the author
Martha Young is co-founder and CEO of Nova Amber LLC, a business consulting company specializing in business process virtualization. She has co-authored three books on virtual business processes: The Case for Virtual Business Processes, The Virtual Worker's Handbook and iExec Enterprise Essentials Companion Guide.


