Storage compression and data deduplication tools


Brian Peterson, Contributor
04.02.2007


Since the first bytes were written to magnetic media, the desire to store data has outpaced our ability to store it affordably. Magnetic tape, the first true mass-storage medium, sought to solve the problem with storage compression mechanisms built into the drives. But as time has passed, the amount of information to be stored has grown exponentially. Old compression mechanisms are no longer enough.

Fortunately, more processing power has become available to compress storage. Compressing more data into smaller spaces is no fad. Storage is often the largest expense in the data center, so it makes sense to use modern processing power to reduce the amount of storage required to meet the needs of the business. In this tech tip, I'll discuss storage compression options to present to your customers, including traditional compression and data deduplication tools.

There are two basic methods of compression: lossy and lossless. Lossy compression reduces the size of a file by discarding bits whose absence will not drastically affect the quality of the information as perceived by a human. Examples include MP3 audio files and JPEG images. Lossy compression is commonly used at the application layer, and works well. Data owners may elect to exchange the integrity of the original information for reduced storage space, but infrastructure people rarely have the liberty to trade the quality of the data entrusted to them for disk space. Therefore, your customers may have only lossless compression techniques at their disposal.

Traditional storage compression

The oldest and most widely used storage compression technique is traditional compression. This method works best on plain text, raw images and database files. The compression engine examines a relatively small segment of data, looking for patterns that can be represented more compactly. For example, the ASCII string "aaaaabbb" could be reduced to "a5b3", shrinking eight bytes to four. The compression engine can be implemented in hardware or software.
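The "aaaaabbb" to "a5b3" reduction is a form of run-length encoding. A minimal Python sketch of that idea follows. The function names are illustrative, the decoder assumes the original text contains no digits, and real compression engines use far more sophisticated pattern matching than this.

```python
import re
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with the character and its count."""
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Expand character/count pairs back into the original text.

    Assumes the original text contained no digits, so that every
    digit sequence can be read as a run length.
    """
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(ch * int(count) for ch, count in pairs)


packed = rle_encode("aaaaabbb")
print(packed)               # the encoded form of the example string
print(rle_decode(packed))   # the original string, recovered without loss
print(rle_encode("abc"))    # data with no runs gets larger, not smaller
```

The last line shows why pattern-based compression depends so heavily on the data: input with no repeated runs comes out bigger than it went in.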

Traditional hardware compression uses a dedicated microprocessor designed specifically to handle the compression workload without restricting throughput. In almost all circumstances, hardware compression will outperform software compression in both speed and compacting ability. Hardware compression has been a staple of tape drive technology for quite some time, and all modern physical tape drives have it built in. Network Appliance recently announced that its NearStore Virtual Tape Library now supports hardware compression. The high overhead of software compression means that most VTLs take a 50% hit in throughput performance when compression is enabled. The Network Appliance device, however, is actually reported to run faster when hardware compression is enabled!

Traditional software compression uses a server or storage controller's main processor to compact data. This is generally slower and less efficient than hardware compression but offers a significant advantage: It's cheaper to implement and update. Software compression is everywhere: You can find it in backup software clients, which compress backup data at its source, saving network bandwidth. Some server file systems, such as NTFS, JFS and ZFS, also support compression, increasing the usable capacity of the file system at the expense of I/O performance. In some cases, applications also use compression mechanisms. For example, IBM's DB2 database engine claims a 50% savings in disk space when compression is enabled. Most VTLs also support software compression, gaining density at the cost of throughput performance.

When you have a choice, it will almost always be preferable to use hardware-based compression over software compression. Avoid using both hardware and software compression on the same data stream: Data that has already been compressed has few patterns left to remove, so the second pass will yield little or no capacity improvement but will certainly slow throughput.
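The effect of compressing the same stream twice can be checked with a short Python sketch using the standard zlib module. The vocabulary and sample size are arbitrary choices made for illustration, and zlib here stands in for whatever hardware or software engines a given product actually uses.

```python
import random
import zlib

# Build some compressible sample data: pseudo-random text drawn from a
# small, made-up vocabulary (an assumption chosen only for illustration).
vocab = ["backup", "tape", "disk", "array", "volume", "block", "file", "host"]
rng = random.Random(0)
data = " ".join(rng.choices(vocab, k=20000)).encode("ascii")

once = zlib.compress(data)    # first compression pass
twice = zlib.compress(once)   # second pass over already-compressed data

print("original bytes:        ", len(data))
print("compressed once:       ", len(once))
print("compressed twice:      ", len(twice))
```

The first pass should shrink the data substantially, while the second pass should leave the size roughly unchanged because the compressed output has little redundancy left to exploit.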

Data deduplication

Data deduplication is really just like traditional compression, except it operates on much larger datasets, eliminating all duplicate chunks of data under management. The deduped data is then often compressed using more traditional pattern-based compression techniques. The amount of space required to store deduped data is highly dependent upon the amount of redundancy in the data. Some dedupe vendors, such as Data Domain, claim that their technology (Data Domain calls its version Global Compression) can obtain an average reduction ratio of 20:1. Until recently, data deduplication was only available at the file level. As more processing power becomes available, deduplication mechanisms work on smaller chunks of data, down to the byte level.
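The idea of eliminating duplicate chunks can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's implementation: the DedupStore class, the fixed 4,096-byte chunk size and the use of SHA-256 to identify chunks are all hypothetical choices made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


class DedupStore:
    """Toy chunk store that keeps only one copy of each unique chunk."""

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}  # maps chunk digest -> chunk bytes

    def write(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep first copy only
            recipe.append(digest)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes held after duplicates are eliminated."""
        return sum(len(chunk) for chunk in self.chunks.values())


# Two made-up "backups": ten distinct chunks, then the same data with
# only the final chunk changed.
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
tuesday = monday[:-CHUNK_SIZE] + bytes([99]) * CHUNK_SIZE

store = DedupStore()
monday_recipe = store.write(monday)
tuesday_recipe = store.write(tuesday)

logical = len(monday) + len(tuesday)
print("logical bytes written:", logical)
print("physical bytes stored:", store.stored_bytes())
```

Because the second backup shares nine of its ten chunks with the first, only one new chunk is stored. The more redundancy there is across the data under management, the higher the reduction ratio climbs.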

Many deduplication tools are already on the market. They are implemented both as standalone software products and as features embedded directly into storage hardware. Symantec's PureDisk is a software product that bolts on to NetBackup to deduplicate backup streams. It is often used to reduce the total bandwidth required to back up remote offices. EMC recently purchased Avamar Technologies, which produces a deduplication software product called Axion. It runs on the host and allows storage shops to write deduplicated archives to any existing disk technology, enabling cost-effective long-term archive storage.

EMC's Centera CAS array was one of the first storage devices to implement file-level deduplication. While the Centera's deduplication services are not as powerful as more cutting-edge, byte-level dedupe products, they are a stable and innovative way to efficiently store archive data. Newer to the market is Data Domain, which builds dedupe engines into its storage arrays or, alternatively, front-ends any existing storage with dedupe gateways.

Data deduplication is building momentum. The swell of information pooling up in every data center makes deduplication one of the most important developments to hit storage technology users in at least a decade. As more manufacturers scramble to capitalize on the frenzy, you'll see more VTLs, intelligent fabrics, host software and disk arrays implement sophisticated data deduplication mechanisms.

About the author: Brian Peterson is an independent IT infrastructure analyst. He has a deep background in enterprise storage and open systems computing platforms. A recognized expert in his field, he has held positions of great responsibility on both the supplier and customer sides of IT.


All Rights Reserved, Copyright 2006 - 2008, TechTarget