Replication: The Other Half of the Data Deduplication Equation
The enterprise data center continues to evolve, driven by ever-growing amounts of data and new demands for data availability, both local and remote. These demands are driving companies to look for alternatives to existing data protection methods, and deduplicating disk-based storage systems, such as Quantum's DXi Series, are becoming a preferred backup target. However, deduplicating data is only half the equation. To fully deliver on enterprise data protection, companies need efficient, cost-effective ways to move this deduplicated data off-site for long-term compliance and disaster recovery, or to centralize and consolidate data from remote offices.
To achieve this, companies need either to copy data from disk to tape or to replicate it from one disk-based system to another. It is this second option that is catching the attention of more companies. Technologies like Symantec's NetBackup Open Storage Option make it easier for companies to centrally manage replication between two different disk-based storage systems, while deduplication reduces the amount of data that companies need to store. Because deduplicated data is much smaller, replicating it requires far less bandwidth, which lets customers replicate their data over bandwidth-constrained corporate LANs and WANs.
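To illustrate why deduplication makes replication over constrained links practical, here is a minimal Python sketch under simplifying assumptions: the backup stream is split into fixed-size chunks (many products use variable-length chunking instead), each chunk is identified by its SHA-256 hash, and the target keeps a store of the chunks it already holds. The names `chunk_data`, `replicate` and `restore`, and the 4 KB chunk size, are hypothetical and do not describe how NetBackup OST or any DXi system works internally.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


def chunk_data(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a backup stream into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def replicate(data: bytes, target_store: dict[str, bytes]) -> tuple[list[str], int]:
    """Replicate `data` to a target that may already hold some chunks.

    Returns the recipe (ordered list of chunk hashes) the target needs to
    rebuild the stream, plus the number of chunk bytes sent over the link
    (the small overhead of sending the recipe itself is ignored).
    """
    recipe = []
    bytes_sent = 0
    for chunk in chunk_data(data):
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in target_store:
            # Only chunks the target has never seen cross the WAN.
            target_store[digest] = chunk
            bytes_sent += len(chunk)
    return recipe, bytes_sent


def restore(recipe: list[str], target_store: dict[str, bytes]) -> bytes:
    """Rebuild the original stream at the target from its chunk recipe."""
    return b"".join(target_store[d] for d in recipe)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    monday = b"A" * 8192 + b"B" * 4096          # three chunks, two identical
    _, sent_mon = replicate(monday, store)       # 8192 bytes: A once, B once
    tuesday = monday + b"C" * 4096               # one new chunk appended
    recipe, sent_tue = replicate(tuesday, store)  # 4096 bytes: only C
    assert restore(recipe, store) == tuesday
    print(sent_mon, sent_tue)  # prints: 8192 4096
```

Replicating Tuesday's backup costs only the one new chunk, because the target already holds everything else. The recipe still has to be transmitted, but it is small compared with the data it describes.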
In this context, it becomes obvious that replication and deduplication are not separate topics or one-off conversations; they are now part of the same discussion. The key to delivering on this functionality is quantifying how well the two technologies work together. That, however, depends on how well the disk-based storage system manages its deduplication and replication features.
Deduplicating and replicating all backup data isn't the right answer for every company. As companies move forward with disk-based data protection, they need to step back and evaluate how best to implement these complementary technologies. Here are some key areas companies need to consider as they begin to incorporate deduplication and replication into their overall data protection strategy:
- How much data does each site need to deduplicate? (All, some or none)
- Are there options for different deduplication methods, and can deduplication be turned off?
- Which replication frequency is right for your environment? (Continuous, once a day, once a week, etc.)
- What is the geographical distance between replication sites? (5 km, 100 km, 200 km, 1,000 km)
- How much data needs to be replicated, and what deduplication ratios should you expect?
- How much bandwidth is needed, and how much is available? (DSL, fractional T-1, full T-1, T-3, etc.)
- What granularity of control does the disk-based system offer over the replication process? (By application, partition, bandwidth monitoring, etc.)
- How easy is it to manage replication processes?
- How secure is the data that's being replicated?
- What about redundant data sitting in various offices? Will this data need to be replicated?
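Several of these questions (data volume, deduplication ratio, replication frequency and available bandwidth) can be combined into a rough sizing estimate. The Python sketch below is a back-of-the-envelope calculation, not a vendor sizing tool: the 80% link efficiency and the example site's 500 GB nightly backup, 20:1 ratio and 8-hour replication window are hypothetical inputs, and only the nominal T-1 (1.544 Mbps) and T-3 (44.736 Mbps) line rates are standard figures. The function names are illustrative.

```python
# Nominal line rates in Mbps for two of the circuits named above.
LINK_RATES_MBPS = {"T-1": 1.544, "T-3": 44.736}


def required_bandwidth_mbps(backup_gb: float, dedup_ratio: float,
                            window_hours: float, efficiency: float = 0.8) -> float:
    """Estimate the sustained WAN bandwidth needed to replicate one backup.

    backup_gb: logical backup size in decimal gigabytes (1 GB = 1e9 bytes)
    dedup_ratio: e.g. 20.0 for 20:1; only 1/dedup_ratio of the data is sent
    window_hours: time allowed for replication to finish
    efficiency: assumed fraction of the nominal link rate actually usable
    """
    if dedup_ratio <= 0 or window_hours <= 0 or not 0 < efficiency <= 1:
        raise ValueError("ratio and window must be positive; efficiency in (0, 1]")
    bits_to_send = backup_gb * 1e9 / dedup_ratio * 8
    seconds = window_hours * 3600
    return bits_to_send / seconds / 1e6 / efficiency


def links_that_fit(mbps: float) -> list[str]:
    """Return the circuits whose nominal rate covers the required bandwidth."""
    return [name for name, rate in LINK_RATES_MBPS.items() if rate >= mbps]


if __name__ == "__main__":
    # Hypothetical site: 500 GB nightly backup, 20:1 dedup, 8-hour window.
    need = required_bandwidth_mbps(500, 20, 8)
    print(f"{need:.2f} Mbps needed; fits on: {links_that_fit(need)}")
    # prints: 8.68 Mbps needed; fits on: ['T-3']
```

With deduplication turned off (`dedup_ratio=1`), the same hypothetical site would need roughly 174 Mbps, more than a T-3 provides, which is one reason the deduplication and bandwidth questions cannot be answered separately.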
Deduplication and replication are now part of the total data protection equation but companies cannot assume "1+1=2" where these two technologies are concerned. There are numerous factors to consider as companies develop their overall data protection strategy, and treating deduplication and replication as two separate topics is simply out of the question. In upcoming blog entries, we'll take a closer look at how companies can address these specific questions.