We have recently been visiting Windows 8 and Windows Server 2012 IT Camps. We will be highlighting some of the new and exciting features in this blog over the coming months. This post is about the new Data Deduplication feature, which is part of the file server role in Microsoft Windows Server 2012.
This deduplication technology works on a number of levels, from files through to disk clusters:
“Deduplication segments files into variable-sizes (32-128 kilobyte chunks) using a new algorithm developed in conjunction with Microsoft research. The chunking module splits a file into a sequence of chunks in a content dependent manner. The system uses a Rabin fingerprint-based sliding window hash on the data stream to identify chunk boundaries. The chunks have an average size of 64KB and they are compressed and placed into a chunk store located in a hidden folder at the root of the volume called the System Volume Information, or “SVI folder”. The normal file is replaced by a small reparse point.”
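To make the quoted description more concrete, here is a minimal Python sketch of content-dependent chunking. It is not Microsoft's algorithm: a simple polynomial rolling hash stands in for the Rabin fingerprint, and the window width, hash base, modulus and boundary mask are all assumed values, with the mask picked so that chunks of random data should land near the quoted 64KB average. Only the 32KB and 128KB limits come from the quote. The function names and the dictionary used as a chunk store are hypothetical and for illustration only.

```python
import hashlib

MIN_CHUNK = 32 * 1024    # lower chunk size limit from the quoted description
MAX_CHUNK = 128 * 1024   # upper chunk size limit from the quoted description
WINDOW = 48              # assumed sliding-window width in bytes
MASK = (1 << 15) - 1     # assumed: cut when the low 15 bits of the hash are zero
BASE = 257               # assumed polynomial base
MOD = (1 << 61) - 1      # assumed modulus


def chunk_boundaries(data):
    """Return the end offset of every chunk in data."""
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    boundaries = []
    start = 0  # offset where the current chunk begins
    h = 0      # rolling hash of the last WINDOW bytes of the current chunk
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Window is full: remove the contribution of the oldest byte.
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        # Cut where the content says so (hash matches the mask), but never
        # before MIN_CHUNK bytes and always once MAX_CHUNK bytes are reached.
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing chunk, may be under MIN_CHUNK
    return boundaries


def store_file(data, chunk_store):
    """Add unseen chunks to chunk_store and return the file's list of chunk hashes."""
    recipe = []
    start = 0
    for end in chunk_boundaries(data):
        chunk = data[start:end]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)  # identical chunks are stored once
        recipe.append(digest)
        start = end
    return recipe
```

Storing the same file twice with store_file leaves the chunk store unchanged the second time. The returned list of hashes plays a role loosely similar to the reparse point in the quote: it is enough to rebuild the file from the chunk store.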
Microsoft have provided a tool that gives an indication of the potential savings on file shares that already exist. The tool is called DDPEval.exe and can be found in the following location on a Windows Server 2012 installation:
We have run this against a filestore which mainly consists of large ISO files in a software repository:
As can be seen above, the potential savings are 122GB on a 255GB volume, which is impressive considering that these files are already compressed. We are currently testing further on a selection of volumes and file types and have achieved savings of up to 80%, which, given the price of redundant disk arrays, definitely offers potential cost savings.
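As a quick check on those figures, the arithmetic can be sketched in a few lines of Python. This assumes the 255GB figure is the amount of data that was evaluated rather than the raw size of the disk. The function names are made up for illustration.

```python
def savings_percent(saved_gb, original_gb):
    """Share of the original data that deduplication would remove, in percent."""
    if original_gb <= 0:
        raise ValueError("original_gb must be positive")
    return 100.0 * saved_gb / original_gb


def size_after_dedup(saved_gb, original_gb):
    """Space the data would occupy once the saving is applied, in GB."""
    return original_gb - saved_gb


print(round(savings_percent(122, 255), 1))  # 47.8
print(size_after_dedup(122, 255))           # 133
```

So the ISO repository would shrink by just under half, to roughly 133GB.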
If you are going to use Data Deduplication there are a number of things to consider:
Backup, as always, is a vital consideration. If you have 2TB of data on a 1TB volume, will it fit if you have to restore it? We really need a backup application that is aware of deduplication and that can back up, store and restore the data in its optimised state. Unsurprisingly, Microsoft have this covered with System Center Data Protection Manager 2012 SP1.
Consider where to use deduplication: file shares with infrequently changed files or software repositories are the best candidates.
Research the technology, as there are many options for tuning deduplication to the data being optimised and its usage patterns.
There can be issues with file access on deduplicated volumes that are almost full, so they need to be monitored and have clean-up jobs run or scheduled.
Copying single large files off a deduplicated volume can take longer.
It is also worth testing this thoroughly in any environment before implementing it in a live system.
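Two of the considerations above, restore sizing and keeping an eye on nearly full volumes, lend themselves to a small script. The following Python sketch is only an illustration: the function names and the 90% threshold are assumptions rather than anything Microsoft recommend, and it uses nothing more than the standard library's shutil.disk_usage call.

```python
import shutil


def restore_fits(logical_tb, target_capacity_tb):
    """True if the data, restored in non-deduplicated form, fits on the target."""
    return logical_tb <= target_capacity_tb


def used_fraction(path):
    """Fraction of the volume holding path that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def needs_cleanup(path, threshold=0.9):
    """True once the volume is at or above the assumed 90% fullness threshold."""
    return used_fraction(path) >= threshold


# The example from the backup point: 2TB of data restored to a 1TB volume.
print(restore_fits(2.0, 1.0))  # False
```

On a file server, needs_cleanup would be pointed at the root of the deduplicated volume and run on a schedule, so that clean-up jobs can be started before the volume fills up.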