Data deduplication is the process of eliminating redundant data to improve storage utilization. With data deduplication, you can reduce the amount of storage required for user and application data. It is most effective in use cases where many identical or very similar copies of data are stored. The deduplication feature in Veritas Access provides storage optimization for primary storage (storage of active data).
Each file in the configured file system is broken into chunks of a user-configurable size, and duplicates are evaluated chunk by chunk. The smaller the chunk size, the higher the percentage of sharing, because smaller chunks are more likely to match one another.
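A minimal sketch of chunk-level duplicate detection follows, assuming fixed-size chunks and SHA-256 fingerprints. Both are stand-ins: the source does not describe how Veritas Access computes its fingerprints, and the function names are hypothetical. Two 8 KiB regions that share only their first 4 KiB show sharing with 4 KiB chunks but none with 8 KiB chunks.

```python
import hashlib


def dedup_stats(data: bytes, chunk_size: int) -> tuple[int, int]:
    """Split data into fixed-size chunks; return (total chunks, unique chunks).

    Each chunk is fingerprinted with SHA-256 (an assumed stand-in); chunks
    with the same fingerprint are duplicates that would be stored only once.
    """
    fingerprints = set()
    total = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprints.add(hashlib.sha256(chunk).hexdigest())
        total += 1
    return total, len(fingerprints)


def sharing_percent(data: bytes, chunk_size: int) -> float:
    """Percentage of chunks that duplicate a chunk already seen."""
    total, unique = dedup_stats(data, chunk_size)
    if total == 0:
        return 0.0
    return 100.0 * (total - unique) / total


# Two 8 KiB "files" that share only their first 4 KiB.
A, B, C = (bytes([i]) * 4096 for i in (1, 2, 3))
data = (A + B) + (A + C)
print(sharing_percent(data, 4096))  # 25.0
print(sharing_percent(data, 8192))  # 0.0
```

With 8 KiB chunks the shared 4 KiB is hidden inside chunks that differ as a whole, so no match is found; this is the effect the paragraph describes.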
The first deduplication run on a file system is always a full deduplication of the entire file system: an end-to-end process that identifies and eliminates duplicate data. Every subsequent deduplication run on that file system is incremental.
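The difference between the first (full) run and later (incremental) runs can be pictured with a toy model. This is a sketch under stated assumptions, not Veritas Access internals: it assumes fixed-size chunks, SHA-256 fingerprints, a fingerprint index that persists between runs, and an incremental run that scans only files it has not processed before (files are assumed never to be modified in place). The class name `DedupModel` is hypothetical.

```python
import hashlib


class DedupModel:
    """Toy model: the first run is a full pass, later runs are incremental.

    Assumptions (illustrative only): fixed-size chunks, SHA-256 fingerprints,
    and incremental runs that skip files already processed by earlier runs.
    """

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.index: set[str] = set()      # fingerprints of chunks already stored
        self.processed: set[str] = set()  # names of files already scanned
        self.runs = 0

    def run(self, file_system: dict[str, bytes]) -> tuple[str, int]:
        """Return the run mode and the number of duplicate chunks found."""
        mode = "full" if self.runs == 0 else "incremental"
        duplicates = 0
        for name, data in file_system.items():
            if name in self.processed:
                continue  # already deduplicated by an earlier run
            for offset in range(0, len(data), self.chunk_size):
                chunk = data[offset:offset + self.chunk_size]
                fp = hashlib.sha256(chunk).hexdigest()
                if fp in self.index:
                    duplicates += 1  # chunk could share the stored copy
                else:
                    self.index.add(fp)
            self.processed.add(name)
        self.runs += 1
        return mode, duplicates


A, B, C = (bytes([i]) * 4096 for i in (1, 2, 3))
fs = {"a": A + B, "b": A + C}
model = DedupModel()
print(model.run(fs))  # ('full', 1)
fs["c"] = B + B
print(model.run(fs))  # ('incremental', 2)
```

Because the index survives between runs, the incremental run still finds duplicates of data deduplicated earlier while scanning only the new file.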
While a smaller chunk size increases sharing, it also increases the deduplication time and the load on the system.
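A back-of-the-envelope illustration of the cost side: with fixed-size chunks, the number of fingerprints to compute, compare, and index is inversely proportional to the chunk size, so halving the chunk size doubles that work. The sketch assumes binary units (1 TiB = 2^40 bytes) and fixed-size chunks; it does not model the actual Veritas Access workload, and `chunks_per_tib` is a hypothetical helper.

```python
TIB = 2 ** 40  # bytes in one TiB (binary units assumed)


def chunks_per_tib(chunk_size_kib: int) -> int:
    """Number of fixed-size chunks (fingerprints to compute) per TiB of data."""
    return TIB // (chunk_size_kib * 1024)


for kib in (4, 8, 16):
    print(f"{kib:>2} KiB chunks: {chunks_per_tib(kib):,} fingerprints per TiB")
```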
Veritas Access deduplication is periodic: redundant data in the file system is detected and eliminated at a user-configured frequency.
Use cases for deduplication
The following are potential use cases for Veritas Access file system deduplication:
Microsoft Exchange mailboxes
File systems hosting user home directories
Virtual Machine Disk Format (VMDK) or virtual image stores
Relationship between physical and logical data on a file system
Table: Relationship between physical and logical data on a file system for two billion unique fingerprints with various deduplication ratios
| Fingerprint block size | Deduplication ratio | Unique signatures per TB | Physical file system data size | Effective logical file system data size |
|---|---|---|---|---|
| 4 KB | 50% | 128 M | 16 TB | 32 TB |
| 4 KB | 65% | 90 M | 23 TB | 65 TB |
| 4 KB | 80% | 51 M | 40 TB | 200 TB |
| 8 KB | 50% | 64 M | 32 TB | 64 TB |
| 8 KB | 65% | 45 M | 46 TB | 132 TB |
| 8 KB | 80% | 25 M | 80 TB | 400 TB |
| 16 KB | 50% | 32 M | 64 TB | 128 TB |
| 16 KB | 65% | 22 M | 93 TB | 266 TB |
| 16 KB | 80% | 13 M | 158 TB | 800 TB |
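If the deduplication ratio is read as the fraction of logical data that deduplication eliminates (an interpretation, not stated explicitly in the table), then physical size = logical size x (1 - ratio), so the effective logical size is approximately the physical size divided by (1 - ratio). For example, 16 TB at 50% gives 32 TB and 40 TB at 80% gives 200 TB; the remaining rows agree to within about 2%, presumably because of rounding. The sketch below applies that relationship to the table rows; `effective_logical_tb` is a hypothetical name.

```python
def effective_logical_tb(physical_tb: float, dedup_ratio: float) -> float:
    """Approximate logical data size for a physical size and dedup ratio.

    Assumes dedup_ratio is the fraction of logical data eliminated
    (0 <= ratio < 1), so physical = logical * (1 - ratio).
    """
    if not 0 <= dedup_ratio < 1:
        raise ValueError("dedup_ratio must be in [0, 1)")
    return physical_tb / (1 - dedup_ratio)


# (physical TB, ratio, logical TB) rows copied from the table
rows = [(16, 0.50, 32), (23, 0.65, 65), (40, 0.80, 200),
        (32, 0.50, 64), (46, 0.65, 132), (80, 0.80, 400),
        (64, 0.50, 128), (93, 0.65, 266), (158, 0.80, 800)]

for physical, ratio, table_logical in rows:
    estimate = effective_logical_tb(physical, ratio)
    print(f"{physical:>4} TB at {ratio:.0%}: {estimate:6.1f} TB "
          f"(table: {table_logical} TB)")
```

The higher the deduplication ratio, the faster the logical size grows relative to the physical size: at 80%, every physical terabyte represents about five logical terabytes.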
Overview of the deduplication workflow
Figure: Overview of the deduplication workflow
The Storage> dedup commands perform administrative functions for the Veritas Access deduplication feature. The deduplication commands allow you to enable, disable, start, stop, and remove deduplication on a file system. You can also reset several deduplication configuration parameters and display the current deduplication status for your file system.
Some configuration parameters can be set locally (specific to one file system), globally (applicable to all deduplication-enabled file systems), or both. A local value overrides the global value of the same parameter.
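The local-over-global precedence rule can be sketched as a layered lookup. This is an illustration only: the parameter names and values are hypothetical, not actual Veritas Access settings, and `effective_params` is an invented helper. Python's `collections.ChainMap` searches its mappings in order, which matches the rule that a local value wins and a missing local value falls through to the global one.

```python
from collections import ChainMap

# Hypothetical parameter names and values, for illustration only.
global_params = {"schedule_hours": 24, "cpu_priority": "low"}

# Per-file-system (local) overrides; fs2 has none.
local_params = {
    "fs1": {"schedule_hours": 6},
    "fs2": {},
}


def effective_params(fs_name: str) -> ChainMap:
    """Parameters in force for one file system.

    ChainMap searches the local mapping first, so a local value overrides the
    global value; keys missing locally fall through to the global mapping.
    """
    return ChainMap(local_params.get(fs_name, {}), global_params)


print(effective_params("fs1")["schedule_hours"])  # 6 (local override)
print(effective_params("fs2")["schedule_hours"])  # 24 (global default)
print(effective_params("fs1")["cpu_priority"])    # low (no local value)
```

A design consequence of this layering: changing a global value affects every file system except those that set the same parameter locally.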