About data deduplication

Veritas Access Administrator's Guide

Data deduplication is the process by which redundant data is eliminated to improve storage utilization. Using data deduplication, you can reduce the amount of storage required for storing user and application data. It is most effective in use cases where many identical or very similar copies of data are stored. The deduplication feature in Veritas Access provides storage optimization for primary storage (storage of active data).

Each file in the configured file system is broken into chunks of a user-configurable size, and the chunks are compared to find duplicates. The smaller the chunk size, the higher the percentage of sharing, because smaller chunks are more likely to match.
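The effect of chunk size on sharing can be illustrated with a minimal Python sketch. It is not the Veritas Access implementation: the use of SHA-256 as the fingerprint, the function names, and the sample data are all assumptions made for illustration.

```python
import hashlib


def dedup_stats(data, chunk_size):
    """Split data into fixed-size chunks and return (total_chunks, unique_chunks)."""
    fingerprints = set()
    total = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # SHA-256 is an assumed fingerprint function, chosen only for this sketch.
        fingerprints.add(hashlib.sha256(chunk).digest())
        total += 1
    return total, len(fingerprints)


def sharing_ratio(data, chunk_size):
    """Fraction of chunks that duplicate another chunk in the same data."""
    total, unique = dedup_stats(data, chunk_size)
    return 0.0 if total == 0 else 1.0 - unique / total


# Hypothetical sample: two 8 KiB regions that share only their first 4 KiB.
x, y, z = b"a" * 4096, b"b" * 4096, b"c" * 4096
sample = x + y + x + z

print(sharing_ratio(sample, 8192))  # 0.0: neither 8 KiB chunk matches the other
print(sharing_ratio(sample, 4096))  # 0.25: one of four 4 KiB chunks is a duplicate
```

In this sample the 4 KiB chunk size finds the shared region that the 8 KiB chunk size misses, at the cost of fingerprinting twice as many chunks.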

The first deduplication of a file system is always a full deduplication of the entire file system. This is an end-to-end deduplication process that identifies and eliminates duplicate data. Any subsequent attempt to run deduplication on that file system results in incremental deduplication.
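The difference between the full first run and later incremental runs can be pictured with a toy Python model. This guide does not describe how Veritas Access detects changed data, so the modification-time comparison below, along with the class and method names, is an assumption used only to illustrate the idea.

```python
class DedupRunPlanner:
    """Toy model: the first run covers every file, later runs cover only changed files."""

    def __init__(self):
        # None means that no deduplication run has completed yet.
        self.last_run = None

    def files_to_scan(self, mtimes, now):
        """Return the sorted paths to process; mtimes maps path -> modification time."""
        if self.last_run is None:
            selected = sorted(mtimes)  # full deduplication
        else:
            # Incremental deduplication (assumed rule): files modified since the last run.
            selected = sorted(p for p, t in mtimes.items() if t > self.last_run)
        self.last_run = now
        return selected


planner = DedupRunPlanner()
files = {"/fs1/a": 1.0, "/fs1/b": 2.0}   # hypothetical paths and timestamps
first = planner.files_to_scan(files, now=10.0)    # every file
files["/fs1/b"] = 11.0                            # one file changes after the first run
second = planner.files_to_scan(files, now=20.0)   # only the changed file
```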

Deduplication with a small chunk size increases the deduplication time and load on the system.

Veritas Access deduplication is periodic: redundant data in the file system is detected and eliminated at the user-configured frequency.

Use cases for deduplication

The following are potential use cases for Veritas Access file system deduplication:

  • Microsoft Exchange mailboxes

  • File systems hosting user home directories

  • Virtual Machine Disk Format (VMDK) or virtual image stores

Relationship between physical and logical data on a file system

The following table shows the estimated file system data size that a Veritas Access deduplicated file system can support for two billion unique fingerprints at various deduplication ratios.

Table: Relationship between physical and logical data on a file system for two billion unique fingerprints with various deduplication ratios

Fingerprint   Deduplication   Unique signatures   Physical file system   Effective logical file
block size    ratio           per TB              data size              system data size
-----------   -------------   -----------------   --------------------   ----------------------
4 K           50%             128 M               16 TB                  32 TB
4 K           65%             90 M                23 TB                  65 TB
4 K           80%             51 M                40 TB                  200 TB
8 K           50%             64 M                32 TB                  64 TB
8 K           65%             45 M                46 TB                  132 TB
8 K           80%             25 M                80 TB                  400 TB
16 K          50%             32 M                64 TB                  128 TB
16 K          65%             22 M                93 TB                  266 TB
16 K          80%             13 M                158 TB                 800 TB
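The figures in the table appear to follow from simple arithmetic, sketched below in Python. The formula is inferred from the table values and is not stated in this guide. Reading "two billion" as 2^31 and treating K, M, and TB as binary units are also assumptions. The published values are rounded, so some rows differ from this arithmetic by a few TB.

```python
TIB = 1024 ** 4                      # bytes per TB, assuming binary units
FINGERPRINT_LIMIT = 2 * 1024 ** 3    # "two billion" fingerprints, read as 2^31 (assumption)


def estimate_capacity(block_size_bytes, dedup_ratio):
    """Return (unique signatures per TB, physical TB, effective logical TB)."""
    if not 0.0 <= dedup_ratio < 1.0:
        raise ValueError("dedup_ratio must be in the range [0, 1)")
    unique_fraction = 1.0 - dedup_ratio
    # Blocks in one TB, scaled by the fraction that remains unique.
    unique_per_tb = (TIB // block_size_bytes) * unique_fraction
    # Physical capacity at which the fingerprint limit is reached.
    physical_tb = FINGERPRINT_LIMIT / unique_per_tb
    # Logical data that this much physical storage represents.
    logical_tb = physical_tb / unique_fraction
    return unique_per_tb, physical_tb, logical_tb


for block_kib in (4, 8, 16):
    for ratio in (0.50, 0.65, 0.80):
        unique, physical, logical = estimate_capacity(block_kib * 1024, ratio)
        print(f"{block_kib} K  {ratio:.0%}  {unique / 1024 ** 2:6.1f} M  "
              f"{physical:6.1f} TB  {logical:6.1f} TB")
```

For example, a 4 K block size at a 50% deduplication ratio gives 128 M unique signatures per TB, 16 TB of physical data, and 32 TB of logical data, which matches the first row of the table.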

Overview of the deduplication workflow

Figure: Overview of the deduplication workflow (image: dedup-flow-diagram-user.png)

The Storage> dedup commands perform administrative functions for the Veritas Access deduplication feature. The deduplication commands allow you to enable, disable, start, stop, and remove deduplication on a file system. You can also reset several deduplication configuration parameters and display the current deduplication status for your file system.

Some configuration parameters can be set as local (specific to a file system) or global (applicable to all deduplication-enabled file systems). A local parameter value overrides the corresponding global value.
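The override rule can be sketched in a few lines of Python. The parameter names, values, and file system names below are hypothetical and are not actual Veritas Access settings. The sketch only shows how a local value takes precedence over a global one.

```python
def effective_parameters(global_params, local_params):
    """Merge settings so that any local value overrides the global value."""
    merged = dict(global_params)   # start from the global defaults
    merged.update(local_params)    # local settings win where both are set
    return merged


# Hypothetical parameter names, used only for illustration.
global_settings = {"run_interval_hours": 24, "cpu_limit": "low"}
local_settings = {
    "fs1": {"run_interval_hours": 6},   # fs1 overrides one global value
}

fs1 = effective_parameters(global_settings, local_settings.get("fs1", {}))
fs2 = effective_parameters(global_settings, local_settings.get("fs2", {}))
```

Here fs1 uses its own interval but inherits the global CPU setting, while fs2, which has no local settings, uses the global values unchanged.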