Compression is a data reduction technology that aims to store a data set using less physical space. In DataDomain systems running DDOS, user data is compressed by two mechanisms: dedupe and local compression. De-duplication, or "dedupe," identifies redundant data segments and stores only the unique ones. Local compression then further compresses the unique data segments with a compression algorithm such as lz, gzfast, or gz. The overall compression of user data in DDOS is therefore the joint effect of dedupe and local compression. DDOS uses the "compression ratio" to measure the effectiveness of its data compression. Generally, it is the ratio of the total user data size to the total size of the compressed data, that is, the used physical space.
The DataDomain file system is a "log-structured" dedupe file system. A log-structured file system only appends data, so deletion by itself cannot free physical space; such file systems rely on garbage collection to reclaim space that is no longer needed. Together, the characteristics of a log-structured file system and of dedupe make it tricky to understand all aspects of compression in DDOS.
There are many aspects of compression that we can measure. In this document, we discuss them step by step. First, we explain the overall system compression effect, which tells us the actual compression achieved in a DataDomain system: the amount of user data, the amount of physical space consumed, and the ratio between them. This ratio is referred to as the "system effective compression ratio" in this document. DDOS performs dedupe inline and tracks statistics of the original user data segments, the post-dedupe unique data segments, and the effect of local compression on the unique data segments. These inline compression statistics are used to measure the inline compression effect. Note that inline compression statistics may be measured for each write. DDOS also tracks the statistics at different levels: files, Mtrees, and the entire system.
The content of this document applies to all DDOS releases up to DDOS 5.3, the latest release at the time of publication. There is no guarantee that all of the content is accurate for future releases. In releases prior to 5.0, the entire system has only one Mtree, and the term Mtree is not explicitly called out.
2. Compression: System Overall Effect
The system-wide overall compression effect is measured by the system effective compression ratio, which is the ratio of the user data size to the size of the used physical space. It is reported by the "filesys show compression" (FSC) CLI command (the corresponding information is also available in the GUI). A sample output of FSC is shown below.
From: 2012-06-07 13:00 To: 2012-06-14 13:00
                  Pre-Comp   Post-Comp   Global-Comp   Local-Comp      Total-Comp
                     (GiB)       (GiB)        Factor       Factor          Factor
                                                                    (Reduction %)
---------------   --------   ---------   -----------   ----------   -------------
Currently Used:   614656.0    135747.2             -            -     4.5x (77.9)
Written:*
  Last 7 days       6914.1      1393.7          3.4x         1.5x     5.0x (79.8)
  Last 24 hrs       1067.7       218.7          3.4x         1.5x     4.9x (79.5)
---------------   --------   ---------   -----------   ----------   -------------
* Does not include the effects of pre-comp file deletes/truncates
  since the last cleaning on 2011/03/19 16:09:04.
Key:
   Pre-Comp           = Data written before compression
   Post-Comp          = Storage used after compression
   Global-Comp Factor = Pre-Comp / (Size after de-dupe)
   Local-Comp Factor  = (Size after de-dupe) / Post-Comp
   Total-Comp Factor  = Pre-Comp / Post-Comp
   Reduction %        = ((Pre-Comp - Post-Comp) / Pre-Comp) * 100
The system effective compression ratio is reported in row 1 of the result section of the CLI output (the "Currently Used" row, highlighted in red). The total user data size is labeled "Pre-Comp", and the total consumed physical space (used by both data and metadata) is labeled "Post-Comp".
Note that the "Pre-Comp" and "Post-Comp" numbers are both read at runtime: FSC implicitly synchronizes the entire system and then queries the two numbers. They are measured in the same way as by "filesys show space".
System effective compression ratio = Pre-Comp / Post-Comp
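As a quick check of these definitions, the following Python sketch recomputes the "Currently Used" row of the sample FSC output from its Pre-Comp and Post-Comp values, using the formulas from the FSC key. The function names are illustrative only; they are not part of DDOS or its CLI.

```python
def effective_ratio(pre_comp_gib: float, post_comp_gib: float) -> float:
    """System effective compression ratio = Pre-Comp / Post-Comp."""
    if post_comp_gib <= 0:
        raise ValueError("Post-Comp must be positive")
    return pre_comp_gib / post_comp_gib


def reduction_pct(pre_comp_gib: float, post_comp_gib: float) -> float:
    """Reduction % = ((Pre-Comp - Post-Comp) / Pre-Comp) * 100."""
    if pre_comp_gib <= 0:
        raise ValueError("Pre-Comp must be positive")
    return (pre_comp_gib - post_comp_gib) / pre_comp_gib * 100


# "Currently Used" row of the sample FSC output (GiB).
pre, post = 614656.0, 135747.2
print(f"{effective_ratio(pre, post):.1f}x ({reduction_pct(pre, post):.1f})")  # 4.5x (77.9)
```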
The rest of the FSC output describes the inline compression statistics, which we discuss in Section 3.
A number of operations can affect the system effective compression ratio:
- Fastcopy. When a fastcopy is made from a file in the active namespace (not from a snapshot), it is a perfect dedupe, as no extra physical space is needed for the target file. A fastcopy therefore increases the user data size without consuming additional physical space, which increases the system effective compression ratio. When a large number of fastcopies are made, the system effective compression ratio may become artificially high.
- Virtual synthetic. Virtual synthetic backups tend to show a high system effective compression ratio, because virtual synthetics make logical full backups but transfer only changed or new data to DataDomain systems. The impact of virtual synthetics on the system effective compression ratio is similar to that of fastcopy.
- Overwrites. Overwrites consume more physical space but do not increase the logical size of the data set. Thus, overwrites lower the system effective compression ratio.
- Store sparse files. Sparse files contain large "holes" that are counted in the logical size but, due to compression, consume no physical space. As a result, they can make the system effective compression ratio seem high.
- Store small files. DDOS adds nearly 1 KB of overhead to each file for certain internal metadata. When a system stores a significant number of very small files (smaller than 1 KB or in the single-digit kilobytes), the metadata overhead drags the system effective compression ratio down.
- Store pre-compressed or pre-encrypted files. Compression and encryption can significantly amplify the level of data change and reduce the opportunity for dedupe. Such files usually cannot be deduped well and lower the system effective compression ratio.
- Deletes. Deletions reduce the logical size of the system, but the system does not get the corresponding unused space back until garbage collection runs. A large number of deleted files will keep the compression ratio low until GC runs.
- Garbage Collection (GC). GC reclaims the space consumed by data segments that are no longer referenced by any file. If a lot of files have been deleted recently, GC may increase the system effective compression ratio by reducing the physical space footprint.
- Aggressively taking snapshots. Taking a snapshot of an Mtree does not change the logical size of the data set. However, all the data segments referenced by the snapshot must be locked down, even if all files captured by the snapshot are deleted after the snapshot was taken. GC cannot reclaim the space that is still needed by snapshots; therefore, having lots of snapshots may make the system effective compression ratio appear low. However, snapshots are very useful crash recovery facilities. We should never hesitate to take snapshots or set up proper snapshot schedules when needed.
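To make the direction of these effects concrete, here is a toy Python model that tracks only a logical (pre-comp) size and a physical (post-comp) size. It is a hypothetical simplification: real DDOS accounting also includes metadata, snapshots, and cleaning details that are not modeled here, and the class and method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToySystem:
    """Hypothetical model of a log-structured dedupe system (sizes in GiB)."""
    logical: float = 0.0   # user data size (Pre-Comp)
    physical: float = 0.0  # used physical space (Post-Comp)
    dead: float = 0.0      # space held by deleted data until GC reclaims it

    def ratio(self) -> float:
        return self.logical / self.physical

    def backup(self, logical: float, physical: float) -> None:
        self.logical += logical
        self.physical += physical

    def fastcopy(self, logical: float) -> None:
        # Perfect dedupe: logical size grows, no new physical space is used.
        self.logical += logical

    def delete(self, logical: float, unique_physical: float) -> None:
        # Logical size drops immediately; the file's unique segments stay
        # on disk as dead space until garbage collection runs.
        self.logical -= logical
        self.dead += unique_physical

    def gc(self) -> None:
        # Reclaim all dead space.
        self.physical -= self.dead
        self.dead = 0.0


s = ToySystem()
s.backup(logical=1000.0, physical=200.0)
s.backup(logical=500.0, physical=100.0)
r_two_backups = s.ratio()      # 1500 / 300 = 5.0
s.delete(logical=500.0, unique_physical=100.0)
r_after_delete = s.ratio()     # 1000 / 300, about 3.33, until GC runs
s.gc()
r_after_gc = s.ratio()         # 1000 / 200 = 5.0
s.fastcopy(logical=1000.0)
r_after_fastcopy = s.ratio()   # 2000 / 200 = 10.0
```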
3. Compression: Inline Statistics
DDOS conducts
deduplication inline, as data is ingested by the system. It tracks the effect
of inline dedupe and local compression for each write, and accumulates the
statistics at the file level. Per-file inline compression statistics are
further aggregated at the Mtree level and at the system level. Compression is
measured based on 3 numbers in the inline statistics:
- The length of each write, referred to as raw_bytes;
- The length of all unique segments, referred to as pre_lc_size;
- The length of the locally compressed unique segments, referred to as post_lc_size.
Based on the above 3 numbers, DDOS defines two finer-grained compression ratios:
- Global compression (g_comp). It equals (raw_bytes / pre_lc_size) and reflects the dedupe ratio;
- Local compression (l_comp). It equals (pre_lc_size / post_lc_size) and reflects the effect of the local compression algorithm.
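The three counters and the two derived ratios fit in a small data structure. In this sketch (a hypothetical Python type, not a DDOS interface), the overall inline ratio raw_bytes / post_lc_size is exactly the product g_comp * l_comp, because the pre_lc_size terms cancel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InlineStats:
    """Hypothetical container for the three inline counters (bytes)."""
    raw_bytes: int     # total length of all writes
    pre_lc_size: int   # length of unique segments after dedupe
    post_lc_size: int  # length of unique segments after local compression

    @property
    def g_comp(self) -> float:
        # Global compression: reflects the dedupe ratio.
        return self.raw_bytes / self.pre_lc_size

    @property
    def l_comp(self) -> float:
        # Local compression: reflects the local compression algorithm.
        return self.pre_lc_size / self.post_lc_size

    @property
    def total_comp(self) -> float:
        # raw / post == (raw / pre) * (pre / post) == g_comp * l_comp
        return self.raw_bytes / self.post_lc_size


stats = InlineStats(raw_bytes=1000, pre_lc_size=100, post_lc_size=50)
print(stats.g_comp, stats.l_comp, stats.total_comp)  # 10.0 2.0 20.0
```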
The accumulated inline compression statistics are part of the file metadata in DDOS and are stored in the file inode. DDOS provides facilities to check inline compression at all 3 levels: file, Mtree, and system-wide. We detail them in the following sections.
3.1 File(s) Compression
The compression of a file can be checked with the "filesys show compression <path>" CLI command, which reports the accumulated compression statistics stored in the file inode. When a directory is specified, the inline compression statistics of all the files directly under that directory are summed up and reported. In the CLI output, raw_bytes is labeled "Original Bytes", pre_lc_size is labeled "Globally Compressed", post_lc_size is labeled "Locally Compressed", and the other overheads are reported as "Meta-data". The two examples below were captured from an actual DDR.
Example 1: inline compression statistics of a file
# filesys show compression /data/col1/main/dir1/file_1
Total files: 1;  bytes/storage_used: 17.0
     Original Bytes:  78,968,112
Globally Compressed:   7,805,052
 Locally Compressed:   4,625,442
          Meta-data:      24,820
Example 2: inline compression statistics of all files under a directory, including all subdirectories
# filesys show compression /data/col1/main/dir1
Total files: 9;  bytes/storage_used: 16.6
     Original Bytes:  79,563,175
Globally Compressed:   8,081,177
 Locally Compressed:   4,769,120
          Meta-data:      27,408
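The document does not spell out how "bytes/storage_used" is computed. Both samples above are consistent with dividing "Original Bytes" by the sum of "Locally Compressed" and "Meta-data", as this Python sketch checks; treat that formula as an inference from these two outputs, not a documented definition.

```python
def bytes_per_storage_used(original: int, locally_compressed: int, meta_data: int) -> float:
    # Assumption inferred from the two sample outputs, not a documented formula:
    # storage used = "Locally Compressed" + "Meta-data".
    return original / (locally_compressed + meta_data)


ex1 = bytes_per_storage_used(78_968_112, 4_625_442, 24_820)
ex2 = bytes_per_storage_used(79_563_175, 4_769_120, 27_408)
print(f"{ex1:.1f} {ex2:.1f}")  # 17.0 16.6, matching the CLI output

# Ignoring metadata would give 17.1 and 16.7 instead.
print(f"{78_968_112 / 4_625_442:.1f} {79_563_175 / 4_769_120:.1f}")  # 17.1 16.7
```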
The system reports the overall inline compression ratio in the above CLI output as "bytes/storage_used". However, care must be taken in interpreting this information, as it can be misleading for various reasons. One reason is that pre_lc_size and post_lc_size are recorded at the time the data operations are processed. When the files that originally added some segments to the system are deleted, the amount of unique data attributable to the remaining files that reference those segments should increase.
As an example, assume a file sample.file is backed up to a Data Domain system, and in the first backup the compression information of the file is pre_lc_size=10GiB, post_lc_size=5GiB. Assume also that the data of this file is unique, with no data shared with any other file. In the second backup of the file, further assume the file gets an ideal dedupe, so that both pre_lc_size and post_lc_size are zero because all segments of the file already exist on the system. When the first backup is deleted, the second backup of the file becomes the only file that references the 5GiB of data segments. In this case, ideally, the pre_lc_size and post_lc_size of the file in the second backup should be updated from 0 to 10GiB and 5GiB, respectively.
However, there is no way to detect for which files that should be done, so the inline compression statistics of the existing files are left unchanged. Another factor that affects the above numbers is that the statistics are accumulated. When a file receives a lot of overwrites, it is unknown to what extent the accumulated statistics reflect the writes that introduced the live data. Thus, over a long time, the inline compression statistics can only be treated as a heuristic to roughly estimate the compression of a particular file.
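The sample.file scenario can be reproduced with a toy Python model. Everything here is hypothetical: segments are named by made-up fingerprint strings, every segment is assumed to be 1 GiB before and 0.5 GiB after local compression, and garbage collection is not modeled (deleted segments stay known to the system). The point is only that statistics frozen at write time drift away from what a file alone holds once other files are deleted.

```python
class ToyFS:
    """Hypothetical toy model: refcounted segments with fixed sizes, and
    per-file inline stats that are recorded only at write time."""

    SEG_PRE_GIB = 1.0   # assumed segment size before local compression
    SEG_POST_GIB = 0.5  # assumed segment size after local compression

    def __init__(self) -> None:
        self.refs: dict[str, int] = {}          # fingerprint -> live references
        self.files: dict[str, list[str]] = {}   # file -> its segment fingerprints
        self.stats: dict[str, tuple[float, float]] = {}  # file -> (pre_lc, post_lc)

    def write(self, name: str, fingerprints: list[str]) -> None:
        new = 0
        for fp in fingerprints:
            if fp not in self.refs:
                new += 1  # segment not yet stored: counts as unique
            self.refs[fp] = self.refs.get(fp, 0) + 1
        self.files[name] = list(fingerprints)
        # Statistics are frozen at write time, like the inline statistics.
        self.stats[name] = (new * self.SEG_PRE_GIB, new * self.SEG_POST_GIB)

    def delete(self, name: str) -> None:
        for fp in self.files.pop(name):
            self.refs[fp] -= 1
        del self.stats[name]  # other files' statistics are NOT updated

    def exclusive(self, name: str) -> tuple[float, float]:
        """(pre_lc, post_lc) of the segments referenced only by this file."""
        only = [fp for fp in self.files[name] if self.refs[fp] == 1]
        return (len(only) * self.SEG_PRE_GIB, len(only) * self.SEG_POST_GIB)


fs = ToyFS()
segments = [f"seg{i}" for i in range(10)]    # 10 GiB before, 5 GiB after local compression
fs.write("backup1/sample.file", segments)    # recorded stats: (10.0, 5.0)
fs.write("backup2/sample.file", segments)    # ideal dedupe: (0.0, 0.0)
fs.delete("backup1/sample.file")
print(fs.stats["backup2/sample.file"])       # (0.0, 0.0): stale
print(fs.exclusive("backup2/sample.file"))   # (10.0, 5.0): what it alone references
```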
Another fact worth highlighting is that the inline compression of a file cannot be measured for an arbitrary time interval. The file inline compression statistics are an accumulated result and cover all the writes that the file has ever received. When a file receives many overwrites, raw_bytes can be far larger than the logical size of the file. For sparse files, the file sizes may be much larger than the "Original Bytes".
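The following hypothetical Python counters illustrate why cumulative statistics diverge from the file size: overwrites keep adding to raw_bytes ("Original Bytes") without growing the file, while a sparse file's size includes holes that were never written. The class is made up for illustration, and sizes are in MiB, chosen arbitrarily.

```python
class FileCounters:
    """Hypothetical per-file counters: raw_bytes accumulates every write,
    while the logical size only tracks the end of the file."""

    def __init__(self) -> None:
        self.raw_bytes = 0     # cumulative, never decremented
        self.logical_size = 0  # current file size, including holes

    def write(self, offset: int, length: int) -> None:
        self.raw_bytes += length
        self.logical_size = max(self.logical_size, offset + length)


overwritten = FileCounters()
for _ in range(6):                      # write the same 100 MiB region six times
    overwritten.write(offset=0, length=100)

sparse = FileCounters()
sparse.write(offset=900, length=100)    # leaves a 900 MiB hole at the start

print(overwritten.raw_bytes, overwritten.logical_size)  # 600 100
print(sparse.raw_bytes, sparse.logical_size)            # 100 1000
```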
3.2 Mtree Compression
We can check the compression of a particular Mtree with the "mtree show compression" (MSC) CLI command. As noted above, the values in the inline compression statistics are accumulated. Given that the lifetime of an Mtree can be very long, the absolute values become less and less informative over time. To address this issue, we use the deltas of the inline compression statistics and report compression only for specific time intervals. The underlying approach is that we periodically dump the Mtree inline compression statistics to a log. When a client queries Mtree compression with the MSC CLI, we use the log to calculate the deltas of the numbers for compression reporting. By default, MSC reports compression for the last 7 days and the last 24 hours. A user can specify any time period of interest.
Let’s
demonstrate the details by an example. Let’s assume we have the following log
for Mtree A:
3:00AM, raw_bytes=11000GB, pre_lc_size=100GB, post_lc_size=50GB
4:00AM, raw_bytes=12000GB, pre_lc_size=200GB, post_lc_size=100GB
Then the
compression of Mtree A for this hour is
g_comp = (12000-11000)/(200-100) = 10
l_comp = (200-100)/(100-50) = 2
overall compression ratio = (12000-11000)/(100-50) = 20
Clearly, the above compression ratio calculation has nothing to do with the data set size. For example, the above Mtree may hold only 500GB of logical data.
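The delta calculation can be written in a few lines of Python. This sketch assumes a simplified log format (a timestamp plus the three cumulative counters in GB), which is hypothetical; it reproduces the numbers computed above for Mtree A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical periodic dump of an Mtree's cumulative inline stats (GB)."""
    time: str
    raw_bytes: float
    pre_lc_size: float
    post_lc_size: float


def interval_compression(start: LogEntry, end: LogEntry) -> dict[str, float]:
    # Only the deltas between two dumps matter, not the absolute values.
    d_raw = end.raw_bytes - start.raw_bytes
    d_pre = end.pre_lc_size - start.pre_lc_size
    d_post = end.post_lc_size - start.post_lc_size
    return {
        "g_comp": d_raw / d_pre,
        "l_comp": d_pre / d_post,
        "total": d_raw / d_post,
    }


log = [
    LogEntry("3:00AM", raw_bytes=11000, pre_lc_size=100, post_lc_size=50),
    LogEntry("4:00AM", raw_bytes=12000, pre_lc_size=200, post_lc_size=100),
]
result = interval_compression(log[0], log[-1])
print(result)  # {'g_comp': 10.0, 'l_comp': 2.0, 'total': 20.0}
```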
MSC supports the "daily" and "daily-detailed" options, as does the "filesys show compression" CLI command. When "daily" is specified, the CLI reports the daily compression in a calendar format. It uses the daily deltas of raw_bytes and post_lc_size to compute the daily compression ratio. When "daily-detailed" is specified, the CLI shows all 3 deltas (of raw_bytes, pre_lc_size, and post_lc_size, respectively) for each day; it also computes g_comp and l_comp in addition to the "Total-Comp Factor".
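The daily views can be sketched the same way, assuming cumulative counters are available at each day boundary (the snapshot format and values below are hypothetical). Note how a multi-day figure is formed: the Weekly column in the Appendix is consistent with dividing summed deltas (2511.8 / 464.1, about 5.4x, for the first week), whereas averaging that week's seven daily factors would give about 5.5x.

```python
def _ratio(num: float, den: float):
    # Days with no writes have nothing to divide by.
    return num / den if den else None


def daily_report(snapshots):
    """snapshots: (day, raw_bytes, pre_lc_size, post_lc_size) tuples of
    cumulative counters taken at consecutive day boundaries (hypothetical)."""
    rows = []
    for (_, r0, p0, q0), (day, r1, p1, q1) in zip(snapshots, snapshots[1:]):
        d_raw, d_pre, d_post = r1 - r0, p1 - p0, q1 - q0
        rows.append({
            "day": day,
            "pre_comp": d_raw,                 # "Pre-Comp" column
            "post_comp": d_post,               # "Post-Comp" column
            "g_comp": _ratio(d_raw, d_pre),    # daily-detailed only
            "l_comp": _ratio(d_pre, d_post),   # daily-detailed only
            "total": _ratio(d_raw, d_post),    # "Total-Comp Factor"
        })
    return rows


snapshots = [
    ("d0", 1000, 300, 150),
    ("d1", 1300, 400, 200),
    ("d2", 1400, 450, 225),
]
rows = daily_report(snapshots)
print([r["total"] for r in rows])  # [6.0, 4.0]

# A multi-day figure is the ratio of summed deltas, not the mean of daily ratios.
period_total = sum(r["pre_comp"] for r in rows) / sum(r["post_comp"] for r in rows)
print(round(period_total, 2))      # 5.33 (the mean of the daily ratios would be 5.0)
```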
Sample outputs
from actual systems are included in the Appendix.
3.3 System Compression
Once we understand how compression is reported on Mtrees, it is straightforward to extend the concept to the entire system. System-wide inline compression statistics are collected and reported exactly as they are for Mtrees. The only difference is scope: one covers a particular Mtree, the other the entire system. The results can be checked with the "filesys show compression" CLI. In fact, we already included an example in Section 2: the "Last 7 days" and "Last 24 hrs" system compression is reported in the last 2 lines of the result section of the FSC output.
4. GDA
GDA is the abbreviation of "Global Deduplication Array". It is a clustered solution that can include up to 2 nodes. A GDA presents a unified storage space to users, and the compression information is aggregated from all the nodes. Therefore, compression reporting needs no special treatment: conceptually, we can treat a GDA as a single-node system when we investigate its data compression reports.
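The text does not spell out how the per-node figures are combined. Assuming aggregation means summing the per-node Pre-Comp and Post-Comp sizes before dividing (our assumption, consistent with treating a GDA as one system), the resulting ratio weights each node by its data volume, as in this hypothetical two-node Python example.

```python
def aggregate_ratio(nodes: list[tuple[float, float]]) -> float:
    """nodes: (pre_comp, post_comp) per node in GiB. Sums sizes first, then divides."""
    pre = sum(p for p, _ in nodes)
    post = sum(q for _, q in nodes)
    return pre / post


nodes = [(1000.0, 100.0), (100.0, 50.0)]   # hypothetical: a 10.0x node and a 2.0x node
print(round(aggregate_ratio(nodes), 2))    # 7.33, not the 6.0 average of the node ratios
```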
5. Archivers
On Archivers, the storage is separated into two tiers: the active tier and the archive tier. They are two independent dedupe domains. Users can write data only to the active tier. Later, users can use the data-movement facilities provided by DDOS to migrate data from the active tier to the archive tier. Thus, space and compression are measured and reported separately for each tier. At the file level, however, we do not differentiate between tiers when reporting inline compression statistics; they are exactly the same as described in Section 3.1.
6. Mysteries brought by Dedupe
The last topic to highlight for understanding DDOS compression is the nature of dedupe, which is referred to as "global compression" in many Data Domain documents. Although the term contains the word "compression", it is entirely different from the traditional concept of compression, which DDOS also provides under the name "local compression".
Local compression simply reduces the size of a piece of data using a certain algorithm (note that some kinds of data are not compressible, and applying compression algorithms to them may in fact slightly increase their size). Usually, once an algorithm is chosen, the data itself is the only factor that determines the compression ratio.
Dedupe, however, is different. It is not a local concept; it is "global". An incoming data segment is deduped against all the existing data segments in a dedupe domain, which on non-archiver DataDomain systems includes all the data on the system. Whether a segment dedupes therefore depends on what is already stored in the domain, not on the segment itself.
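A toy Python model shows why the same data can dedupe differently. It is a deliberate simplification: fixed-size 8 KiB segments and SHA-256 fingerprints are assumptions for illustration, not how DDOS segments or fingerprints data. The identical byte stream gets a global compression of 1.0 in an empty domain and 2.0 in a domain that already holds half of it.

```python
import hashlib

SEG_SIZE = 8 * 1024  # assumed fixed 8 KiB segments (simplification)


class DedupeDomain:
    """Toy dedupe domain: a set of fingerprints of stored segments."""

    def __init__(self) -> None:
        self.index: set[bytes] = set()

    def ingest(self, data: bytes) -> float:
        """Store data and return its global compression (raw / unique bytes)."""
        unique = 0
        for off in range(0, len(data), SEG_SIZE):
            seg = data[off:off + SEG_SIZE]
            fp = hashlib.sha256(seg).digest()
            if fp not in self.index:  # new segment: must be stored
                self.index.add(fp)
                unique += len(seg)
        return len(data) / unique if unique else float("inf")


def make_segment(i: int) -> bytes:
    # Deterministic, distinct 8 KiB segments (a 32-byte digest repeated 256 times).
    return hashlib.sha256(f"segment-{i}".encode()).digest() * (SEG_SIZE // 32)


data = b"".join(make_segment(i) for i in range(64))

domain_a = DedupeDomain()                # empty domain
g_a = domain_a.ingest(data)              # 1.0: nothing to dedupe against

domain_b = DedupeDomain()
domain_b.ingest(data[: 32 * SEG_SIZE])   # domain B already holds the first half
g_b = domain_b.ingest(data)              # 2.0: same bytes, different result
```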
In practice, we rarely see a high dedupe ratio in the initial backup of a data set. In initial backups, the major data reduction often comes from local compression. When subsequent backups land on the DataDomain system, dedupe shows its strength and becomes the dominant factor in compression. The effectiveness of dedupe relies on the fact that the change rate of a data set is generally low from backup to backup. For this reason, data sets with high change rates cannot be deduped well. When the backup application inserts its own metadata chunks (referred to as markers by DataDomain) into the backup images at very high frequency, it also may not get a good dedupe ratio. Our marker-handling techniques can help in some cases, but not always.
Given these observations, what should you expect?
- Do not be surprised when the initial backups achieve only a small system effective compression ratio, say 2 or 3. Dedupe usually has very little opportunity to show its strength in initial backups.
- The global compression ratio of an incremental backup is lower than that of the corresponding full backup. This is because an incremental backup contains only files that are changed or new compared to the immediately preceding backup. The global compression ratio depends on the percentage of new data within the incremental backup.
- The dedupe ratio of a full backup (other than the initial one) can also be low in a number of scenarios. Some frequently observed scenarios include: a large percentage of the data has changed, the data set is dominated by small files, the backup application adds a lot of closely spaced markers, or a database is backed up incrementally and/or with a small block size. When a low compression ratio is observed in a full backup with a low data change rate, we need to check whether it is one of the cases just described, or whether developers need to be involved.
- Do not assume that the compression of a later backup image (files) is always better than that of the initial one. A later backup image can show a high dedupe ratio because the initial and earlier backup images already added most of the data to the system. When all the earlier backup images are deleted, the global and local compression ratios of the earliest remaining backup image may still be very high, but that only tells us that it got good dedupe when it was added to the system, nothing else. So when you delete a file that has very high global and local compression ratios and is the last backup image of a particular data set, you may release much more space than the size derived from the compression ratio.
- Do not compare the compression ratios of the same data set on different systems, regardless of how you added the data set: by copying through protocols like NFS or CIFS, or by replication. Each system is an independent dedupe domain, so it does not make sense to compare dedupe ratios across different dedupe domains, even if the data set of interest is the same.
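The first two expectations can be reproduced with a self-contained toy Python model: 50 files of 4 segments each, with one segment changed in each of two files between backups. The fixed-size segments, SHA-256 fingerprints, and change pattern are assumptions for illustration, not DDOS internals.

```python
import hashlib

SEG_SIZE = 4096                 # assumed fixed segment size (simplification)
FILES, SEGS_PER_FILE = 50, 4


def segment(file_id: int, seg_id: int, version: int) -> bytes:
    # Deterministic content that changes whenever `version` changes.
    seed = f"{file_id}:{seg_id}:{version}".encode()
    return hashlib.sha256(seed).digest() * (SEG_SIZE // 32)


class Domain:
    """Toy dedupe domain keyed by segment fingerprints."""

    def __init__(self) -> None:
        self.index: set[bytes] = set()

    def ingest(self, segments: list[bytes]) -> float:
        """Store segments; return global compression (raw / unique bytes)."""
        raw = unique = 0
        for seg in segments:
            raw += len(seg)
            fp = hashlib.sha256(seg).digest()
            if fp not in self.index:
                self.index.add(fp)
                unique += len(seg)
        return raw / unique if unique else float("inf")


version = [[0] * SEGS_PER_FILE for _ in range(FILES)]


def backup(file_ids) -> list[bytes]:
    return [segment(f, s, version[f][s]) for f in file_ids for s in range(SEGS_PER_FILE)]


full_domain, incr_domain = Domain(), Domain()
g_initial = full_domain.ingest(backup(range(FILES)))  # 1.0: initial full backup
incr_domain.ingest(backup(range(FILES)))              # same starting point

changed = [3, 17]                # low change rate: two files touched
for f in changed:
    version[f][0] += 1           # one segment changes in each

g_full = full_domain.ingest(backup(range(FILES)))     # 100.0: 200 segments, 2 new
g_incremental = incr_domain.ingest(backup(changed))   # 4.0: 8 segments, 2 new
```

Both later backups carry the same two new segments; the incremental ratio is lower only because it contains far less already-stored data to dedupe away.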
7. Summary
Measuring compression is difficult in dedupe file systems, and even harder in log-structured dedupe file systems. We need to understand how dedupe works and how compression statistics are tracked. Compression ratios are very useful for understanding the behavior of a particular system. The system effective compression ratio is the most important, reliable, and informative measure. The inline compression statistics can be very helpful too, but note that they might be no more than heuristics in some circumstances. Clearly, there is still room to improve compression tracking and reporting. Nevertheless, DDOS already does a reasonably good job in general.
Appendix. Sample Outputs of Mtree Show Compression
Assume there is an Mtree that holds 254792.4 GiB of user data. It received only 4379.3 GiB of new data in the last 7 days and 784.6 GiB in the last 24 hours. Other time intervals can also be specified. The "daily" option reports the inline compression statistics for the last 33 days. When the "daily-detailed" option is provided, the total compression ratios are further broken down into global and local compression ratios.
# mtree list /data/col1/main
Name              Pre-Comp (GiB)   Status
---------------   --------------   ------
/data/col1/main         254792.4   RW
---------------   --------------   ------
 D    : Deleted
 RO   : Read Only
 RW   : Read Write
 RD   : Replication Destination
 RLE  : Retention-Lock Enabled
 RLD  : Retention-Lock Disabled
# mtree show compression /data/col1/main
From: 2012-06-07 14:00 To: 2012-06-14 14:00
No data available for the selected interval.
                  Pre-Comp   Post-Comp   Global-Comp   Local-Comp      Total-Comp
                     (GiB)       (GiB)        Factor       Factor          Factor
                                                                    (Reduction %)
---------------   --------   ---------   -----------   ----------   -------------
Written:*
  Last 7 days       4379.3       883.2          3.4x         1.5x     5.0x (79.8)
  Last 24 hrs        784.6       162.1          3.3x         1.4x     4.8x (79.3)
---------------   --------   ---------   -----------   ----------   -------------
* Does not include the effects of pre-comp file deletes/truncates
  since the last cleaning on 2011/03/19 16:09:04.
Key:
   Pre-Comp           = Data written before compression
   Post-Comp          = Storage used after compression
   Global-Comp Factor = Pre-Comp / (Size after de-dupe)
   Local-Comp Factor  = (Size after de-dupe) / Post-Comp
   Total-Comp Factor  = Pre-Comp / Post-Comp
   Reduction %        = ((Pre-Comp - Post-Comp) / Pre-Comp) * 100
# mtree show compression /data/col1/main daily
From: 2012-05-12 12:00 To: 2012-06-14 12:00
  Sun    Mon    Tue    Wed    Thu    Fri    Sat  Weekly
-----  -----  -----  -----  -----  -----  -----  ------  -----------------
 -13-   -14-   -15-   -16-   -17-   -18-   -19-          Date
432.0  405.9  284.1  438.8  347.0  272.7  331.4  2511.8  Pre-Comp
 85.5   66.2   45.3   81.9   61.4   57.4   66.3   464.1  Post-Comp
 5.0x   6.1x   6.3x   5.4x   5.7x   4.7x   5.0x    5.4x  Total-Comp Factor

 -20-   -21-   -22-   -23-   -24-   -25-   -26-
478.0  387.8  450.2  533.1  386.0  258.4  393.6  2887.1
100.6   81.5  100.8  119.0   84.0   40.6   75.3   601.8
 4.8x   4.8x   4.5x   4.5x   4.6x   6.4x   5.2x    4.8x

 -27-   -28-   -29-   -30-   -31-    -1-    -2-
 27.6    1.0    0.4  470.7  467.3  517.7  641.9  2126.7
  4.9    0.2    0.1   83.9   92.3   89.8  140.1   411.2
 5.6x   5.6x   4.3x   5.6x   5.1x   5.8x   4.6x    5.2x

  -3-    -4-    -5-    -6-    -7-    -8-    -9-
539.6  495.0  652.8  658.7  537.1  398.7  305.5  3587.3
110.8  108.0  139.4  137.0  111.5   78.3   48.3   733.3
 4.9x   4.6x   4.7x   4.8x   4.8x   5.1x   6.3x    4.9x

 -10-   -11-   -12-   -13-   -14-
660.2  738.3  787.2  672.9  796.9                3655.5
143.9  152.5  167.6  126.9  163.3                 754.2
 4.6x   4.8x   4.7x   5.3x   4.9x                  4.8x
-----  -----  -----  -----  -----  -----  -----  ------  -----------------
                  Pre-Comp   Post-Comp   Global-Comp   Local-Comp      Total-Comp
                     (GiB)       (GiB)        Factor       Factor          Factor
                                                                    (Reduction %)
---------------   --------   ---------   -----------   ----------   -------------
Written:*
  Last 33 days     14768.3      2964.5          3.4x         1.5x     5.0x (79.9)
  Last 24 hrs        784.6       162.1          3.3x         1.4x     4.8x (79.3)
---------------   --------   ---------   -----------   ----------   -------------
* Does not include the effects of pre-comp file deletes/truncates
  since the last cleaning on 2011/03/19 16:09:04.
Key:
   Pre-Comp           = Data written before compression
   Post-Comp          = Storage used after compression
   Global-Comp Factor = Pre-Comp / (Size after de-dupe)
   Local-Comp Factor  = (Size after de-dupe) / Post-Comp
   Total-Comp Factor  = Pre-Comp / Post-Comp
   Reduction %        = ((Pre-Comp - Post-Comp) / Pre-Comp) * 100
# mtree show compression /data/col1/main daily-detailed
From: 2012-05-12 12:00 To: 2012-06-14 12:00
  Sun    Mon    Tue    Wed    Thu    Fri    Sat  Weekly
-----  -----  -----  -----  -----  -----  -----  ------  ------------------
 -13-   -14-   -15-   -16-   -17-   -18-   -19-          Date
432.0  405.9  284.1  438.8  347.0  272.7  331.4  2511.8  Pre-Comp
 85.5   66.2   45.3   81.9   61.4   57.4   66.3   464.1  Post-Comp
 3.5x   4.1x   4.3x   3.6x   3.8x   3.3x   3.4x    3.7x  Global-Comp Factor
 1.4x   1.5x   1.5x   1.5x   1.5x   1.4x   1.5x    1.5x  Local-Comp Factor
 5.0x   6.1x   6.3x   5.4x   5.7x   4.7x   5.0x    5.4x  Total-Comp Factor
 80.2   83.7   84.1   81.3   82.3   78.9   80.0    81.5  Reduction %

 -20-   -21-   -22-   -23-   -24-   -25-   -26-
478.0  387.8  450.2  533.1  386.0  258.4  393.6  2887.1
100.6   81.5  100.8  119.0   84.0   40.6   75.3   601.8
 3.3x   3.3x   3.0x   3.0x   3.3x   4.1x   3.6x    3.3x
 1.4x   1.5x   1.5x   1.5x   1.4x   1.5x   1.4x    1.5x
 4.8x   4.8x   4.5x   4.5x   4.6x   6.4x   5.2x    4.8x
 79.0   79.0   77.6   77.7   78.2   84.3   80.9    79.2

 -27-   -28-   -29-   -30-   -31-    -1-    -2-
 27.6    1.0    0.4  470.7  467.3  517.7  641.9  2126.7
  4.9    0.2    0.1   83.9   92.3   89.8  140.1   411.2
 4.4x   3.7x   2.6x   3.8x   3.5x   3.9x   3.2x    3.5x
 1.3x   1.5x   1.6x   1.5x   1.4x   1.5x   1.5x    1.5x
 5.6x   5.6x   4.3x   5.6x   5.1x   5.8x   4.6x    5.2x
 82.1   82.2   76.8   82.2   80.3   82.7   78.2    80.7

  -3-    -4-    -5-    -6-    -7-    -8-    -9-
539.6  495.0  652.8  658.7  537.1  398.7  305.5  3587.3
110.8  108.0  139.4  137.0  111.5   78.3   48.3   733.3
 3.4x   3.1x   3.2x   3.4x   3.3x   3.4x   4.1x    3.3x
 1.4x   1.5x   1.5x   1.4x   1.4x   1.5x   1.6x    1.5x
 4.9x   4.6x   4.7x   4.8x   4.8x   5.1x   6.3x    4.9x
 79.5   78.2   78.6   79.2   79.2   80.4   84.2    79.6

 -10-   -11-   -12-   -13-   -14-
660.2  738.3  787.2  672.9  796.9                3655.5
143.9  152.5  167.6  126.9  163.3                 754.2
 3.1x   3.4x   3.2x   3.7x   3.4x                  3.3x
 1.5x   1.4x   1.5x   1.4x   1.5x                  1.5x
 4.6x   4.8x   4.7x   5.3x   4.9x                  4.8x
 78.2   79.3   78.7   81.1   79.5                  79.4
-----  -----  -----  -----  -----  -----  -----  ------  ------------------
                  Pre-Comp   Post-Comp   Global-Comp   Local-Comp      Total-Comp
                     (GiB)       (GiB)        Factor       Factor          Factor
                                                                    (Reduction %)
---------------   --------   ---------   -----------   ----------   -------------
Written:*
  Last 33 days     14768.3      2964.5          3.4x         1.5x     5.0x (79.9)
  Last 24 hrs        784.6       162.1          3.3x         1.4x     4.8x (79.3)
---------------   --------   ---------   -----------   ----------   -------------
* Does not include the effects of pre-comp file deletes/truncates
  since the last cleaning on 2011/03/19 16:09:04.
Key:
   Pre-Comp           = Data written before compression
   Post-Comp          = Storage used after compression
   Global-Comp Factor = Pre-Comp / (Size after de-dupe)
   Local-Comp Factor  = (Size after de-dupe) / Post-Comp
   Total-Comp Factor  = Pre-Comp / Post-Comp
   Reduction %        = ((Pre-Comp - Post-Comp) / Pre-Comp) * 100