Issue #3946, branch, PR #3882: The
"anti-diff" prototype proposes an alternative approach to keeping track of
sequences (more specifically, FingerTree
s) of diffs. These diff sequences
are a component of the in-memory parts of the ledger state in UTxO-HD. Since
the consensus code often requires the cumulative diff of a sequence of diffs,
the current implementation "caches" cumulative diffs of each subtree in the
diff sequence. This caching allows relatively fast reconstruction of the total
cumulative diff, but this caching proved to incur a non-negligible cost: when
we manipulate diff sequences through splits and appends, we force re-computing
a logarithmic number of caches. This is problematic, since we often split and
append in consensus: we split when we flush diffs to a backing store or when
we roll back blocks, and we append when pushing blocks. The new approach
should reduce the overhead of this caching.
We implemented micro-benchmarks for the "anti-diff" prototype: we
first generate a sequence of commands (Forward
, Push
, Flush
, or
Rollback
) through a simulation, after which we measure the performance of
applying the commands to a diff sequence. In this context, Forward
means
forwarding of values through a diff, whereas Rollback
means switching to
a different fork by rolling back diffs/blocks and pushing new ones.
Moreover, we compare the performance for the two implementations: the
"legacy" approach, and the anti-diff approach.
Some preliminary results were positive, but we needed to revisit the
benchmark's configuration to obtain more definitive results. After a
discussion with @dcoutts and the consensus team about this configuration
(e.g., number of commands generated, choice of the security parameter k
),
the benchmarks should now be closer to the realistic setting. The following
configuration specifies the default configuration that is used in the
benchmarking code:
- Number of commands generated:
10_000
- Security parameter
k
: 2160
- Number of initial backing values:
100
- Number of key-value pairs deleted by a push:
50
- Number of key-value pairs inserted by a push:
50
- Number of key-value pairs forwarded by a forward:
50
- Probability of a large (in the range
[1000, 2000]
) rollback: 0.05
- Probability of a small (in the range
[1, 10]
) rollback: 0.95
- Order of commands:
- An equal number of forward and pushes.
1
flush every 10
pushes.1
rollback every 100
pushes
Moreover, we run four benchmark scenarios:
- Default configuration
- Without rollbacks
- With only small rollbacks
- Without rollbacks, larger flushes (
1
flush every 100
pushes)
How to read results
Note: this section uses documentation from the
tasty-bench package to
explain how to read the results of running our benchmarks.
Running a benchmark scenario gives us the following (curated) output:
...
AntiDiff: OK (18.27s)
2.527 s ± 47 ms, 2.1 GB allocated, 544 MB copied, 2.2 GB peak memory, 0.23x
LegacyDiff: OK (32.73s)
10.829 s ± 148 ms, 6.8 GB allocated, 2.3 GB copied, 2.2 GB peak memory
...
The output says that the first benchmark, which exercises the anti-diff
prototype, was repeatedly executed for 18.27
seconds (wall-clock time),
its predicted mean CPU time was 2.527
seconds and means of individual
samples do not often diverge from it further than ± 47
milliseconds
(double standard deviation). We also configure the RTS to collect GC
statistics, which enables tasty-bench
to estimate and report memory usage.
This data is reported as per RTSStats
fields: allocated_bytes
,
copied_bytes
and max_mem_in_use_bytes
. So, the output of the first
benchmark says that a total of 2.1 GB
of memory was allocated, that a
total of 544 MB
of memory were copied, and that the peak memory in usage
was 2.2 GB
. We read the output for the second benchmark in the same way.
Furthermore, the benchmark compares the mean CPU times for
both the anti-diff and legacy approaches: In this case, the mean CPU time
for the anti-diff approach is ~0.23x
the mean CPU time for the legacy
approach. Conversely, the mean CPU time for the legacy approach is
1 / 0.23 ~= 4.35x
the mean CPU time for the anti-diff approach. We will
call 0.23x
the improvement factor. We will call 4.35x
the speedup.
Note that these improvement factors (and reported results) are subject to
noise, randomness, the specific configuration parameters, and the whims
of statistics. Data reported by tasty-bench
is only of indicative and
comparative significance.
Results
For each of the 4 scenarios, we list the results of running the anti-diff and
legacy approaches 5 times. We run the benchmarks 5 times to get an indication
of whether the results are similar across multiple runs. Furthermore, we
calculate the accompanying ranges (if applicable) of improvement factors and
speedups.
Note also the decrease in total bytes allocated and total bytes copied for
the anti-diff approach compared to the legacy approach.
Default configuration
Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
---|
Run 1: AntiDiff | 2.533 s (0.23x) | 4.7 ms | 2.1 GB | 557 MB | 2.4 GB |
Run 1: LegacyDiff | 10.792 s | 162 ms | 6.8 GB | 2.3 GB | 2.4 GB |
Run 2: AntiDiff | 2.508 s (0.23x) | 245 ms | 2.1 GB | 515 MB | 2.2 GB |
Run 2: LegacyDiff | 10.850 s | 30 ms | 6.9 GB | 2.3 GB | 2.2 GB |
Run 3: AntiDiff | 2.562 s (0.23x) | 5.0 ms | 2.1 GB | 552 MB | 2.2 GB |
Run 3: LegacyDiff | 10.993 s | 149 ms | 6.9 GB | 2.3 GB | 2.2 GB |
Run 4: AntiDiff | 2.168 s (0.22x) | 5.3 ms | 1.8 GB | 434 MB | 2.0 GB |
Run 4: LegacyDiff | 9.976 s | 39 ms | 6.3 GB | 2.0 GB | 2.0 GB |
Run 5: AntiDiff | 2.527 s (0.23x) | 47 ms | 2.1 GB | 544 MB | 2.2 GB |
Run 5: LegacyDiff | 10.829 s | 148 ms | 6.8 GB | 2.3 GB | 2.2 GB |
- Improvement factor:
[0.22, 0.23]
- Speedup :
[1 / 0.23 ~= 4.35, 1 / 0.22 ~= 4.55]
No rollbacks
Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
---|
Run 1: AntiDiff | 1.638 s (0.19x) | 36 ms | 1.4 GB | 181 MB | 2.4 GB |
Run 1: LegacyDiff | 8.656 s | 207 ms | 5.7 GB | 1.5 GB | 2.4 GB |
Run 2: AntiDiff | 1.638 s (0.19x) | 75 ms | 1.4 GB | 181 MB | 2.2 GB |
Run 2: LegacyDiff | 8.654 s | 322 ms | 5.7 GB | 1.5 GB | 2.2 GB |
Run 3: AntiDiff | 1.663 s (0.19x) | 74 ms | 1.4 GB | 181 MB | 2.2 GB |
Run 3: LegacyDiff | 8.799 s | 216 ms | 5.7 GB | 1.5 GB | 2.2 GB |
Run 4: AntiDiff | 1.645 s (0.19x) | 51 ms | 1.4 GB | 181 MB | 2.0 GB |
Run 4: LegacyDiff | 8.732 s | 261 ms | 5.7 GB | 1.5 GB | 2.0 GB |
Run 5: AntiDiff | 1.639 s (0.19x) | 19 ms | 1.4 GB | 181 MB | 2.2 GB |
Run 5: LegacyDiff | 8.653 s | 234 ms | 5.7 GB | 1.5 GB | 2.2 GB |
- Improvement factor:
0.19
- Speedup :
1 / 0.19 ~= 5.25