# Executive summary
- We did most of the heavy lifting required to integrate the Conway era.
- We have property tests for the UTxO HD backing store API implementations. A possible bug was identified. Work is ongoing to make sure the property tests cover all the relevant cases.
- We implemented and benchmarked the "anti-diff" prototype to speed up the
  UTxO HD functionality. Results show a rough speedup of `4x` to `5.5x` across
  several scenarios. Note that "data reported by `tasty-bench` is only of
  indicative and comparative significance". We are investigating additional
  performance improvements. The "anti-diff" prototype and benchmarks are still
  pending code review.
- We elaborated a draft specification for the Genesis implementation and the
  ChainSync jumping optimization.
# Workstreams
## Conway
- Integration PR of the minimal Conway era (Issue #3963, PR #3971).
- Discussions with Ledger revealed possible sources of confusion about which data should be changed in the Conway era. As a result, a new technical debt issue was raised, which does not block the integration of the Conway era (Issue #3976).
## UTxO HD
Issue #3954, branch: The functionality of a backing store, which is the
interface to the on-disk part of the ledger state in UTxO HD, is tested at a
high level through the `OnDisk` tests. However, some functionality remains
untested, e.g., reads of ranges of keys. As such, we have implemented
`quickcheck-state-machine` tests that exercise backing stores directly. The
tests are reusable for different backing store implementations because they
are implementation-agnostic: any backing store that conforms to the backing
store interface can be plugged into the tests. Work is still ongoing to
label/monitor the tests, so that we can verify that interesting cases are
being covered. Furthermore, a possible bug has been identified in the LMDB
backing store with respect to range reads, though the bug has not been
resolved yet.
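To illustrate the implementation-agnostic setup, here is a minimal sketch
using plain QuickCheck (the actual tests use `quickcheck-state-machine`) and a
hypothetical, simplified `BackingStore` interface: any implementation, e.g.
the LMDB backing store, can be plugged in and checked against a pure model for
range reads.

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Test.QuickCheck

-- A simplified, hypothetical backing-store interface: only range reads.
newtype BackingStore k v = BackingStore
  { bsRangeRead :: Maybe k -> Int -> [(k, v)]
    -- ^ Read up to @n@ pairs, starting strictly after the given key
    -- (or from the smallest key when given 'Nothing').
  }

-- The pure model that every implementation should agree with.
modelRangeRead :: Ord k => Map k v -> Maybe k -> Int -> [(k, v)]
modelRangeRead m mk n = take n $ case mk of
  Nothing -> Map.toAscList m
  Just k  -> Map.toAscList (snd (Map.split k m))

-- An in-memory store; in the real tests this slot would be filled by any
-- implementation of the interface, e.g. the LMDB backing store.
inMemory :: Ord k => Map k v -> BackingStore k v
inMemory m = BackingStore (modelRangeRead m)

-- The implementation-agnostic property: the store under test must agree
-- with the model on arbitrary range reads.
prop_rangeRead :: [(Int, Int)] -> Maybe Int -> NonNegative Int -> Property
prop_rangeRead kvs mk (NonNegative n) =
  let m = Map.fromList kvs
  in bsRangeRead (inMemory m) mk n === modelRangeRead m mk n

main :: IO ()
main = quickCheck prop_rangeRead
```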
Issue #3946, branch, PR #3882: The "anti-diff" prototype proposes an
alternative approach to keeping track of sequences (more specifically,
`FingerTree`s) of diffs. These diff sequences are a component of the in-memory
part of the ledger state in UTxO HD. Since the consensus code often requires
the cumulative diff of a sequence of diffs, the current implementation caches
the cumulative diff of each subtree in the diff sequence. This caching allows
relatively fast reconstruction of the total cumulative diff, but it proved to
incur a non-negligible cost: when we manipulate diff sequences through splits
and appends, we force the recomputation of a logarithmic number of caches.
This is problematic, since we often split and append in consensus: we split
when we flush diffs to a backing store or when we roll back blocks, and we
append when pushing blocks. The new approach should reduce the overhead of
this caching.
We implemented micro-benchmarks for the "anti-diff" prototype: we first
generate a sequence of commands (`Forward`, `Push`, `Flush`, or `Rollback`)
through a simulation, after which we measure the performance of applying the
commands to a diff sequence. In this context, `Forward` means forwarding
values through a diff, whereas `Rollback` means switching to a different fork
by rolling back diffs/blocks and pushing new ones. Moreover, we compare the
performance of the two implementations: the "legacy" approach and the
anti-diff approach.

Some preliminary results were positive, but we needed to revisit the
benchmark's configuration to obtain more definitive results. After a
discussion with @dcoutts and the consensus team about this configuration
(e.g., the number of commands generated, the choice of the security parameter
`k`), the benchmarks should now be closer to the realistic setting. The
benchmarking code uses the following default configuration:

- Number of commands generated: `10_000`
- Security parameter `k`: `2160`
- Number of initial backing values: `100`
- Number of key-value pairs deleted by a push: `50`
- Number of key-value pairs inserted by a push: `50`
- Number of key-value pairs forwarded by a forward: `50`
- Probability of a large rollback (in the range `[1000, 2000]`): `0.05`
- Probability of a small rollback (in the range `[1, 10]`): `0.95`
- Order of commands:
  - An equal number of forwards and pushes.
  - `1` flush every `10` pushes.
  - `1` rollback every `100` pushes.
Moreover, we run four benchmark scenarios:
- Default configuration
- Without rollbacks
- With only small rollbacks
- Without rollbacks, larger flushes (`1` flush every `100` pushes)
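To make the configuration and scenarios concrete, here is a hypothetical
sketch of how they could be represented in Haskell; the record and its field
names are illustrative, not taken from the actual benchmarking code.

```haskell
{-# LANGUAGE NumericUnderscores #-}

-- Hypothetical representation of the default benchmark configuration.
data BenchConfig = BenchConfig
  { nrCommands        :: Int     -- number of commands generated
  , securityParamK    :: Int     -- security parameter k
  , nrInitialValues   :: Int     -- number of initial backing values
  , deletesPerPush    :: Int     -- key-value pairs deleted by a push
  , insertsPerPush    :: Int     -- key-value pairs inserted by a push
  , keysPerForward    :: Int     -- key-value pairs forwarded by a forward
  , pLargeRollback    :: Double  -- probability of a rollback in [1000, 2000]
  , pSmallRollback    :: Double  -- probability of a rollback in [1, 10]
  , pushesPerFlush    :: Int     -- 1 flush every n pushes
  , pushesPerRollback :: Int     -- 1 rollback every n pushes
  } deriving Show

defaultConfig :: BenchConfig
defaultConfig = BenchConfig
  { nrCommands        = 10_000
  , securityParamK    = 2160
  , nrInitialValues   = 100
  , deletesPerPush    = 50
  , insertsPerPush    = 50
  , keysPerForward    = 50
  , pLargeRollback    = 0.05
  , pSmallRollback    = 0.95
  , pushesPerFlush    = 10
  , pushesPerRollback = 100
  }

-- The four scenarios, expressed as variations of the default configuration.
scenarios :: [(String, BenchConfig)]
scenarios =
  [ ("Default configuration", defaultConfig)
  , ("Without rollbacks", noRollbacks)
  , ("With only small rollbacks",
      defaultConfig { pLargeRollback = 0, pSmallRollback = 1 })
  , ("Without rollbacks, larger flushes", noRollbacks { pushesPerFlush = 100 })
  ]
  where
    noRollbacks = defaultConfig { pLargeRollback = 0, pSmallRollback = 0 }
```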
### How to read results

Note: this section uses documentation from the `tasty-bench` package to explain how to read the results of running our benchmarks.
Running a benchmark scenario gives us the following (curated) output:
```
...
AntiDiff:   OK (18.27s)
  2.527 s ± 47 ms, 2.1 GB allocated, 544 MB copied, 2.2 GB peak memory, 0.23x
LegacyDiff: OK (32.73s)
  10.829 s ± 148 ms, 6.8 GB allocated, 2.3 GB copied, 2.2 GB peak memory
...
```

The output says that the first benchmark, which exercises the anti-diff
prototype, was repeatedly executed for `18.27` seconds (wall-clock time), that
its predicted mean CPU time was `2.527` seconds, and that the means of
individual samples do not often diverge from it further than `±47`
milliseconds (double standard deviation). We also configure the RTS to collect
GC statistics, which enables `tasty-bench` to estimate and report memory
usage. This data is reported as per `RTSStats` fields: `allocated_bytes`,
`copied_bytes` and `max_mem_in_use_bytes`. So, the output of the first
benchmark says that a total of `2.1 GB` of memory was allocated, that a total
of `544 MB` of memory was copied, and that peak memory usage was `2.2 GB`. We
read the output for the second benchmark in the same way.

Furthermore, the benchmark compares the mean CPU times of the anti-diff and
legacy approaches: in this case, the mean CPU time of the anti-diff approach
is `~0.23x` the mean CPU time of the legacy approach. Conversely, the mean CPU
time of the legacy approach is `1 / 0.23 ~= 4.35x` the mean CPU time of the
anti-diff approach. We will call `0.23x` the improvement factor, and `4.35x`
the speedup.

Note that these improvement factors (and reported results) are subject to
noise, randomness, the specific configuration parameters, and the whims of
statistics: data reported by `tasty-bench` is only of indicative and
comparative significance.
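For reference, a comparative benchmark of this shape can be set up with
`tasty-bench` roughly as follows. The command list and the two interpreters
below are hypothetical stand-ins for the actual benchmark code; the relative
factor (e.g., `0.23x`) is produced by `bcompare`, and memory statistics are
only collected when the program is run with `+RTS -T`.

```haskell
import Test.Tasty.Bench

-- Hypothetical stand-ins for the real command generator and the two
-- interpreters over diff sequences; only the benchmark structure matters.
commands :: [Int]
commands = [1 .. 10000]

applyAntiDiff, applyLegacy :: [Int] -> Int
applyAntiDiff = sum
applyLegacy   = foldr (+) 0

main :: IO ()
main = defaultMain
  [ -- 'bcompare' reports the mean CPU time of this benchmark relative to
    -- the benchmark whose name matches the pattern, yielding e.g. "0.23x".
    bcompare "LegacyDiff" $ bench "AntiDiff" $ nf applyAntiDiff commands
  , bench "LegacyDiff" $ nf applyLegacy commands
  ]
```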
### Results
For each of the 4 scenarios, we list the results of running the anti-diff and legacy approaches 5 times. We run the benchmarks 5 times to get an indication of whether the results are similar across multiple runs. Furthermore, we calculate the accompanying ranges (if applicable) of improvement factors and speedups.
Note also the decrease in total bytes allocated and total bytes copied for the anti-diff approach compared to the legacy approach.
#### Default configuration
| Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
|------|---------------|--------------------|-----------------------|--------------------|-------------|
| Run 1: AntiDiff | 2.533 s (0.23x) | 4.7 ms | 2.1 GB | 557 MB | 2.4 GB |
| Run 1: LegacyDiff | 10.792 s | 162 ms | 6.8 GB | 2.3 GB | 2.4 GB |
| Run 2: AntiDiff | 2.508 s (0.23x) | 245 ms | 2.1 GB | 515 MB | 2.2 GB |
| Run 2: LegacyDiff | 10.850 s | 30 ms | 6.9 GB | 2.3 GB | 2.2 GB |
| Run 3: AntiDiff | 2.562 s (0.23x) | 5.0 ms | 2.1 GB | 552 MB | 2.2 GB |
| Run 3: LegacyDiff | 10.993 s | 149 ms | 6.9 GB | 2.3 GB | 2.2 GB |
| Run 4: AntiDiff | 2.168 s (0.22x) | 5.3 ms | 1.8 GB | 434 MB | 2.0 GB |
| Run 4: LegacyDiff | 9.976 s | 39 ms | 6.3 GB | 2.0 GB | 2.0 GB |
| Run 5: AntiDiff | 2.527 s (0.23x) | 47 ms | 2.1 GB | 544 MB | 2.2 GB |
| Run 5: LegacyDiff | 10.829 s | 148 ms | 6.8 GB | 2.3 GB | 2.2 GB |

- Improvement factor: `[0.22, 0.23]`
- Speedup: `[1 / 0.23 ~= 4.35, 1 / 0.22 ~= 4.55]`
#### No rollbacks
| Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
|------|---------------|--------------------|-----------------------|--------------------|-------------|
| Run 1: AntiDiff | 1.638 s (0.19x) | 36 ms | 1.4 GB | 181 MB | 2.4 GB |
| Run 1: LegacyDiff | 8.656 s | 207 ms | 5.7 GB | 1.5 GB | 2.4 GB |
| Run 2: AntiDiff | 1.638 s (0.19x) | 75 ms | 1.4 GB | 181 MB | 2.2 GB |
| Run 2: LegacyDiff | 8.654 s | 322 ms | 5.7 GB | 1.5 GB | 2.2 GB |
| Run 3: AntiDiff | 1.663 s (0.19x) | 74 ms | 1.4 GB | 181 MB | 2.2 GB |
| Run 3: LegacyDiff | 8.799 s | 216 ms | 5.7 GB | 1.5 GB | 2.2 GB |
| Run 4: AntiDiff | 1.645 s (0.19x) | 51 ms | 1.4 GB | 181 MB | 2.0 GB |
| Run 4: LegacyDiff | 8.732 s | 261 ms | 5.7 GB | 1.5 GB | 2.0 GB |
| Run 5: AntiDiff | 1.639 s (0.19x) | 19 ms | 1.4 GB | 181 MB | 2.2 GB |
| Run 5: LegacyDiff | 8.653 s | 234 ms | 5.7 GB | 1.5 GB | 2.2 GB |

- Improvement factor: `0.19`
- Speedup: `1 / 0.19 ~= 5.26`
#### Only small rollbacks
| Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
|------|---------------|--------------------|-----------------------|--------------------|-------------|
| Run 1: AntiDiff | 1.833 s (0.18x) | 36 ms | 1.5 GB | 185 MB | 2.4 GB |
| Run 1: LegacyDiff | 10.362 s | 867 ms | 5.8 GB | 1.6 GB | 2.4 GB |
| Run 2: AntiDiff | 1.696 s (0.19x) | 30 ms | 1.5 GB | 185 MB | 2.2 GB |
| Run 2: LegacyDiff | 8.822 s | 106 ms | 5.8 GB | 1.5 GB | 2.2 GB |
| Run 3: AntiDiff | 1.702 s (0.19x) | 44 ms | 1.5 GB | 186 MB | 2.2 GB |
| Run 3: LegacyDiff | 8.906 s | 147 ms | 5.8 GB | 1.5 GB | 2.2 GB |
| Run 4: AntiDiff | 1.701 s (0.19x) | 47 ms | 1.5 GB | 185 MB | 2.0 GB |
| Run 4: LegacyDiff | 8.949 s | 197 ms | 5.8 GB | 1.5 GB | 2.0 GB |
| Run 5: AntiDiff | 1.677 s (0.19x) | 55 ms | 1.5 GB | 186 MB | 2.2 GB |
| Run 5: LegacyDiff | 8.856 s | 177 ms | 5.8 GB | 1.5 GB | 2.2 GB |
- Improvement factor: `[0.18, 0.19]`
- Speedup: `[1 / 0.19 ~= 5.26, 1 / 0.18 ~= 5.56]`
#### No rollbacks, larger flushes (every 100 pushes)
| Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
|------|---------------|--------------------|-----------------------|--------------------|-------------|
| Run 1: AntiDiff | 1.643 s (0.25x) | 21 ms | 1.5 GB | 196 MB | 2.4 GB |
| Run 1: LegacyDiff | 6.591 s | 351 ms | 4.0 GB | 1.4 GB | 2.4 GB |
| Run 2: AntiDiff | 1.616 s (0.25x) | 47 ms | 1.5 GB | 196 MB | 2.2 GB |
| Run 2: LegacyDiff | 6.520 s | 232 ms | 4.0 GB | 1.4 GB | 2.2 GB |
| Run 3: AntiDiff | 1.640 s (0.25x) | 34 ms | 1.5 GB | 196 MB | 2.2 GB |
| Run 3: LegacyDiff | 6.540 s | 150 ms | 4.0 GB | 1.4 GB | 2.2 GB |
| Run 4: AntiDiff | 1.635 s (0.25x) | 76 ms | 1.5 GB | 196 MB | 2.0 GB |
| Run 4: LegacyDiff | 6.589 s | 131 ms | 4.0 GB | 1.4 GB | 2.0 GB |
| Run 5: AntiDiff | 1.628 s (0.25x) | 19 ms | 1.5 GB | 196 MB | 2.2 GB |
| Run 5: LegacyDiff | 6.490 s | 5.9 ms | 4.0 GB | 1.4 GB | 2.2 GB |

- Improvement factor: `0.25`
- Speedup: `1 / 0.25 ~= 4`
## Genesis
- We elaborated a draft of the specification of the Genesis implementation and
  the ChainSync Jumping optimization. In particular, this includes a proof
  sketch that the latter preserves liveness and safety in all cases (Issue
  #3964).
- @nfrisby's main realization during this sprint was that he had been focusing so far on the case where the selected chain is an extension of the intersection of our peers' ChainSync candidates.
- This is the main case, i.e., an "absorbing" state, but it is not the only case.
- The new proof sketch begins by case splitting on that predicate, and that made the sketch quite a bit easier to follow.
- We continued working on the "happy path" ChainSync Jumping prototype (Issue
  #3960).
## Technical debt
- We started working on the issues required to re-enable nightly CI runs. Nightly CI runs have far laxer time constraints, which gives us the option to run significantly more property tests than in our regular CI. To this end, we merged a PR that makes it easy to adapt the number of tests globally (PR #3947).
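As an illustration of how such a global knob can work with `tasty`, the
following sketch scales the number of QuickCheck tests for a whole test tree;
the 10x factor and the use of `adjustOption` here are illustrative, not
necessarily how PR #3947 implements it.

```haskell
import Test.Tasty
import Test.Tasty.QuickCheck

main :: IO ()
main = defaultMain $
  -- Scale the configured number of QuickCheck tests for the whole test
  -- tree, e.g. to run far more tests in a nightly job than in regular CI.
  adjustOption (\(QuickCheckTests n) -> QuickCheckTests (10 * n)) $
    testGroup "nightly"
      [ testProperty "reverse is an involution"
          (\xs -> reverse (reverse xs) == (xs :: [Int]))
      ]
```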