
Damian Nadales

Executive summary

  • We did most of the heavy lifting required to integrate the Conway era.
  • We have property tests for the UTxO HD backing store API implementations; a possible bug was identified in the LMDB backing store's handling of range reads. Work is ongoing to make sure the property tests cover all the relevant cases.
  • We implemented and benchmarked the "anti-diff" prototype to speed up the UTxO HD functionality. Results show a rough speedup of 4x to 5.5x across several scenarios, though note that, as the tasty-bench documentation puts it, "data reported by tasty-bench is only of indicative and comparative significance". We are investigating additional performance improvements. The "anti-diff" prototype and benchmarks are still pending code review.
  • We elaborated a draft specification for the Genesis implementation and ChainSync jumping optimization.

Workstreams

Conway

  • Integration PR of the minimal Conway era (Issue #3963, PR #3971).
  • Discussions with the Ledger team revealed possible sources of confusion about which data should change in the Conway era. As a result, we raised a new technical debt issue, which does not block the integration of the Conway era (Issue #3976).

UTxO HD

  • Issue #3954, branch: The functionality of a backing store, which is the interface to the on-disk part of the ledger state in UTxO-HD, is tested at a high level through the OnDisk tests. However, some functionality remains untested, e.g., reads of ranges of keys. As such, we have implemented quickcheck-state-machine tests that exercise backing stores directly. The tests are reusable across backing store implementations because they are implementation-agnostic, as the sketch below illustrates: any backing store that conforms to the backing store interface can be plugged into the tests. Work is still ongoing to label/monitor the tests, so that we can verify that interesting cases are being tested. Furthermore, a possible bug has been identified in the LMDB backing store with respect to range reads, though it has not been resolved yet.
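
    To make the implementation-agnostic design concrete, the following is a minimal sketch with hypothetical names and a deliberately simplified interface (it is not the actual ouroboros-consensus or quickcheck-state-machine code): the property is stated purely against an abstract backing store record, so any implementation that provides that record, whether in-memory or LMDB-backed, can be plugged in unchanged.

    ```haskell
    import           Data.IORef (modifyIORef', newIORef, readIORef)
    import           Data.Map.Strict (Map)
    import qualified Data.Map.Strict as Map
    import qualified Data.Set as Set
    import           Test.QuickCheck
    import           Test.QuickCheck.Monadic (assert, monadicIO, run)

    -- A simplified backing store interface (hypothetical field names).
    data BackingStore k v = BackingStore
      { bsWrite     :: Map k v -> IO ()          -- ^ write a batch of key-value pairs
      , bsRead      :: [k] -> IO (Map k v)       -- ^ read the given keys
      , bsReadRange :: k -> Int -> IO (Map k v)  -- ^ read at most n entries at or after a key
      }

    -- A reference implementation backed by a Data.Map, standing in for the
    -- real in-memory or LMDB backing stores.
    newInMemoryStore :: Ord k => IO (BackingStore k v)
    newInMemoryStore = do
      ref <- newIORef Map.empty
      pure BackingStore
        { bsWrite     = \kvs -> modifyIORef' ref (Map.union kvs)
        , bsRead      = \ks  -> (`Map.restrictKeys` Set.fromList ks) <$> readIORef ref
        , bsReadRange = \k n ->
            Map.fromList . take n . Map.toAscList
              . Map.filterWithKey (\k' _ -> k' >= k) <$> readIORef ref
        }

    -- The property mentions only the abstract interface, so it can exercise
    -- any conforming backing store: range reads must agree with a pure model.
    prop_rangeReadAgreesWithModel ::
         IO (BackingStore Int Int)   -- ^ how to create a fresh store under test
      -> [(Int, Int)] -> Int -> NonNegative Int -> Property
    prop_rangeReadAgreesWithModel mkStore kvs start (NonNegative n) = monadicIO $ do
      store  <- run mkStore
      run $ bsWrite store (Map.fromList kvs)
      actual <- run $ bsReadRange store start n
      let model    = Map.fromList kvs
          expected = Map.fromList . take n . Map.toAscList
                   $ Map.filterWithKey (\k _ -> k >= start) model
      assert (actual == expected)

    -- For example: quickCheck (prop_rangeReadAgreesWithModel newInMemoryStore)
    ```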

  • Issue #3946, branch, PR #3882: The "anti-diff" prototype proposes an alternative approach to keeping track of sequences (more specifically, FingerTrees) of diffs. These diff sequences are a component of the in-memory parts of the ledger state in UTxO-HD. Since the consensus code often requires the cumulative diff of a sequence of diffs, the current implementation "caches" the cumulative diff of each subtree in the diff sequence. This caching allows relatively fast reconstruction of the total cumulative diff, but it proved to incur a non-negligible cost: when we manipulate diff sequences through splits and appends, we force the recomputation of a logarithmic number of caches. This is problematic, since we split and append often in consensus: we split when we flush diffs to a backing store or when we roll back blocks, and we append when pushing blocks. The new approach should reduce the overhead of this caching.
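
    The sketch below uses simplified types and hypothetical names (it is not the actual UTxO-HD code) to illustrate the legacy representation described above: the FingerTree measure caches the length and the cumulative diff of every subtree, so the total cumulative diff is an O(1) lookup, but splits and appends have to rebuild a logarithmic number of those cached measures.

    ```haskell
    {-# LANGUAGE FlexibleInstances     #-}
    {-# LANGUAGE MultiParamTypeClasses #-}

    import           Data.FingerTree (FingerTree, Measured (..), (|>))
    import qualified Data.FingerTree as FT
    import           Data.Map.Strict (Map)
    import qualified Data.Map.Strict as Map

    -- A single diff: for each key, an insertion or a deletion.
    data DiffEntry v = Insert v | Delete
    newtype Diff k v = Diff (Map k (DiffEntry v))

    -- Combining diffs: the later (right) diff wins per key. (The real diff
    -- semigroup is more involved; this suffices for the sketch.)
    instance Ord k => Semigroup (Diff k v) where
      Diff a <> Diff b = Diff (Map.union b a)
    instance Ord k => Monoid (Diff k v) where
      mempty = Diff Map.empty

    -- The cached measure of a subtree: its length plus its cumulative diff.
    data InternalMeasure k v = InternalMeasure
      { imLength :: Int
      , imDiff   :: Diff k v
      }

    instance Ord k => Semigroup (InternalMeasure k v) where
      InternalMeasure l1 d1 <> InternalMeasure l2 d2 =
        InternalMeasure (l1 + l2) (d1 <> d2)
    instance Ord k => Monoid (InternalMeasure k v) where
      mempty = InternalMeasure 0 mempty

    instance Ord k => Measured (InternalMeasure k v) (Diff k v) where
      measure d = InternalMeasure 1 d

    -- A diff sequence: every subtree carries its cached cumulative diff.
    newtype DiffSeq k v = DiffSeq (FingerTree (InternalMeasure k v) (Diff k v))

    -- O(1): the total cumulative diff is simply the cached root measure.
    cumulativeDiff :: Ord k => DiffSeq k v -> Diff k v
    cumulativeDiff (DiffSeq ft) = imDiff (measure ft)

    lengthDS :: Ord k => DiffSeq k v -> Int
    lengthDS (DiffSeq ft) = imLength (measure ft)

    -- Appending the diff of a newly pushed block rebuilds O(log n) cached measures.
    push :: Ord k => DiffSeq k v -> Diff k v -> DiffSeq k v
    push (DiffSeq ft) d = DiffSeq (ft |> d)

    -- Splitting, e.g. to flush the oldest diffs to the backing store or to roll
    -- back the newest ones, also rebuilds O(log n) cached measures.
    splitAtLength :: Ord k => Int -> DiffSeq k v -> (DiffSeq k v, DiffSeq k v)
    splitAtLength n (DiffSeq ft) =
      let (l, r) = FT.split (\m -> imLength m > n) ft
      in  (DiffSeq l, DiffSeq r)
    ```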

    We implemented micro-benchmarks for the "anti-diff" prototype: we first generate a sequence of commands (Forward, Push, Flush, or Rollback) through a simulation, after which we measure the performance of applying those commands to a diff sequence. In this context, Forward means forwarding values through a diff, whereas Rollback means switching to a different fork by rolling back diffs/blocks and pushing new ones. Moreover, we compare the performance of the two implementations: the "legacy" approach and the anti-diff approach.
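
    Schematically, and building on the DiffSeq sketch above (again with hypothetical names rather than the real benchmark types), a pre-generated list of such commands is interpreted against a diff sequence; only this interpretation is measured, so the cost of generating the commands does not pollute the timings.

    ```haskell
    -- The commands exercised by the micro-benchmark, mirroring the description
    -- above (hypothetical constructors; the real benchmark types may differ).
    data Cmd k v
      = Forward  [k]             -- ^ forward reads of these keys through the diffs
      | Push     (Diff k v)      -- ^ push the diff of a newly adopted block
      | Flush    Int             -- ^ flush the oldest n diffs to the backing store
      | Rollback Int [Diff k v]  -- ^ switch fork: drop the newest n diffs, push new ones

    -- Applying a pre-generated command sequence to a diff sequence; this is the
    -- part whose CPU time and allocations the benchmark measures.
    interpret :: Ord k => DiffSeq k v -> [Cmd k v] -> DiffSeq k v
    interpret = foldl step
      where
        step ds (Forward _ks) =
          -- forwarding reads values through the cumulative diff of the sequence;
          -- here we only force that diff, since reads leave the sequence unchanged
          cumulativeDiff ds `seq` ds
        step ds (Push d)         = push ds d
        step ds (Flush n)        = snd (splitAtLength n ds)
        step ds (Rollback n ds') =
          foldl push (fst (splitAtLength (lengthDS ds - n) ds)) ds'
    ```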

    Some preliminary results were positive, but we needed to revisit the benchmark's configuration to obtain more definitive results. After a discussion with @dcoutts and the consensus team about this configuration (e.g., the number of commands generated, the choice of the security parameter k), the benchmarks should now be closer to a realistic setting. The following is the default configuration used in the benchmarking code (sketched as a record after the list):

    • Number of commands generated: 10_000
    • Security parameter k: 2160
    • Number of initial backing values: 100
    • Number of key-value pairs deleted by a push: 50
    • Number of key-value pairs inserted by a push: 50
    • Number of key-value pairs forwarded by a forward: 50
    • Probability of a large (in the range [1000, 2000]) rollback: 0.05
    • Probability of a small (in the range [1, 10]) rollback: 0.95
    • Order of commands:
      • An equal number of forwards and pushes.
      • 1 flush every 10 pushes.
      • 1 rollback every 100 pushes.
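
    As a record with hypothetical field names (the real benchmark configuration type may differ), the default configuration above looks roughly as follows; the command ordering is enforced by the command generator rather than by this record.

    ```haskell
    -- A sketch of the default benchmark configuration (hypothetical names).
    data BenchConfig = BenchConfig
      { nrCommands         :: Int     -- ^ number of generated commands
      , securityParameterK :: Int     -- ^ security parameter k
      , nrInitialValues    :: Int     -- ^ number of initial backing values
      , deletionsPerPush   :: Int     -- ^ key-value pairs deleted by a push
      , insertionsPerPush  :: Int     -- ^ key-value pairs inserted by a push
      , keysPerForward     :: Int     -- ^ key-value pairs forwarded by a forward
      , probLargeRollback  :: Double  -- ^ rollback of a size in [1000, 2000]
      , probSmallRollback  :: Double  -- ^ rollback of a size in [1, 10]
      }

    defaultBenchConfig :: BenchConfig
    defaultBenchConfig = BenchConfig
      { nrCommands         = 10000
      , securityParameterK = 2160
      , nrInitialValues    = 100
      , deletionsPerPush   = 50
      , insertionsPerPush  = 50
      , keysPerForward     = 50
      , probLargeRollback  = 0.05
      , probSmallRollback  = 0.95
      }
    ```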

    Moreover, we run four benchmark scenarios:

    • Default configuration
    • Without rollbacks
    • With only small rollbacks
    • Without rollbacks, larger flushes (1 flush every 100 pushes)

    How to read results

    Note: this section uses documentation from the tasty-bench package to explain how to read the results of running our benchmarks.

    Running a benchmark scenario gives us the following (curated) output:

    ...
    AntiDiff: OK (18.27s)
    2.527 s ± 47 ms, 2.1 GB allocated, 544 MB copied, 2.2 GB peak memory, 0.23x
    LegacyDiff: OK (32.73s)
    10.829 s ± 148 ms, 6.8 GB allocated, 2.3 GB copied, 2.2 GB peak memory
    ...

    The output says that the first benchmark, which exercises the anti-diff prototype, was repeatedly executed for 18.27 seconds (wall-clock time). Its predicted mean CPU time was 2.527 seconds, and the means of individual samples do not often diverge from it by more than ±47 milliseconds (double the standard deviation). We also configure the RTS to collect GC statistics, which enables tasty-bench to estimate and report memory usage. This data is reported as per the RTSStats fields allocated_bytes, copied_bytes and max_mem_in_use_bytes. So the output of the first benchmark says that a total of 2.1 GB of memory was allocated, that a total of 544 MB of memory was copied, and that the peak memory in use was 2.2 GB. We read the output for the second benchmark in the same way.

    Furthermore, the benchmark compares the mean CPU times of the anti-diff and legacy approaches: in this case, the mean CPU time for the anti-diff approach is ~0.23x the mean CPU time for the legacy approach. Conversely, the mean CPU time for the legacy approach is 1 / 0.23 ~= 4.35x the mean CPU time for the anti-diff approach. We call 0.23x the improvement factor and 4.35x the speedup.

    Note that these improvement factors (and reported results) are subject to noise, randomness, the specific configuration parameters, and the whims of statistics. Data reported by tasty-bench is only of indicative and comparative significance.

    Results

    For each of the 4 scenarios, we list the results of running the anti-diff and legacy approaches 5 times; running each benchmark 5 times gives an indication of whether the results are consistent across runs. Furthermore, we calculate the accompanying ranges (where applicable) of improvement factors and speedups.

    Note also the decrease in total bytes allocated and total bytes copied for the anti-diff approach compared to the legacy approach.

    Default configuration

    | Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
    |------|---------------|--------------------|-----------------------|--------------------|-------------|
    | Run 1: AntiDiff | 2.533 s (0.23x) | 4.7 ms | 2.1 GB | 557 MB | 2.4 GB |
    | Run 1: LegacyDiff | 10.792 s | 162 ms | 6.8 GB | 2.3 GB | 2.4 GB |
    | Run 2: AntiDiff | 2.508 s (0.23x) | 245 ms | 2.1 GB | 515 MB | 2.2 GB |
    | Run 2: LegacyDiff | 10.850 s | 30 ms | 6.9 GB | 2.3 GB | 2.2 GB |
    | Run 3: AntiDiff | 2.562 s (0.23x) | 5.0 ms | 2.1 GB | 552 MB | 2.2 GB |
    | Run 3: LegacyDiff | 10.993 s | 149 ms | 6.9 GB | 2.3 GB | 2.2 GB |
    | Run 4: AntiDiff | 2.168 s (0.22x) | 5.3 ms | 1.8 GB | 434 MB | 2.0 GB |
    | Run 4: LegacyDiff | 9.976 s | 39 ms | 6.3 GB | 2.0 GB | 2.0 GB |
    | Run 5: AntiDiff | 2.527 s (0.23x) | 47 ms | 2.1 GB | 544 MB | 2.2 GB |
    | Run 5: LegacyDiff | 10.829 s | 148 ms | 6.8 GB | 2.3 GB | 2.2 GB |

    • Improvement factor: [0.22, 0.23]
    • Speedup: [1 / 0.23 ~= 4.35, 1 / 0.22 ~= 4.55]

    No rollbacks

    | Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
    |------|---------------|--------------------|-----------------------|--------------------|-------------|
    | Run 1: AntiDiff | 1.638 s (0.19x) | 36 ms | 1.4 GB | 181 MB | 2.4 GB |
    | Run 1: LegacyDiff | 8.656 s | 207 ms | 5.7 GB | 1.5 GB | 2.4 GB |
    | Run 2: AntiDiff | 1.638 s (0.19x) | 75 ms | 1.4 GB | 181 MB | 2.2 GB |
    | Run 2: LegacyDiff | 8.654 s | 322 ms | 5.7 GB | 1.5 GB | 2.2 GB |
    | Run 3: AntiDiff | 1.663 s (0.19x) | 74 ms | 1.4 GB | 181 MB | 2.2 GB |
    | Run 3: LegacyDiff | 8.799 s | 216 ms | 5.7 GB | 1.5 GB | 2.2 GB |
    | Run 4: AntiDiff | 1.645 s (0.19x) | 51 ms | 1.4 GB | 181 MB | 2.0 GB |
    | Run 4: LegacyDiff | 8.732 s | 261 ms | 5.7 GB | 1.5 GB | 2.0 GB |
    | Run 5: AntiDiff | 1.639 s (0.19x) | 19 ms | 1.4 GB | 181 MB | 2.2 GB |
    | Run 5: LegacyDiff | 8.653 s | 234 ms | 5.7 GB | 1.5 GB | 2.2 GB |

    • Improvement factor: 0.19
    • Speedup: 1 / 0.19 ~= 5.25

Only small rollbacks

    | Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
    |------|---------------|--------------------|-----------------------|--------------------|-------------|
    | Run 1: AntiDiff | 1.833 s (0.18x) | 36 ms | 1.5 GB | 185 MB | 2.4 GB |
    | Run 1: LegacyDiff | 10.362 s | 867 ms | 5.8 GB | 1.6 GB | 2.4 GB |
    | Run 2: AntiDiff | 1.696 s (0.19x) | 30 ms | 1.5 GB | 185 MB | 2.2 GB |
    | Run 2: LegacyDiff | 8.822 s | 106 ms | 5.8 GB | 1.5 GB | 2.2 GB |
    | Run 3: AntiDiff | 1.702 s (0.19x) | 44 ms | 1.5 GB | 186 MB | 2.2 GB |
    | Run 3: LegacyDiff | 8.906 s | 147 ms | 5.8 GB | 1.5 GB | 2.2 GB |
    | Run 4: AntiDiff | 1.701 s (0.19x) | 47 ms | 1.5 GB | 185 MB | 2.0 GB |
    | Run 4: LegacyDiff | 8.949 s | 197 ms | 5.8 GB | 1.5 GB | 2.0 GB |
    | Run 5: AntiDiff | 1.677 s (0.19x) | 55 ms | 1.5 GB | 186 MB | 2.2 GB |
    | Run 5: LegacyDiff | 8.856 s | 177 ms | 5.8 GB | 1.5 GB | 2.2 GB |

    • Improvement factor: [0.18, 0.19]
    • Speedup: [1 / 0.19 ~= 5.25, 1 / 0.18 ~= 5.55]

    No rollbacks, larger flushes (every 100 pushes)

    | Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
    |------|---------------|--------------------|-----------------------|--------------------|-------------|
    | Run 1: AntiDiff | 1.643 s (0.25x) | 21 ms | 1.5 GB | 196 MB | 2.4 GB |
    | Run 1: LegacyDiff | 6.591 s | 351 ms | 4.0 GB | 1.4 GB | 2.4 GB |
    | Run 2: AntiDiff | 1.616 s (0.25x) | 47 ms | 1.5 GB | 196 MB | 2.2 GB |
    | Run 2: LegacyDiff | 6.520 s | 232 ms | 4.0 GB | 1.4 GB | 2.2 GB |
    | Run 3: AntiDiff | 1.640 s (0.25x) | 34 ms | 1.5 GB | 196 MB | 2.2 GB |
    | Run 3: LegacyDiff | 6.540 s | 150 ms | 4.0 GB | 1.4 GB | 2.2 GB |
    | Run 4: AntiDiff | 1.635 s (0.25x) | 76 ms | 1.5 GB | 196 MB | 2.0 GB |
    | Run 4: LegacyDiff | 6.589 s | 131 ms | 4.0 GB | 1.4 GB | 2.0 GB |
    | Run 5: AntiDiff | 1.628 s (0.25x) | 19 ms | 1.5 GB | 196 MB | 2.2 GB |
    | Run 5: LegacyDiff | 6.490 s | 5.9 ms | 4.0 GB | 1.4 GB | 2.2 GB |

    • Improvement factor: 0.25
    • Speedup: 1 / 0.25 ~= 4

Genesis

  • We elaborated a draft of the specification of the Genesis implementation and the ChainSync Jumping optimization. In particular, it includes a proof sketch that the latter preserves liveness and safety in all cases (Issue #3964).
    • @nfrisby's main realization during this sprint was that he had so far been focusing on the case where the selected chain is an extension of the intersection of our peers' ChainSync candidates.
    • This is the main case, i.e., an "absorbing" state, but it is not the only one.
    • The new proof sketch begins by case splitting on that predicate, which made the sketch quite a bit easier to follow.
  • We continued working on the "happy path" ChainSync Jumping prototype (Issue 3960).

Technical debt

  • We started working on the issues required to re-enable nightly CI runs. Nightly CI runs have far more relaxed time constraints, which gives us the option to run significantly more property tests than in our regular CI. To this end, we merged a PR that makes it easy to adapt the number of tests globally (PR #3947).