
· 4 min read
Damian Nadales

High-level summary

During the past two weeks, the consensus team continued its work on testing the UTxO HD prototype. We completed the era-transition and backing store tests, and the mempool tests are advancing at a steady pace. On the Genesis front, we continued our collaboration with the research and networking teams, and we kept investigating strategies for making the chain-sync jumping prototype faster.

High-level status report

  • Finish the UTxO HD prototype: on track.
    • We worked on state-machine tests for the mempool, and spotted potential bugs in the implementation. Investigation is ongoing.
    • We have a set of property tests for the backing store. We still need to incorporate the improvements to the LMDB cursor API that these tests made possible.
    • We merged the era-transition tests PR.
  • Genesis: on track.
    • Design work around Genesis continues in collaboration with researchers and the networking team.
    • We continued trying to improve the performance of the chain-sync jumping prototype and gained additional insight into which parameters to tweak next. Although the baseline is still faster, the current prototype already achieves a significant speedup compared to the naive approach of simply running full chain-sync with all peers.
  • Tech debt: on track.
    • We clarified a common source of confusion around VRF tie-breaking and cross-era chain selection.

Workstreams

Finish the UTxO HD prototype

We continued working on property tests for the UTxO HD prototype. In particular, we merged the era-transition tests PR.

Backing store property tests

The backing store property tests PR has been reviewed. The next steps are:

  • Improve error handling and command generation.
  • Add coverage testing to check that we are not failing to cover interesting test cases.

The monadic cursor API went through its first review round and is in a relatively stable state. This PR also unifies the cborg- and serialise-based interfaces to LMDB operations. The next steps are:

  • Write quickcheck-dynamic state-machine tests for this API.
  • Adapt the backing store property tests to the changes in the serialisation interface. This will involve adding boilerplate code in consensus to make up for the removal of the cborg-based interface.
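
To make the testing target concrete, here is a minimal sketch of what a monadic cursor interface over a backing store could look like. The names and shapes are illustrative assumptions, not the actual API from the PR:

```haskell
import Control.Monad.Catch (MonadMask, bracket)

-- Illustrative cursor interface (hypothetical names, not the real API):
-- a cursor is stepped monadically and must be closed after use.
data Cursor m k v = Cursor
  { cursorNext  :: m (Maybe (k, v))  -- return the next key-value pair, if any
  , cursorClose :: m ()              -- release the underlying LMDB resources
  }

-- Acquiring and releasing in a bracket ensures state-machine tests cannot
-- leak cursors, even when a test action throws.
withCursor :: MonadMask m => m (Cursor m k v) -> (Cursor m k v -> m a) -> m a
withCursor open = bracket open cursorClose
```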

LSM tree implementation

We worked on the LSM tree prototype. In particular, we focused on tuning the LSM tree design to the different workloads that consensus has (e.g. syncing, normal node operation, etc.).

Benchmarking the CSJ prototype

Work on improving the chain-sync jumping performance is ongoing. In particular, we compared the performance of different jump intervals, which, somewhat surprisingly, do not make a significant difference. We are seeing periodic "plateaus" during which the chain-sync tip does not progress; these are much longer for the prototype than for the baseline. Our hypothesis is that they are due to a combination of garbage collector (GC) pauses and the actual time it takes the non-dynamo chain-sync peers to jump to the slot of the tip of the dynamo fragment.

In the coming weeks we will try to shorten these plateaus through a combination of tweaking GC options and reducing synchronisation in the CSJ governor.

The following plot shows the performance of the chain-sync jumping prototype using different jumping intervals. It compares the syncing progress by plotting the slots of adopted blocks against time. The baseline is still faster; however, it is worth noting that the current prototype already achieves a significant speedup compared to the naive approach of simply running full chain-sync with all peers.

The second plot shows the syncing progress sliced to a chosen ~5 min interval and includes, in addition to the slots of adopted blocks, the slots of the tip of the ChainSync fragment. This allows us to see how far ahead of the selected tip the ChainSync dynamo is, i.e. how much room there is before BlockFetch stalls. It shows periodic behaviour (due to the forecasting limit), and shows that the ChainSync fragment tip does not progress for significant periods ("plateaus").

Technical debt

We clarified a common source of confusion around VRF tie-breaking and cross-era chain selection. This PR involved correcting potentially misleading names of VRF-related functions, and providing context for why a particular VRF value is used for tie-breaking.

· 4 min read
Damian Nadales

High-level summary

During the past two weeks, the consensus team worked on adding property tests for different aspects of the UTxO HD prototype: era transitions, the mempool, and the backing store. Thanks to these tests we were able to uncover a bug in the prototype. On the Genesis front, we benchmarked a different version of the ChainSync jumping prototype in an attempt to improve its performance, but this did not result in any noticeable speedup.

High-level status report

  • Finish the UTxO HD prototype: on track.
    • We focused on increasing test coverage for the UTxO-HD prototype:
      • We started implementing Cardano era-transition property tests.
      • We started implementing state-machine property-tests for the mempool.
      • We merged the mempool rewrite.
      • We started working on state-machine tests for the backing store. This uncovered a bug in the range-read implementation of the LMDB backing store.
  • Genesis: on track.
    • We benchmarked a version of the Genesis ChainSync Jumping prototype that spreads out the ChainSync updates over a longer period of time. This did not result in any noticeable speedup.
    • We investigated the overhead introduced by non-ChainSync components, but no conclusions could be drawn from the benchmarks we ran.

Workstreams

Finish the UTxO HD prototype

We focused on increasing test coverage for the UTxO HD prototype. We also merged the mempool rewrite.

Era transition property tests

We started implementing Cardano era-transition property tests, which are needed to make sure that the ledger tables are updated in the right way when we move from one era to the next. At the moment there are two important transitions:

  • Byron to Shelley: all of the UTxO is transferred from the in-memory Byron state (which has no tables) to the ledger tables of the Shelley state.
  • Shelley to Allegra: the AVVM addresses must be deleted.

We have tests for the Byron-to-Shelley transition and are working on adding the remaining ones.
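
As an illustration, the Byron-to-Shelley case could be phrased roughly as the property below. All types and the translation function here are simplified stand-ins for the real cardano-ledger/consensus definitions:

```haskell
import qualified Data.Map.Strict as Map
import Test.QuickCheck

-- Simplified stand-ins for the real ledger types:
newtype TxIn  = TxIn Int  deriving (Eq, Ord, Show)
newtype TxOut = TxOut Int deriving (Eq, Show)

newtype ByronState   = ByronState   { byronUtxo :: Map.Map TxIn TxOut } deriving Show
newtype ShelleyState = ShelleyState { utxoTable :: Map.Map TxIn TxOut } deriving Show

instance Arbitrary ByronState where
  arbitrary = ByronState . Map.fromList
    <$> listOf ((,) <$> (TxIn <$> arbitrary) <*> (TxOut <$> arbitrary))

-- Hypothetical translation: the in-memory Byron UTxO becomes the initial
-- contents of the Shelley ledger tables.
translateToShelley :: ByronState -> ShelleyState
translateToShelley = ShelleyState . byronUtxo

-- The property: no UTxO entry is lost or changed across the transition.
prop_byronToShelley :: ByronState -> Property
prop_byronToShelley st = utxoTable (translateToShelley st) === byronUtxo st
```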

Mempool state-machine tests

We started implementing state-machine property tests for the mempool. The mempool is currently tested via pure property tests that use a ledger state without tables. With the introduction of UTxO HD, testing the concurrent behaviour of the mempool became crucially important (e.g. we now have to acquire locks to flush the backing store), and in addition we need to test against a ledger state with tables. These needs led to the creation of a new set of property tests. In particular, we aim to run parallel state-machine tests that exercise the mempool in a way similar to how the node would make use of it.
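
The toy below illustrates the flavour of concurrency these tests need to exercise (several clients adding transactions at once, followed by an invariant check); the Mempool type and operations are deliberately simplified stand-ins, not the prototype's API:

```haskell
import Control.Concurrent.Async (mapConcurrently_)
import Control.Concurrent.STM

-- A drastically simplified mempool: just a TVar holding the pending txs.
newtype Mempool tx = Mempool (TVar [tx])

addTx :: Mempool tx -> tx -> IO ()
addTx (Mempool var) tx = atomically $ modifyTVar' var (tx :)

-- A parallel state-machine test would generate interleavings like this one
-- and check that an invariant holds afterwards (here: no lost updates).
main :: IO ()
main = do
  var <- newTVarIO []
  mapConcurrently_ (addTx (Mempool var)) [1 .. 100 :: Int]
  txs <- readTVarIO var
  print (length txs == 100)
```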

Backing store property tests

We started working on state-machine tests for the backing store that UTxO HD uses. The property tests uncovered errors in the range-reads implementation of the LMDB backing store. To facilitate fixing this bug, we made changes to the Haskell LMDB bindings.

Benchmarking the CSJ prototype

Prompted by previous benchmarks showing significant improvements in sync time when using more capabilities, we implemented a way to spread out the ChainSync updates over a longer period instead of firing them all at the same time. This didn't result in a noticeable speedup.

We also benchmarked the prototype with CSJ disabled (such that just the dynamo peer is running ChainSync, but e.g. BlockFetch still sees all peers) to rule out or confirm overhead from non-ChainSync (mainly BlockFetch) components. This results in era-specific behaviour (speed is like the prototype in Byron, but like the baseline in Shelley), which deserves a closer look in the future.

This diagram shows the respective syncing progress, starting at Genesis and continuing a good part into Shelley (with the dashed line indicating the Byron-to-Shelley transition).

  • Red: baseline
  • Green: CSJ prototype, 10 peers, jumps every 3000/f slots, jumps in clumps.
  • Blue: like Green, jumps are spread out.
  • Orange: variant with no jumping, to measure unrelated overhead.

· 2 min read
Damian Nadales

High-level summary

During the past two weeks, the consensus team worked on improving the performance of the ChainSync jumping logic, which is needed for Genesis. We also rewrote the implementation of the mempool in the UTxO HD prototype, which solved the issues that prevented us from running system-level benchmarks. Also on the UTxO HD front, we have an improved implementation of the sequence of differences (a crucial piece of UTxO HD), and we elaborated a test sign-off list for the UTxO HD feature.

Executive summary

  • With the latest implementation of ChainSync jumping we are closer to the baseline performance. In particular, the prototype seems to benefit from the extra concurrency provided by additional capabilities.
  • We rewrote the implementation of the mempool in the UTxO HD prototype. This rewrite was required due to performance problems we observed when running the workbench, which prevented us from running system-level benchmarks; the rewrite solved these issues. After the "UTxO-HD: mempool rewrite" PR is merged, we will contact the benchmarking team so that they can run the system-level benchmarks.
  • The implementation of sequences of differences based on anti-diffs was integrated into the UTxO HD prototype. It is pending review, and we still need to run replaying and syncing benchmarks to confirm that it delivers the performance improvement observed in our micro-benchmarks.
  • Inspection of the UTxO HD prototype resulted in a list of the tests consensus needs in order to consider the UTxO HD prototype fully tested.

Additional information

Genesis

Benchmarking setup: 50 MBit/s, 50 ms latency

  • Red: baseline
  • Green: Current CSJ prototype, 10 peers, jumps every 3000/f slots.

As ChainSync Jumping involves many concurrent network operations at every jump, we tried running the node with 6 capabilities instead of the default 2.

  • Orange: baseline with 6 capabilities
  • Blue: CSJ prototype with 6 capabilities

This diagram shows the respective syncing progress, starting at Genesis and continuing a good part into Shelley (with the dashed line indicating the Byron-to-Shelley transition).

Further work includes investigating whether we can tune the prototype to better handle few capabilities, or adapting the default number of capabilities (potentially just while syncing).
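
For reference, the number of capabilities is controlled via GHC's runtime system options when starting the node; an invocation along these lines (all flags other than the RTS options are elided here) selects six capabilities:

```
cardano-node run ... +RTS -N6 -RTS
```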

· 4 min read
Damian Nadales
  • We proposed a fix for the performance degradation observed when running distributed multi-node benchmarks in the UTxO HD feature branch. While this fixed the problems observed when running local benchmarks, it broke the ThreadNet tests due to concurrency issues. Therefore, we think it is wise to start redesigning the UTxO HD mempool integration.
  • We did several rounds of code review on the alternative implementation of diff sequences required by the UTxO HD feature, based on the idea of anti-diffs. This alternative implementation is close to being merged; the next step is to integrate it into the UTxO HD branch so that we can run ad-hoc replaying and syncing-from-scratch benchmarks and compare these with the baseline. The micro-benchmarks we elaborated for the alternative implementation show speedups of up to 4x, so we are optimistic about the performance of the replaying and syncing-from-scratch benchmarks. However, it is important to note that, due to the nature of UTxO HD, we will still be slower than the baseline.
  • The final draft of the Genesis implementation specification is ready for review.
  • We implemented a prototype for the happy path of Genesis' ChainSync Jumping (CSJ). The prototype is slower than the baseline; however, this is not the latest version of the prototype, and the jump interval is very small.
  • Work on integrating Conway has stopped since priorities have changed.
  • We started work on benchmarking epoch boundaries and epoch overhead (PR #4014). To this end, we made use of a modified version of our db-analyser tool. We ran the new benchmarking setup on the Cardano mainnet chain, and we can see that block tick and application take substantially longer at epoch boundaries, although there are also a couple of slots during an epoch in which these computations take longer than normal. We notified the ledger team about these findings. We will use this modified version of db-analyser to investigate the epoch overhead.

Workstreams

UTxO HD

  • We spent quite some time investigating the root cause of the degradation in performance observed in the benchmarks. We ran the make forge-stress benchmarks locally in order to debug this behaviour.

    • Transaction batching doesn't make a notable difference in the outcome (considering we are using the in-memory backend).

    • The mempool batching implementation required asynchronous transaction validation, which violates the LocalTxSubmission protocol contract; had we continued down that route, the impact would have been quite big.

    • The STM logic we implemented by using a TMVar for the mempool internal state was buggy and, under certain circumstances, appeared to deadlock. Reverting the mempool internal state to a TVar seems to solve this problem (see the sketch after this list).

    • The results we get after this change look almost identical to the ones from the baseline.

  • The anti-diff prototype (PR #3997) has been reviewed and is close to being merged.

    • A follow-up issue (issue #4010) to integrate the anti-diff prototype into the various consensus packages was created. A first version of the integration exists, and all tests pass. The next step is to get some indication of the "real" performance gain by profiling db-analyser (or cardano-node).
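
As an illustration of the class of bug mentioned above (not the actual code), consider a state kept in a TMVar that is taken in one transaction and put back in a later one: any exception in between leaves the TMVar empty and blocks all subsequent takers, whereas a TVar can never be left "empty":

```haskell
import Control.Concurrent.STM

-- Risky pattern: the state is absent from the TMVar between the take and
-- the put; an exception raised by the continuation deadlocks everyone else.
withStateTMVar :: TMVar s -> (s -> IO (s, a)) -> IO a
withStateTMVar var k = do
  s       <- atomically (takeTMVar var)  -- TMVar is now empty
  (s', a) <- k s                         -- an exception here is fatal
  atomically (putTMVar var s')
  pure a

-- With a TVar the state is always present; a crash can at worst lose an
-- update, never block other readers and writers.
withStateTVar :: TVar s -> (s -> IO (s, a)) -> IO a
withStateTVar var k = do
  s       <- readTVarIO var
  (s', a) <- k s
  atomically (writeTVar var s')
  pure a
```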

Genesis

  • Final draft of the Genesis implementation specification, now up for review.

  • Local benchmark setup for parameter tuning via the happy path ChainSync Jumping (CSJ) prototype (Issue 3987).

    • Context: our Genesis design requires us to check in with a large number (~20) of servers periodically while syncing. These servers are offered jump requests via the ChainSync protocol (hence the name), which they can accept or decline. If a peer declines, the Genesis rule allows us to determine whether a node actually has a better chain.

    • The "happy path" is when no peer declines a jump. We want this to have close to no overhead compared to status quo, i.e. syncing without Genesis.

    • We implemented a prototype for this happy path, and are now starting to test it in various configurations (number of peers, latency, bandwidth) to tune the performance of ChainSync jumping, i.e. how complicated our logic for choosing when to jump needs to be.

      Example:

    • Simulated connection: 50 MBit/s, 50 ms latency

    • Jump interval: 3000 slots (on the low end, could be increased to up to 3k/f)

    • Red: baseline (1.35.3), one peer in topology file

    • Blue: Preliminary version of our prototype, with 10 peers.

      It is slower by roughly 30%, but this is not the latest version of the prototype, and the jump interval is very small, making CSJ more of a bottleneck.

Technical debt

  • Fix flakiness in ChainDB QSM tests (PR 3990).

· 9 min read
Damian Nadales

Executive summary

  • We did most of the heavy lifting required to integrate the Conway era.
  • We have property tests for the UTxO HD backing store API implementations. A possible bug was identified. Work is ongoing to make sure the property-tests cover all the relevant cases.
  • We implemented and benchmarked the "anti-diff" prototype to speed up the UTxO HD functionality. Results show a rough speedup of 4x to 5.5x across several scenarios; note, however, that "data reported by tasty-bench is only of indicative and comparative significance". We are investigating additional performance improvements. The "anti-diff" prototype and benchmarks are still pending code review.
  • We elaborated a draft specification for the Genesis implementation and ChainSync jumping optimization.

Workstreams

Conway

  • Integration PR of the minimal Conway era (Issue #3963, PR #3971).
  • Discussions with Ledger revealed possible sources of confusion about which data should be changed in the Conway era. As a result, a new technical debt issue was raised, which does not block the integration of the Conway era (Issue #3976).

UTxO HD

  • Issue #3954, branch: The functionality of a backing store, which is the interface to the on-disk part of ledger state in UTxO-HD, is tested at a high level through the OnDisk tests. However, some functionalities remain untested, e.g., reads of ranges of keys. As such, we have implemented quickcheck-state-machine tests that exercise backing stores directly. The tests are reusable for different backing store implementations because the tests are implementation-agnostic: Any backing store that conforms to the backing store interface can be plugged into the tests. Work is still ongoing to label/monitor the tests, such that we can verify that interesting cases are being tested. Furthermore, a possible bug has been identified in the LMDB backing store with respect to range reads, though the bug has not been resolved yet.
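
A minimal sketch of the kind of implementation-agnostic interface these tests are written against (hypothetical field names; the real interface lives in the UTxO-HD branch):

```haskell
-- Any implementation (in-memory, LMDB, ...) that provides this record can
-- be plugged into the same quickcheck-state-machine test suite.
data BackingStore m k v = BackingStore
  { bsRead      :: [k] -> m [Maybe v]           -- point reads
  , bsRangeRead :: Maybe k -> Int -> m [(k, v)] -- up to n pairs after a key
  , bsWrite     :: [(k, Maybe v)] -> m ()       -- Just = insert, Nothing = delete
  , bsClose     :: m ()
  }
```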

  • Issue #3946, branch, PR #3882: The "anti-diff" prototype proposes an alternative approach to keeping track of sequences (more specifically, FingerTrees) of diffs. These diff sequences are a component of the in-memory parts of the ledger state in UTxO-HD. Since the consensus code often requires the cumulative diff of a sequence of diffs, the current implementation "caches" the cumulative diff of each subtree in the diff sequence. This caching allows relatively fast reconstruction of the total cumulative diff, but it proved to incur a non-negligible cost: when we manipulate diff sequences through splits and appends, we force recomputation of a logarithmic number of caches. This is problematic since we split and append often in consensus: we split when we flush diffs to a backing store or roll back blocks, and we append when pushing blocks. The new approach should reduce the overhead of this caching; a simplified sketch of the idea follows.
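
A simplified sketch of the anti-diff idea, under the assumption that diffs form a group (a monoid with inverses): instead of caching cumulative diffs in every subtree, we keep a single running total, and splitting off a flushed prefix only requires "subtracting" it. All names here are illustrative, not the prototype's API:

```haskell
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Diffs as a group: a monoid whose elements can be inverted ("anti-diffs").
class Monoid d => Group d where
  invert :: d -> d

-- A diff sequence carrying one cached total instead of per-subtree caches.
data DiffSeq d = DiffSeq { diffs :: Seq d, total :: d }

-- Appending a diff (pushing a block) is one monoid operation on the total.
push :: Group d => DiffSeq d -> d -> DiffSeq d
push (DiffSeq ds t) d = DiffSeq (ds |> d) (t <> d)

-- Splitting off the n oldest diffs (flushing to the backing store): the new
-- total is obtained by prepending the inverse of the flushed prefix, with
-- no per-subtree caches to recompute.
flush :: Group d => Int -> DiffSeq d -> (d, DiffSeq d)
flush n (DiffSeq ds t) =
  let (pre, post) = Seq.splitAt n ds
      flushed     = foldMap id pre
  in (flushed, DiffSeq post (invert flushed <> t))
```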

    We implemented micro-benchmarks for the "anti-diff" prototype: we first generate a sequence of commands (Forward, Push, Flush, or Rollback) through a simulation, after which we measure the performance of applying the commands to a diff sequence. In this context, Forward means forwarding of values through a diff, whereas Rollback means switching to a different fork by rolling back diffs/blocks and pushing new ones. Moreover, we compare the performance for the two implementations: the "legacy" approach, and the anti-diff approach.

    Some preliminary results were positive, but we needed to revisit the benchmark's configuration to obtain more definitive results. After a discussion with @dcoutts and the consensus team about this configuration (e.g., number of commands generated, choice of the security parameter k), the benchmarks should now be closer to the realistic setting. The following configuration specifies the default configuration that is used in the benchmarking code:

    • Number of commands generated: 10_000
    • Security parameter k: 2160
    • Number of initial backing values: 100
    • Number of key-value pairs deleted by a push: 50
    • Number of key-value pairs inserted by a push: 50
    • Number of key-value pairs forwarded by a forward: 50
    • Probability of a large (in the range [1000, 2000]) rollback: 0.05
    • Probability of a small (in the range [1, 10]) rollback: 0.95
    • Order of commands:
      • An equal number of forwards and pushes.
      • 1 flush every 10 pushes.
      • 1 rollback every 100 pushes.
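
For instance, the rollback size distribution above corresponds to a QuickCheck generator along these lines (Cmd is a stand-in for the benchmark's command type):

```haskell
import Test.QuickCheck

data Cmd = Forward | Push | Flush | Rollback Int
  deriving Show

-- Mirrors the configuration above: 5% large rollbacks, 95% small ones.
genRollback :: Gen Cmd
genRollback = frequency
  [ (5,  Rollback <$> choose (1000, 2000))
  , (95, Rollback <$> choose (1, 10))
  ]
```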

    Moreover, we run four benchmark scenarios:

    • Default configuration
    • Without rollbacks
    • With only small rollbacks
    • Without rollbacks, larger flushes (1 flush every 100 pushes)

    How to read results

    Note: this section uses documentation from the tasty-bench package to explain how to read the results of running our benchmarks.

    Running a benchmark scenario gives us the following (curated) output:

```
...
AntiDiff: OK (18.27s)
  2.527 s ± 47 ms, 2.1 GB allocated, 544 MB copied, 2.2 GB peak memory, 0.23x
LegacyDiff: OK (32.73s)
  10.829 s ± 148 ms, 6.8 GB allocated, 2.3 GB copied, 2.2 GB peak memory
...
```

The output says that the first benchmark, which exercises the anti-diff prototype, was repeatedly executed for 18.27 seconds (wall-clock time); its predicted mean CPU time was 2.527 seconds, and the means of individual samples do not often diverge from it by more than ±47 milliseconds (twice the standard deviation). We also configure the RTS to collect GC statistics, which enables tasty-bench to estimate and report memory usage. This data is reported as per the RTSStats fields allocated_bytes, copied_bytes and max_mem_in_use_bytes. So the output of the first benchmark says that a total of 2.1 GB of memory was allocated, a total of 544 MB was copied, and peak memory in use was 2.2 GB. We read the output for the second benchmark in the same way.

    Furthermore, the benchmark compares the mean CPU times for both the anti-diff and legacy approaches: In this case, the mean CPU time for the anti-diff approach is ~0.23x the mean CPU time for the legacy approach. Conversely, the mean CPU time for the legacy approach is 1 / 0.23 ~= 4.35x the mean CPU time for the anti-diff approach. We will call 0.23x the improvement factor. We will call 4.35x the speedup.

    Note that these improvement factors (and reported results) are subject to noise, randomness, the specific configuration parameters, and the whims of statistics. Data reported by tasty-bench is only of indicative and comparative significance.
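For context, the relative "0.23x" figure is produced by tasty-bench's bcompare combinator. A sketch of such a paired benchmark, with trivial placeholder workloads standing in for the real diff-sequence implementations, looks like this:

```haskell
import Test.Tasty.Bench

-- Placeholder workloads; in the real benchmarks these would apply the
-- generated command sequence to the legacy and anti-diff implementations.
legacyWorkload, antiDiffWorkload :: [Int] -> Int
legacyWorkload   = foldr (+) 0
antiDiffWorkload = sum

main :: IO ()
main = defaultMain
  [ bench "LegacyDiff" $ nf legacyWorkload [1 .. 100000]
  , bcompare "LegacyDiff" $  -- report AntiDiff's mean time relative to LegacyDiff
      bench "AntiDiff" $ nf antiDiffWorkload [1 .. 100000]
  ]
```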

    Results

    For each of the 4 scenarios, we list the results of running the anti-diff and legacy approaches 5 times. We run the benchmarks 5 times to get an indication of whether the results are similar across multiple runs. Furthermore, we calculate the accompanying ranges (if applicable) of improvement factors and speedups.

    Note also the decrease in total bytes allocated and total bytes copied for the anti-diff approach compared to the legacy approach.

    Default configuration

    | Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
    |------|---------------|--------------------|-----------------------|--------------------|-------------|
    | Run 1: AntiDiff | 2.533 s (0.23x) | 4.7 ms | 2.1 GB | 557 MB | 2.4 GB |
    | Run 1: LegacyDiff | 10.792 s | 162 ms | 6.8 GB | 2.3 GB | 2.4 GB |
    | Run 2: AntiDiff | 2.508 s (0.23x) | 245 ms | 2.1 GB | 515 MB | 2.2 GB |
    | Run 2: LegacyDiff | 10.850 s | 30 ms | 6.9 GB | 2.3 GB | 2.2 GB |
    | Run 3: AntiDiff | 2.562 s (0.23x) | 5.0 ms | 2.1 GB | 552 MB | 2.2 GB |
    | Run 3: LegacyDiff | 10.993 s | 149 ms | 6.9 GB | 2.3 GB | 2.2 GB |
    | Run 4: AntiDiff | 2.168 s (0.22x) | 5.3 ms | 1.8 GB | 434 MB | 2.0 GB |
    | Run 4: LegacyDiff | 9.976 s | 39 ms | 6.3 GB | 2.0 GB | 2.0 GB |
    | Run 5: AntiDiff | 2.527 s (0.23x) | 47 ms | 2.1 GB | 544 MB | 2.2 GB |
    | Run 5: LegacyDiff | 10.829 s | 148 ms | 6.8 GB | 2.3 GB | 2.2 GB |
    • Improvement factor: [0.22, 0.23]
    • Speedup: [1 / 0.23 ~= 4.35, 1 / 0.22 ~= 4.55]

    No rollbacks

    | Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
    |------|---------------|--------------------|-----------------------|--------------------|-------------|
    | Run 1: AntiDiff | 1.638 s (0.19x) | 36 ms | 1.4 GB | 181 MB | 2.4 GB |
    | Run 1: LegacyDiff | 8.656 s | 207 ms | 5.7 GB | 1.5 GB | 2.4 GB |
    | Run 2: AntiDiff | 1.638 s (0.19x) | 75 ms | 1.4 GB | 181 MB | 2.2 GB |
    | Run 2: LegacyDiff | 8.654 s | 322 ms | 5.7 GB | 1.5 GB | 2.2 GB |
    | Run 3: AntiDiff | 1.663 s (0.19x) | 74 ms | 1.4 GB | 181 MB | 2.2 GB |
    | Run 3: LegacyDiff | 8.799 s | 216 ms | 5.7 GB | 1.5 GB | 2.2 GB |
    | Run 4: AntiDiff | 1.645 s (0.19x) | 51 ms | 1.4 GB | 181 MB | 2.0 GB |
    | Run 4: LegacyDiff | 8.732 s | 261 ms | 5.7 GB | 1.5 GB | 2.0 GB |
    | Run 5: AntiDiff | 1.639 s (0.19x) | 19 ms | 1.4 GB | 181 MB | 2.2 GB |
    | Run 5: LegacyDiff | 8.653 s | 234 ms | 5.7 GB | 1.5 GB | 2.2 GB |
    • Improvement factor: 0.19
    • Speedup: 1 / 0.19 ~= 5.26

Only small rollbacks

| Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
|------|---------------|--------------------|-----------------------|--------------------|-------------|
| Run 1: AntiDiff | 1.833 s (0.18x) | 36 ms | 1.5 GB | 185 MB | 2.4 GB |
| Run 1: LegacyDiff | 10.362 s | 867 ms | 5.8 GB | 1.6 GB | 2.4 GB |
| Run 2: AntiDiff | 1.696 s (0.19x) | 30 ms | 1.5 GB | 185 MB | 2.2 GB |
| Run 2: LegacyDiff | 8.822 s | 106 ms | 5.8 GB | 1.5 GB | 2.2 GB |
| Run 3: AntiDiff | 1.702 s (0.19x) | 44 ms | 1.5 GB | 186 MB | 2.2 GB |
| Run 3: LegacyDiff | 8.906 s | 147 ms | 5.8 GB | 1.5 GB | 2.2 GB |
| Run 4: AntiDiff | 1.701 s (0.19x) | 47 ms | 1.5 GB | 185 MB | 2.0 GB |
| Run 4: LegacyDiff | 8.949 s | 197 ms | 5.8 GB | 1.5 GB | 2.0 GB |
| Run 5: AntiDiff | 1.677 s (0.19x) | 55 ms | 1.5 GB | 186 MB | 2.2 GB |
| Run 5: LegacyDiff | 8.856 s | 177 ms | 5.8 GB | 1.5 GB | 2.2 GB |
  • Improvement factor: [0.18, 0.19]

  • Speedup: [1 / 0.19 ~= 5.26, 1 / 0.18 ~= 5.56]

    No rollbacks, larger flushes (every 100 pushes)

    | Name | Mean CPU time | 2*Stdev (CPU time) | Total bytes allocated | Total bytes copied | Peak memory |
    |------|---------------|--------------------|-----------------------|--------------------|-------------|
    | Run 1: AntiDiff | 1.643 s (0.25x) | 21 ms | 1.5 GB | 196 MB | 2.4 GB |
    | Run 1: LegacyDiff | 6.591 s | 351 ms | 4.0 GB | 1.4 GB | 2.4 GB |
    | Run 2: AntiDiff | 1.616 s (0.25x) | 47 ms | 1.5 GB | 196 MB | 2.2 GB |
    | Run 2: LegacyDiff | 6.520 s | 232 ms | 4.0 GB | 1.4 GB | 2.2 GB |
    | Run 3: AntiDiff | 1.640 s (0.25x) | 34 ms | 1.5 GB | 196 MB | 2.2 GB |
    | Run 3: LegacyDiff | 6.540 s | 150 ms | 4.0 GB | 1.4 GB | 2.2 GB |
    | Run 4: AntiDiff | 1.635 s (0.25x) | 76 ms | 1.5 GB | 196 MB | 2.0 GB |
    | Run 4: LegacyDiff | 6.589 s | 131 ms | 4.0 GB | 1.4 GB | 2.0 GB |
    | Run 5: AntiDiff | 1.628 s (0.25x) | 19 ms | 1.5 GB | 196 MB | 2.2 GB |
    | Run 5: LegacyDiff | 6.490 s | 5.9 ms | 4.0 GB | 1.4 GB | 2.2 GB |
  • Improvement factor: 0.25

  • Speedup: 1 / 0.25 ~= 4

Genesis

  • We elaborated a draft of the specification of the Genesis implementation and the ChainSync Jumping optimization. In particular, this includes a proof sketch that the latter preserves liveness and safety in all cases (Issue 3964).
    • @nfrisby's main realization during this sprint was that he had been focusing so far on the case where the selected chain is an extension of the intersection of our peers' ChainSync candidates.
    • This is the main case, i.e. an "absorbing" state, but it is not the only case.
    • The new proof sketch begins by case splitting on that predicate, and that made the sketch quite a bit easier to follow.
  • We continued working on the "happy path" ChainSync Jumping prototype (Issue 3960).

Technical debt

  • We started working on the issues required to re-enable nightly CI runs. Nightly CI runs have far laxer time constraints, which gives us the option to run significantly more property tests than in our regular CI. To this end, we merged a PR that makes it easy to adapt the number of tests globally (PR #3947).