Skip to main content

Consensus Team Update

· 4 min read
Damian Nadales

High-level summary

During the past two weeks, the consensus team continued its work on testing the UTxO HD prototype. We completed the era-transition and backing store tests, and the mempool tests are advancing at a steady pace. Regarding our work in the Genesis design, we continued our collaboration with the research and networking teams, and we continue investigating strategies for making the chain-sync jumping prototype faster.

High-level status report

  • Finish the UTxO HD prototype: on track.
    • We worked on state-machine tests for the mempool, and spotted potential bugs in the implementation. Investigation is ongoing.
    • We have a set of property tests for the backing store. We still need to incorporate the improvements to the LMDB cursor API that these tests made possible.
    • We merged the era-transition tests PR.
  • Genesis: on track.
    • Design work around Genesis continues in collaboration with researchers and the networking team.
    • We continued trying to improve the performance of the chain-sync jumping prototype. We gained additional insight on which parameters to tweak next. In spite of the baseline still being faster, the current prototype already achieves a significant speedup when compared to the naive approach of simply running full chain-sync with all peers.
  • Tech debt: on track.
    • We clarified a common source of confusion around VRF tie-breaking and cross-era chain selection.

Workstreams

Finish the UTxO HD prototype

We continued working on property-tests for the UTxO HD prototype. In particular we merged the era-transition tests PR.

Backing store property tests

The backing store property tests PR has been reviewed. The next steps are:

  • Improve error handling and command generation.
  • Add coverage testing to check that we are not failing to cover interesting test cases.

The monadic cursor API went through its first review round. The API is in a relatively stable state. This PR also unifies the cborg and serialise-based interfaces to LMDB operations. The next steps are:

  • Write quickcheck-dynamic state-machine tests for this API.
  • Adapt the changes in the serialisation interface in the backing store property tests. This will involve adding boilerplate code in consensus to make up for the removal of the cborg-based interface.

LSM tree implementation

We worked on the LSM tree prototype. In particular, we focused on tuning the LSM tree design to the different workloads that consensus has (eg syncing, normal node operation, etc).

Benchmarking the CSJ prototype

Work on improving the chain-sync jumping performance is ongoing. In particular we compared the performance of different jump intervals, which, somewhat surprisingly, do not make a significant difference. In particular, we are seeing periodic "plateaus" where the chain-sync tip does not progress, but they are much longer for the prototype. Our hypothesis is that this seem to be due to a combination of the garbage collector (GC) pauses, and the actual time it takes the non-dynamo chain-sync peers to jump to the tip of the slot of the dynamo fragment.

In the coming weeks we will try to shorten these plateaus via a combination of tweaking GC options and less synchronisation in the CSJ governor.

The following plot shows the performance of the chain-sync jumping prototype using different jumping intervals. It compares the syncing progress by plotting the slots of adopted blocks against time. The baseline is still faster, however it is worth noting that the current prototype already achieves a significant speedup when compared to the naive approach of simply running full chain-sync with all peers.

The second plot shows the syncing progress sliced to a chosen ~5min interval, and includes, in addition to the slots of adopted blocks, the slots of the tip of the ChainSync fragment. This allows us to see how far ahead of the selected tip the CS dynamo is, i.e. how much room we have for BlockFetch not to get stalled. It shows periodic behaviour (due to the forecasting limit), and shows that the CS fragment tip is not progressing for significant periods ("plateaus").

Technical debt

We clarified a common source of confusion around VRF tie-breaking and cross-era chain selection. This PR involved correcting potentially misleading names of VRF-related functions, and providing context for a particular VRF value is used for tie-breaking.