5 posts tagged with "consensus"

· 5 min read
Damian Nadales

2023-04 -- 2023-06

Main achievements

UTxO HD

  • We finished a major prototype refactoring, which includes:
    • A better and finer-grained DB lock mechanism.
    • Elimination of race conditions.
    • Support for configuring batch query size and flushing rate. This is crucial to allow node users to tweak performance.
    • Architectural simplifications and performance improvements.
  • We implemented a new package to support db-sync integration with UTxO-HD.
  • We ran another set of ad-hoc benchmarks:
    • We uncovered a performance regression on the Network component when using GHC-9.2/9.4.
    • The synchronization and replay speed are as expected.
    • However, we uncovered memory consumption issues (see figure below).
      • The in-memory backend is consuming more memory than the baseline.
      • The LMDB backend shows an unexpected memory usage peak.
      • Investigation on these issues is ongoing.
  • We integrated the latest changes in main branch.
    • This required a re-design of the mempool to include the mempool fairness improvement.

Genesis

  • The Genesis work for this PI focused on a high-priority issue raised in the IOG Researchers' feedback on the proposal.
    • This particular question was not anticipated when the Q2 PI was planned.
    • As a result, the chain generators work, the ChainSync Jumping performance work, and the Genesis node prototype work were deprioritized.
    • That work has accordingly been rolled over into the Statement of Work for the first Genesis vendor work package.
  • The IOG Researchers' feedback on the design was very valuable. It had two primary effects.
  • Outcome 1: We re-introduced distinct behaviors when the node is "syncing" versus when it is "caught up".
    • This eliminated a DoS vector introduced by the proposal, instead of having to argue that it was well-mitigated.
    • The additional design complexity is relatively small.
  • Outcome 2: The issue that was unanticipated is whether the Cardano chain is consistently dense enough to rely on Genesis without any checkpointing.
    • The determination so far is that, assuming the adversary never controls more than four of the seven genesis keys, the most vulnerable segment is in the pure Praos era.
    • All the preceding windows are significantly more robust, including the entire Byron and Transitional Praos eras.
    • Thus checkpointing is not necessary for the initial Genesis release, though it still may be a reasonable addition later.
    • The primary invention was a model for bounding how much benefit the adversary's long-range attack could possibly gain from Praos's natural short forks.
  • Relevant questions that the IOG Researchers are still assessing.
    • These do not block the Genesis implementation, but do affect the ultimate values of specific parameters.
    • Question 1: what is the upper bound on the duration of an eclipse that a healthy Praos node will survive?
    • Question 2: what is the upper bound on how much grinding can improve the adversary's leader schedule within some Genesis window?
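The density question in Outcome 2 can be made concrete with a small sketch. It assumes a density comparison in the style of the Ouroboros Genesis design: when two candidate chains fork, prefer the one with more blocks in a fixed window of slots right after their intersection. The window length, the slot numbers, and the function names below are illustrative assumptions, not parameters or identifiers from the Consensus code base.

```python
def density(block_slots, intersection_slot, window):
    """Count blocks whose slot lies in (intersection_slot, intersection_slot + window]."""
    end = intersection_slot + window
    return sum(1 for slot in block_slots if intersection_slot < slot <= end)


def denser_chain(chain_a, chain_b, intersection_slot, window):
    """Return 'a', 'b', or 'tie' according to density just after the intersection.

    Each chain is given as the list of slots of its blocks after the fork point.
    Only the window after the intersection matters, not the total length.
    """
    da = density(chain_a, intersection_slot, window)
    db = density(chain_b, intersection_slot, window)
    if da > db:
        return "a"
    if db > da:
        return "b"
    return "tie"


# Hypothetical example: chain_b has more blocks overall, but it is sparse
# right after the fork, which is the shape a long-range attack would have.
honest = [11, 12, 15, 19, 40]
adversarial = [13, 30, 31, 32, 33, 34]
print(denser_chain(honest, adversarial, intersection_slot=10, window=10))
```

In this toy setting the comparison only works if the honest chain is reliably denser than any adversarial alternative within the window, which is why the consistency of the chain's density matters.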

Support

  • We analyzed the number of file descriptors used by Consensus. Node operators can use this information to check whether the number of file descriptors they plan to support is enough, which improves the user (e.g. node operator) experience.
  • We implemented a mempool fairness improvement, by which transactions are guaranteed to be processed irrespective of their size.
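As a rough illustration of what size-independent fairness can mean, the sketch below contrasts two admission policies for a capacity-bounded mempool. It is only a toy model under stated assumptions: the function names, the `(tx_id, size)` representation, and the capacity numbers are hypothetical, and the actual Consensus implementation may work differently.

```python
from collections import deque


def admit_fifo(capacity, used, waiting):
    """Admit waiting transactions strictly in arrival order.

    Stops at the first transaction that does not fit, so smaller
    transactions queued behind it cannot overtake it.
    `waiting` is a deque of (tx_id, size) pairs; returns (admitted, used).
    """
    admitted = []
    while waiting and used + waiting[0][1] <= capacity:
        tx_id, size = waiting.popleft()
        used += size
        admitted.append(tx_id)
    return admitted, used


def admit_first_fit(capacity, used, waiting):
    """Unfair baseline: admit any waiting transaction that currently fits."""
    admitted = []
    remaining = deque()
    for tx_id, size in waiting:
        if used + size <= capacity:
            used += size
            admitted.append(tx_id)
        else:
            remaining.append((tx_id, size))
    waiting.clear()
    waiting.extend(remaining)
    return admitted, used
```

Under the first-fit policy a steady stream of small transactions can keep the mempool nearly full, so a large transaction may wait indefinitely. The FIFO policy trades some utilization for the guarantee that the transaction at the head of the queue is the next one admitted.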

Technical debt

  • We fixed a bug in followers logic, which was discovered by our QuickCheck property tests.
  • We created an immutable DB server. This tool serves blocks from the immutable DB to a node that connects to it, which is valuable for testing and benchmarking. For instance, by using this component, we can benchmark the performance of different aspects of Consensus, such as syncing from scratch, without adding Network interference in the performance results.
  • We created a db-truncater tool, which can be used in disaster recovery and benchmarking scenarios.
  • We created a benchmark comparison tool that we plan to use for comparing the performance of two Consensus releases. This will allow us to catch performance regressions early in the process, before they make it to the node (and show up in the system-level benchmark tests), which saves development costs. As an example, the graph below shows the performance improvements introduced by the Ledger team in version 0.6.0.0 of Consensus with respect to version 0.5.0.0.

Fostering collaboration

  • We released fs-sim as an open-source repository. This lowers the barrier to entry for external contributors, which will indirectly benefit the Cardano project.
  • We migrated the consensus code to a new repository, splitting it from the ouroboros-network repository. This will save development effort for both the Network and the Consensus teams, since there will be less interference (for instance when making releases).
  • We made several improvements to our release processes, which will translate into time savings. As an example, we went from 16 to 4 packages, which makes the release process simpler and smoother. Our release process now makes it easier to align versions and make releases (both for us and for our downstream users).
  • We added an explanation of the hardfork-combinator forecast horizon, which will benefit not only our team but also future external contributors.

Next steps

UTxO HD

Genesis

  • We will regularly liaise with the vendor(s) satisfying the Genesis Statement(s) of Work.

· 3 min read
Damian Nadales

Areas of focus

Issue | Status
Implement legacy mode for UTxO-HD to keep baseline performance | ✅ Done
Assist mainnet node release with initial Conway capabilities | ✅ Done
Assist with test, benchmark, and improvements to CIP 1694 | ✅ Done
Assist with P2P IOG relay network shut down | ✅ Done
Assist with repo transfer to Intersect | ✅ Done
Support vendors to deliver contracts | ✅ Done
Operation serenity Q4 2023 | ✅ Done

Highlights

Implement legacy mode for UTxO-HD to keep baseline performance

  • ✅ We managed to run a UTxO-HD capable node in legacy mode, maintaining the baseline memory usage while keeping all the ledger state in memory.
    • While the legacy mode is not production-ready (it requires further integration and testing), it remains a plan B should our stakeholders demand that UTxO-HD be released.
  • ✅ We pivoted to redesigning the Ledger DB API because:
    • This is needed for integrating the LSM-tree backend.
    • The redesign opened the possibility of implementing an in-memory backend that would keep the same performance and resource requirements as the baseline version (which needs to be confirmed by benchmarks).
  • ✅ We created a more general Ledger DB API.
  • 🛠️ We are integrating (into the feature branch) the existing Ledger DB implementations with the new API.
  • 🛠️ We are implementing the new in-memory backend.

Assist mainnet node release with initial Conway capabilities, test, benchmark, and improvements to CIP 1694

  • ✅ We recognized that Conway introduces a new challenge in the versioning of NTC queries, and we resolved it (see 864 and 4770).

Assist with P2P IOG relay network shut down

  • ✅ We created a prototype for the pre-Genesis State Machine for bootstrap peers, which is currently under test (see this PR).

Assist with repo transfer to Intersect

  • ✅ We transferred the ouroboros-consensus repository to the Intersect GitHub organization.

Support vendors to deliver contracts

  • Genesis
    • ✅ Interacted with the Consensus team and addressed resulting feedback on past deliverables.
    • ✅ Finished the implementation of the testing infrastructure of Genesis.
    • ✅ Started to refine the Proof of Concept demo into an actual implementation of the core components of the Genesis design.
  • 💾 LSM-tree implementation. Well Typed:
    • ✅ Finished the design of the public facing API.
    • ✅ Defined the LSM-tree database file-type formats.
    • ✅ Implemented property and model-based tests.

Operation serenity Q4 2023

  • 🎉 We welcomed our newest team member @RenateEilers and assisted with her (ongoing) onboarding.
  • ✅ We implemented a simplification in the ChainSync mini-protocol that is also a step towards Ouroboros Chronos.
  • ✅ We added tests to check Consensus emits valid CBOR, which prevents the generation of invalid binary encoding.
  • ✅ We established and implemented an interface between Consensus tooling and P&T tooling, which constitutes a step towards incorporating component level benchmarks in our development process.

· 3 min read
Damian Nadales

Consensus Quarterly Update

2023-01 - 2023-03

Main achievements

UTxO HD

  • We finished the testing activities for the prototype, which involved adding new tests, and fixing and enabling temporarily disabled tests.
  • We spent a substantial amount of effort refactoring and cleaning the prototype.
  • We audited the UTxO HD prototype to make sure it can accommodate the migration of other tables (e.g. stake-keys registration) from memory to disk. The result of the audit was positive.
  • We ran ad-hoc benchmarks for reading keys and flushing values to disk. No unexpected costs found.
  • We ran the first system-level benchmarks. The performance regressions reported were due to an unrealistic snapshotting rate. We need to re-run them after we design a more fine-grained locking mechanism.

Genesis

  • We elaborated a roadmap of the remaining work for Genesis.
  • We presented the design to the IOG Researchers and PNSol on February 20. The design was well received. We updated the Genesis design with the researchers' feedback.
    • We plugged the new DoS vector identified during the aforementioned presentation.
  • We developed a generator for adversarial leader schedules that satisfy key Ouroboros properties, which will be used to test the Genesis design.
    • The generator enables the use of smaller Ouroboros parameters, which makes extrema more likely and counterexamples easier to interpret.
  • We wrote up the latest design iteration.
  • We continued benchmarking the Chain Sync Jumping prototype. In particular:
    • We debugged the prototype's performance regression, and unmasked the actual cause by patching the one we initially suspected (bad queuing behavior).
    • We identified and validated the actual cause (a pathological case in BlockFetch tiebreaker).

Support

  • We created two new tools: one for dumping CBOR-encoded blocks to JSON, and another to serve a local immutable DB.

Conway era

  • We integrated the Conway era into consensus.

Technical debt

  • We fixed a bug with followers, which was discovered by property tests.
  • We developed a DSL for specifying and running ChainDB test cases.
  • We fixed failing tests with iterators.
  • We created micro-benchmarks for adding transactions to the mempool.

Fostering collaboration

  • We released a new technical documentation site for consensus.
  • We factored out several packages to external repositories. Some of this work originated in the UTxO HD workstream.

Next steps

UTxO HD

Genesis

Support

  • Design Consensus side of hardfork-enactment in the Voltaire phase (#4180).
  • Estimate the number of file descriptors Consensus needs (#20).

Tech debt

  • Identify Quantitative Timeliness Agreements (QTAs) metrics that we can define for consensus. Pick one and implement benchmarks for it.

Fostering collaboration

  • Onboard a new team member.

· 4 min read
Damian Nadales

Consensus Quarterly Update

2022-12 - 2023-01

Main achievements

UTxO HD

The prototype is feature complete and thoroughly tested at the consensus level. In particular, we invested a lot of time in writing property tests for the mempool and other crucial new parts of the prototype. Now we are ready to run integration tests and system-level benchmarks.

Genesis

We identified and fixed a slowdown in cross-era forecasting that was inhibiting our efforts to benchmark the ChainSync Jumping prototype. This resulted in a 7% speedup in full sync times in the baseline.

We also started prototyping a self-contained implementation of the Genesis dynamics (in particular of the parts intentionally not part of the ChainSync Jumping prototype) that furthered our understanding of subtleties and edge cases.

Support

  • We worked on designing integration of new VRF and KES crypto into consensus.
    • Crypto class was split into two parts: Crypto and HeaderCrypto.
    • With the Ledger team's help, we refactored cardano-ledger to use a proxy type for VRF.

Conway era

  • The PR went through its second review round. It is about to be merged, but it was delayed due to people's availability during the Christmas break.

Technical debt

  • We improved the capabilities of our io-sim library, which is key for testing and simulating Cardano components.
  • We removed thunks from epoch translations in the ledger, which is important for reducing memory consumption of the Cardano node.

Fostering collaboration

  • We added a tutorial on how to instantiate the Consensus layer to run custom ledgers. This should be a valuable resource to people looking to roll their own custom blockchain (either for commercial or research purposes).
  • We added an overview of consensus to the top level documentation of ouroboros-network. This overview describes the consensus components and adds a hyperlinked map to the modules documentation.

Next steps

UTxO HD

  • Evaluate the extensibility of the prototype. Moving the UTxO to disk is only the first step towards reducing the memory requirements of Cardano node, and ensuring its long term sustainability. In the future, we plan on moving other large maps, such as delegation maps. The prototype should be able to accommodate these changes without any major modifications.
  • Start the integration with other downstream components, such as the wallet and db-sync. The idea is to identify and address any potential pain points that might arise during this integration.
  • Run integration tests and system-level benchmarks.

Genesis

  • Finish benchmarking and tuning the fast-path ChainSync Jumping prototype
  • Expand and optimize the self-contained implementation of the Disconnect Rule (including density comparisons and the LoE)
  • Develop documentation and smoke tests for these components.
  • Start modifying the ChainSync Client for the LoP and LoR.

Support

  • Help the Network team with diagnosing performance regression in block production.

Tech debt

  • Fix property-test failures concerning iterators (#3999 and #4183).

Fostering collaboration

Risks

UTxO HD

  • Moving other parts of the ledger state to disk might require a major redesign of the prototype. For instance, if it turns out that the epoch change rules require access to the full ledger state. If this is the case, we might accept this risk and do the redesign after the initial release of UTxO-HD.
  • Integration with downstream clients might require more work than we anticipate.
  • Access to the benchmarking team's time and resources.
  • Benchmarking results might show significant performance degradation, which will require additional work if such performance degradation is not accepted by other stakeholders.
  • The prototype's performance might not be accepted by other stakeholders. Here we need to clearly communicate that this is necessary to ensure that as the blockchain size grows, the node can operate within reasonable memory constraints.

· 4 min read
Damian Nadales

Consensus Quarterly Update

2022-09 - 2022-11

Main achievements

UTxO HD

  • As a consequence of the errors observed when running distributed mempool benchmarks, we re-designed the UTxO HD mempool integration, which fixed these errors and led to a simpler and more maintainable design.

  • We focused on increasing test coverage for the UTxO-HD prototype. In particular, we added property tests for:

    • Backing store (work ongoing)
    • Era transitions
  • The property tests we added uncovered several bugs, which is a great result given that the cost of finding bugs increases exponentially the closer they get to deployment.

  • One of the errors found by our tests required us to work on improvements in the Haskell bindings for LMDB. This work is ongoing.

  • We started working on the mempool property tests that will exercise the new code paths that UTxO HD introduced.

  • We developed, benchmarked and tested an implementation of sequences of differences based on "anti-diffs". Performance results of diff sequence operations show that we achieved a speedup of about 4x across several scenarios. Note: this speedup accounts for diff sequence operations only, so the consensus-wide speedup is less than 4x.

  • We integrated the "anti-diff" prototype into the UTxO HD feature branch.
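To give an intuition for why invertible diffs can speed up diff sequence operations, here is a minimal sketch. It assumes, purely for illustration, that a diff is a map from keys to integer deltas, so that every diff has an inverse (an "anti-diff"). The class and method names are hypothetical and do not correspond to the prototype's API, whose diffs and data structures may differ.

```python
from collections import deque


class DiffSeq:
    """A sequence of diffs that keeps a running total of all diffs in it.

    Because each diff can be inverted, dropping the oldest diff only requires
    subtracting it from the total, instead of re-summing the remaining diffs.
    """

    def __init__(self):
        self._diffs = deque()
        self._total = {}

    def _combine(self, diff, sign):
        # Add (sign=+1) or subtract (sign=-1) a diff from the running total.
        for key, delta in diff.items():
            new = self._total.get(key, 0) + sign * delta
            if new == 0:
                self._total.pop(key, None)
            else:
                self._total[key] = new

    def push(self, diff):
        """Append a new diff at the end of the sequence."""
        self._diffs.append(dict(diff))
        self._combine(diff, +1)

    def flush_oldest(self):
        """Remove the oldest diff and cancel it out of the total."""
        diff = self._diffs.popleft()
        self._combine(diff, -1)
        return diff

    def total(self):
        """Return the combined effect of all diffs currently in the sequence."""
        return dict(self._total)
```

With this design the cost of `flush_oldest` depends only on the size of the flushed diff, not on the number of diffs that remain in the sequence.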

Genesis

  • We wrote a simulator that demonstrates soundness of an abstract implementation of the new chain selection rule.
  • We elaborated a draft specification for the Genesis implementation (currently awaiting feedback from other architects).
  • We elaborated a draft specification for the ChainSync Jumping optimization. In particular, this includes a proof sketch that the latter preserves liveness and safety in all cases.
  • With the Networking team, we co-designed the eclipse avoidance mechanism, specifically its coherence with the Genesis implementation plan's security and its dependence on the new ChainSync Jumping optimization.
  • We implemented a prototype for ChainSync Jumping. Initial benchmarks showed a performance degradation wrt the baseline. Our optimization attempts so far have brought the performance closer to the baseline, but not yet to parity.

Conway era

  • We did most of the heavy lifting required to integrate the Conway era into the Consensus layer.

Technical debt

  • We started working on enabling CI nightly tests, which revealed several test failures due to thunks being found in data structures used by the ledger and consensus. We made a lot of progress fixing those thunk errors, but some errors still remain.

  • We elaborated a db-analyser benchmark for the ledger operations. This led us to identify high processing times at epoch boundaries. We did not observe any performance degradation that can be attributed to era changes.

  • We fixed a source of flakiness in the ChainDB QSM test.

  • We clarified a common source of confusion around VRF tie-breaking and cross-era chain selection.

  • We fixed a bug in the maximum-allowed ledger major protocol version.

Fostering collaboration

  • We spent time making cardano-updates the central source of information for the core teams stakeholders.
  • We went through the Galois gap analysis and extracted actionable points to take on next.
  • Bart and Yogesh continued with their onboarding and started making substantial contributions to consensus.

Next steps

UTxO HD

  • Finish the mempool property tests.
  • Benchmark the latest version of the prototype.
  • Elaborate a document that describes new integration test scenarios and pass it to the SDET team.
  • Bring query UTxO by address command performance on par with the baseline version.

Genesis

  • Receive and incorporate Duncan's feedback on the first draft specification for the Genesis implementation.
  • Begin prototyping the first genesis implementation, unless the first draft needs major changes.
  • Draft a second revision of the Genesis report.
  • Review the second revision with a wider audience, which includes at least Alexander Russell. That feedback will drive a third and hopefully final revision.
  • Investigate how to mitigate the ~30% slowdown we have observed so far in the ChainSync Jumping prototype. In particular, we might need to optimize the existing BlockFetch logic.

Tech debt

  • Enabling nightly CI tests.

Fostering collaboration

  • Merge the tutorial document Galois wrote; requires CI integration.
  • Come up with our own documentation improvements, many of which were suggested in the Galois gap analysis.
  • Try to hire a new team member.