Skip to main content

82 posts tagged with "consensus"

View All Tags

· 3 min read
Damian Nadales

High level summary

During the past two weeks we got the results from the system level benchmarks for UTxO HD. They showed a substantial performance regression, so we spent some time analyzing the results. We found out the frequency at which ledger snapshots were taken was too high, so we requested the benchmarking team a new run with a more realistic snapshotting policy. We continued refactoring and improving the prototype, and we released UTxO-HD related packages to CHaP.

We met with IOG researchers and networking specialists to discuss the Genesis design, which was well received. We continued working on testing and benchmarking different Genesis prototypes.

We are also working on solving a test failure related to iterators. This work derived in several improvements such as better documentation, a framework for writing unit (and regression) tests, and the possibility of debugging QuickCheck counter examples in the REPL.

Finally, we released ouroboros-consensus 0.2.0.0 and ouroboros-consensus-cardano 0.3.0.0 to CHaP

Workstreams

UTxO HD Prototype

We got the results of the first system level benchmarks for UTxO HD. They seemed to indicate a significant regression in performance. After looking into the benchmark logs we found that the benchmark runs took ledger state snapshots too often, due to the default snapshotting policy depending on k, and k being so small in the benchmark runs. Therefore, the next step is to re-run the benchmarks with a snapshotting policy that more closely resembles the one from mainnet.

At the same time, we continued refactoring and cleaning up the prototype.

Also, we prepared the anti-diff packages (fingertree-rm, diff-containers, simple-semigroupoids) and the lmdb related packages (cardano-lmdb and cardano-lmdb-simple) to CHaP.

Genesis

The Genesis design was presented to the IOG researchers and Peter Thompson from NSol. It was well received. They pointed out one blindspot, but we think it'll be relatively simple to mitigate.

In parallel, we continued developing test and benchmarks for the Genesis prototypes. I particular we tested and implemented a potential fix for increased ChainDB dequeue timings, which partly behaved as we expected, but still needs further investigation. Also we obtained new benchmarking data for the prototype.

Technical debt

Related to #4183, we developed a DSL for specifying ChainDB unit tests. This will allow us to better understand the counter-examples returned by QuickCheck tests, and to write regression tests for them. Also, we added a module to enable QuickCheck counter-examples to be run on the REPL, allowing for faster debugging feedback. Also, we improved the documentation related to followers (#4372).

We are also working on a design for optimizing the way we handle blocks from the future.

Support

We released ouroboros-consensus 0.2.0.0 and ouroboros-consensus-cardano 0.3.0.0 to CHaP. Remember that we decided to split the packages related to Consensus into two bundles, one with the core functionality, Cardano-agnostic code, and another bundle with instantiations specific to Cardano.

· 2 min read
Damian Nadales

High level summary

We continue refactoring the UTxO HD prototype while we wait for the system level benchmarks. We have created a new repository that contains the anti-diff packages used in this prototype.

On the Genesis front, we are preparing another meeting with the researchers to audit the implementation design, and we continued working on basic tests and simplifications.

During the past two weeks we also introduced two new tools. One for dumping CBOR encoded blocks to JSON, and another to serve a local immutable DB.

Workstreams

UTxO HD Prototype

We are in the process of refactoring the UTxO HD prototype, while we wait for the system level benchmarks to confirm if the performance of the prototype is satisfactory.

We also set up a repository for the anti-diff package, which required us to refactor the code, write documentation, and prepare a release to CHaP.

Genesis

We worked on basic tests for the Limit on Eagerness property of Genesis. We also introduced further robustness and simplifications in the Genesis Density governor. Finally, we developed a presentation to engage again with the researchers on our Genesis implementation design.

Technical debt

Fostering collaboration

We are in the process of polishing the ouroboros-consensus documentation site, which we will use a the entry point for Consensus related documentation. The first version will not be complete, but we plan on systematically improving it.

Support

We added a tool to ouroboros-consensus-cardano-tools which allows to dump the Chain DB blocks or any given CBOR encoded blocks as JSON.

We also added another tool that serves an existing immutable DB via BlockFetch and ChainSync. This tool can help in assisting our local benchmarking efforts (for instance Genesis' ChainSync jumping prototype).

· 3 min read
Damian Nadales

High level summary

During the past two weeks, the consensus team finished the testing activities around the UTxO-HD prototype. This is a very important milestone which will enable us to run system-level tests and benchmarks, as well as start refactoring and cleaning the prototype. Regarding our Genesis workstream, we elaborated a roadmap that gives an indication of the remaining work. We also continued our work on benchmarking chain-sync-jumping. We also continued working on improving the way we handle blocks from the future, and advancing the integration of the new VRF and KES crypto.

Workstreams

UTxO HD Prototype

As the prototype is nearing its completion, it was important to have enough confidence that we will be able to move additional parts of the ledger state onto disk. We worked together with the Ledger team to elaborate a sketch on how the UTxO-HD design would accommodate the migration of additional data from memory to disk. This gave us enough confidence that the current architecture will be extensible in the future.

On the testing front, we added property-based tests for the UTxO-HD type classes.

We also enabled disabled components, and addressed several technical debt issues:

  • Implement splitSized anti-diff split (#4269), and integrate it into consensus (#4273).
  • Renaming of peekVal to peekMDBVal (#7).

We ran ad-hoc benchmarks for syncing a chain from scratch and replaying. We found a race condition in the LMDB backing store, which we fixed. After the fix we were able to successfully run these benchmarks. The results were published by this pull request.

We used our db-analyser tool to benchmark the cost of reading keys and flushing values to disk. The following plot shows the duration of these disk operation in relation to the main ledger operations, where we can see that the cost of the former are comparatively low. The spike at the beginning of the graph is when, at the start of the Shelley era, the entire UTxO set is flushed to disk.

UTxO-HD read and flush benchmarks

After months of hard work adding tests for the prototype, we are ready to run end-to-end tests on the node, and system level benchmarks. This signals a very important milestone for the UTxO-HD workstream 🎉.

Genesis

We elaborated a high-level decomposition of the remaining work for Genesis. We also continued benchmarking the chain-sync-jumping happy-path.

Technical debt

We continued working on improving the way we handle blocks from the future.

Support

We completed the mapping of Crypto to HeaderCrypto and body Crypto. HeaderCrypto is moved to cardano-protocol-tpraos. We created a draft pull request to facilitate compiling consensus.

· 3 min read
Damian Nadales

High level summary

The consensus team is resuming its activities after the Christmas break. During these weeks we focused on cleaning and benchmarking the UTxO-HD prototype, and discussing with the Ledger team the changes that might be required for the next iterations. The pull request that adds the Conway era is waiting for a second review round and we hope to merge it soon. On the technical debt side we are looking into a property-test failure found in the iterators. We are investigating if this is an error in the model or in the implementation. We also improved the documentation of our testing code.

Workstreams

UTxO HD Prototype

We worked with the Ledger team to start preparing the next versions of UTxO-HD. The Ledger team is concerned that for the remaining maps we might need the full ledger state on epoch boundaries. Since the main consumer of the ledger rules is Consensus, the code that requires access to a full state could be moved from the ledger to some Ledger-Consensus bridge. Eg. the traversal of rewards could take place in such bridge, instead of querying the ledger for the values that are required in the epoch-transition computations.

We relocated some UTxO-HD definitions, in preparation for merging the prototype into master.

We also completed updated local benchmarks comparing the replay time and memory consumption of:

  • the baseline node (f2fc76ef45647275c98634da1718290b976ff364)
  • the UTxO-HD node with the in-memory backend
  • the UTxO-HD node with the LMDB backend

The following plot shows the results: we can see that the LMDB node barely reaches 8GB of memory, but it takes 1.78 times longer to replay the chain. The in-memory backend is about 30 minutes faster, but still slower than the baseline version. We are aware of this phenomenon and it is inherent to the problem of maintaining sequences of differences of the last k ledger states that allows us to perform rollback and roll-forward. We are in the process of measuring syncing from scratch times.

We also added StaticEither accessors that helped us to simplify the UTxO-HD prototype.

New Conway era

We incorporated the feedback of the pull request, and rebased this branch on top of master. The PR is pending a second review round and we hope to merge this soon.

Technical debt

We are investigating a property-testing failure involving iterators. Solving this requires understanding the expected behavior of iterators in the counterexample found by QuickCheck to determine if the error is in the model or in the implementation.

Fostering collaboration

We moved the contents of docs/Testing.md closer to the code, so that the explanations about the tests are easier to find in the relevant modules, and the documentation is easier to keep up to date.

· 3 min read
Damian Nadales

High level summary

During the past two weeks, the Consensus team finalized the QSM tests for the backing store and Mempool on the UTxO-HD branch with important discoveries regarding parallel QSM testing. We also worked with the Ledger team to envisage the modifications that are required in Ledger and Consensus to accommodate the changes in the crypto VRF and KES. The db-analyser now supports bechmarking the ledger operations, which will allow us to identify, debug, and profile potential performance problems. We drafted a document that defines how to manage the versions of Consensus-related packages. The top level documentation of ouroboros-network now features a description of the consensus components and provides a hyperlinked map to the modules documentation.

Workstreams

UTxO HD prototype

Whereas we had passing sequential state-machine tests for the mempool, the parallel case proved to be more challenging than we thought. The operation of adding a list of transactions to the mempool is not atomic and, as a result, when adding a list of transactions, transactions from other processes can be added in between. The mempool implementation handles this correctly, however this required us to redesign the parallel model we had to take the lack of atomicity into account.

Backing store property tests

We finished refactoring the backing store property tests. The second review round is ongoing.

LSM tree implementation

We are working on benchmarking (in terms of time and number of IO operations) fetching/looking up data from disk.

Genesis

We worked on the design of a mechanism to prevent a DoS attack on our Genesis design related to rollbacks. This was arguably the biggest outstanding question.

During the discussions around Genesis, we noticed a design boundary that nicely delineates a fundamental component. We almost have a full Haskell prototype of it. It will be very nicely self-contained, perhaps even usable in the ultimate implementation!

New VRF and KES crypto integration

We collaborated with the Ledger team on preparing the ledger state and crypto types to avoid huge allocation on the epoch boundary when changing aspects of the crypto that will only manifest in headers, not in the ledger states.

Technical debt

We merged the pull-request that adds a support to db-analyser for benchmarking ledger operations. This will allow us to identify, debug, and profile potential performance problems. The benchmark focus on the main 5 ledger operations that are involved in chain syncing, block forging, and block validation, namely:

  1. Forecast.
  2. Header tick.
  3. Header application.
  4. Block tick.
  5. Block application.

The following figure shows a plot of the benchmarking results for the first 65 million blocks (approximately) of the Cardano chain. The thin yellow lines under the x-axis show the epoch boundaries, whereas the thick yellow lines correspond to the era transitions.

As we can see in this figure, era and epoch boundaries require more computation time. The ledger team are aware of this problem, and we are working to improve this situation.

Fostering collaboration

We drafted a document motivating and defining how Consensus (and possibly other core teams) will/should manage our package versions. This pull-request garnered many great discussions from our team members and other teams too: Sebastian Nagel, Arnaud Bailly, Michael Peyton-Jones, Ziyang Liu, et al. We want to thank you all for your input, and we found this discussion very enlightening!

We merged the pull request that adds an overview of consensus to the top level documentation of ouroboros-network. This overview describes the consensus components and adds a hyperlinked map to the modules documentation.