· 3 min read
John Lotoski

2024-07 - 2024-09

Main achievements

In addition to ongoing general maintenance and support of cardano environments, SRE achievements for this quarter include:

  • All IOE cardano-parts supported node environments, including preview, preprod, sanchonet, mainnet and other clusters were upgraded through various cardano-node releases of 9.0.0, 9.1.0, 9.1.1, 9.2.0, and finally into 9.2.1 by the end of September.

  • All IOE cardano-parts supported node environments had dual stack ipv4/ipv6 capability added and configured. This included supporting scripts and recipes, module updates, terraform/openTofu resource changes, and software updates to make previously ipv6-incompatible software, such as cardano-faucet, ipv6 compatible. Cardano-parts clusters can now seamlessly participate in ipv6 cardano-node traffic and other ipv6 traffic.

  • Preview, preprod and mainnet networks were hard forked to Conway.

  • Legacy mainnet cluster shelley-era high-load relays were scaled down over the quarter and stopped now that p2p has removed the need for them.

  • Legacy cardano explorer was retired and Cardano Foundation is now providing the replacement landing page which links to several community explorers.

  • Cardano-smash production load was retired from equinix metal hosting from the cardano-world repo and transferred to the new cardano-mainnet cluster.

  • New cardano-mainnet cluster scaling capability was added for the bootstrap machines. Block performance analysis was used to tune RTS parameters on the bootstraps and other mainnet pool machines.

  • Sanchonet environment was re-spun for cardano-node 9.1.0 and greater compatibility.

  • Private chain was stopped and re-spun with 2 hr epochs for testing.

  • New nixosModules were added to cardano-parts and cardano-playground, including: profile-blockperf, profile-tcpdump (for saving node traffic pcaps to s3) and ogmios.

  • Documentation for playground and mainnet cluster operations was improved, such as documents for: debugging of peer-to-peer connections; governance voting with the playground stakepools; faucet setup; faucet pool de-delegation and mainnet dbsync cardano-snapshot operations. See the docs/explain directory of both the cardano-playground and cardano-mainnet repos for details.

  • The cardano-monitoring repository received a lot of documentation and improvements and now also serves as the home for devx-ci metrics after migration away from Grafana cloud hosting.

  • An improved cardano-airgap image for secure signing operations was created and made available.

  • Hydra CI performance was improved with changes to our custom Nix evaluator and optimized resource usage while waiting for IFDs.

Next steps

  • Add a production protocol-parameters cardano-api based server to facilitate community transaction creation without requiring a live node.

  • Migrate from deprecated grafana agent to grafana alloy.

  • Finalize support for the new cardano-node tracing system once the service is rewritten for general consumption.

  • Extend govtool frontend and backend to a process-compose stack once govtool is publicly buildable again.

  • Continue cardano-parts and operations improvements.

· 3 min read
Damian Nadales

Consensus Quarterly Update

2023-01 - 2023-03

Main achievements

UTxO HD

  • We finished the testing activities for the prototype, which involved adding new tests, and fixing and enabling temporarily disabled tests.
  • We spent a substantial amount of effort refactoring and cleaning the prototype.
  • We audited the UTxO HD prototype to make sure it can accommodate the migration of other tables (eg stake-keys registration) from memory to disk. The result of the audit was positive.
  • We ran ad-hoc benchmarks for reading keys and flushing values to disk. No unexpected costs found.
  • We ran the first system-level benchmarks. The performance regressions reported were due to an unrealistic snapshotting rate. We need to re-run them after we design a more fine-grained locking mechanism.

Genesis

  • We elaborated a roadmap of the remaining work for Genesis.
  • We presented the design to the IOG Researchers and PNSol on February 20. The design was well received. We updated the Genesis design with the researchers' feedback.
    • We plugged the new DoS vector identified during the aforementioned presentation.
  • We developed a generator for adversarial leader schedules that satisfy key Ouroboros properties, which will be used to test the Genesis design.
    • The generator enables use of smaller Ouroboros parameters, which makes extrema more likely and counterexamples easier to interpret.
  • We wrote up the latest design iteration.
  • We continued benchmarking the Chain Sync Jumping prototype. In particular:
    • We debugged the prototype's performance regression: patching the cause we first suspected (bad queuing behavior) unmasked the actual one.
    • We identified and validated the actual cause (a pathological case in BlockFetch tiebreaker).

Support

  • We created two new tools: one for dumping CBOR-encoded blocks to JSON, and another to serve a local immutable DB.

Conway era

  • We integrated the Conway era into consensus.

Technical debt

  • We fixed a bug with followers, which was discovered by property tests.
  • We developed a DSL for specifying and running ChainDB test cases.
  • We fixed failing tests with iterators.
  • We created micro-benchmarks for adding transactions to the mempool.

Fostering collaboration

  • We released a new technical documentation site for consensus.
  • We factored out several packages to external repositories. Some of this work originated in the UTxO HD workstream.

Next steps

UTxO HD

Genesis

Support

  • Design Consensus side of hardfork-enactment in the Voltaire phase (#4180).
  • Estimate the number of file descriptors Consensus needs #20.

Tech debt

  • Identify Quantitative Timeliness Agreements (QTAs) metrics that we can define for consensus. Pick one and implement benchmarks for it.

Fostering collaboration

  • Onboard a new team member.

· 3 min read
Marcin Szamotulski

2023-01 - 2023-03

Main achievements

Gradual dynamic P2P release on mainnet

We released two versions of cardano-node with dynamic P2P capabilities:

  • 1.35.6
    • we found and fixed a bug in exception handling in peer-state-actions pull-4357
    • we found and fixed a busy loop when demoting a peer from hot to warm pull-4385
  • 1.35.7
    • includes interoperability in the legacy non-p2p network stack pull-4467
  • we fixed a busy loop of demotions & promotions: warm -> hot -> warm pull-4485 (it will be included in the cardano-node-8.0.0 release).

Currently there are more than 200 P2P relays on mainnet.

Peer Sharing

We implemented peer sharing pull-4019, which will be available as an experimental feature in one of the future cardano-node releases.

We implemented light peer sharing, i.e. adding inbound connections to the set of known peers of the outbound governor, which allows bootstrapping relays that are not registered on chain. This complements peer sharing. The pull-4277 is in late review stages.

Eclipse Evasion

We finalised the design of eclipse evasion and started implementing it. We have an initial implementation (not merged). We are in the process of extending our test suite to cover the new implementation details: issue-3886, pull-4462.

Cardano Network Service Assurance

Galois has been making progress on the Cardano Network Service Assurance project.

  • In cardano-node, they have developed a datapoint abstraction that creates a queue of (existing) log events. They now have two such datapoints (of log events) implemented.

  • They have developed a datapoint client executable that can connect to a node which serves the "new tracing".

  • They have been exploring approaches for the consolidation and analysis of datapoint data to extract actionable network health status.

Cardano-Node

  • We made it possible to configure accepted connections limit pull-4902.

Testing improvements

  • We fixed a bug in network simulation implementation of TCP simultaneous open pull-4265.

  • We introduced header-body split in the diffusion simulation pull-4419 (in review).

  • We introduced initiator only nodes in the diffusion simulation pull-4280.

  • We fixed a connection-manager test failure issue-4370.

Technical Debt

  • We refactored the Snocket interface, decoupling it from the multiplexer pull-4260. This simplified some aspects of the KES agent implementation.

  • We introduced a record for CBOR codecs which are used for various data structures by mini-protocol codecs pull-4430.

Documentation

  • We explained some limitations of CDDL in our technical report pull-4351.

IO-Sim

  • We fixed the implementation of MVars pull-70.

NoThunks

  • We published a new version of nothunks library to Hackage.

Next steps

  • Finish implementation & testing of eclipse evasion issue-3886.
  • Optimise connectivity to peers behind firewall issue-4381.
  • Finish the work on enabling block production dynamically to allow using P2P on block producers issue-3159.
  • If time permits, we would also like to reserve some time for finishing publication of io-sim to Hackage.

· 7 min read
Jared Corduan

Ledger Quarterly Update

2023-01 - 2023-03

Main achievements

CIPs

  • Entering the Voltaire phase - CIP-1694 received a major update after participation in the design has expanded to more and more people, including those who attended the Colorado workshop. See CIP-1694.
  • Ledger CIP category - The ledger team continues to embrace the CIP process, and has begun the process of registering the ledger as an official CIP category. See CIP-84.
  • Ledger serialization - A CIP for the ledger serialization deprecation cycle has been accepted. See CIP-80.

Formal ledger model

Our new formal specifications backed by Agda have seen a lot of progress. The majority of the ideas in CIP-1694 are now present, and we have made enough progress that we can now safely say that the PDF produced by the Agda model will be the official ledger specification for the Conway ledger era. See the repository.

Conway ledger era

Progress on the Haskell implementation of CIP-1694 has gone hand in hand with the formal model. The major component still missing is the DRep stake distribution, which still presents some technical challenges.

[pull-3176] [pull-3216] [pull-3226] [pull-3291] [pull-3326] [pull-3330] [pull-3339]

DRep stake distribution computation

Adding another large stake distribution to the ledger state must proceed with caution. We do not want the memory used by the node to increase too much, and performance problems can lead to reduced block production. We have prototyped, tested, and benchmarked several approaches that could give us the current DRep stake distribution at each epoch boundary. This has very important implications, since we want every ADA holder to be able to register as a DRep at any time (such as during a contentious vote) and still have time to vote on the issue themselves.

[pull-3344] [pull-3353] [pull-3364]
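
To make the quantity being computed concrete, here is a minimal Python sketch that sums the stake delegated to each DRep. The two input maps and all names are hypothetical, and this is a naive full pass over the delegations; it says nothing about how the ledger computes the distribution within its memory and performance budget.

```python
from collections import defaultdict


def drep_stake_distribution(stake, delegations):
    """Sum the stake delegated to each DRep.

    stake:       hypothetical map, stake credential -> lovelace
    delegations: hypothetical map, stake credential -> DRep id
    """
    distribution = defaultdict(int)
    for credential, drep in delegations.items():
        # Credentials without any recorded stake contribute zero.
        distribution[drep] += stake.get(credential, 0)
    return dict(distribution)


# Illustrative values only.
stake = {"alice": 100, "bob": 250, "carol": 50}
delegations = {"alice": "drep1", "bob": "drep2", "carol": "drep1"}
example_distribution = drep_stake_distribution(stake, delegations)
```

Even in this toy form the cost is visible: the pass touches every delegation, which is why recomputing it from scratch at each epoch boundary needs care.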

Integration work

The ledger has made some wonderful improvements over the past six months, but they entail a significant amount of integration effort:

  • Our new versioned CBOR schemes
  • Individual deposit tracking
  • An improved cross-era interface utilizing lenses
  • A new ledger API
  • Re-arranging the ledger stake in preparation for CIP-1694
  • Versioning our Haskell packages using CHaPs.
  • Consistent conventions for variable names

[pull-3279] [pull-3282] [pull-3288] [pull-3289] [pull-3292] [pull-3297] [pull-3298] [pull-3299] [pull-3300] [pull-3302] [pull-3303] [pull-3308] [pull-3342] [pull-3345] [pull-3356] [pull-3357] [pull-3360] [pull-3361] [pull-3363] [pull-4349] [pull-378] [pull-376] [pull-373] [pull-370] [pull-361] [pull-4976] [pull-5013]

Deposit tracking

Previously, individual deposits (for stake credential and stake pool registrations) were not tracked by the ledger. Deposits were returned according to the current protocol parameters. When the values of these two protocol parameters changed, the deposit pot was adjusted by adding to, or removing from, the reserves.

This has several problems:

  • Most people expect a deposit to be paid back exactly.
  • We cannot increase the deposit amount once the reserves hit zero.
  • If it becomes known that the deposit amount is going to be increased, free Lovelace can be earned by registering credentials.
  • Because of the problems above, it is going to be incredibly hard to ever change the values.
  • There is a serious issue involving hard forks. The consensus layer makes the decision about whether or not to enact a hard fork based on the protocol parameter update state two stability windows before the end of the epoch. However, the ledger will reject a protocol parameter update on the epoch boundary if the deposit pot adjustments cannot be reconciled with the reserve pot. This means that if quorum is met regarding changing the major protocol version, but the update is rejected on the epoch boundary, consensus will change the era but the ledger will not change the major protocol version, leaving the ledger in a split-brain state.

Because we never actually changed the values of the two deposit amounts in the protocol parameters on mainnet, we were able to retroactively change the behavior. We made the following changes:

  • Individual deposits are tracked in the DState.
  • The amount deposited is always returned.

[pull-3195] [pull-3202] [pull-3217]
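
The new behavior can be sketched with a toy Python model. The class, field names and amounts below are hypothetical and only mirror the two changes listed above: each deposit is recorded when it is paid, and exactly that amount is refunded.

```python
class DepositTracker:
    """Toy model of per-credential deposit tracking (all names hypothetical)."""

    def __init__(self, key_deposit):
        self.key_deposit = key_deposit  # current protocol parameter value
        self.deposits = {}              # credential -> amount actually paid

    def register(self, credential):
        if credential in self.deposits:
            raise ValueError("credential already registered")
        self.deposits[credential] = self.key_deposit
        return self.key_deposit         # amount charged to the registrant

    def deregister(self, credential):
        # Refund what was paid, regardless of the current parameter value.
        return self.deposits.pop(credential)

    @property
    def deposit_pot(self):
        # The pot is always the sum of the recorded deposits.
        return sum(self.deposits.values())


tracker = DepositTracker(key_deposit=2_000_000)  # illustrative amount
tracker.register("alice")
tracker.key_deposit = 5_000_000                  # the parameter is raised
tracker.register("bob")
pot_before = tracker.deposit_pot
refund_alice = tracker.deregister("alice")
```

In this model a parameter change never requires moving funds between the deposit pot and the reserves, so there is nothing to reconcile on the epoch boundary.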

New ledger API

We have significantly built up the ledger API. We will eventually replace much of the cardano-api in the node repository with this ledger API.

[pull-3242] [pull-3248] [pull-3328]

Constraint-based generators

Our largest scale property tests generate an initial ledger state and a long sequence of valid blocks which span several epochs, mimicking a real network. These tests are, in theory, excellent for checking properties. They are, however, very difficult to maintain and are not as random as we would like (a lot of bias has to be introduced to keep the ledger state in enough order to keep generating blocks).

We have a new declarative infrastructure for building constraint-based generators, which instead generate random ledger states representative not just of an initial state, but also of the end result of a long sequence of valid blocks. Moreover, these generators are very fast and are much more random than our old generators. Before we can start using them for our existing property tests, however, we still need to expand them to generate a valid block for a given ledger state.

[pull-3219]
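
The general idea of generating a state that satisfies its constraints by construction, rather than reaching it through a long sequence of valid blocks, can be illustrated with a small Python sketch. The state fields, the two constraints and the tiny total supply are all invented for this example; the actual infrastructure is declarative and considerably richer.

```python
import random

TOTAL_SUPPLY = 1_000  # hypothetical, tiny total for illustration


def random_partition(rng, total, parts):
    """Split `total` into `parts` non-negative integers that sum to `total`."""
    cuts = sorted(rng.randint(0, total) for _ in range(parts - 1))
    bounds = [0] + cuts + [total]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


def gen_state(rng):
    """Generate a state that satisfies its constraints by construction."""
    # Constraint 1: the four pots always add up to the total supply.
    reserves, treasury, deposits, utxo = random_partition(rng, TOTAL_SUPPLY, 4)
    pools = [f"pool{i}" for i in range(rng.randint(1, 5))]
    # Constraint 2: every delegation targets a registered pool.
    delegations = {
        f"cred{i}": rng.choice(pools) for i in range(rng.randint(0, 10))
    }
    return {
        "reserves": reserves, "treasury": treasury,
        "deposits": deposits, "utxo": utxo,
        "pools": pools, "delegations": delegations,
    }


def is_valid(state):
    """Independent check of the two constraints."""
    pots = (state["reserves"] + state["treasury"]
            + state["deposits"] + state["utxo"])
    return pots == TOTAL_SUPPLY and all(
        pool in state["pools"] for pool in state["delegations"].values()
    )


rng = random.Random(0)
states = [gen_state(rng) for _ in range(100)]
```

No state is ever rejected or repaired here, which is what keeps this style of generator fast and free of the bias needed to keep a block sequence valid.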

Technical debt

We continued to address technical debt as much as we can.

[pull-3167] [pull-3170] [pull-3172] [pull-3175] [pull-3184] [pull-3205] [pull-3208] [pull-3210] [pull-3212] [pull-3218] [pull-3222] [pull-3223] [pull-3224] [pull-3225] [pull-3229] [pull-3239] [pull-3241] [pull-3244] [pull-3245] [pull-3249] [pull-3260] [pull-3263] [pull-3264] [pull-3268] [pull-3269] [pull-3270] [pull-3274] [pull-3276] [pull-3277] [pull-3286] [pull-3290] [pull-3295] [pull-3296] [pull-3306] [pull-3307] [pull-3310] [pull-3311] [pull-3316] [pull-3320] [pull-3323] [pull-3327] [pull-3331] [pull-3332] [pull-3333] [pull-3338] [pull-3341] [pull-3347] [pull-3350] [pull-3351] [pull-3352] [pull-3354]

Critical fixes

We fixed two critical issues:

  • Growing block production delay on the epoch boundary: [pull-3209]
  • Unexpected node shutdown from balanceR: [pull-3343]

Next steps

  • Conway spec - Complete the first version of the Conway formal specification.
  • DRep stake distribution - Have the ledger compute the DRep stake distribution with acceptable performance.
  • Devnet ready - Have the Haskell implementation of the Conway era in sync with the formal specification, and integrate the changes with consensus and node. All the details might not be finalized, but the wire specification and the API should be stable so that Conway can be placed on a devnet for tool builders to start integrating with.
  • Plutus V3 - Integrate Plutus V3 into the ledger, including a new script context which supports DReps.

More details

This quarterly report was based off of the following fortnightly ones:

· 4 min read
Damian Nadales

Consensus Quarterly Update

2022-12 - 2023-01

Main achievements

UTxO HD

The prototype is feature complete and thoroughly tested at the consensus level. In particular, we invested a lot of time in writing property tests for the mempool and other crucial new parts of the prototype. Now we are ready to run integration tests and system-level benchmarks.

Genesis

We identified and fixed a slowdown in cross-era forecasting that was inhibiting our efforts to benchmark the ChainSync Jumping prototype. This resulted in a 7% speedup in full sync times in the baseline.

We also started prototyping a self-contained implementation of the Genesis dynamics (in particular of the parts intentionally not part of the ChainSync Jumping prototype) that furthered our understanding of subtleties and edge cases.

Support

  • We worked on designing integration of new VRF and KES crypto into consensus.
    • Crypto class was split into two parts: Crypto and HeaderCrypto.
    • With the Ledger team's help, we refactored cardano-ledger to use a proxy type for VRF.

Conway era

  • The PR went through its second review round. It is about to be merged, but it was delayed due to people's availability during the Christmas break.

Technical debt

  • We improved the capabilities of our io-sim library, which is key for testing and simulating Cardano components.
  • We removed thunks from epoch translations in the ledger, which is important for reducing memory consumption of the Cardano node.

Fostering collaboration

  • We added a tutorial on how to instantiate the Consensus layer to run custom ledgers. This should be a valuable resource to people looking to roll their own custom blockchain (either for commercial or research purposes).
  • We added an overview of consensus to the top level documentation of ouroboros-network. This overview describes the consensus components and adds a hyperlinked map to the modules documentation.

Next steps

UTxO HD

  • Evaluate the extensibility of the prototype. Moving the UTxO to disk is only the first step towards reducing the memory requirements of Cardano node, and ensuring its long term sustainability. In the future, we plan on moving other large maps, such as delegation maps. The prototype should be able to accommodate these changes without any major modifications.
  • Start the integration with other downstream components, such as the wallet and db-sync. The idea is to identify and address any potential pain points that might arise during this integration.
  • Run integration tests and system-level benchmarks.

Genesis

  • Finish benchmarking and tuning the fast-path ChainSync Jumping prototype
  • Expand and optimize the self-contained implementation of the Disconnect Rule (including density comparisons and the LoE)
  • Develop documentation and smoke tests for these components.
  • Start modifying the ChainSync Client for the LoP and LoR.

Support

  • Help the Network team with diagnosing performance regression in block production.

Tech debt

  • Fix property-test failures concerning iterators (#3999 and #4183).

Fostering collaboration

Risks

UTxO HD

  • Moving other parts of the ledger state to disk might require a major redesign of the prototype. For instance, if it turns out that the epoch change rules require access to the full ledger state. If this is the case, we might accept this risk and do the redesign after the initial release of UTxO-HD.
  • Integration with downstream clients might require more work than we anticipate.
  • Access to the benchmarking's team time and resources.
  • Benchmarking results might show significant performance degradation, which will require additional work if such performance degradation is not accepted by other stakeholders.
  • The prototype's performance might not be accepted by other stakeholders. Here we need to clearly communicate that this is necessary to ensure that as the blockchain size grows, the node can operate within reasonable memory constraints.

· 5 min read
Marcin Szamotulski

Network Quarterly Update

2022-11 - 2023-01

Summary

The primary goal of the networking team was to focus on the single relay release of P2P. We fixed a number of small late bugs, and concluded QA & performance testing. Although a regression in block-production performance was discovered when P2P is enabled, relaying with P2P performs better than with the non-P2P stack. We concluded that this is not a blocker for the Single Relay Release, which is planned shortly.

Peer sharing has gone through review and the final review is under way. After merging it will still be disabled (hidden behind a flag) as it's not safe without eclipse evasion. We started implementing light peer sharing (i.e. including inbound peers in the known peer set of the outbound governor).

We started a detailed eclipse evasion design; it will continue in the next quarter.

We also made a major revision of package structure of the network packages. We ended up with a very clean dependency graph (pr #4155).

Armando Santos delivered a talk at the OPODIS 2022 conference on principles of distributed systems in Brussels. The slides are available here.

Neil Davies gave an invited seminar on DeltaQ at Université Catholique de Louvain.

We also found and fixed a few bugs:

  • a bug in the keep alive mini-protocol which caused warm to cold transitions to always be executed through a timeout path rather than as a clean demotion ([pr #4168]).

  • an assertion failure in the outbound governor (issue #4177)

Next steps

We will work towards the next release of P2P for block producer nodes. This includes:

  • analysing the performance regression for BP nodes when using P2P
  • finishing the work on controlling the block forger through node kernel (pr #3800)
  • addressing issue #3907 and writing a script to analyse deployment of P2P relays

We would also like to push forward eclipse evasion. Although most of the work has been done already, the release of io-sim on Hackage will happen in the next quarter.

We would also like to address chain-sync timeout issue recently diagnosed by Karl Knutsson.

If time permits we would also like to address some technical debt, especially:

Risks

The performance regression for block producers with P2P needs to be investigated in the near future. This is a blocker for the release of P2P on BP nodes.

Detailed log

Contributions to Ouroboros-Network

  • We added TraceDemoteLocalAsynchronous, which enables notification of critical issues for SPOs
  • We fixed cardano-ping compatibility with NodeToNodeV_10 (P2P, pr #4165)
  • We fixed a bug in demoting peers to cold which affected P2P nodes (commit-61058aa5c2)
  • Karl Knutsson enhanced SendFetchRequest (commit-bb1c3dddee, open-source contribution)
  • We turned SizeInBytes into a newtype.
  • We extended CONTRIBUTING.md, README.md, added CODE_OF_CONDUCT.
  • We fixed DNS test failure issue #4191
  • We fixed a simulation bug found in issue #4258
  • [pr #4168]
  • issue #4177

Contributions to Cardano-Node

  • We maintained the Single Relay Release pr #4612 (e.g. fixing CI issues, rebasing it when necessary, publishing packages to Cardano Haskell Packages);
  • We enhanced JSON serialisation / deserialisation of NodeToNodeVersion and NodeToClientVersion;

Contributions to IOSim

  • We started to use Cardano Haskell Packages for IOSim (pr #48)
  • We updated change log files
  • We added support of ghc-9.4 (pr #50)

We also addressed the following issues in pr #57 in order to prepare the package for publication on Hackage:

  • refactored io-classes timers API (issue #46);
  • created a new package si-timers which exposes an interface using SI units and is safe on 32-bit systems (issue #59);
  • added monad transformers instances for classes defined in io-classes (issue #58);
  • created io-classes-mtl package which includes (experimental) instances for monad transformers;
  • provided MonadMonotonicTimeNSec in io-classes and MonadMonotonicTime in si-timers (so that io-classes follows the base package);
  • added registerCancellableDelay in si-timers (which allowed us to hide fancy timer api and clean io-classes)
  • added support for js_HOST_ARCH (the new GHC JS backend)
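
As background for the 32-bit remark above, the Python sketch below works through the usual arithmetic for a delay expressed as microseconds in a signed machine word, and one common remedy (splitting a long delay into chunks that each fit). This is an assumption about the kind of problem being addressed, with hypothetical function names; it is not a description of the si-timers code.

```python
MAX_INT32 = 2**31 - 1  # largest signed 32-bit value


def max_delay_minutes(max_int=MAX_INT32):
    """Longest delay, in minutes, expressible as microseconds in a signed word."""
    return max_int / 1_000_000 / 60


def delay_chunks(seconds, max_int=MAX_INT32):
    """Split a delay given in seconds into microsecond chunks that each fit."""
    remaining = round(seconds * 1_000_000)
    chunks = []
    while remaining > 0:
        step = min(remaining, max_int)
        chunks.append(step)
        remaining -= step
    return chunks


# One hour is 3,600,000,000 microseconds, which needs two chunks on 32 bits.
one_hour = delay_chunks(3600)
```

With a 32-bit word the longest single delay is a little under 36 minutes, so any longer timeout has to be expressed in larger units or split up as above.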

Note that pr #57 contains almost 40 commits and was a major step forward for the io-sim ecosystem. We also prepared a draft pr #4281 which updates ouroboros-network.

Other changes for 1.0.0.0 release on Hackage:

  • Refactored test suite (pr #47)
  • Updated documentation, cabal files, CONTRIBUTING, SECURITY documents, etc in pr #60, currently under review.

· 3 min read
Marcin Szamotulski

Open Source Quarterly Update

2022-11 - 2023-01

Summary

In the last quarter the open-source initiative delivered a comprehensive report on the state of our repositories. As part of this work stream we identified the key open-source repositories for the Cardano project across all the projects. From a list of more than 500 repositories (some of which are forks) we identified the key repositories which constitute the core of Cardano. 20 of them were identified to be transferred to the future MBO which will govern Cardano development. Some were excluded (like io-sim and typed-protocols) and will be governed by IOG, since they have a much broader application than Cardano itself, and thus we think their open-source future will be better outside of the Cardano umbrella.

Christian Taylor identified a number of ways we can improve our repositories to make them more attractive for open-source contributions by analysing each of them. This includes adding or improving various documentation files, like CONTRIBUTING files, adding a code of conduct, improving readme files, issue & pull request templates etc. Christian also computed various interesting metrics which give a very good insight into the development practices: e.g. average merge ratio, average number of reviews, comments and many more! The presentation is available here.

We followed with work on the Cardano Engineering Handbook. We included a standard code of conduct which is now used by most important projects in the Cardano space. We included cardano-node's security policy and added a responsible disclosure policy. We also described how roles and responsibilities should be clarified. This progress was made by a collaborative effort of the Cardano Core, Plutus and Architecture teams, and it wouldn't be possible without Michael Peyton Jones, Arnaud Bailly, Kevin Hammond, Jared Corduan and Marcin Szamotulski.

We also improved the documentation of key repositories by adding descriptions, improving their README & CONTRIBUTING files, and adding codes of conduct following the Cardano Engineering Handbook. This includes improvements to:

And also

The work was carried out by Marcin Szamotulski, Addie Girouard and Jared Corduan.

In this quarter we also identified a number of projects which can be published to Hackage (Haskell's package repository) or crates.io (Rust's package registry). The list contains 21 packages, 2 of which (hedgehog-extras and quickcheck-dynamic) are already published on Hackage and another 5 (from the io-sim repository) are close to being published.

Detailed log

The progress of the open-source project is tracked in this project.

· 4 min read
Damian Nadales

Consensus Quarterly Update

2022-09 - 2022-11

Main achievements

UTxO HD

  • As a consequence of the errors observed when running distributed mempool benchmarks, we re-designed the UTxO HD mempool integration, which fixed these errors and led to a simpler and more maintainable design.

  • We focused on increasing test coverage for the UTxO-HD prototype. In particular, we added property tests for:

    • Backing store (work ongoing)
    • Era transitions
  • The property tests we added uncovered several bugs, which is a great result given that the cost of finding bugs increases exponentially the closer they get to deployment.

  • One of the errors found by our tests required us to work on improvements in the Haskell bindings for LMDB. This work is ongoing.

  • We started working on the mempool property tests that will exercise the new code paths that UTxO HD introduced.

  • We developed, benchmarked and tested an implementation of sequences of differences based on "anti-diffs". Performance results of diff sequence operations show that we achieved a speedup of about 4x across several scenarios. Note: this speedup is taking into account diff sequence operations only, so the consensus-wide speedup is less than 4x.

  • We integrated the "anti-diff" prototype into the UTxO HD feature branch.
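
One way to read the "anti-diff" idea is that a diff which has an inverse lets a cached cumulative diff be updated by subtraction instead of being recomputed. The Python sketch below shows this on a toy model in which a diff is a map from key to a signed integer delta; this representation and all names are assumptions made for illustration and are much simpler than the diffs used in the prototype.

```python
def combine(a, b):
    """Compose two diffs; a diff is a hypothetical map key -> signed delta."""
    out = dict(a)
    for key, delta in b.items():
        out[key] = out.get(key, 0) + delta
    # Drop keys whose deltas cancelled out.
    return {k: v for k, v in out.items() if v != 0}


def invert(diff):
    """The 'anti-diff': composing it with `diff` gives the empty diff."""
    return {k: -v for k, v in diff.items()}


class DiffSeq:
    """Sequence of diffs with a cached cumulative diff."""

    def __init__(self):
        self.diffs = []
        self.total = {}

    def push(self, diff):
        self.diffs.append(diff)
        self.total = combine(self.total, diff)

    def flush(self, n):
        """Drop the n oldest diffs and return their combined diff."""
        flushed = {}
        for diff in self.diffs[:n]:
            flushed = combine(flushed, diff)
        self.diffs = self.diffs[n:]
        # Subtract the flushed prefix instead of re-summing what remains.
        self.total = combine(self.total, invert(flushed))
        return flushed


seq = DiffSeq()
seq.push({"a": 1})
seq.push({"b": 2, "a": -1})
seq.push({"c": 3})
flushed = seq.flush(2)
```

The work done by `flush` depends on the size of the flushed prefix rather than on the length of the remaining sequence, which is the kind of saving an invertible diff makes possible.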

Genesis

  • We wrote a simulator that demonstrates soundness of an abstract implementation of the new chain selection rule.
  • We elaborated a draft specification for the Genesis implementation (currently awaiting feedback from other architects).
  • We elaborated a draft specification for the ChainSync Jumping optimization. In particular, this includes a proof sketch that the latter preserves liveness and safety in all cases.
  • With the Networking team, we co-designed the eclipse avoidance mechanism, specifically its coherence with the Genesis implementation plan's security and its dependence on the new ChainSync Jumping optimization.
  • We implemented a prototype for ChainSync Jumping. Initial benchmarks showed a performance degradation wrt the baseline. Our optimization attempts so far have brought the performance closer to the baseline, but not yet to parity.

Conway era

  • We did most of the heavy lifting required to integrate the Conway era into the Consensus layer.

Technical debt

  • We started working on enabling CI nightly tests, which revealed several test failures due to thunks being found in data structures used by the ledger and consensus. We made a lot of progress fixing those thunk errors, but some errors still remain.

  • We elaborated a db-analyser benchmark for the ledger operations. This led us to the identification of high processing time at epoch boundaries, and we could not observe any performance degradation that can be attributed to era changes.

  • We fixed a source of flakiness in the ChainDB QSM test.

  • We clarified a common source of confusion around VRF tie-breaking and cross-era chain selection.

  • We fixed a bug in the maximum-allowed ledger major protocol version.

Fostering collaboration

  • We spent time making cardano-updates the central source of information for the core teams stakeholders.
  • We went through the Galois gap analysis and extracted actionable points to take on next.
  • Bart and Yogesh continued with their onboarding and started making substantial contributions to consensus.

Next steps

UTxO HD

  • Finish the mempool property tests.
  • Benchmark the latest version of the prototype.
  • Elaborate a document that describes new integration test scenarios and pass it to the SDET team.
  • Bring query UTxO by address command performance on par with the baseline version.

Genesis

  • Receive and incorporate Duncan's feedback on the first draft specification for the Genesis implementation.
  • Begin prototyping the first genesis implementation, unless the first draft needs major changes.
  • Draft a second revision of the Genesis report.
  • Review the second revision with a wider audience, which includes at least Alexander Russell. That feedback will drive a third and hopefully final revision.
  • Investigate the ~30% slowdown we have observed so far in the ChainSync jumping prototype, and try to mitigate it. In particular, we might need to optimize the existing BlockFetch logic.

Tech debt

  • Enabling nightly CI tests.

Fostering collaboration

  • Merge the tutorial document Galois wrote; requires CI integration.
  • Come up with our own documentation improvements, many of which were suggested in the Galois gap analysis.
  • Try to hire a new team member.

· 4 min read
Marcin Szamotulski

Network Quarterly Update

2022-09 - 2022-11

Summary of most important improvements

During this quarter the networking team delivered a low-level specification of peer sharing & eclipse evasion. We held a session with the consensus team & the scientists, and we got positive feedback on the design.

Further, we focused on the implementation of peer sharing. We produced a detailed design and an early implementation.

We prepared the P2P Single Relay Release (cardano-node-1.35.5). It includes over 130 patches of network stack improvements over the previous version 1.35.4, which were accomplished over a longer period of time. Among them are both bug fixes and UX improvements for stake pool operators, such as a simplified format of the topology file and improvements in the logged messages.

We also provide better integration with systemd (socket activation improvements) and further improvements in the networking stack:

  • exit policies,
  • peer metrics improvements,
  • DNS TTL improvements (which make it harder to misconfigure the system, an issue discovered by the performance & monitoring team),
  • no longer triggering the inbound idle timeout for node-to-client connections (pr #3844), an issue reported to us by Matthias Benkort from the Cardano Foundation.
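The update does not say what the DNS TTL change consisted of. As a generic illustration of how TTL handling can be made harder to misconfigure, the sketch below clamps a TTL reported by a DNS record into a sane range before it is used to schedule the next lookup. The bounds and the function name are assumptions chosen for the example, not values taken from the networking stack.

```python
# Assumed bounds, for illustration only.
MIN_TTL_SECONDS = 60          # avoid re-resolving in a tight loop
MAX_TTL_SECONDS = 24 * 3600   # avoid holding on to stale addresses for days


def clamp_ttl(ttl_seconds, lo=MIN_TTL_SECONDS, hi=MAX_TTL_SECONDS):
    """Return `ttl_seconds` limited to the inclusive range [lo, hi]."""
    return max(lo, min(hi, ttl_seconds))


for reported in (0, 300, 10**9):
    print(reported, "->", clamp_ttl(reported))
```

With a clamp like this, a record published with a TTL of 0 or an extremely large TTL still results in a reasonable re-resolution interval on the node side.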

Duncan has been making progress with the input endorsers demo. His simulation provides a useful animated visualisation and live quantification of behaviour of the modeled design.

We also improved our e2e diffusion simulation by implementing header-body split, similar to what the real implementation does.

We also made some advances towards our future goals of P2P release for block producer nodes (pr #3800 - in review) & for Daedalus users (pr #3690 - merged).

Detailed log

  • We expanded the diffusion simulation with the block-fetch protocol, bringing it closer to the production system.

  • We addressed some additional technical debt in the diffusion simulation.

  • We slightly improved documentation & CI of io-sim and typed-protocols repositories for open-source contributors.

  • We closed a number of issues towards publishing io-sim on Hackage (only two essential issues are left open).

  • We pushed a branch of typed-protocols which captures one of the developer UX problems in the API which we need to solve.

  • We identified and fixed an issue related to systemd sockets.

  • We identified and fixed an issue in consensus initialisation not giving feedback on early errors.

  • We deployed RT View and identified a number of issues, which were communicated to the performance & monitoring team.

  • We finished the high-level & detailed design of peer sharing, and a very early implementation of peer sharing is done (note that peer sharing cannot be safely deployed without eclipse evasion & genesis).

  • We finished the high-level design of eclipse evasion and started working on a detailed design.

  • We were assigned the role of release engineer for the 1.35.5 release (the P2P single relay release); we prepared cardano-node for the 1.35.5 release, which contains more than 130 patches of network stack improvements alone, made over the last few months.

  • We diagnosed and fixed a tricky bug in the peer state actions (a component which sits between the outbound governor and the connection manager). That bug was introduced earlier this year and never released. It was caught by the QA testing framework. We expanded our diffusion simulation to cover such cases and also reduced the chance of reintroducing such a bug in the future.

  • We identified and quite likely mitigated a misconfiguration in the benchmarking cluster (next benchmarking run will confirm our hypothesis).

  • We simplified the format of the P2P topology file and got positive feedback from SPOs.

  • We raised severities of some of the logging messages, which is an important improvement for SPOs, exchanges and other users of the system.

  • We worked on input endorsers simulation which gives both animated and quantified live feedback on network operation, using a simplified model of a TCP/IP network.
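For readers unfamiliar with the P2P topology file mentioned in the log above, the following sketch builds one as a Python dictionary and prints it as JSON. The field names (`localRoots`, `publicRoots`, `accessPoints`, `advertise`, `valency`, `useLedgerAfterSlot`) reflect my understanding of the simplified format and should be treated as an assumption to be checked against the cardano-node documentation for the release in use. The helper functions and the example addresses are hypothetical.

```python
import json


def make_topology(local_relays, public_relays, use_ledger_after_slot):
    """Build a topology dict from lists of (address, port) pairs.
    Field names are assumed, not taken from the source text."""
    return {
        "localRoots": [
            {
                "accessPoints": [{"address": a, "port": p} for a, p in local_relays],
                "advertise": False,
                "valency": 1,
            }
        ],
        "publicRoots": [
            {
                "accessPoints": [{"address": a, "port": p} for a, p in public_relays],
                "advertise": False,
            }
        ],
        "useLedgerAfterSlot": use_ledger_after_slot,
    }


def check_topology(topology):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for section in ("localRoots", "publicRoots"):
        for group in topology.get(section, []):
            for ap in group.get("accessPoints", []):
                address, port = ap.get("address"), ap.get("port")
                if not isinstance(address, str) or not address:
                    problems.append(f"{section}: missing address in {ap!r}")
                if not isinstance(port, int) or not 1 <= port <= 65535:
                    problems.append(f"{section}: invalid port in {ap!r}")
    return problems


# Hypothetical addresses, for illustration only.
topology = make_topology(
    local_relays=[("10.0.0.1", 3001)],
    public_relays=[("relay.example.com", 3001)],
    use_ledger_after_slot=0,
)
print(json.dumps(topology, indent=2))
print("problems:", check_topology(topology))
```

Generating the file from a small script and validating it before deployment is one way to catch typos such as an out-of-range port before the node is restarted.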

Next quarter

  • Release the Single Relay P2P Release 1.35.5.

  • Carry on with Peer Sharing (review, testing).

  • Deliver a talk at the Conference on Principles of Distributed Systems 2022 in Brussels, Belgium.

  • Present Detailed Design of Eclipse Evasion and start implementation phase.

  • Work on P2P Block Producer release.

  • Carry on with publishing of io-sim on Hackage.

· 2 min read
Jared Corduan

Ledger Quarterly Update

2022-09 - 2022-11-04

  • We finished a minimal ledger era capable of master key rotation. This will be re-purposed for our upcoming work.
  • We have the humble beginnings of a proper ledger API.
  • We improved the problematic cost model serialization (recall the song and dance about updating the cost model one epoch after the hard fork).
  • We have added benchmarks for problematic areas.
  • Massive repository restructure and cleanup.
    • Unified and consistent variable name schemes (not completely finished, but nearly there).
    • Massive reduction in type constraints, which had caused a lot of developer friction, both in our code and downstream.
    • More organized module structures.
    • Improved generators for our property tests.
    • We removed our dependency on cardano-prelude.
  • The formal ledger model has come a long way.
    • We created a fork of Agda that provides some meta-programming support for the ledger rules.
    • We have a large amount of the basic UTxO support in the model.
    • We can generate a good looking PDF from the model.
    • We can produce Haskell from the model.
    • We have a nice finite set theory library that we can use for many of the ledger rules.
    • We have nix support for the model.

Next steps

  • Individual tracking of deposits. [issue-3113]
  • Versioned CBOR encoders/decoders. [issue-3014]
  • New ledger era transaction body (and the surrounding work associated with it).
  • Designs for the next ledger era.