4 posts tagged with "network"

· 6 min read
Marcin Szamotulski

2023-04 - 2023-06

Main achievements

Eclipse Evasion

We finalised the design of eclipse evasion and implemented its mechanism, which relies on connectivity to big ledger peers. Big ledger peers are the largest ledger peers, which together accumulate 90% of stake (currently there are fewer than 1000 of them). The outbound governor has new targets for known, established and active big ledger peers, which work similarly to the corresponding targets for ledger peers. The ouroboros-network#4662 PR is currently in review.
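
The definition of big ledger peers above can be illustrated with a minimal sketch. It is not the code from ouroboros-network#4662: the `PoolStake` type and the `bigLedgerPeers` function are hypothetical names, and stake is modelled as a plain `Integer`.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- | Hypothetical representation: a relay identifier paired with its stake.
type PoolStake = (String, Integer)

-- | Take the largest pools, in descending stake order, until the stake
-- accumulated so far reaches the given fraction of the total stake.
bigLedgerPeers :: Rational -> [PoolStake] -> [PoolStake]
bigLedgerPeers threshold pools = go 0 sorted
  where
    sorted = sortOn (Down . snd) pools
    total  = sum (map snd pools)

    go :: Integer -> [PoolStake] -> [PoolStake]
    go _ [] = []
    go acc (p@(_, stake) : ps)
      -- stop once the pools already taken cover the threshold
      | fromIntegral acc >= threshold * fromIntegral total = []
      | otherwise = p : go (acc + stake) ps
```

With a threshold of `9 / 10`, the function returns the shortest prefix of the stake-sorted pools whose combined stake is at least 90% of the total.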

As part of this work we also identified a bug which would prevent a node from connecting to itself. Such connections are not easily detectable and are expected to be dropped by the churn mechanism; nonetheless they should be handled correctly. The failure was discovered thanks to our e2e simulation of diffusion using io-sim & property based testing.

The PR also refactors the heart of the ouroboros-network interface reducing technical debt that would otherwise accumulate.

We also identified a possible improvement in the churn mechanism, which will be implemented in Q3: churn needs to wait for peers to terminate, and we can improve that synchronisation. [ouroboros-network#4617]

Ecosystem P2P Deployment Progress

We reached 50% of stake in the hands of SPOs who run at least one P2P relay. Emurgo and CF are now also running some P2P relays, and 20% of IOG relays are running in P2P mode.

P2P Progress

Peer Sharing

We implemented bootstrapping for peer sharing (also known as light peer sharing). New downstream (inbound) peers are now added to the known peers of the outbound governor. Together with peer sharing, this allows non-registered relays to propagate through the network. ouroboros-network#3596

Please note that peer sharing is disabled by default and is not considered safe until Bootstrap Peers (see below) or Genesis is implemented.
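
As a rough illustration of light peer sharing, the sketch below adds the address of a new inbound peer to a set of known peers. The types `PeerAddr` and `KnownPeers` and the function `addInboundPeer` are hypothetical simplifications, not the outbound governor's real state, and gating the behaviour on a single boolean flag is an assumption made for the example.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- | Hypothetical peer address; the real implementation uses socket addresses.
newtype PeerAddr = PeerAddr String
  deriving (Eq, Ord, Show)

-- | Hypothetical, simplified view of the outbound governor's known peers.
newtype KnownPeers = KnownPeers (Set PeerAddr)
  deriving (Eq, Show)

-- | Remember a newly connected inbound peer.  When the feature is disabled
-- (assumed here to be a plain flag) the known peers are left untouched.
addInboundPeer :: Bool -> PeerAddr -> KnownPeers -> KnownPeers
addInboundPeer enabled addr (KnownPeers peers)
  | enabled   = KnownPeers (Set.insert addr peers)
  | otherwise = KnownPeers peers
```

Using a set keeps the operation idempotent: a peer which reconnects many times is still recorded once.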

Diffusion (P2P)

  • We designed a feature which will reduce the load on IOG relays (in future also run by CF & Emurgo). The feature consists of two parts: a new source of peers called bootstrap peers (obtained via an HTTPS request), and the ability to switch from bootstrap peers to ledger peers once the node is synced (we are collaborating with the consensus team on the interface Bootstrap Peers IER). This feature will be completed in Q3. ouroboros-network#4530

  • We published a blog post about P2P design & implementation.

  • Karl Knutsson (CF) fixed an issue observed on a relay with a lot of outbound connections: ouroboros-network#4559.

  • We merged changes which allow the consensus layer to start / stop the block forging thread. This will allow deploying P2P block producing nodes which serve as live backup nodes. ouroboros-consensus#140

  • We fixed a few bugs in the local root peers DNS resolution service: ouroboros-network#4583, ouroboros-network#4571.

  • We limited the concurrency of DNS name resolution: ouroboros-network#4596.

  • Galois Inc implemented a query option for Handshake: ouroboros-network#4256.

  • We fixed the handshake query timeout: ouroboros-network#4608.

  • We implemented warm valency for local root peers. This can help when using DNS names in local root peers which resolve to many IP addresses. ouroboros-network#4575

  • We merged handshake changes which allow querying protocol versions. Thanks to James Parker from Galois Inc.: ouroboros-network#4256, cardano-cli#30.
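
The bootstrap peers feature described in the first item of this list can be sketched as a choice between two peer sources, driven by the node's sync state. This is only an illustration of the stated design: `SyncState`, `PeerSource`, `peerSource` and `selectPeers` are hypothetical names, and the real interface with the consensus layer was still being designed at the time of writing.

```haskell
-- | Hypothetical sync state, as it might be reported by the consensus layer.
data SyncState = Syncing | Synced
  deriving (Eq, Show)

-- | The two sources of peers mentioned in the design.
data PeerSource = BootstrapPeers | LedgerPeers
  deriving (Eq, Show)

-- | Use bootstrap peers while syncing, ledger peers once synced.
peerSource :: SyncState -> PeerSource
peerSource Syncing = BootstrapPeers
peerSource Synced  = LedgerPeers

-- | Pick the candidate peers, given both the bootstrap and the ledger peers.
selectPeers :: SyncState -> [peer] -> [peer] -> [peer]
selectPeers state bootstrap ledger =
  case peerSource state of
    BootstrapPeers -> bootstrap
    LedgerPeers    -> ledger
```

Keeping the decision in a pure function makes the switch easy to test in isolation from how the sync state is obtained.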

Other Improvements & Developments

CDDL

  • We added node-to-node and node-to-client CDDL specs / tests for encoding of NodeToNodeVersionData and NodeToClientVersionData.

  • We clarified an inconsistency between the CDDL spec and the implementation, which is highly polymorphic. We designed and implemented a fix for the tx-submission and local-tx-submission mini-protocols. Specs for other mini-protocols will be improved at a later stage. ouroboros-network#4580

Cardano Ping

IOSim

Typed Protocols

Cardano Client

  • We fixed a bug in the cardano-client-0.1.0.2 release which resulted in clients (e.g. db-sync) negotiating an experimental protocol version.

Technical debt

CI improvements

GHC 9.4 & 9.6

We made all repositories under our control compile with ghc-9.4 and ghc-9.6; this includes ouroboros-network, io-sim, typed-protocols and Win32-network.

Next steps

We will continue towards our aspirational roadmap.

  • We will continue reviewing eclipse evasion.
  • As ouroboros-consensus#140 was merged, we are making progress towards releasing P2P on block production nodes. We hope to analyse the performance regression observed on such nodes on the benchmarking cluster. roadmap-3887
  • We are also focused on roadmap-3969. Note that it was expanded in Q2.

· 3 min read
Marcin Szamotulski

2023-01 - 2023-03

Main achievements

Gradual dynamic P2P release on mainnet

We released two versions of cardano-node with dynamic P2P capabilities:

  • 1.35.6
    • we found and fixed a bug in exception handling in peer-state-actions pull-4357
    • we found and fixed a busy loop when demoting a peer from hot to warm pull-4385
  • 1.35.7
    • includes interoperability in the legacy non-p2p network stack pull-4467
  • we fixed a busy loop of demotions & promotions: warm -> hot -> warm pull-4485 /it will be included in the cardano-node-8.0.0 release/.

Currently there are more than 200 P2P relays on mainnet.

Peer Sharing

We implemented /peer sharing/ pull-4019 which will be available as an experimental feature in one of the future cardano-node releases.

We implemented /light peer sharing/, i.e. adding inbound connections to the set of known peers of the outbound governor, which allows bootstrapping relays not registered on chain. This complements peer sharing. The pull-4277 is in the late stages of review.

Eclipse Evasion

We finalised the design of eclipse evasion and started implementing it. We have an initial implementation (not merged). We are in the process of extending our test suite to cover new implementation details: issue-3886, pull-4462.

Cardano Network Service Assurance

Galois has been making progress on the Cardano Network Service Assurance project.

  • In cardano-node, they have developed a datapoint abstraction that creates a queue of (existing) log events; they now have two such datapoints (of log events) implemented.

  • They have developed a datapoint client executable that can connect to a node which serves the "new tracing".

  • They have been exploring approaches for the consolidation and analysis of datapoint data to extract actionable network health status.

Cardano-Node

  • We made it possible to configure the accepted connections limit pull-4902.

Testing improvements

  • We fixed a bug in the network simulation implementation of TCP simultaneous open pull-4265.

  • We introduced header-body split in the diffusion simulation pull-4419 (in review).

  • We introduced initiator only nodes in the diffusion simulation pull-4280.

  • We fixed a connection-manager test failure issue-4370.

Technical Debt

  • We refactored Snocket interface decoupling it from the multiplexer pull-4260. This simplified some aspects of the KES agent implementation.

  • We introduced a record for CBOR codecs which are used for various data structures by mini-protocol codecs pull-4430.

Documentation

  • We explained some limitations of CDDL in our technical report pull-4351.

IO-Sim

  • We fixed the implementation of MVars pull-70.

NoThunks

  • We published a new version of the nothunks library to Hackage.

Next steps

  • Finish implementation & testing of eclipse evasion issue-3886.
  • Optimise connectivity to peers behind a firewall issue-4381.
  • Finish the work on enabling block production dynamically to allow using P2P on block producers issue-3159.
  • If time permits, we would also like to reserve some time to finish publishing io-sim to Hackage.

· 5 min read
Marcin Szamotulski

Network Quarterly Update

2022-11 - 2023-01

Summary

The primary goal of the networking team was the Single Relay Release of P2P. We fixed a number of small late bugs, and concluded QA & performance testing. Although a regression was discovered in the performance of block production when P2P is enabled, relaying with P2P performs better than with the non-P2P stack. We concluded that this is not a blocker for the Single Relay Release, which is planned shortly.

Peer sharing has gone through review and the final review is under way. After merging it will still be disabled (hidden behind a flag) as it's not safe without eclipse evasion. We started implementing light peer sharing (i.e. including inbound peers in the known peer set of the outbound governor).

We started a detailed eclipse evasion design; it will continue in the next quarter.

We also made a major revision of the package structure of the network packages. We ended up with a very clean dependency graph (pr #4155).

Armando Santos delivered a talk at the OPODIS 2022 conference on principles of distributed systems in Brussels. The slides are available here.

Neil Davies gave an invited seminar on DeltaQ at Université Catholique de Louvain.

We also found and fixed a few bugs:

  • a bug in the keep-alive mini-protocol which caused warm to cold transitions to always go through a timeout path rather than a clean demotion (pr #4168).

  • an assertion failure in the outbound governor (issue #4177).

Next steps

We will work towards the next release of P2P for block producer nodes. This includes:

  • analysing the performance regression for BP nodes when using P2P
  • finishing the work on controlling the block forger through node kernel (pr #3800)
  • addressing issue #3907 and writing a script to analyse deployment of P2P relays

We would also like to push forward eclipse evasion. Although most of the work has been done already, the release of io-sim on Hackage will happen in the next quarter.

We would also like to address chain-sync timeout issue recently diagnosed by Karl Knutsson.

If time permits we would also like to address some technical debt, especially:

Risks

The performance regression for block producers with P2P needs to be investigated in the near future. This is a blocker for the release of P2P on BP nodes.

Detailed log

Contributions to Ouroboros-Network

  • We added TraceDemoteLocalAsynchronous, which enables notification of critical issues for SPOs
  • We fixed cardano-ping compatibility with NodeToNodeV_10 (P2P, pr #4165)
  • We fixed a bug in the demotion of peers to cold which affected P2P nodes (commit-61058aa5c2)
  • Karl Knutsson enhanced SendFetchRequest (commit-bb1c3dddee, open-source contribution)
  • We turned SizeInBytes into a newtype.
  • We extended CONTRIBUTING.md, README.md, added CODE_OF_CONDUCT.
  • We fixed DNS test failure issue #4191
  • We fixed a simulation bug found in issue #4258
  • [pr #4168]
  • issue #4177
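
To illustrate why turning a size into a newtype is useful, here is a minimal sketch. The `Word32` representation and the field name `getSizeInBytes` are assumptions made for the example, and `addSizes` and `fitsIn` are hypothetical helpers; consult the ouroboros-network sources for the actual definition.

```haskell
import Data.Word (Word32)

-- | A size in bytes.  Wrapping the number in a newtype stops it from being
-- silently mixed up with other numeric values such as counts or offsets.
-- The 'Word32' representation is an assumption of this sketch.
newtype SizeInBytes = SizeInBytes { getSizeInBytes :: Word32 }
  deriving (Eq, Ord, Show)

-- | Hypothetical helper: add two sizes (wraps on overflow, like 'Word32').
addSizes :: SizeInBytes -> SizeInBytes -> SizeInBytes
addSizes (SizeInBytes a) (SizeInBytes b) = SizeInBytes (a + b)

-- | Hypothetical helper: does a payload fit within a limit?
fitsIn :: SizeInBytes -> SizeInBytes -> Bool
fitsIn size limit = size <= limit
```

Because `SizeInBytes` is a distinct type, passing a bare `Word32` where a size is expected becomes a compile-time error, at no runtime cost.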

Contributions to Cardano-Node

  • We maintained the Single Relay Release pr #4612 (e.g. fixing CI issues, rebasing it when necessary, publishing packages to Cardano Haskell Packages);
  • We enhanced JSON serialisation / deserialisation of NodeToNodeVersion and NodeToClientVersion;

Contributions to IOSim

  • We started to use Cardano Haskell Packages for IOSim (pr #48)
  • We updated change log files
  • We added support of ghc-9.4 (pr #50)

We also addressed the following issues in pr #57 in order to prepare the package for publication on Hackage:

  • refactored io-classes timers API (issue #46);
  • created a new package si-timers which exposes an interface using SI units and is safe on 32-bit systems (issue #59);
  • added monad transformers instances for classes defined in io-classes (issue #58);
  • created io-classes-mtl package which includes (experimental) instances for monad transformers;
  • provided MonadMonotonicTimeNSec in io-classes and MonadMonotonicTime in si-timers (so that io-classes follows the base package);
  • added registerCancellableDelay in si-timers (which allowed us to hide the fancy timer API and clean up io-classes);
  • added support for js_HOST_ARCH (the new GHC JS backend)
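
The 32-bit safety concern mentioned in the list above comes from `threadDelay` in base, which takes an `Int` number of microseconds: with a 32-bit `Int` the longest single delay is roughly 35 minutes. The sketch below shows one way to work around this by splitting a long delay into chunks. It is not the si-timers implementation: `delayChunks` and `safeThreadDelay` are hypothetical names, and the delay is modelled as `Integer` microseconds rather than the library's own time types.

```haskell
import Control.Concurrent (threadDelay)

-- | The largest delay, in microseconds, that fits in a single 'threadDelay'.
maxChunk :: Integer
maxChunk = fromIntegral (maxBound :: Int)

-- | Split a total delay into chunks no larger than the given chunk size.
-- Non-positive totals or chunk sizes yield no chunks.
delayChunks :: Integer -> Integer -> [Int]
delayChunks chunk total
  | total <= 0 || chunk <= 0 = []
  | otherwise = fromIntegral step : delayChunks chunk (total - step)
  where
    step = min chunk total

-- | Delay for an arbitrary number of microseconds without overflowing 'Int'.
safeThreadDelay :: Integer -> IO ()
safeThreadDelay = mapM_ threadDelay . delayChunks maxChunk
```

For example, `delayChunks 10 25` evaluates to `[10, 10, 5]`, so the three partial delays add up to the requested total.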

Note that pr #57 contains almost 40 commits, and was a major step forward for the io-sim ecosystem. We also prepared a draft pr #4281 which updates ouroboros-network.

Other changes for 1.0.0.0 release on Hackage:

  • Refactored test suite (pr #47)
  • Updated documentation, cabal files, CONTRIBUTING, SECURITY documents, etc in pr #60, currently under review.

· 4 min read
Marcin Szamotulski

Network Quarterly Update

2022-09 - 2022-11

Summary of most important improvements

During this quarter the networking team delivered a low level specification of peer sharing & eclipse evasion. We held a session with the consensus team & the scientists; we got positive feedback on the design.

Further we focused on the implementation of peer sharing. We produced a detailed design and an early implementation.

We prepared the P2P Single Relay Release (cardano-node-1.35.5). It includes over 130 patches of network stack improvements over the previous version 1.35.4, which were accomplished over a longer period of time. Among them are both bug fixes and UX improvements for stake pool operators, like a simplified format of the topology file and improvements in the logged messages.

We also provide better integration with systemd (socket activation improvements) and improvements in the networking stack:

  • exit policies,
  • peer metrics improvements,
  • DNS TTL improvements (which make it harder to misconfigure the system, an issue discovered by the performance & monitoring team),
  • do not trigger inbound idle timeout for node-to-client connections (pr #3844), an issue reported to us by Matthias Benkort from Cardano Foundation.

Duncan has been making progress with the input endorsers demo. His simulation provides a useful animated visualisation and live quantification of behaviour of the modeled design.

We also improved our e2e diffusion simulation by implementing header-body split, similar to what the real implementation does.

We also made some advances towards our future goals of P2P release for block producer nodes (pr #3800 - in review) & for Daedalus users (pr #3690 - merged).

Detailed log

  • We expanded diffusion simulation with block-fetch protocol bringing it closer to the production system.

  • We addressed some additional technical debt in the diffusion simulation.

  • We slightly improved documentation & CI of io-sim and typed-protocols repositories for open-source contributors.

  • We closed a number of issues towards publishing io-sim on Hackage (only two essential issues are left open).

  • We pushed a branch of typed-protocols which captures one of the developer UX problems in the API which we need to solve.

  • We identified and fixed an issue related to systemd sockets.

  • We identified and fixed an issue in consensus initialisation not giving feedback on early errors.

  • We deployed RT View, identified a number of issues which were communicated to the performance & monitoring team.

  • We finished the high level & detailed design of peer sharing; a very early implementation of peer sharing is done (note that peer sharing cannot be safely deployed without eclipse evasion & genesis).

  • We finished the high level design of eclipse evasion, and started working on a detailed design.

  • We were assigned the role of release engineer for the 1.35.5 release (the P2P single relay release); we prepared cardano-node for the 1.35.5 release, which contains more than 130 patches of just network stack improvements done over the last few months.

  • We diagnosed and fixed a tricky bug in the peer state actions (a component which sits between the outbound governor and the connection manager). That bug was introduced earlier this year and never released. It was caught by the QA testing framework. We expanded our diffusion simulation to cover such a case and also reduced the chance of reintroducing such a bug in future.

  • We identified and quite likely mitigated a misconfiguration in the benchmarking cluster (next benchmarking run will confirm our hypothesis).

  • We simplified the format of the p2p topology file; we got positive feedback from SPOs.

  • We raised severities of some of the logging messages, which is an important improvement for SPOs, exchanges and other users of the system.

  • We worked on input endorsers simulation which gives both animated and quantified live feedback on network operation, using a simplified model of a TCP/IP network.

Next quarter

  • Release the Single Relay P2P Release 1.35.5.

  • Carry on with Peer Sharing (review, testing).

  • Deliver a talk at Conference on Principles of Distributed Systems 2022 in Brussels, Belgium.

  • Present Detailed Design of Eclipse Evasion and start implementation phase.

  • Work on P2P Block Producer release.

  • Carry on with publishing of io-sim on Hackage.