Network Quarterly Update
2022-09 - 2022-11
Summary of most important improvements
During this quarter the networking team delivered a low-level specification of peer sharing & eclipse evasion. We held a session with the consensus team & the scientists and received positive feedback on the design.
We also focused on the implementation of peer sharing: we produced a detailed design and an early implementation.
We prepared the P2P Single Relay Release (cardano-node-1.35.5). It includes over 130 patches of network stack improvements over the previous version 1.35.4, which were accomplished over a longer period of time. Among them are both bug fixes and UX improvements for stake pool operators, like a simplified format of the topology file, or improvements in the logged messages:
- tracing of early consensus exceptions
- tracing of demotion of local root peers (traced with Warning severity); the trace is called TraceDemoteLocalAsynchronous, and in JSON format it is encoded as DemoteLocalAsynchronous. For an SPO, tracking these demotions is vital (such a demotion could indicate that a block producer is no longer connected to its relays, or vice versa).
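
For operators who want to watch for these demotions, a minimal Python sketch follows. It assumes the node writes one JSON object per log line. The exact layout of a log entry is not described here, so the sketch searches every key and string value of each entry for DemoteLocalAsynchronous instead of relying on a particular field name. The function names are illustrative and not part of cardano-node.

```python
import json
from typing import Any, Iterable, Iterator

# Tag named in the release notes for the JSON encoding of the trace.
DEMOTION_TAG = "DemoteLocalAsynchronous"


def contains_tag(value: Any, tag: str = DEMOTION_TAG) -> bool:
    """Return True if `tag` occurs in any key or string inside a decoded JSON value."""
    if isinstance(value, str):
        return tag in value
    if isinstance(value, dict):
        return any(tag in key or contains_tag(item, tag) for key, item in value.items())
    if isinstance(value, list):
        return any(contains_tag(item, tag) for item in value)
    return False


def demotion_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded log entries that mention a local root peer demotion.

    Assumption: each log line is a single JSON object. Lines that are
    empty or not valid JSON are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and contains_tag(entry):
            yield entry
```

Passing an open log file to demotion_events yields only the matching entries, which can then be forwarded to whatever alerting an operator already uses.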
We also provide better integration with systemd (socket activation improvements) and improvements in the networking stack:
- exit policies,
- peer metrics improvements,
- DNS TTL improvements (which make it harder to misconfigure the system, an issue discovered by the performance & monitoring team),
- do not trigger inbound idle timeout for node-to-client connections (pr #3844), an issue reported to us by Matthias Benkort from the Cardano Foundation.
Duncan has been making progress with the input endorsers demo. His simulation provides a useful animated visualisation and live quantification of the behaviour of the modelled design.
We also improved our e2e diffusion simulation by implementing header-body split, similar to what the real implementation does.
We also made some advances towards our future goals of a P2P release for block producer nodes (pr #3800 - in review) & for Daedalus users (pr #3690 - merged).
Detailed log
We expanded the diffusion simulation with the block-fetch protocol, bringing it closer to the production system.
We addressed some additional technical debt in the diffusion simulation.
We slightly improved documentation & CI of io-sim and typed-protocols repositories for open-source contributors.
We closed a number of issues towards publishing io-sim on Hackage (only two essential issues are left open).
We pushed a branch of typed-protocols which captures one of the developer UX problems in the API which we need to solve.
We identified and fixed an issue related to systemd sockets.
We identified and fixed an issue in consensus initialisation not giving feedback on early errors.
We deployed RT View and identified a number of issues, which were communicated to the performance & monitoring team.
We finished the high-level & detailed design of peer sharing; a very early implementation of peer sharing is done (note that peer sharing cannot be safely deployed without eclipse evasion & genesis).
We finished the high-level design of eclipse evasion and started working on a detailed design.
We were assigned the role of release engineer for the 1.35.5 release (the P2P single relay release); we prepared cardano-node for the 1.35.5 release, which contains more than 130 patches of network stack improvements alone, done over the last few months.
We diagnosed and fixed a tricky bug in the peer state actions (a component which sits between the outbound governor and the connection manager). That bug was introduced earlier this year and never released. It was caught by the QA testing framework. We expanded our diffusion simulation to cover such a case and also reduced the chance of reintroducing such a bug in the future.
We identified and quite likely mitigated a misconfiguration in the benchmarking cluster (the next benchmarking run will confirm our hypothesis).
We simplified the format of the P2P topology file and got positive feedback from SPOs.
We raised severities of some of the logging messages, which is an important improvement for SPOs, exchanges and other users of the system.
We worked on the input endorsers simulation, which gives both animated and quantified live feedback on network operation, using a simplified model of a TCP/IP network.
Next quarter
Release the Single Relay P2P Release 1.35.5.
Carry on with Peer Sharing (review, testing).
Deliver a talk at the Conference on Principles of Distributed Systems 2022 in Brussels, Belgium.
Present the Detailed Design of Eclipse Evasion and start the implementation phase.
Work on P2P Block Producer release.
Carry on with publishing of io-sim on Hackage.