Skip to main content

59 posts tagged with "network"

View All Tags

· 2 min read
Marcin Szamotulski

High level summary

In the current sprint the networking team focused on fixing bugs and pushing forward implementation of eclipse evasion. We also found a bug in our simulation testing setup (in integration of test node). We also overviewed the work on extending handshake protocol which is delivered by Galois Inc.

We published ouroboros-network-0.4.0.1 and ouroboros-network-protocols-0.3.0.0 to CHaP.

We also fixed a bug in cardano-node which results in not being able to configure inbound connection limits, see PR #4902.

Together with Karl Knutsson (CF) we realised an issue in cardano-cli: it's validation of DNS names, IP address & ports when registering a stake pool should be more strict to protect against common mistakes which we identified on the chain. See issue #4929.

Detailed work log

In PR #4385 we fixed two bugs in peer state actions. First one results in a busy loop if demotion from hot to warm times outs. This busy loop is eventually exited when mux exits (we reported this in our previous report). This fix made it to 1.35.6 release as well.

In addition the PR #4385 also fixes another bug which results in hot -> warm -> hot demotion / promotion busy loop.

The PR #4385 also fixed a bug in a node only used in simulation which resulted in not using chain-sync or block-fetch mini-protocols. In the review process, we realised that the header-body split in the simulated node requires further work (see PR #4419, which is under review).

The PR #4385 also extend our generators, which together with the above fix, cover the hot -> warm -> hot demotion / promotion busy loop.

In PR #4419 we introduce a ChainDB for our simulation node, which plays similar role to ChainDB in the ouroboros-consensus: a persistent (across simulated restarts) store of blocks which does chain selection. This ensures that the simulated node is using block-fetch to download blocks announced by chain-sync mini-protocol.

We also made progress with reviewing PR #4019 - peer sharing.

We also fixed issue #4370 - a connection manager test failure, see PR #4384.

· One min read
Marcin Szamotulski

High level summary

Recently QA found a bug in P2P code, which results in busy loops. We added one fix to 1.35.6 release, another one will likely be part of next release. The first one is already included in ouroboros-network-0.3.0.1 release. These bugs could only affect nodes which are out of sync and thus should not impose risk on well maintained nodes on mainnet. We also advertise to deploy at most one of the relays as a P2P node, which shields from possible consequences.

We recently finished design phase of eclipse evasion and we started implementing it (see issue #3886 for progress).

Galois finished implementing Handshake extension which will allow to query network protocol versions (see pr #4256).

We also recently released a newer set of network packages to be integrated with cardano-node master branch, this includes:

* monoidal-synchronisation-0.1.0.2
* cardano-client-0.1.0.2
* network-mux-0.3.0.0
* ouroboros-network-api-0.1.0.0
* ouroboros-network-protocols-0.2.0.0
* ouroboros-network-testing-0.2.0.1
* ouroboros-network-mock-0.1.0.0
* ouroboros-network-framework-0.3.0.0
* ouroboros-network-0.4.0.0 (it doesn't not yet include the fix we included
in `0.3.0.1`)

· 2 min read
Marcin Szamotulski

High level summary

We have been working towards cardano-node-1.35.5 release. QA & benchmarking teams gave a green light for the release, and we made decent progress with some CI problem which we encountered on the way (PR #4612). We are also working on peer sharing, making improvements in our testing infrastructure, reducing technical debt and making progress towards io-sim-1.0.0.0. Galois is making progress on Handshake improvements.

Low level summary

Our diffusion simulation network now includes a mixed network of initiator only and initiator and responder nodes. issue #4222

We are now reviewing the peer sharing pull request.

We are also reviewing pull request which introduces handshake query flag. PR #4256

We fixed a bug in our network simulator. The bug was triggered when a node died when performing a simultaneous TCP open (a corner case of a corner case!). PR #4265

We also refactored Snocket interface and removed the bearer construction from its methods. PR #4260

We are working towards releasing io-sim-1.0.0.0 on Hackage, which includes reviewing two PRs: PR #57 and PR #60 as well as writing an announcement blog post.

· One min read
Marcin Szamotulski

High level summary

In last sprint the team focused on preparations for the conference talk at OPODIS 2022. We also worked on preparations to publish io-sim and related packages on Hackage (PR #57, PR #60).

We also started reviewing:

  • ouroboros-network
  • cardano-node
  • cardano-ledger repositories for open-source readiness (PR #4128).

We prepared a PR which changes how node-to-node and node-to-client protocol versiones are serialised in cardano-node log (PR #4691).

· 4 min read
Marcin Szamotulski

Stake-Driven Data Diffusion Release for Relays

IOG networking team decided to release the Stake-Driven Data Diffusion with Robust Optimised Peer Selection also more commonly known as P2P. In the last update, we informed about a performance regression, but it turns out it only affects block producers, and thus we highly advise against running it on such nodes. Further investigation is required to find the cause of it.

On IOG's benchmarking cluster we have seen quite a good performance improvement on block propagation itself. The cluster is running a static topology with valency 6 (each node is connected to 6 other nodes). In which every of the 50 nodes are block producers. The setup of this network is the same as mainnet. We've seen 40-50% performance improvement on block propagation comparing to the same cluster deployed with the same topology but using non-P2P nodes. We think this performance improvement is caused by using full duplex connections. Quite likely the transaction traffic floating in both directions on the same TCP connection helps to keep the TCP window open. Note that in a cluster of 50 nodes with valency 6 the probability of having at least one duplex connection is more than 50%. We don't expect the same improvement on mainnet because the network is much wider and the transaction traffic is not as large.

Just before the release we squashed two small bugs:

  • issue #4163 - top level integration bug in keep-alive;
  • issue #4177 - a bug in outbound-governor;
  • PR #4165 - a fix cardano-ping support of NodeToNodeV_10.

Peer Sharing

We were carrying a review of peer sharing PR.

DeltaQ

Neil Davies was invited to give a guest lecture entitled Avoiding System Catastrophes at UCLouvain.

What have we achieve last sprint

  • issue #4163: we found out that a control message is not passed to the keep-alive mini-protocol, this results in every demotion executing demotion timeout rather than a graceful termination. With the fix the node will no longer log:

    { "kind": "PeerStatusChangeFailure"
    , "peerStatusChangeType": "WarmToCold (ConnectionId {localAddress = 192.168.0.10:7000, remoteAddress = 3.129.186.40:3000})"
    , "reason": "TimeoutError"
    }
  • issue #4177: we fixed an assertion failure in the outbound-governor; now we don't try demoted peers which are being demoted already.

  • PR #4155: we refactored ouroboros-network packages. There's a top level ouroboros-consensus-diffusion package which integrates network & consensus code. We also introduced:

    • ouroboros-network-api package which contains the API shared between network & conensus;
    • ouroboros-network-mock package which contains mock API used for testing (e.g. a mock chain & chain producer, etc.)
    • ouroboros-network-protocols package which contains implementation of all (but handshake) mini-protocols, exposes a testlib and contains test and cddl components.

    This made the dependency tree of network & consensus packages much cleaner.

  • PR #4169: we described the usage of release branches in CONTRIBUTING.md doc.

  • PR #4165: we fixed cardano-ping support of NodeToNodeV_10 protocol.

DeltaQ

The abstract of the talk:

An essential step to ensuring that distributed systems are fit for purpose.

Distributed systems have become an integral part of our society and daily lives. We are, both implicitly and explicitly, individually as well as collectively, placing ever more trust in them.

Are they worthy of this trust? Our need for them to be ‘fit-for-purpose’ goes well beyond notions of functional correctness (i.e. never getting the wrong answer). We need them to deliver the desired outcomes in a timely, robust, reliable, resilient fashion, at scale and in a sustainable way (both economically and environmentally).

This all sounds like a worthy aspiration, but what would be a practical approach to capturing and reasoning about these issues? How can we ensure that systems can meet their fit-for-purpose objectives, not just in their design but as they are deployed, encounter the imperfect world, are scaled to become economic, and proceed into ongoing maintenance?

This talk will illustrate how the notions of Outcomes and Quality Attenuation (as captured by ‘∆Q’) are being used to both frame the necessary notions and provide a basis for assuring the refinement and reification of such systems, from initial concept to operational infrastructure.

You can download the slides from here.