58 posts tagged with "performance-tracing"


Performance & tracing update

March 8, 2023 · 2 min read

Michael Karg

Performance and Tracing Team Lead

Release benchmarking: We again performed benchmarks for the next 1.35.6 release candidate.
New tracing: Backwards compatibility with legacy tracer nomenclature is being implemented to smooth the transition for end users.
Analysis pipeline: A major refinement of benchmarking metrics has been realized, along with structural improvements to metric naming.
Open Sourcing: Work has begun on going live with our benchmarking data, as well as on an API demo and documentation.
Nomad backend: The backend was adapted to a major refactoring in workbench and is being equipped with a nomad-exec based task driver.

Performance

1.35.6 release

Benchmarking the second release candidate for 1.35.6 again attested to a perfectly clean bill of health.

Analysis pipeline

Our analysis pipeline has gained additional metrics, especially for the block-producing node. They allow us to better differentiate the timing of ledger ticking and mempool snapshotting in the forging loop, a feature that promises much deeper insight into UTxO-HD performance. Additionally, metric names have been restructured and their data dictionary improved, which will make benchmarking data more easily accessible.
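To make the idea of separating these two phases concrete, here is a minimal Python sketch that pairs start and end events per slot and averages the duration of each phase. The event names, the `Event` record and the sample timestamps are hypothetical; the node's actual trace messages and our real analysis pipeline look different.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical event names; the node's actual trace messages differ.
TICK_START, TICK_END = "LedgerTickStart", "LedgerTickEnd"
SNAP_START, SNAP_END = "MempoolSnapshotStart", "MempoolSnapshotEnd"


@dataclass
class Event:
    slot: int
    time_s: float  # timestamp in seconds
    name: str


def phase_durations(events, start, end):
    """Return {slot: duration_s} for the phase delimited by start/end events."""
    starts, durations = {}, {}
    for ev in events:
        if ev.name == start:
            starts[ev.slot] = ev.time_s
        elif ev.name == end and ev.slot in starts:
            durations[ev.slot] = ev.time_s - starts.pop(ev.slot)
    return durations


# Made-up sample: two forging-loop iterations.
events = [
    Event(1, 0.000, TICK_START), Event(1, 0.004, TICK_END),
    Event(1, 0.004, SNAP_START), Event(1, 0.019, SNAP_END),
    Event(2, 1.000, TICK_START), Event(2, 1.006, TICK_END),
    Event(2, 1.006, SNAP_START), Event(2, 1.017, SNAP_END),
]
tick = phase_durations(events, TICK_START, TICK_END)
snap = phase_durations(events, SNAP_START, SNAP_END)
print(f"mean tick {mean(tick.values()) * 1000:.1f} ms, "
      f"mean snapshot {mean(snap.values()) * 1000:.1f} ms")
```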

Open Sourcing

As a prerequisite for going live with our benchmarking data, we're currently consolidating existing analyses so as to provide a common foundation for accessing them externally. Additionally, we've begun working on a small visualization demo and interactive API documentation. With reliable guidelines and a working example, third parties will be able to make use of that data much more easily.

Tracing

The new tracing system is being outfitted with a comprehensive mapping of its structure to the legacy tracer nomenclature. This feature will make the switch to the new system as smooth as possible for end users, allowing them to gradually adapt their tooling without breaking any functionality in the process.

Infrastructure

Nomad backend

The Nomad backend was adapted to the latest major refactoring in workbench. Work was done on making stateful Nomad clients more autonomous, which will greatly facilitate any automation building on that backend. A task driver based on nomad-exec is currently being implemented.

Performance & tracing update

February 23, 2023 · 3 min read

Michael Karg

Performance and Tracing Team Lead

SECP benchmarking: we concluded our benchmarking runs and analyses of the new SECP primitives for the Valentine hard-fork.
Release benchmarking: we performed a round of benchmarks for the 1.35.6 release.
UTxO-HD benchmarking: we performed first runs for UTxO-HD and are currently refining the benchmarking setup.
New tracing: for better accessibility, the new tracing system is being outfitted with introspective capabilities.
Infrastructure: with the Nomad cloud workbench backend we were able to perform our first test cluster runs successfully on SRE infrastructure.
Infrastructure: the initial NixOps workbench backend has been completed; a PR containing this work, along with many quality-of-life improvements of our tooling, got merged.

Performance

SECP

For SECP, we settled on a fixed tx count per block while using as much of the block budget as possible. This minimized the impact of per-SC-call overhead.
The final runs were performed with various fractions of the current block budget (e.g. half) to ascertain how these workloads would fare compared to a value-only run.
The SECP machinery and profiles are currently being generalized into an approach for targeting very specific aspects of a smart contract in benchmarks.
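The budget arithmetic behind a fixed tx count per block can be sketched as follows: a chosen fraction of the block's execution budget is split evenly across the transactions, and the number of script calls that fit into each transaction follows from the per-call cost. This is a hypothetical illustration; `Budget`, `per_tx_budget` and `calls_per_tx` are made-up names, and the numbers are round values, not the actual protocol parameters or SECP costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    steps: int  # CPU execution units
    mem: int    # memory execution units


def per_tx_budget(block: Budget, tx_count: int, fraction: float = 1.0) -> Budget:
    """Split a fraction of the block budget evenly across a fixed number of txs."""
    if tx_count <= 0 or not 0 < fraction <= 1:
        raise ValueError("tx_count must be positive and fraction in (0, 1]")
    return Budget(int(block.steps * fraction) // tx_count,
                  int(block.mem * fraction) // tx_count)


def calls_per_tx(tx: Budget, call: Budget) -> int:
    """How many identical script calls fit into one tx's budget."""
    return min(tx.steps // call.steps, tx.mem // call.mem)


# Illustrative numbers only: half of a block budget, four txs per block.
block = Budget(steps=20_000_000_000, mem=60_000_000)
tx = per_tx_budget(block, tx_count=4, fraction=0.5)
n = calls_per_tx(tx, Budget(steps=200_000_000, mem=1_000_000))
```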

UTxO-HD

After analyzing initial UTxO-HD runs, it turned out that mempool snapshotting had to be throttled for benchmarking; it affects a lock that UTxO-HD had to introduce into the forging loop.
We're currently adapting the benchmark setup to that, and will then perform a new combination of baseline and UTxO-HD runs.
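Throttling in general means allowing an action at most once per interval. The Python sketch below shows that generic idea only; it is not the node's mechanism (which concerns the lock UTxO-HD introduced into the forging loop), and the `Throttle` class and its interval are hypothetical. The injectable clock keeps it deterministic for testing.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Allow an action at most once per `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self._last: Optional[float] = None

    def try_run(self, action: Callable[[], None]) -> bool:
        now = self.clock()
        if self._last is not None and now - self._last < self.min_interval_s:
            return False  # skipped: too soon after the previous run
        self._last = now
        action()
        return True


# Usage with a fake clock: snapshots requested at t = 0.0, 0.3 and 1.2 s.
ticks = iter([0.0, 0.3, 1.2])
throttle = Throttle(1.0, clock=lambda: next(ticks))
taken = []
results = [throttle.try_run(lambda: taken.append("snapshot")) for _ in range(3)]
```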

1.35.6 release

Benchmarking the 1.35.6 release candidate attested to a perfectly clean bill of health.

Tracing

Work on the new tracing system's introspective capabilities is ongoing. Immediate use cases of the new API include statically validating the generated tracer documentation, as well as reporting information about a specific tracing setup in the node via traces themselves. These features will make the new system both more robust and more accessible.

Infrastructure

Nomad backend

Work on the cloud deployment capability of the Nomad workbench backend continued; for testing, we can automate multiple Nomad clients.
Locality assumptions were removed and job monitoring was refactored.
To facilitate directly-executable derivations, Nomad job specification files are now self-contained, with the GitHub references and configs needed to run a cluster.
We're currently evaluating different options for genesis distribution in said cluster.

NixOps backend

The NixOps workbench backend has reached an initial functional stage, and the relevant PR was merged. It also contained many improvements to our analysis tooling, as well as a structural overhaul of the workbench itself. We consider this an important step towards future-proofing our benchmarking machinery.

Performance & tracing update

February 8, 2023 · 3 min read

Serge Kosyrev

Performance and Tracing Team Lead

High level summary

SECP benchmarking: we ran several rounds of SECP benchmarks, refining the benchmark setup as we discovered the properties of the system. After formulating an initial suggested change to the protocol parameters, we're currently running what we consider the final benchmark, to validate the underlying assumptions.
Release benchmarking: we've performed a round of benchmarks for the hotfix 1.35 release update and initiated the 1.35.6 benchmarks.
New tracing: the improvement in the tracing API, with the underlying restructuring, was completed and merged into the node.
New tracing: before going live, we're performing the documentation update, as well as reworking the end user migration guide.
Open sourcing: the benchmarking data publishing machinery has been completed and deployed. Once it is populated with relevant benchmark data and basic user documentation is in place, we can go live.
Infrastructure: the cloud workbench backend is progressing well, the networking aspects of multi-region deployment are currently being worked on.
Infrastructure: the NixOps workbench backend is still being worked on, as part of migration from cardano-ops and benchmarking infrastructure unification.

Performance

We are approaching the end of a chain of SECP benchmarks, in which we gradually eliminated deficiencies in the setup as we discovered them, and answered newly arising questions:

we improved the tx/block filling strategy in the generator to maximise per-block resource utilisation, and so better approximate the worst case,
after discovering what looked like significant per-SC-call overhead, we tweaked the tx/block filling strategy again,
finally, we're redoing all benchmarks together with a value-only run against the backdrop of Mainnet-sized datasets, to balance the suggested adjustment. This also ran into difficulties due to limitations of our benchmarking hardware.

In addition, we started benchmarks of the 1.35.6 release.

Tracing

A rework of the new tracing system's internals and API was merged. It extended the system with introspection, which enabled a range of improvements, some of which were implemented along the way.

Specifically, we were able to completely short-cut the processing of messages generated by tracers that the current tracing configuration provably renders ineffective. Further ongoing work enabled by the introspection facilities includes static validation of documentation and enhanced node state reporting.
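The short-cut can be illustrated with a small Python sketch: the tracer consults the configuration once, and when a message cannot possibly be emitted, it returns before the message is even rendered. This only mirrors the idea; the `Tracer` and `Severity` types, the namespaces and the rule that unconfigured namespaces are disabled are all assumptions, not the new tracing system's actual API.

```python
from enum import IntEnum
from typing import Callable, Dict, List


class Severity(IntEnum):
    DEBUG = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


class Tracer:
    """Tracer that decides up front, from the configuration, whether it can emit."""

    def __init__(self, namespace: str, config: Dict[str, Severity], sink: List[str]):
        self.namespace = namespace
        # Namespaces absent from the config are treated as disabled (an assumption).
        self.threshold = config.get(namespace)
        self.sink = sink

    def trace(self, severity: Severity, render: Callable[[], str]) -> None:
        # Short-cut: if the config makes this message ineffective, return
        # before `render` is called, so no formatting work is done at all.
        if self.threshold is None or severity < self.threshold:
            return
        self.sink.append(f"[{self.namespace}] {render()}")


# Usage with hypothetical namespaces: only ChainDB messages at INFO or above pass.
config = {"ChainDB": Severity.INFO}
sink: List[str] = []
rendered = []

def render() -> str:
    rendered.append(1)  # counts how often message construction actually happens
    return "block added"

Tracer("Mempool", config, sink).trace(Severity.ERROR, render)  # disabled namespace
chaindb = Tracer("ChainDB", config, sink)
chaindb.trace(Severity.DEBUG, render)  # below threshold
chaindb.trace(Severity.INFO, render)   # emitted
```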

Infrastructure

On the open sourcing/transparency front, the benchmark data publishing machinery was finally fully assembled and put online. As resources permit, we'll work on populating it with benchmarking data, preparing basic documentation and engaging the stakeholders.

Work on the cloud deployment capability of the Nomad workbench backend continued, with a focus on setting up inter-node networking and removing locality assumptions. Another major step was the completed switch-over to directly-executable derivations, which eliminate the need to create and distribute images, thereby speeding up deployment.

The NixOps workbench backend progressed steadily, reaching minimal deployment capability. The remaining parts are proper shared configuration generation and porting the run control functionality from cardano-ops.

Performance & tracing update

January 11, 2023 · 2 min read

Serge Kosyrev

Performance and Tracing Team Lead

High level summary

Since our last update, we focused on infrastructure work: benchmark enablement, tracing system, benchmark environment merge and open source support:

SECP benchmarking enablement is underway: enabling SECP runs in our cardano-ops benchmarking environment is still in progress.
The new tracing system: the improved API of the new tracing system was implemented, and we're now porting the tracing integration layer over.
Infrastructure: the mainnet protocol parameter history is now encoded in the workbench profile machinery at epoch-level granularity, which gives us a systematic approach towards description of past and future benchmarks.
New benchmark deployment infrastructure: we've made some progress on the Nomad deployment backend, which serves both data publishing and benchmarking needs.
Legacy benchmarking: we've started merging the legacy benchmark deployment infrastructure into the workbench.
Open sourcing: the benchmarking data publishing tool was adapted to the Nomad execution environment provided by SRE, pending final deployment.

Performance

The AWS cluster infrastructure necessary for SECP benchmarking is still being worked on.

Tracing

The improved tracing internals were implemented, and we're now into the phase of updating the tracing integration, which is also mostly done.

Infrastructure

Thanks to collaboration with the DevX team, we have identified and pursued a design that would enable our Nomad workbench backend to execute deployments of both the benchmarking cluster and our data publishing components.

On the benchmark parametrisation front, we have eliminated a long-standing weakness in the way we were specifying the protocol parameters. We now have a very clear and granular method to keep track of protocol parameter evolution -- e.g. the mainnet history changes are now tracked at epoch granularity, while also allowing for systematically described change overlays. This makes the benchmark profile definition much more clear and robust against mistakes.
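One way to picture epoch-granular tracking with change overlays is the Python sketch below: the history records only the parameters that change at each epoch, the state at a given epoch is obtained by replaying those changes, and benchmark-specific overlays are applied last. The `params_at` helper and the epochs and values in `HISTORY` are hypothetical and do not reproduce the real mainnet history or the workbench's profile format.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Params = Dict[str, int]

# Illustrative values only, not the real mainnet history.
HISTORY: List[Tuple[int, Params]] = [
    (200, {"maxBlockBodySize": 65536, "maxTxSize": 16384}),
    (300, {"maxBlockBodySize": 81920}),
    (350, {"maxBlockBodySize": 90112}),
]


def params_at(epoch: int, history: List[Tuple[int, Params]],
              overlays: Iterable[Params] = ()) -> Params:
    """Parameters in force at `epoch`, with benchmark overlays applied on top.

    `history` must be sorted by epoch; each entry lists only the changed fields.
    """
    idx = bisect_right([e for e, _ in history], epoch)
    if idx == 0:
        raise ValueError(f"no parameters recorded at or before epoch {epoch}")
    params: Params = {}
    for _, changes in history[:idx]:  # replay changes up to `epoch`
        params.update(changes)
    for overlay in overlays:          # then apply overlays in order
        params.update(overlay)
    return params
```

Keeping the history and the overlays separate means a benchmark profile can state "mainnet as of epoch N, plus these changes" rather than a full, hand-copied parameter set.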

We also started merging the legacy benchmarking environment (based on cardano-ops) into the workbench. Keeping the environments separate was too costly, forcing us to implement every benchmarking change twice: first in the workbench during development, then in cardano-ops. In addition, maintaining compatibility code incurred further costs and slowed down development of benchmark data analysis. Once complete, this merge will allow us to sharply cut the benchmark development cycle and its overheads.

Performance & tracing update

December 14, 2022 · 4 min read

Serge Kosyrev

Performance and Tracing Team Lead

High level summary

SECP benchmarking enablement was completed: we are now able to do local runs of the SECP workloads. The next step is to port this to the AWS environment.
A new workstream for Plutus cost modeling improvement: we've planned and started implementing the smart contract call overhead measurement machinery.
The new tracing system: after doing more benchmarking to address inter-run variance, we discovered that the regression, while still there, is small enough not to be release critical. Nevertheless, we're continuing with the further performance-oriented rework of the internals.
Infrastructure: a significant refactoring of the workbench internals was merged. We also started improving the way we specify ever-evolving protocol parameters, and began implementing comparative analysis of multi-run batches.
Open sourcing: our plans matured sufficiently so that we now expect actual deployment work to start this week.

Performance

The SECP benchmarking workload has been fully implemented in the workbench. We are now porting it over to AWS, and after that we'll be running the model cluster workload.

We've also started implementing mechanics for the upcoming investigation of the Plutus smart contract call overhead, which is expected to lead us to improved Plutus cost modeling.

Tracing

After the initial model-scale performance data alarmed us, we ran more benchmarks, among other things, and it turned out that increased inter-run variance was the culprit. The actual regression averages a barely noticeable 1-2% in key metrics, which is certainly not release critical.

To understand the impact of the new tracing system, we have to bear in mind the extra functionality it provides:

We are now processing all messages generated by the system, without making any shortcuts that the old system had to resort to. That causes the new tracing to do more work, but is more useful for all users and developers involved -- since it leads to a simple, non-confusing configuration. Incidentally, that's also the area where we are reworking the internals, to deduce and enable the optimisations that are implied by the particular configuration.
The new tracing system is benchmarked with remote tracing as the default backend, whereas the old one used a local, built-in log storage mechanism. In some sense this is the fair benchmark, because that's how we expect SPOs to set up tracing. It does, however, also cause the new system to do more work.

All that said, since we've established the performance of the new system to be adequate for the release, we won't be delaying it much further.

In addition, we're still pursuing our performance-enhancing rework of the new tracing internals.

Infrastructure

After implementing the multi-backend capability in the workbench, we got the opportunity to reassess the generic/backend boundaries and perform some long-awaited cleanups and simplifications in that area. The results of this work have been merged and will serve as a solid foundation for the CI and cloud backends.

Moving to analysis, we've also improved the provenance of the raw data by collecting more identification information and statistics about it. For example, we now record checksums, message frequencies and timestamps from the log files entering analysis. This will let us spot data anomalies earlier and lift that information directly into the generated reports.
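A minimal Python sketch of such provenance collection follows. It assumes a JSON-lines log in which each entry carries a message kind under `"msg"` and an ISO-8601 timestamp under `"at"`; these field names, the `log_provenance` helper and the sample messages are hypothetical and not the node's exact log schema.

```python
import hashlib
import json
from collections import Counter
from typing import Iterable


def log_provenance(lines: Iterable[bytes]) -> dict:
    """Checksum, per-message counts and timestamp range of a JSON-lines log."""
    digest = hashlib.sha256()
    counts: Counter = Counter()
    stamps = []
    for raw in lines:
        digest.update(raw)  # checksum covers the raw bytes, blank lines included
        if not raw.strip():
            continue
        entry = json.loads(raw)
        counts[entry.get("msg", "<unknown>")] += 1
        if "at" in entry:
            stamps.append(entry["at"])
    return {
        "sha256": digest.hexdigest(),
        "counts": dict(counts),
        # ISO-8601 strings in one uniform format sort chronologically.
        "first": min(stamps, default=None),
        "last": max(stamps, default=None),
    }


# Made-up sample lines; in practice, pass an open file: log_provenance(open(path, "rb")).
sample = [
    b'{"at": "2022-12-01T10:00:01Z", "msg": "BlockAdded"}\n',
    b'{"at": "2022-12-01T10:00:00Z", "msg": "BlockForged"}\n',
    b'\n',
    b'{"at": "2022-12-01T10:00:02Z", "msg": "BlockAdded"}\n',
]
info = log_provenance(sample)
```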

A new feature is now under implementation: comparative analysis of multi-run batches. Previously we only had automation for these two aspects separately, so we could either:

compare individual runs (used for different node configurations / versions), or
collect variance statistics from a batch of runs (used to enhance statistical confidence for a single node configuration / version).

Naturally, combining these two capabilities was a long-desired feature of our analysis pipeline.
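The combination can be sketched in Python: each batch is summarized per metric (mean and standard deviation across its runs), and a candidate batch is then compared against a baseline batch on the metrics they share. The `summarize` and `compare` helpers, the metric names and the values are hypothetical, and a real analysis pipeline would compute considerably more than this.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Batch = Dict[str, List[float]]  # metric name -> one value per run


def summarize(batch: Batch) -> Dict[str, Tuple[float, float]]:
    """Per-metric (mean, standard deviation) across the runs of one batch."""
    return {m: (mean(v), stdev(v) if len(v) > 1 else 0.0) for m, v in batch.items()}


def compare(baseline: Batch, candidate: Batch) -> Dict[str, dict]:
    """Relative change of each shared metric, alongside each batch's spread."""
    base, cand = summarize(baseline), summarize(candidate)
    report = {}
    for metric in base.keys() & cand.keys():
        (bm, bs), (cm, cs) = base[metric], cand[metric]
        report[metric] = {
            "delta_pct": 100 * (cm - bm) / bm,
            "baseline_sd": bs,
            "candidate_sd": cs,
        }
    return report


# Usage with made-up values: three runs per batch.
baseline = {"blockAdoption_ms": [100.0, 110.0, 90.0]}
candidate = {"blockAdoption_ms": [105.0, 115.0, 95.0], "other": [1.0, 2.0]}
report = compare(baseline, candidate)
```

Reporting the spread next to the relative change makes it visible when a difference between configurations is no larger than the run-to-run variance within each batch.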
