Skip to main content

49 posts tagged with "performance-tracing"

View All Tags

· 3 min read
Michael Karg
  • Benchmarking: We performed benchmarks for the new tracing system, and started benchmarking for varying GHC RTS configurations.
  • New tracing: Backwards compatibility with legacy tracer nomenclature has been merged; we're currently improving documentation and creating setup guidelines for end users.
  • Analysis pipeline: Our refined metrics PR has been merged. We're working on including variance analysis to our reporting machinery.
  • Infrastructure: Support for Conway genesis in our workbench has been merged. At the moment, we're laying the groundwork for enabling GHC 9.2 in our benchmarks.
  • Open Sourcing: The API demo has reached prototype phase; work on documenting the API and providing exemplifying use cases is ongoing.
  • Nomad backend: The nomad-exec based task driver has been merged. The backend has been equipped with the capability for genesis distribution via S3 bucket.

Performance

New tracing

The new tracing system has undergone various benchmarking runs with variance analysis, and comparison to a baseline using legacy tracing. We could observe a slight shift in the resource usage profile from memory to CPU, but no regressions in block propagation metrics. Variance was observed to be notably smaller, which gives the new system a much better predictability. From this angle, we consider the new system fit for production use.

GHC RTS parametrization

We're currently prerforming various runs on the cluster to explore the space of different GHC RTS settings for running nodes. The main focus lies on different configurations for the garbage collector, as well as increasing the number of CPU cores the node may use.

Open Sourcing

Our API demo has reached prototype stage, and operates on live data from the production database. Making use of the experience gained, we're refining version 1 of the API to provide optimized usability, and creating documentation that both is descriptive of the API endpoints, and focuses on practical, exemplary use cases.

Tracing

For the new tracing system we're currently undertaking an effort to multi-layered documentation: a condensed version, as well as a setup guide with pragmatical focus, will be provided alongside the in-depth documentation. This effort should cater to different audiences, and provide distinct entry points for users of the new system, depending on their wants and needs.

Infrastructure & Analysis

General

Having included Conway genesis in the workbench, as a next step in future-proofing out benchmarking infrastructure, we're laying the foundation for a switch in compiler version to GHC 9.2. Additionally, we considered variance analysis of our runs to merit inclusion into our reporting pipeling - which will increase confidence in specific metrics.

Nomad backend

We have implemented an appropriate mechanism for genesis distribution: Only after a benchmarking cluster has been deployed successfully, genesis is patched and uploaded to an AWS S3 bucket for the nodes to retrieve - as a final step before initiating the actual run. We're confident that this deferred approach will provide clearer evidence for genesis patches, as well as minimize startup time for all runs by factoring in deployment re-tries.

· 2 min read
Michael Karg
  • Release benchmarking: We again performed benchmarks for the next 1.35.6 release candidate.
  • New tracing: Backwards compatibility with legacy tracer nomenclature is being implemented to smoothe the transition for end users.
  • Analysis pipeline: A major refinement of benchmarking metrics has been realized, along with a structural improvementents regarding metrics denomination.
  • Open Sourcing: Work on going live with our benchmarking data has begun, as well as creating an API demo and documentation.
  • Nomad backend: The backend was adapted to a major refactoring in workbench and is being equipped with a nomad-exec based task driver.

Performance

1.35.6 release

Benchmarking the second release candidate for 1.35.6 could again attest to a perfectly clean bill of health.

Analysis pipeline

Our analysis pipeline has seen an introduction of additional metrics, especially when focusing on the block producing node. They allow us to better differentiate the timing of ledger ticking and mempool snapshotting in the forging loop - a feature that promises much deeper insight into UTxO-HD performance. Additionally, a restructuring of metrics names has been undertaken along with improvements in their data dictionary; a measure that will make benchmarking data more easily accessible.

Open Sourcing

As a prerequisite for going live with our benchmarking data, we're currently working on consolidation of existing analyses, such as to provide a common foundation when accessing them externally. Additionally, we've begun working on a small visualization demo and interactive API documentation. Those will enable third parties to make use of that data much more easily, by having reliable guidelines and a working example.

Tracing

The new tracing system is being outfitted with a comprehensive mapping of its structure to the legacy tracer nomenclature. This feature will make the switch to the new system as smooth as possible for end users, allowing them to gradually adapt their tooling without breaking any functionality in the process.

Infrastructure

Nomad backend

The Nomad backend was adapted to the latest major refactoring in workbench. Work was done on making stateful Nomad clients more autonomous, which will greatly facilitate any automation building on that backend. A task driver based on nomad-exec is currently being implemented.

· 3 min read
Michael Karg
  • SECP benchmarking: we concluded our benchmarking runs and analyses of the new SECP primitives for the Valentine hard-fork.
  • Release benchmarking: we performed a round of benchmarks for the 1.35.6 release.
  • UTxO-HD benchmarking: we performed first runs for UTxO-HD and are currently refining the benchmarking setup.
  • New tracing: for better accessibility, the new tracing system is being outfitted with introspective capabilities.
  • Infrastructure: with the Nomad cloud workbench backend we were able to perform our first test cluster runs successfully on SRE infrastructure.
  • Infrastructure: the initial NixOps workbench backend has been completed; a PR containing this work, along with many quality-of-life improvements of our tooling, got merged.

Performance

SECP

  1. For SECP, we settled on a fixed tx count per block, while simultaneously spending as much as possible of the block budget. Thus we were able to minimize the impact of per-SC-call overhead.
  2. The final runs were performed with various fractions, e.g. half, of the current block budget to ascertain how these workloads would fare compared to a value-only run.
  3. The SECP machinery and profiles are currently being generalized into an approach to aim for very specific aspects of a smart contract for benchmarking.

UTxO-HD

  1. After analyzing initial UTxO-HD runs, it turned out that mempool snapshotting had to be throttled for benchmarking; it affects a lock that UTxO-HD had to introduce into the forging loop.
  2. We're currently adapting the benchmark setup to that, and will then perform a new combination of baseline and UTxO-HD runs.

1.35.6 release

Benchmarking the 1.35.6 release candidate could attest to a perfectly clean bill of health.

Tracing

Work on the new tracing system's introspective capabilites is ongoing: Immediate use cases of the new API include being able to statically validate generated tracer documentation, as well as providing information of a specific tracing setup in the node via traces themselves. These features will make the new system both more robust, and more accessible.

Infrastructure

Nomad backend

  1. Work on the cloud deployment capability of the Nomad workbench backend continued; for testing we can automate multiple Nomad clients.
  2. Locality assumptions were removed and job monitoring was refactored.
  3. To facilitate directly-executable derivations, Nomad Job specification files are now self contained with GitHub references and configs needed to run a cluster.
  4. We're currently evaluating different options for genesis distribution in said cluster.

NixOps backend

The NixOps workbench backend has reached an initial functional stage. Consequently, the relevant PR was merged. It also contained many improvements to our analysis tooling, as well as a structural overhaul of workbench itself. We consider this an important step of future-proofing our benchmarking machinery.

· 3 min read
Serge Kosyrev

High level summary

  1. SECP benchmarking: we ran several rounds of SECP benchmarks, refining the benchmark setup as we discovered the properties of the system. After formulating an initial suggested change to the protocol parameters, we're currently running what we consider the final benchmark, to validate the underlying assumptions.
  2. Release benchmarking: we've performed a round of benchmarks for the hotfix 1.35 release update and initiated the 1.35.6 benchmarks.
  3. New tracing: the improvement in the tracing API, with the underlying restructuring, was completed and merged into the node.
  4. New tracing: before going live, we're performing the documentation update, as well as reworking the end user migration guide.
  5. Open sourcing: the benchmarking data publishing has been completed and deployed. After populating it with relevant benchmark data and providing basic user documentation we can go live.
  6. Infrastructure: the cloud workbench backend is progressing well, the networking aspects of multi-region deployment are currently being worked on.
  7. Infrastructure: the NixOps workbench backend is still being worked on, as part of migration from cardano-ops and benchmarking infrastructure unification.

Performance

We are approaching the end of a chain of SECP benchmarks, as we gradually eliminated deficiencies in the setup as we were discovering them and answering newly appearing questions:

  • we improved the tx/block filling strategy in the generator, to maximise the per-block utilisation of resources and so better approximate the worst-case,
  • after a discovery of what looked like significant per-SC-call overhead, we again tweaked the the tx/block filling strategy,
  • finally, we're redoing all benchmarks together with a value-only run against the backdrop of Mainnet-sized datasets, to balance the suggested adjustment. That also ran into difficulties wrt. limitations of our benchmarking hardware.

In addition, we started benchmarks of the 1.35.6 release.

Tracing

A rework of the new tracing system's internals and API was merged. It extended the system with introspection, which enabled a range of improvements, some of which were implemented along the way.

Specifically, we were able to completely short-cut processing of messages generated by the tracers that were made provably ineffective by current tracing configuration. Further, now ongoing work enabled by the introspection facilities, includes static validation of documentation and enhanced node state reporting.

Infrastructure

On the opensourcing/transparency front, the benchmark data publishing machinery was finally fully assembled and put online. As resources permit, we'll work on populating it with benchmarking data, preparing basic documentation and engaging the stakeholders.

The work on the cloud deployment capability of the Nomad workbench backend continued with focus on setting up inter-node networking and removal of locality assumptions. A major step besides those, was completion of a switch-over to the directly-executable derivations, which eliminate the need for creation and distribution of images -- thereby increasing the speed of deployment.

The Nixops workbench backend progressed steadily, reaching minimal deployment capability. The remaining parts are proper shared configuration generation, and porting of the run control functionality from cardano-ops.

· 2 min read
Serge Kosyrev

High level summary

Since our last update, we focused on infrastructure work: benchmark enablement, tracing system, benchmark environment merge and open source support:

  1. SECP benchmarking enablement is underway: enabling SECP runs in our cardano-ops benchmarking environment is still in progress.
  2. The new tracing system: the improved API of the new tracing system was implemented, and we're now porting the tracing integration layer over.
  3. Infrastructure: the mainnet protocol parameter history is now encoded in the workbench profile machinery at epoch-level granularity, which gives us a systematic approach towards description of past and future benchmarks.
  4. New benchmark deployment infrastructure: we've made some progress on Nomad deployment backend, shared by both of the data publishing and benchmarking needs.
  5. Legacy benchmarking: we've started merging the legacy benchmark deployment infrastructure into the workbench.
  6. Open sourcing: the benchmarking data publishing tool was adapted to the Nomad execution environment provided by SRE, pending final deployment.

Performance

The AWS cluster infrastructure necessary for SECP benchmarking is still being worked on.

Tracing

The improved tracing internals were implemented, and we're now into the phase of updating the tracing integration, which is also mostly done.

Infrastructure

Thanks to collaboration with the DevX team, we have identified and pursued a design that would enable our Nomad workbench backend to execute deployments of both the benchmarking cluster and our data publishing components.

On the benchmark parametrisation front, we have eliminated a long-standing weakness in the way we were specifying the protocol parameters. We now have a very clear and granular method to keep track of protocol parameter evolution -- e.g. the mainnet history changes are now tracked at epoch granularity, while also allowing for systematically described change overlays. This makes the benchmark profile definition much more clear and robust against mistakes.

We also started a merge of the legacy benchmarking environment (based on cardano-ops) into the workbench. The separation between environments was too costly, causing us to reimplement any benchmarking change twice -- first, during development, in the workbench, then in cardano-ops. In addition, maintenance of compatibility code was incurring additional costs, slowing benchmark data analysis development. Once this merge is complete, this will allow us to sharply cut the benchmark development cycle and overheads.