On the future of validity rollups on Tezos

Validity rollups currently suffer from a “scaling trilemma” that calls for a strategic shift in how they are integrated into Tezos. Instead of offering both optimistic and validity rollups, they will be combined in a single product.

Originally published at Nomadic Labs blog



TL;DR: Validity rollups currently suffer from a “scaling trilemma” that calls for a strategic shift in how they are integrated into Tezos. Instead of offering both optimistic and validity rollups, we will combine them in a single product.

Validity rollups (a.k.a. zk-rollups) are all the rage, and we would like to update the community on our work to bring this technology to Tezos.

As you may know from previous communications, our implementation is referred to as Epoxy, and an early version is enabled on Mondaynet. In order to test the system we have also developed epoxy-tx, a transactional rollup capable of handling Tezos’ tickets.

Epoxy is the result of two years of R&D by our cryptography team. It has given us great insights into the usefulness and applicability of Zero Knowledge Proofs, but also into the challenges and limitations involved. This has led us to draw some conclusions about this technology that may be surprising to some.

In short, we believe that validity technology won’t achieve general compatibility and high throughput at a reasonable cost for at least a few years. Not just on Tezos, but in general.

In this blog post we lay out our perspective on the current state of validity rollups, based on our research and conclusions. We also present an exciting strategic shift in how we aim to integrate this technology into Tezos in a way that counterbalances its intrinsic limitations.

Optimistic rollups vs. validity rollups

The scaling roadmap for Tezos, published in early 2022, focused on two rollup technologies: optimistic rollups and validity rollups.

The Tezos variant of optimistic rollups, Smart Rollups, launched with the Mumbai upgrade. With optimistic rollups, the work of rollup operators is treated as honest by default, hence the “optimistic” element. However, if there is foul play, another operator can refute the fraudulent commitment by posting a proof of the wrongdoing within a two-week period.

Because proofs are only produced in case of a dispute, hardware requirements for rollup operators are moderate even with high throughput and complex operations. You can run any virtual machine - on Tezos, Smart Rollups offer a WASM execution environment. And it takes just one honest operator to ensure the integrity of the rollup.

The main drawback is the two-week dispute period. Until this period has passed, transactions in the rollup can’t be considered final, and when withdrawing assets from the rollup to Layer 1, assets are only released after expiration of the dispute period. Additionally, all incoming transaction data needs to be kept publicly available for verification during this period, though solutions for this are underway.
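The finality rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the Smart Rollups implementation: the `Commitment` class and the `is_final` and `withdrawal_release_time` helpers are hypothetical names, and the only figure taken from the text is the two-week dispute period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Dispute window mentioned in the text; everything else here is illustrative.
DISPUTE_PERIOD = timedelta(weeks=2)


@dataclass
class Commitment:
    """Hypothetical record of a commitment posted by a rollup operator."""
    posted_at: datetime
    refuted: bool = False  # set to True if another operator wins a dispute


def withdrawal_release_time(commitment: Commitment) -> datetime:
    """Earliest time at which assets withdrawn to Layer 1 can be released."""
    return commitment.posted_at + DISPUTE_PERIOD


def is_final(commitment: Commitment, now: datetime) -> bool:
    """A commitment is final once the dispute period has passed unrefuted."""
    return (not commitment.refuted) and now >= withdrawal_release_time(commitment)
```

In this model a commitment posted on 1 January is not final on 14 January, becomes final on 15 January, and never becomes final if it was refuted in the meantime.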

Validity rollups work differently. A proof is posted with every commitment, guaranteeing the correctness of the operator’s work. The proof is small and can be easily verified by anyone. Hence, a validity rollup can be run by a single operator without concerns about the honesty of that operator. Foul play is automatically rejected by the protocol itself, and no other actor needs to keep track of the rollup to guarantee security.

Additionally, the technology enables the state and operations of the rollup to (optionally) be completely hidden from the main chain. They can even be hidden from the users of the service, e.g. each user knows their balance and transactions but not anyone else’s. Only rollup operators have full access to the data of the rollup in order to produce the proof.

Finally, providing a proof with every commitment means that validity rollups have instant finality. This greatly simplifies the security analysis and the implementation of applications when compared with optimistic rollups.

So, is this the solution that solves everything? Unfortunately not.

The challenges with validity rollups

Validity rollups currently have their own significant drawbacks, some of which we believe deserve more attention when the technology is promoted as a scaling solution.

These drawbacks come down to three interconnected challenges that we, and everyone else working on this technology, currently face.

Challenge #1: Proofs are very expensive. Creating SNARK proofs, which must be done with every commitment, requires a considerable amount of computing power. This makes running validity rollups very expensive and puts a limit on the complexity of operations that can be expressed (see challenge #2). For example, support for standard cryptography, such as ed25519 signatures (used for ‘tz1’ accounts) and Blake2b hashes, is currently difficult to achieve at high throughput. New cryptographic primitives for use in validity rollups are being developed, but adopting them in turn requires existing infrastructure to be adapted.

In our efforts to address this challenge, we have developed aPlonK, an advanced proving system tailored to Tezos, which uses a novel technique for efficient proof aggregation. Essentially, it reduces the proof size and verification time when multiple statements are proven in a batch. It contains a language to describe circuits, and a prover that enables proof generation to be distributed over a cluster of machines to achieve, in theory, unlimited scalability, the limiting factor being the data-center bill.

But regardless of computing power (and funds), even our best prover, fully optimized and parallelized, cannot go faster than an optimistic rollup operator, which by default does not produce any proof, but simply executes the program and posts a commitment. As long as there is at least one honest participant continuously validating rollup activity, security is ensured and in a much more cost-efficient way than using SNARK proofs.

Challenge #2: Limited compatibility. The ability to execute arbitrary code is a crucial feature for a Layer 2 solution. This not only enables compatibility with existing smart contracts programmed for blockchain ecosystems (such as EVM and Michelson), but also enables the Layer 2 solution to become a distributed backend for more conventional development environments.

However, validity rollups rely on so-called Succinct Non-interactive Arguments of Knowledge (SNARKs) for their proofs. For this to work, all statements must be translated into circuits - a set of mathematical equations that the proving systems can process. In effect, every smart contract or dApp must be ported into what is essentially another programming language.
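To give a feel for what "translating a statement into a circuit" means, here is a toy sketch in Python. It encodes the statement "I know x such that x^3 + x + 5 = 35" as four gates of the general PlonK form q_L·a + q_R·b + q_O·c + q_M·a·b + q_C = 0 and checks a witness against them. The field modulus, wire names and helper functions are assumptions made for illustration. This is not the circuit language of any particular proving system, and it only checks constraints: no proof is produced.

```python
# Toy PlonK-style constraint check. Illustrative only: no proof is generated.
P = 2**61 - 1  # prime modulus picked for the example (an assumption)

# Gate layout: (q_L, q_R, q_O, q_M, q_C, a, b, c), where a, b, c name wires.
# Each gate enforces q_L*a + q_R*b + q_O*c + q_M*a*b + q_C == 0 (mod P).
GATES = [
    (0, 0, -1, 1, 0, "x", "x", "v1"),     # v1 = x * x
    (0, 0, -1, 1, 0, "v1", "x", "v2"),    # v2 = v1 * x
    (1, 1, -1, 0, 0, "v2", "x", "v3"),    # v3 = v2 + x
    (1, 0, 0, 0, -30, "v3", "v3", "v3"),  # v3 = 30, i.e. x^3 + x + 5 = 35
]


def build_witness(x):
    """Compute every wire value from the secret input x."""
    v1 = x * x % P
    v2 = v1 * x % P
    v3 = (v2 + x) % P
    return {"x": x % P, "v1": v1, "v2": v2, "v3": v3}


def satisfied(witness):
    """Return True if the witness makes every gate equation hold."""
    for q_l, q_r, q_o, q_m, q_c, a, b, c in GATES:
        wa, wb, wc = witness[a], witness[b], witness[c]
        if (q_l * wa + q_r * wb + q_o * wc + q_m * wa * wb + q_c) % P != 0:
            return False
    return True
```

With x = 3 every gate evaluates to zero, while a wrong input such as x = 4 fails the last gate. Even this one-line statement needs four gates, which hints at why interpreting a full smart contract inside a circuit is costly.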

This also brings us back to challenge #1: current proving technology doesn’t support sufficiently advanced circuits to interpret the execution of existing smart contracts directly, if the cost of computation is to be kept reasonable. Hence, general compatibility is currently prohibitively expensive at the throughput necessary to make validity rollups relevant as scaling solutions. Again, this is true not just for Epoxy, but for validity tech in general.

Challenge #3: Fragmentation in tooling. Due to the use of circuits for the proofs, a whole new stack of tooling, SDKs, wallet integration and other kinds of infrastructure must be created. While this in itself may not be the biggest challenge, having parallel stacks for Layer 1, optimistic rollups and validity rollups introduces a level of fragmentation which we believe will become problematic. We believe there are better ways, which we will go into further below.

A (current) validity trilemma

The above can be illustrated as a “trilemma” of validity rollups. The three desired properties of validity rollups are high compatibility, high throughput, and a reasonable cost of operation.

Smart Rollups give you all three properties - at the cost of longer finality - while the current state of validity rollup tech allows you to only pick two.

If you want high compatibility and reasonable cost, throughput will be too low for practical purposes. If you go for high compatibility and high throughput, it will become incredibly expensive to run a rollup - we’re talking massive data centers. And if you prioritize reasonable cost of operation and high throughput, the complexity of operations that can be processed will be severely limited. For example, epoxy-tx, our rollup for Tezos tickets transactions, has high throughput and low cost relative to other validity designs, but is limited to, well, transactions.
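The "pick two" reasoning above can be written down as a small lookup. This is only a restatement of the paragraph in Python; the property labels and the `sacrificed` helper are hypothetical names chosen for this sketch, and the consequences are paraphrased from the text rather than measured.

```python
# The three desired properties, using short labels chosen for this sketch.
PROPERTIES = {"compatibility", "throughput", "cost"}

# What happens to the property that is NOT picked, paraphrased from the text.
CONSEQUENCE = {
    "throughput": "throughput too low for practical purposes",
    "cost": "very expensive to run (massive data centers)",
    "compatibility": "complexity of operations severely limited",
}


def sacrificed(picked):
    """Given the two properties you prioritize, return the one you give up."""
    picked = set(picked)
    if len(picked) != 2 or not picked <= PROPERTIES:
        raise ValueError("pick exactly two of: " + ", ".join(sorted(PROPERTIES)))
    (missing,) = PROPERTIES - picked
    return missing, CONSEQUENCE[missing]
```

For instance, `sacrificed({"throughput", "cost"})` returns `"compatibility"` with its consequence, which is the corner of the trilemma that epoxy-tx occupies.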

Of course, a lot of work is currently going into reducing the required computational resources. However, our own experiments and extensive review of current research into this by various projects in the industry lead us to conclude that this trilemma will remain relevant for at least a couple of years - possibly longer.

A brief overview of currently available validity rollups and their approach:

So, what about Epoxy?

The question is then: what does this conclusion mean for our work?

To answer that question, it is important to make clear what we have built. We have referred to our exploration into validity rollups on Tezos as Epoxy, but it actually consists of two parts: a prover and a connecting framework.

Our prover is called aPlonK and includes a language for describing circuits. The prover is the ‘engine’ of a validity rollup and by far the most important element. It is also the part we have spent the most resources developing. What we call Epoxy is the framework, the glue, that connects this proving system to the Tezos blockchain.

Based on our conclusions presented above, we have decided to go a different route than launching Epoxy as a product competing with Smart Rollups. Not because we don’t believe in a bright future for validity rollups - far from it. In fact, we are excited to be able to give a sneak peek into a strategic shift that we believe will benefit everyone using rollups on Tezos.

We are essentially taking the Epoxy prototype apart and re-purposing the engine and other parts in what we believe is a revolutionary new product.

The hybrid approach

In Smart Rollups, we already have a high-compatibility and high-throughput solution at a low cost, but with longer finality. Tooling is developing rapidly and soon vast Smart Rollup-based infrastructure will be built out on Tezos.

Launching validity rollups as a product competing with Smart Rollups, but with different tooling and the above-mentioned trade-offs, is not the best approach.

What is the better approach? Upgrading Smart Rollups with validity tech! Think instant finality for a higher fee. Or standard transactions with short finality, and longer finality for more complex operations. Or new confidentiality features.

This ‘hybrid’ approach has several advantages

Make no mistake: in the long term, we see a bright future for validity rollups. But our analysis tells us that the technology just isn’t there yet for them to be competitive with optimistic rollups. Again, this is not specific to Epoxy, but applies to validity rollups in general. We believe that a gradual implementation, resting on the solid foundation of Smart Rollups, is the right solution for Tezos for the years to come.

The hybrid design for Smart Rollups is ongoing R&D, but we will soon be able to release more details. We look forward to embarking on this exciting journey in cooperation with the Tezos ecosystem and community!