Arthur Breitman on 'Data Availability: Why You Need It and How to Get It'

Tezos early architect and co-founder, Arthur Breitman, discusses data availability – what it is, why it’s important, and the different ways to achieve it.


1,700 words, 9 minute read

For any blockchain network, data availability is a critical issue. In his video presentation ‘Data Availability: Why You Need It and How to Get It’, Tezos early architect and co-founder Arthur Breitman explains what data availability is and why it’s so important, what some of the difficulties are in achieving it in a decentralized way, and finally touches on some of the different ways to achieve it.

Specifically, the concept of data availability and its importance is explored with respect to layer 2 architecture recently implemented on Tezos, known as ‘rollups.’ In the course of explaining what data availability is, Breitman highlights its importance by touching on the fundamental relationship between data availability and different kinds of layer 2 rollups, sketching along the way certain scenarios where a failure of data availability can lead to undesirable consequences. There are different ways to achieve data availability in a decentralized, scalable way, and in particular, Breitman discusses data availability committees and data availability sampling.

Below you’ll find some extracts from Arthur Breitman’s presentation, and you can watch the full video at the top of this page.

Introducing data availability:

Data availability. If you’ve been reading about rollups, it’s probably something you’ve come across, but it’s not always clear what it is.

In the context of blockchains, making data available is one of several important functions performed by block producers and miners:

If you take a traditional blockchain like Bitcoin for example, you’ll find that the miners and validators perform several important functions. One is ordering transactions, and that’s really key for ensuring that there’s no double spend […] another is executing transactions, and that’s mostly an optimization - it’s just a convenience […] and the third thing that they do, which is super important, is making the data available.

For some, it may sound strange that one of the crucial functions of a validator is to make data available. In theory, block producers could agree to post all and only the unique thumbprints (hashes) of valid transactions, and we could then verify if a given transaction were included by checking to see if the thumbprint were included. But in that case, others wouldn’t yet know anything about the content of the transaction. This poses a problem because ‘if you were to do that, other people wouldn’t be able to validate the chain.’

In other words, it’s essential to the concept of validation that data be made available, such as the different balances of user accounts. This speaks to one of the most popular catchphrases in crypto: don’t trust, verify.

As an honest party, you want to be able to look at the blocks and verify for yourself that it is valid. You want to also be able to make transactions - if you don’t know the state of the chain, if you don’t know what are the different balances, you can’t really do that. So it’s a bit abstract, but if you start looking into rollups, the concept becomes a lot more obvious.
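To make the hash point concrete, here’s a minimal Python sketch (the transactions and data structures are hypothetical, not any real chain’s format): posting only transaction hashes lets a party who already holds a transaction check inclusion, but leaves everyone else unable to validate.

```python
import hashlib

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Block producers post only the hashes ("thumbprints") of transactions.
txs = [b"alice->bob:5", b"carol->dan:2"]
posted_hashes = [h(tx) for tx in txs]

# A party that already holds a transaction can check it was included:
assert h(b"alice->bob:5") in posted_hashes

# But a node that only ever saw the hashes holds no transaction bodies,
# so it cannot recompute balances or validate the chain:
bodies_this_node_received = []
can_validate = len(bodies_this_node_received) == len(txs)
print("can validate:", can_validate)  # can validate: False
```

The hash proves *inclusion* to someone who already has the data; it reveals nothing to anyone who doesn’t.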

How optimistic rollups work:

Through its native on-chain governance process, the Tezos blockchain recently implemented layer 2 architecture that enables so-called ‘rollups.’ Rollups help to scale and increase transaction throughput, lower fees, and overall improve user experience in certain respects. There are different kinds of rollups, including optimistic and validity rollups.

The way a rollup works usually, is you put in a transaction on a layer 1, but it’s not going to be executed, at least not by the validators, not by the block producers. It’s going to be executed by different parties, those who are rollup node operators […] They’re going to look at these transactions, they’re going to monitor the blockchain […] And they execute them on a separate state called the rollup states. And then depending on the type of rollups, they will make assertions about it. They will go to the main chain and say, “I assert that this is the current state of the rollup.” And the real reason they do this assertion is so that you can transfer assets from the blockchain to the rollup and back and forth.
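The flow described above can be sketched as follows, with a toy account-balance state and a hash of the serialized state standing in for the commitment (an illustrative model, not Tezos’s actual rollup semantics):

```python
import hashlib
import json

def commitment(state: dict) -> str:
    # Hash of the canonically serialized rollup state.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def apply_tx(state: dict, tx) -> dict:
    sender, receiver, amount = tx
    new = dict(state)
    new[sender] = new.get(sender, 0) - amount
    new[receiver] = new.get(receiver, 0) + amount
    return new

# Transactions are posted on layer 1, but executed by a rollup node
# operator against a separate state: the rollup state.
l1_posted_txs = [("alice", "bob", 5), ("bob", "carol", 2)]

rollup_state = {"alice": 10}
for tx in l1_posted_txs:
    rollup_state = apply_tx(rollup_state, tx)

# The operator then asserts the resulting state on the main chain:
print("rollup state:", rollup_state)
print("asserted commitment:", commitment(rollup_state)[:16], "...")
```

The layer 1 never executes the transactions itself; it only sees the posted data and the operator’s assertion about the resulting state.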

In the case of an optimistic rollup, anyone can post any commitment. You can come in and say, “Oh yeah, sure. That’s the state of the rollup.” But if you’re lying, someone can come and prove that you’re lying […] So long as you have one honest rollup validator who can come in and show that people are trying to cheat, then the cheaters will lose money and the rollup will proceed.
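Under the same toy model, the fraud-proof idea looks like this: because the transaction data is available, any honest party can replay it from the genesis state and dispute a false commitment (an illustrative sketch, not a real refutation protocol):

```python
import hashlib
import json

def commit(state: dict) -> str:
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()

def replay(genesis: dict, txs) -> dict:
    state = dict(genesis)
    for sender, receiver, amount in txs:
        state[sender] = state.get(sender, 0) - amount
        state[receiver] = state.get(receiver, 0) + amount
    return state

genesis = {"alice": 10}
txs = [("alice", "bob", 5)]            # the data everyone can download

honest = commit(replay(genesis, txs))           # commits to {"alice": 5, "bob": 5}
cheater = commit({"alice": 0, "mallory": 10})   # a false assertion

# Because the transaction data is available, anyone can replay it
# and show which commitment matches the true state:
print("honest commitment checks out:", honest == commit(replay(genesis, txs)))
print("cheater's commitment disputed:", cheater != commit(replay(genesis, txs)))
```

Note that the dispute is only possible because the replaying party has the transaction data - which is exactly why a failure of data availability is so dangerous for optimistic rollups.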

Validity/ZK rollups:

There are other kinds of rollups called ‘validity rollups’. As opposed to the mechanism of verification in optimistic rollups, ‘in validity rollups, sometimes called ZK rollups, it’s different - you don’t even rely on one honest party, the assertions have to carry cryptographic proof that they’re correct.’

On this point, think back to what was said above about the possibility of validators posting all and only the unique hashes/thumbprints of transactions - in that case, perhaps ‘when you make your proof on a chain, you could prove that, yes, look, here is a transaction and that’s a state. The problem is, what if only [a few parties] know about the transaction and they don’t tell anyone else about it?’

Why is data availability important?

So you could have a valid commitment, but no one would be able to prove that it’s valid aside from the people who have the transaction, or worse, in the case of an optimistic rollup, you could have an invalid commitment, but no one can prove it’s invalid because they’re missing some of the inputs, they’re missing some of the data. So in an optimistic rollup, if you had a failure of data availability, what it means is someone can ‘pwn’ the rollup. You can extract everything from the rollup, you can prove anything that’s false because no one will prove you wrong. So data availability is super important.

In the case of a validity rollup, it’s a little better but it’s not a lot better. […] If the data isn’t available, they can’t directly extract funds from the rollup, they can’t make invalid statements because the statements are proven with cryptography. However, they can lock out everyone. They can say, “Look, you’ll never get your assets out of the rollup unless you pay us X much.” So you can hold funds hostage if data availability is compromised.

The Gold Standard - Data availability sampling

There are certain difficulties with achieving data availability in scalable, decentralized ways. But there are ways by which it can be done:

The gold standard for doing data availability in a scalable way, is data availability sampling […] And the way data availability sampling works is instead of every validator downloading everything in every block, the blocks will contain a commitment to the data, but the data will be downloaded by many different people. So people will download a random sample of the data.

The idea is that everyone downloads a little bit of data so as to approximate with high probability that everything checks out. On the face of it, this may seem to be a watertight solution, however ‘the problem is you don’t know that all the data is there, maybe a tiny little bit is missing.’

One solution is to ‘turn the data into an error-correcting code. So you take the data, you pair it with more information that lets you detect and repair errors. And if you do that, then you go from a position where you need 100% of the data to be available to everyone […] to a situation where you only need 50% of the data to be available […] If people download random samples of the data and they always get it, then with very high probability 50% of it is available.’
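A rough sense of why sampling works, under the simplifying assumption that each sampled chunk is independently available (real schemes sample distinct chunks of an erasure-coded block): if an adversary withholds just over half of the chunks, so that reconstruction fails, the chance that a handful of random samples all succeed anyway is vanishingly small. The constants below are arbitrary:

```python
import random

SAMPLES = 30            # chunks each light client samples per block
AVAILABLE = 0.49        # adversary withholds just over half the chunks

def sampling_passes(available_fraction: float, rng: random.Random) -> bool:
    # Toy model: each sampled chunk is independently present with the
    # given probability.
    return all(rng.random() < available_fraction for _ in range(SAMPLES))

# Probability that all 30 samples succeed even though reconstruction
# would fail (less than 50% of the erasure-coded data is available):
p_fool = AVAILABLE ** SAMPLES
print(f"chance of being fooled: {p_fool:.2e}")  # chance of being fooled: 5.08e-10

rng = random.Random(0)
trials = 10_000
fooled = sum(sampling_passes(AVAILABLE, rng) for _ in range(trials))
print(f"fooled in {fooled} of {trials} trials")
```

With 30 samples, the chance of wrongly concluding that an unreconstructible block is available is on the order of one in a billion - and many independent samplers drive it lower still.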

Data availability committees: security tradeoffs

So when it comes to scaling data availability in a secure and decentralized way, data sampling is one method. But there is another ‘intermediate solution’ - data availability committees.

The concept is that instead of putting your data on the L1, which is slow because it has all these different validators, you’re going to go to a few big data providers and you’ll put your data over there. And maybe also they’ll just focus on your rollup, so it will give you some form of horizontal scaling.

So here the idea is like: you’ll get a multi-signature maybe from a committee and the committee will tell you, “Oh yeah. Yeah, yeah, we’ve seen the data. It’s all there. It’s all available. You don’t need to worry about it.” And if your committee is honest, then it’s fine, because honest rollup participants will be able to go to them to get the data, and then to include it in proofs if they need to. But what if it’s dishonest?

One way to mitigate this concern would be to set a very high standard for what counts as available data, such as 100%. In that case, ‘all it takes is one honest member, and you will not have unavailable data, which is a pretty reasonable property. Of course, the flip side is that it takes only one dishonest member to cause a big headache, because a dishonest member could come and say, “Oh yeah, data is not available.”’ Luckily, however, one dishonest member of the committee can’t actually steal money; they just create a headache by making everyone withdraw their funds from the rollup and move them to another.
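A minimal sketch of the unanimity rule described above (a hypothetical committee model, not any deployed scheme): with a 100% threshold, one honest member suffices to block attestation of unavailable data, while one dishonest member can stall attestation of data that really is available.

```python
# Unanimity rule: data counts as attested only if every committee
# member signs off on having seen it.
def data_attested(signatures: dict) -> bool:
    return all(signatures.values())

# Safety: one honest member (m3) who never saw the data refuses to sign,
# so unavailable data is never attested - all it takes is one honest member.
sigs_missing_data = {"m1": True, "m2": True, "m3": False}
print("unavailable data attested:", data_attested(sigs_missing_data))  # False

# Flip side: one dishonest member (m2) withholds a signature and stalls
# the rollup even though the data *is* available - a headache, not theft.
sigs_dishonest = {"m1": True, "m2": False, "m3": True}
print("available data attested:", data_attested(sigs_dishonest))  # False
```

The same rule that makes one honest member enough for safety makes one dishonest member enough to hurt liveness - the tradeoff the heading refers to.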

In conclusion:

Data availability is a crucial issue for blockchain networks. This is especially true in the case of Tezos and its new layer 2 rollup infrastructure. There are certain difficulties with achieving data availability in a scalable and decentralized way, but data availability sampling and data availability committees provide at least two ways to do it.