How does Celestia solve the data availability problem? 🧵
The data availability problem is concerned with whether data in the most recent block is hidden or available. This is important because by hiding block data, rollup operators can commit fraud or steal funds.
To prevent such an attack, the complete block data must be available for anyone to download.
A naive way to verify data availability is for full nodes to download the entire block, but this introduces high requirements for full nodes and limits the scalability of the network.
Celestia uses a clever scheme that allows light clients to verify data availability without needing to download the entire block.
This means that you can verify data availability with a small computer – in fact you can even do it on your smartphone! 🤯
twitter.com/musalbas/status/1480901457633239048?s=20
How? Data availability sampling.
Instead of downloading the entire block, Celestia light nodes just download small random samples of data from the block.
If all the samples are available, this gives very high confidence that the entire block is available.
If they aren’t able to download a sample, they treat the block as unavailable.
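The sampling loop above can be sketched in a few lines of Python. This is a toy illustration, not Celestia's light node code: `sample_block`, `make_fetcher`, and the idea of a block as a flat list of numbered shares are all assumptions made for the example.

```python
import random

def sample_block(fetch_share, total_shares, num_samples, rng=None):
    """Return True only if every randomly chosen share could be downloaded.

    fetch_share is a hypothetical callback: it returns the share's bytes,
    or None when the network cannot serve that share.
    """
    rng = rng or random.Random()
    # Pick distinct random positions in the block to query.
    indices = rng.sample(range(total_shares), num_samples)
    for index in indices:
        if fetch_share(index) is None:
            return False  # one missing sample: treat the block as unavailable
    return True

def make_fetcher(withheld):
    """Build a fake network that refuses to serve the withheld share indices."""
    def fetch_share(index):
        if index in withheld:
            return None
        return f"share-{index}".encode()
    return fetch_share

if __name__ == "__main__":
    honest = make_fetcher(withheld=set())
    hiding = make_fetcher(withheld=set(range(64, 128)))  # hides half the shares
    print(sample_block(honest, total_shares=128, num_samples=16))
    print(sample_block(hiding, total_shares=128, num_samples=16))
```

The light node only ever touches `num_samples` shares, no matter how large the block is, which is why a phone can do it.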
Erasure coding is a technique from coding theory that is fundamental to data availability sampling. It makes it much harder to make a block unavailable.
This is done by adding redundant data, which expands the block’s size.
For example, if a block contains 50% original data and 50% redundant data, more than 50% of the block’s data would need to be withheld to make the data unavailable...
as the original data can be recovered with only 50% of the block.
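To make the 50% example concrete, here is a minimal Reed-Solomon-style sketch in Python. It is a toy under stated assumptions, not Celestia's actual encoding: the prime modulus `P` and the function names `extend` and `recover` are chosen for illustration, and the data is treated as small integers.

```python
P = 2**31 - 1  # a prime modulus; all arithmetic happens in the field mod P

def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, using Lagrange's formula."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def extend(data):
    """Double the block: k original values plus k redundant values."""
    k = len(data)
    points = list(enumerate(data))  # original value i sits at x = i
    parity = [_interpolate(points, x) for x in range(k, 2 * k)]
    return list(data) + parity

def recover(shares, k):
    """Rebuild the k original values from any k shares ({index: value})."""
    if len(shares) < k:
        raise ValueError("not enough shares to recover the data")
    points = list(shares.items())[:k]
    return [_interpolate(points, x) for x in range(k)]

if __name__ == "__main__":
    data = [3, 1, 4, 1]
    block = extend(data)  # 8 shares: 50% original, 50% redundant
    # Withhold every original share; the redundant half is still enough.
    surviving = {i: block[i] for i in (4, 5, 6, 7)}
    print(recover(surviving, k=4))  # prints [3, 1, 4, 1]
```

Any 4 of the 8 shares pin down the same polynomial, so an attacker has to withhold more than half of the extended block before the original data becomes unrecoverable.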
With data availability sampling and erasure coding, minimal hardware is required for light nodes to contribute to both the security and throughput of the chain.
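A rough back-of-the-envelope shows why so little hardware is needed. The sketch below takes the 50% figure from the example in this thread as an assumption (real network parameters may differ) and treats samples as independent: if more than half the block must be withheld to hide it, each sample hits a missing piece with probability above 1/2. The function names `confidence` and `samples_needed` are made up for the illustration.

```python
import math

def confidence(num_samples, min_withheld_fraction=0.5):
    """Lower bound on the chance that at least one sample hits withheld data,
    assuming an attacker must withhold at least min_withheld_fraction of the
    block and that samples are independent."""
    return 1.0 - (1.0 - min_withheld_fraction) ** num_samples

def samples_needed(target_confidence, min_withheld_fraction=0.5):
    """Smallest number of samples that reaches the target confidence."""
    return math.ceil(
        math.log(1.0 - target_confidence) / math.log(1.0 - min_withheld_fraction)
    )

if __name__ == "__main__":
    print(confidence(10))        # prints 0.9990234375
    print(samples_needed(0.99))  # prints 7
```

The number of samples depends only on the confidence you want, not on the size of the block.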