Every new node on a public blockchain such as Bitcoin must, to comply with the protocol, obtain and verify all blockchain data, both recent and historic. This requirement has weighed on scalability for years.
When a node first joins the Bitcoin network, it needs to obtain its individual view of Bitcoin’s current state of consensus, i.e., the UTXO set resulting from the blockchain path containing the most proof of work (PoW). To keep this process fully decentralized and independent of trusted nodes, each node initially establishes eight outgoing connections to random established nodes, called neighbors, and downloads the complete blockchain from them.
Due to the separation of headers and transactions, nodes first fetch the headerchain, i.e., chain of block headers, and simultaneously request full blocks, i.e., the corresponding transactions. While receiving the data, the joining node verifies its correctness by:
- Verifying the blockchain’s cryptographic links back to the hard-coded genesis block.
- Keeping track of the amount of performed PoW to remain on the currently valid blockchain.
- Validating the transaction set tied to each block.
- Checking the correctness of transactions and replaying them to obtain an up-to-date UTXO set.
Even though the verified headerchain and the resulting UTXO set are sufficient to process newly mined blocks, nodes keep a full copy of the blockchain by default.
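The verification steps above can be sketched in a few lines of Python. This is a deliberately simplified toy model, not Bitcoin's actual data format or validation rules: the names `Tx`, `Block`, `sync`, `mine`, `TARGET` and `GENESIS_HASH` are hypothetical, the header layout is invented, the difficulty is fixed, and signatures and Merkle trees are left out. It only illustrates checking the hash links, accumulating PoW, and replaying transactions into a UTXO set.

```python
import hashlib
from dataclasses import dataclass


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash construction Bitcoin uses for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Toy parameters (assumptions for illustration only).
TARGET = 2 ** 248                          # fixed, very easy difficulty
WORK_PER_BLOCK = 2 ** 256 // (TARGET + 1)  # expected hashes per valid block
GENESIS_HASH = sha256d(b"toy genesis")     # stands in for the hard-coded genesis block


@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple   # outpoints being spent, each a (txid, output_index) pair
    outputs: tuple  # integer amounts; no inputs means a coinbase-like transaction


@dataclass(frozen=True)
class Block:
    prev_hash: bytes
    nonce: int
    txs: tuple

    def hash(self) -> bytes:
        # Toy header: previous hash, a commitment to the txids, and the nonce.
        tx_commit = sha256d("".join(t.txid for t in self.txs).encode())
        return sha256d(self.prev_hash + tx_commit + self.nonce.to_bytes(8, "little"))


def sync(genesis_hash: bytes, blocks: list) -> tuple:
    """Verify links and PoW, replay all transactions, return (utxo, total_work)."""
    utxo, prev, total_work = {}, genesis_hash, 0
    for block in blocks:
        if block.prev_hash != prev:
            raise ValueError("broken link to previous block")
        block_hash = block.hash()
        if int.from_bytes(block_hash, "big") >= TARGET:
            raise ValueError("insufficient proof of work")
        total_work += WORK_PER_BLOCK
        for tx in block.txs:
            spent = 0
            for outpoint in tx.inputs:
                if outpoint not in utxo:
                    raise ValueError("input missing or already spent")
                spent += utxo.pop(outpoint)
            if tx.inputs and sum(tx.outputs) > spent:
                raise ValueError("outputs exceed inputs")
            for index, amount in enumerate(tx.outputs):
                utxo[(tx.txid, index)] = amount
        prev = block_hash
    return utxo, total_work


def mine(prev_hash: bytes, txs: tuple) -> Block:
    """Brute-force a nonce so the toy block meets TARGET (for the demo only)."""
    nonce = 0
    while True:
        block = Block(prev_hash, nonce, txs)
        if int.from_bytes(block.hash(), "big") < TARGET:
            return block
        nonce += 1


b1 = mine(GENESIS_HASH, (Tx("cb1", (), (50,)),))
b2 = mine(b1.hash(), (Tx("cb2", (), (50,)), Tx("pay", (("cb1", 0),), (30, 20))))
utxo, work = sync(GENESIS_HASH, [b1, b2])
```

After the replay, `utxo` holds only the three unspent outputs; the spent output of `cb1` is gone. This is why the UTXO set is much smaller than the full transaction history that produced it.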
What if a snapshot mechanism captured a recent state of the blockchain and made it available to newly joining nodes, with these snapshots validated in a distributed manner?
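To make the idea concrete, here is a minimal sketch of one possible way a joining node could check a snapshot without trusting a single source. The text does not specify the validation scheme, so everything here is an assumption: independent parties are assumed to announce a digest of the snapshot, and the node accepts the snapshot only if enough announcements match the digest it computes itself. The function names and the `min_agreeing` threshold are hypothetical.

```python
import hashlib


def snapshot_digest(utxo: dict) -> str:
    """Hash a UTXO snapshot deterministically by serializing entries in sorted order."""
    h = hashlib.sha256()
    for (txid, index), amount in sorted(utxo.items()):
        h.update(f"{txid}:{index}:{amount}\n".encode())
    return h.hexdigest()


def accept_snapshot(utxo: dict, announced_digests: list, min_agreeing: int = 6) -> bool:
    """Accept only if at least min_agreeing announced digests match our own."""
    digest = snapshot_digest(utxo)
    agreeing = sum(1 for announced in announced_digests if announced == digest)
    return agreeing >= min_agreeing


# Hypothetical snapshot received from one neighbor.
snapshot = {("a", 0): 5, ("b", 1): 7}
honest = snapshot_digest(snapshot)

# Six of eight announcements match, so the snapshot is accepted.
accepted = accept_snapshot(snapshot, [honest] * 6 + ["bad"] * 2)
```

Sorting the entries before hashing matters: two nodes holding the same UTXO set must arrive at the same digest regardless of the order in which they stored it.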
A recent study shows that via pruning (removing non-critical blockchain information from storage), full nodes and miners can reduce disk space utilisation by 86% while still helping newly joining nodes synchronize much faster.
The snapshots also reduce the network traffic generated by joining nodes by 93%, and synchronisation time drops from 4h to 46 minutes on high-end computing devices.
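As a quick check of what the reported drop in synchronisation time means, the short calculation below uses only the two figures quoted above (4h and 46 minutes); the variable names are just for illustration.

```python
before_min = 4 * 60  # reported full synchronisation time, in minutes
after_min = 46       # reported synchronisation time with snapshots, in minutes

speedup = before_min / after_min
time_saved = 1 - after_min / before_min

print(f"{speedup:.1f}x faster, {time_saved:.0%} less time")
# prints "5.2x faster, 81% less time"
```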
Pruning Could Address The Following:
- Storage requirements: Individual node operators bear the cost of storing hundreds of GB of historical blockchain data, which is also a constraint on devices with limited storage.
- Bandwidth requirements: Newly joining nodes need good internet connectivity both to join the network and to serve as a node. In addition, existing nodes consume more bandwidth when they support the synchronisation of new nodes.
- Processing costs: In addition to downloading the blockchain, joining nodes also need to verify the blockchain’s integrity and locally replay every single transaction to build the UTXO set.
- Synchronization time: The combination of high bandwidth requirements and high processing costs causes prolonged synchronization times. While benchmarks using powerful clients report about 5h in 2018, the literature already highlighted this issue in 2016, when four days were required to synchronize Amazon EC2 nodes. Naturally, this problem worsens over time as new blocks are added continuously.