Sync & The Merge
**Warning!** This is a work in progress. After initial review, it appears that the sync scheme presented here will not work without modifications. See the end of this document for known issues and potential solutions. For now, read this as a description of the ideal sync algorithm, keeping in mind that it will become more complicated later.
In this document, we (the geth team) present our ideas for implementing chain synchronization on the merged eth1 + eth2 chain. After the merge event, eth1 and eth2 clients run in tandem. The eth2 client maintains the connection to the beacon chain and performs fork choice. The eth1 client, a.k.a. the 'execution layer', receives block data from the eth2 client, executes/verifies it and maintains the application state.
The interface that eth2 and eth1 use to communicate is unidirectional: all cross-client communication is initiated by eth2 and takes the form of requests. Eth1 responds to requests but cannot request any information from eth2.
In the text below, we refer to beacon chain blocks as bx. We also assume that the beacon chain begins at block bW, a recent checkpoint, which must be a block after the merge event. There is a direct correspondence between beacon chain blocks and execution-layer block data: for every beacon block bx (for x >= W), a corresponding execution-layer block Bx also exists. Additionally, every execution-layer block Bx contains its block header Hx.
Please note that this document is an abstract description of the sync algorithm and isn't concerned with the real APIs that eth1 and eth2 nodes will use to communicate. We assume that eth2 can invoke the following operations in the eth1 client:
- checkpoint(H): notifies the eth1 client about a checkpoint header. This has no useful response.
- final(B): submits a finalized block. The eth1 client can answer 'old', 'syncing', invalid(B) or synced(B). Note that we assume this will be called for all finalized blocks, not just on epoch boundaries.
- proc(B): submits a non-finalized block for EVM processing. The eth1 client can respond with 'valid', 'invalid' or 'syncing'.
In diagrams, not all responses to eth2 requests are shown.
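To make the call/response contract concrete, here is a minimal Python model of the three operations and the responses each one may return. The `Status` enum, the `Response` type, the `ALLOWED` table and `check_response` are illustrative names only, not part of any real eth1/eth2 API (which, as noted above, this document does not cover).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OLD = "old"
    SYNCING = "syncing"
    VALID = "valid"
    INVALID = "invalid"  # final() answers invalid(B), naming the block
    SYNCED = "synced"    # final() answers synced(B), naming the block


@dataclass(frozen=True)
class Response:
    status: Status
    block: Optional[str] = None  # block hash carried by invalid(B) / synced(B)


# Hypothetical table of the responses each eth2 -> eth1 call may return.
# checkpoint(H) has no useful response, so nothing is allowed.
ALLOWED = {
    "checkpoint": frozenset(),
    "final": frozenset({Status.OLD, Status.SYNCING, Status.INVALID, Status.SYNCED}),
    "proc": frozenset({Status.VALID, Status.INVALID, Status.SYNCING}),
}


def check_response(call: str, response: Response) -> bool:
    """Return True if `response` is a legal answer to `call`."""
    if response.status not in ALLOWED[call]:
        return False
    if call == "final" and response.status in (Status.INVALID, Status.SYNCED):
        return response.block is not None  # invalid(B) and synced(B) must name B
    return True
```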
This section explains the sync procedure from the eth2 client's point of view.
When the eth2 client starts, it is initialized with a 'weak subjectivity checkpoint' containing the beacon chain state of a historical block bW. The checkpoint also contains the execution-layer block header HW. On startup, HW is immediately relayed to the eth1 client (1).
To sync, the eth2 client must first process the beacon chain optimistically (without accessing application state) up to the latest finalized block bF (2). When block bF is reached, the eth2 client starts eth1 sync by providing the execution-layer block BF to the eth1 client (3).
The eth2 client keeps following the beacon chain until eth1 sync completes, and keeps submitting finalized blocks to the eth1 client. This means it should repeat step (3) for every new finalized block.
Eth1 sync usually takes a long time to complete. While it is syncing, the beacon chain advances by t blocks, up to the latest finalized block bF+t.
The eth1 client signals that it is done by responding with synced(BF+t) (4). The application state of BF+t is now available and the eth2 client can perform additional cross-validation against this state. For example, it could read the deposit contract here.
The eth2 client should now submit the execution-layer block data of all non-finalized beacon blocks to the eth1 client for processing (5). The sync procedure completes when the current head block bH is reached.
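The eth2-side sequence of steps (1) to (5) can be sketched as a simple driver loop. This is a minimal sketch assuming responses are plain strings and blocks are opaque identifiers; `eth2_sync`, `InvalidBlock` and the `FakeEth1` stand-in (which reports synced after a fixed number of `final()` calls) are hypothetical, and step (2), optimistic beacon sync, is assumed to have already happened.

```python
class InvalidBlock(Exception):
    pass


def eth2_sync(eth1, checkpoint_header, finalized_blocks, unfinalized_blocks):
    """Drive eth1 sync from the eth2 side.

    `finalized_blocks` yields BF, BF+1, ... as the beacon chain finalizes them;
    `unfinalized_blocks` lists the execution blocks of non-finalized beacon
    blocks up to the head, in chain order.
    """
    eth1.checkpoint(checkpoint_header)             # step (1)
    synced_block = None
    for block in finalized_blocks:                  # step (3), once per finalized block
        status, payload = eth1.final(block)
        if status == "invalid":
            raise InvalidBlock(payload)
        if status == "synced":                      # step (4)
            synced_block = payload
            break
    if synced_block is None:
        return None                                 # still syncing, no more finalized blocks
    for block in unfinalized_blocks:                # step (5), up to the head bH
        if eth1.proc(block) == "invalid":
            raise InvalidBlock(block)
    return synced_block


class FakeEth1:
    """Stand-in eth1 client that finishes syncing after `sync_delay` final() calls."""

    def __init__(self, sync_delay):
        self.sync_delay = sync_delay
        self.calls = []

    def checkpoint(self, header):
        self.calls.append(("checkpoint", header))

    def final(self, block):
        self.calls.append(("final", block))
        self.sync_delay -= 1
        return ("synced", block) if self.sync_delay <= 0 else ("syncing", None)

    def proc(self, block):
        self.calls.append(("proc", block))
        return "valid"
```

With `FakeEth1(3)` and finalized blocks `["BF", "BF+1", "BF+2", "BF+3"]`, the driver sees synced on the third `final()` call, returns `"BF+2"`, and then submits the non-finalized blocks through `proc()`.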
Upon startup, the eth1 client first waits for a checkpoint header HW from the eth2 client. HW must be a descendant of the genesis block BG.
Sync begins when the finalized block BF is received. This block is assumed to be valid. Furthermore, it is assumed that BF is a descendant of BW.
While the chain is downloading and processing, the eth1 client receives further notifications about newly finalized blocks in the range BF+1…BF+t. During sync, with Bf being the latest finalized block known to the client, final(Bx) must be handled as follows:
- for x <= f, the response is 'old' if the block is known, or invalid(Bx) if the block is unknown.
- for x > f+1 (an attempt to finalize an unknown future block), sync is restarted on Bx and the response is 'syncing'.
- for x == f+1, the block is appended to the database. If the client is still busy syncing to Bf, the response is 'syncing'. If the client is done syncing to block Bf, it processes block Bx and outputs synced(Bx) or invalid(Bx).
When proc() is received during sync, the response is 'syncing'.
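The rules for final(Bx) and proc() during sync can be captured in a small state machine. This is a sketch under stated assumptions: blocks are identified by height and hash, `execute` is a hypothetical stand-in for EVM processing, and `SyncingEth1` with its `caught_up` flag (set once sync to Bf has completed) is an illustrative name, not a real client type.

```python
class SyncingEth1:
    """Eth1-side handling of final(Bx) and proc(B) while sync is in progress."""

    def __init__(self, finalized_height, stored, execute=lambda block_hash: True):
        self.f = finalized_height     # height of the latest finalized block Bf
        self.stored = dict(stored)    # height -> hash of blocks in the local database
        self.caught_up = False        # True once sync to Bf has completed
        self.execute = execute        # stand-in for EVM processing, True if valid

    def final(self, x, block_hash):
        if x <= self.f:
            known = self.stored.get(x) == block_hash
            return ("old", None) if known else ("invalid", block_hash)
        if x > self.f + 1:
            # Unknown future block: restart sync targeting Bx.
            self.stored[x] = block_hash
            self.f = x
            self.caught_up = False
            return ("syncing", None)
        # x == f + 1: extend the finalized chain by one block.
        if not self.caught_up:
            self.stored[x] = block_hash
            self.f = x
            return ("syncing", None)
        if not self.execute(block_hash):
            return ("invalid", block_hash)
        self.stored[x] = block_hash
        self.f = x
        return ("synced", block_hash)

    def proc(self, block_hash):
        if not self.caught_up:
            return "syncing"          # proc() during sync always answers 'syncing'
        return "valid" if self.execute(block_hash) else "invalid"
```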
After starting sync on BF (1), the eth1 client first downloads the chain of block headers backwards from HF, following parent hashes (2). Headers are written to the database. The header chain must contain the checkpoint header HW, and sync fails if a different header is encountered at the same block number. This sanity check lets the client reject an invalid chain as soon as it crosses height W, without having to download headers all the way back to the genesis block first.
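The backward header download with the checkpoint check might look like the following. This is a minimal sketch assuming genesis has block number 0 and that `fetch_by_hash` is a hypothetical network call returning a header or `None`; `Header`, `SyncFailed` and `download_headers` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    number: int
    hash: str
    parent_hash: str


class SyncFailed(Exception):
    pass


def download_headers(head, fetch_by_hash, checkpoint):
    """Download headers from `head` back to genesis by following parent hashes."""
    if head.number < checkpoint.number:
        raise SyncFailed("head is below the checkpoint")
    chain = []
    header = head
    while True:
        # Any header at the checkpoint height must be the checkpoint itself.
        if header.number == checkpoint.number and header.hash != checkpoint.hash:
            raise SyncFailed(f"checkpoint mismatch at height {header.number}")
        chain.append(header)  # a real client would write it to the database here
        if header.number == 0:
            return chain      # genesis reached, body download can start
        parent = fetch_by_hash(header.parent_hash)
        if parent is None or parent.number != header.number - 1:
            raise SyncFailed(f"bad parent of header {header.number}")
        header = parent
```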
When the genesis header HG is reached, block body data can be downloaded (3). There are two ways to do this:
- Full sync: the client downloads blocks and executes their state transitions. This recreates the application state incrementally up to the latest block. Sync is complete when the latest finalized block BF+t has been processed.
- State sync: the client downloads the blocks BG+1…BF along with the application state of BF, without EVM execution. This is expected to be faster than full sync, and is equally secure because the state root of BF was finalized by eth2. The state download can happen concurrently with steps (2) and (3).
The peer-to-peer network can only provide the state of very recent blocks. Since the state of BF is expected to gradually become unavailable as the chain advances, the client must occasionally re-target its state sync to a more recent 'pivot block'. Conveniently, the newly finalized blocks BF+1…BF+t received from eth2 can be used for this purpose. You can read more about the pivot block in the snap sync protocol specification.
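Re-targeting the pivot on newly finalized blocks can be sketched as follows. The `max_pivot_age` threshold is a hypothetical tuning parameter (how far the pivot may lag behind the newest finalized block before peers are assumed to no longer serve its state), and `PivotTracker` is an illustrative name; real snap sync clients use their own heuristics.

```python
class PivotTracker:
    """Moves the state sync target to newer finalized blocks as they arrive."""

    def __init__(self, number, state_root, max_pivot_age):
        self.number = number            # height of the current pivot block
        self.state_root = state_root    # state root being downloaded
        self.max_pivot_age = max_pivot_age

    def on_finalized(self, number, state_root):
        """Called for each final(B) received during sync; True if re-targeted."""
        if number - self.number < self.max_pivot_age:
            return False                # pivot is still recent enough
        # Move the pivot to the newer finalized block.
        self.number, self.state_root = number, state_root
        return True
```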
After reporting sync completion of BF+t to the eth2 client (4), the execution layer is done and switches to its ordinary mode of operation: individual blocks are received from the eth2 client, the blocks are processed, and their validity reported back to the eth2 client. Reorgs of non-finalized blocks may also be triggered after sync has completed. Reorg handling is discussed later in this document.
Handling restarts and errors
The above description of sync focuses on a single sync cycle. In order to be robust against failures, and to handle client restarts, clients must be able to perform multiple sync cycles with an initialized database. The interface between eth2 and eth1 makes this easy for eth2 because it is unidirectional: when eth2 restarts, it can simply perform the usual request sequence and expect that the eth1 client will reset itself to the correct state.
When eth1 is notified of a finalized block BF, there are three possibilities. If the block already exists in the local chain and its application state is also available, sync isn't necessary. If the finalized block is unknown, the eth1 client should restart sync at step (1), downloading parent headers in reverse. If the block is known but its state is unavailable, the client should attempt to synchronize the state of BF or, when configured for full sync, process blocks forward up to BF from the most recent available state.
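The three cases reduce to a small decision function. This is a minimal sketch; `Action` and `action_for_finalized` are hypothetical names, and the block and state sets stand in for lookups in the client's database.

```python
from enum import Enum


class Action(Enum):
    NONE = "sync not needed"
    HEADER_SYNC = "restart sync at step (1)"
    STATE_SYNC = "synchronize the state of BF"
    PROCESS_FORWARD = "process blocks forward to BF"


def action_for_finalized(block_hash, known_blocks, available_states, full_sync):
    """Decide what eth1 does when final(BF) names `block_hash`."""
    if block_hash not in known_blocks:
        return Action.HEADER_SYNC       # unknown block: download parents in reverse
    if block_hash in available_states:
        return Action.NONE              # block and state present
    return Action.PROCESS_FORWARD if full_sync else Action.STATE_SYNC
```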
For eth1 sync restarts, block data persisted to the database by previous sync cycles can be reused. Whenever a finalized header Hx is to be fetched from the network, the client should check if the database already contains block data at the same block height x. If the local database contains a finalized header at height x, but its hash does not match Hx, the client should delete the header and all block data associated with it. If the hash of the previously-stored header does match Hx, sync can skip over the chain of locally available headers and resume sync at the height of the next unavailable header.
To make this skipping operation efficient, we recommend that clients store and maintain 'marker' records describing previously-stored contiguous chain segments. A marker is keyed by the height of the segment's top header and records the lowest height down to which headers are contiguously available. When sync starts at HF, the client stores marker MF = F. As subsequent parent headers Hx are downloaded, the marker is updated to MF = x. Similarly, as the chain is extended forward by concurrent calls to final(BF+1), the marker also moves forward, i.e. MF+1 = MF is stored and MF is deleted.
Now assume that the sync cycle terminates unexpectedly at block height s. When the next cycle starts, it first loads the marker records of previous sync cycles. As the new cycle downloads parent headers, it will eventually cross the previous height F. If the header hash matches the previously-stored header HF, the marker can be used to resume sync at height s, where the first cycle left off.
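The marker bookkeeping described in the last three paragraphs might be modeled like this. It is a sketch assuming markers map a segment's top height to its lowest contiguous height; `HeaderStore` and its method names are hypothetical, and deleting "all block data associated with" a stale header is reduced to deleting the header hash.

```python
class HeaderStore:
    """Finalized header hashes plus 'marker' records for contiguous segments."""

    def __init__(self):
        self.headers = {}   # height -> header hash
        self.markers = {}   # segment top height -> lowest contiguous height

    def start_cycle(self, top, top_hash):
        self.headers[top] = top_hash
        self.markers[top] = top                          # MF = F

    def store_parent(self, top, height, parent_hash):
        self.headers[height] = parent_hash
        self.markers[top] = height                       # MF = x

    def extend_forward(self, top, new_hash):
        """final(BF+1) appended a block: move the marker up by one."""
        self.headers[top + 1] = new_hash
        self.markers[top + 1] = self.markers.pop(top)    # MF+1 = MF, delete MF
        return top + 1

    def next_fetch_height(self, top, height, expected_hash):
        """Called before fetching the header at `height` (hash known from its child).

        Returns the height the current cycle should fetch next.
        """
        stored = self.headers.get(height)
        if stored is None:
            return height                                # not stored: fetch it
        if stored != expected_hash:
            # Left over from a different chain: delete it, then fetch the right one.
            del self.headers[height]
            self.markers.pop(height, None)
            return height
        # Matches: skip the locally available segment and merge its marker into ours.
        bottom = self.markers.pop(height, height)
        self.markers[top] = bottom
        return bottom - 1
```

For example, if a first cycle from height 100 stopped at s = 95 and a second cycle from height 110 reaches height 100 with a matching hash, `next_fetch_height` returns 94 and the two segments are merged into a single marker covering 110 down to 95.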
Reorg processing and state availability
The application state of eth1 is very large, so eth1 clients usually store only a single full copy of it.
In order to make state synchronization work, the application state of the latest finalized block BF must be available for download. We therefore recommend that clients which store exactly one full copy of the state should store the state of BF.
For the tree of non-finalized blocks beyond BF, the state diff of each block can be held in main memory. As new blocks are finalized, the client applies their diffs to the database, moving the persistent state forward. Storing diffs in memory allows for efficient reorg processing: when the eth2 client detects a reorg from block bx to block by, it first determines the common ancestor ba. It can then submit all blocks Ba+1…By for processing. When the eth1 client detects that a block has already been processed because its state is available as a diff in memory, it can skip EVM processing of the block and just move its head state reference to the new block.
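The interplay between the eth2-side reorg path and the eth1-side diff cache can be sketched as below. This is a simplified model under stated assumptions: states are flat key/value dicts, a diff is a dict of new values, `execute` is a hypothetical stand-in for EVM processing, and pruning of abandoned forks is omitted. `chain_to_root`, `reorg_path` and `DiffTree` are illustrative names.

```python
def chain_to_root(parent, block):
    """List `block` and its ancestors, newest first, ending at the tree root."""
    chain = []
    while block is not None:
        chain.append(block)
        block = parent.get(block)
    return chain


def reorg_path(parent, old_head, new_head):
    """eth2 side: the blocks a+1..y to submit when the head moves from x to y."""
    on_old_chain = set(chain_to_root(parent, old_head))
    path = []
    for block in chain_to_root(parent, new_head):
        if block in on_old_chain:
            break                       # reached the common ancestor a
        path.append(block)
    return path[::-1]


class DiffTree:
    """eth1 side: non-finalized blocks keep their state diffs in memory."""

    def __init__(self, finalized, state):
        self.finalized = finalized
        self.state = dict(state)        # persistent state of the finalized block
        self.diffs = {}                 # block -> {key: new value}
        self.head = finalized
        self.executions = 0

    def proc(self, block, execute):
        if block not in self.diffs:
            self.diffs[block] = execute(block)  # run the EVM only once per block
            self.executions += 1
        self.head = block                        # move the head state reference
        return "valid"

    def finalize(self, block):
        """Apply a newly finalized block's diff (must be called in chain order)."""
        self.state.update(self.diffs.pop(block))
        self.finalized = block
```

In a tree where F has children A1 and B1 (with A2 and B2 on top), a reorg from B2 to A2 yields the path `["A1", "A2"]`; both blocks were already processed, so resubmitting them only moves the head.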
While reorgs below BF cannot happen during normal operation of the beacon chain, it may still be necessary to roll back to an earlier state when EVM processing flaws cause the client to deviate from the canonical chain. As a safety net for this exceptional case, we recommend that eth1 clients maintain a way to manually reorg up to 90,000 blocks (roughly 2 weeks), as this would provide sufficient time to fix issues.
To make this 'manual intervention reorg' work, eth1 clients can maintain backward diffs in a persistent store. If an intervention is requested, these diffs can be incrementally applied to the state of BF, resetting the client to an earlier state.
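Recording a backward diff whenever a forward diff is applied makes the rollback straightforward. The sketch below assumes a flat key/value state in which `None` marks a deleted key, and keeps at most 90,000 backward diffs as suggested above; `PersistentState`, `advance` and `rollback` are hypothetical names, and a real client would keep the diffs on disk rather than in a `deque`.

```python
from collections import deque


class PersistentState:
    """State of the finalized block plus backward diffs for manual reorgs."""

    def __init__(self, max_history=90_000):
        self.state = {}
        self.height = 0
        # The oldest backward diffs are discarded once max_history is exceeded.
        self.backward = deque(maxlen=max_history)

    @staticmethod
    def _write(state, diff):
        for key, value in diff.items():
            if value is None:
                state.pop(key, None)    # None marks a deleted key
            else:
                state[key] = value

    def advance(self, diff):
        """Apply the forward diff of the next finalized block."""
        self.backward.append({key: self.state.get(key) for key in diff})
        self._write(self.state, diff)
        self.height += 1

    def rollback(self, blocks):
        """Manual intervention reorg: undo the last `blocks` finalized blocks."""
        if blocks > len(self.backward):
            raise ValueError("not enough backward diffs stored")
        for _ in range(blocks):
            self._write(self.state, self.backward.pop())
            self.height -= 1
```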
In an early review of this scheme, two issues were discovered. Both stem from our misunderstanding of eth2 finalization semantics.
(1) Since eth2 finalizes blocks only on epoch boundaries, it would call final(B) only for epoch boundary blocks, contrary to our assumption that final(B) is called for every finalized block. This could be handled better by also using proc(B) as a sync trigger.
(2) While finalization happens within ~64 blocks in the happy case, it can take up to 2 weeks in the event of a network partition. Since the maximum number of non-finalized blocks is so much larger than we initially anticipated, it will not be possible to use BF as the persistent state block.
We have decided to tackle this issue in the following way:
- At head H, define the 'calcified' block BC with C = max(H-512, F). This puts an upper bound of 512 blocks on the number of states kept in memory.
- Define that clients should keep the state of BC in persistent storage.
- Use BC as the initial sync target. This has implications on the sync trigger because the eth1 client can no longer rely on final(B) to start sync (BC may be non-final).
- Add a new call **reset(B)** to reset the eth1 client to a historical block. Require that clients must be able to satisfy any reset in range BF…BH. They will probably have to implement something like the persistent reverse diffs recommended in the reorg section.
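The calcified-block rule is simple arithmetic on block heights. This sketch only restates C = max(H-512, F) and the F…H reset requirement in code; the function names are hypothetical.

```python
CALCIFICATION_DEPTH = 512  # upper bound on block heights kept above BC


def calcified_height(head, finalized):
    """C = max(H - 512, F)."""
    return max(head - CALCIFICATION_DEPTH, finalized)


def in_memory_states(head, finalized):
    """Number of block heights above BC whose state is held only in memory."""
    return head - calcified_height(head, finalized)


def reset_required(target, finalized, head):
    """reset(B) must be satisfiable for any block height in F..H."""
    return finalized <= target <= head
```

In the happy case, where F trails H by about 64 blocks, C equals F; during a long finality stall, C follows the head at a fixed distance of 512 blocks, so the in-memory window never exceeds 512 heights.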
Adding the calcified block also introduces some tricky new corner cases and failure modes. In particular, if the eth1 client has just performed snap sync, it will not be able to reorg below BC, because reverse diffs down to BF will not be available. We may solve this by recommending that nodes attempt snap sync if reset(B) cannot be satisfied, since some nodes will surely be synced far enough to serve the target state. In the absolute worst case, we would need to make reverse diffs available for download in snap sync.