One of the important issues that has been raised over the course of the Olympic stress-test release is the large amount of data that clients are required to store; over a little more than three months of operation, and particularly during the last month, the amount of data in each Ethereum client's blockchain folder has grown to 10-40 gigabytes, depending on which client you use and whether or not compression is enabled. Although it is important to note that this is indeed a stress-test scenario, where users are incentivized to dump transactions onto the blockchain paying only free test ether as a transaction fee, and transaction throughput levels are thus several times higher than Bitcoin's, it is nevertheless a legitimate concern for users, who in many cases do not have hundreds of gigabytes to spare for storing other people's transaction histories.
First of all, let us begin by exploring why the current Ethereum client database is so large. Ethereum, unlike Bitcoin, has the property that every block contains something called the "state root": the root hash of a specialized kind of Merkle tree that stores the entire state of the system: all account balances, contract storage, contract code and account nonces are inside.
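As a toy illustration of how a single root hash can commit to an entire state, here is a minimal binary Merkle tree sketch. This is a simplification, not Ethereum's actual Modified Merkle Patricia trie, and it uses SHA-256 as a stand-in for Keccak-256; the account serialization format is invented for the example.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 here as a stand-in for Ethereum's Keccak-256."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a simple binary Merkle tree over serialized leaves."""
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:  # duplicate the last node on odd-sized layers
            layer.append(layer[-1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

# Toy "state": account -> (balance, nonce), serialized deterministically.
state = {b"alice": (50, 0), b"bob": (20, 3)}
leaves = [k + b":%d:%d" % v for k, v in sorted(state.items())]
root = merkle_root(leaves)

# Changing any single balance changes the root, which is what lets a
# block commit to the whole state with one 32-byte hash:
state2 = {**state, b"bob": (21, 3)}
leaves2 = [k + b":%d:%d" % v for k, v in sorted(state2.items())]
root2 = merkle_root(leaves2)
```

Any node that knows `root` can therefore detect any tampering with any account, without storing the accounts themselves.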
The purpose of this is simple: it allows a node, given only the last block together with some assurance that the last block really is the most recent block, to "synchronize" with the blockchain extremely quickly without processing any historical transactions, by simply downloading the rest of the tree from nodes in the network (the proposed HashLookup wire protocol message will facilitate this), verifying that the tree is correct by checking that all of the hashes match up, and then proceeding from there. In a fully decentralized context, this will likely be done through an advanced version of Bitcoin's headers-first verification strategy.
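The "download the tree and check that the hashes match up" step can be sketched as follows. The `lookup` callable below is a hypothetical stand-in for a HashLookup-style wire request to an untrusted peer; the node serialization (internal nodes as two concatenated 32-byte child hashes) is an assumption for the example, not the real Ethereum encoding.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def fetch_and_verify(node_hash, lookup, store):
    """Download a hash-keyed tree from an untrusted peer, verifying as
    we go: each blob must hash to the key we requested, so a lying peer
    is caught immediately."""
    blob = lookup(node_hash)
    if h(blob) != node_hash:
        raise ValueError("peer returned data not matching requested hash")
    store[node_hash] = blob
    if len(blob) == 64:  # internal node: two 32-byte child hashes
        fetch_and_verify(blob[:32], lookup, store)
        fetch_and_verify(blob[32:], lookup, store)

# Simulated remote database holding a two-leaf tree
# (leaves deliberately not 64 bytes, so they parse as leaves):
leaf_a, leaf_b = b"alice:50", b"bob:20"
remote = {h(leaf_a): leaf_a, h(leaf_b): leaf_b}
root = h(leaf_a) + h(leaf_b)
remote[h(root)] = root

local = {}
fetch_and_verify(h(root), remote.__getitem__, local)
```

Because every blob is checked against the hash it was requested under, the syncing node needs to trust only the state root itself, not the peers serving the data.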
For light clients, the state root is even more advantageous: they can immediately determine the exact balance and status of any account by simply asking the network for a particular branch of the tree, without needing to follow Bitcoin's multi-step "ask for all transaction outputs, then ask for all transactions spending those outputs, and take the remainder" model.
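A branch query like this boils down to verifying a Merkle proof against the known root. Here is a hedged sketch under the same toy binary-tree assumptions as above (SHA-256, invented serialization), not the real hexary Patricia trie proof format:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_branch(root: bytes, leaf: bytes, proof) -> bool:
    """Check a Merkle branch: `proof` lists (sibling_hash, sibling_is_left)
    pairs from the leaf's level up to just below the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

# Hand-built two-leaf tree: the light client stores only `root`.
alice, bob = b"alice:50:0", b"bob:20:3"
root = h(h(alice) + h(bob))

# A full node hands over alice's leaf plus one sibling hash; the light
# client verifies it with no other data:
ok = verify_branch(root, alice, [(h(bob), False)])
bad = verify_branch(root, b"alice:9999:0", [(h(bob), False)])
```

The proof size grows only with the depth of the tree, so a light client can check any single account against a 32-byte root at logarithmic cost.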
However, this state tree mechanism has an important drawback if implemented naively: the intermediate nodes of the tree greatly increase the amount of disk space needed to store all the data. To see why, consider this diagram:
The change in the tree during each individual block is fairly small, and the magic of the tree as a data structure is that most of the data can simply be referenced twice without being copied. However, even so, for every change to the state that is made, a logarithmically large number of nodes (i.e. ~5 at 1,000 nodes, ~10 at 1,000,000 nodes, ~15 at 1,000,000,000 nodes) need to be stored twice, one version for the old tree and one version for the new tree. Eventually, as a node processes every block, we can thus expect the total disk space utilization to be, in computer-science terms, roughly O(n * log(n)), where n is the transaction load. In practical terms, the Ethereum blockchain itself is only 1.3 gigabytes, but the size of the database including all these extra nodes is 10-40 gigabytes.
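The "logarithmically many nodes rewritten per change" effect can be demonstrated with a small persistent-tree simulation. This is a plain binary tree, not Ethereum's hexary Patricia trie, so the constants differ, but the shape of the cost is the same: one update to a 1024-leaf tree writes only one new node per level, while every unchanged subtree is shared with the old version.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class PersistentTree:
    """Minimal persistent binary Merkle tree over a fixed-size leaf array.
    Nodes live in a shared dict keyed by hash, so unchanged subtrees are
    shared between versions instead of copied."""
    def __init__(self):
        self.db = {}  # hash -> (left_hash, right_hash) or leaf bytes

    def build(self, leaves):
        layer = []
        for leaf in leaves:
            k = h(leaf); self.db[k] = leaf; layer.append(k)
        while len(layer) > 1:
            nxt = []
            for i in range(0, len(layer), 2):
                k = h(layer[i] + layer[i + 1])
                self.db[k] = (layer[i], layer[i + 1])
                nxt.append(k)
            layer = nxt
        return layer[0]

    def update(self, root, index, leaf, depth):
        """Return a new root with the leaf at `index` replaced; only the
        nodes on the root-to-leaf path are newly written."""
        if depth == 0:
            k = h(leaf); self.db[k] = leaf; return k
        left, right = self.db[root]
        if (index >> (depth - 1)) & 1 == 0:
            left = self.update(left, index, leaf, depth - 1)
        else:
            right = self.update(right, index, leaf, depth - 1)
        k = h(left + right); self.db[k] = (left, right)
        return k

t = PersistentTree()
root = t.build([b"%d" % i for i in range(1024)])  # 2^10 leaves
before = len(t.db)
new_root = t.update(root, 5, b"changed", 10)
written = len(t.db) - before  # ~11 new nodes (one per level), not ~2047
```

Both versions of the state remain readable from their respective roots, which is exactly why the database accumulates these extra per-block path copies over time.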
So what can we do? One backward-looking fix is to simply go ahead and implement headers-first syncing, essentially resetting new users' hard disk consumption to zero and allowing users to keep their consumption low by re-syncing every one or two months, but that is a somewhat ugly solution. The alternative approach is to implement state tree pruning: essentially, use reference counting to track when nodes in the tree (here using "node" in the computer-science sense of "piece of data stored somewhere in a graph or tree structure", not "computer on the network") drop out of the tree, and at that point put them on "death row": unless the node somehow comes back into use within the next X blocks (e.g. X = 5000), after that number of blocks passes it should be permanently deleted from the database. Essentially, we store the tree nodes that are part of the current state, and we even store recent history, but we do not store history older than 5000 blocks.
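The reference-counting-plus-death-row idea can be sketched as follows. This is a hypothetical illustration of the mechanism as described, not pyeth's actual implementation; the class and method names are invented.

```python
from collections import defaultdict

class PruningDB:
    """Sketch of reference-counted state-node pruning with a "death row":
    a node whose reference count hits zero at block N is only deleted for
    good once block N + X has passed, so the client can still revert up
    to X blocks."""
    def __init__(self, x: int):
        self.x = x
        self.nodes = {}                      # key -> node data
        self.refcount = defaultdict(int)
        self.death_row = defaultdict(list)   # expiry block -> [keys]

    def put(self, key, value):
        self.nodes[key] = value
        self.refcount[key] += 1

    def deref(self, key, block):
        """Drop one reference; queue for deletion if the count hits zero."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            self.death_row[block + self.x].append(key)

    def process_block(self, block):
        """Permanently delete nodes whose grace period expired at this
        block, unless they came back into use in the meantime."""
        for key in self.death_row.pop(block, []):
            if self.refcount[key] == 0:
                del self.nodes[key]

db = PruningDB(x=5000)
db.put("node-a", b"data")
db.deref("node-a", block=100)   # falls out of the state at block 100
db.process_block(5100)          # grace period over: now really deleted

db.put("node-b", b"data")
db.deref("node-b", block=100)
db.put("node-b", b"data")       # referenced again before block 5100...
db.process_block(5100)          # ...so it survives the purge
```

The grace period is what makes short-range chain reorganizations safe: a node that drops out of the state in one fork is still on disk if a competing fork within X blocks needs it again.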
X should be set as low as possible to conserve space, but setting X too low compromises robustness: once this technique is implemented, a node cannot revert back more than X blocks without essentially completely restarting synchronization. Now, let's consider how this approach can be implemented fully, taking into account all of the corner cases.
Once this is done, the database will only be storing the state nodes associated with the last X blocks, so you will still have all the information you need from those blocks but nothing more. On top of this, there are further optimizations. In particular, after X blocks, transaction and receipt trees should be deleted entirely, and even blocks may arguably be deleted as well, although there is an important argument for keeping some subset of "archive nodes" that store absolutely everything, so as to help the rest of the network acquire the data that it needs.
Now, how much savings can this give us? As it turns out, quite a lot! In particular, if we were to take the ultimate daredevil route and go X = 0 (i.e. lose absolutely all ability to handle even single-block forks, storing no history whatsoever), then the size of the database would essentially be the size of the state: a value which, even now (this data was taken at block 670,000), stands at roughly 40 megabytes, the majority of which is made up of accounts with storage slots filled in order to deliberately spam the network. At X = 100,000, we would get essentially the current 10-40 gigabyte size, as most of the growth happened in the last hundred thousand blocks, and the extra space required for storing journals and death row lists would make up the rest of the difference. At every value in between, we can expect the disk space growth to be linear (i.e. X = 10,000 would take us about eighty percent of the way to near-zero).
Note that we may want to pursue a hybrid strategy: keeping every block but not every state tree node; in this case, we would need to add roughly 1.4 gigabytes to store the block data. It is important to note that the cause of the blockchain's size is not fast block times; currently, the block headers of the last three months make up roughly 300 megabytes, and the rest is transactions of the last month, so at high levels of usage we can expect transactions to continue to dominate. That said, light clients will also need to prune block headers if they are to survive in low-memory circumstances.
The strategy described above has been implemented in a very early alpha form in pyeth; it will be implemented properly in all clients in due time after the Frontier launch, as such storage bloat is only a medium-term and not a short-term concern.
post url: https://altcoin.observer/snaping-of-state-trees-ethereum-foundation-blog/