Scaling IOTA Part 2 - Untangling the Tangle
To solve the problems mentioned in the first part of this blog post, we need a completely different form of sharding.
In the following sections, we will make a step-by-step introduction to the concepts that will allow IOTA to infinitely scale the number of nodes, without jeopardizing security in the early stages of adoption or breaking down to “supply shocks” in the throughput requirements.
The Tangle shards automatically
The Tangle is not limited by a block size and anyone can always attach new transactions without relying on anything but the previous two events it references. This means that the Tangle is essentially never full and can contain an arbitrary number of transactions.
If the network throughput exceeds the processing capabilities of the nodes, then the nodes will see and process different transactions depending on where in the network they are located and will therefore also see different tangles. Nodes with similar processing capabilities, that are close to each other will have a similar perception.
Every node will validate their respective part of the DAG consisting out of a subset of all existing transactions. This is the very definition of sharding: “Split the tasks and make each node only perform a subset of the total amount of work.”
The following picture shows how such a naturally sharded tangle would look like (half of the transactions get processed by half of the nodes):
This process of splitting can happen recursively so that i.e. shard 1 splits into further shards whenever the nodes are unable to process all of the remaining transactions. Different network segments will shard at different times depending on the actual throughput and more or less instantly without having to engage in complicated negotiations about when and where to shard. It simply depends on the transactions the nodes can process (agent-centric approach).
Once the load drops, the different tangles are in theory (see problems below) able to merge back together, resulting in a system that is able to dynamically react to different network conditions and growing adoption.
Problems with this approach
While this solution is extremely simple and in theory allows the network to scale to an arbitrary amount of transactions per second, it also has a few very serious problems:
- The nodes have no perception of which part of the tangle they are responsible for. Consequently, the users would have a hard time knowing which node to choose when trying to access their funds.
- It becomes hard for the different shards to interact as there is no objective perception of which shards even exist.
- Since the different tangles share the same history but not the same validators, one could double-spend funds that were received before the split, in each shard separately — see the picture:
This double-spending problem is pretty severe as it even prevents the tangles from merging back together (once the load drops), undermining the full potential of the solution.
The reason for these problems is however not the structure of the DAG itself (as claimed in this video by RADIX), but the fact that the Tangle essentially shards completely uncontrolled and randomly.
Solving the mentioned problems
To solve these problems we need to find a way for nodes to shard in a non-random and deterministic way. This means that we essentially need to find a mechanism that allows us to associate nodes and their transactions to a certain location in the sharded network.
To achieve this, we make every node and transaction carry a marker that defines to which shard they belong (even before any shards evolve). When the need arises to split up, the DAG splits and the ledger state splits with it.
Let’s have a look at the same example but this time with every transaction carrying a marker (blue or red):
Initially, the transactions of the different shards are “entangled” but after the tangle splits, the red nodes will only track red balances and transactions and the blue nodes only their respective part. This simple mechanism removes the ability to double-spend old funds without introducing a complicated form of inter-shard communication. Every shard has a certain jurisdiction that it is responsible for and a user that tries to spend blue funds in the red shard will simply be ignored. This mechanism is called state sharding.
In the example we used colors but the markers are really just numbers that can be represented by a bitmask which allows the tangle to shard arbitrarily often (a 64 bit marker would, for example, allow us to create more than 9 quintillion shards).
The shard marker becomes something like the sharding-DNA of a transaction that carries all the information it needs, to determine its future position in a grown-up DAG. In its infant state, this information will be latent but it will allow the Tangle to reliably shard as soon as the need arises.
Since the ledger states of both branches can never conflict, these shards can now even merge back together at a later point in time when the load drops.
A logical overlay mapping of the “real world”
The shard markers solve all of the aforementioned problems but in addition, they also provide us with a very powerful tool that allows us to design a sharding layout that reflects the real world as a form of logical overlay mapping, which ultimately means that we can give the numbers representing the shard markers a certain meaning.
Since we are designing a protocol for the Internet of Things, it makes a lot of sense to use a geographical mapping where shard markers translate to certain coordinates on this planet. This way, nodes that are physically close to each other will be part of the same shard.
This will not only reduce the network latencies within a shard, but it will also reduce the amount of inter-shard communication to an absolute minimum as most economic activity happens locally:
- A car buys electricity from a charging station close by.
- People buy pizza from the pizza place around the corner.
- and so on …
This geographic overlay mapping is, therefore, a very good approximation of the economic relationships in a certain region.
Additionally, this usage of a geographical mapping is a good way to deal with large scale network splits. If a network segment goes offline due to i.e. things like World War III, then nobody can spend Chinese funds in the US and vice versa. The two separate networks have therefore a higher chance to merge back together once the split is resolved.
If shard markers solve all of the mentioned problems … are we done? Not, yet! There are still a few problems that need to be addressed:
Before we introduced shard markers, the nodes were sharding automatically based on their own processing capabilities. This mechanism however no longer makes sense as we are trying to make nodes shard according to their belonging to a certain jurisdiction.
Nodes, therefore, need to have some form of consensus on when and where to shard. This is a tricky problem, especially since nodes have completely different processing capabilities and therefore different perceptions on when the time to split has come.
Untangling the tangle
Since we want to keep the mechanisms simple and do not want to introduce another consensus mechanism on top of the tangle, we need to find a different approach to deal with this problem.
If we look at the shard markers and what it actually “means” to have separate sub-tangles for every shard, then we realize pretty quickly that it gives us exactly one benefit: The nodes processing i.e. the red shard have a way to identify, download, and process only the red transactions, without them being entangled with transactions from other shards.
If the only purpose of having separate sub-tangles is to separate related transactions, then we can achieve the same thing by modifying the tip selection in the following way:
When attaching a new transaction carrying a shard marker X, then we choose a tip as the branch that has a shard marker Y where Y ≤ X and a trunk that has a shard marker Z where Z ≥ X.
What we end up with is a DAG that is pre-partitioned according to the markers in the tangle. Following the referenced branch will allow us to discover transactions with an equal or smaller marker and following the trunk will allow us to discover transactions with an equal or higher marker — see picture (using colors instead of numbers for the markers):
Even though the graph is not split into distinct branches anymore, we can still identify, verify and solidify transactions that are carrying equal markers in the same way as if they would actually be split.
When solidifying the tangle, nodes will stop to request the missing transactions, as soon as they reach a marker they are not interested in and just consider these transactions to be solid. Since the “blue ledger state” will only be affected by “blue transactions”, ignoring the red transactions will have no adverse effect on a blue node.
The described way of untangling the tangle does not just get rid of the need to reach consensus on when and where to split, but it also gives nodes total freedom to choose which part of the resulting DAG they are interested in and how much of the DAG they want to see and process without requiring some predetermined sharding layout.
Resource-constrained IoT devices can monitor a smaller part of the DAG than more powerful nodes. The shards are therefore no longer isolated chunks of data but continuous regions, with nodes being able to validate different parts according to their interest (location in the real world) — see the picture:
Example: A person living between East and West Berlin, could decide to “follow” half of East and half of West Berlin, whereas a person living in the center of West Berlin could decide to only follow West Berlin instead.
We are also able to adjust the size of the monitored slice according to the throughput requirements in the network by changing the observation radius.
Example: Nodes in a region with very little economic activity could afford to monitor a much bigger “slice” of the DAG than nodes that are operating in a densely populated area with relatively more transactions.
By using a priority queue for the received transactions — that sorts received transactions by their distance from a node’s location in the network — nodes will automatically reduce their radius of perception whenever they become overloaded, filtering out transactions from nodes that are too far away. They will therefore instantly react to different throughput requirements without having to re-negotiate a new sharding layout or use fees to decide which transactions to execute. The solution becomes agent-centric and allows us to instantly react to different loads in the network.
We have introduced a few very simple changes to the way the tangle works. These changes allow us to untangle the transactions in the DAG and group them according to their participation in the corresponding shards.
Instead of simply running many tangles in parallel (like blockchains), this new way of fluid sharding allows us to continue using one large continuous tangle that spans the whole world and is still able to process an arbitrarily large amount of transactions per second.
In the beginning, when adoption is still low, most nodes will most probably see and process all transactions, but as soon as adoption grows and the throughput requirements exceed the processing capabilities of nodes, we will see nodes slowly decrease their radius of perception in order to handle the higher load.
In the 3rd part of this blog post, which will be published soon, I will explain how inter-shard transactions between distant shards are going to work and how having a continuous tangle allows us to implement a similar mechanism as a beacon chain, but without a beacon chain, unlocking true “infinite” scalability (with the number of nodes).