A consensus is a general agreement on a value. When only two parties are involved, reaching it can be pretty straightforward:
Bob: What should we eat?
Sam: How about pizza?
Bob: I had pizza for lunch; how about tacos instead?
Sam: Sounds good!
Bob and Sam have agreed on a value that was suggested by one of them, and they will now take action based on that value. But imagine if we added ten or twenty people; finding consensus would be a lot more difficult. As the number of parties involved in a decision increases, the complexity does too.
Consensus works similarly in computer systems. In distributed systems, a set of nodes (primarily computers) work together to achieve some common goal such as processing a large computation. This requires a lot of coordination and understanding of roles in relation to the overall system.
Nodes may crash or malfunction, and they can get hacked and become unreliable. It’s much harder to achieve consensus in a distributed system when there are faulty nodes.
There are two types of failure modes in distributed systems: crash failures and Byzantine failures. Crash failures are common; they are the result of a node abruptly crashing. Byzantine failures are more complicated because there are no restrictions and no assumptions about the kind of behavior a node can have (e.g., a node posing as an honest actor can generate arbitrary data).
The Byzantine Generals Problem
In 1982, Leslie Lamport, Robert Shostak, and Marshall Pease, then researchers at SRI International, described a perplexing thought experiment called the Byzantine Generals Problem. Here’s a summary:
- A group of generals are all in different locations and can only communicate by messenger, one message at a time.
- To successfully attack or retreat, the generals must all coordinate and perform the same action. If they all attack, they will be fine; if they all retreat, they will be fine. But if some generals attack while others retreat, the operation fails.
- The twist is that some generals are disloyal and will try to deceive the others.
In this situation, the generals represent the nodes of a network; they must reach a consensus regarding the current state of the system. The Byzantine Generals Problem frames the fundamental question for decentralized networks: how can nodes reach agreement when some of them are likely to fail or act dishonestly?
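The thought experiment can be played out in a few lines of Python. What follows is a minimal sketch, not a full protocol: it assumes four generals (one commander and three lieutenants), a single round in which each lieutenant relays the order it received, and a traitor lieutenant that simply relays the opposite order. The names `om1`, `flip`, and `L1` to `L3` are made up for illustration.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def flip(order):
    """Return the opposite order (what a lying lieutenant relays in this sketch)."""
    return RETREAT if order == ATTACK else ATTACK


def om1(commander_orders, traitors=frozenset()):
    """Simulate one relay round among lieutenants.

    commander_orders: dict of lieutenant name -> order the commander sent it.
    traitors: lieutenants that relay the opposite of what they received.
    Returns a dict of loyal lieutenant -> the order it decides to follow.
    """
    decisions = {}
    for me in commander_orders:
        if me in traitors:
            continue  # we only care what the loyal lieutenants decide
        # Start with the order received directly from the commander.
        seen = [commander_orders[me]]
        # Add what every other lieutenant claims it was told.
        for other, order in commander_orders.items():
            if other != me:
                seen.append(flip(order) if other in traitors else order)
        # Follow the majority of the three reports.
        decisions[me] = Counter(seen).most_common(1)[0][0]
    return decisions


# Loyal commander, one traitorous lieutenant: the loyal ones still obey.
print(om1({"L1": ATTACK, "L2": ATTACK, "L3": ATTACK}, traitors={"L3"}))

# Traitorous commander sending mixed orders: the lieutenants still agree.
print(om1({"L1": ATTACK, "L2": RETREAT, "L3": ATTACK}))
```

In both runs the loyal lieutenants end up with the same decision, which is the whole point: with four generals, a single traitor cannot split them. The sketch is only meaningful for this four-general case; a real traitor could also tell different lies to different peers.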
To be reliable, a computing environment has to be designed in a way that solves this problem. In the classic setting, where messages cannot be signed, that is possible only if more than two thirds of the nodes in the network are honest. A system that keeps working correctly while fewer than one third of its nodes fail or act maliciously has what’s known as Byzantine Fault Tolerance (BFT). Within that limit, the honest nodes still reach agreement.
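The one-third limit is easier to see with numbers. Here is a small sketch, assuming the classic bound for unsigned messages: a network of n nodes can tolerate f Byzantine nodes only if n ≥ 3f + 1. The function names are illustrative, not taken from any library.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1 for a network of n nodes."""
    if n < 1:
        raise ValueError("a network needs at least one node")
    return (n - 1) // 3


def tolerates(n: int, f: int) -> bool:
    """True if n nodes can reach agreement despite f Byzantine nodes."""
    return n >= 3 * f + 1


for n in (3, 4, 7, 10):
    print(n, "nodes tolerate", max_byzantine_faults(n), "faulty")
```

Note that three nodes cannot survive even one traitor, while four can. Exactly one third faulty is already too many.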
(Systems that require BFT are used in industries like aeronautics and nuclear power. BFT is a viable solution for systems whose actions depend on many sensors’ results.)
How Does a Blockchain Fail?
If a single person or group controls more than half of a proof-of-work network’s hash rate, the system is subject to a 51% attack. This is problematic because whoever controls the majority of the hash rate controls the blockchain’s operation. It’s essentially a mining monopoly.
Bad actors can intentionally exclude transactions, change their ordering, and reverse their own recent transactions, which makes double spending possible. Furthermore, 51% attacks are transient events: the blocks they displace do not remain part of the chain, so they are hard to detect afterward and are best observed at the time of the attack.
Once a blockchain grows large enough, the likelihood of a single person or group controlling the majority of the network’s hash rate drops drastically.
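Why the majority threshold matters can be shown with a little arithmetic. The sketch below uses the simple gambler's-ruin estimate from the Bitcoin whitepaper: if an attacker holds a fraction q of the hash rate and honest miners hold p = 1 - q, the chance that the attacker ever catches up from z blocks behind is (q/p)^z when q < p, and 1 otherwise. The function name is made up for this example, and the model ignores real-world factors such as network latency.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance an attacker with hash-rate share q ever erases a z-block deficit."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a fraction between 0 and 1")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q  # share of the hash rate held by honest miners
    if q >= p:
        return 1.0  # with half or more of the hash rate, catching up is certain
    return (q / p) ** z


for share in (0.10, 0.30, 0.45, 0.51):
    print(f"share={share:.2f}  6 blocks behind -> {catch_up_probability(share, 6):.6f}")
```

Below half of the hash rate, the odds shrink exponentially with every additional block. At half or above, the attacker always wins eventually, which is why majority control is the line that matters.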
As Bitcoin rose in price and popularity, more miners joined the network, causing more competition for the block rewards. This competition added security to the system by incentivizing miners to act honestly. There’s no incentive to invest large amounts of resources in acting maliciously if you won’t be receiving the block reward.
Consensus Protocols
Most traditional distributed computing systems have centralized configuration databases or authorities that can help fix Byzantine failures when they occur.
By definition, blockchain networks do not have a central governing body to validate and verify each transaction, yet transactions are carried out successfully; how is that possible?
Consensus protocols are arguably the most critical part of any blockchain network. Whenever a new transaction is broadcast to the network, each node decides whether to include that transaction in its copy of the ledger or ignore it. When the majority of the nodes decide on a single state, consensus is achieved.
In this way, consensus algorithms keep the ledger consistent across the blockchain network and create trust between participants who do not know each other.
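The idea of a majority settling on one state can be sketched as a simple tally. This is a deliberately naive illustration that assumes one vote per node and treats each proposed ledger state as a string; real protocols weight votes differently (by work or by stake, for example). The function and node names are hypothetical.

```python
from collections import Counter


def majority_state(votes):
    """Return the state backed by a strict majority of nodes, or None.

    votes: dict of node name -> the ledger state that node accepts.
    """
    if not votes:
        return None
    state, count = Counter(votes.values()).most_common(1)[0]
    return state if count > len(votes) / 2 else None


votes = {"node-a": "state-1", "node-b": "state-1", "node-c": "state-2"}
print(majority_state(votes))  # two of three nodes back state-1
```

When no state has more than half of the votes, the function returns `None`, meaning no consensus has been reached yet.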
The most common consensus protocols used in blockchain networks are proof of work (PoW), proof of stake (PoS), and proof of authority (PoA). While there are big differences between them, they all provide some degree of Byzantine fault tolerance.
By applying BFT, we can design systems that are not controlled by a single authority and do not rely on trusting certain parties, effectively creating trust in a trustless environment.
Why It Matters
For end users working with a well-designed blockchain, Byzantine faults and the details of BFT won’t matter. But if (like me) you want to apply blockchain technology in areas beyond digital currency, it’s imperative that you understand BFT.
BFT is a crucial part of effective blockchains, and there are many ways to implement it. The approach you choose requires careful consideration of the nature and priorities of the community that will be using the blockchain. The approaches to BFT that have made systems like Bitcoin possible may not work well in blockchain applications of the future.