Original address: www.inlighting.org/archives/un...
FLP this paper has an important role in the distributed field, of course, this article is also obscure. This is the first distributed paper that I read every word. It is very difficult. I record it here, and it may be simple to write. I hope it can help new people who are new to distributed computing to understand FLP papers more easily. Of course, no matter how simple it is, mathematical symbols can t run away, but don t be afraid.
The name of the original paper is: Impossibility of Distributed Consensus with One Faulty Process. The simple name is FLP-Impossibility.
If something is wrong in the article, please leave a message to correct it.
The FLP theorem is equivalent to a coffin board for the distributed consensus algorithm, which proves that it is impossible to achieve a true consensus algorithm.
Of course, before starting, let me explain what the real consensus algorithm approved by the paper is like:
Validity : Validity (legitimacy). If there are only two types of data in all nodes, 0 and 1, then the final consensus resolution must be one of 0 and 1. It is impossible to say that the algorithm somehow reached a consensus value such as -1.
Agreement : Consistency, all nodes reach a consensus resolution.
Termination : Termination, all normally running nodes can finally make a decision.
Note that the consensus algorithms you know, such as Paxos or Raft, are not consensus algorithms in the true sense, because they cannot satisfy the above three conditions at the same time.
The following terminology description is only for this paper. Of course, you can skip this section first, wait until you see the related terminology later, and then return here for reference.
Consensus algorithm : Consensus algorithm can also be said to be a consensus algorithm. For example, Raft and Paxos can be called consensus algorithm.
Byzantine Generals problem : The Byzantine Generals problem refers to what to do if data is tampered with during transmission. Usually consensus algorithms assume that the Byzantine general problem does not occur, that is, assume that the data will not be tampered with during transmission (the node will not receive a forged data). Because under normal circumstances, network protocols such as TCP can guarantee that the data you get is most likely to be correct, and the consensus algorithm generally runs on the intranet, which is relatively safe.
Process : In the paper, you can understand it as a node in a distributed system. Every process has an input register, Can be understood as the input value, there are only two kinds of 0, 1. In addition process also has an output register , There are three values of b, 0, and 1. .
Message system : Just imagine this as a network, which connects all processes in the entire cluster. All processes in the entire system share this message system, and the processes communicate with each other through this, that is, send messages to each other.
Asynchronous system : Asynchronous system. The paper made the following requirements for the asynchronous system: (1) You don't know the relative processing speed of each process, that is, you don't know which process will be processed first. (2) The message in the Message system may arrive late (delay, and you don t know how long the delay is, but it will arrive). (3) There is no shared synchronized clock in the entire system, which means that you cannot design algorithms by time. (4) You have no ability to detect the death (crash) of a certain process, and a certain process will not notify other processes in advance before it is dead.
Nonfaulty : No fault.
Internal state : the input register of a process, Output register , Internal storage space, etc. constitute the internal state of a process.
Configuration : A Configuration contains the internal state of all processes in the entire cluster plus the message system.
Step : The process from one configuration to another is called step. For example, a process receives a piece of data from the message system, and then changes its output register and sends a message to the message system, thus forming a new configuration for the entire system.
Atomic step : An atomic step. This step includes the process of receiving data from the message system, processing it locally, and sending data to other processes through the message system.
Decision state : The state of reaching a resolution, for example, the process has reached a resolution and determines its value is 0 or 1.
Initial state : All processes can be different except the input register, and other values are fixed to death, among which the output register .
Initial configuration : All processes are in the initial state, and the message system is empty.
Regarding the initial state, initial configuration, and configuration, here is a little explanation. Initial configuration does not really refer to the configuration when the consensus algorithm just starts to run, but refers to the initial configuration before the start of each round of resolution. For example, when we start the first round of resolution, this is an initial configuration. In the process of resolution, our configuration will be changed every time it passes through a step until all processes are unified, forming a new initial configuration. In the second round, our algorithm will start to decide again. At this time, the result of the first round of resolution will be used as the initial configuration of the second round, and so on.
This is why the initial state allows the input register of the process to be different, because everyone's input register is the value inherited from the previous round of resolutions.
Run : A run contains multiple steps.
Deciding run : In this run, if some process can reach the decision state, then this run is called a deciding run. (Note that certain processes are required in the paper).
Admissible run : There is only one process dead at most, and all other normal processes receive their own messages from the message system (all messages in the message system have been sent).
Event : means process received data and proceed with this process.
Schedule : A series of events form a schedule. schedule use said. we use means configuration through scheduleThe result obtained after . Also explain is from reachable. If a configuration can be reachable from some initial configuration, then this configuration is accessible (accessible actually has the same meaning as reachable, except that it is changed from the initial configuration and the other from the configuration to another word).
Accessible configuration : See the instructions in the schedule.
Decision value : If in a configuration ,someprocesses are already in decision state, and their output values , then we think that the decision value of this configuration is . Note: It is also not necessary for all processes to be in decision state here, because the paper relaxes the conditions, as long as a part of the process can make a decision, even if the whole decision is made.
Note: The decision state is an expression for a certain process, and the decision value is an expression for the entire configuration.
Partially correct : Partially correct can be called if the following two conditions are met. (1) All no accessible configuration includes two or more decision values. It is equivalent to saying that all accessible configurations have only one decision value. To put it bluntly, all processes in the entire cluster, or you are not in the decision state, and the process in the decision state, the value of the decision reached is the same. (2) Assuming decision value . There are some accessible configurations with decision value . Note that there are only some, because some accessible configurations may not be able to make a decision value.
Totally correct : In the case of only one process dead, it is partially correct, and each admissible run is a deciding run. Then it is totally correct.
Regarding partially correct and totally correct, here is a brief explanation. You can find that partially correct is very lenient. As long as you have the ability to make a resolution, you can make a resolution through several consultations out of ten times, even if you are partially correct. For example, a bunch of nodes start the first round of negotiation, but no agreement is reached, forget it, and then enter the second round. Then the second round of negotiation is lucky and can reach a resolution. This is considered partially correct.
For totally correct, he requires that an agreement can be reached in every round of negotiation.
This paper proves that when all consensus algorithms are partially correct, there are certain admissible runs that are not deciding runs. In the vernacular, no algorithm can guarantee that every negotiation will reach a resolution. (You can think about it here. If this algorithm can't guarantee that consensus can be reached in every negotiation, then in the process of reaching consensus, we will choose this process that cannot reach consensus every time. It is not that consensus will not be reached forever.)
Bivalent : If the configuration contains two decision values (that is, in multiple processes, some decision states are 0 and some are 1), then it is called bivalent.
Univalent : If there is only one decision value in the configuration, it is called univalent.
0-valent : It is univalent, and its decision value is 0.
1-valent : It is univalent, and its decision value is 1.
Here is a specific description of the model established in the paper.
Distributed environment assumed by the paper:
- Byzantine failures are not considered.
- Assume that the message system is reliable, will not lose data, and will not send data to the process repeatedly (exactly once). But note that the data sent in the message system can be out of order, that is, the order of data transmission is not guaranteed, and sometimes it will return empty ( ).
- Suppose a process will be dead at an inappropriate time.
Under these three loose conditions, the paper proves that there is no consensus algorithm that can guarantee consistency.
- For simplicity, the paper assumes that the input register of all processes . All nonfaulty processes will determine a value output register when entering the decision state . Normally, since it is a consensus algorithm, it must be ensured that all nonfaulty processes choose the same value in the decision state, for example, everyone chooses 0 or 1. But in the paper, the conditions are relaxed again here. It only requires thatpart of thenonfaulty process is enough to make a decision,and itis enough to reach the decision state.
- As mentioned earlier, the message system may return But the paper assumes that if a process keeps receiving messages from the message system, it will eventually be able to receive the messages that should have been sent to it. This simulates the delay of the network, but the message will be delivered. (Message system by sending way to simulate delay).
Assuming a consensus algorithm , running in an asynchronous system to ensure the number of processes in the entire system . Each process input registercan only be one of, , output register for one of, . When a process is in the initial state, its output register , Will pass the transition function becomes 0 or 1. When the process makes a decision (reaching the decision state), then the transition function can no longer change its output register. It can be understood that the output register is write-once.
About transition function , you understand it as a process that receives a message, calls its own transition function (internal function), and changes its internal state.
The communication between processes is realized through the message system. The format of message is , is the target process, is the content of the message. You can imagine the entire message system as a buffer (message order is not guaranteed).
Message system supports two operations:
, will Put it in the message system buffer.
, taken out of the message system belongs to the process 's message. If the retrieval is successful, the message system will remove the message. Of course, it may fail to retrieve and return a .
The Message system simulates the uncertainty of the network in the entire distributed system through the above two operations (but at least it also guarantees that the data will not disappear, much better than in real life).
Suppose the configuration of the entire system is , means that an event occurs when configured asThe result after the cluster of
Lemma1 "Commutativity" property of schedules
Suppose that from some configuration , the schedules , lead to configuration , , respectively. If the sets of processes taking steps in and , respectively, are disjoint, then can be applied to and can be applied to , and both lead to the same configuration .
we assume that with No intersection , Modify only Value, Modify only Value. You can see whichever one is executed first , the final result is the same, this is the commutative law of schedule.
Let me start with the conclusion:
No consensus protocol is totally correct in spite of one fault.
As long as there is a node failure, no consensus algorithm can achieve totally correct.
The paper is proved by contradiction, assuming a consensus algorithm is total correct. The idea of contradiction is divided into two steps: (1) the existence of the initial configuration is bivalent (2) the proof that the consensus algorithm starts from a bivalent intial configuration, and there is always an admissible run that prevents the system from making a decision.
has a bivalent initial configuration.
Similarly, we use the contradiction method, assuming does not exist bivalent initial configuration. ThenThere are only two types of 0-valent and 1-valent. You can't say that there is only one of 0-valent and 1-valent, because if so, what is the significance of your consensus algorithm? Anyway, everyone said that it is a value, and there is no need to change it.
As shown in the figure above, if two adjacent initial configurations only have a different value for one process, then these two initial configurations are called advanced initial configurations. At the same time, assume that the rule of our consensus algorithm is that the minority obeys the majority:The number of processes is the most, so the configuration is i-valent.
On the left is the initial configuration change process when no node fails.
On the right is It was dead at an inappropriate time. At this time, you can notice that in fact, the one on the right 0-valent 1-valent 0-valent 1-valent 0-valent 1-valent bivalent initial configuration univalent Lemma 2
Let be a bivalent configuration of , and let be an event that is applicable to . Let be the set of configurations reachable from without applying , and let and is applicable to . Then, contains a bivalent configuration.
bivalent configuration configuration univalent
i-valent configuration 0 1 bivalent Lemma2 univalent 1 , 2
configuration step neighbors
neighbor configuration univalent
Case 1 Lemma1 univalent 0-valent 1-valent univalent
Case 2 deciding run deciding run dead
deciding run process dead
deciding run univalent 0-valent 1-valent bivalent Lemma 3
Lemma2 Lemma3 Lemma1 bivalent initial configuration Lemma 2 event univalent configuration univalent consensus algorithm termination consensus algorithm
consensus algorithm Raft Paxos termination FLP Impossibility asynchronous system synchronize clock FLP CAP