Understanding the FLP Impossibility Paper

Original address: www.inlighting.org/archives/un...

The FLP paper occupies an important place in the distributed systems field, and it is also notoriously obscure. It is the first distributed systems paper I have read word for word, and it was genuinely difficult. I am recording my notes here, written as simply as I can manage, in the hope that they help people new to distributed computing understand the FLP paper more easily. Of course, no matter how simple the write-up, the mathematical notation cannot be avoided entirely, but don't be afraid of it.

The original paper is titled Impossibility of Distributed Consensus with One Faulty Process, commonly shortened to FLP Impossibility.

If anything in this article is wrong, please leave a comment to correct it.


The FLP theorem is the nail in the coffin for distributed consensus algorithms: it proves that a "true" consensus algorithm, in the sense defined below, is impossible to achieve.

Before starting, let me explain what the "true" consensus algorithm demanded by the paper looks like. It must satisfy three properties:

Validity : if the only values present among the nodes are 0 and 1, then the final agreed value must be 0 or 1. The algorithm cannot somehow decide a value such as -1 that nobody proposed.

Agreement : consistency; all nodes reach the same decision.

Termination : every normally running node eventually makes a decision.

Note that the consensus algorithms you know, such as Paxos and Raft, are not consensus algorithms in this strict sense, because they cannot guarantee all three conditions at the same time (termination, in particular, is not guaranteed).


The following terminology applies only to this paper. You can skip this section for now, and come back here for reference when you meet a term later on.

Consensus algorithm : also called a consensus protocol. Raft and Paxos, for example, are consensus algorithms.

Byzantine Generals problem : the problem of data being tampered with in transit. Consensus algorithms usually assume the Byzantine Generals problem does not occur, i.e., that data is not tampered with in transit (a node never receives forged data). This is reasonable because network protocols such as TCP make it very likely that the data you receive is correct, and consensus algorithms generally run on an intranet, which is relatively safe.

Process : in the paper, a process can be understood as a node in a distributed system. Every process p has an input register x_p, which holds the input value and takes one of two values, 0 or 1. Each process also has an output register y_p, which takes one of three values: b, 0, or 1. That is, x_p ∈ {0, 1} and y_p ∈ {b, 0, 1}.

Message system : imagine this as the network that connects all processes in the cluster. All processes in the system share the message system and communicate through it, i.e., by sending messages to each other.

Asynchronous system : the paper makes the following requirements of an asynchronous system: (1) You do not know the relative processing speeds of the processes, i.e., you do not know which process will take a step first. (2) Messages in the message system may arrive late (you do not know how long the delay is, but they do eventually arrive). (3) There is no shared synchronized clock in the system, which means you cannot design the algorithm around time. (4) You have no way to detect that a process has died (crashed), and a process does not notify the others before it dies.

Nonfaulty : not faulty; a process that is running normally.

Internal state : the input register x_p, the output register y_p, the internal storage, and so on of a process together make up its internal state.

Configuration : a configuration consists of the internal states of all processes in the cluster plus the state of the message system.

Step : the move from one configuration to another is called a step. For example, a process receives a message from the message system, changes its output register, and sends messages back into the message system, producing a new configuration of the whole system.

Atomic step : a step is atomic: receiving a message from the message system, processing it locally, and sending messages to other processes through the message system all happen as one indivisible unit.

Decision state : the state of having reached a decision; for example, a process has decided and fixed its value as 0 or 1.

Initial state : everything is fixed except the input register, which may differ between processes; in particular, the output register starts as y_p = b.

Initial configuration : All processes are in the initial state, and the message system is empty.

A brief note on initial state, initial configuration, and configuration. An initial configuration does not only refer to the configuration when the consensus algorithm first starts running; it refers to the configuration at the start of each round of decision-making. When we begin the first round, that is an initial configuration. During the round, each step changes the configuration, until all processes are in agreement, which forms a new initial configuration. In the second round the algorithm starts deciding again, taking the outcome of the first round as its initial configuration, and so on.

This is also why the initial state allows the input registers of the processes to differ: each input register holds the value inherited from the previous round of decisions.

Run : a run consists of a sequence of steps.

Deciding run : a run in which some process reaches the decision state. (Note that the paper only requires some process, not all.)

Admissible run : a run in which at most one process dies, and every message addressed to a normal (non-faulty) process is eventually received (all messages in the message system get delivered).

Event : an event e = (p, m) means that process p receives message m and processes it.

Schedule : a sequence of events forms a schedule, written σ. We write σ(C) for the configuration obtained by applying schedule σ to configuration C, and say that σ(C) is reachable from C. If a configuration is reachable from some initial configuration, then it is accessible (accessible means the same thing as reachable, except that it is measured from an initial configuration rather than from an arbitrary configuration).

Accessible configuration : see the explanation under schedule.

Decision value : if in a configuration C some processes are already in the decision state and their output registers hold y_p = v, then the decision value of the configuration is v. Note that, again, not all processes need to be in the decision state: the paper relaxes the condition, so a decision by some of the processes counts as a decision of the whole.

Note: the decision state is a property of an individual process, while the decision value is a property of the whole configuration.

Partially correct : a protocol is partially correct if it meets two conditions. (1) No accessible configuration has more than one decision value. Equivalently, every accessible configuration has at most one decision value: in the whole cluster, every process either is not yet in the decision state, or has decided the same value as every other decided process. (2) For each v ∈ {0, 1}, some accessible configuration has decision value v. Note only some: other accessible configurations may not have produced a decision value at all.

Totally correct : a protocol is totally correct in spite of one fault if it is partially correct and, with at most one process dead, every admissible run is a deciding run.

A brief comment on partially correct versus totally correct. Partial correctness is very lenient: as long as the protocol is capable of deciding, it counts, even if only a few runs out of ten actually produce a decision. For example, a group of nodes starts a first round of negotiation and reaches no agreement; never mind, they move on to a second round, which happens to be lucky and reaches a decision. That still qualifies as partially correct.

Total correctness, by contrast, requires that every round of negotiation reaches an agreement.

The paper proves that for every partially correct consensus algorithm, there are admissible runs that are not deciding runs. In plain language: no algorithm can guarantee that every negotiation reaches a decision. (Think about what this means: if the algorithm cannot guarantee consensus in every negotiation, then an adversarial schedule can steer execution into a non-deciding run every single time, so consensus may never be reached at all.)

Bivalent : a configuration is bivalent if both decision values are still reachable from it, i.e., some continuation of the run decides 0 and some other continuation decides 1.

Univalent : a configuration is univalent if only one decision value is reachable from it.

0-valent : It is univalent, and its decision value is 0.

1-valent : It is univalent, and its decision value is 1.


Here is a specific description of the model established in the paper.

Distributed environment assumed by the paper:

  1. Byzantine failures are not considered.
  2. Assume that the message system is reliable: it does not lose data and does not deliver a message to a process more than once (exactly once). Note, however, that messages can be delivered out of order, i.e., the order of transmission is not preserved, and a receive attempt can come back empty (∅).
  3. Assume that a process may die (crash) at any time, possibly the most inconvenient one.

Under these three loose conditions, the paper proves that no consensus algorithm can be totally correct.

  1. For simplicity, the paper assumes that the input register of every process satisfies x_p ∈ {0, 1}. Every nonfaulty process, on entering the decision state, fixes an output register value y_p ∈ {0, 1}. Normally, since this is a consensus algorithm, it must ensure that all nonfaulty processes choose the same value in the decision state, e.g., everyone chooses 0 or everyone chooses 1. But the paper relaxes the conditions here too: it is enough for part of the nonfaulty processes to reach the decision state and make a decision.
  2. As mentioned earlier, the message system may return ∅, but the paper assumes that a process that keeps receiving from the message system will eventually get every message that was sent to it. This models network delay: messages are delayed, but always delivered (the message system simulates delay by returning ∅).

Consensus protocol P

Assume a consensus protocol P runs in an asynchronous system with N ≥ 2 processes. Each process's input register x_p takes one of {0, 1}, i.e., x_p ∈ {0, 1}; its output register y_p takes one of {b, 0, 1}, i.e., y_p ∈ {b, 0, 1}. While a process is in the initial state, its output register is y_p = b; the transition function later changes y_p to 0 or 1. Once a process has decided (reached the decision state), the transition function can no longer change its output register. In other words, the output register is write-once.

The transition function can be understood as follows: a process receives a message, invokes its own transition (internal) function, and changes its internal state accordingly.
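As a concrete sketch, the registers and the write-once output can be modeled in a few lines of Python. The names Process and decide and the placeholder B are illustrative, not from the paper:

```python
B = "b"  # the undecided placeholder value of the output register

class Process:
    """A process with input register x_p in {0, 1} and output register y_p in {b, 0, 1}."""

    def __init__(self, name, x):
        assert x in (0, 1)   # x_p ∈ {0, 1}
        self.name = name
        self.x = x           # input register
        self.y = B           # output register starts as b (initial state)

    def decide(self, v):
        """Enter the decision state with value v; y_p is write-once."""
        assert v in (0, 1)
        if self.y == B:      # once decided, the value can never change
            self.y = v
```

Calling decide a second time has no effect, mirroring the requirement that after a decision the transition function can no longer change the output register.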

Communication between processes goes through the message system. A message has the form (p, m), where p is the target process and m is the content. You can picture the entire message system as one buffer (message order is not guaranteed).

The message system supports two operations:

  • send(p, m) : puts (p, m) into the message system's buffer.

  • receive(p) : tries to take a message addressed to process p out of the message system. If the attempt succeeds, the message system removes that message and returns it; the attempt may also fail and return ∅.

Through these two operations, the message system models the uncertainty of the network in the distributed system (though it at least guarantees that data never disappears, which is better than real life).
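A minimal sketch of such a message system in Python, assuming None stands in for ∅; the send/receive names mirror the paper's operations, but the implementation (including the coin-flip delay) is purely illustrative:

```python
import random

class MessageSystem:
    """A buffer that guarantees eventual delivery, but neither order nor timing."""

    def __init__(self, seed=None):
        self.buffer = []               # pending (p, m) pairs, unordered
        self.rng = random.Random(seed)

    def send(self, p, m):
        """Place (p, m) into the buffer; it is delivered exactly once, eventually."""
        self.buffer.append((p, m))

    def receive(self, p):
        """Return some pending message for p, or None (∅) to simulate delay."""
        pending = [i for i, (q, _) in enumerate(self.buffer) if q == p]
        if pending and self.rng.random() < 0.5:  # nondeterministic delay
            i = self.rng.choice(pending)         # nondeterministic order
            return self.buffer.pop(i)[1]
        return None
```

Because receive may return None any number of times before delivering, a process that keeps calling it will eventually get its message, which is exactly the asynchronous-network assumption above.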

Suppose the configuration of the entire system is C; then e(C) denotes the configuration of the cluster after event e occurs in configuration C.

Lemma 1: "Commutativity" property of schedules

Suppose that from some configuration C, the schedules σ1 and σ2 lead to configurations C1 and C2, respectively. If the sets of processes taking steps in σ1 and in σ2 are disjoint, then σ2 can be applied to C1 and σ1 can be applied to C2, and both lead to the same configuration C3.

For intuition, assume σ1 and σ2 touch disjoint sets of processes: σ1 modifies only p1 and p2, while σ2 modifies only p3. Then no matter which schedule is executed first, the final result is the same. This is the commutative law of schedules.
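The commutativity can be checked directly on a toy model where a configuration is just a map from process name to state and each event updates one process; the particular update functions below are made up for illustration:

```python
def apply_schedule(config, schedule):
    """Apply a schedule (a list of (process, update) events) to a configuration."""
    config = dict(config)  # configurations are values; copy before modifying
    for p, update in schedule:
        config[p] = update(config[p])
    return config

C = {"p1": 0, "p2": 0, "p3": 0}
sigma1 = [("p1", lambda s: s + 1), ("p2", lambda s: s + 2)]  # touches p1, p2 only
sigma2 = [("p3", lambda s: s + 5)]                           # touches p3 only

C3_a = apply_schedule(apply_schedule(C, sigma1), sigma2)  # C -> C1 -> C3
C3_b = apply_schedule(apply_schedule(C, sigma2), sigma1)  # C -> C2 -> C3
assert C3_a == C3_b  # disjoint schedules commute (Lemma 1)
```

Because the two schedules read and write disjoint parts of the configuration, neither can observe whether the other has already run, so the order cannot matter.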

Main Result

Let me start with the conclusion:

No consensus protocol is totally correct in spite of one fault.

In other words: even with just one faulty node, no consensus algorithm can be totally correct.

The paper proceeds by contradiction, assuming some consensus protocol P is totally correct. The contradiction is built in two steps: (1) show that a bivalent initial configuration exists; (2) show that, starting from a bivalent initial configuration, there is always an admissible run that prevents the system from ever making a decision.


Lemma 2: P has a bivalent initial configuration

Again by contradiction: assume P has no bivalent initial configuration. Then every initial configuration of P is either 0-valent or 1-valent. Both kinds must occur: if only one of them existed, what would be the point of a consensus algorithm? Everyone would already hold the same value and there would be nothing to decide.

If two initial configurations differ only in the input value of a single process, they are called adjacent initial configurations. As a concrete example, suppose the rule of our consensus algorithm is that the minority obeys the majority: whichever value i is held by the most processes, that configuration is i-valent.

(Figure: on the left, the evolution of initial configurations when no node fails; on the right, the case where p2 dies at an inappropriate time.)

Since the all-0 initial configuration is 0-valent and the all-1 initial configuration is 1-valent, walk along a chain of adjacent initial configurations from one to the other: somewhere along the chain the valency must flip, so there exist adjacent initial configurations C0 (0-valent) and C1 (1-valent) that differ only in the input of a single process, say p2. Now let p2 be dead at an inappropriate time, namely from the very start. The remaining processes cannot distinguish C0 from C1, so the same admissible deciding run applies to both and yields the same decision value. But a deciding run from C0 must decide 0 and one from C1 must decide 1, a contradiction. Hence P must have a bivalent initial configuration.
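The neighbor argument can be made concrete with the toy majority rule from the example above (the majority function is that hypothetical decision rule, not something the paper prescribes):

```python
def majority(inputs):
    """Hypothetical decision rule: the input value held by more processes wins."""
    return int(sum(inputs) * 2 > len(inputs))

n = 5
# A chain of initial configurations from all-0 to all-1, where
# adjacent entries differ in the input register of exactly one process.
chain = [tuple(1 if j < i else 0 for j in range(n)) for i in range(n + 1)]
valencies = [majority(c) for c in chain]

assert valencies[0] == 0 and valencies[-1] == 1
# Somewhere along the chain the valency flips between two adjacent configurations:
flip = next(i for i in range(n) if valencies[i] != valencies[i + 1])
differ = [j for j in range(n) if chain[flip][j] != chain[flip + 1][j]]
assert len(differ) == 1  # the straddling configurations are indeed adjacent
```

The proof then kills exactly that one differing process, so the surviving processes cannot tell the two straddling configurations apart.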


Lemma 3

Let C be a bivalent configuration of P, and let e = (p, m) be an event that is applicable to C. Let ℂ be the set of configurations reachable from C without applying e, and let D = e(ℂ) = {e(E) | E ∈ ℂ and e is applicable to E}. Then D contains a bivalent configuration.

The proof is again by contradiction: assume D contains no bivalent configuration, so every configuration D ∈ D is univalent.

First, D must contain both 0-valent and 1-valent configurations. Since C is bivalent, for each i ∈ {0, 1} there is some i-valent configuration E_i reachable from C. There are two cases. (1) If E_i is reached without applying e, then E_i ∈ ℂ, and F_i = e(E_i) ∈ D. (2) If e was applied on the way to E_i, then there is some F_i ∈ D from which E_i is reachable. In either case F_i is univalent by assumption, and since it either reaches or is reached from the i-valent E_i, F_i must itself be i-valent. So D contains configurations of both valencies.

Call two configurations neighbors if one results from the other in a single step. By tracing paths through ℂ, one can find neighbors C0, C1 ∈ ℂ with C1 = e'(C0), such that D_i = e(C_i) ∈ D, where D0 is 0-valent, D1 is 1-valent, e = (p, m), and e' = (p', m').

Case 1: p' ≠ p. Then the steps e and e' are taken by disjoint sets of processes, so by Lemma 1 they commute and D1 = e'(D0). But this is impossible: every configuration reachable from the 0-valent D0 must itself be 0-valent, yet D1 is 1-valent.

Case 2: p' = p. Since P is totally correct, there is a finite deciding run from C0 in which p takes no steps (p may be dead); let σ be its schedule and A = σ(C0). Because σ contains no step of p, while e and e' are both steps of p, Lemma 1 applies: σ can also be run from D0 and from D1. This gives e(A) = σ(D0), which is 0-valent, and e(e'(A)) = σ(D1), which is 1-valent.

But σ is a deciding run, so A must be univalent, and a univalent A cannot have both a 0-valent and a 1-valent configuration reachable from it. In other words, A would have to be bivalent, a contradiction. Hence D contains a bivalent configuration.

The main theorem now follows from Lemmas 1, 2, and 3. By Lemma 2, P has a bivalent initial configuration C0. Starting from any bivalent configuration, pick an applicable event e (say, the one whose message has been pending longest); by Lemma 3, there is a schedule ending with e that leads to another bivalent configuration. Repeating this forever yields an admissible run in which every configuration stays bivalent: the system never enters a univalent configuration, no process ever decides, and so the consensus algorithm never terminates. This contradicts the assumed total correctness.

This is why real-world consensus algorithms such as Raft and Paxos cannot guarantee termination: some unlucky sequence of events can always postpone the decision. Keep in mind that FLP Impossibility applies to asynchronous systems, i.e., systems without a synchronized clock. Together with results like the CAP theorem, FLP marks out the fundamental limits of distributed systems.