>> Welcome to our talk about model-based testing with TLA+ and Apalache. This is joint work of Igor Konnov and myself. We are both proud to be members of Informal Systems. Igor Konnov is the Principal Scientist developing next-generation verification methods. He is also the lead developer of the Apalache model checker. I'm a Research Engineer at Informal who also contributes to Apalache and leads the product on model-based testing.
[MUSIC]
>> We're using Rust and formal verification tools to build open-source fault-tolerant distributed systems, like the Tendermint Byzantine Fault Tolerant consensus engine and the Inter-Blockchain Communication protocol. We're cooperatively owned and governed by our employees. We have offices in Toronto, Berlin, Vienna, and Lausanne. In fact, a few of the folks at Informal would love to say hello.
>> Hello.
>> Hello, world.
>> Hi.
>> Hi, everyone.
>> Hello.
>> Hey, everyone.
>> Hello.
>> Hi.
>> Hey, everyone.
>> You can learn more about us, check out our work, and contribute online at github.com/informalsystems. We hope you enjoy the conference.
[MUSIC]
>> Who we are: at Informal Systems, we envision a bright future of verifiable distributed systems and organizations. Nowadays, everything becomes distributed. At the same time, when we interact with systems and organizations, we need a certain level of trust. Our grand vision is that formal verification can help us bring this necessary level of trust to the systems and organizations with whom we interact. We actively contribute to this future vision by developing the next-generation blockchain infrastructure, namely the Tendermint blockchain consensus protocol and the Inter-Blockchain Communication protocols. Our formal verification portfolio includes the Apalache model checker, one of the leading model checkers for TLA+. We also contribute to this vision of verifiable organizations with Themis Contract. Our infrastructure powers the Cosmos network.
The Cosmos network is a dream vision of the next-generation Internet of blockchains, where each blockchain can operate independently. At the same time, these blockchains can interact with and trust each other. How do we achieve this trust? We propose a verification-driven development framework. We have applied this framework to our own products, namely to the Tendermint family of protocols. We have developed English and TLA+ specifications of these protocols and model-checked them with Apalache. This has helped us to fix some protocol issues, but, most importantly, it helped to establish a mutual understanding of the protocols between the research and the development teams. What's not so nice is that we have realized that, after we start to develop code, the code starts to diverge from the specifications. You see that the necessary feedback loop is missing here. The way we propose to address this issue is model-based testing. In model-based testing, we generate integration tests from simple TLA+ assertions.
This helps us to eliminate the divergence between the spec and the code: they now evolve in sync. This significantly improves the ease of writing and using the tests. It also improves test maintainability and helps us to achieve higher code coverage. This is a teaser of our model-based testing process. But before diving into it, Igor will tell you more about the Tendermint protocols, the Light Client protocol, and the Apalache model checker.
>> Hey, my name is Igor Konnov. I'm a Principal Researcher at Informal Systems. I will give you a brief introduction to the Tendermint light client, how we specify it in TLA+, and the model checking results with Apalache. Then Andrey Kuprianov will continue with model-based testing. Let's have a look at the Tendermint blockchain. What is that? It's a sequence of blocks, and every block has a certain structure. There are interesting fields here; one of them is the validator set. This is the set of computers that are going to vote on this block.
When more than two-thirds of them vote on the block, this block is going to be added to the blockchain. There is an algorithm called Tendermint Consensus, and all of these validators are running this algorithm. What are these validators deciding on? They decide on the contents of this block, the application state in this block. They also decide on who the next validators are going to be. When they have decided that, the set of validators might change. When this block has been committed, another record is formed in the next block, which is called LastCommit. This record contains the votes of the validators from the previous block, and the hash that points to the previous block. If you're running Tendermint Consensus as a validator, you can figure out whether the transactions that you have submitted to the blockchain have been committed or not. But what happens if you just have a smartphone? You cannot run Tendermint Consensus on it. It's just too expensive, and you don't want to invest a lot of money into running a validator.
That is the question the light client is solving. The light client has some trust in a block, say Block 1. It doesn't matter how you figured out whether to trust it or not, but you know you can trust it. Then you want to download another block that is far along the blockchain. You cannot trust anyone. You cannot trust the peers that give you the blocks. You have to use some knowledge about the blockchain to figure out whether you can trust the block. What can you do? Say, if you want to download Block 9, the obvious solution would be to download the whole blockchain, check all the digital signatures, check how many validators have voted for each block, and so on and so forth. But that's too expensive, especially if you do it on a mobile phone. Fortunately, we can do what is called skipping verification. It uses the knowledge about how the consensus is working. First of all, there is one interesting observation here.
If I know that my block was committed by a set of validators, and more than one-third of these validators are among the validators of a trusted block, then I know that at least one of these validators can be trusted, because there are less than one-third of faulty validators in our system. That's basically the idea. If you find a pair of blocks such that you can establish this root of trust, you can trust the new block. Then you do bisection again and find those pairs of blocks that can establish the trust. Basically, skipping verification is doing some form of bisection. There is one complication though. First of all, I have to make sure that more than one-third of the validators have voted. Second, I have to know something about time, because in Tendermint you don't put your money at stake forever. You put it in for some time. This is called the unbonding period. If I'm running the validator for, say, Block 1, it means that I have staked some money, and this money is going to be lost if I do something wrong on the blockchain.
That's why it's not sufficient just to check how many validators have voted; you also have to check that all these validators are still within what is called the trusting period. You have to find that more than one-third of the validators have voted, and that your time is within the trusting period of those validators who have voted on the trusted block. As we have seen, Tendermint Consensus, the assumptions about the blockchain, and the Light Client protocol are not simple. That's why we have decided to formalize these assumptions in TLA+ and check them with a model checker. But before we actually sat down and wrote a TLA+ specification, we wrote another document, which we call the English specification. That's how we do things at Informal Systems now. We write this document, a mixture of English prose and mathematics. It contains the sequential statement of the problem we are going to solve. It contains a decomposition of this problem into distributed components. It also has basic specifications of the temporal properties we believe this protocol should satisfy.
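Putting the voting and trusting-period conditions above together, the core trust check could be sketched in TLA+ roughly as follows. This is an illustrative sketch, not the actual published specification: the operator, the constants, and the field names (`time`, `NextVS`, `VS`, `Commits`) are assumptions modeled on the description above, and voting power is approximated by set cardinality.

```tla
---- MODULE TrustSketch ----
EXTENDS Integers, FiniteSets

CONSTANTS
    AllNodes,         \* the set of all potential validators
    TRUSTING_PERIOD   \* how long a trusted header remains usable

\* Can we trust the block `untrusted`, given the block `trusted`, at time `now`?
CanTrust(trusted, untrusted, now) ==
    \* the trusted block is still within its trusting period
    /\ now < trusted.header.time + TRUSTING_PERIOD
    \* more than two-thirds of the new block's validators have signed it
    /\ 3 * Cardinality(untrusted.Commits) > 2 * Cardinality(untrusted.header.VS)
    \* more than one-third of the validators we already trust
    \* appear among the signers of the new block
    /\ LET overlap == trusted.header.NextVS \intersect untrusted.Commits IN
       3 * Cardinality(overlap) > Cardinality(trusted.header.NextVS)
====
```

If this check fails for a distant block, the client bisects: it picks an intermediate block, tries to establish trust in it first, and repeats.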
It has the high-level functions, the pre-conditions, the post-conditions, and, more importantly, the error conditions that tell us what has to happen if some assumptions about our system are violated. We use this document as a base for discussions [inaudible] with the systems engineers, verification engineers, and researchers before we start diving into a formal specification, such as the one in TLA+. I am not going to walk you through this document here, but you can find it on the web; here's a link. You can see how one can understand this Light Client protocol without TLA+. Once this document is ready, we sit down and write a specification in TLA+. In the case of the Light Client, it actually took us several hours to translate the English specification into actual TLA+ that we could check with Apalache. What we focus on in the TLA+ specification is the model of the blockchain that I briefly explained to you before. This model captures the notions of validator sets, Byzantine faults, and the time assumptions that we have in Tendermint Consensus.
We have a specification of the Light Client protocol in TLA+. In our case, the Light Client protocol is talking to a peer, which is correct or faulty, and it also has some notion of time that it relates to the blockchain. Again, I'm not giving you the full TLA+ specification; you can find it on the web. But I want to give you a bit of the flavor of this specification. First of all, the light client can talk to either a faulty peer or a correct peer. The light client, of course, cannot know whether the peer is faulty or not, but there is a parameter in the specification that tells us whether the peer is faulty or not. If the peer is correct, we just copy a block from the blockchain and give it to the light client. If the peer is faulty, we non-deterministically produce a block that passes some basic validation tests, but it might have an incorrect set of validators, or it may have an incorrect set of commits, or it might have some skewed time, and so on and so forth. I just want to show you here the structure of the blocks that we have, because it might be interesting.
Blocks are basically records. Essentially, they have the height and the time, which is just an integer in our encoding. They have a set of commits: these are the validators who have voted on the previous block. They have the set of validators for the current block, and they have the set of validators for the next block. When a client is receiving a LightBlock, it gets this header plus the set of commits of those who must have voted on that header. If you think about checking these data structures, you'll see that in the model you have a sequence of blocks, and there is a lot of space for state explosion. Here you have integer time and you have four sets of validators. That's why we are using Apalache. Again, I'm not giving you the details; you can find all the details here. But basically, before we run the light client, we initialize the blockchain in just one step, and when we initialize it, we just guess the validator sets, the sets of commits, and the timestamps, so that they form a correct blockchain where more than two-thirds of validators have voted and the timestamps are growing over time.
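The record structure just described might look like the following TLA+ sketch. The field names are assumptions based on the description above, not necessarily those of the published specification:

```tla
EXTENDS Integers

CONSTANT AllNodes  \* the set of all potential validators

\* The set of block headers: height, integer time,
\* and three validator-related sets
BlockHeaders == [
    height     : Int,
    time       : Int,                \* time is just an integer in the model
    lastCommit : SUBSET AllNodes,    \* validators who voted on the previous block
    VS         : SUBSET AllNodes,    \* validators of this block
    NextVS     : SUBSET AllNodes     \* validators of the next block
]

\* A light block is a header plus the commits the client receives with it
LightBlocks == [ header : BlockHeaders, Commits : SUBSET AllNodes ]
```

Each `SUBSET AllNodes` field ranges over the full powerset of validators, which is exactly where the state explosion mentioned above comes from.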
Let's see how we can verify the specification of the light client with our model checker. Apalache, if you haven't seen this tool before: you can go to the webpage mentioned above, read the manual, and see for yourself how it's working. From a bird's eye perspective, what it does is take the TLA+ specification, unfold all the operator definitions, including the recursive operator definitions up to certain bounds, and find what we call assignments, statically, similar to what TLC does dynamically. It decomposes the specification into smaller pieces that we call symbolic transitions, which roughly correspond to actions in the specification. Then it does some simple type checking, although what it does now is going to be superseded by the new type checker and type inference you have heard about at this conference as well. Then it does a translation of the specification into SMT. As you probably know, SMT solvers are quite powerful, and we want to utilize their power for answering verification questions in TLA+. What Apalache does, in a bit more detail, is bounded model checking.
Given a specification in the canonical form of Init and Next, and an invariant, Apalache first checks whether the invariant can be violated in the initial state. It translates this formula into SMT and asks the SMT solver whether it can find a satisfying assignment for this query. If it has found a satisfying assignment, Apalache will produce a counterexample in TLA+ form. Otherwise, it will continue to unroll executions: it will introduce one step, two steps, three steps, until it has either found a counterexample with SMT or it has reached a certain depth. That's why it's called bounded model checking. Fortunately, for the light client specification, you can provide the depth given the length of the blockchain. We ran Apalache on different models produced from the light client specification, where we had from 4 to 7 validators and from 3 to 5 blocks in the blockchain. Also, we had either faulty or correct peers the light client was talking to. These might not look like huge numbers; however, recall that we had all these power sets of validators in the blockchain and tricky relations, including cardinality comparisons.
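The unrolling just described amounts to the standard bounded model checking query: for increasing bound $k$, the SMT solver is asked whether the following formula is satisfiable, and a satisfying assignment is a counterexample of length $k$:

```latex
Init(s_0) \land Next(s_0, s_1) \land \dots \land Next(s_{k-1}, s_k) \land \lnot Inv(s_k)
```

If the formula is unsatisfiable for all $k$ up to the given depth, the invariant holds within that bound.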
We also have found some violations of the properties, which is interesting, because these violations were predicted by our researchers. I just want to show you how Apalache runs. Without any compilation or whatever, you can just pull the tool from Docker; it will pull a Docker image. In my case, it's fresh, so it doesn't do anything. Then we run Apalache from the Docker image. We ask it to check this invariant and give it the model of five validators and five blocks. What Apalache does now, inside the Docker image, is the pre-processing I told you about. Then it checks the invariant for the initial state, all possible states after one step, after two steps, after three steps, and soon it's going to report a counterexample. Here's the counterexample. We can go to counterexample.tla and see the exact values produced by Apalache, the execution that violates this specific invariant. Now, I give the floor to Andrey.
>> Now, we continue with model-based testing. But before diving into it, I want to describe how standard testing looked before.
A test engineer uses the tendermint-go implementation to manually write integration tests that are run against both the tendermint-rs and tendermint-go implementations. This is an example of a pretty simple, manually written integration test. It just constructs a test case, which is a sequence of steps for the light client protocol. Then this test case is translated into JSON and all the necessary data structures, which contain such details as public keys and hashes. It's executed by the static driver, a pretty simple piece of code which just takes the test case, executes the light client verifier, and compares what it returns at each step with the verdicts expected by the test. Standard testing has a number of drawbacks. First of all, it's manual generation, with lots of effort; the test that we saw was pretty simple, actually. For more complicated cases, non-trivial logic is required. It's also inflexible: it fixes all the parameters, and no non-determinism is left for exploration. It's also hard to understand what a particular test case does.
It's hard to maintain these tests, and it's hard to cover all the cases, because we would need hundreds of those tests. Next, I will switch to model-based testing. Now, we will look at its components. But before looking at that particular view, we will run a short demo. This is a short demo of how our model-based testing is working. It's fully integrated into the standard cargo test command as a test. When we run this cargo test, what happens for each of the TLA+ tests, in this case a success test, is that it runs Apalache on this test. Apalache produces a counterexample which contains the example execution. This execution is then translated into a static test, and this test is run against the implementation. In this case, we have the light client protocol, and we execute this light client protocol against the static test and compare the verdict that we obtain against the expected verdict. What we can see in this case is that running Apalache does take some time. But as a byproduct of those model-based tests, we obtain static tests, and these static tests can then be executed independently in almost no time.
Now, how does a TLA+ test look? A TLA+ test is a pretty simple TLA+ assertion. It's very easy to understand what each particular test does, which I like very much. For example, this test says that for each state in the computation history, the validator sets should be different. You can also express assertions about time, for example, or about the final state of the protocol. How do we integrate this test with the LightClient model you saw before? This is also very easy, due to the modular nature of the TLA+ language. We simply extend the Init and Next predicates with history tracking. We introduce an additional history variable, and this variable just tracks the necessary state components of the LightClient protocol. At each step of the model execution, besides executing the model itself, we also update the history for the next step. For the model checker, we needed only to extend it with JSON input-output support, for integration purposes.
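A test like the one described, asserting that the validator sets differ across the whole computation history, could be sketched as follows. The shape of the `history` variable and the field names are assumptions modeled on the description above, not the exact operators of the published tests:

```tla
\* Sketch of a TLA+ test over the auxiliary history variable:
\* in every pair of distinct recorded states, the validator sets differ.
TestValidatorSetChanges ==
    \A i, j \in DOMAIN history :
        i /= j => history[i].current.header.VS /= history[j].current.header.VS
```

Apalache then searches for an execution satisfying this assertion, and that execution becomes the generated test.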
In fact, you can now use Apalache as a translator between TLA+ and JSON, and we highly encourage everyone interested to use this feature. You could use it, for example, to integrate TLA+ with your tools. Now, what does a counterexample look like? A counterexample is a sequence of history states. At each state of this history, we have the current block, the current time point, the verdict that we expect from the LightClient, and the latest verified block. Here you can see a small portion of this history translated into JSON. Don't be discouraged by the large size of the encoding; this is meant only for automatic processing. As the next step, we transform our counterexample, with the help of another component called JSONata, into the static integration test. This transformation happens with the help of a so-called transformation specification, written in a small domain-specific language, which describes how to transform one JSON into another JSON.
442 00:24:49,705 --> 00:24:54,040 An important part of this translation is that it calls 443 00:24:54,040 --> 00:24:58,570 the so-called tendermint-testgen command, which allows us to 444 00:24:58,570 --> 00:25:02,500 produce whole Tendermint data structures 445 00:25:02,500 --> 00:25:07,555 from the abstract data structures that we have in the TLA+ model. 446 00:25:07,555 --> 00:25:11,245 Let's take a look at how TestGen works. 447 00:25:11,245 --> 00:25:15,160 Effectively, you can see it as a probe into 448 00:25:15,160 --> 00:25:19,960 the concrete implementation space from the abstract TLA+ space. 449 00:25:19,960 --> 00:25:22,180 For example, in the abstract space, a validator 450 00:25:22,180 --> 00:25:24,700 is just a simple string identifier. 451 00:25:24,700 --> 00:25:27,880 In the concrete implementation space, 452 00:25:27,880 --> 00:25:32,680 it contains private and public keys and additional details. 453 00:25:32,680 --> 00:25:36,145 A header in the abstract space just lists 454 00:25:36,145 --> 00:25:42,100 simple abstract details like the validator set, 455 00:25:42,100 --> 00:25:43,240 the next validator set, 456 00:25:43,240 --> 00:25:45,535 the height, the time, and the lastCommit. 457 00:25:45,535 --> 00:25:50,650 This translates into a fully-fledged data structure 458 00:25:50,650 --> 00:25:53,510 with all the necessary hashes. 459 00:25:54,720 --> 00:25:57,490 Now, to the results. One of 460 00:25:57,490 --> 00:26:01,060 the most important results of model-based testing is, 461 00:26:01,060 --> 00:26:02,380 from my point of view, 462 00:26:02,380 --> 00:26:04,390 that it helps to eliminate 463 00:26:04,390 --> 00:26:08,000 the divergence between the spec and the code.
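The "probe" idea behind TestGen can be illustrated with a small sketch. Everything below is invented for illustration (the key and hash derivations are fake and deterministic); the real tendermint-testgen tool produces genuine Tendermint data structures with real keys and hashes. The point is only the shape of the mapping: a validator that is a bare string in the abstract space expands into a record with keys and an address, and an abstract header expands into a concrete header with the derived hashes filled in.

```python
import hashlib

def concretize_validator(vid: str) -> dict:
    # Derive a stable *fake* keypair and address from the abstract id.
    # A real testgen produces actual cryptographic keys instead.
    seed = hashlib.sha256(vid.encode()).hexdigest()
    return {"id": vid, "pub_key": seed[:16], "priv_key": seed[16:32],
            "address": seed[32:40]}

def concretize_header(abstract: dict) -> dict:
    # Expand each abstract validator id and compute a (fake) validators
    # hash, turning the abstract header into a fully-fledged structure.
    vals = [concretize_validator(v) for v in sorted(abstract["validators"])]
    vals_hash = hashlib.sha256(
        "".join(v["address"] for v in vals).encode()).hexdigest()
    return {"height": abstract["height"], "time": abstract["time"],
            "validators": vals, "validators_hash": vals_hash}

header = concretize_header(
    {"height": 2, "time": 11, "validators": {"v1", "v2"}})
print(header["validators_hash"][:8])
```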
464 00:26:08,220 --> 00:26:12,550 Each case that you see here results 465 00:26:12,550 --> 00:26:15,925 from a failing model-based test. For example, 466 00:26:15,925 --> 00:26:17,680 it was able to catch 467 00:26:17,680 --> 00:26:20,635 such corner cases as the difference between less-than 468 00:26:20,635 --> 00:26:22,345 and less-than-or-equal. 469 00:26:22,345 --> 00:26:24,980 In this particular case, 470 00:26:25,740 --> 00:26:29,890 the TLA+ specification actually diverged both from 471 00:26:29,890 --> 00:26:34,090 the English specification and from the source code. 472 00:26:34,090 --> 00:26:38,620 In other cases, we added, for example, 473 00:26:38,620 --> 00:26:43,330 an additional condition to the specification 474 00:26:43,330 --> 00:26:44,815 because it was generating 475 00:26:44,815 --> 00:26:48,115 negative times, since times were simply integers. 476 00:26:48,115 --> 00:26:53,125 Then there was another corner case involving less-than-or-equal and less-than. 477 00:26:53,125 --> 00:26:57,700 Another check was also missing in the specification: 478 00:26:57,700 --> 00:27:01,880 that the headers are not coming from the future. 479 00:27:02,670 --> 00:27:06,340 Model-based testing helped us to substantially 480 00:27:06,340 --> 00:27:08,500 improve the code coverage, as you can 481 00:27:08,500 --> 00:27:11,530 see, for the relevant parts of the source tree, 482 00:27:11,530 --> 00:27:15,220 and it has already helped us to find a real bug. 483 00:27:15,220 --> 00:27:17,965 All of this is on the first iteration. 484 00:27:17,965 --> 00:27:19,660 Recently, we updated 485 00:27:19,660 --> 00:27:24,955 the Tendermint protocol from version 0.33 to version 0.34, 486 00:27:24,955 --> 00:27:30,595 and the validator set ordering changed between the versions.
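The corner cases just mentioned can be made concrete with a toy check. This is a hypothetical sketch, not the actual LightClient verification condition: it shows how a strict versus non-strict comparison, a missing non-negativity condition on integer-modeled times, and a missing headers-not-from-the-future check are exactly the kinds of off-by-one divergences that a failing model-based test surfaces.

```python
def header_time_ok(header_time: int, now: int, trusting_period: int,
                   clock_drift: int) -> bool:
    # Illustrative condition only; parameter names and bounds are invented.
    if header_time < 0:
        # Times modeled as plain integers can go negative in the model;
        # the spec needed an explicit non-negativity condition.
        return False
    if header_time >= now + clock_drift:
        # Headers must not come from the future. Using strict `>` here
        # would wrongly accept header_time == now + clock_drift, which is
        # exactly the kind of less vs. less-or-equal corner case a
        # model-based test can catch.
        return False
    # The header must still be within the trusting period.
    return now < header_time + trusting_period

print(header_time_ok(90, 100, 50, 5))   # True: recent, within period
print(header_time_ok(-1, 100, 50, 5))   # False: negative time
print(header_time_ok(105, 100, 50, 5))  # False: header from the future
```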
487 00:27:30,595 --> 00:27:36,400 An important aspect here is that this bug was missed both by 488 00:27:36,400 --> 00:27:39,010 the standard testing and was also 489 00:27:39,010 --> 00:27:42,520 not properly documented in the specifications of the product, 490 00:27:42,520 --> 00:27:45,700 but it was caught by the model-based test. 491 00:27:45,700 --> 00:27:48,985 It was a pretty simple fix, as you can see here. 492 00:27:48,985 --> 00:27:52,960 Now, to conclude: model-based testing 493 00:27:52,960 --> 00:27:58,075 substantially improves the ease of writing and using tests. 494 00:27:58,075 --> 00:28:00,580 It's much easier to maintain 495 00:28:00,580 --> 00:28:04,030 them, and it improves the code coverage. 496 00:28:04,030 --> 00:28:06,070 The most important aspect, I think, is 497 00:28:06,070 --> 00:28:08,620 that it makes the specifications live. 498 00:28:08,620 --> 00:28:10,780 Now, a specification evolves 499 00:28:10,780 --> 00:28:13,990 continuously with the code that it specifies. 500 00:28:13,990 --> 00:28:18,655 Although we have invested some time into infrastructure, 501 00:28:18,655 --> 00:28:23,080 I think the benefits substantially outweigh those investments. 502 00:28:23,080 --> 00:28:25,120 As future work, 503 00:28:25,120 --> 00:28:29,005 we are currently adding fuzzing to model-based testing, 504 00:28:29,005 --> 00:28:30,805 and the fuzzing will perform 505 00:28:30,805 --> 00:28:34,029 additional mutations on the level of data structures. 506 00:28:34,029 --> 00:28:36,580 This allows us to cover scenarios which 507 00:28:36,580 --> 00:28:39,160 are currently inexpressible in the abstract model. 508 00:28:39,160 --> 00:28:41,095 For example, in the abstract model 509 00:28:41,095 --> 00:28:44,290 a validator set is a mathematical set, 510 00:28:44,290 --> 00:28:47,425 so it can't contain two identical validators.
511 00:28:47,425 --> 00:28:49,240 At the same time, for the source code, 512 00:28:49,240 --> 00:28:51,715 you want to test such cases. 513 00:28:51,715 --> 00:28:56,860 We plan to add extensions to the Apalache model checker for 514 00:28:56,860 --> 00:28:59,650 faster counterexample search, because counterexamples 515 00:28:59,650 --> 00:29:03,145 can be found using specialized techniques. 516 00:29:03,145 --> 00:29:08,485 Currently, Apalache is tuned more towards invariant checking. 517 00:29:08,485 --> 00:29:13,705 We plan to generate executable code from TLA+ specifications, 518 00:29:13,705 --> 00:29:16,450 and this means resolving 519 00:29:16,450 --> 00:29:20,500 the nondeterminism that is present in TLA+. 520 00:29:20,500 --> 00:29:24,400 This will allow us to substantially speed up testing. 521 00:29:24,400 --> 00:29:26,020 We plan to extend 522 00:29:26,020 --> 00:29:27,820 the model-based testing framework to 523 00:29:27,820 --> 00:29:30,190 the distributed setting, where we will cut 524 00:29:30,190 --> 00:29:32,890 the system at the interface points and 525 00:29:32,890 --> 00:29:36,025 replace some of the models with executable specifications, 526 00:29:36,025 --> 00:29:38,770 and then we will test 527 00:29:38,770 --> 00:29:44,590 a mixture of the specs and the real code in a running system. 528 00:29:44,590 --> 00:29:47,210 Thank you for your attention.