>> Welcome to our talk about model-based testing with TLA+ and Apalache. This is joint work of Igor Konnov and myself. We are both proud to be members of Informal Systems. Igor Konnov is the Principal Scientist developing next-generation verification methods. He is also the lead developer of the Apalache model checker. I'm a Research Engineer at Informal who also contributes to Apalache and leads the product on model-based testing.
[MUSIC]
>> We're using Rust and formal verification tools to build open-source fault-tolerant distributed systems, like the Tendermint Byzantine Fault Tolerant consensus engine and the Inter-Blockchain Communication protocol. We're cooperatively owned and governed by our employees. We have offices in Toronto, Berlin, Vienna, and Lausanne. In fact, a few of the folks at Informal would love to say hello.
>> Hello.
>> Hello, world.
>> Hi.
>> Hi, everyone.
>> Hello.
>> Hey, everyone.
>> Hello.
>> Hi.
>> Hey, everyone.
>> You can learn more about us, check out our work, and contribute online at github.com/informalsystems. We hope you enjoy the conference.
[MUSIC]
>> Who we are: at Informal Systems, we envision a bright future of verifiable distributed systems and organizations. Nowadays, everything becomes distributed. At the same time, when we interact with systems and organizations, we need a certain level of trust. Our grand vision is that formal verification can help us bring this necessary level of trust to the systems and organizations with whom we interact. We actively contribute to this future vision by developing the next-generation blockchain infrastructure, namely the Tendermint blockchain consensus protocol and the Inter-Blockchain Communication protocols. Our formal verification portfolio includes the Apalache model checker, one of the leading model checkers for TLA+. We also contribute to this vision of verifiable organizations with Themis Contract. Our infrastructure powers the Cosmos network.
The Cosmos network is a dream vision of the next-generation Internet of blockchains, where each blockchain can operate independently. At the same time, these blockchains can interact with and trust each other. How do we achieve this trust? We propose a verification-driven development framework. We have applied this framework to our own products, namely to the Tendermint family of protocols. We have developed English and TLA+ specifications of these protocols and model-checked them with Apalache. This has helped us to fix some protocol issues, but, most importantly, it helped to establish a mutual understanding of the protocols between the research and the development teams. What's not so nice is that we have realized that, after we start to develop code, the code starts to diverge from the specifications. You see that the necessary feedback loop is missing here. The way we propose to address this issue is model-based testing. In model-based testing, we generate integration tests from simple TLA+ assertions.
This helps us to eliminate the divergence between the spec and the code: they now evolve in sync. This significantly improves the ease of writing and using the tests. It also improves test maintainability and helps us to achieve higher code coverage. This is a teaser of our model-based testing process. But before diving into it, Igor will tell you more about the Tendermint protocols, the Light Client protocol, and the Apalache model checker.
>> Hey, my name is Igor Konnov. I'm a Principal Researcher at Informal Systems. I will give you a brief introduction to the Tendermint light client, how we specify it in TLA+, and the model checking results with Apalache. Then Andrey Kuprianov will continue with model-based testing. Let's have a look at the Tendermint blockchain. What is that? It's a sequence of blocks, and every block has a certain structure. There are interesting fields here; one of them is the validator set. This is the set of computers that are going to vote on this block.
When more than two-thirds of them vote on the block, this block is going to be added to the blockchain. There is an algorithm called Tendermint Consensus, and all of these validators are running this algorithm. What are these validators deciding on? They decide on the contents of this block, the application state in this block. They also decide on who the next validators are going to be. When they have decided that, the set of validators might change. When this block has been committed, another record is formed in the next block, which is called LastCommit. This record contains the votes of the validators from the previous block, and the hash that points to the previous block. If you're running Tendermint Consensus as a validator, you can figure out whether the transactions that you have submitted to the blockchain have been committed or not. But what happens if you just have a smartphone? You cannot run Tendermint Consensus on it. It's just too expensive, and you don't want to invest a lot of money into running a validator.
That is the question the light client is solving. The light client has some trust in a block, say Block 1. It doesn't matter how you figured out whether to trust it or not, but you know you can trust it. Then you want to download another block that is far along the blockchain. You cannot trust anyone. You cannot trust the peers that give you the blocks. You have to use some knowledge about the blockchain to figure out whether you can trust the block. What can you do? Say, if you want to download Block 9, the obvious solution would be to download the whole blockchain, check all the digital signatures, check how many validators have voted for each block, and so on and so forth. But that's too expensive, especially if you do it on a mobile phone. Fortunately, we can do what is called skipping verification. It uses the knowledge about how the consensus is working. First of all, there is one interesting observation here.
If I know that my block was committed by a set of validators, and more than one-third of these validators are among the validators of a trusted block, then I know that at least one of these validators can be trusted, because there are less than one-third of faulty validators in our system. That's basically the idea. If you find a pair of blocks such that you can establish this root of trust, you can trust the new block. Then you do bisection again and find those pairs of blocks that can establish the trust. Basically, skipping verification is doing some form of bisection. There is one complication though. First of all, I have to make sure that more than one-third of the validators have voted. Second, I have to know something about time, because in Tendermint you don't put your money at stake forever. You put it in for some time. This is called the unbonding period. If I'm running the validator for, say, Block 1, it means that I have staked some money, and this money is going to be lost if I do something wrong on the blockchain.
That's why it's not sufficient just to check how many validators have voted; you also have to check that all these validators are still within what is called the trusting period. You have to find that more than one-third of the validators have voted, and that your time is within the trusting period of those validators who have voted on the trusted block. As we have seen, Tendermint Consensus, the assumptions about the blockchain, and the Light Client protocol are not simple. That's why we have decided to formalize these assumptions in TLA+ and check them with a model checker. But before we actually sat down and wrote a TLA+ specification, we wrote another document, which we call the English specification. That's how we do things at Informal Systems now. We write this document, a mixture of English prose and mathematics. It contains the sequential statement of the problem we are going to solve. It contains a decomposition of this problem into distributed components. It also has basic specifications of the temporal properties we believe this protocol should satisfy.
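Putting the voting and trusting-period conditions above together, the core trust check could be sketched in TLA+ roughly as follows. This is an illustrative sketch, not the actual published specification: the operator, the constants, and the field names (`time`, `NextVS`, `VS`, `Commits`) are assumptions modeled on the description above, and voting power is approximated by set cardinality.

```tla
---- MODULE TrustSketch ----
EXTENDS Integers, FiniteSets

CONSTANTS
    AllNodes,         \* the set of all potential validators
    TRUSTING_PERIOD   \* how long a trusted header remains usable

\* Can we trust the block `untrusted`, given the block `trusted`, at time `now`?
CanTrust(trusted, untrusted, now) ==
    \* the trusted block is still within its trusting period
    /\ now < trusted.header.time + TRUSTING_PERIOD
    \* more than two-thirds of the new block's validators have signed it
    /\ 3 * Cardinality(untrusted.Commits) > 2 * Cardinality(untrusted.header.VS)
    \* more than one-third of the validators we already trust
    \* appear among the signers of the new block
    /\ LET overlap == trusted.header.NextVS \intersect untrusted.Commits IN
       3 * Cardinality(overlap) > Cardinality(trusted.header.NextVS)
====
```

If this check fails for a distant block, the client bisects: it picks an intermediate block, tries to establish trust in it first, and repeats.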
It has the high-level functions, the pre-conditions, the post-conditions, and, more importantly, the error conditions that tell us what has to happen if some assumptions about our system are violated. We use this document as a base for discussions [inaudible] with the systems engineers, verification engineers, and researchers before we start diving into a formal specification, such as the one in TLA+. I am not going to walk you through this document here, but you can find it on the web; here's a link. You can see how one can understand this Light Client protocol without TLA+. Once this document is ready, we sit down and write a specification in TLA+. In the case of the Light Client, it actually took us several hours to translate the English specification into actual TLA+ that we could check with Apalache. What we focus on in the TLA+ specification is the model of the blockchain that I briefly explained to you before. This model captures the notions of validator sets, Byzantine faults, and the time assumptions that we have in Tendermint Consensus.
We have a specification of the Light Client protocol in TLA+. In our case, the Light Client protocol is talking to a peer, which is correct or faulty, and it also has some notion of time that it relates to the blockchain. Again, I'm not giving you the full TLA+ specification; you can find it on the web. But I want to give you a bit of the flavor of this specification. First of all, the light client can talk to either a faulty peer or a correct peer. The light client, of course, cannot know whether the peer is faulty or not, but there is a parameter in the specification that tells us whether the peer is faulty or not. If the peer is correct, we just copy a block from the blockchain and give it to the light client. If the peer is faulty, we non-deterministically produce a block that passes some basic validation tests, but it might have an incorrect set of validators, or it may have an incorrect set of commits, or it might have some skewed time, and so on and so forth. I just want to show you here the structure of the blocks that we have, because it might be interesting.
Blocks are basically records. Essentially, they have the height and the time, which is just an integer in our encoding. They have a set of commits: these are the validators who have voted on the previous block. They have the set of validators for the current block, and they have the set of validators for the next block. When a client is receiving a LightBlock, it gets this header plus the set of commits of those who must have voted on that header. If you think about checking these data structures, you'll see that in the model you have a sequence of blocks, and there is a lot of space for state explosion. Here you have integer time and you have four sets of validators. That's why we are using Apalache. Again, I'm not giving you the details; you can find all the details here. But basically, before we run the light client, we initialize the blockchain in just one step, and when we initialize it, we just guess the validator sets, the sets of commits, and the timestamps, so that they form a correct blockchain where more than two-thirds of validators have voted and the timestamps are growing over time.
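The record structure just described might look like the following TLA+ sketch. The field names are assumptions based on the description above, not necessarily those of the published specification:

```tla
EXTENDS Integers

CONSTANT AllNodes  \* the set of all potential validators

\* The set of block headers: height, integer time,
\* and three validator-related sets
BlockHeaders == [
    height     : Int,
    time       : Int,                \* time is just an integer in the model
    lastCommit : SUBSET AllNodes,    \* validators who voted on the previous block
    VS         : SUBSET AllNodes,    \* validators of this block
    NextVS     : SUBSET AllNodes     \* validators of the next block
]

\* A light block is a header plus the commits the client receives with it
LightBlocks == [ header : BlockHeaders, Commits : SUBSET AllNodes ]
```

Each `SUBSET AllNodes` field ranges over the full powerset of validators, which is exactly where the state explosion mentioned above comes from.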
Let's see how we can verify the specification of the light client with our model checker. Apalache, if you haven't seen this tool before: you can go to the webpage mentioned above, read the manual, and see for yourself how it's working. From a bird's eye perspective, what it does is take the TLA+ specification, unfold all the operator definitions, including the recursive operator definitions up to certain bounds, and find what we call assignments, statically, similar to what TLC does dynamically. It decomposes the specification into smaller pieces that we call symbolic transitions, which roughly correspond to actions in the specification. Then it does some simple type checking, although what it does now is going to be superseded by the new type checker and type inference you have heard about at this conference as well. Then it does a translation of the specification into SMT. As you probably know, SMT solvers are quite powerful, and we want to utilize their power for answering verification questions in TLA+. What Apalache does, in a bit more detail, is bounded model checking.
Given a specification in the canonical form of Init and Next, and an invariant, Apalache first checks whether the invariant can be violated in the initial state. It translates this formula into SMT and asks the SMT solver whether it can find a satisfying assignment for this query. If it has found a satisfying assignment, Apalache will produce a counterexample in TLA+ form. Otherwise, it will continue to unroll executions: it will introduce one step, two steps, three steps, until it has either found a counterexample with SMT or it has reached a certain depth. That's why it's called bounded model checking. Fortunately, for the light client specification, you can provide the depth given the length of the blockchain. We ran Apalache on different models produced from the light client specification, where we had from 4 to 7 validators and from 3 to 5 blocks in the blockchain. Also, we had either faulty or correct peers the light client was talking to. These might not look like huge numbers; however, recall that we had all these power sets of validators in the blockchain and tricky relations, including cardinality comparisons.
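The unrolling just described amounts to the standard bounded model checking query: for increasing bound $k$, the SMT solver is asked whether the following formula is satisfiable, and a satisfying assignment is a counterexample of length $k$:

```latex
Init(s_0) \land Next(s_0, s_1) \land \dots \land Next(s_{k-1}, s_k) \land \lnot Inv(s_k)
```

If the formula is unsatisfiable for all $k$ up to the given depth, the invariant holds within that bound.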
We also have found some violations of the properties, which is interesting, because these violations were predicted by our researchers. I just want to show you how Apalache runs. Without any compilation or whatever, you can just pull the tool from Docker; it will pull a Docker image. In my case, it's fresh, so it doesn't do anything. Then we run Apalache from the Docker image. We ask it to check this invariant and give it the model of five validators and five blocks. What Apalache does now, inside the Docker image, is the pre-processing I told you about. Then it checks the invariant for the initial state, all possible states after one step, after two steps, after three steps, and soon it's going to report a counterexample. Here's the counterexample. We can go to counterexample.tla and see the exact values produced by Apalache, the execution that violates this specific invariant. Now, I give the floor to Andrey.
>> Now, we continue with model-based testing. But before diving into it, I want to describe how standard testing looked before.
A test engineer uses the tendermint-go implementation to manually write integration tests that are run against both the tendermint-rs and tendermint-go implementations. This is an example of a pretty simple, manually written integration test. It just constructs a test case, which is a sequence of steps for the light client protocol. Then this test case is translated into JSON and all the necessary data structures, which contain such details as public keys and hashes. It's executed by the static driver, a pretty simple piece of code which just takes the test case, executes the light client verifier, and compares what it returns at each step with the verdicts expected by the test. Standard testing has a number of drawbacks. First of all, it's manual generation, with lots of effort; the test that we saw was pretty simple, actually. For more complicated cases, non-trivial logic is required. It's also inflexible: it fixes all the parameters, and no non-determinism is left for exploration. It's also hard to understand what a particular test case does.
It's hard to maintain these tests, and it's hard to cover all the cases, because we would need hundreds of those tests. Next, I will switch to model-based testing. Now, we will look at its components. But before looking at that particular view, we will run a short demo. This is a short demo of how our model-based testing is working. It's fully integrated into the standard cargo test command as a test. When we run this cargo test, what happens for each of the TLA+ tests, in this case a success test, is that it runs Apalache on this test. Apalache produces a counterexample which contains the example execution. This execution is then translated into a static test, and this test is run against the implementation. In this case, we have the light client protocol, and we execute this light client protocol against the static test and compare the verdict that we obtain against the expected verdict. What we can see in this case is that running Apalache does take some time. But as a byproduct of those model-based tests, we obtain static tests, and these static tests can then be executed independently in almost no time.
Now, how does a TLA+ test look? A TLA+ test is a pretty simple TLA+ assertion. It's very easy to understand what each particular test does, which I like very much. For example, this test says that for each state in the computation history, the validator sets should be different. You can also express assertions about time, for example, or about the final state of the protocol. How do we integrate this test with the LightClient model you saw before? This is also very easy, due to the modular nature of the TLA+ language. We simply extend the Init and Next predicates with history tracking. We introduce an additional history variable, and this variable just tracks the necessary state components of the LightClient protocol. At each step of the model execution, besides executing the model itself, we also update the history for the next step. For the model checker, we needed only to extend it with JSON input-output support, for integration purposes.
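A test like the one described, asserting that the validator sets differ across the whole computation history, could be sketched as follows. The shape of the `history` variable and the field names are assumptions modeled on the description above, not the exact operators of the published tests:

```tla
\* Sketch of a TLA+ test over the auxiliary history variable:
\* in every pair of distinct recorded states, the validator sets differ.
TestValidatorSetChanges ==
    \A i, j \in DOMAIN history :
        i /= j => history[i].current.header.VS /= history[j].current.header.VS
```

Apalache then searches for an execution satisfying this assertion, and that execution becomes the generated test.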
In fact, you can now use Apalache as a translator between TLA+ and JSON, and we highly encourage everyone interested to use this feature. You could use it, for example, to integrate TLA+ with your tools. Now, what does a counterexample look like? A counterexample is a sequence of history states. At each state of this history, we have the current block, the current time point, the verdict that we expect from the LightClient, and the latest verified block. Here you can see a small portion of this history translated into JSON. Don't be discouraged by the large size of the encoding; this is meant only for automatic processing. As the next step, we transform our counterexample, with the help of another component called JSONata, into the static integration test. This transformation happens with the help of a so-called transformation specification, written in a small domain-specific language, which describes how to transform one JSON into another JSON.
442 00:24:49,705 --> 00:24:54,040 An important part of this translation is that it calls 443 00:24:54,040 --> 00:24:58,570 the so-called tendermint-testgen command, which allows us to 444 00:24:58,570 --> 00:25:02,500 produce whole Tendermint data structures 445 00:25:02,500 --> 00:25:07,555 from the abstract data structures that we have in the TLA+ model. 446 00:25:07,555 --> 00:25:11,245 Let's take a look at how TestGen works. 447 00:25:11,245 --> 00:25:15,160 Effectively, you can see it as a probe into 448 00:25:15,160 --> 00:25:19,960 the concrete implementation space from the abstract TLA+ space. 449 00:25:19,960 --> 00:25:22,180 For example, in the abstract space, a validator 450 00:25:22,180 --> 00:25:24,700 is just a simple string identifier. 451 00:25:24,700 --> 00:25:27,880 In the concrete implementation space, 452 00:25:27,880 --> 00:25:32,680 it contains private and public keys and additional details. 453 00:25:32,680 --> 00:25:36,145 A header in the abstract space just lists 454 00:25:36,145 --> 00:25:42,100 simple abstract details like the validator set, 455 00:25:42,100 --> 00:25:43,240 the next validator set, 456 00:25:43,240 --> 00:25:45,535 the height, the time, and the lastCommit. 457 00:25:45,535 --> 00:25:50,650 This translates into a fully-fledged data structure 458 00:25:50,650 --> 00:25:53,510 with all the necessary hashes. 459 00:25:54,720 --> 00:25:57,490 Now, to the results. One of 460 00:25:57,490 --> 00:26:01,060 the most important results of model-based testing is, 461 00:26:01,060 --> 00:26:02,380 from my point of view, 462 00:26:02,380 --> 00:26:04,390 that it helps to eliminate 463 00:26:04,390 --> 00:26:08,000 the divergence between the spec and the code.
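The "probe" idea behind TestGen can be illustrated with a small sketch. Everything below is invented for illustration (the key and hash derivations are fake and deterministic); the real tendermint-testgen tool produces genuine Tendermint data structures with real keys and hashes. The point is only the shape of the mapping: a validator that is a bare string in the abstract space expands into a record with keys and an address, and an abstract header expands into a concrete header with the derived hashes filled in.

```python
import hashlib

def concretize_validator(vid: str) -> dict:
    # Derive a stable *fake* keypair and address from the abstract id.
    # A real testgen produces actual cryptographic keys instead.
    seed = hashlib.sha256(vid.encode()).hexdigest()
    return {"id": vid, "pub_key": seed[:16], "priv_key": seed[16:32],
            "address": seed[32:40]}

def concretize_header(abstract: dict) -> dict:
    # Expand each abstract validator id and compute a (fake) validators
    # hash, turning the abstract header into a fully-fledged structure.
    vals = [concretize_validator(v) for v in sorted(abstract["validators"])]
    vals_hash = hashlib.sha256(
        "".join(v["address"] for v in vals).encode()).hexdigest()
    return {"height": abstract["height"], "time": abstract["time"],
            "validators": vals, "validators_hash": vals_hash}

header = concretize_header(
    {"height": 2, "time": 11, "validators": {"v1", "v2"}})
print(header["validators_hash"][:8])
```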
464 00:26:08,220 --> 00:26:12,550 Each case that you see here results 465 00:26:12,550 --> 00:26:15,925 from a failing model-based test. For example, 466 00:26:15,925 --> 00:26:17,680 it was able to catch 467 00:26:17,680 --> 00:26:20,635 such corner cases as the difference between less-than 468 00:26:20,635 --> 00:26:22,345 and less-than-or-equal. 469 00:26:22,345 --> 00:26:24,980 In this particular case, 470 00:26:25,740 --> 00:26:29,890 the TLA+ specification actually diverged both from 471 00:26:29,890 --> 00:26:34,090 the English specification and from the source code. 472 00:26:34,090 --> 00:26:38,620 In other cases, we added, for example, 473 00:26:38,620 --> 00:26:43,330 an additional condition to the specification 474 00:26:43,330 --> 00:26:44,815 because it was generating 475 00:26:44,815 --> 00:26:48,115 negative times, since times were simply integers. 476 00:26:48,115 --> 00:26:53,125 Then there was another corner case involving less-than-or-equal and less-than. 477 00:26:53,125 --> 00:26:57,700 Another check was also missing in the specification: 478 00:26:57,700 --> 00:27:01,880 that the headers are not coming from the future. 479 00:27:02,670 --> 00:27:06,340 Model-based testing helped us to substantially 480 00:27:06,340 --> 00:27:08,500 improve the code coverage, as you can 481 00:27:08,500 --> 00:27:11,530 see, for the relevant parts of the source tree, 482 00:27:11,530 --> 00:27:15,220 and it has already helped us to find a real bug. 483 00:27:15,220 --> 00:27:17,965 All of this is on the first iteration. 484 00:27:17,965 --> 00:27:19,660 Recently, we updated 485 00:27:19,660 --> 00:27:24,955 the Tendermint protocol from version 0.33 to version 0.34, 486 00:27:24,955 --> 00:27:30,595 and the validator set ordering changed between the versions.
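The corner cases just mentioned can be made concrete with a toy check. This is a hypothetical sketch, not the actual LightClient verification condition: it shows how a strict versus non-strict comparison, a missing non-negativity condition on integer-modeled times, and a missing headers-not-from-the-future check are exactly the kinds of off-by-one divergences that a failing model-based test surfaces.

```python
def header_time_ok(header_time: int, now: int, trusting_period: int,
                   clock_drift: int) -> bool:
    # Illustrative condition only; parameter names and bounds are invented.
    if header_time < 0:
        # Times modeled as plain integers can go negative in the model;
        # the spec needed an explicit non-negativity condition.
        return False
    if header_time >= now + clock_drift:
        # Headers must not come from the future. Using strict `>` here
        # would wrongly accept header_time == now + clock_drift, which is
        # exactly the kind of less vs. less-or-equal corner case a
        # model-based test can catch.
        return False
    # The header must still be within the trusting period.
    return now < header_time + trusting_period

print(header_time_ok(90, 100, 50, 5))   # True: recent, within period
print(header_time_ok(-1, 100, 50, 5))   # False: negative time
print(header_time_ok(105, 100, 50, 5))  # False: header from the future
```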
487 00:27:30,595 --> 00:27:36,400 An important aspect here is that this bug was missed both by 488 00:27:36,400 --> 00:27:39,010 the standard testing and was also 489 00:27:39,010 --> 00:27:42,520 not properly documented in the specifications of the product, 490 00:27:42,520 --> 00:27:45,700 but it was caught by the model-based test. 491 00:27:45,700 --> 00:27:48,985 It was a pretty simple fix, as you can see here. 492 00:27:48,985 --> 00:27:52,960 Now, to conclude: model-based testing 493 00:27:52,960 --> 00:27:58,075 substantially improves the ease of writing and using tests. 494 00:27:58,075 --> 00:28:00,580 It's much easier to maintain 495 00:28:00,580 --> 00:28:04,030 them, and it improves the code coverage. 496 00:28:04,030 --> 00:28:06,070 The most important aspect, I think, is 497 00:28:06,070 --> 00:28:08,620 that it makes the specifications live. 498 00:28:08,620 --> 00:28:10,780 Now, a specification evolves 499 00:28:10,780 --> 00:28:13,990 continuously with the code that it specifies. 500 00:28:13,990 --> 00:28:18,655 Although we have invested some time into infrastructure, 501 00:28:18,655 --> 00:28:23,080 I think the benefits substantially outweigh those investments. 502 00:28:23,080 --> 00:28:25,120 As future work, 503 00:28:25,120 --> 00:28:29,005 we are currently adding fuzzing to model-based testing, 504 00:28:29,005 --> 00:28:30,805 and the fuzzing will perform 505 00:28:30,805 --> 00:28:34,029 additional mutations on the level of data structures. 506 00:28:34,029 --> 00:28:36,580 This allows us to cover scenarios which 507 00:28:36,580 --> 00:28:39,160 are currently inexpressible in the abstract model. 508 00:28:39,160 --> 00:28:41,095 For example, in the abstract model 509 00:28:41,095 --> 00:28:44,290 a validator set is a mathematical set, 510 00:28:44,290 --> 00:28:47,425 so it can't contain two identical validators.
511 00:28:47,425 --> 00:28:49,240 At the same time, for the source code, 512 00:28:49,240 --> 00:28:51,715 you want to test such cases. 513 00:28:51,715 --> 00:28:56,860 We plan to add extensions to the Apalache model checker for 514 00:28:56,860 --> 00:28:59,650 faster counterexample search, because counterexamples 515 00:28:59,650 --> 00:29:03,145 can be found using specialized techniques. 516 00:29:03,145 --> 00:29:08,485 Currently, Apalache is tuned more towards invariant checking. 517 00:29:08,485 --> 00:29:13,705 We plan to generate executable code from TLA+ specifications, 518 00:29:13,705 --> 00:29:16,450 and this means resolving 519 00:29:16,450 --> 00:29:20,500 the nondeterminism that is present in TLA+. 520 00:29:20,500 --> 00:29:24,400 This will allow us to substantially speed up testing. 521 00:29:24,400 --> 00:29:26,020 We plan to extend 522 00:29:26,020 --> 00:29:27,820 the model-based testing framework to 523 00:29:27,820 --> 00:29:30,190 the distributed setting, where we will cut 524 00:29:30,190 --> 00:29:32,890 the system at the interface points and 525 00:29:32,890 --> 00:29:36,025 replace some of the models with executable specifications, 526 00:29:36,025 --> 00:29:38,770 and then we will test 527 00:29:38,770 --> 00:29:44,590 a mixture of the specs and the real code in a running system. 528 00:29:44,590 --> 00:29:47,210 Thank you for your attention.