1
00:00:00,000 --> 00:00:02,910
>> Hello. My name is Igor Konnov,

2
00:00:02,910 --> 00:00:06,420
I'm working at Informal Systems.

3
00:00:06,420 --> 00:00:09,570
I'm going to give you
the first part of

4
00:00:09,570 --> 00:00:12,960
this talk on type inference
for TLA+ in Apalache.

5
00:00:12,960 --> 00:00:16,395
The second part will begin
by Jure Kukovec who's

6
00:00:16,395 --> 00:00:20,160
finishing his visit at Turin.

7
00:00:20,160 --> 00:00:24,660
We decided to split this
job of recording the video.

8
00:00:24,660 --> 00:00:29,350
That's why we have basically
two talks for the price of one.

9
00:00:31,640 --> 00:00:35,700
The title says we do type
inference in Apalache.

10
00:00:35,700 --> 00:00:36,930
If you haven't seen this tool,

11
00:00:36,930 --> 00:00:40,140
you can go on the Apalache
webpage on GitHub.

12
00:00:40,140 --> 00:00:42,390
That's a symbolic model checker

13
00:00:42,390 --> 00:00:45,690
for TLA+ as you develop
them at Informal Systems,

14
00:00:45,690 --> 00:00:48,460
and it happened to be
developed at Turin.

15
00:00:50,240 --> 00:00:53,670
In a nutshell, we have been

16
00:00:53,670 --> 00:00:57,295
developing this model checker
for about five years.

17
00:00:57,295 --> 00:01:00,845
It reduces the verification
questions too,

18
00:01:00,845 --> 00:01:02,900
since you solve them and we had been

19
00:01:02,900 --> 00:01:06,080
focusing on application of

20
00:01:06,080 --> 00:01:08,780
this model checker in the
context of fault tolerance,

21
00:01:08,780 --> 00:01:11,675
distributed algorithms,
and blockchains.

22
00:01:11,675 --> 00:01:13,940
That's what we are doing
at Informal Systems.

23
00:01:13,940 --> 00:01:17,990
Now, if you want to find more
information about how it's working,

24
00:01:17,990 --> 00:01:23,395
you can check that OOPSLA paper
that is mentioned on the slide.

25
00:01:23,395 --> 00:01:26,585
However, this talk is not
about the model checker,

26
00:01:26,585 --> 00:01:30,320
although that's a very
interesting topic for me.

27
00:01:30,320 --> 00:01:34,685
This talk is about how you go from

28
00:01:34,685 --> 00:01:40,985
an untyped language like TLA+
to a typed language like SMT.

29
00:01:40,985 --> 00:01:45,050
I guess SMT solvers
require types and codings.

30
00:01:45,050 --> 00:01:52,455
Essentially, this talk is about
how you type an untyped language.

31
00:01:52,455 --> 00:01:55,595
In this talk, they're going
to present to you two tools.

32
00:01:55,595 --> 00:01:57,830
One is the Type checker,

33
00:01:57,830 --> 00:01:59,435
the new Type checker we have,

34
00:01:59,435 --> 00:02:01,990
and another one is a
Type inference tool.

35
00:02:01,990 --> 00:02:04,310
The type checker is planned
for the next release.

36
00:02:04,310 --> 00:02:06,640
It's breaking out.

37
00:02:06,640 --> 00:02:10,080
That's some local analysis
of operator definitions.

38
00:02:10,080 --> 00:02:12,335
If it cannot deliver,

39
00:02:12,335 --> 00:02:15,690
it asks the user to give
it to some annotations.

40
00:02:15,690 --> 00:02:18,520
Type inferer doesn't
require any annotations and

41
00:02:18,520 --> 00:02:21,775
thus global analysis
of the specification.

42
00:02:21,775 --> 00:02:25,720
However, it's much harder
to make it robust tool.

43
00:02:25,720 --> 00:02:27,975
That's why it's still a prototype.

44
00:02:27,975 --> 00:02:31,840
It's actually working but it will
take a bit more time to make

45
00:02:31,840 --> 00:02:36,850
it a tool that would be
used for either origins.

46
00:02:36,850 --> 00:02:39,160
What's important here,
that the both tools are

47
00:02:39,160 --> 00:02:41,495
completely independent
of the model checker.

48
00:02:41,495 --> 00:02:43,930
You don't have to understand
the model checker.

49
00:02:43,930 --> 00:02:45,970
You don't have to use
the model checker in

50
00:02:45,970 --> 00:02:49,490
order to do type inference tools.

51
00:02:49,680 --> 00:02:51,910
Let's talk about the type checker.

52
00:02:51,910 --> 00:02:55,505
I will focus on it then Jure will
talk about the type inference.

53
00:02:55,505 --> 00:02:58,800
Okay, let's have a quick demo here.

54
00:02:58,800 --> 00:03:03,310
I'll show you how the type checker
is working on a simple example.

55
00:03:03,310 --> 00:03:05,905
It's just called CarTalkPuzzle.

56
00:03:05,905 --> 00:03:10,705
I took it from the repository
of the TLA specifications.

57
00:03:10,705 --> 00:03:14,425
You can look the
specification yourself.

58
00:03:14,425 --> 00:03:16,945
If you look at it, you'll
see that there are

59
00:03:16,945 --> 00:03:21,370
some non-trivial definitions
that use recursion,

60
00:03:21,370 --> 00:03:23,960
that use functions sets.

61
00:03:23,960 --> 00:03:26,020
It's not always clear

62
00:03:26,020 --> 00:03:28,900
what the arguments are
that breakers are doing,

63
00:03:28,900 --> 00:03:32,650
until you read the recommendations,

64
00:03:32,650 --> 00:03:34,370
if you read the comments here.

65
00:03:34,370 --> 00:03:37,025
Let's see what our type checker can

66
00:03:37,025 --> 00:03:40,440
tell us about this specification.

67
00:03:40,440 --> 00:03:44,960
We'll just run the type checker
result writes in annotations

68
00:03:44,960 --> 00:03:51,450
and see where it complains
about the specification.

69
00:03:53,270 --> 00:03:57,095
It complains in line 51 here,

70
00:03:57,095 --> 00:04:01,870
it says that it found a polymorphic
type, basically some constant.

71
00:04:01,870 --> 00:04:05,770
The tasks has to give
the concrete type here.

72
00:04:05,770 --> 00:04:08,390
We don't allow polymorphic types,

73
00:04:08,390 --> 00:04:10,340
user-defined operators,

74
00:04:10,340 --> 00:04:14,595
although TLA operators are
obviously polymorphic.

75
00:04:14,595 --> 00:04:16,470
What's going on here?

76
00:04:16,470 --> 00:04:18,660
You have to choose operator that just

77
00:04:18,660 --> 00:04:22,260
provides some elements of a set.

78
00:04:22,260 --> 00:04:26,325
Of course, that checker cannot
figure out the exact type.

79
00:04:26,325 --> 00:04:28,110
What we can do here, we can

80
00:04:28,110 --> 00:04:32,660
define the signature
of this operator sum.

81
00:04:32,660 --> 00:04:37,485
In order to do that, I will
have to extend the module,

82
00:04:37,485 --> 00:04:39,875
type inlet extended Apalache module

83
00:04:39,875 --> 00:04:42,555
and to write the type annotation.

84
00:04:42,555 --> 00:04:43,830
Let's go and we do it now.

85
00:04:43,830 --> 00:04:47,550
I will just copy this type annotation

86
00:04:47,550 --> 00:04:53,690
from my cheat sheet because I
figured out the types before.

87
00:04:54,350 --> 00:04:57,465
Say here that the operator, Sum,

88
00:04:57,465 --> 00:05:00,530
is taking a function of integers to

89
00:05:00,530 --> 00:05:05,195
integers and it's taking
a set of integers.

90
00:05:05,195 --> 00:05:08,000
As a result, it returns an integer.

91
00:05:08,000 --> 00:05:10,745
That's our syntax for operators.

92
00:05:10,745 --> 00:05:15,085
Let's see if default type
checker is convinced here.

93
00:05:15,085 --> 00:05:18,350
Yes. Now it complains
about another thing,

94
00:05:18,350 --> 00:05:20,600
it says that it doesn't
know the type of

95
00:05:20,600 --> 00:05:23,990
P. Indeed P is just a constant,

96
00:05:23,990 --> 00:05:27,140
as well as N. What we can do here,

97
00:05:27,140 --> 00:05:30,590
we can introduce the assumptions,

98
00:05:30,590 --> 00:05:34,970
I'll just say one
and fill right here,

99
00:05:34,970 --> 00:05:41,100
the assumptions about
types of the parameters,

100
00:05:41,100 --> 00:05:43,760
N and P are integers.

101
00:05:43,760 --> 00:05:46,460
Let's run the type checker again.

102
00:05:46,460 --> 00:05:48,390
Now it goes through,

103
00:05:48,390 --> 00:05:53,230
but now it complains in line 169.

104
00:05:53,720 --> 00:05:56,360
Here it's actually interesting.

105
00:05:56,360 --> 00:06:04,405
It says that it cannot choose
between a function and a sequence,

106
00:06:04,405 --> 00:06:06,670
because B can be either a sequence or

107
00:06:06,670 --> 00:06:10,060
a function given the current
types in this operator.

108
00:06:10,060 --> 00:06:13,360
It cannot figure it
out. You can again help

109
00:06:13,360 --> 00:06:17,785
the type checker and give
a concrete type for image.

110
00:06:17,785 --> 00:06:20,140
This time it's not so trivial.

111
00:06:20,140 --> 00:06:24,560
It took me a while to
figure out the exact type.

112
00:06:24,560 --> 00:06:29,780
But it actually takes a set
of integers and, again,

113
00:06:29,780 --> 00:06:31,925
a function of integers to integers,

114
00:06:31,925 --> 00:06:36,170
and then again returns
a set of integers.

115
00:06:36,170 --> 00:06:38,595
Let's run the type checker again.

116
00:06:38,595 --> 00:06:41,805
Now the type checker has
happy. What does it mean?

117
00:06:41,805 --> 00:06:47,060
It means that our type
checker managed to compute

118
00:06:47,060 --> 00:06:49,325
the types of all expressions in

119
00:06:49,325 --> 00:06:52,640
all operator definitions
in this specification.

120
00:06:52,640 --> 00:06:56,780
All of you wrote just four
type annotations here.

121
00:06:56,780 --> 00:06:59,620
We gave the type annotations
for the parameters,

122
00:06:59,620 --> 00:07:02,105
and we gave type annotations for

123
00:07:02,105 --> 00:07:04,640
two operators that are actually quite

124
00:07:04,640 --> 00:07:08,210
trickier and the type
checker gave up on them.

125
00:07:08,210 --> 00:07:12,720
That ends this small
demo. Let's continue.

126
00:07:13,050 --> 00:07:18,040
I hope I have your attention
now after that short demo,

127
00:07:18,040 --> 00:07:20,140
and now I'd like to explain to you

128
00:07:20,140 --> 00:07:22,585
how this type checker
is actually working.

129
00:07:22,585 --> 00:07:27,400
I believe it's very simple and
I think you'll agree with me.

130
00:07:27,400 --> 00:07:30,250
Before we can do that, let's
talk about the type syntax.

131
00:07:30,250 --> 00:07:33,009
That's actually the syntax
we use for type annotations,

132
00:07:33,009 --> 00:07:38,080
but it also will help you in
understanding how the tools run.

133
00:07:38,080 --> 00:07:41,380
First of all, we have
basic types like Booleans,

134
00:07:41,380 --> 00:07:44,605
integers, strings, and
reals, nothing unexpected.

135
00:07:44,605 --> 00:07:46,210
We have uninterpreted types,

136
00:07:46,210 --> 00:07:49,735
and wait until the next slide,
I will explain all this.

137
00:07:49,735 --> 00:07:54,550
We also have functions from
sum types to sum types.

138
00:07:54,550 --> 00:07:57,265
We have sets and sequences.

139
00:07:57,265 --> 00:08:00,820
They're actually homogeneous
in the sense that you cannot

140
00:08:00,820 --> 00:08:04,585
put objects of different types
into a set or a sequence.

141
00:08:04,585 --> 00:08:07,030
However, there is an
exception for records.

142
00:08:07,030 --> 00:08:10,870
Wait until the slide when
we talk about records.

143
00:08:10,870 --> 00:08:16,270
There are tuples and
records, and of course,

144
00:08:16,270 --> 00:08:19,465
there are operators that
may have a fixed number of

145
00:08:19,465 --> 00:08:23,185
arguments to the results.

146
00:08:23,185 --> 00:08:26,825
There are type variables
that allow us to talk about

147
00:08:26,825 --> 00:08:31,890
polymorphic types and type
annotations and type inference.

148
00:08:31,890 --> 00:08:33,480
If you want to see more details,

149
00:08:33,480 --> 00:08:36,660
go to the Apalache
repository and check

150
00:08:36,660 --> 00:08:41,545
these architectural
decision record types.

151
00:08:41,545 --> 00:08:44,770
Uninterpreted types, I
promised them to you.

152
00:08:44,770 --> 00:08:46,930
Often, if you look at
the specification,

153
00:08:46,930 --> 00:08:48,910
you see constants, and

154
00:08:48,910 --> 00:08:52,780
these constants are often
sets or just some values.

155
00:08:52,780 --> 00:08:55,765
Usually, it actually doesn't matter

156
00:08:55,765 --> 00:08:57,970
whether these sets
are sets of integers,

157
00:08:57,970 --> 00:09:00,610
Boolean, strings, so on, so forth.

158
00:09:00,610 --> 00:09:03,550
You can see these examples
in Paxos, for instance,

159
00:09:03,550 --> 00:09:06,520
there are sets of values,
sets of acceptors.

160
00:09:06,520 --> 00:09:08,710
You see it in other examples like

161
00:09:08,710 --> 00:09:11,200
prisoners, missionaries,
and cannibals.

162
00:09:11,200 --> 00:09:13,300
In all these cases, actually,

163
00:09:13,300 --> 00:09:15,070
you should use uninterpreted types

164
00:09:15,070 --> 00:09:16,840
because the only thing you can do

165
00:09:16,840 --> 00:09:21,055
about uninterpreted types
is to compare objects,

166
00:09:21,055 --> 00:09:24,685
say, inside sets of
uninterpreted types.

167
00:09:24,685 --> 00:09:28,959
In this case, acceptors will
be just the set of processes,

168
00:09:28,959 --> 00:09:30,460
you don't know what processes are,

169
00:09:30,460 --> 00:09:33,670
you can always say if two
processes are equal or not.

170
00:09:33,670 --> 00:09:35,920
The same applies to values.

171
00:09:35,920 --> 00:09:37,510
If you have seen model values,

172
00:09:37,510 --> 00:09:41,275
then you'll see that's
actually uninterpreted values.

173
00:09:41,275 --> 00:09:43,930
You can say that these are
model values of types A,

174
00:09:43,930 --> 00:09:46,000
B, C, D, and so on.

175
00:09:46,000 --> 00:09:52,225
A, B, C, D, and so on are actually
uninterpreted types in TLC.

176
00:09:52,225 --> 00:09:56,260
So that's how you can
express these things.

177
00:09:56,260 --> 00:09:59,725
How does our type checker works?

178
00:09:59,725 --> 00:10:02,470
Good to say it is
embarrassingly simple.

179
00:10:02,470 --> 00:10:04,555
Let's see if I convince you.

180
00:10:04,555 --> 00:10:09,190
First of all, before the type
checker [inaudible] should annotate

181
00:10:09,190 --> 00:10:11,200
all the built-in operators with

182
00:10:11,200 --> 00:10:13,930
some signatures, operator signatures.

183
00:10:13,930 --> 00:10:17,365
Plenty of TLA operators are
quite simple in that sense.

184
00:10:17,365 --> 00:10:23,395
Say arithmetic operators take two
integers and return an integer.

185
00:10:23,395 --> 00:10:26,230
Some operators are also simple,

186
00:10:26,230 --> 00:10:29,410
but they're polymorphic, and
they're also very [inaudible].

187
00:10:29,410 --> 00:10:31,900
For instance, set constructors take

188
00:10:31,900 --> 00:10:36,655
several elements and return
a set of lows elements.

189
00:10:36,655 --> 00:10:38,470
Say, there are two elements,

190
00:10:38,470 --> 00:10:40,225
it will return a set
of the same type.

191
00:10:40,225 --> 00:10:43,075
You can argue that in TLA,

192
00:10:43,075 --> 00:10:45,580
set elements might
have different types,

193
00:10:45,580 --> 00:10:47,590
yes, but not on our type system,

194
00:10:47,590 --> 00:10:48,925
not in our type checker.

195
00:10:48,925 --> 00:10:50,800
Records are a bit of an exception,

196
00:10:50,800 --> 00:10:52,945
you'll see it later.

197
00:10:52,945 --> 00:10:57,820
Several operators actually can be
treated as overloaded operators.

198
00:10:57,820 --> 00:11:03,085
For instance, the tuple operator
creates either a tuple,

199
00:11:03,085 --> 00:11:04,960
so in this case a pair,

200
00:11:04,960 --> 00:11:07,945
or a sequence of elements
of the same type.

201
00:11:07,945 --> 00:11:10,450
Again, you can say
sequences are tuples,

202
00:11:10,450 --> 00:11:12,040
tuples are sequences, yes.

203
00:11:12,040 --> 00:11:15,355
In untyped TLA, that is true,

204
00:11:15,355 --> 00:11:17,110
not in our type system.

205
00:11:17,110 --> 00:11:19,090
We will reject some specifications

206
00:11:19,090 --> 00:11:21,550
that you believe are actually okay.

207
00:11:21,550 --> 00:11:24,880
There are a few hardcore
operators that actually

208
00:11:24,880 --> 00:11:30,610
bind some values to the variables.

209
00:11:30,610 --> 00:11:33,670
For instance, a set
filter in our system

210
00:11:33,670 --> 00:11:38,050
becomes this huge operator that
says you have an operator,

211
00:11:38,050 --> 00:11:40,195
it takes some type a,

212
00:11:40,195 --> 00:11:42,445
and it returns a Boolean
that's a predicate.

213
00:11:42,445 --> 00:11:47,005
Given this predicate, it
returns a set of a's.

214
00:11:47,005 --> 00:11:49,315
Actually, what is happening here,

215
00:11:49,315 --> 00:11:50,500
how does it do that?

216
00:11:50,500 --> 00:11:55,810
It binds a name x to the
elements of the set S,

217
00:11:55,810 --> 00:12:00,850
gives this x to P and P has to
figure out what to do about this x.

218
00:12:00,850 --> 00:12:03,690
If you looked at all

219
00:12:03,690 --> 00:12:06,840
these annotation set we wanted
to have and figured out

220
00:12:06,840 --> 00:12:10,050
that there is a very
simple language that can

221
00:12:10,050 --> 00:12:13,600
describe all of the TLA operators,

222
00:12:13,600 --> 00:12:16,870
if you think about them in
terms of a type system.

223
00:12:16,870 --> 00:12:20,830
That's what we call a simple
lambda-calculus over types.

224
00:12:20,830 --> 00:12:24,730
It has just six constructs
that you see here.

225
00:12:24,730 --> 00:12:27,985
First of all, you can have
a type, that's obvious.

226
00:12:27,985 --> 00:12:31,135
You just have to type like int
or a function from instance.

227
00:12:31,135 --> 00:12:32,950
You can have a variable name.

228
00:12:32,950 --> 00:12:35,200
You can have a lambda
abstraction as you have

229
00:12:35,200 --> 00:12:37,750
seen in the previous slide just

230
00:12:37,750 --> 00:12:41,890
binds the variables x_1 to x_k to

231
00:12:41,890 --> 00:12:43,780
the elements of the sets that are

232
00:12:43,780 --> 00:12:46,840
presented by expressions e_1 to e_K,

233
00:12:46,840 --> 00:12:49,960
and these names can be used inside

234
00:12:49,960 --> 00:12:54,040
expression e. Now that
definition, not surprisingly,

235
00:12:54,040 --> 00:12:57,460
binds expression e_1 to the name

236
00:12:57,460 --> 00:13:02,350
x and this name x can be used
inside the expression e_2.

237
00:13:02,350 --> 00:13:06,145
Application by name, like if you
have a user-defined operator,

238
00:13:06,145 --> 00:13:08,980
if you just take the operator name

239
00:13:08,980 --> 00:13:11,920
and apply it to the
arguments e_1 to e_k.

240
00:13:11,920 --> 00:13:14,020
The most interesting case here is,

241
00:13:14,020 --> 00:13:16,330
of course, application by type.

242
00:13:16,330 --> 00:13:18,850
You have a choice of types,

243
00:13:18,850 --> 00:13:21,910
operator types to choose
from and you apply one

244
00:13:21,910 --> 00:13:25,030
of these operators to the
arguments e_1 to e_k.

245
00:13:25,030 --> 00:13:27,070
It's a very simple language,

246
00:13:27,070 --> 00:13:28,795
much simpler than TLA.

247
00:13:28,795 --> 00:13:32,679
However, you can translate
every TLA expression,

248
00:13:32,679 --> 00:13:34,120
all the built-in operators and

249
00:13:34,120 --> 00:13:38,710
user-defined operators
into this simple language.

250
00:13:38,710 --> 00:13:41,440
What can we do about
this simple language?

251
00:13:41,440 --> 00:13:43,990
Well, we can take an expression in

252
00:13:43,990 --> 00:13:47,560
this language and do further
types by doing equation solving.

253
00:13:47,560 --> 00:13:49,585
I'll just show you an example here.

254
00:13:49,585 --> 00:13:53,140
Here's a TLA operator,
a very simple one.

255
00:13:53,140 --> 00:13:56,050
You translate it into our calculus.

256
00:13:56,050 --> 00:13:58,345
You have a Lambda expression,

257
00:13:58,345 --> 00:14:00,250
a signature int into bool,

258
00:14:00,250 --> 00:14:03,565
and you pass x and int.
What do you do then?

259
00:14:03,565 --> 00:14:05,245
Well, you make equations,

260
00:14:05,245 --> 00:14:07,750
introduce equations
for this operator,

261
00:14:07,750 --> 00:14:08,860
for this slide expression.

262
00:14:08,860 --> 00:14:11,200
You say there is a type a_1 that

263
00:14:11,200 --> 00:14:14,425
should have this shape because
I know there is lambda inside.

264
00:14:14,425 --> 00:14:20,560
For the lambda, I know that this
expression should be a set.

265
00:14:20,560 --> 00:14:23,440
So I say a_4 is a set and it has

266
00:14:23,440 --> 00:14:26,845
this shape and it actually
happens to be equal to set of a.

267
00:14:26,845 --> 00:14:29,900
Because, well, that's
going to do here and

268
00:14:29,900 --> 00:14:35,600
a_2 expresses the elements that
are going to be bound to x.

269
00:14:35,600 --> 00:14:38,510
Then we jump to inside
this expression.

270
00:14:38,510 --> 00:14:41,330
You see an operator application here,

271
00:14:41,330 --> 00:14:43,055
you know the number of arguments.

272
00:14:43,055 --> 00:14:44,880
You say it's a_5.

273
00:14:44,880 --> 00:14:47,785
It has two arguments and returns a_3.

274
00:14:47,785 --> 00:14:49,455
What are these arguments?

275
00:14:49,455 --> 00:14:52,130
Well, they have to be equal
to the value of x and

276
00:14:52,130 --> 00:14:55,400
x is encoded by the variable a_2,

277
00:14:55,400 --> 00:15:00,015
and the second argument
is int, right?

278
00:15:00,015 --> 00:15:02,210
You have the signatures,

279
00:15:02,210 --> 00:15:06,260
so this operator actually has to
match this signature as well.

280
00:15:06,260 --> 00:15:09,495
We solve these equations
and you have the answer.

281
00:15:09,495 --> 00:15:15,670
Well, I was using this kind
of special squiggly equality,

282
00:15:15,670 --> 00:15:17,200
it's not exactly the equality.

283
00:15:17,200 --> 00:15:20,740
What we mean by this is unification.

284
00:15:20,740 --> 00:15:23,860
In basic cases like ints,

285
00:15:23,860 --> 00:15:26,485
you just say they're
equal, like terms.

286
00:15:26,485 --> 00:15:28,690
If you have variables unification,

287
00:15:28,690 --> 00:15:30,440
you can say whether
they're equal or not

288
00:15:30,440 --> 00:15:34,295
and return a binding as a result.

289
00:15:34,295 --> 00:15:35,510
Or in this case,

290
00:15:35,510 --> 00:15:37,774
you can have two bindings.

291
00:15:37,774 --> 00:15:40,385
We solve some of the equations here.

292
00:15:40,385 --> 00:15:43,100
The most interesting
case here is records.

293
00:15:43,100 --> 00:15:44,735
We actually can unify

294
00:15:44,735 --> 00:15:47,345
records that have a
different number of fields.

295
00:15:47,345 --> 00:15:49,685
In this case, we would produce

296
00:15:49,685 --> 00:15:53,780
joint records that has all
these fields that are unified,

297
00:15:53,780 --> 00:15:57,020
but we don't allow to
unify records that

298
00:15:57,020 --> 00:16:03,155
have the sign types different
types to the same name.

299
00:16:03,155 --> 00:16:05,975
So in this way, we can have

300
00:16:05,975 --> 00:16:09,635
records of different types
mixed into the same set,

301
00:16:09,635 --> 00:16:11,479
like you can see in Paxos,

302
00:16:11,479 --> 00:16:16,540
but we still stay relatively
precise in our type checking.

303
00:16:16,540 --> 00:16:20,990
As a side effect, since we solve
equations of a type unification,

304
00:16:20,990 --> 00:16:23,525
we can do some local type inference.

305
00:16:23,525 --> 00:16:24,920
So there are some boundaries of

306
00:16:24,920 --> 00:16:26,720
this type checking and we think it's

307
00:16:26,720 --> 00:16:31,820
actually quite good for the user
to understand what's going on.

308
00:16:31,820 --> 00:16:34,130
The limit type computation to operate

309
00:16:34,130 --> 00:16:36,940
their boundaries if a type checker

310
00:16:36,940 --> 00:16:38,680
cannot find the types precisely in

311
00:16:38,680 --> 00:16:42,110
that definition or operator definition
just fails as [inaudible] ,

312
00:16:42,110 --> 00:16:45,860
we don't allow polymorphism
in user-defined operators,

313
00:16:45,860 --> 00:16:50,310
and that's actually a very good
topic for the meeting to discuss.

314
00:16:50,690 --> 00:16:54,090
At the same time, they allow
polymorphic, monomorphic,

315
00:16:54,090 --> 00:16:57,480
and overloaded built-in operators.

316
00:16:57,480 --> 00:17:00,965
Otherwise, we wouldn't be
able to do the type checking.

317
00:17:00,965 --> 00:17:03,640
So that finishes my part
about the type checker and

318
00:17:03,640 --> 00:17:07,030
now [inaudible] continues
with type inference.

319
00:17:07,030 --> 00:17:09,635
>> Hello everyone, welcome
to this part of the talk

320
00:17:09,635 --> 00:17:12,510
where we discuss type
inference in TLA.

321
00:17:13,890 --> 00:17:17,005
This talk is going to be
about five things roughly.

322
00:17:17,005 --> 00:17:19,630
First, it's going to be about
our Type System in practice.

323
00:17:19,630 --> 00:17:21,080
I'm going to walk you
through some examples

324
00:17:21,080 --> 00:17:23,300
and for some of the
experiments we've done,

325
00:17:23,300 --> 00:17:27,565
and then I'll talk a bit more about
the behind-the-scenes theory,

326
00:17:27,565 --> 00:17:31,790
how we actually do this at
the first-order logic level,

327
00:17:31,790 --> 00:17:33,500
how we encode types of strings,

328
00:17:33,500 --> 00:17:36,875
and then finally how encode
types of TLA expressions.

329
00:17:36,875 --> 00:17:40,430
So here's what we do
at a conceptual level.

330
00:17:40,430 --> 00:17:41,630
So first, we use

331
00:17:41,630 --> 00:17:45,180
the TLA specification to get
some bounds information.

332
00:17:45,180 --> 00:17:47,825
So we're not going to use all of
the types in the type system.

333
00:17:47,825 --> 00:17:49,760
For a concrete specification,

334
00:17:49,760 --> 00:17:52,715
we only have to care about a finite
number of them, specifically,

335
00:17:52,715 --> 00:17:54,365
finite [inaudible]
records and tuples,

336
00:17:54,365 --> 00:17:56,250
it's going to simplify
a lot of things.

337
00:17:56,250 --> 00:17:57,920
Then we keep the specification or

338
00:17:57,920 --> 00:18:00,170
any sub-expression we're
interested in and we

339
00:18:00,170 --> 00:18:04,795
create the SMT constraints from
that expression's syntax tree.

340
00:18:04,795 --> 00:18:06,625
Then we have two options,

341
00:18:06,625 --> 00:18:11,010
either we have a satisfiable set
of constraints, we get a model,

342
00:18:11,010 --> 00:18:12,310
and from that model,

343
00:18:12,310 --> 00:18:15,065
we reconstruct the types maybe
with some post-processing

344
00:18:15,065 --> 00:18:18,125
where we make this
more user-presentable.

345
00:18:18,125 --> 00:18:20,150
Or alternatively, we
have a type error,

346
00:18:20,150 --> 00:18:22,925
in which case the user has to
go and fix the specification.

347
00:18:22,925 --> 00:18:25,150
All right. So let's walk
through an example.

348
00:18:25,150 --> 00:18:28,610
So here's a very simple
specification that does

349
00:18:28,610 --> 00:18:30,470
almost nothing and
then we'll maybe see

350
00:18:30,470 --> 00:18:35,860
how we get constraints
from this [inaudible]

351
00:18:35,860 --> 00:18:40,570
Basically, here's the
approach in a sentence.

352
00:18:40,570 --> 00:18:43,045
We're going to look
at every expression,

353
00:18:43,045 --> 00:18:44,950
and every sub-expression
of that expression.

354
00:18:44,950 --> 00:18:47,080
We're going to assign
it an SMT variable.

355
00:18:47,080 --> 00:18:50,080
In this case, if p then y else of

356
00:18:50,080 --> 00:18:54,280
A is assigned the SMT variable t_1.

357
00:18:54,280 --> 00:18:56,470
When we're going to
read the solution,

358
00:18:56,470 --> 00:19:01,550
we're going to say our t_1
holds the body of p especially.

359
00:19:01,710 --> 00:19:04,870
The top-level body of p is

360
00:19:04,870 --> 00:19:07,825
defined with an if-then-else
operator which has a known type.

361
00:19:07,825 --> 00:19:11,590
In order to encode
this type, we take,

362
00:19:11,590 --> 00:19:13,135
for example, in this case,

363
00:19:13,135 --> 00:19:16,240
the polymorphic type,
any Alpha, bool Alpha,

364
00:19:16,240 --> 00:19:19,450
Alpha to Alpha, and we walk through

365
00:19:19,450 --> 00:19:23,800
the arguments of p and we
encode each of these arguments,

366
00:19:23,800 --> 00:19:27,250
the type that the operator
says the argument has to have.

367
00:19:27,250 --> 00:19:29,545
For example, p will
have to have type bool,

368
00:19:29,545 --> 00:19:31,690
y and set Alpha will
have to have type Alpha,

369
00:19:31,690 --> 00:19:34,540
and the entire expression
will have to have type Alpha.

370
00:19:34,540 --> 00:19:35,710
Here, in this case,

371
00:19:35,710 --> 00:19:37,750
we use the variable c_1 to represent

372
00:19:37,750 --> 00:19:41,050
the type Alpha because we don't
know what Alpha is at this point.

373
00:19:41,050 --> 00:19:44,635
We use bool for t_2 because
p has to be a boolean.

374
00:19:44,635 --> 00:19:47,140
Next, we just walk
through the arguments.

375
00:19:47,140 --> 00:19:48,880
For example, p and y,

376
00:19:48,880 --> 00:19:50,260
they're handled in the same way.

377
00:19:50,260 --> 00:19:51,700
One is a constant, one is variable,

378
00:19:51,700 --> 00:19:53,470
but at the type inference level,

379
00:19:53,470 --> 00:19:55,555
there is no difference
between the two.

380
00:19:55,555 --> 00:19:58,450
Each constant and variable
gets a pre-assigned SMT

381
00:19:58,450 --> 00:20:01,465
variable so that anywhere in the
specification where we use p,

382
00:20:01,465 --> 00:20:06,230
we use the same type variable
to refer to it in the encoding.

383
00:20:06,420 --> 00:20:09,460
For the set constructor,

384
00:20:09,460 --> 00:20:11,980
the story is similar to
the analysis constructors.

385
00:20:11,980 --> 00:20:13,225
It's a built-in operator.

386
00:20:13,225 --> 00:20:15,025
It has a known type.

387
00:20:15,025 --> 00:20:17,395
It's one argument that creates a set.

388
00:20:17,395 --> 00:20:19,060
The process is very similar,

389
00:20:19,060 --> 00:20:20,875
so t_4 has to be a set type.

390
00:20:20,875 --> 00:20:23,425
T_5 has to be the type of
the content of the set,

391
00:20:23,425 --> 00:20:24,490
and because Alpha is,

392
00:20:24,490 --> 00:20:26,485
again, unconstrained at this point,

393
00:20:26,485 --> 00:20:29,000
we use a fresh variable in c_2.

394
00:20:29,070 --> 00:20:31,165
Then we move to A.

395
00:20:31,165 --> 00:20:32,710
A is a user-defined operator,

396
00:20:32,710 --> 00:20:35,350
so we're going to need to
create a fresh instance of A

397
00:20:35,350 --> 00:20:39,085
where we replace the
arguments data if it has any,

398
00:20:39,085 --> 00:20:43,735
correctly with the
arguments passed 2A.

399
00:20:43,735 --> 00:20:46,330
In this case, A is null, so
this isn't a consideration,

400
00:20:46,330 --> 00:20:50,065
but we would have to do this
at this point in general.

401
00:20:50,065 --> 00:20:52,150
To analyze the type of A,

402
00:20:52,150 --> 00:20:53,500
we have to look at the body of A.

403
00:20:53,500 --> 00:20:55,960
So we repeat the process we
just performed on the body

404
00:20:55,960 --> 00:20:58,470
of B on the body of A.

405
00:20:58,470 --> 00:21:00,930
But this time we're going
to take instead of t_1

406
00:21:00,930 --> 00:21:03,420
as at the top level term
to encode the body of A,

407
00:21:03,420 --> 00:21:07,045
we're going to use the
here freshly created c_3

408
00:21:07,045 --> 00:21:09,190
for this particular instance of A,

409
00:21:09,190 --> 00:21:10,900
which might be important if they had

410
00:21:10,900 --> 00:21:13,670
different ways of passing parameters.

411
00:21:13,920 --> 00:21:18,115
For plus, plus, again
has a known type.

412
00:21:18,115 --> 00:21:19,600
It's int and int to int, so we

413
00:21:19,600 --> 00:21:21,655
create the constraints
that mirror that.

414
00:21:21,655 --> 00:21:23,875
Then we finish off
by looking at the x,

415
00:21:23,875 --> 00:21:27,535
which has the SMT variable v_x,

416
00:21:27,535 --> 00:21:30,190
and 2, which is an integer literal.

417
00:21:30,190 --> 00:21:32,185
So it just becomes if.

418
00:21:32,185 --> 00:21:34,660
You have this set of constraints.

419
00:21:34,660 --> 00:21:36,550
What we do with these constraints is

420
00:21:36,550 --> 00:21:38,500
we pass it to an SMT solver and

421
00:21:38,500 --> 00:21:40,510
the SMT solver will find that this is

422
00:21:40,510 --> 00:21:43,075
a satisfiable set of constraints.

423
00:21:43,075 --> 00:21:45,970
Our recovery will then
from the SMT model,

424
00:21:45,970 --> 00:21:48,880
recover the following types
for all of the variables,

425
00:21:48,880 --> 00:21:51,910
constants, and operators
in this specification.

426
00:21:51,910 --> 00:21:54,220
As you can see, p has to be Boolean

427
00:21:54,220 --> 00:21:56,335
because it's part of the
if and else condition.

428
00:21:56,335 --> 00:22:00,685
Similarly, A is a nullary
operator that prefers an integer.

429
00:22:00,685 --> 00:22:03,700
The other ones follow from
bool of if then else.

430
00:22:03,700 --> 00:22:06,925
Both branches of the nulls have
to return the set of integers,

431
00:22:06,925 --> 00:22:09,625
and then the body of B has
types of integers as this

432
00:22:09,625 --> 00:22:14,290
y because set the of A
has type set of integer.

433
00:22:14,290 --> 00:22:16,540
This is exactly what
you would expect.

434
00:22:16,540 --> 00:22:18,160
Of course this happens automatically.

435
00:22:18,160 --> 00:22:19,510
I didn't have to think
about any of this.

436
00:22:19,510 --> 00:22:23,035
I just had to push the "Run"
button and I got this for free.

437
00:22:23,035 --> 00:22:24,730
You can see how this would work.

438
00:22:24,730 --> 00:22:27,385
In larger specifications,
the constraints would be

439
00:22:27,385 --> 00:22:28,630
much bigger and it will be much

440
00:22:28,630 --> 00:22:31,195
harder for me to manually solve them.

441
00:22:31,195 --> 00:22:33,850
So such a tool that
would be very welcome,

442
00:22:33,850 --> 00:22:36,430
especially in the designing phase

443
00:22:36,430 --> 00:22:39,320
of the specification before
it's fully completed.

444
00:22:40,980 --> 00:22:44,020
Next, let's look at some experiments.

445
00:22:44,020 --> 00:22:47,500
We took 35 specifications from the
community repository you see on

446
00:22:47,500 --> 00:22:51,805
the screen and we basically
ran them through the tool.

447
00:22:51,805 --> 00:22:54,400
Here you see the benchmark sizes.

448
00:22:54,400 --> 00:22:57,100
The reason you see 42
instead of 35 is because

449
00:22:57,100 --> 00:23:00,010
the specifications highlighted
in red had type errors.

450
00:23:00,010 --> 00:23:02,950
So we added a minimal
fixed specification to

451
00:23:02,950 --> 00:23:05,125
this benchmark to show

452
00:23:05,125 --> 00:23:07,660
how you can make those
specification step correct.

453
00:23:07,660 --> 00:23:09,970
We ended up with most specifications

454
00:23:09,970 --> 00:23:12,729
being under 200, maybe under 400,

455
00:23:12,729 --> 00:23:15,580
and then this one
outlier specification

456
00:23:15,580 --> 00:23:19,240
reaches roughly 1,100-1,200
lines of coding.

457
00:23:19,240 --> 00:23:21,370
These are realistic sizes.

458
00:23:21,370 --> 00:23:25,135
These are the specifications you
would come across in real life.

459
00:23:25,135 --> 00:23:28,670
So think it's quite the
representative benchmarks.

460
00:23:29,040 --> 00:23:32,150
Let's look at how the tool did.

461
00:23:32,190 --> 00:23:35,650
Good news, the tool took less

462
00:23:35,650 --> 00:23:38,170
than a second for the majority
of the specifications,

463
00:23:38,170 --> 00:23:39,670
so with this outlier being

464
00:23:39,670 --> 00:23:41,710
roughly 10 seconds for the

465
00:23:41,710 --> 00:23:44,155
largest 1,100 lines of
code specification,

466
00:23:44,155 --> 00:23:49,200
which is the protocol
802.16-AuthorizationPKMv35.

467
00:23:49,200 --> 00:23:52,030
[inaudible].

468
00:23:52,030 --> 00:23:53,665
There's benchmarks that includes

469
00:23:53,665 --> 00:23:56,890
many other well-known
other specification,

470
00:23:56,890 --> 00:24:01,150
it includes the CarTalkPuzzle
that Igor showed you in his demo.

471
00:24:01,150 --> 00:24:03,850
Only in this case, we didn't
have to manually annotate

472
00:24:03,850 --> 00:24:06,715
anything and the tool
still found the type of,

473
00:24:06,715 --> 00:24:09,025
for example, the recursive
function sum to be

474
00:24:09,025 --> 00:24:12,550
the same type that the
annotation was used.

475
00:24:12,550 --> 00:24:17,155
But it did so automatically
without need of user interference.

476
00:24:17,155 --> 00:24:21,370
But it also managed to type
check in for types work,

477
00:24:21,370 --> 00:24:24,295
Pascos, and [inaudible]
algorithm, Bosco.

478
00:24:24,295 --> 00:24:26,770
Interestingly enough, our
Raft had a type error which

479
00:24:26,770 --> 00:24:30,385
we corrected as in specification 39.

480
00:24:30,385 --> 00:24:32,470
These are the benchmarks.

481
00:24:32,470 --> 00:24:37,885
The times look to be very good
for this [inaudible] tool.

482
00:24:37,885 --> 00:24:39,520
There were some quotes,

483
00:24:39,520 --> 00:24:42,670
some interesting site
cases, let's say.

484
00:24:42,670 --> 00:24:44,470
One benchmark, for example,

485
00:24:44,470 --> 00:24:47,560
Queens.tla did not type check.

486
00:24:47,560 --> 00:24:51,130
But for the reason that it

487
00:24:51,130 --> 00:24:53,635
uses queens in one instance

488
00:24:53,635 --> 00:24:56,140
as a sequence and then the
other instances as a function,

489
00:24:56,140 --> 00:24:58,810
which in our type system,

490
00:24:58,810 --> 00:25:02,695
does not permit it because we
treat those things as separate,

491
00:25:02,695 --> 00:25:08,140
independent objects that
you cannot unify through.

492
00:25:08,140 --> 00:25:11,110
But of course, in the untyped
interpretation of tla,

493
00:25:11,110 --> 00:25:13,945
functions and sequences
are one and the same,

494
00:25:13,945 --> 00:25:20,350
so using either syntax layer

495
00:25:20,350 --> 00:25:24,940
nor the function sets syntax
interchangeably is perfectly valid.

496
00:25:24,940 --> 00:25:27,220
What you could do in
our framework to solve

497
00:25:27,220 --> 00:25:28,900
this concrete problem
is you could enter

498
00:25:28,900 --> 00:25:30,910
this casting operator, in which case,

499
00:25:30,910 --> 00:25:33,040
you cast on the two expressions,

500
00:25:33,040 --> 00:25:35,290
and you would quickly find that this

501
00:25:35,290 --> 00:25:39,550
is a then valid
undercasting [inaudible] ,

502
00:25:39,550 --> 00:25:41,080
which we don't currently support.

503
00:25:41,080 --> 00:25:43,374
The other one which is interesting

504
00:25:43,374 --> 00:25:46,105
is the DijkstraMutex specification.

505
00:25:46,105 --> 00:25:48,580
This one is interesting
because it's an instance of

506
00:25:48,580 --> 00:25:51,790
a specification that is
statically not typeable,

507
00:25:51,790 --> 00:25:56,710
but would be correct under
a dynamic typing scheme.

508
00:25:56,710 --> 00:25:59,110
The way these operators are set up,

509
00:25:59,110 --> 00:26:01,795
so Ia3A to la4A,

510
00:26:01,795 --> 00:26:04,390
guarantee that any execution will

511
00:26:04,390 --> 00:26:06,685
traverse them in a
very specific order.

512
00:26:06,685 --> 00:26:08,920
The way this temp variable is being

513
00:26:08,920 --> 00:26:10,990
used is being used basically
as a place holder,

514
00:26:10,990 --> 00:26:13,045
as a miscellaneous, if you will,

515
00:26:13,045 --> 00:26:15,370
where a certain value
is just propagated into

516
00:26:15,370 --> 00:26:17,755
the next phase instead of having

517
00:26:17,755 --> 00:26:20,200
to define a fresh variable for

518
00:26:20,200 --> 00:26:23,140
each instance or for the
phase between la3A and la4A.

519
00:26:23,140 --> 00:26:25,015
Attempt is just being reused,

520
00:26:25,015 --> 00:26:29,320
but it's being used in such a way
that it's not consistent typed.

521
00:26:29,320 --> 00:26:32,170
For example, it's
initially defined to be

522
00:26:32,170 --> 00:26:34,930
an integer, but then at some point,

523
00:26:34,930 --> 00:26:39,130
it changes to be set of integers,

524
00:26:39,130 --> 00:26:40,900
which is not allowed in our system.

525
00:26:40,900 --> 00:26:46,420
This, you cannot just
fix with [inaudible].

526
00:26:46,420 --> 00:26:48,850
These are the experiments. Now, let's

527
00:26:48,850 --> 00:26:51,380
look at what happens under the hood.

528
00:26:52,110 --> 00:26:54,910
Under the hood, what we would want

529
00:26:54,910 --> 00:26:57,685
is to pick a type and return return.

530
00:26:57,685 --> 00:27:00,310
It seems like this will
work for most things,

531
00:27:00,310 --> 00:27:02,560
but unfortunately,
there are exceptions,

532
00:27:02,560 --> 00:27:04,555
and those exceptions are records.

533
00:27:04,555 --> 00:27:06,160
The problem with records is there's

534
00:27:06,160 --> 00:27:08,680
an infinite family of records
with arbitrary numbers of

535
00:27:08,680 --> 00:27:10,870
fields and you want to have

536
00:27:10,870 --> 00:27:12,190
a finite number of terms at

537
00:27:12,190 --> 00:27:14,680
the SMT encoding level.
How do we solve this?

538
00:27:14,680 --> 00:27:16,540
Well, we solve this by cheating.

539
00:27:16,540 --> 00:27:19,480
We start the process by looking at

540
00:27:19,480 --> 00:27:22,240
a specification and in
any given specification,

541
00:27:22,240 --> 00:27:23,860
there are only finitely
many couple of

542
00:27:23,860 --> 00:27:26,560
indices and only finitely
many record fields.

543
00:27:26,560 --> 00:27:30,280
We take that information and we
use this to modify our encoding on

544
00:27:30,280 --> 00:27:33,100
a specification-specification
basis to avoid

545
00:27:33,100 --> 00:27:37,210
having to introduce infinitely
many term constructors.

546
00:27:37,210 --> 00:27:42,085
This is our encoding. So we
define abstract datatype S_T.

547
00:27:42,085 --> 00:27:44,800
Most functions are
what you would expect.

548
00:27:44,800 --> 00:27:47,365
You have the classic
constants input string,

549
00:27:47,365 --> 00:27:50,170
and then the function
set sequence and fun.

550
00:27:50,170 --> 00:27:54,070
Then we have these special
constructors for records, and couples,

551
00:27:54,070 --> 00:27:59,545
and operators where we use one
constructor for, for example,

552
00:27:59,545 --> 00:28:01,705
all sizes of records,

553
00:28:01,705 --> 00:28:03,730
and that one constructor conceptually

554
00:28:03,730 --> 00:28:06,700
has all of the possible
fields in the specification.

555
00:28:06,700 --> 00:28:09,130
Then you use additional analysis in

556
00:28:09,130 --> 00:28:13,945
the recovery phase to figure out
when a record term is given,

557
00:28:13,945 --> 00:28:15,850
which record it actually represent

558
00:28:15,850 --> 00:28:17,560
and filter out per user fields so we

559
00:28:17,560 --> 00:28:22,195
don't get to be the record
type that is then unusable.

560
00:28:22,195 --> 00:28:24,850
But this allows us to
have one constructor.

561
00:28:24,850 --> 00:28:26,290
Or in the case of operators,

562
00:28:26,290 --> 00:28:31,340
finitely many constructors
per abstract type family.

563
00:28:31,440 --> 00:28:37,195
Some examples. Int obviously
is translated into int,

564
00:28:37,195 --> 00:28:40,135
so the value of 1 would be
encoded as lowercase int.

565
00:28:40,135 --> 00:28:41,785
But on the other end,

566
00:28:41,785 --> 00:28:44,860
tuples, for example,
the tuple 1 and TRUE,

567
00:28:44,860 --> 00:28:47,935
are encoded with the term tup,

568
00:28:47,935 --> 00:28:52,360
which takes more than just the
two arguments for 1 and TRUE.

569
00:28:52,360 --> 00:28:54,640
It fix the tuple size,

570
00:28:54,640 --> 00:28:55,975
which is known to be 2.

571
00:28:55,975 --> 00:28:59,260
Int and bool, which are the types
of the first two arguments,

572
00:28:59,260 --> 00:29:01,960
and then it takes the additional
arguments equal to the number

573
00:29:01,960 --> 00:29:06,070
of indices up to the maximal
index in the specification.

574
00:29:06,070 --> 00:29:08,110
The recovery will then take care

575
00:29:08,110 --> 00:29:11,080
of making sure that because
the tuple size is 2,

576
00:29:11,080 --> 00:29:13,390
we only recover the first
two fields because those are

577
00:29:13,390 --> 00:29:16,690
the actually present
fields in this tuple type.

578
00:29:16,690 --> 00:29:19,330
Now, let's talk about
actually encoding types.

579
00:29:19,330 --> 00:29:24,860
How do you do this,he process that
I've just hinted at previously?

580
00:29:25,470 --> 00:29:28,840
We use the type-term compatibility,

581
00:29:28,840 --> 00:29:32,920
the syntax that you see here on
the screen to encode that term.

582
00:29:32,920 --> 00:29:36,025
This x represents a type,
in this case, term.

583
00:29:36,025 --> 00:29:39,310
Let's give you an example and
not a definition. Let's see.

584
00:29:39,310 --> 00:29:44,125
You have the tuple type Alpha_1
int and record h to Alpha_2,

585
00:29:44,125 --> 00:29:47,110
and you're in a specification
with the maximal tuples as

586
00:29:47,110 --> 00:29:50,020
a four and only two record fields,

587
00:29:50,020 --> 00:29:51,730
of which h is the second.

588
00:29:51,730 --> 00:29:54,085
The encoding for this would be the t,

589
00:29:54,085 --> 00:29:57,430
which is the term that
represents the type-term,

590
00:29:57,430 --> 00:29:59,575
is a tuple of

591
00:29:59,575 --> 00:30:02,425
size three because that's the
number of fields [inaudible] has,

592
00:30:02,425 --> 00:30:04,465
but it takes four arguments,

593
00:30:04,465 --> 00:30:05,710
c_1 through c_4 because that's

594
00:30:05,710 --> 00:30:08,605
the maximal number of fields
in the specification.

595
00:30:08,605 --> 00:30:13,000
However, only c_1 through c_3
have additional constraints that

596
00:30:13,000 --> 00:30:17,830
come from the actual
arguments of the tuple type.

597
00:30:17,830 --> 00:30:19,630
C_1 has to be Alpha_1,

598
00:30:19,630 --> 00:30:21,415
and because Alpha_1
is a type variable,

599
00:30:21,415 --> 00:30:23,620
there's no much more we can
do besides assign it in

600
00:30:23,620 --> 00:30:26,440
SMT variable a_1 with no
additional constraints.

601
00:30:26,440 --> 00:30:30,905
C_2 is an integer because int
maps to lowercase integer,

602
00:30:30,905 --> 00:30:33,120
and c_3 is however you resolve

603
00:30:33,120 --> 00:30:35,550
the encoding of the
record type h to Alpha_2.

604
00:30:35,550 --> 00:30:41,295
In our case, this is by assigning
c_3 to be the record type c_5,

605
00:30:41,295 --> 00:30:43,750
c_6 because NF is 2,

606
00:30:43,750 --> 00:30:47,545
so we have two record fields of
which h is the second because

607
00:30:47,545 --> 00:30:52,345
the second c_6 has the additional
constraint that encodes Alpha_2.

608
00:30:52,345 --> 00:30:56,275
But because, again, Alpha_2 is
an unconstrained type variable,

609
00:30:56,275 --> 00:30:59,725
it just gets assigned
to variable a_2.

610
00:30:59,725 --> 00:31:03,350
This is how you would
encode type-term.

611
00:31:04,380 --> 00:31:06,850
You get these types that you wish to

612
00:31:06,850 --> 00:31:09,745
encode from built-in operators.

613
00:31:09,745 --> 00:31:13,000
Essentially, with the exception
of user-defined operators,

614
00:31:13,000 --> 00:31:15,835
the entirety of a TLS
specification is just each chain

615
00:31:15,835 --> 00:31:20,080
of the built-in
operator applications.

616
00:31:20,080 --> 00:31:24,190
If we're able to figure out the
types of those expressions,

617
00:31:24,190 --> 00:31:28,120
this should give us constraints
for the entire specification.

618
00:31:28,120 --> 00:31:32,185
Operators have types ranging
from simple to more complex.

619
00:31:32,185 --> 00:31:34,840
For example, plus has
the very simple type,

620
00:31:34,840 --> 00:31:37,645
it takes two integers and
it returns an integer.

621
00:31:37,645 --> 00:31:41,230
On the other hand, set
union is polymorphic,

622
00:31:41,230 --> 00:31:42,730
it takes two sets of

623
00:31:42,730 --> 00:31:45,440
the same kind and returns
another set of that same kind.

624
00:31:45,440 --> 00:31:46,860
But it could be anything,

625
00:31:46,860 --> 00:31:48,810
it can take two integer
sets or it can take

626
00:31:48,810 --> 00:31:51,570
two sets of strings or it can
take two sets of records.

627
00:31:51,570 --> 00:31:54,930
As long as the two arguments
are of the same type,

628
00:31:54,930 --> 00:31:58,180
it is valid to apply set union.

629
00:31:58,310 --> 00:32:03,850
The particularity with the set
construction operator is that

630
00:32:03,850 --> 00:32:09,220
here we do have an infinite
family of set constructors,

631
00:32:09,220 --> 00:32:10,780
but that's not a problem because it's

632
00:32:10,780 --> 00:32:13,180
an infinite family
in the type system,

633
00:32:13,180 --> 00:32:15,025
not in the SMT part.

634
00:32:15,025 --> 00:32:18,010
We have one operator priority,

635
00:32:18,010 --> 00:32:21,474
and all of them take n arguments

636
00:32:21,474 --> 00:32:23,305
of the same type and return a set.

637
00:32:23,305 --> 00:32:26,320
Now, what is interesting
and different from what

638
00:32:26,320 --> 00:32:28,960
you might have seen is
the treatment of certain,

639
00:32:28,960 --> 00:32:30,790
what we call overload operators.

640
00:32:30,790 --> 00:32:32,770
In this case, for example,
the tuple constructor.

641
00:32:32,770 --> 00:32:35,950
But overloaded means, is
that the single operator,

642
00:32:35,950 --> 00:32:38,290
syntactically, is able to produce

643
00:32:38,290 --> 00:32:41,005
expressions of two different
types that are incompatible.

644
00:32:41,005 --> 00:32:45,610
In this case, the
tuple constructor has,

645
00:32:45,610 --> 00:32:49,915
basically, two types
that make up its,

646
00:32:49,915 --> 00:32:52,060
what we call complex schema.

647
00:32:52,060 --> 00:32:54,910
The first one is a type that takes

648
00:32:54,910 --> 00:32:58,045
two arguments of any types
and returns a tuple,

649
00:32:58,045 --> 00:33:00,700
and the second one takes

650
00:33:00,700 --> 00:33:03,655
two arguments of the same
type and returns a sequence.

651
00:33:03,655 --> 00:33:05,890
What this basically means is that,

652
00:33:05,890 --> 00:33:08,320
in a vacuum, this operator
can be used for either,

653
00:33:08,320 --> 00:33:10,600
but we don't know which
it's actually being used

654
00:33:10,600 --> 00:33:13,990
for until we take additional
constraints from the context.

655
00:33:13,990 --> 00:33:17,530
I had already used the
word schema before.

656
00:33:17,530 --> 00:33:21,325
Schema is what we refer to
the type of an operator.

657
00:33:21,325 --> 00:33:24,250
So the type of an operator
is called a schema,

658
00:33:24,250 --> 00:33:27,805
and it's primitive if the
operator is not overloaded.

659
00:33:27,805 --> 00:33:29,844
The primitive schema basically

660
00:33:29,844 --> 00:33:33,850
declares the type variables that
the operator has as polymorphic,

661
00:33:33,850 --> 00:33:35,560
and then tells you the types of

662
00:33:35,560 --> 00:33:37,255
the arguments and
the type of returns.

663
00:33:37,255 --> 00:33:39,880
X_1 through X_n are the
types of the arguments,

664
00:33:39,880 --> 00:33:41,380
Y is the type of return,

665
00:33:41,380 --> 00:33:43,630
and this operator is

666
00:33:43,630 --> 00:33:46,645
polymorphic and it takes
k number of variables.

667
00:33:46,645 --> 00:33:49,000
Or it could be a complex schema which

668
00:33:49,000 --> 00:33:50,740
belongs to an overloaded operator,

669
00:33:50,740 --> 00:33:54,640
which is just a finite
collection of primitive schemes.

670
00:33:54,640 --> 00:33:58,495
We'll see later that the complex
schemas are really where the,

671
00:33:58,495 --> 00:34:03,830
pun intended, complexity of
solving this problem comes.

672
00:34:04,710 --> 00:34:08,410
How do we make a constraint
from a known schema?

673
00:34:08,410 --> 00:34:11,965
Well, we compute what is
called a schema instance.

674
00:34:11,965 --> 00:34:14,125
The schema instance encodes the fact

675
00:34:14,125 --> 00:34:16,690
that this particular instance of

676
00:34:16,690 --> 00:34:19,630
operator application
is compatible with

677
00:34:19,630 --> 00:34:22,870
one of the instantiations
of the operator types.

678
00:34:22,870 --> 00:34:25,765
Because the operator is
polymorphic in general,

679
00:34:25,765 --> 00:34:29,860
and any particular application
has complete arguments,

680
00:34:29,860 --> 00:34:30,925
we just have to make sure that

681
00:34:30,925 --> 00:34:38,280
the instance of this
operator is typed correctly.

682
00:34:38,280 --> 00:34:44,290
You can interpret this schema
instance as reading a term, t-hat,

683
00:34:44,290 --> 00:34:46,120
encodes the type of the result,

684
00:34:46,120 --> 00:34:48,670
and terms t_1 through
t_n encode the types of

685
00:34:48,670 --> 00:34:53,420
the n arguments of the operator
to which the schema belongs.

686
00:34:53,970 --> 00:34:57,100
Problem is that we have

687
00:34:57,100 --> 00:35:00,535
disjunctive constraints that
come from complex schemas,

688
00:35:00,535 --> 00:35:03,640
and this is really why we
need to use SMT because SMT

689
00:35:03,640 --> 00:35:07,570
gives us a nice way of solving
disjunctive constraints.

690
00:35:07,570 --> 00:35:10,420
Whereas, in a purely
conjunctive setting,

691
00:35:10,420 --> 00:35:12,505
we can maybe use something
else, something more efficient.

692
00:35:12,505 --> 00:35:14,350
However, because we're
not in that setting,

693
00:35:14,350 --> 00:35:17,110
we have to rely on
SMT solver to solve

694
00:35:17,110 --> 00:35:20,875
this set of potentially
disjunctive constraints for us.

695
00:35:20,875 --> 00:35:23,380
How do we put all that
we've learned together and

696
00:35:23,380 --> 00:35:25,945
actually determine the
types of TLA expressions?

697
00:35:25,945 --> 00:35:30,640
We use the similar syntax we
were using before to denote

698
00:35:30,640 --> 00:35:32,650
that a term x represents
the type of TLA

699
00:35:32,650 --> 00:35:35,260
expression e. The way we do this,

700
00:35:35,260 --> 00:35:36,340
I'm not going to give
you definitions,

701
00:35:36,340 --> 00:35:37,900
I'm going to show you by example.

702
00:35:37,900 --> 00:35:39,460
The way we do this, for example,

703
00:35:39,460 --> 00:35:42,340
for literals, is exactly
what you would expect.

704
00:35:42,340 --> 00:35:43,930
An integer literal, for example,

705
00:35:43,930 --> 00:35:46,030
42, you can just say that,

706
00:35:46,030 --> 00:35:48,520
if it's trying to be
represented by the term x,

707
00:35:48,520 --> 00:35:52,045
then x is an int, and
that's the end of it.

708
00:35:52,045 --> 00:35:54,625
More complex, for example,

709
00:35:54,625 --> 00:35:59,155
the application of plus or
any other built-in operator.

710
00:35:59,155 --> 00:36:02,410
In that case, we have to
analyze three things.

711
00:36:02,410 --> 00:36:05,320
First, we have to figure
out the schema instance for

712
00:36:05,320 --> 00:36:11,050
plus for which we compute
s plus c_1, c_2, x.

713
00:36:11,050 --> 00:36:13,390
The return of plus,

714
00:36:13,390 --> 00:36:14,920
which is equal to the body

715
00:36:14,920 --> 00:36:17,050
of the expression we're
currently analyzing,

716
00:36:17,050 --> 00:36:18,850
has to be represented by x.

717
00:36:18,850 --> 00:36:20,395
For the two arguments,

718
00:36:20,395 --> 00:36:21,835
you want to need two in this case,

719
00:36:21,835 --> 00:36:24,505
we have to create
fresh SMT variables,

720
00:36:24,505 --> 00:36:26,050
c_1 and c_2, to represent them.

721
00:36:26,050 --> 00:36:28,120
Then we compute the schema instance.

722
00:36:28,120 --> 00:36:30,010
In addition, we recurse over

723
00:36:30,010 --> 00:36:31,600
all the arguments
because the arguments

724
00:36:31,600 --> 00:36:33,250
themselves might be
complex expressions,

725
00:36:33,250 --> 00:36:35,575
and we compute their
representations as well.

726
00:36:35,575 --> 00:36:39,970
So we compute the
encoding of e_1 into c_1,

727
00:36:39,970 --> 00:36:42,770
and we compute the
encoding of e_2 into c_2.

728
00:36:43,020 --> 00:36:47,080
For plus, the schema
in question here,

729
00:36:47,080 --> 00:36:49,360
s plus, is just int, int to int.

730
00:36:49,360 --> 00:36:52,015
It would tell us that
c_1 has to be an int,

731
00:36:52,015 --> 00:36:54,235
c_2 has to be an int,
and x has to be an int,

732
00:36:54,235 --> 00:36:56,600
as expected for plus.

733
00:36:56,970 --> 00:36:59,620
The more interesting case is the one

734
00:36:59,620 --> 00:37:02,140
with an overloaded
operator, for example.

735
00:37:02,140 --> 00:37:04,450
If we had the expression tuple 1, 2,

736
00:37:04,450 --> 00:37:06,250
we do the same thing as before,

737
00:37:06,250 --> 00:37:08,620
but now the schema
instance is different.

738
00:37:08,620 --> 00:37:11,409
Because the operator
here is overloaded,

739
00:37:11,409 --> 00:37:13,854
the schema instance is essentially

740
00:37:13,854 --> 00:37:16,900
a disjunction of two
different things.

741
00:37:16,900 --> 00:37:18,730
It's a schema instance for s_1 and

742
00:37:18,730 --> 00:37:21,010
a schema instance for
s_2, where, here,

743
00:37:21,010 --> 00:37:24,175
s_1 and s_2 are the two
primitive schemas that make up

744
00:37:24,175 --> 00:37:29,830
the complex schema which
this operator holds.

745
00:37:29,830 --> 00:37:31,810
Everything else is exactly the same,

746
00:37:31,810 --> 00:37:33,430
so we recurse on one,

747
00:37:33,430 --> 00:37:35,890
we recurse on two in
exactly the same way.

748
00:37:35,890 --> 00:37:38,545
These two schemas, concretely,

749
00:37:38,545 --> 00:37:40,675
for the tuple constructor,

750
00:37:40,675 --> 00:37:42,745
are the tuple schema s_1,

751
00:37:42,745 --> 00:37:44,920
where we take two
arguments and return

752
00:37:44,920 --> 00:37:48,610
a tuple of potentially
different types,

753
00:37:48,610 --> 00:37:50,650
and the schema s_2, which we take

754
00:37:50,650 --> 00:37:53,320
two arguments of the same type
and return a sequence of that.

755
00:37:53,320 --> 00:37:56,755
We compute both of these
constraints for s_1 and for s_2,

756
00:37:56,755 --> 00:37:59,690
and then we just add the structure.

757
00:37:59,700 --> 00:38:04,420
Here is really where the
complexity comes from if anywhere.

758
00:38:04,420 --> 00:38:06,565
This is how you would do it.

759
00:38:06,565 --> 00:38:08,950
Of course, there are special cases.

760
00:38:08,950 --> 00:38:12,595
There are cases where you need to
manage user-defined operators.

761
00:38:12,595 --> 00:38:14,680
There are cases where you need to

762
00:38:14,680 --> 00:38:17,020
match [inaudible] expressions
and stuff like that.

763
00:38:17,020 --> 00:38:19,690
I'm not going to give you
expressions for those.

764
00:38:19,690 --> 00:38:21,340
If you're interested,
contact me or Igor,

765
00:38:21,340 --> 00:38:25,490
and we'll point you
towards our paper.

766
00:38:26,430 --> 00:38:28,810
Now, the question is,

767
00:38:28,810 --> 00:38:30,280
the thing we're doing,

768
00:38:30,280 --> 00:38:33,220
is it actually correct?
Or what does that mean?

769
00:38:33,220 --> 00:38:34,870
We have a theorem that says that,

770
00:38:34,870 --> 00:38:38,350
if we manage to find a model and
recover types from that model,

771
00:38:38,350 --> 00:38:40,480
the types we recover are sensible.

772
00:38:40,480 --> 00:38:42,820
What that means is
that every variable in

773
00:38:42,820 --> 00:38:46,060
the specification is type
consistently across all of its uses,

774
00:38:46,060 --> 00:38:47,830
across all of the operators.

775
00:38:47,830 --> 00:38:50,800
Second, although the
types that we do infer,

776
00:38:50,800 --> 00:38:53,380
because we infer types for every
intermediate expression as well,

777
00:38:53,380 --> 00:38:56,320
are compatible with what
the schemas accept.

778
00:38:56,320 --> 00:38:57,970
For example, the schema of

779
00:38:57,970 --> 00:39:00,300
an overloaded operator would

780
00:39:00,300 --> 00:39:03,150
accept a type that's a
model type of either one,

781
00:39:03,150 --> 00:39:08,835
and we always get one of those as
a result as stated by theorem.

782
00:39:08,835 --> 00:39:11,865
Essentially, this theorem
says that the types we get

783
00:39:11,865 --> 00:39:15,735
back from the model are
useful and are reasonable.

784
00:39:15,735 --> 00:39:18,030
In conclusion, we provide you

785
00:39:18,030 --> 00:39:19,815
with the tools for
either type checking

786
00:39:19,815 --> 00:39:23,700
or type inferring types in
your TLA+ specifications,

787
00:39:23,700 --> 00:39:26,235
which you can use to
make your work easier,

788
00:39:26,235 --> 00:39:29,030
to make fewer mistakes
and catch them quicker.

789
00:39:29,030 --> 00:39:31,390
These tools work in practice in

790
00:39:31,390 --> 00:39:34,150
a reasonable amount
of time and return

791
00:39:34,150 --> 00:39:39,550
types which we believe are useful
to TLA specification designers.

792
00:39:39,550 --> 00:39:41,710
Thank you for listening.
If you have any questions,

793
00:39:41,710 --> 00:39:43,270
please direct them to me or Igor,

794
00:39:43,270 --> 00:39:46,570
and we'll be happy to answer any
of them. Thank you very much.