1 00:00:00,000 --> 00:00:02,910 >> Hello. My name is Igor Konnov, 2 00:00:02,910 --> 00:00:06,420 I'm working at Informal Systems. 3 00:00:06,420 --> 00:00:09,570 I'm going to give you the first part of 4 00:00:09,570 --> 00:00:12,960 this talk on type inference for TLA+ in Apalache. 5 00:00:12,960 --> 00:00:16,395 The second part will begin by Jure Kukovec who's 6 00:00:16,395 --> 00:00:20,160 finishing his visit at Turin. 7 00:00:20,160 --> 00:00:24,660 We decided to split this job of recording the video. 8 00:00:24,660 --> 00:00:29,350 That's why we have basically two talks for the price of one. 9 00:00:31,640 --> 00:00:35,700 The title says we do type inference in Apalache. 10 00:00:35,700 --> 00:00:36,930 If you haven't seen this tool, 11 00:00:36,930 --> 00:00:40,140 you can go on the Apalache webpage on GitHub. 12 00:00:40,140 --> 00:00:42,390 That's a symbolic model checker 13 00:00:42,390 --> 00:00:45,690 for TLA+ as you develop them at Informal Systems, 14 00:00:45,690 --> 00:00:48,460 and it happened to be developed at Turin. 15 00:00:50,240 --> 00:00:53,670 In a nutshell, we have been 16 00:00:53,670 --> 00:00:57,295 developing this model checker for about five years. 17 00:00:57,295 --> 00:01:00,845 It reduces the verification questions too, 18 00:01:00,845 --> 00:01:02,900 since you solve them and we had been 19 00:01:02,900 --> 00:01:06,080 focusing on application of 20 00:01:06,080 --> 00:01:08,780 this model checker in the context of fault tolerance, 21 00:01:08,780 --> 00:01:11,675 distributed algorithms, and blockchains. 22 00:01:11,675 --> 00:01:13,940 That's what we are doing at Informal Systems. 23 00:01:13,940 --> 00:01:17,990 Now, if you want to find more information about how it's working, 24 00:01:17,990 --> 00:01:23,395 you can check that OOPSLA paper that is mentioned on the slide. 25 00:01:23,395 --> 00:01:26,585 However, this talk is not about the model checker, 26 00:01:26,585 --> 00:01:30,320 although that's a very interesting topic for me. 27 00:01:30,320 --> 00:01:34,685 This talk is about how you go from 28 00:01:34,685 --> 00:01:40,985 an untyped language like TLA+ to a typed language like SMT. 29 00:01:40,985 --> 00:01:45,050 I guess SMT solvers require types and codings. 30 00:01:45,050 --> 00:01:52,455 Essentially, this talk is about how you type an untyped language. 31 00:01:52,455 --> 00:01:55,595 In this talk, they're going to present to you two tools. 32 00:01:55,595 --> 00:01:57,830 One is the Type checker, 33 00:01:57,830 --> 00:01:59,435 the new Type checker we have, 34 00:01:59,435 --> 00:02:01,990 and another one is a Type inference tool. 35 00:02:01,990 --> 00:02:04,310 The type checker is planned for the next release. 36 00:02:04,310 --> 00:02:06,640 It's breaking out. 37 00:02:06,640 --> 00:02:10,080 That's some local analysis of operator definitions. 38 00:02:10,080 --> 00:02:12,335 If it cannot deliver, 39 00:02:12,335 --> 00:02:15,690 it asks the user to give it to some annotations. 40 00:02:15,690 --> 00:02:18,520 Type inferer doesn't require any annotations and 41 00:02:18,520 --> 00:02:21,775 thus global analysis of the specification. 42 00:02:21,775 --> 00:02:25,720 However, it's much harder to make it robust tool. 43 00:02:25,720 --> 00:02:27,975 That's why it's still a prototype. 44 00:02:27,975 --> 00:02:31,840 It's actually working but it will take a bit more time to make 45 00:02:31,840 --> 00:02:36,850 it a tool that would be used for either origins. 46 00:02:36,850 --> 00:02:39,160 What's important here, that the both tools are 47 00:02:39,160 --> 00:02:41,495 completely independent of the model checker. 48 00:02:41,495 --> 00:02:43,930 You don't have to understand the model checker. 49 00:02:43,930 --> 00:02:45,970 You don't have to use the model checker in 50 00:02:45,970 --> 00:02:49,490 order to do type inference tools. 51 00:02:49,680 --> 00:02:51,910 Let's talk about the type checker. 52 00:02:51,910 --> 00:02:55,505 I will focus on it then Jure will talk about the type inference. 53 00:02:55,505 --> 00:02:58,800 Okay, let's have a quick demo here. 54 00:02:58,800 --> 00:03:03,310 I'll show you how the type checker is working on a simple example. 55 00:03:03,310 --> 00:03:05,905 It's just called CarTalkPuzzle. 56 00:03:05,905 --> 00:03:10,705 I took it from the repository of the TLA specifications. 57 00:03:10,705 --> 00:03:14,425 You can look the specification yourself. 58 00:03:14,425 --> 00:03:16,945 If you look at it, you'll see that there are 59 00:03:16,945 --> 00:03:21,370 some non-trivial definitions that use recursion, 60 00:03:21,370 --> 00:03:23,960 that use functions sets. 61 00:03:23,960 --> 00:03:26,020 It's not always clear 62 00:03:26,020 --> 00:03:28,900 what the arguments are that breakers are doing, 63 00:03:28,900 --> 00:03:32,650 until you read the recommendations, 64 00:03:32,650 --> 00:03:34,370 if you read the comments here. 65 00:03:34,370 --> 00:03:37,025 Let's see what our type checker can 66 00:03:37,025 --> 00:03:40,440 tell us about this specification. 67 00:03:40,440 --> 00:03:44,960 We'll just run the type checker result writes in annotations 68 00:03:44,960 --> 00:03:51,450 and see where it complains about the specification. 69 00:03:53,270 --> 00:03:57,095 It complains in line 51 here, 70 00:03:57,095 --> 00:04:01,870 it says that it found a polymorphic type, basically some constant. 71 00:04:01,870 --> 00:04:05,770 The tasks has to give the concrete type here. 72 00:04:05,770 --> 00:04:08,390 We don't allow polymorphic types, 73 00:04:08,390 --> 00:04:10,340 user-defined operators, 74 00:04:10,340 --> 00:04:14,595 although TLA operators are obviously polymorphic. 75 00:04:14,595 --> 00:04:16,470 What's going on here? 76 00:04:16,470 --> 00:04:18,660 You have to choose operator that just 77 00:04:18,660 --> 00:04:22,260 provides some elements of a set. 78 00:04:22,260 --> 00:04:26,325 Of course, that checker cannot figure out the exact type. 79 00:04:26,325 --> 00:04:28,110 What we can do here, we can 80 00:04:28,110 --> 00:04:32,660 define the signature of this operator sum. 81 00:04:32,660 --> 00:04:37,485 In order to do that, I will have to extend the module, 82 00:04:37,485 --> 00:04:39,875 type inlet extended Apalache module 83 00:04:39,875 --> 00:04:42,555 and to write the type annotation. 84 00:04:42,555 --> 00:04:43,830 Let's go and we do it now. 85 00:04:43,830 --> 00:04:47,550 I will just copy this type annotation 86 00:04:47,550 --> 00:04:53,690 from my cheat sheet because I figured out the types before. 87 00:04:54,350 --> 00:04:57,465 Say here that the operator, Sum, 88 00:04:57,465 --> 00:05:00,530 is taking a function of integers to 89 00:05:00,530 --> 00:05:05,195 integers and it's taking a set of integers. 90 00:05:05,195 --> 00:05:08,000 As a result, it returns an integer. 91 00:05:08,000 --> 00:05:10,745 That's our syntax for operators. 92 00:05:10,745 --> 00:05:15,085 Let's see if default type checker is convinced here. 93 00:05:15,085 --> 00:05:18,350 Yes. Now it complains about another thing, 94 00:05:18,350 --> 00:05:20,600 it says that it doesn't know the type of 95 00:05:20,600 --> 00:05:23,990 P. Indeed P is just a constant, 96 00:05:23,990 --> 00:05:27,140 as well as N. What we can do here, 97 00:05:27,140 --> 00:05:30,590 we can introduce the assumptions, 98 00:05:30,590 --> 00:05:34,970 I'll just say one and fill right here, 99 00:05:34,970 --> 00:05:41,100 the assumptions about types of the parameters, 100 00:05:41,100 --> 00:05:43,760 N and P are integers. 101 00:05:43,760 --> 00:05:46,460 Let's run the type checker again. 102 00:05:46,460 --> 00:05:48,390 Now it goes through, 103 00:05:48,390 --> 00:05:53,230 but now it complains in line 169. 104 00:05:53,720 --> 00:05:56,360 Here it's actually interesting. 105 00:05:56,360 --> 00:06:04,405 It says that it cannot choose between a function and a sequence, 106 00:06:04,405 --> 00:06:06,670 because B can be either a sequence or 107 00:06:06,670 --> 00:06:10,060 a function given the current types in this operator. 108 00:06:10,060 --> 00:06:13,360 It cannot figure it out. You can again help 109 00:06:13,360 --> 00:06:17,785 the type checker and give a concrete type for image. 110 00:06:17,785 --> 00:06:20,140 This time it's not so trivial. 111 00:06:20,140 --> 00:06:24,560 It took me a while to figure out the exact type. 112 00:06:24,560 --> 00:06:29,780 But it actually takes a set of integers and, again, 113 00:06:29,780 --> 00:06:31,925 a function of integers to integers, 114 00:06:31,925 --> 00:06:36,170 and then again returns a set of integers. 115 00:06:36,170 --> 00:06:38,595 Let's run the type checker again. 116 00:06:38,595 --> 00:06:41,805 Now the type checker has happy. What does it mean? 117 00:06:41,805 --> 00:06:47,060 It means that our type checker managed to compute 118 00:06:47,060 --> 00:06:49,325 the types of all expressions in 119 00:06:49,325 --> 00:06:52,640 all operator definitions in this specification. 120 00:06:52,640 --> 00:06:56,780 All of you wrote just four type annotations here. 121 00:06:56,780 --> 00:06:59,620 We gave the type annotations for the parameters, 122 00:06:59,620 --> 00:07:02,105 and we gave type annotations for 123 00:07:02,105 --> 00:07:04,640 two operators that are actually quite 124 00:07:04,640 --> 00:07:08,210 trickier and the type checker gave up on them. 125 00:07:08,210 --> 00:07:12,720 That ends this small demo. Let's continue. 126 00:07:13,050 --> 00:07:18,040 I hope I have your attention now after that short demo, 127 00:07:18,040 --> 00:07:20,140 and now I'd like to explain to you 128 00:07:20,140 --> 00:07:22,585 how this type checker is actually working. 129 00:07:22,585 --> 00:07:27,400 I believe it's very simple and I think you'll agree with me. 130 00:07:27,400 --> 00:07:30,250 Before we can do that, let's talk about the type syntax. 131 00:07:30,250 --> 00:07:33,009 That's actually the syntax we use for type annotations, 132 00:07:33,009 --> 00:07:38,080 but it also will help you in understanding how the tools run. 133 00:07:38,080 --> 00:07:41,380 First of all, we have basic types like Booleans, 134 00:07:41,380 --> 00:07:44,605 integers, strings, and reals, nothing unexpected. 135 00:07:44,605 --> 00:07:46,210 We have uninterpreted types, 136 00:07:46,210 --> 00:07:49,735 and wait until the next slide, I will explain all this. 137 00:07:49,735 --> 00:07:54,550 We also have functions from sum types to sum types. 138 00:07:54,550 --> 00:07:57,265 We have sets and sequences. 139 00:07:57,265 --> 00:08:00,820 They're actually homogeneous in the sense that you cannot 140 00:08:00,820 --> 00:08:04,585 put objects of different types into a set or a sequence. 141 00:08:04,585 --> 00:08:07,030 However, there is an exception for records. 142 00:08:07,030 --> 00:08:10,870 Wait until the slide when we talk about records. 143 00:08:10,870 --> 00:08:16,270 There are tuples and records, and of course, 144 00:08:16,270 --> 00:08:19,465 there are operators that may have a fixed number of 145 00:08:19,465 --> 00:08:23,185 arguments to the results. 146 00:08:23,185 --> 00:08:26,825 There are type variables that allow us to talk about 147 00:08:26,825 --> 00:08:31,890 polymorphic types and type annotations and type inference. 148 00:08:31,890 --> 00:08:33,480 If you want to see more details, 149 00:08:33,480 --> 00:08:36,660 go to the Apalache repository and check 150 00:08:36,660 --> 00:08:41,545 these architectural decision record types. 151 00:08:41,545 --> 00:08:44,770 Uninterpreted types, I promised them to you. 152 00:08:44,770 --> 00:08:46,930 Often, if you look at the specification, 153 00:08:46,930 --> 00:08:48,910 you see constants, and 154 00:08:48,910 --> 00:08:52,780 these constants are often sets or just some values. 155 00:08:52,780 --> 00:08:55,765 Usually, it actually doesn't matter 156 00:08:55,765 --> 00:08:57,970 whether these sets are sets of integers, 157 00:08:57,970 --> 00:09:00,610 Boolean, strings, so on, so forth. 158 00:09:00,610 --> 00:09:03,550 You can see these examples in Paxos, for instance, 159 00:09:03,550 --> 00:09:06,520 there are sets of values, sets of acceptors. 160 00:09:06,520 --> 00:09:08,710 You see it in other examples like 161 00:09:08,710 --> 00:09:11,200 prisoners, missionaries, and cannibals. 162 00:09:11,200 --> 00:09:13,300 In all these cases, actually, 163 00:09:13,300 --> 00:09:15,070 you should use uninterpreted types 164 00:09:15,070 --> 00:09:16,840 because the only thing you can do 165 00:09:16,840 --> 00:09:21,055 about uninterpreted types is to compare objects, 166 00:09:21,055 --> 00:09:24,685 say, inside sets of uninterpreted types. 167 00:09:24,685 --> 00:09:28,959 In this case, acceptors will be just the set of processes, 168 00:09:28,959 --> 00:09:30,460 you don't know what processes are, 169 00:09:30,460 --> 00:09:33,670 you can always say if two processes are equal or not. 170 00:09:33,670 --> 00:09:35,920 The same applies to values. 171 00:09:35,920 --> 00:09:37,510 If you have seen model values, 172 00:09:37,510 --> 00:09:41,275 then you'll see that's actually uninterpreted values. 173 00:09:41,275 --> 00:09:43,930 You can say that these are model values of types A, 174 00:09:43,930 --> 00:09:46,000 B, C, D, and so on. 175 00:09:46,000 --> 00:09:52,225 A, B, C, D, and so on are actually uninterpreted types in TLC. 176 00:09:52,225 --> 00:09:56,260 So that's how you can express these things. 177 00:09:56,260 --> 00:09:59,725 How does our type checker works? 178 00:09:59,725 --> 00:10:02,470 Good to say it is embarrassingly simple. 179 00:10:02,470 --> 00:10:04,555 Let's see if I convince you. 180 00:10:04,555 --> 00:10:09,190 First of all, before the type checker [inaudible] should annotate 181 00:10:09,190 --> 00:10:11,200 all the built-in operators with 182 00:10:11,200 --> 00:10:13,930 some signatures, operator signatures. 183 00:10:13,930 --> 00:10:17,365 Plenty of TLA operators are quite simple in that sense. 184 00:10:17,365 --> 00:10:23,395 Say arithmetic operators take two integers and return an integer. 185 00:10:23,395 --> 00:10:26,230 Some operators are also simple, 186 00:10:26,230 --> 00:10:29,410 but they're polymorphic, and they're also very [inaudible]. 187 00:10:29,410 --> 00:10:31,900 For instance, set constructors take 188 00:10:31,900 --> 00:10:36,655 several elements and return a set of lows elements. 189 00:10:36,655 --> 00:10:38,470 Say, there are two elements, 190 00:10:38,470 --> 00:10:40,225 it will return a set of the same type. 191 00:10:40,225 --> 00:10:43,075 You can argue that in TLA, 192 00:10:43,075 --> 00:10:45,580 set elements might have different types, 193 00:10:45,580 --> 00:10:47,590 yes, but not on our type system, 194 00:10:47,590 --> 00:10:48,925 not in our type checker. 195 00:10:48,925 --> 00:10:50,800 Records are a bit of an exception, 196 00:10:50,800 --> 00:10:52,945 you'll see it later. 197 00:10:52,945 --> 00:10:57,820 Several operators actually can be treated as overloaded operators. 198 00:10:57,820 --> 00:11:03,085 For instance, the tuple operator creates either a tuple, 199 00:11:03,085 --> 00:11:04,960 so in this case a pair, 200 00:11:04,960 --> 00:11:07,945 or a sequence of elements of the same type. 201 00:11:07,945 --> 00:11:10,450 Again, you can say sequences are tuples, 202 00:11:10,450 --> 00:11:12,040 tuples are sequences, yes. 203 00:11:12,040 --> 00:11:15,355 In untyped TLA, that is true, 204 00:11:15,355 --> 00:11:17,110 not in our type system. 205 00:11:17,110 --> 00:11:19,090 We will reject some specifications 206 00:11:19,090 --> 00:11:21,550 that you believe are actually okay. 207 00:11:21,550 --> 00:11:24,880 There are a few hardcore operators that actually 208 00:11:24,880 --> 00:11:30,610 bind some values to the variables. 209 00:11:30,610 --> 00:11:33,670 For instance, a set filter in our system 210 00:11:33,670 --> 00:11:38,050 becomes this huge operator that says you have an operator, 211 00:11:38,050 --> 00:11:40,195 it takes some type a, 212 00:11:40,195 --> 00:11:42,445 and it returns a Boolean that's a predicate. 213 00:11:42,445 --> 00:11:47,005 Given this predicate, it returns a set of a's. 214 00:11:47,005 --> 00:11:49,315 Actually, what is happening here, 215 00:11:49,315 --> 00:11:50,500 how does it do that? 216 00:11:50,500 --> 00:11:55,810 It binds a name x to the elements of the set S, 217 00:11:55,810 --> 00:12:00,850 gives this x to P and P has to figure out what to do about this x. 218 00:12:00,850 --> 00:12:03,690 If you looked at all 219 00:12:03,690 --> 00:12:06,840 these annotation set we wanted to have and figured out 220 00:12:06,840 --> 00:12:10,050 that there is a very simple language that can 221 00:12:10,050 --> 00:12:13,600 describe all of the TLA operators, 222 00:12:13,600 --> 00:12:16,870 if you think about them in terms of a type system. 223 00:12:16,870 --> 00:12:20,830 That's what we call a simple lambda-calculus over types. 224 00:12:20,830 --> 00:12:24,730 It has just six constructs that you see here. 225 00:12:24,730 --> 00:12:27,985 First of all, you can have a type, that's obvious. 226 00:12:27,985 --> 00:12:31,135 You just have to type like int or a function from instance. 227 00:12:31,135 --> 00:12:32,950 You can have a variable name. 228 00:12:32,950 --> 00:12:35,200 You can have a lambda abstraction as you have 229 00:12:35,200 --> 00:12:37,750 seen in the previous slide just 230 00:12:37,750 --> 00:12:41,890 binds the variables x_1 to x_k to 231 00:12:41,890 --> 00:12:43,780 the elements of the sets that are 232 00:12:43,780 --> 00:12:46,840 presented by expressions e_1 to e_K, 233 00:12:46,840 --> 00:12:49,960 and these names can be used inside 234 00:12:49,960 --> 00:12:54,040 expression e. Now that definition, not surprisingly, 235 00:12:54,040 --> 00:12:57,460 binds expression e_1 to the name 236 00:12:57,460 --> 00:13:02,350 x and this name x can be used inside the expression e_2. 237 00:13:02,350 --> 00:13:06,145 Application by name, like if you have a user-defined operator, 238 00:13:06,145 --> 00:13:08,980 if you just take the operator name 239 00:13:08,980 --> 00:13:11,920 and apply it to the arguments e_1 to e_k. 240 00:13:11,920 --> 00:13:14,020 The most interesting case here is, 241 00:13:14,020 --> 00:13:16,330 of course, application by type. 242 00:13:16,330 --> 00:13:18,850 You have a choice of types, 243 00:13:18,850 --> 00:13:21,910 operator types to choose from and you apply one 244 00:13:21,910 --> 00:13:25,030 of these operators to the arguments e_1 to e_k. 245 00:13:25,030 --> 00:13:27,070 It's a very simple language, 246 00:13:27,070 --> 00:13:28,795 much simpler than TLA. 247 00:13:28,795 --> 00:13:32,679 However, you can translate every TLA expression, 248 00:13:32,679 --> 00:13:34,120 all the built-in operators and 249 00:13:34,120 --> 00:13:38,710 user-defined operators into this simple language. 250 00:13:38,710 --> 00:13:41,440 What can we do about this simple language? 251 00:13:41,440 --> 00:13:43,990 Well, we can take an expression in 252 00:13:43,990 --> 00:13:47,560 this language and do further types by doing equation solving. 253 00:13:47,560 --> 00:13:49,585 I'll just show you an example here. 254 00:13:49,585 --> 00:13:53,140 Here's a TLA operator, a very simple one. 255 00:13:53,140 --> 00:13:56,050 You translate it into our calculus. 256 00:13:56,050 --> 00:13:58,345 You have a Lambda expression, 257 00:13:58,345 --> 00:14:00,250 a signature int into bool, 258 00:14:00,250 --> 00:14:03,565 and you pass x and int. What do you do then? 259 00:14:03,565 --> 00:14:05,245 Well, you make equations, 260 00:14:05,245 --> 00:14:07,750 introduce equations for this operator, 261 00:14:07,750 --> 00:14:08,860 for this slide expression. 262 00:14:08,860 --> 00:14:11,200 You say there is a type a_1 that 263 00:14:11,200 --> 00:14:14,425 should have this shape because I know there is lambda inside. 264 00:14:14,425 --> 00:14:20,560 For the lambda, I know that this expression should be a set. 265 00:14:20,560 --> 00:14:23,440 So I say a_4 is a set and it has 266 00:14:23,440 --> 00:14:26,845 this shape and it actually happens to be equal to set of a. 267 00:14:26,845 --> 00:14:29,900 Because, well, that's going to do here and 268 00:14:29,900 --> 00:14:35,600 a_2 expresses the elements that are going to be bound to x. 269 00:14:35,600 --> 00:14:38,510 Then we jump to inside this expression. 270 00:14:38,510 --> 00:14:41,330 You see an operator application here, 271 00:14:41,330 --> 00:14:43,055 you know the number of arguments. 272 00:14:43,055 --> 00:14:44,880 You say it's a_5. 273 00:14:44,880 --> 00:14:47,785 It has two arguments and returns a_3. 274 00:14:47,785 --> 00:14:49,455 What are these arguments? 275 00:14:49,455 --> 00:14:52,130 Well, they have to be equal to the value of x and 276 00:14:52,130 --> 00:14:55,400 x is encoded by the variable a_2, 277 00:14:55,400 --> 00:15:00,015 and the second argument is int, right? 278 00:15:00,015 --> 00:15:02,210 You have the signatures, 279 00:15:02,210 --> 00:15:06,260 so this operator actually has to match this signature as well. 280 00:15:06,260 --> 00:15:09,495 We solve these equations and you have the answer. 281 00:15:09,495 --> 00:15:15,670 Well, I was using this kind of special squiggly equality, 282 00:15:15,670 --> 00:15:17,200 it's not exactly the equality. 283 00:15:17,200 --> 00:15:20,740 What we mean by this is unification. 284 00:15:20,740 --> 00:15:23,860 In basic cases like ints, 285 00:15:23,860 --> 00:15:26,485 you just say they're equal, like terms. 286 00:15:26,485 --> 00:15:28,690 If you have variables unification, 287 00:15:28,690 --> 00:15:30,440 you can say whether they're equal or not 288 00:15:30,440 --> 00:15:34,295 and return a binding as a result. 289 00:15:34,295 --> 00:15:35,510 Or in this case, 290 00:15:35,510 --> 00:15:37,774 you can have two bindings. 291 00:15:37,774 --> 00:15:40,385 We solve some of the equations here. 292 00:15:40,385 --> 00:15:43,100 The most interesting case here is records. 293 00:15:43,100 --> 00:15:44,735 We actually can unify 294 00:15:44,735 --> 00:15:47,345 records that have a different number of fields. 295 00:15:47,345 --> 00:15:49,685 In this case, we would produce 296 00:15:49,685 --> 00:15:53,780 joint records that has all these fields that are unified, 297 00:15:53,780 --> 00:15:57,020 but we don't allow to unify records that 298 00:15:57,020 --> 00:16:03,155 have the sign types different types to the same name. 299 00:16:03,155 --> 00:16:05,975 So in this way, we can have 300 00:16:05,975 --> 00:16:09,635 records of different types mixed into the same set, 301 00:16:09,635 --> 00:16:11,479 like you can see in Paxos, 302 00:16:11,479 --> 00:16:16,540 but we still stay relatively precise in our type checking. 303 00:16:16,540 --> 00:16:20,990 As a side effect, since we solve equations of a type unification, 304 00:16:20,990 --> 00:16:23,525 we can do some local type inference. 305 00:16:23,525 --> 00:16:24,920 So there are some boundaries of 306 00:16:24,920 --> 00:16:26,720 this type checking and we think it's 307 00:16:26,720 --> 00:16:31,820 actually quite good for the user to understand what's going on. 308 00:16:31,820 --> 00:16:34,130 The limit type computation to operate 309 00:16:34,130 --> 00:16:36,940 their boundaries if a type checker 310 00:16:36,940 --> 00:16:38,680 cannot find the types precisely in 311 00:16:38,680 --> 00:16:42,110 that definition or operator definition just fails as [inaudible] , 312 00:16:42,110 --> 00:16:45,860 we don't allow polymorphism in user-defined operators, 313 00:16:45,860 --> 00:16:50,310 and that's actually a very good topic for the meeting to discuss. 314 00:16:50,690 --> 00:16:54,090 At the same time, they allow polymorphic, monomorphic, 315 00:16:54,090 --> 00:16:57,480 and overloaded built-in operators. 316 00:16:57,480 --> 00:17:00,965 Otherwise, we wouldn't be able to do the type checking. 317 00:17:00,965 --> 00:17:03,640 So that finishes my part about the type checker and 318 00:17:03,640 --> 00:17:07,030 now [inaudible] continues with type inference. 319 00:17:07,030 --> 00:17:09,635 >> Hello everyone, welcome to this part of the talk 320 00:17:09,635 --> 00:17:12,510 where we discuss type inference in TLA. 321 00:17:13,890 --> 00:17:17,005 This talk is going to be about five things roughly. 322 00:17:17,005 --> 00:17:19,630 First, it's going to be about our Type System in practice. 323 00:17:19,630 --> 00:17:21,080 I'm going to walk you through some examples 324 00:17:21,080 --> 00:17:23,300 and for some of the experiments we've done, 325 00:17:23,300 --> 00:17:27,565 and then I'll talk a bit more about the behind-the-scenes theory, 326 00:17:27,565 --> 00:17:31,790 how we actually do this at the first-order logic level, 327 00:17:31,790 --> 00:17:33,500 how we encode types of strings, 328 00:17:33,500 --> 00:17:36,875 and then finally how encode types of TLA expressions. 329 00:17:36,875 --> 00:17:40,430 So here's what we do at a conceptual level. 330 00:17:40,430 --> 00:17:41,630 So first, we use 331 00:17:41,630 --> 00:17:45,180 the TLA specification to get some bounds information. 332 00:17:45,180 --> 00:17:47,825 So we're not going to use all of the types in the type system. 333 00:17:47,825 --> 00:17:49,760 For a concrete specification, 334 00:17:49,760 --> 00:17:52,715 we only have to care about a finite number of them, specifically, 335 00:17:52,715 --> 00:17:54,365 finite [inaudible] records and tuples, 336 00:17:54,365 --> 00:17:56,250 it's going to simplify a lot of things. 337 00:17:56,250 --> 00:17:57,920 Then we keep the specification or 338 00:17:57,920 --> 00:18:00,170 any sub-expression we're interested in and we 339 00:18:00,170 --> 00:18:04,795 create the SMT constraints from that expression's syntax tree. 340 00:18:04,795 --> 00:18:06,625 Then we have two options, 341 00:18:06,625 --> 00:18:11,010 either we have a satisfiable set of constraints, we get a model, 342 00:18:11,010 --> 00:18:12,310 and from that model, 343 00:18:12,310 --> 00:18:15,065 we reconstruct the types maybe with some post-processing 344 00:18:15,065 --> 00:18:18,125 where we make this more user-presentable. 345 00:18:18,125 --> 00:18:20,150 Or alternatively, we have a type error, 346 00:18:20,150 --> 00:18:22,925 in which case the user has to go and fix the specification. 347 00:18:22,925 --> 00:18:25,150 All right. So let's walk through an example. 348 00:18:25,150 --> 00:18:28,610 So here's a very simple specification that does 349 00:18:28,610 --> 00:18:30,470 almost nothing and then we'll maybe see 350 00:18:30,470 --> 00:18:35,860 how we get constraints from this [inaudible] 351 00:18:35,860 --> 00:18:40,570 Basically, here's the approach in a sentence. 352 00:18:40,570 --> 00:18:43,045 We're going to look at every expression, 353 00:18:43,045 --> 00:18:44,950 and every sub-expression of that expression. 354 00:18:44,950 --> 00:18:47,080 We're going to assign it an SMT variable. 355 00:18:47,080 --> 00:18:50,080 In this case, if p then y else of 356 00:18:50,080 --> 00:18:54,280 A is assigned the SMT variable t_1. 357 00:18:54,280 --> 00:18:56,470 When we're going to read the solution, 358 00:18:56,470 --> 00:19:01,550 we're going to say our t_1 holds the body of p especially. 359 00:19:01,710 --> 00:19:04,870 The top-level body of p is 360 00:19:04,870 --> 00:19:07,825 defined with an if-then-else operator which has a known type. 361 00:19:07,825 --> 00:19:11,590 In order to encode this type, we take, 362 00:19:11,590 --> 00:19:13,135 for example, in this case, 363 00:19:13,135 --> 00:19:16,240 the polymorphic type, any Alpha, bool Alpha, 364 00:19:16,240 --> 00:19:19,450 Alpha to Alpha, and we walk through 365 00:19:19,450 --> 00:19:23,800 the arguments of p and we encode each of these arguments, 366 00:19:23,800 --> 00:19:27,250 the type that the operator says the argument has to have. 367 00:19:27,250 --> 00:19:29,545 For example, p will have to have type bool, 368 00:19:29,545 --> 00:19:31,690 y and set Alpha will have to have type Alpha, 369 00:19:31,690 --> 00:19:34,540 and the entire expression will have to have type Alpha. 370 00:19:34,540 --> 00:19:35,710 Here, in this case, 371 00:19:35,710 --> 00:19:37,750 we use the variable c_1 to represent 372 00:19:37,750 --> 00:19:41,050 the type Alpha because we don't know what Alpha is at this point. 373 00:19:41,050 --> 00:19:44,635 We use bool for t_2 because p has to be a boolean. 374 00:19:44,635 --> 00:19:47,140 Next, we just walk through the arguments. 375 00:19:47,140 --> 00:19:48,880 For example, p and y, 376 00:19:48,880 --> 00:19:50,260 they're handled in the same way. 377 00:19:50,260 --> 00:19:51,700 One is a constant, one is variable, 378 00:19:51,700 --> 00:19:53,470 but at the type inference level, 379 00:19:53,470 --> 00:19:55,555 there is no difference between the two. 380 00:19:55,555 --> 00:19:58,450 Each constant and variable gets a pre-assigned SMT 381 00:19:58,450 --> 00:20:01,465 variable so that anywhere in the specification where we use p, 382 00:20:01,465 --> 00:20:06,230 we use the same type variable to refer to it in the encoding. 383 00:20:06,420 --> 00:20:09,460 For the set constructor, 384 00:20:09,460 --> 00:20:11,980 the story is similar to the analysis constructors. 385 00:20:11,980 --> 00:20:13,225 It's a built-in operator. 386 00:20:13,225 --> 00:20:15,025 It has a known type. 387 00:20:15,025 --> 00:20:17,395 It's one argument that creates a set. 388 00:20:17,395 --> 00:20:19,060 The process is very similar, 389 00:20:19,060 --> 00:20:20,875 so t_4 has to be a set type. 390 00:20:20,875 --> 00:20:23,425 T_5 has to be the type of the content of the set, 391 00:20:23,425 --> 00:20:24,490 and because Alpha is, 392 00:20:24,490 --> 00:20:26,485 again, unconstrained at this point, 393 00:20:26,485 --> 00:20:29,000 we use a fresh variable in c_2. 394 00:20:29,070 --> 00:20:31,165 Then we move to A. 395 00:20:31,165 --> 00:20:32,710 A is a user-defined operator, 396 00:20:32,710 --> 00:20:35,350 so we're going to need to create a fresh instance of A 397 00:20:35,350 --> 00:20:39,085 where we replace the arguments data if it has any, 398 00:20:39,085 --> 00:20:43,735 correctly with the arguments passed 2A. 399 00:20:43,735 --> 00:20:46,330 In this case, A is null, so this isn't a consideration, 400 00:20:46,330 --> 00:20:50,065 but we would have to do this at this point in general. 401 00:20:50,065 --> 00:20:52,150 To analyze the type of A, 402 00:20:52,150 --> 00:20:53,500 we have to look at the body of A. 403 00:20:53,500 --> 00:20:55,960 So we repeat the process we just performed on the body 404 00:20:55,960 --> 00:20:58,470 of B on the body of A. 405 00:20:58,470 --> 00:21:00,930 But this time we're going to take instead of t_1 406 00:21:00,930 --> 00:21:03,420 as at the top level term to encode the body of A, 407 00:21:03,420 --> 00:21:07,045 we're going to use the here freshly created c_3 408 00:21:07,045 --> 00:21:09,190 for this particular instance of A, 409 00:21:09,190 --> 00:21:10,900 which might be important if they had 410 00:21:10,900 --> 00:21:13,670 different ways of passing parameters. 411 00:21:13,920 --> 00:21:18,115 For plus, plus, again has a known type. 412 00:21:18,115 --> 00:21:19,600 It's int and int to int, so we 413 00:21:19,600 --> 00:21:21,655 create the constraints that mirror that. 414 00:21:21,655 --> 00:21:23,875 Then we finish off by looking at the x, 415 00:21:23,875 --> 00:21:27,535 which has the SMT variable v_x, 416 00:21:27,535 --> 00:21:30,190 and 2, which is an integer literal. 417 00:21:30,190 --> 00:21:32,185 So it just becomes if. 418 00:21:32,185 --> 00:21:34,660 You have this set of constraints. 419 00:21:34,660 --> 00:21:36,550 What we do with these constraints is 420 00:21:36,550 --> 00:21:38,500 we pass it to an SMT solver and 421 00:21:38,500 --> 00:21:40,510 the SMT solver will find that this is 422 00:21:40,510 --> 00:21:43,075 a satisfiable set of constraints. 423 00:21:43,075 --> 00:21:45,970 Our recovery will then from the SMT model, 424 00:21:45,970 --> 00:21:48,880 recover the following types for all of the variables, 425 00:21:48,880 --> 00:21:51,910 constants, and operators in this specification. 426 00:21:51,910 --> 00:21:54,220 As you can see, p has to be Boolean 427 00:21:54,220 --> 00:21:56,335 because it's part of the if and else condition. 428 00:21:56,335 --> 00:22:00,685 Similarly, A is a nullary operator that prefers an integer. 429 00:22:00,685 --> 00:22:03,700 The other ones follow from bool of if then else. 430 00:22:03,700 --> 00:22:06,925 Both branches of the nulls have to return the set of integers, 431 00:22:06,925 --> 00:22:09,625 and then the body of B has types of integers as this 432 00:22:09,625 --> 00:22:14,290 y because set the of A has type set of integer. 433 00:22:14,290 --> 00:22:16,540 This is exactly what you would expect. 434 00:22:16,540 --> 00:22:18,160 Of course this happens automatically. 435 00:22:18,160 --> 00:22:19,510 I didn't have to think about any of this. 436 00:22:19,510 --> 00:22:23,035 I just had to push the "Run" button and I got this for free. 437 00:22:23,035 --> 00:22:24,730 You can see how this would work. 438 00:22:24,730 --> 00:22:27,385 In larger specifications, the constraints would be 439 00:22:27,385 --> 00:22:28,630 much bigger and it will be much 440 00:22:28,630 --> 00:22:31,195 harder for me to manually solve them. 441 00:22:31,195 --> 00:22:33,850 So such a tool that would be very welcome, 442 00:22:33,850 --> 00:22:36,430 especially in the designing phase 443 00:22:36,430 --> 00:22:39,320 of the specification before it's fully completed. 444 00:22:40,980 --> 00:22:44,020 Next, let's look at some experiments. 445 00:22:44,020 --> 00:22:47,500 We took 35 specifications from the community repository you see on 446 00:22:47,500 --> 00:22:51,805 the screen and we basically ran them through the tool. 447 00:22:51,805 --> 00:22:54,400 Here you see the benchmark sizes. 448 00:22:54,400 --> 00:22:57,100 The reason you see 42 instead of 35 is because 449 00:22:57,100 --> 00:23:00,010 the specifications highlighted in red had type errors. 450 00:23:00,010 --> 00:23:02,950 So we added a minimal fixed specification to 451 00:23:02,950 --> 00:23:05,125 this benchmark to show 452 00:23:05,125 --> 00:23:07,660 how you can make those specification step correct. 453 00:23:07,660 --> 00:23:09,970 We ended up with most specifications 454 00:23:09,970 --> 00:23:12,729 being under 200, maybe under 400, 455 00:23:12,729 --> 00:23:15,580 and then this one outlier specification 456 00:23:15,580 --> 00:23:19,240 reaches roughly 1,100-1,200 lines of coding. 457 00:23:19,240 --> 00:23:21,370 These are realistic sizes. 458 00:23:21,370 --> 00:23:25,135 These are the specifications you would come across in real life. 459 00:23:25,135 --> 00:23:28,670 So think it's quite the representative benchmarks. 460 00:23:29,040 --> 00:23:32,150 Let's look at how the tool did. 461 00:23:32,190 --> 00:23:35,650 Good news, the tool took less 462 00:23:35,650 --> 00:23:38,170 than a second for the majority of the specifications, 463 00:23:38,170 --> 00:23:39,670 so with this outlier being 464 00:23:39,670 --> 00:23:41,710 roughly 10 seconds for the 465 00:23:41,710 --> 00:23:44,155 largest 1,100 lines of code specification, 466 00:23:44,155 --> 00:23:49,200 which is the protocol 802.16-AuthorizationPKMv35. 467 00:23:49,200 --> 00:23:52,030 [inaudible]. 468 00:23:52,030 --> 00:23:53,665 There's benchmarks that includes 469 00:23:53,665 --> 00:23:56,890 many other well-known other specification, 470 00:23:56,890 --> 00:24:01,150 it includes the CarTalkPuzzle that Igor showed you in his demo. 471 00:24:01,150 --> 00:24:03,850 Only in this case, we didn't have to manually annotate 472 00:24:03,850 --> 00:24:06,715 anything and the tool still found the type of, 473 00:24:06,715 --> 00:24:09,025 for example, the recursive function sum to be 474 00:24:09,025 --> 00:24:12,550 the same type that the annotation was used. 475 00:24:12,550 --> 00:24:17,155 But it did so automatically without need of user interference. 476 00:24:17,155 --> 00:24:21,370 But it also managed to type check in for types work, 477 00:24:21,370 --> 00:24:24,295 Pascos, and [inaudible] algorithm, Bosco. 478 00:24:24,295 --> 00:24:26,770 Interestingly enough, our Raft had a type error which 479 00:24:26,770 --> 00:24:30,385 we corrected as in specification 39. 480 00:24:30,385 --> 00:24:32,470 These are the benchmarks. 481 00:24:32,470 --> 00:24:37,885 The times look to be very good for this [inaudible] tool. 482 00:24:37,885 --> 00:24:39,520 There were some quotes, 483 00:24:39,520 --> 00:24:42,670 some interesting site cases, let's say. 484 00:24:42,670 --> 00:24:44,470 One benchmark, for example, 485 00:24:44,470 --> 00:24:47,560 Queens.tla did not type check. 486 00:24:47,560 --> 00:24:51,130 But for the reason that it 487 00:24:51,130 --> 00:24:53,635 uses queens in one instance 488 00:24:53,635 --> 00:24:56,140 as a sequence and then the other instances as a function, 489 00:24:56,140 --> 00:24:58,810 which in our type system, 490 00:24:58,810 --> 00:25:02,695 does not permit it because we treat those things as separate, 491 00:25:02,695 --> 00:25:08,140 independent objects that you cannot unify through. 492 00:25:08,140 --> 00:25:11,110 But of course, in the untyped interpretation of tla, 493 00:25:11,110 --> 00:25:13,945 functions and sequences are one and the same, 494 00:25:13,945 --> 00:25:20,350 so using either syntax layer 495 00:25:20,350 --> 00:25:24,940 nor the function sets syntax interchangeably is perfectly valid. 496 00:25:24,940 --> 00:25:27,220 What you could do in our framework to solve 497 00:25:27,220 --> 00:25:28,900 this concrete problem is you could enter 498 00:25:28,900 --> 00:25:30,910 this casting operator, in which case, 499 00:25:30,910 --> 00:25:33,040 you cast on the two expressions, 500 00:25:33,040 --> 00:25:35,290 and you would quickly find that this 501 00:25:35,290 --> 00:25:39,550 is a then valid undercasting [inaudible] , 502 00:25:39,550 --> 00:25:41,080 which we don't currently support. 503 00:25:41,080 --> 00:25:43,374 The other one which is interesting 504 00:25:43,374 --> 00:25:46,105 is the DijkstraMutex specification. 505 00:25:46,105 --> 00:25:48,580 This one is interesting because it's an instance of 506 00:25:48,580 --> 00:25:51,790 a specification that is statically not typeable, 507 00:25:51,790 --> 00:25:56,710 but would be correct under a dynamic typing scheme. 508 00:25:56,710 --> 00:25:59,110 The way these operators are set up, 509 00:25:59,110 --> 00:26:01,795 so Ia3A to la4A, 510 00:26:01,795 --> 00:26:04,390 guarantee that any execution will 511 00:26:04,390 --> 00:26:06,685 traverse them in a very specific order. 512 00:26:06,685 --> 00:26:08,920 The way this temp variable is being 513 00:26:08,920 --> 00:26:10,990 used is being used basically as a place holder, 514 00:26:10,990 --> 00:26:13,045 as a miscellaneous, if you will, 515 00:26:13,045 --> 00:26:15,370 where a certain value is just propagated into 516 00:26:15,370 --> 00:26:17,755 the next phase instead of having 517 00:26:17,755 --> 00:26:20,200 to define a fresh variable for 518 00:26:20,200 --> 00:26:23,140 each instance or for the phase between la3A and la4A. 519 00:26:23,140 --> 00:26:25,015 Attempt is just being reused, 520 00:26:25,015 --> 00:26:29,320 but it's being used in such a way that it's not consistent typed. 521 00:26:29,320 --> 00:26:32,170 For example, it's initially defined to be 522 00:26:32,170 --> 00:26:34,930 an integer, but then at some point, 523 00:26:34,930 --> 00:26:39,130 it changes to be set of integers, 524 00:26:39,130 --> 00:26:40,900 which is not allowed in our system. 525 00:26:40,900 --> 00:26:46,420 This, you cannot just fix with [inaudible]. 526 00:26:46,420 --> 00:26:48,850 These are the experiments. Now, let's 527 00:26:48,850 --> 00:26:51,380 look at what happens under the hood. 528 00:26:52,110 --> 00:26:54,910 Under the hood, what we would want 529 00:26:54,910 --> 00:26:57,685 is to pick a type and return return. 530 00:26:57,685 --> 00:27:00,310 It seems like this will work for most things, 531 00:27:00,310 --> 00:27:02,560 but unfortunately, there are exceptions, 532 00:27:02,560 --> 00:27:04,555 and those exceptions are records. 533 00:27:04,555 --> 00:27:06,160 The problem with records is there's 534 00:27:06,160 --> 00:27:08,680 an infinite family of records with arbitrary numbers of 535 00:27:08,680 --> 00:27:10,870 fields and you want to have 536 00:27:10,870 --> 00:27:12,190 a finite number of terms at 537 00:27:12,190 --> 00:27:14,680 the SMT encoding level. How do we solve this? 538 00:27:14,680 --> 00:27:16,540 Well, we solve this by cheating. 539 00:27:16,540 --> 00:27:19,480 We start the process by looking at 540 00:27:19,480 --> 00:27:22,240 a specification and in any given specification, 541 00:27:22,240 --> 00:27:23,860 there are only finitely many couple of 542 00:27:23,860 --> 00:27:26,560 indices and only finitely many record fields. 543 00:27:26,560 --> 00:27:30,280 We take that information and we use this to modify our encoding on 544 00:27:30,280 --> 00:27:33,100 a specification-specification basis to avoid 545 00:27:33,100 --> 00:27:37,210 having to introduce infinitely many term constructors. 546 00:27:37,210 --> 00:27:42,085 This is our encoding. So we define abstract datatype S_T. 547 00:27:42,085 --> 00:27:44,800 Most functions are what you would expect. 548 00:27:44,800 --> 00:27:47,365 You have the classic constants input string, 549 00:27:47,365 --> 00:27:50,170 and then the function set sequence and fun. 550 00:27:50,170 --> 00:27:54,070 Then we have these special constructors for records, and couples, 551 00:27:54,070 --> 00:27:59,545 and operators where we use one constructor for, for example, 552 00:27:59,545 --> 00:28:01,705 all sizes of records, 553 00:28:01,705 --> 00:28:03,730 and that one constructor conceptually 554 00:28:03,730 --> 00:28:06,700 has all of the possible fields in the specification. 555 00:28:06,700 --> 00:28:09,130 Then you use additional analysis in 556 00:28:09,130 --> 00:28:13,945 the recovery phase to figure out when a record term is given, 557 00:28:13,945 --> 00:28:15,850 which record it actually represent 558 00:28:15,850 --> 00:28:17,560 and filter out per user fields so we 559 00:28:17,560 --> 00:28:22,195 don't get to be the record type that is then unusable. 560 00:28:22,195 --> 00:28:24,850 But this allows us to have one constructor. 561 00:28:24,850 --> 00:28:26,290 Or in the case of operators, 562 00:28:26,290 --> 00:28:31,340 finitely many constructors per abstract type family. 563 00:28:31,440 --> 00:28:37,195 Some examples. Int obviously is translated into int, 564 00:28:37,195 --> 00:28:40,135 so the value of 1 would be encoded as lowercase int. 565 00:28:40,135 --> 00:28:41,785 But on the other end, 566 00:28:41,785 --> 00:28:44,860 tuples, for example, the tuple 1 and TRUE, 567 00:28:44,860 --> 00:28:47,935 are encoded with the term tup, 568 00:28:47,935 --> 00:28:52,360 which takes more than just the two arguments for 1 and TRUE. 569 00:28:52,360 --> 00:28:54,640 It fix the tuple size, 570 00:28:54,640 --> 00:28:55,975 which is known to be 2. 571 00:28:55,975 --> 00:28:59,260 Int and bool, which are the types of the first two arguments, 572 00:28:59,260 --> 00:29:01,960 and then it takes the additional arguments equal to the number 573 00:29:01,960 --> 00:29:06,070 of indices up to the maximal index in the specification. 574 00:29:06,070 --> 00:29:08,110 The recovery will then take care 575 00:29:08,110 --> 00:29:11,080 of making sure that because the tuple size is 2, 576 00:29:11,080 --> 00:29:13,390 we only recover the first two fields because those are 577 00:29:13,390 --> 00:29:16,690 the actually present fields in this tuple type. 578 00:29:16,690 --> 00:29:19,330 Now, let's talk about actually encoding types. 579 00:29:19,330 --> 00:29:24,860 How do you do this,he process that I've just hinted at previously? 580 00:29:25,470 --> 00:29:28,840 We use the type-term compatibility, 581 00:29:28,840 --> 00:29:32,920 the syntax that you see here on the screen to encode that term. 582 00:29:32,920 --> 00:29:36,025 This x represents a type, in this case, term. 583 00:29:36,025 --> 00:29:39,310 Let's give you an example and not a definition. Let's see. 584 00:29:39,310 --> 00:29:44,125 You have the tuple type Alpha_1 int and record h to Alpha_2, 585 00:29:44,125 --> 00:29:47,110 and you're in a specification with the maximal tuples as 586 00:29:47,110 --> 00:29:50,020 a four and only two record fields, 587 00:29:50,020 --> 00:29:51,730 of which h is the second. 588 00:29:51,730 --> 00:29:54,085 The encoding for this would be the t, 589 00:29:54,085 --> 00:29:57,430 which is the term that represents the type-term, 590 00:29:57,430 --> 00:29:59,575 is a tuple of 591 00:29:59,575 --> 00:30:02,425 size three because that's the number of fields [inaudible] has, 592 00:30:02,425 --> 00:30:04,465 but it takes four arguments, 593 00:30:04,465 --> 00:30:05,710 c_1 through c_4 because that's 594 00:30:05,710 --> 00:30:08,605 the maximal number of fields in the specification. 595 00:30:08,605 --> 00:30:13,000 However, only c_1 through c_3 have additional constraints that 596 00:30:13,000 --> 00:30:17,830 come from the actual arguments of the tuple type. 597 00:30:17,830 --> 00:30:19,630 C_1 has to be Alpha_1, 598 00:30:19,630 --> 00:30:21,415 and because Alpha_1 is a type variable, 599 00:30:21,415 --> 00:30:23,620 there's no much more we can do besides assign it in 600 00:30:23,620 --> 00:30:26,440 SMT variable a_1 with no additional constraints. 601 00:30:26,440 --> 00:30:30,905 C_2 is an integer because int maps to lowercase integer, 602 00:30:30,905 --> 00:30:33,120 and c_3 is however you resolve 603 00:30:33,120 --> 00:30:35,550 the encoding of the record type h to Alpha_2. 604 00:30:35,550 --> 00:30:41,295 In our case, this is by assigning c_3 to be the record type c_5, 605 00:30:41,295 --> 00:30:43,750 c_6 because NF is 2, 606 00:30:43,750 --> 00:30:47,545 so we have two record fields of which h is the second because 607 00:30:47,545 --> 00:30:52,345 the second c_6 has the additional constraint that encodes Alpha_2. 608 00:30:52,345 --> 00:30:56,275 But because, again, Alpha_2 is an unconstrained type variable, 609 00:30:56,275 --> 00:30:59,725 it just gets assigned to variable a_2. 610 00:30:59,725 --> 00:31:03,350 This is how you would encode type-term. 611 00:31:04,380 --> 00:31:06,850 You get these types that you wish to 612 00:31:06,850 --> 00:31:09,745 encode from built-in operators. 613 00:31:09,745 --> 00:31:13,000 Essentially, with the exception of user-defined operators, 614 00:31:13,000 --> 00:31:15,835 the entirety of a TLS specification is just each chain 615 00:31:15,835 --> 00:31:20,080 of the built-in operator applications. 616 00:31:20,080 --> 00:31:24,190 If we're able to figure out the types of those expressions, 617 00:31:24,190 --> 00:31:28,120 this should give us constraints for the entire specification. 618 00:31:28,120 --> 00:31:32,185 Operators have types ranging from simple to more complex. 619 00:31:32,185 --> 00:31:34,840 For example, plus has the very simple type, 620 00:31:34,840 --> 00:31:37,645 it takes two integers and it returns an integer. 621 00:31:37,645 --> 00:31:41,230 On the other hand, set union is polymorphic, 622 00:31:41,230 --> 00:31:42,730 it takes two sets of 623 00:31:42,730 --> 00:31:45,440 the same kind and returns another set of that same kind. 624 00:31:45,440 --> 00:31:46,860 But it could be anything, 625 00:31:46,860 --> 00:31:48,810 it can take two integer sets or it can take 626 00:31:48,810 --> 00:31:51,570 two sets of strings or it can take two sets of records. 627 00:31:51,570 --> 00:31:54,930 As long as the two arguments are of the same type, 628 00:31:54,930 --> 00:31:58,180 it is valid to apply set union. 629 00:31:58,310 --> 00:32:03,850 The particularity with the set construction operator is that 630 00:32:03,850 --> 00:32:09,220 here we do have an infinite family of set constructors, 631 00:32:09,220 --> 00:32:10,780 but that's not a problem because it's 632 00:32:10,780 --> 00:32:13,180 an infinite family in the type system, 633 00:32:13,180 --> 00:32:15,025 not in the SMT part. 634 00:32:15,025 --> 00:32:18,010 We have one operator priority, 635 00:32:18,010 --> 00:32:21,474 and all of them take n arguments 636 00:32:21,474 --> 00:32:23,305 of the same type and return a set. 637 00:32:23,305 --> 00:32:26,320 Now, what is interesting and different from what 638 00:32:26,320 --> 00:32:28,960 you might have seen is the treatment of certain, 639 00:32:28,960 --> 00:32:30,790 what we call overload operators. 640 00:32:30,790 --> 00:32:32,770 In this case, for example, the tuple constructor. 641 00:32:32,770 --> 00:32:35,950 But overloaded means, is that the single operator, 642 00:32:35,950 --> 00:32:38,290 syntactically, is able to produce 643 00:32:38,290 --> 00:32:41,005 expressions of two different types that are incompatible. 644 00:32:41,005 --> 00:32:45,610 In this case, the tuple constructor has, 645 00:32:45,610 --> 00:32:49,915 basically, two types that make up its, 646 00:32:49,915 --> 00:32:52,060 what we call complex schema. 647 00:32:52,060 --> 00:32:54,910 The first one is a type that takes 648 00:32:54,910 --> 00:32:58,045 two arguments of any types and returns a tuple, 649 00:32:58,045 --> 00:33:00,700 and the second one takes 650 00:33:00,700 --> 00:33:03,655 two arguments of the same type and returns a sequence. 651 00:33:03,655 --> 00:33:05,890 What this basically means is that, 652 00:33:05,890 --> 00:33:08,320 in a vacuum, this operator can be used for either, 653 00:33:08,320 --> 00:33:10,600 but we don't know which it's actually being used 654 00:33:10,600 --> 00:33:13,990 for until we take additional constraints from the context. 655 00:33:13,990 --> 00:33:17,530 I had already used the word schema before. 656 00:33:17,530 --> 00:33:21,325 Schema is what we refer to the type of an operator. 657 00:33:21,325 --> 00:33:24,250 So the type of an operator is called a schema, 658 00:33:24,250 --> 00:33:27,805 and it's primitive if the operator is not overloaded. 659 00:33:27,805 --> 00:33:29,844 The primitive schema basically 660 00:33:29,844 --> 00:33:33,850 declares the type variables that the operator has as polymorphic, 661 00:33:33,850 --> 00:33:35,560 and then tells you the types of 662 00:33:35,560 --> 00:33:37,255 the arguments and the type of returns. 663 00:33:37,255 --> 00:33:39,880 X_1 through X_n are the types of the arguments, 664 00:33:39,880 --> 00:33:41,380 Y is the type of return, 665 00:33:41,380 --> 00:33:43,630 and this operator is 666 00:33:43,630 --> 00:33:46,645 polymorphic and it takes k number of variables. 667 00:33:46,645 --> 00:33:49,000 Or it could be a complex schema which 668 00:33:49,000 --> 00:33:50,740 belongs to an overloaded operator, 669 00:33:50,740 --> 00:33:54,640 which is just a finite collection of primitive schemes. 670 00:33:54,640 --> 00:33:58,495 We'll see later that the complex schemas are really where the, 671 00:33:58,495 --> 00:34:03,830 pun intended, complexity of solving this problem comes. 672 00:34:04,710 --> 00:34:08,410 How do we make a constraint from a known schema? 673 00:34:08,410 --> 00:34:11,965 Well, we compute what is called a schema instance. 674 00:34:11,965 --> 00:34:14,125 The schema instance encodes the fact 675 00:34:14,125 --> 00:34:16,690 that this particular instance of 676 00:34:16,690 --> 00:34:19,630 operator application is compatible with 677 00:34:19,630 --> 00:34:22,870 one of the instantiations of the operator types. 678 00:34:22,870 --> 00:34:25,765 Because the operator is polymorphic in general, 679 00:34:25,765 --> 00:34:29,860 and any particular application has complete arguments, 680 00:34:29,860 --> 00:34:30,925 we just have to make sure that 681 00:34:30,925 --> 00:34:38,280 the instance of this operator is typed correctly. 682 00:34:38,280 --> 00:34:44,290 You can interpret this schema instance as reading a term, t-hat, 683 00:34:44,290 --> 00:34:46,120 encodes the type of the result, 684 00:34:46,120 --> 00:34:48,670 and terms t_1 through t_n encode the types of 685 00:34:48,670 --> 00:34:53,420 the n arguments of the operator to which the schema belongs. 686 00:34:53,970 --> 00:34:57,100 Problem is that we have 687 00:34:57,100 --> 00:35:00,535 disjunctive constraints that come from complex schemas, 688 00:35:00,535 --> 00:35:03,640 and this is really why we need to use SMT because SMT 689 00:35:03,640 --> 00:35:07,570 gives us a nice way of solving disjunctive constraints. 690 00:35:07,570 --> 00:35:10,420 Whereas, in a purely conjunctive setting, 691 00:35:10,420 --> 00:35:12,505 we can maybe use something else, something more efficient. 692 00:35:12,505 --> 00:35:14,350 However, because we're not in that setting, 693 00:35:14,350 --> 00:35:17,110 we have to rely on SMT solver to solve 694 00:35:17,110 --> 00:35:20,875 this set of potentially disjunctive constraints for us. 695 00:35:20,875 --> 00:35:23,380 How do we put all that we've learned together and 696 00:35:23,380 --> 00:35:25,945 actually determine the types of TLA expressions? 697 00:35:25,945 --> 00:35:30,640 We use the similar syntax we were using before to denote 698 00:35:30,640 --> 00:35:32,650 that a term x represents the type of TLA 699 00:35:32,650 --> 00:35:35,260 expression e. The way we do this, 700 00:35:35,260 --> 00:35:36,340 I'm not going to give you definitions, 701 00:35:36,340 --> 00:35:37,900 I'm going to show you by example. 702 00:35:37,900 --> 00:35:39,460 The way we do this, for example, 703 00:35:39,460 --> 00:35:42,340 for literals, is exactly what you would expect. 704 00:35:42,340 --> 00:35:43,930 An integer literal, for example, 705 00:35:43,930 --> 00:35:46,030 42, you can just say that, 706 00:35:46,030 --> 00:35:48,520 if it's trying to be represented by the term x, 707 00:35:48,520 --> 00:35:52,045 then x is an int, and that's the end of it. 708 00:35:52,045 --> 00:35:54,625 More complex, for example, 709 00:35:54,625 --> 00:35:59,155 the application of plus or any other built-in operator. 710 00:35:59,155 --> 00:36:02,410 In that case, we have to analyze three things. 711 00:36:02,410 --> 00:36:05,320 First, we have to figure out the schema instance for 712 00:36:05,320 --> 00:36:11,050 plus for which we compute s plus c_1, c_2, x. 713 00:36:11,050 --> 00:36:13,390 The return of plus, 714 00:36:13,390 --> 00:36:14,920 which is equal to the body 715 00:36:14,920 --> 00:36:17,050 of the expression we're currently analyzing, 716 00:36:17,050 --> 00:36:18,850 has to be represented by x. 717 00:36:18,850 --> 00:36:20,395 For the two arguments, 718 00:36:20,395 --> 00:36:21,835 you want to need two in this case, 719 00:36:21,835 --> 00:36:24,505 we have to create fresh SMT variables, 720 00:36:24,505 --> 00:36:26,050 c_1 and c_2, to represent them. 721 00:36:26,050 --> 00:36:28,120 Then we compute the schema instance. 722 00:36:28,120 --> 00:36:30,010 In addition, we recurse over 723 00:36:30,010 --> 00:36:31,600 all the arguments because the arguments 724 00:36:31,600 --> 00:36:33,250 themselves might be complex expressions, 725 00:36:33,250 --> 00:36:35,575 and we compute their representations as well. 726 00:36:35,575 --> 00:36:39,970 So we compute the encoding of e_1 into c_1, 727 00:36:39,970 --> 00:36:42,770 and we compute the encoding of e_2 into c_2. 728 00:36:43,020 --> 00:36:47,080 For plus, the schema in question here, 729 00:36:47,080 --> 00:36:49,360 s plus, is just int, int to int. 730 00:36:49,360 --> 00:36:52,015 It would tell us that c_1 has to be an int, 731 00:36:52,015 --> 00:36:54,235 c_2 has to be an int, and x has to be an int, 732 00:36:54,235 --> 00:36:56,600 as expected for plus. 733 00:36:56,970 --> 00:36:59,620 The more interesting case is the one 734 00:36:59,620 --> 00:37:02,140 with an overloaded operator, for example. 735 00:37:02,140 --> 00:37:04,450 If we had the expression tuple 1, 2, 736 00:37:04,450 --> 00:37:06,250 we do the same thing as before, 737 00:37:06,250 --> 00:37:08,620 but now the schema instance is different. 738 00:37:08,620 --> 00:37:11,409 Because the operator here is overloaded, 739 00:37:11,409 --> 00:37:13,854 the schema instance is essentially 740 00:37:13,854 --> 00:37:16,900 a disjunction of two different things. 741 00:37:16,900 --> 00:37:18,730 It's a schema instance for s_1 and 742 00:37:18,730 --> 00:37:21,010 a schema instance for s_2, where, here, 743 00:37:21,010 --> 00:37:24,175 s_1 and s_2 are the two primitive schemas that make up 744 00:37:24,175 --> 00:37:29,830 the complex schema which this operator holds. 745 00:37:29,830 --> 00:37:31,810 Everything else is exactly the same, 746 00:37:31,810 --> 00:37:33,430 so we recurse on one, 747 00:37:33,430 --> 00:37:35,890 we recurse on two in exactly the same way. 748 00:37:35,890 --> 00:37:38,545 These two schemas, concretely, 749 00:37:38,545 --> 00:37:40,675 for the tuple constructor, 750 00:37:40,675 --> 00:37:42,745 are the tuple schema s_1, 751 00:37:42,745 --> 00:37:44,920 where we take two arguments and return 752 00:37:44,920 --> 00:37:48,610 a tuple of potentially different types, 753 00:37:48,610 --> 00:37:50,650 and the schema s_2, which we take 754 00:37:50,650 --> 00:37:53,320 two arguments of the same type and return a sequence of that. 755 00:37:53,320 --> 00:37:56,755 We compute both of these constraints for s_1 and for s_2, 756 00:37:56,755 --> 00:37:59,690 and then we just add the structure. 757 00:37:59,700 --> 00:38:04,420 Here is really where the complexity comes from if anywhere. 758 00:38:04,420 --> 00:38:06,565 This is how you would do it. 759 00:38:06,565 --> 00:38:08,950 Of course, there are special cases. 760 00:38:08,950 --> 00:38:12,595 There are cases where you need to manage user-defined operators. 761 00:38:12,595 --> 00:38:14,680 There are cases where you need to 762 00:38:14,680 --> 00:38:17,020 match [inaudible] expressions and stuff like that. 763 00:38:17,020 --> 00:38:19,690 I'm not going to give you expressions for those. 764 00:38:19,690 --> 00:38:21,340 If you're interested, contact me or Igor, 765 00:38:21,340 --> 00:38:25,490 and we'll point you towards our paper. 766 00:38:26,430 --> 00:38:28,810 Now, the question is, 767 00:38:28,810 --> 00:38:30,280 the thing we're doing, 768 00:38:30,280 --> 00:38:33,220 is it actually correct? Or what does that mean? 769 00:38:33,220 --> 00:38:34,870 We have a theorem that says that, 770 00:38:34,870 --> 00:38:38,350 if we manage to find a model and recover types from that model, 771 00:38:38,350 --> 00:38:40,480 the types we recover are sensible. 772 00:38:40,480 --> 00:38:42,820 What that means is that every variable in 773 00:38:42,820 --> 00:38:46,060 the specification is type consistently across all of its uses, 774 00:38:46,060 --> 00:38:47,830 across all of the operators. 775 00:38:47,830 --> 00:38:50,800 Second, although the types that we do infer, 776 00:38:50,800 --> 00:38:53,380 because we infer types for every intermediate expression as well, 777 00:38:53,380 --> 00:38:56,320 are compatible with what the schemas accept. 778 00:38:56,320 --> 00:38:57,970 For example, the schema of 779 00:38:57,970 --> 00:39:00,300 an overloaded operator would 780 00:39:00,300 --> 00:39:03,150 accept a type that's a model type of either one, 781 00:39:03,150 --> 00:39:08,835 and we always get one of those as a result as stated by theorem. 782 00:39:08,835 --> 00:39:11,865 Essentially, this theorem says that the types we get 783 00:39:11,865 --> 00:39:15,735 back from the model are useful and are reasonable. 784 00:39:15,735 --> 00:39:18,030 In conclusion, we provide you 785 00:39:18,030 --> 00:39:19,815 with the tools for either type checking 786 00:39:19,815 --> 00:39:23,700 or type inferring types in your TLA+ specifications, 787 00:39:23,700 --> 00:39:26,235 which you can use to make your work easier, 788 00:39:26,235 --> 00:39:29,030 to make fewer mistakes and catch them quicker. 789 00:39:29,030 --> 00:39:31,390 These tools work in practice in 790 00:39:31,390 --> 00:39:34,150 a reasonable amount of time and return 791 00:39:34,150 --> 00:39:39,550 types which we believe are useful to TLA specification designers. 792 00:39:39,550 --> 00:39:41,710 Thank you for listening. If you have any questions, 793 00:39:41,710 --> 00:39:43,270 please direct them to me or Igor, 794 00:39:43,270 --> 00:39:46,570 and we'll be happy to answer any of them. Thank you very much.