1
00:00:00,230 --> 00:00:03,810
>> Hello, everyone. Welcome
to this presentation.

2
00:00:03,810 --> 00:00:05,505
My name is Antoine Defourne.

3
00:00:05,505 --> 00:00:08,265
Today, we'll present
a joint work with

4
00:00:08,265 --> 00:00:11,130
Petar Vukmirovic about bringing

5
00:00:11,130 --> 00:00:14,100
higher-order automations to TLAPS.

6
00:00:14,100 --> 00:00:17,280
Let's start with some context.

7
00:00:17,280 --> 00:00:20,100
I want to make clear
that I will not talk

8
00:00:20,100 --> 00:00:22,800
about TLC and others in this talk.

9
00:00:22,800 --> 00:00:26,640
I will only be concerned
with TLAPS and proofs,

10
00:00:26,640 --> 00:00:30,860
and also we ignore
[inaudible] problems

11
00:00:30,860 --> 00:00:35,650
because there are not so much
used in practice for TLAPS.

12
00:00:35,650 --> 00:00:38,930
So problem I'm concerned
with is a cluster

13
00:00:38,930 --> 00:00:43,580
of theorems for TLAPS which are not

14
00:00:43,580 --> 00:00:48,470
handled very well currently
by the backend solvers

15
00:00:48,470 --> 00:00:53,105
because they are more
second-order problems

16
00:00:53,105 --> 00:00:56,710
which cannot be handled
by first-order backends.

17
00:00:56,710 --> 00:01:02,580
So I want to extend TLAPS with a
new higher order backend solver,

18
00:01:02,580 --> 00:01:05,980
and for this I chose Zipperposition.

19
00:01:06,760 --> 00:01:10,580
So let me explain quickly how TLAPS

20
00:01:10,580 --> 00:01:14,810
works for those of you
who never used TLAPS.

21
00:01:15,350 --> 00:01:19,450
So the two disabled
to pass a module of

22
00:01:19,450 --> 00:01:22,610
theorems and proof scripts, and from

23
00:01:22,610 --> 00:01:25,280
this pulse generator number of

24
00:01:25,280 --> 00:01:31,025
proof obligations which correspond
to proof steps in the script.

25
00:01:31,025 --> 00:01:36,480
Each proof obligation is then
sent to a backend solver,

26
00:01:36,480 --> 00:01:39,945
to all of the backend
solvers of TLAPS

27
00:01:39,945 --> 00:01:44,664
after been encoded in the
respective input languages.

28
00:01:44,664 --> 00:01:50,985
Whenever a solver answer that's
the proof of obligation is valid,

29
00:01:50,985 --> 00:01:53,205
it is trusted by TLAPS,

30
00:01:53,205 --> 00:01:56,820
ends with proof of obligation
is considered valid.

31
00:01:56,820 --> 00:02:01,550
So there are several backends
available: Isabelle, Zenon,

32
00:02:01,550 --> 00:02:08,760
a few SMT solvers and PTL for
the temporal reasoning parts.

33
00:02:10,430 --> 00:02:12,690
To illustrate my problem,

34
00:02:12,690 --> 00:02:17,100
let's start on the
formula, a definition.

35
00:02:17,100 --> 00:02:20,900
So I want to define the
sum operator in TLAPS,

36
00:02:20,900 --> 00:02:24,450
and I want to do it by recursion.

37
00:02:25,160 --> 00:02:27,440
So in this expression,

38
00:02:27,440 --> 00:02:32,480
there are two parameters of natural
numbers are n, n of series s.

39
00:02:32,480 --> 00:02:37,605
For n, I can just represent it
using the natural number of TLA+,

40
00:02:37,605 --> 00:02:39,470
but for s, I have a choice.

41
00:02:39,470 --> 00:02:43,585
I can use a TLA+
function or an operator.

42
00:02:43,585 --> 00:02:46,670
So I chose of an operator for this,

43
00:02:46,670 --> 00:02:48,365
for the sake of illustration,

44
00:02:48,365 --> 00:02:52,280
because that will
make the sum operator

45
00:02:52,280 --> 00:02:58,340
parameterized by another operator,
so it will be second-order.

46
00:02:58,340 --> 00:03:01,070
Here's a definition.

47
00:03:01,070 --> 00:03:02,750
There is a difficulty with

48
00:03:02,750 --> 00:03:05,420
this definition because
TLA+ doesn't allow

49
00:03:05,420 --> 00:03:09,875
to define recursive operators,
only recursive functions.

50
00:03:09,875 --> 00:03:14,755
So this is why I have to do
it in two steps like this.

51
00:03:14,755 --> 00:03:23,345
Now, another issue with this
definition is that when a definition,

52
00:03:23,345 --> 00:03:26,150
a recursive function,
is defined like this,

53
00:03:26,150 --> 00:03:30,065
it's actually a choose
expression behind the scenes.

54
00:03:30,065 --> 00:03:35,910
So sumRec must be read
as some function f,

55
00:03:35,910 --> 00:03:41,760
but some f, sorry, such that
f is a function of that,

56
00:03:41,760 --> 00:03:46,740
that matches the recursive
definition if it exists.

57
00:03:46,740 --> 00:03:50,235
If such function doesn't exist,

58
00:03:50,235 --> 00:03:53,025
then the deficient can't be used.

59
00:03:53,025 --> 00:03:58,140
So the user must prove by

60
00:03:58,140 --> 00:04:01,130
itself that such a function exists

61
00:04:01,130 --> 00:04:05,020
before the deficient can
be used for anything.

62
00:04:05,020 --> 00:04:09,240
Fortunately, the
standard distribution of

63
00:04:09,240 --> 00:04:13,815
TLAPS provide the
module to do just that.

64
00:04:13,815 --> 00:04:17,000
If we follow a simple
pattern of theorems,

65
00:04:17,000 --> 00:04:19,370
we can recover the basic facts and

66
00:04:19,370 --> 00:04:24,485
basic properties of Sum quickly.

67
00:04:24,485 --> 00:04:26,525
So it works like this:

68
00:04:26,525 --> 00:04:29,000
First, I have to prove
that you'll have

69
00:04:29,000 --> 00:04:33,350
SumDefConclusion which
I won't explain,

70
00:04:33,350 --> 00:04:37,220
but it states in some way that

71
00:04:37,220 --> 00:04:43,860
the function f that matches the
recursive definition does exist.

72
00:04:44,630 --> 00:04:47,120
You see here if it's a theorem

73
00:04:47,120 --> 00:04:50,270
has to be prioritized
by an operator S.

74
00:04:50,270 --> 00:04:53,240
This is the Assume New S part.

75
00:04:53,240 --> 00:04:54,830
I'm skipping the proof,

76
00:04:54,830 --> 00:04:59,610
but just know that this is
solved by TLAPS without problem.

77
00:04:59,930 --> 00:05:04,650
Now, the theorem I'm
interested in is SumDef.

78
00:05:04,650 --> 00:05:11,810
SumDef states that the sum
operator for argument n

79
00:05:11,810 --> 00:05:19,025
and S is defined by the recursive
definition that I intended.

80
00:05:19,025 --> 00:05:22,130
So this is a statement
that I would like to

81
00:05:22,130 --> 00:05:27,440
use instead of the actual
definition of operator

82
00:05:27,440 --> 00:05:33,640
whenever I need to have to
use the Sum definition.

83
00:05:33,640 --> 00:05:40,030
In principle, SumDef is just a
reformulation of SumDefConclusion.

84
00:05:40,030 --> 00:05:45,450
So its proof is just one line here.

85
00:05:46,060 --> 00:05:50,375
It works by invoking SumDefConclusion

86
00:05:50,375 --> 00:05:54,790
as a remark and expand
some definition.

87
00:05:54,790 --> 00:05:58,170
Now, unfortunately, this proof phase

88
00:05:58,170 --> 00:06:02,625
TLAPS is not able to proof this one.

89
00:06:02,625 --> 00:06:06,735
To understand why, let's
look at SumDefConclusion.

90
00:06:06,735 --> 00:06:10,620
So I already pointed out
that SumDefConclusion

91
00:06:10,620 --> 00:06:12,690
is parameterized by an operator S.

92
00:06:12,690 --> 00:06:15,630
In logical terms, that means

93
00:06:15,630 --> 00:06:18,950
SumDefConclusion is
the statement that is

94
00:06:18,950 --> 00:06:24,470
quantified over on operator S.

95
00:06:24,470 --> 00:06:30,570
For SumDef, this quantified lemma

96
00:06:30,570 --> 00:06:33,650
has to be instantiated
with an operator S.

97
00:06:33,650 --> 00:06:40,020
So we can see that it's not
first-order instantiation here,

98
00:06:40,020 --> 00:06:43,720
it's second-order instantiation.

99
00:06:44,420 --> 00:06:49,560
Where most of the backend
solvers of TLAPS are able to to

100
00:06:49,560 --> 00:06:53,340
handle first-order
instantiation because

101
00:06:53,340 --> 00:06:56,760
they are first-order logic.

102
00:06:56,760 --> 00:07:00,285
Isabelle is the only
backend that can do that.

103
00:07:00,285 --> 00:07:02,780
So that makes the proof very fragile.

104
00:07:02,780 --> 00:07:06,700
Because Isabelle is not
able to prove this one,

105
00:07:06,700 --> 00:07:11,330
TLAPS cannot do anything
more as a proof phase.

106
00:07:12,380 --> 00:07:17,480
We can state in general what

107
00:07:17,480 --> 00:07:23,530
problems are not going to
be handled well by TLAPS.

108
00:07:23,530 --> 00:07:26,790
Every time a lemma
that is parametized

109
00:07:26,790 --> 00:07:29,360
by an operator is invoked in

110
00:07:29,360 --> 00:07:33,880
a proof and lemma has to be used,

111
00:07:33,880 --> 00:07:37,700
so to be instantiated
with an operator,

112
00:07:37,700 --> 00:07:40,170
this is a second-order problem,

113
00:07:40,170 --> 00:07:46,400
so we can't use the first-order
back-end, at all, for those.

114
00:07:47,080 --> 00:07:49,500
As a special case,

115
00:07:49,500 --> 00:07:52,830
we can see primitive constructs

116
00:07:52,830 --> 00:07:55,440
like set comprehension for example,

117
00:07:55,440 --> 00:07:59,545
but also, choose expression
functions and so on.

118
00:07:59,545 --> 00:08:05,370
Has second-order constructs with

119
00:08:05,370 --> 00:08:11,365
not axios but axiom schemas
parameterized by an operator,

120
00:08:11,365 --> 00:08:14,300
predicate P in that case.

121
00:08:14,300 --> 00:08:21,930
Those axiom schemas needs to be
instantiated for the encoding.

122
00:08:22,050 --> 00:08:25,325
Of course, these are
common constructs.

123
00:08:25,325 --> 00:08:27,190
For the first-order encodings,

124
00:08:27,190 --> 00:08:29,190
we have to know them in some way.

125
00:08:29,190 --> 00:08:31,260
Currently, this is done by

126
00:08:31,260 --> 00:08:35,115
eliminating those
problematic constructs.

127
00:08:35,115 --> 00:08:38,750
So here, you can see one example,

128
00:08:38,750 --> 00:08:41,890
just to give you the ID.

129
00:08:42,130 --> 00:08:48,645
If I want to eliminate a set
comprehension for these predicates,

130
00:08:48,645 --> 00:08:51,500
I can start by every word and writing

131
00:08:51,500 --> 00:08:54,470
it as a second-order application.

132
00:08:54,470 --> 00:08:57,880
With one argument being the base set,

133
00:08:57,880 --> 00:09:02,110
but the other one will be the
predicates, so predicate P.

134
00:09:02,110 --> 00:09:08,150
Then what I need to do is to replace

135
00:09:08,150 --> 00:09:13,850
this application by another
application when you declare operator,

136
00:09:13,850 --> 00:09:17,720
which is specialized
for this predicate.

137
00:09:17,720 --> 00:09:22,365
I also need to provide the axiom,

138
00:09:22,365 --> 00:09:25,140
the instance of the axiom schema

139
00:09:25,140 --> 00:09:29,640
for set comprehension for that
predicate, and that's it.

140
00:09:29,640 --> 00:09:31,925
For our complications,

141
00:09:31,925 --> 00:09:37,215
when some local variables are
involved, see that example.

142
00:09:37,215 --> 00:09:39,415
We need to make some arrangement.

143
00:09:39,415 --> 00:09:41,150
Provide C as a parameter,

144
00:09:41,150 --> 00:09:44,090
but if this is something
that we are able to

145
00:09:44,090 --> 00:09:47,570
do in cases such as this one,

146
00:09:47,570 --> 00:09:51,540
because we know the axiom
schemas behind the construct.

147
00:09:51,560 --> 00:09:54,145
Now, if I don't want to do

148
00:09:54,145 --> 00:09:58,090
this rather technical
elimination procedure,

149
00:09:58,090 --> 00:10:02,670
I can just encode the problem into

150
00:10:02,670 --> 00:10:08,650
higher order logic and send the
higher-order problem to a solver,

151
00:10:08,650 --> 00:10:13,180
which is able to endure
higher-order logic.

152
00:10:13,600 --> 00:10:18,850
We chose Zipperposition
to make this experiment.

153
00:10:18,850 --> 00:10:23,235
Zipperposition is a first-order
superposition solver

154
00:10:23,235 --> 00:10:26,625
which was extended to higher
order logic recently.

155
00:10:26,625 --> 00:10:28,470
To give you more context,

156
00:10:28,470 --> 00:10:31,390
this is part of a project
called Matryoshka,

157
00:10:31,390 --> 00:10:34,510
and supervised by Jasmin Blanchette,

158
00:10:34,510 --> 00:10:38,200
which is about extending
solver from first-order to

159
00:10:38,200 --> 00:10:42,630
higher order logic in ways that
preserve the good properties.

160
00:10:42,630 --> 00:10:46,360
Zipperposition just won the

161
00:10:46,360 --> 00:10:51,010
CASC competition last
year in the THF division.

162
00:10:51,010 --> 00:10:57,525
THF is a component of the HTTP
standard language for problems,

163
00:10:57,525 --> 00:11:01,060
and it covers higher-order logic.

164
00:11:01,520 --> 00:11:06,155
We did the implementation,

165
00:11:06,155 --> 00:11:08,350
we implemented it at the module,

166
00:11:08,350 --> 00:11:10,660
so an export from TLA plus to THF.

167
00:11:10,660 --> 00:11:13,680
The important aspect of this is

168
00:11:13,680 --> 00:11:17,030
that we don't need to eliminate
the higher-order features anymore.

169
00:11:17,030 --> 00:11:23,930
We can translate the problem
more directly than before.

170
00:11:24,030 --> 00:11:28,040
Using Zipperposition, we were able to

171
00:11:28,040 --> 00:11:32,440
solve the sum examples rather easily.

172
00:11:32,970 --> 00:11:39,920
Now, there are still some work to do.

173
00:11:39,920 --> 00:11:42,830
Our priority right now is to make

174
00:11:42,830 --> 00:11:45,740
this new Zipperposition back-end

175
00:11:47,010 --> 00:11:52,220
as efficient as the first-order
back-ends of TLA plus.

176
00:11:52,220 --> 00:11:59,240
The first-order of encodings
are well optimized by now,

177
00:11:59,240 --> 00:12:02,110
especially the [inaudible] encoding.

178
00:12:02,960 --> 00:12:06,600
We could test with one Sum example,

179
00:12:06,600 --> 00:12:11,060
but to evaluate it at a
larger base of examples,

180
00:12:11,060 --> 00:12:12,805
we still have work to do.

181
00:12:12,805 --> 00:12:15,315
The next step would be to mix

182
00:12:15,315 --> 00:12:19,790
higher-order reasoning with
theories, especially, arithmetic.

183
00:12:20,570 --> 00:12:23,670
Here, the problem is
that Zipperposition

184
00:12:23,670 --> 00:12:25,660
is extended to higher order logic,

185
00:12:25,660 --> 00:12:28,400
but this extension is not compatible

186
00:12:28,400 --> 00:12:32,075
With special of reasoning
like arithmetic.

187
00:12:32,075 --> 00:12:36,410
We cannot use the
backend for application

188
00:12:36,410 --> 00:12:41,550
that mix higher order with
arithmetic for example.

189
00:12:41,770 --> 00:12:46,155
We need to find a
solution for that yet.

190
00:12:46,155 --> 00:12:48,855
Then the next step,

191
00:12:48,855 --> 00:12:53,205
maybe more longer term objective,

192
00:12:53,205 --> 00:12:54,900
would be to attempt

193
00:12:54,900 --> 00:13:01,850
an encoding based on sets
as predicates principle.

194
00:13:02,100 --> 00:13:07,260
The idea for this encoding
is that we can encode

195
00:13:07,260 --> 00:13:11,670
sets as predicates
which are satisfied by

196
00:13:11,670 --> 00:13:16,940
elements of that set,
and those elements only.

197
00:13:17,080 --> 00:13:21,120
Set membership is encoded as

198
00:13:21,120 --> 00:13:26,525
a functional application in
higher order logic very simply.

199
00:13:26,525 --> 00:13:30,500
Here, you can find some examples

200
00:13:30,500 --> 00:13:33,575
very easy to understand
how this works.

201
00:13:33,575 --> 00:13:41,215
The interest of that encoding will
be that we don't need to declare

202
00:13:41,215 --> 00:13:45,245
many new operators and axioms in

203
00:13:45,245 --> 00:13:48,240
the resulting file because

204
00:13:48,240 --> 00:13:53,600
each sets can just be
represented by the lambda term.

205
00:13:53,650 --> 00:14:00,270
That is all. Okay. This
is the end of the talk.

206
00:14:00,270 --> 00:14:02,470
Thank you for your attention.

207
00:14:02,470 --> 00:14:04,170
If you have any question,

208
00:14:04,170 --> 00:14:07,030
I'm ready to answer them now.