\documentclass{llncs}
\usepackage{url,amsmath,amssymb}
\usepackage{color}
\usepackage[utf8]{inputenc}
% Dissemination Plans and Important Dates:
% - Your tutorial shall be acompanied by a paper,
% which will be published as a chapter of a book
% in the "Logic and Foundations of Mathematics" series
% by College Publications.
% - Please send us a good, though not necessarily final,
% version of your paper ** BEFORE 1st OF JUNE **.
% This will give us (and College Publications) sufficient time
% to produce printed copies of a preliminary version of the book,
% to be distributed to registered participants.
% - Afterwards there will be plenty of time to improve
% the paper/chapter and incorporate feedback gained during the event.
% General remarks:
% - The target audience (for both the tutorials and
% the accompanying papers) consists of Ph.D. students,
% young post-docs and researchers from other logic-related communities.
% - Avoid obscurity and clarify concepts that might be
% unknown to people from other communities.
% - Strive for a self-contained paper, but be concise and
% cite papers where readers may find more information.
% - Do not hesitate to build bridges between your domain and
% other proof-related communities if you have some ideas to do so.
% These bridges should lead to interesting discussions during the workshop.
% - You are encouraged to compare what is done in your community
% with what is done in other communities.
% Try to understand and explain why your community does things differently.
% This could lead to interesting discussions too.
% - This template aims at ensuring a
% reasonably uniform style for all speakers.
% Nevertheless, feel free to deviate from the template if you need.
% We are aware that not all questions are applicable to all speakers.
% We hope the questions here will guide you
% in the production of your tutorial.
% If anything is unclear, if you have any questions, contact us!
% Thank you!
\newcommand{\Note}[1]{\textcolor{blue}{[#1]}}
\title{Proofs in Satisfiability Modulo Theories}
% Please select a general title that reflects
% the community you are going to represent in the event.
\author{
Clark Barrett \inst{1}
\and
Leonardo de Moura \inst{2}
\and
Pascal Fontaine \inst{3}
}
\authorrunning{C.~Barrett \and L.~de Moura \and P.~Fontaine}
\institute{
New York University\\
\email{[email protected]}
\and
Microsoft Research \\
\email{[email protected]}
\and
University of Lorraine and INRIA\\
\email{[email protected]}
}
\begin{document}
\maketitle
\section{Introduction}
\label{sec:intro}
Satisfiability Modulo Theories (SMT) solvers\footnote{We refer
to~\cite{Barrett14} for a survey on SMT.} check the satisfiability of
first-order formulas written in a language containing interpreted predicates
and functions. These interpreted symbols are defined either by first-order axioms
(e.g.\ the axioms of equality, or array axioms for operators {\tt read} and {\tt write},\dots) or by a
structure (e.g.\ the integer numbers equipped with constants, addition,
equality, and inequalities). Theories frequently implemented within SMT solvers
include the empty theory (a.k.a.\ the theory of uninterpreted symbols with
equality), linear arithmetic on integers and/or reals, bit-vectors, and the
theory of arrays. A very small example of an input formula for an SMT solver is
\begin{equation}\label{eq:example}
a \leq b \wedge b \leq a + x \wedge x = 0 \wedge
\big[ f(a) \neq f(b) \vee (q(a) \wedge \neg q(b + x)) \big].
\end{equation}
The above formula uses atoms over a language of equality, linear arithmetic,
and uninterpreted symbols ($q$ and $f$) within some Boolean combination. The
SMT-LIB language (currently in version 2.0~\cite{Barrett15}) is a
standard concrete input language for SMT solvers. Figure~\ref{fig:smtlib}
presents the above example formula in this format.
%CB: I will address this later
%The format currently does
%not provide any guidelines for proof output.
\begin{figure}
{\footnotesize
\begin{verbatim}
(set-logic QF_UFLRA)
(set-info :source | Example formula in SMT-LIB 2.0 |)
(set-info :smt-lib-version 2.0)
(declare-fun f (Real) Real)
(declare-fun q (Real) Bool)
(declare-fun a () Real)
(declare-fun b () Real)
(declare-fun x () Real)
(assert (and (<= a b) (<= b (+ a x)) (= x 0)
(or (not (= (f a) (f b))) (and (q a) (not (q (+ b x)))))))
(check-sat)
(exit)
\end{verbatim}
}
\caption{\label{fig:smtlib} Formula~\eqref{eq:example}, presented in the SMT-LIB 2.0 language}
\end{figure}
SMT solvers were originally designed as decision procedures for
decidable quantifier-free fragments, but many SMT solvers additionally tackle
quantifiers, and some contain decision procedures for certain decidable quantified fragments
(see e.g.~\cite{Ge2}). For these solvers, refutational completeness for first-order logic
(with equality but without further interpreted symbols) is an explicit goal.
Also, some SMT solvers now deal with theories that are undecidable even in the
quantifier-free case, for instance non-linear arithmetic on integers~\cite{Borralleras1}.
In some aspects, and also in their implementation, SMT solvers can be seen as
extensions of propositional satisfiability (SAT) solvers to more expressive
languages. They lift the efficiency of SAT solvers to richer logics:
state-of-the-art SMT solvers are able to deal with very large formulas,
containing thousands of atoms. Very schematically, an SMT solver abstracts its
input to propositional logic by replacing every atom with a fresh proposition,
e.g., for the above example,
\begin{displaymath}
p_{a \leq b} \wedge p_{b \leq a + x} \wedge p_{x = 0} \wedge
\big[ \neg p_{f(a) = f(b)} \vee (p_{q(a)} \wedge \neg p_{q(b + x)}) \big].
\end{displaymath}
The underlying SAT solver is used to provide Boolean models for this
abstraction, e.g.\
\begin{displaymath}
\{ p_{a \leq b}, p_{b \leq a + x}, p_{x = 0}, \neg p_{f(a) = f(b)} \}
\end{displaymath}
and the theory reasoner repeatedly refutes these models and refines the Boolean
abstraction by adding new clauses (in this case $\neg p_{a \leq b} \vee \neg
p_{b \leq a + x} \vee \neg p_{x = 0} \vee p_{f(a) = f(b)}$) until either the
theory reasoner agrees with the model found by the SAT solver, or the
propositional abstraction is refined to an unsatisfiable formula. As a
consequence, propositional reasoning and theory reasoning are quite well
distinguished. Naturally, the interaction between the theory reasoning and the
SAT reasoning is in practice much more subtle than the above naive description,
but even when advanced techniques (e.g.\ online decision procedures and theory propagation, see again~\cite{Barrett14}) are used,
propositional and theory reasoning are not very strongly mixed. SMT proofs are
thus also an interleaving of SAT proofs and theory reasoning proofs. The SAT
proofs are typically based on some form of resolution, whereas the theory reasoning
is used to justify theory lemmas, i.e.\ the new clauses added from the theory
reasoner.
The entire proof ends up having the shape of a Boolean resolution tree, each of
whose leaves is a clause. These clauses are either derived from the input
formula or are theory lemmas, each of which must be justified by
theory-specific reasoning.
The main challenge of proof production is to collect and retain enough
information to justify each deduction, without hurting efficiency too much.
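On the running example, this interleaving can be sketched informally (actual
proof formats differ between solvers). Resolving the theory lemma above against
the three unit clauses $p_{a \leq b}$, $p_{b \leq a + x}$, and $p_{x = 0}$
stemming from the input yields the unit clause
\begin{displaymath}
p_{f(a) = f(b)}.
\end{displaymath}
This forces the SAT solver into the second disjunct of the input, i.e.\
$p_{q(a)}$ and $\neg p_{q(b+x)}$, which a second theory lemma,
\begin{displaymath}
\neg p_{a \leq b} \vee \neg p_{b \leq a + x} \vee \neg p_{x = 0} \vee
\neg p_{q(a)} \vee p_{q(b + x)},
\end{displaymath}
rules out in turn; a few further resolution steps then derive the empty clause.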
%\Note{Clark}
% What are the tools that pioneered proof production in your area?
% What are the tools that currently produce proofs?
% How widespread is the feature of proof production
% among tools in your area?
% How was the historical evolution of the
% proof production feature in your area?
% If the tools in your area cannot produce proofs easily, explain why,
% and describe the current trends to alleviate this problem.
The paper is organized as follows. Section~\ref{sec:impl} discusses the
specifics of what is required to implement proof-production in SMT along with
several challenges. Next, Section~\ref{sec:history} gives a historical
overview of the work on producing proofs in SMT. This is followed by a
discussion of proof formats in Section~\ref{sec:format}. We then look at (in
Sections~\ref{sec:cvc},~\ref{sec:veriT}, and~\ref{sec:z3}) the particular
approaches taken by CVC4, veriT, and Z3. Section~\ref{sec:lean} includes a
discussion of the new \emph{Lean} prover, which attempts to bridge the gap
between SMT reasoning and proof assistants more directly by building a proof
assistant with efficient and sophisticated built-in SMT capabilities. Finally,
Section~\ref{sec:app} discusses current applications of SMT proofs, and then
Section~\ref{sec:concl} concludes.
\section{Implementing Proof-Production in SMT}
\label{sec:impl}
%\Note{Pascal}
Since the core of SMT solvers is mainly a SAT solver adapted for the needs of
SMT, the core of the proof is also a SAT proof. We refer to the other
article by Marijn Heule, in the same volume, for more details on SAT solving.
Let us just recall that SAT solvers work on a conjunctive normal form, and build
a (partial) assignment using Boolean propagations and decisions. If building
this assignment fails (i.e.\ it falsifies a clause) because of some decisions,
the algorithm analyses the conflict, learns a new clause that will serve as a
guide to avoid repeating the same wrong decisions again, and backtracks. Only
this learning phase is important for proofs, and it is easy to generate a
resolution trace corresponding to this conflict analysis and production of the
learned clause. Proof-producing SMT solvers rely on the underlying SAT solver
to produce this resolution proof.
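For reference, the resolution rule combining two clauses $C \vee p$ and
$D \vee \neg p$ is
\begin{displaymath}
\frac{C \vee p \qquad D \vee \neg p}{C \vee D}
\end{displaymath}
where $C$ and $D$ are clauses and $p$ is a propositional variable. Conflict
analysis corresponds to a chain of such steps: the falsified clause is resolved
successively against the clauses that propagated its literals, and the final
resolvent is the learned clause.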
In the context of SAT solving, a conflict is always due to the assignment
falsifying a clause. Another kind of conflict can occur in SMT: assume the SAT
solver assigned $p_{a=b}$ (i.e.\ the propositional abstraction of the atom
$a=b$) to true, and later $p_{f(a) = f(b)}$ to false. There may be no clause
conflicting with such an assignment, but it is obviously inconsistent on the
theory level. The theory reasoner embedded in the SMT solver checks assignments
for theory consistency. In the present case, it would report a conflict to the
underlying SAT engine and produce the clause $\neg p_{a=b} \vee p_{f(a) =
f(b)}$. This \emph{theory lemma} refines the propositional abstraction with some
knowledge from the theory. This knowledge can then be used like any other
clause. If theory reasoners provide detailed proofs for theory lemmas, the SMT
proof is just the interleaving of these theory proofs and the resolution proofs
generated by the SAT solver.
\subsection{Proofs of Theory Lemmas}
Simple examples of theory reasoners include decision procedures for
uninterpreted symbols and linear arithmetic.
The satisfiability of sets of ground literals with equalities and uninterpreted
symbols can be checked using congruence
closure~\cite{Nelson2,Nieuwenhuis3,Nieuwenhuis6}. Efficient algorithms are
non-trivial, but the idea of congruence-closure algorithms is simple. They
build a
partition of all relevant terms, according to the equalities, and check for
inconsistencies with negated equalities. For instance, if the relevant terms
are $a$, $b$, $f(a)$ and $f(b)$, the initial partition would be $\big\{\{a\},
\{b\}, \{f(a)\}, \{f(b)\}\big\}$. If an equality $a=b$ is asserted, the
algorithm would merge the classes for $a$ and $b$, because of this equality, and
would also merge the classes for $f(a)$ and $f(b)$ because it merged the
arguments of $f$ and because equality is a congruence. The resulting partition would be $\big\{\{a, b\}, \{f(a), f(b)\}\big\}$. If the algorithm is
aware that $f(a) \neq f(b)$ should hold, it detects a conflict as soon as $f(a)$
and $f(b)$ are put in the same congruence class. Producing proofs for congruence
closure is thus simply a matter of keeping a trace of which (two) terms are merged
and why (i.e.\ either because of a congruence, or an asserted equality). This
information can be stored in an acyclic undirected\footnote{For efficiency
reasons, modern congruence closure algorithms actually use a directed
graph~\cite{Nieuwenhuis6}.} graph, where terms are nodes, and edges represent
asserted equalities or congruences. If the equality of two terms can be deduced
by transitivity using asserted equalities and congruences, then there is a
(unique) path between the two terms in the graph. To prove a congruence
(e.g.\ $g(a,c) = g(b,d)$), it suffices to first prove the equality of the
corresponding arguments ($a=b$, $c=d$) and use an instance of the congruence
axiom schema, e.g.\ $(a=b \wedge c=d) \Rightarrow g(a,c) = g(b,d)$. Proof
steps (i.e.\ an equality being the consequence of transitivity, and congruence
instances) are again simply connected using Boolean resolution.
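The proof-forest idea can be illustrated with a small executable sketch
(hypothetical code, written for this paper; names and structure are not taken
from any actual solver): each merge records an explanation edge, and the
explanation of an equality is read off the paths to a common ancestor.

```python
# Hypothetical sketch of a proof-producing congruence-closure "proof forest".
class ProofForest:
    def __init__(self):
        self.parent = {}  # union-find structure: term -> representative chain
        self.edge = {}    # explanation edges: term -> (neighbour, reason)

    def find(self, t):
        self.parent.setdefault(t, t)
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def merge(self, s, t, reason):
        """Merge the classes of s and t, remembering why they are equal."""
        rs, rt = self.find(s), self.find(t)
        if rs == rt:
            return
        self._reroot(s)              # make s a root of its explanation tree
        self.edge[s] = (t, reason)   # then record the new explanation edge
        self.parent[rs] = rt

    def _reroot(self, s):
        # Reverse the explanation edges on the path from s to its tree root.
        path = []
        while s in self.edge:
            t, r = self.edge[s]
            path.append((s, t, r))
            s = t
        for a, b, r in reversed(path):
            del self.edge[a]
            self.edge[b] = (a, r)

    def explain(self, s, t):
        """Return the list of reasons justifying s = t (assumes a prior merge)."""
        def path(u):
            p = [u]
            while p[-1] in self.edge:
                p.append(self.edge[p[-1]][0])
            return p
        ps, pt = path(s), path(t)
        common = next(u for u in ps if u in set(pt))
        reasons = []
        for p in (ps, pt):
            for u in p[:p.index(common)]:
                reasons.append(self.edge[u][1])
        return reasons
```

For the partition example above, merging $a$ with $b$ (asserted) and then
$f(a)$ with $f(b)$ (congruence) lets \texttt{explain} recover exactly the two
kinds of reasons mentioned in the text.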
Another very important theory supported by many SMT solvers is linear
arithmetic on reals. Decision procedures for this theory are often based on
a specialized simplex algorithm~\cite{Dutertre1,KBD13}. The algorithm
maintains (a) a set of linear equalities, (b) upper and lower bounds on the
variables, and (c) an assignment. The bounds and the linear equalities come
directly from asserted literals; new variables are introduced so that bounds
only apply on variables and not on more complicated linear combinations.
Starting from an assignment satisfying all equalities, the procedure tries to
progressively satisfy all bounds on the variables. During the process, the
linear equalities are combined linearly to produce new sets of equalities,
always equivalent to the original set. If the right strategies are used, the
procedure is guaranteed to terminate either with an assignment satisfying all
equalities and bounds, or an assignment that causes a specific linear equality
to be in contradiction with the bounds on its variables. It then suffices to
collect the constraints corresponding to the equality and the bounds to derive
a contradiction. In fact, the coefficients of the
variables in the linear equality correspond to the coefficients of the linear
combination related to Farkas' lemma (roughly, this lemma states that a set of
linear inequalities is inconsistent if and only if there is a linear
combination of those inequalities that reduces to $1 \leq 0$; see e.g.~\cite{Bockmayr1}).
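A small worked instance of such a certificate (chosen here purely for
illustration): the set $\{x \leq 0,\; y \leq 0,\; 1 \leq x + y\}$ is
inconsistent, and the Farkas coefficients $(1, 1, 1)$ witness this, since
\begin{displaymath}
1 \cdot (x \leq 0) \;+\; 1 \cdot (y \leq 0) \;+\; 1 \cdot (-x - y \leq -1)
\quad\text{yields}\quad 0 \leq -1,
\end{displaymath}
i.e.\ $1 \leq 0$. Checking the certificate amounts to summing the inequalities
with the given coefficients, which is far simpler than re-running the simplex
search.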
The above procedure only applies to reals. Various techniques are used to adapt
it for mixed real and integer linear arithmetic. One is branching: if, in the
real algorithm, an integer variable $x$ has a rational value (e.g.\ 1.5), the
procedure may issue a new lemma stating $x\leq 1 \vee x \geq 2$. Notice that
this lemma is a tautology of arithmetic. Another technique is to introduce
cuts; cuts basically remove from the set of real-feasible solutions a subset
with no integer solution. Very schematically, cuts are linear combinations of
already available constraints, strengthened using the fact that variables are
integer, e.g.\ $x+y \leq 1.5$ can be strengthened to $x + y \leq 1$ if $x$ and
$y$ are integer variables.
Other theory reasoning engines found in modern solvers include modules for
reasoning about arrays, inductive data types, bit-vectors, strings, and
non-linear arithmetic.
Their internals may vary a lot from one solver to another, and we will not
consider them in more detail here. For some non-linear arithmetic techniques
(see e.g.~\cite{Bockmayr1}) like cylindrical algebraic decomposition or
virtual substitution, it is not even clear how to produce useful certificates.
The theory reasoner may be a decision procedure for a single theory, but it is often a
combination of decision procedures for multiple theories. Combination
frameworks~\cite{Nelson3,Tinelli1} decide sets of
literals mixing interpreted symbols from several decidable languages (like the
aforementioned linear arithmetic and uninterpreted symbols) by combining the
decision procedures for the individual theories into one decision
procedure for the union of languages. Combining the decision procedures is possible
when the theories in the combination satisfy certain properties. A sufficient
condition, for instance, is that the theories are \emph{stably-infinite}\footnote{A theory
is stably-infinite if every satisfiable set of literals in the theory has an
infinite model.} and disjoint; this is notably the case for the combination
of linear arithmetic with uninterpreted symbols. Combining theories also
involves guessing which shared terms are equal and which are not, or,
equivalently, communicating disjunctions of equalities between decision
procedures. In order for a formula to be \emph{unsatisfiable}, each
possible equivalence relation over the shared terms must be ruled out. This
is typically done by producing many individual theory lemmas, which, taken
together, rule out all of the equivalence relations. Proofs for combinations of
theories are thus built from proofs of the theory lemmas in the individual
theories, and the component proofs are simply combined using
Boolean reasoning rules.
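On the running example, this works as follows (rendered informally at the
proof level). The arithmetic literals $a \leq b$, $b \leq a + x$, $x = 0$ and
the uninterpreted literal $f(a) \neq f(b)$ share the terms $a$ and $b$; the
arithmetic procedure can justify the equality $a = b$ over these shared terms,
and the congruence-closure procedure derives a contradiction from $a = b$ and
$f(a) \neq f(b)$. In the proof, this exchange appears as two theory lemmas,
\begin{displaymath}
\neg p_{a \leq b} \vee \neg p_{b \leq a + x} \vee \neg p_{x = 0} \vee p_{a = b}
\qquad\text{and}\qquad
\neg p_{a = b} \vee p_{f(a) = f(b)},
\end{displaymath}
combined with the input clauses by resolution.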
\subsection{Challenging Aspects of Proof Production}
Quantifier reasoning in SMT is typically done through
Skolemization when possible, and heuristic instantiation
otherwise~\cite{Ge1,Moura9}. Proofs involving instantiations are easily
obtained by augmenting ground proofs with applications of a simple
instantiation rule. Formulas like $\forall \mathbf{x}\,
\varphi(\mathbf{x}) \Rightarrow \varphi(\mathbf{t})$ --- where $\mathbf{x}$ is a
vector of variables and $\mathbf{t}$ is a vector of terms with the same length
--- are first-order tautologies and can be provided as lemmas for building proofs.
Skolemization, however, is not trivial. This is because Skolemization is
typically implemented by introducing a new uninterpreted constant or function,
but such a step
is not locally sound. It is only when looking at a global view
of the proof, noting that the conclusion does not involve any introduced
symbols, that soundness can be shown. Proofs involving Skolemization must
therefore track the introduced constants and ensure they are not used
in the conclusion of the proof. Furthermore, for efficiency, the solver may
resort to advanced techniques for Skolemization that are even less amenable to
proofs.
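As a minimal illustration of the issue: replacing $\exists x\, P(x)$ by $P(c)$
for a fresh constant $c$ does not preserve logical equivalence,
\begin{displaymath}
P(c) \Rightarrow \exists x\, P(x)
\qquad\text{but}\qquad
\exists x\, P(x) \not\Rightarrow P(c),
\end{displaymath}
since a model may satisfy $P$ at some element other than the interpretation of
$c$. The step is nevertheless sound for refutation: if $c$ occurs nowhere else
and the formula is unsatisfiable after the replacement, it was unsatisfiable
before.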
Another challenge is \emph{preprocessing}. SMT solvers typically include a
module for preprocessing formulas to simplify them before running the main
algorithm to search for a satisfying assignment. This module may perform
anything from simple rewrites (such as rewriting $x=y$ to $y=x$) to complex
global simplifications that significantly change the structure of the formula.
At the end of preprocessing, the formula is also typically converted into
conjunctive normal form (CNF). The CNF clauses form leaves in the ultimate
resolution proof. Thus, in order to justify one of these clauses, a proof must
capture each preprocessing step that was used to derive it from the original
formula. Such bookkeeping is tedious and error-prone. One strategy is to
disable complicated preprocessing techniques during proof-production. Still,
SMT solvers rely on some minimal preprocessing for correct functioning, so in
order to generate complete proofs, at least this preprocessing has to be
recorded as part of the proof.
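A minimal sketch of this bookkeeping for the final CNF-conversion step
(hypothetical code written for this paper; real solvers record much richer
preprocessing information): each generated clause is tagged with the step that
justifies it, so the leaves of the resolution proof can be traced back to the
input formula.

```python
# Hypothetical sketch: Tseitin-style CNF conversion with a proof log.
import itertools

class CNF:
    def __init__(self):
        self.var_of = {}                  # atom name -> variable number
        self.counter = itertools.count(1)
        self.clauses = []                 # clauses as tuples of int literals
        self.log = []                     # proof log: (clause, justification)

    def var(self, name):
        if name not in self.var_of:
            self.var_of[name] = next(self.counter)
        return self.var_of[name]

    def add(self, clause, why):
        self.clauses.append(tuple(clause))
        self.log.append((tuple(clause), why))

    def translate(self, node):
        """Return a literal equivalent to the formula node, emitting the
        Tseitin definition clauses for each connective along the way."""
        if isinstance(node, str):         # atom
            return self.var(node)
        if node[0] == 'not':
            return -self.translate(node[1])
        a, b = self.translate(node[1]), self.translate(node[2])
        p = next(self.counter)            # fresh definition variable
        if node[0] == 'and':              # p <-> (a and b)
            self.add([-p, a], 'def-and')
            self.add([-p, b], 'def-and')
            self.add([p, -a, -b], 'def-and')
        elif node[0] == 'or':             # p <-> (a or b)
            self.add([-p, a, b], 'def-or')
            self.add([p, -a], 'def-or')
            self.add([p, -b], 'def-or')
        return p

    def assert_formula(self, node):
        self.add([self.translate(node)], 'input')
```

Each clause appearing as a leaf of the resolution proof can then be justified
by replaying its logged step against the original formula.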
% Please list the (most common) proof systems
% (e.g. resolution, superposition, tableaux,
% sequent calculus, natural deduction, ...)
% underlying state-of-the-art tools of your kind.
% Please show the inference rules of
% (some of) these proof systems explicitly.
% Use, for example, proof.sty or bussproofs.sty.
% Why are these proof systems particularly useful for
% your automated deduction tools?
% Which features make them suitable?
% What is not so convenient in them?
% What are the trends w.r.t. proof systems for your kind of tool?
%\section{Proof Consumption}
% Does your kind of tool consume proofs from another kind of tool?
% (e.g. SMT-solvers using (proofs from) sat-solvers;
% higher-order ATPs and proof assistants using (proofs from)
% first-order ATPs; ...)
% If so, is there anything that could be improved in the proofs
% that are generated by this other kind of tool?
% If your tool consumes proofs from other tools,
% is it only a proof-checker or is
% it a wider tool also used to build proofs by its own?
% If it is a wider tool, do you also provide a lighter version of
% your tool that is dedicated to proof-checking?
% If not, would it be desirable?
% What are the trends?
%\subsection*{Challenges}
%\Note{Clark}
% Which algorithms are used to search for a proof/refutation?
% Is the procedure able to find (counter-)models as well?
% Which criteria (e.g. speed, sizes (or other measures) of proofs/models)
% are used to evaluate whether a procedure is better than another?
% What are the "bottlenecks" that prevent current procedures
% from solving more problems?
% Is there any mismatch between the abstract proof systems and
% the implemented proof search procedures?
% If so, how big is the gap between proof theory
% and automated deduction in your field?
% Would it be possible/desirable to develop proof systems
% that are closer to the actual implementations?
% Are there any particular decision procedures or
% proof search procedures that are not well-covered by proof theory yet?
% Do the proof search procedures guarantee that an optimal proof
% (for some sense of "optimal") will be found, if a proof exists?
% Are there proof search optimization techniques
% (e.g. resolution refinements, subsumption,
% in-processing) that succesfully improve theorem proving efficiency,
% but may be harmful for the generation of "good" (e.g. short, detailed, easy to check, ...) proofs?
% Which techniques are harmful? Which are harmless?
% If the generated proofs are not optimal,
% what kinds of redundancies may they contain?
% Are there methods to improve the proofs in a post-processing phase?
% Would it be possible/profitable to modify the existing
% proof search procedures so that they generate better proofs,
% that do not need to be post-processed afterwards?
% If the generated proofs are optimal but depend on the
% kind of proof-checker that is used to verify the produced proofs,
% explicit the dependency you rely on to generate optimal proofs.
% What are the trends w.r.t. proof search methods for your kind of tool?
\section{A Short History of Proofs in SMT}
\label{sec:history}
The Cooperating Validity Checker (CVC)~\cite{SBD02} was the first SMT solver to
tackle the problem of proof-production. CVC was built by Aaron Stump and Clark
Barrett, advised by David Dill, at Stanford University. CVC was designed to
improve upon and replace the Stanford Validity Checker (SVC)~\cite{BDL96} for
use in the group's verification applications and also to serve as a research
platform for studying SMT ideas and algorithms. CVC
uses a proof format based on the Edinburgh Logical Framework (LF)~\cite{HHP93}
and its proofs can be checked using the flea proof-checker~\cite{SBD02b,SD02}, which was
developed concurrently with CVC.
The motivation for producing proofs was primarily that it would provide a means
of independently certifying a correct result,
so that users would not have to rely on the correctness of a large and
frequently-changing code base. Another motivation was the hope that
proof-production could aid in finding and correcting nasty bugs in CVC.
In retrospect, however, the most important and lasting contribution of CVC's
proof infrastructure was that it enabled an efficient integration of a SAT
solver for Boolean reasoning. Initial attempts to use a SAT solver within
CVC suffered from poor performance because the theory-reasoning part of CVC
was being used as a black box, simply flagging a particular branch of the SAT
search as unsatisfiable. Without additional information (i.e. a conflict
clause specifying a small reason for the unsatisfiability), the SAT solver could
not benefit from clause learning or non-chronological back-jumping and frequently
failed to terminate, even on relatively easy problems. During a conversation
with Cormac Flanagan, the idea of using the proof infrastructure to compute
conflict clauses emerged, and this was the key to finally making the
integration with SAT efficient (see~\cite{BDS02-CAV02} for more details). The
technique was implemented in CVC, as well as in Flanagan's solver called
Verifun~\cite{FJO+03} and this laid the groundwork for the so-called DPLL(T)
architecture~\cite{NieOT-JACM-06} used in nearly every modern SMT solver.
The next important development in proof-producing SMT solvers was the
early exploration of using proofs to communicate with skeptical proof
assistants. The first work in this area~\cite{MBG06} was a translator designed
to import proofs from CVC Lite~\cite{BB04} (the successor to CVC) into HOL
Light~\cite{H96}, a proof assistant for higher-order logic with a small trusted
core set of rules. The goals of this work were two-fold: to provide access to
efficient decision procedures within HOL Light and to enable the use of HOL Light
as a proof-checker for CVC Lite. Shortly thereafter~\cite{FMM+06,HCF+07}, a similar effort was
made to integrate the haRVey SMT solver~\cite{DR03} with the Isabelle/HOL proof
assistant~\cite{NPW02}. These early efforts demonstrated the feasibility of such
integrations.
In 2008, an effort was made to leverage this work to certify that the
benchmarks in the SMT-LIB library were correctly labeled (benchmarks are
labeled with a \emph{status} that can be ``sat'', ``unsat'', or ``unknown'',
indicating the expected result when solving the benchmark). The certification
was done using CVC3~\cite{BT07} (the successor to CVC Lite) and
HOL Light~\cite{GB08}. Many of the benchmarks in the library were certified and,
additionally, a bug in the library was found: two benchmarks that had been
labeled satisfiable were certified as unsatisfiable.
The same work briefly reports an anecdote that further demonstrates the value
of proof-production. A latent bug in CVC3 was revealed during the 2007
SMT-COMP competition. Using HOL Light as a proof-checker, the cause of the bug
was quickly detected as a faulty proof rule. The bug in the proof rule had
persisted for years and would have been very difficult to detect without the
aid of the proof-checker.
The same year also marked the appearance of proof-production capabilities in
several additional SMT solvers. Fx7 was a solver written by Micha{\l} Moskal
which emphasized quantified reasoning and fast proof-checking~\cite{M08}.
MathSAT4 used an internal proof engine to enable the generation of
unsatisfiable cores and interpolants~\cite{BCF+08}. Z3 began supporting
proof-production as well~\cite{dMB08}, though its use of a single rule for
theory lemmas means that checking or reconstructing Z3 proofs requires the
external checker to have some automated reasoning capability (see
Section~\ref{sec:z3}). Finally, development on veriT, the successor to haRVey,
began in earnest~\cite{BdOD+09}, with proof-production as a primary goal (see Section~\ref{sec:veriT}).
Another important community development around this time was an attempt to
converge on a standard proof format for SMT-LIB. We discuss this effort in
Section~\ref{sec:format} below.
At the time of this writing, the state of proofs in SMT is in flux. A few
groups are making a serious effort to develop solvers (e.g. CVC4 and veriT)
capable of producing self-contained, independently-checkable proofs, though
these are in progress and no standard format has been agreed upon. Other
solvers (like Z3) produce a trace of deduction steps, but reconstructing a full
proof from these steps requires additional search in some cases. Still other
solvers (e.g.\ MathSAT, SMTInterpol) use proof technology primarily to drive
additional solver functionality, like computing explanations, interpolants, or
unsatisfiable cores. In this paper, we focus on tools that produce proofs with
the goal of having them checked or translated by an external tool.
\section{Proof Formats for SMT}
\label{sec:format}
The SMT-LIB initiative has produced a series of documents standardizing formats
for theories, logics, models, and benchmarks. An early goal of the initiative
was to produce a standard format for proofs as well. However, this proved to
be challenging, primarily because each solver implements its decision
procedures in a different way, meaning the proof rules for each solver could
vary significantly. Another challenge is that it is difficult to formally
express the wide variety of rules needed to cover the decision procedures used
in SMT solvers using existing proof languages (such as the Edinburgh Logical
Framework, LF~\cite{HHP93}).
Probably the most ambitious effort to develop a standard proof format was the
one led by Aaron Stump~\cite{ORS09,SO08} (at the time, a leader of the SMT-LIB initiative), who
developed a language called LFSC, modeled after LF, but able to express more
complicated rules by expressing side-conditions using pieces of trusted
code in a small custom programming language. LFSC allows a user to define a
set of arbitrary inference rules and then use these to construct a proof. LFSC
was used as the proof language for the prototype proof-producing SMT
solver CLSAT and was also integrated as an alternative proof-language in
CVC3~\cite{SOR+13}. LFSC is also the proof language used in CVC4 (see
Section~\ref{sec:cvc} for an example).
In 2011, the first Workshop on Proof Exchange for Theorem Proving (PxTP) was
held, collocated with the 23rd Conference on Automated Deduction (CADE) in
Wroc{\l}aw, Poland. The topic of proof formats was central, with several
papers on the subject. Besson, Fontaine, and Th\'ery proposed a proof format
for SMT based on the syntax of SMT-LIB~\cite{BFT11}. This later formed the basis for
the veriT proof format (see Section~\ref{sec:veriT}). B\"ohme and Weber made a
plea from a user's perspective for proof formats to be clear, well-documented,
and complete (i.e. not requiring any search to replay or check)~\cite{BW11}.
Unfortunately, none of the proposed formats have caught on as a general proof
language for SMT-LIB. One reason for this is that there are not nearly as many
consumers of SMT proofs as there are of SMT technology generally, so the proof
format has not been prioritized. Another is that there are differences of
opinion on what features should be included in such a standard. And another is
that the few tools that produce proofs have considerable momentum behind their
own formats and shifting to a new format would require a lot of work. However,
there are clear benefits to a standard format, especially for proof exchange
between SMT solvers and other tools, so we expect this will continue to be a
subject of debate and research.
\section{The CVC Family of SMT Solvers}
\label{sec:cvc}
The original CVC tool used a proof system based on LF. Each deductive step in
the tool had a parallel proof-production step. CVC had its own internal
Boolean engine which recorded proofs. Proofs were not available when using a
SAT solver for Boolean reasoning. In CVC Lite and CVC3, a different design
decision was made: a special \emph{Theorem} class was created to hold all
derived facts. Instances of the Theorem class could only be constructed using
special proof-rule functions. This allowed us to isolate all of the trusted code in
the system in a few proof-rule files. This approach, while elegant, turned out
not to be very efficient. In particular, many deductive steps were not needed
in the final proof, but their proofs were recorded anyway. More significantly,
there was no way to turn proof production off completely.
In CVC4, we opted for a new model in which proof production is done much more efficiently.
First, all proof code is protected under a compile-time flag. By
disabling this flag, CVC4 can be built with no proof code at all (for maximum
efficiency). When proofs are compiled in, they are generated lazily: only a
minimal amount of bookkeeping is done in the SAT solver and no bookkeeping at
all is done in the theory solvers.\footnote{This is not quite true for the
bit-vector theory solver, which includes a second internal SAT solver as part
of its machinery. The same minimal bookkeeping is done in this internal SAT
solver as is done in the main SAT solver.} If the user asks for a proof, the SAT solver
bookkeeping is leveraged to produce the resolution part of the proof. Proofs of the
needed theory lemmas used in the resolution proof are then
generated by \emph{re-running} the theory decision procedures in
proof-producing mode. In proof-producing mode, theory decision procedures log their
deductive steps in a special \emph{proof manager} object which then stitches
all of the pieces together into a full proof.
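This lazy scheme can be sketched as follows; the class and method names here are hypothetical illustrations of the idea, not CVC4's actual interfaces:

```python
# Hypothetical sketch of lazy proof production (not CVC4's actual code):
# the SAT solver records only which premises each learned clause came from;
# proofs of theory lemmas are derived on demand, standing in for re-running
# the theory decision procedures in proof-producing mode.

class ProofManager:
    def __init__(self, theory_prover):
        # theory_prover: callable mapping a theory lemma to its proof.
        self.theory_prover = theory_prover
        self.antecedents = {}  # learned clause -> premise clauses

    def record(self, clause, premises):
        # Minimal bookkeeping done during SAT solving.
        self.antecedents[clause] = premises

    def prove(self, clause):
        # Theory lemmas have no recorded antecedents: derive their
        # proofs only now, when a proof is actually requested.
        if clause not in self.antecedents:
            return ("theory", self.theory_prover(clause))
        # Learned clauses are justified by resolving their premises' proofs.
        return ("resolution", [self.prove(p) for p in self.antecedents[clause]])

pm = ProofManager(lambda lemma: f"proof_of({lemma})")
pm.record("C3", ["C1", "C2"])   # C3 was learned from theory lemmas C1 and C2
proof = pm.prove("C3")          # only now are C1's and C2's proofs produced
```

The point of the design is that, until `prove` is called, no theory-level proof work has been done at all.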
As mentioned above, the proof format used by CVC4 is LFSC. A full description
of LFSC is well beyond the scope of this paper, but a good introduction can be
found in~\cite{SOR+13}. LFSC is versatile and can be used to precisely
represent proofs generated in various contexts. This makes it particularly
suitable for SMT proofs, whose parts may come from very different reasoners.
The philosophy behind LFSC is quite similar to that of some other approaches,
e.g.~\cite{B13}. LFSC has an efficient stand-alone proof-checker, which can be used for any proof
system expressible in LFSC. The input to the proof checker is an LFSC
proof signature file (e.g.\ specifying the proof rules for a particular theory) and the proof, and the checker checks that the proof correctly
uses the rules in the signature. Figure~\ref{fig:lfsc} shows an LFSC proof for the example from
Section~\ref{sec:intro}.
The first section states what is being proved: given real numbers $a$, $b$, and
$x$, a function $f$, and a predicate $q$, together with four proofs, each justifying
one of the four conjuncts of Formula~\ref{eq:example}, the proof derives
the empty clause (represented by {\tt cln}). The next section assigns names to
the different formulas appearing in the proof (each formula gets both a
propositional name, starting with $v$, and a logical atom name, starting with
$a$). Next, rules are applied to convert Formula~\ref{eq:example} into
clauses in CNF, each of which is assigned a label starting with $C$. Next, a number of
theory derivations are given, again each generating a clause starting with
$C$. Finally, a propositional resolution tree is given ($R$ and $Q$ are
resolution rules) showing how to derive
the empty clause from the clauses introduced through CNF and through theory
lemmas. The proof shown depends on several additional proof files (not shown)
which define the rules for resolution, theory reasoning, CNF conversion, etc.
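The resolution part of such a proof is the easiest to check mechanically. The following sketch (hypothetical code, not the LFSC checker itself) shows the kind of validation an external checker performs for binary resolution steps:

```python
# Hypothetical sketch of checking a propositional resolution proof
# (not the actual LFSC checker). Literals are nonzero integers; a
# negative integer denotes the negation of the corresponding variable.

def resolve(c1, c2, pivot):
    """Binary resolution on `pivot`: c1 must contain pivot, c2 its negation."""
    assert pivot in c1 and -pivot in c2, "pivot must occur with opposite signs"
    return (c1 - {pivot}) | (c2 - {-pivot})

# Toy refutation of the clause set {v1}, {~v1, v2}, {~v2}:
c1, c2, c3 = {1}, {-1, 2}, {-2}
step1 = resolve(c1, c2, 1)      # resolves to {2}
empty = resolve(step1, c3, 2)   # the empty clause: the input set is unsat
```

A proof is accepted when every step is a legal resolution and the final step yields the empty clause.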
\begin{figure}
{\scriptsize
\begin{verbatim}
(check
(% a var_real
(% b var_real
(% x var_real
(% f (term (arrow Real Real))
(% q (term (arrow Real Bool))
(% @F1 (th_holds (<=_Real (a_var_real a) (a_var_real b)))
(% @F2 (th_holds (<=_Real (a_var_real b) (+_Real (a_var_real a) (a_var_real x))))
(% @F3 (th_holds (= Real (a_var_real x) (a_real 0/1)))
(% @F4 (th_holds (or (not (= Real (apply _ _ f (a_var_real a)) (apply _ _ f (a_var_real b))))
(and (= Bool (apply _ _ q (a_var_real a)) btrue)
(= Bool (apply _ _ q (+_Real (a_var_real b) (a_var_real x)))
bfalse))))
(: (holds cln)
(decl_atom (<=_Real (a_var_real a) (a_var_real b)) (\ v1 (\ a1
(decl_atom (<=_Real (a_var_real b) (+_Real (a_var_real a) (a_var_real x))) (\ v2 (\ a2
(decl_atom (= Real (a_var_real x) (a_real 0/1)) (\ v3 (\ a3
(decl_atom (= Real (a_var_real a) (a_var_real b)) (\ v4 (\ a4
(decl_atom (= Real (apply _ _ f (a_var_real a)) (apply _ _ f (a_var_real b))) (\ v5 (\ a5
(decl_atom (= Bool (apply _ _ q (a_var_real a)) btrue) (\ v6 (\ a6
(decl_atom (= Bool (apply _ _ q (+_Real (a_var_real b) (a_var_real x))) bfalse) (\ v7 (\ a7
(decl_atom (<=_Real (a_var_real b) (a_var_real a)) (\ v8 (\ a8
(decl_atom (= Real (a_var_real a) (+_Real (a_var_real b) (a_var_real x))) (\ v9 (\ a9
(decl_atom (and (= Bool (apply _ _ q (a_var_real a)) btrue)
(= Bool (apply _ _ q (+_Real (a_var_real b) (a_var_real x))) bfalse))
(\ v10 (\ a10
; CNFication
(satlem _ _ (asf _ _ _ a1 (\ l1 (clausify_false (contra _ @F1 l1)))) (\ C1
(satlem _ _ (asf _ _ _ a2 (\ l2 (clausify_false (contra _ @F2 l2)))) (\ C2
(satlem _ _ (asf _ _ _ a3 (\ l3 (clausify_false (contra _ @F3 l3)))) (\ C3
(satlem _ _ (ast _ _ _ a5 (\ l5 (asf _ _ _ a6 (\ l6 (clausify_false (contra _
(and_elim_1 _ _ (or_elim_1 _ _ (not_not_intro _ l5) @F4)) l6)))))) (\ C4
(satlem _ _ (ast _ _ _ a5 (\ l5 (asf _ _ _ a7 (\ l7 (clausify_false (contra _
(and_elim_2 _ _ (or_elim_1 _ _ (not_not_intro _ l5) @F4)) l7)))))) (\ C5
; Theory lemmas
; ~a4 ^ a1 ^ a8 => false
(satlem _ _ (asf _ _ _ a4 (\ l4 (ast _ _ _ a1 (\ l1 (ast _ _ _ a8
(\ l8 (clausify_false (contra _ l1 (or_elim_1 _ _ (not_not_intro _ (<=_to_>=_Real _ _ l8))
(not_=_to_>=_=<_Real _ _ l4)))))))))) (\ C6
; a2 ^ a3 ^ ~a8 => false
(satlem _ _ (ast _ _ _ a2 (\ l2 (ast _ _ _ a3 (\ l3 (asf _ _ _ a8 (\ l8 (clausify_false
(poly_norm_>= _ _ _ (<=_to_>=_Real _ _ l2) (pn_- _ _ _ _ _ (pn_+ _ _ _ _ _
(pn_var a) (pn_var x)) (pn_var b)) (\ pn2
(poly_norm_= _ _ _ (symm _ _ _ l3) (pn_- _ _ _ _ _ (pn_const 0/1) (pn_var x)) (\ pn3
(poly_norm_> _ _ _ (not_<=_to_>_Real _ _ l8) (pn_- _ _ _ _ _ (pn_var b) (pn_var a)) (\ pn8
(lra_contra_> _ (lra_add_>_>= _ _ _ pn8 (lra_add_=_>= _ _ _ pn3 pn2)))))))))))))))) (\ C7
; a4 ^ ~a5 => false
(satlem _ _ (ast _ _ _ a4 (\ l4 (asf _ _ _ a5 (\ l5 (clausify_false
(contra _ (cong _ _ _ _ _ _ (refl _ f) l4) l5)))))) (\ C8
; a3 ^ a4 ^ ~a9 => false
(satlem _ _ (ast _ _ _ a3 (\ l3 (ast _ _ _ a4 (\ l4 (asf _ _ _ a9 (\ l9 (clausify_false
(poly_norm_= _ _ _ (symm _ _ _ l3) (pn_- _ _ _ _ _ (pn_const 0/1) (pn_var x)) (\ pn3
(poly_norm_= _ _ _ l4 (pn_- _ _ _ _ _ (pn_var a) (pn_var b)) (\ pn4
(poly_norm_distinct _ _ _ l9 (pn_- _ _ _ _ _ (pn_+ _ _ _ _ _
(pn_var b) (pn_var x)) (pn_var a)) (\ pn9
(lra_contra_distinct _ (lra_add_=_distinct _ _ _
(lra_add_=_= _ _ _ pn3 pn4) pn9))))))))))))))) (\ C9
; a9 ^ a6 ^ a7 => false
(satlem _ _ (ast _ _ _ a9 (\ l9 (ast _ _ _ a6 (\ l6 (ast _ _ _ a7 (\ l7 (clausify_false
(contra _ (trans _ _ _ _ (trans _ _ _ _ (symm _ _ _ l6) (cong _ _ _ _ _ _
(refl _ q) l9)) l7) b_true_not_false)))))))) (\ C10
; Resolution proof
(satlem_simplify _ _ _ (R _ _ (Q _ _ (Q _ _ C6 C1 v1) (Q _ _ (Q _ _ C7 C2 v2) C3 v3) v8)
(Q _ _ (Q _ _ (Q _ _ (Q _ _ (R _ _ C9 C10 v9) C3 v3) C4 v6) C5 v7) C8 v5) v4)
(\ x x)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
\end{verbatim}
}
\caption{\label{fig:lfsc} An LFSC proof for example Formula \ref{eq:example}.}
\end{figure}
At the time of this writing, CVC4 supports proofs for
uninterpreted functions and arrays. Support for the theory of bit-vectors is
under active development, and support for arithmetic proofs (already present in
CVC3) is expected to follow soon thereafter.
\section{The veriT SMT Solver}
\label{sec:veriT}
The veriT SMT solver is developed jointly at Loria, Nancy (France) and UFRN,
Natal (Brazil). It is open-source, under the BSD license. veriT is first and
foremost a testing platform for techniques developed around SMT, but it is sufficiently
stable to be used by third parties. Proofs in veriT are mainly resolution
proofs interleaved with theory reasoning lemmas.
The proof trace language of veriT is inspired by the SMT-LIB 2.0 standard. A
sample proof for our running example~\ref{eq:example} is given in
Figure~\ref{fig:proofverit}. The language is quite coarse-grained and falls
into the ``proof trace'' category rather than that of full detailed proofs. It
provides, however, a full account of the resolution proof, and equality reasoning
is broken down into applications of the congruence and transitivity instances of
the axiom schemas for equality. Symmetry of equality is used silently throughout.
Special proof rules are assigned to theory lemmas (from arithmetic),
but veriT does not break arithmetic reasoning down into instances of, e.g., the
Presburger axioms. The example here does not feature quantifier reasoning;
proofs with quantifiers use quantifier instantiation rules.
\begin{figure}
{\scriptsize
\begin{verbatim}
(set .c1 (input :conclusion ((and (<= a b) (<= b (+ a x)) (= x 0)
(or (not (= (f b) (f a))) (and (q a) (not (q (+ b x)))))))))
(set .c2 (and :clauses (.c1) :conclusion ((<= a b))))
(set .c3 (and :clauses (.c1) :conclusion ((<= b (+ a x)))))
(set .c4 (and :clauses (.c1) :conclusion ((= x 0))))
(set .c5 (and :clauses (.c1) :conclusion
((or (not (= (f b) (f a))) (and (q a) (not (q (+ b x))))))))
(set .c6 (and_pos :conclusion ((not (and (q a) (not (q (+ b x))))) (q a))))
(set .c7 (and_pos :conclusion ((not (and (q a) (not (q (+ b x))))) (not (q (+ b x))))))
(set .c8 (or :clauses (.c5) :conclusion
((not (= (f b) (f a))) (and (q a) (not (q (+ b x)))))))
(set .c9 (eq_congruent :conclusion ((not (= a b)) (= (f b) (f a)))))
(set .c10 (la_disequality :conclusion ((or (= a b) (not (<= a b)) (not (<= b a))))))
(set .c11 (or :clauses (.c10) :conclusion ((= a b) (not (<= a b)) (not (<= b a)))))
(set .c12 (resolution :clauses (.c11 .c2) :conclusion ((= a b) (not (<= b a)))))
(set .c13 (la_generic :conclusion ((not (<= b (+ a x))) (<= b a) (not (= x 0)))))
(set .c14 (resolution :clauses (.c13 .c3 .c4) :conclusion ((<= b a))))
(set .c15 (resolution :clauses (.c12 .c14) :conclusion ((= a b))))
(set .c16 (resolution :clauses (.c9 .c15) :conclusion ((= (f b) (f a)))))
(set .c17 (resolution :clauses (.c8 .c16) :conclusion ((and (q a) (not (q (+ b x)))))))
(set .c18 (resolution :clauses (.c6 .c17) :conclusion ((q a))))
(set .c19 (resolution :clauses (.c7 .c17) :conclusion ((not (q (+ b x))))))
(set .c20 (eq_congruent_pred :conclusion ((not (= a (+ b x))) (not (q a)) (q (+ b x)))))
(set .c21 (resolution :clauses (.c20 .c18 .c19) :conclusion ((not (= a (+ b x))))))
(set .c22 (la_disequality :conclusion
((or (= a (+ b x)) (not (<= a (+ b x))) (not (<= (+ b x) a))))))
(set .c23 (or :clauses (.c22) :conclusion
((= a (+ b x)) (not (<= a (+ b x))) (not (<= (+ b x) a)))))
(set .c24 (resolution :clauses (.c23 .c21) :conclusion
((not (<= a (+ b x))) (not (<= (+ b x) a)))))
(set .c25 (eq_congruent_pred :conclusion
((not (= a b)) (not (= (+ a x) (+ b x))) (<= a (+ b x)) (not (<= b (+ a x))))))
(set .c26 (eq_congruent :conclusion ((not (= a b)) (not (= x x)) (= (+ a x) (+ b x)))))
(set .c27 (eq_reflexive :conclusion ((= x x))))
(set .c28 (resolution :clauses (.c26 .c27) :conclusion ((not (= a b)) (= (+ a x) (+ b x)))))
(set .c29 (resolution :clauses (.c25 .c28) :conclusion
((not (= a b)) (<= a (+ b x)) (not (<= b (+ a x))))))
(set .c30 (resolution :clauses (.c29 .c3 .c15) :conclusion ((<= a (+ b x)))))
(set .c31 (resolution :clauses (.c24 .c30) :conclusion ((not (<= (+ b x) a)))))
(set .c32 (la_generic :conclusion ((<= (+ b x) a) (not (= a b)) (not (= x 0)))))
(set .c33 (resolution :clauses (.c32 .c4 .c15 .c31) :conclusion ()))
\end{verbatim}
}
\caption{\label{fig:proofverit} The proof output by veriT for example Formula \ref{eq:example}. The output has been slightly edited to cut and indent long lines.}
\end{figure}
The first line gives the input. Clauses {\tt c2} to {\tt c8} explain clausification
of the input. Equality reasoning produces clauses labeled with {\tt
eq\_congruent(\_pred)} and {\tt eq\_reflexive}, whereas the linear arithmetic
module produces all the clauses labeled {\tt la\_disequality} and {\tt
la\_generic}. The rest of the proof consists mainly of resolution steps and
ends with the empty clause.
In the future, the proof format of veriT will be further improved to adhere
more closely to the SMT-LIB standard. In particular, when using shared terms
(for simplicity, the proof example given here does not use them), veriT labels
repeated formulas with a notation inspired by the custom input language of SVC,
which does not fit well with the spirit of the SMT-LIB language.
\section{The Z3 SMT Solver}
\label{sec:z3}
Z3 is an SMT solver developed at
Microsoft Research. Its source code is available online, and it is
free for non-commercial purposes. Z3 is used in various software
analysis and test-case generation projects at Microsoft Research and
elsewhere. Proof generation is based on two simple ideas: (1) a
notion of implicit quotation that avoids introducing auxiliary variables
(this simplifies the creation of proof objects considerably); and (2)
natural-deduction-style proofs to facilitate modular proof reconstruction.
%\subsection{Proof Format and Production}
In Z3, proof objects are represented as terms, so a proof tree is just a term
in which each inference rule is represented by a function symbol.
For example, consider the proof rule $mp(p, q, \varphi)$, where $p$ is a proof of
$\psi \rightarrow \varphi$ and $q$ is a proof of $\psi$. Each proof rule has a
\emph{consequent}; the consequent of $mp(p, q, \varphi)$ is $\varphi$.
A basic underlying principle for composing and building proofs in Z3 has been to support a modular
architecture that works well with theory solvers that receive literal assignments from other solvers and
produce contradictions or new literal assignments. The theory solvers should be able to produce independent
and opaque explanations for their decisions.
Conceptually, each solver acts upon a set of hypotheses and produces a consequent. The basic proof-rules
that support such an architecture can be summarized as: \emph{hypothesis},
which introduces assumptions, \emph{lemma}, which eliminates hypotheses, and \emph{unit resolution}, which handles basic propagation.
We say that a proof-term is \emph{closed} when every path that ends with a hypothesis contains an application
of the rule \emph{lemma}. A proof-term that is not closed is \emph{open}.
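The term representation and the closedness condition can be illustrated with a toy sketch (hypothetical code, not Z3's actual data structures): each proof node carries a rule name, its premise proofs, and its consequent.

```python
# Toy illustration of proofs-as-terms (not Z3's API): a proof node is a
# tuple (rule, premises, consequent).

def mk(rule, premises, consequent):
    return (rule, premises, consequent)

def consequent(proof):
    return proof[2]

def is_closed(proof):
    """Closed: every path ending in a hypothesis passes through a lemma node."""
    rule, premises, _ = proof
    if rule == "hypothesis":
        return False        # an undischarged assumption makes the term open
    if rule == "lemma":
        return True         # lemma discharges all hypotheses below it
    return all(is_closed(p) for p in premises)

h = mk("hypothesis", [], "psi")               # assume psi
p = mk("asserted", [], ("=>", "psi", "phi"))  # input fact: psi -> phi
m = mk("mp", [p, h], "phi")                   # modus ponens; still open
l = mk("lemma", [m], ("or", ("not", "psi"), "phi"))  # discharges h; closed
```

Here `m` is open because the hypothesis `h` has not been discharged, whereas wrapping it in `l` closes the term.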
The main propositional inference engine in Z3 is based on a DPLL(T) architecture.
The DPLL(T) proof search method lends itself naturally to producing
resolution-style proofs. Systems such as zChaff and a version of MiniSAT
produce proof logs by logging the
unit propagation steps as well as the conflict resolution steps. The resulting log suffices to produce a
propositional resolution proof.
The approach taken in Z3 bypasses logging and instead builds proof
objects during conflict resolution. A proof is attached to each clause;
clauses derived from the input have proofs built from the preceding
steps. This approach does not
require logging a resolution step for every unit propagation, but
delays the analysis of which unit propagation steps are useful until
conflict resolution. The approach also does not produce a resolution
proof directly; it produces a natural-deduction-style proof with
hypotheses.
The theory of equality can be captured by axioms for reflexivity,
symmetry, transitivity, and substitutivity of equality. In Z3,
these axioms are inference rules, and these rules apply to
any binary relation that is reflexive,
symmetric, transitive, and/or monotone.
In the DPLL(T) architecture, decision procedures for a theory $T$
identify sets of asserted $T$-inconsistent literals. Dually, the
disjunction of the negated literals is a $T$-tautology. Consequently,
proof terms created by theories can be summarized using a single form,
here called \emph{theory lemmas}. Some theory lemmas are annotated
with hints to help proof checkers and reconstruction. For example, the
theory of linear arithmetic produces theory lemmas based on Farkas'
lemma: if $p$ is a proof of $x > 0$ and $q$ is a
proof of $2x + 1 < 0$, then $\mbox{\emph{farkas}}(1, p, -1/2, q, \neg(x > 0) \vee \neg(2x + 1 < 0))$
is a theory lemma in which the coefficients $1$ and $-1/2$ are hints.
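These coefficients make checking the lemma a simple arithmetic exercise: a checker multiplies each premise by its coefficient (flipping the direction of the inequality when the coefficient is negative) and adds the results,
\[
1 \cdot (x > 0) \;\leadsto\; x > 0,
\qquad
-\tfrac{1}{2} \cdot (2x + 1 < 0) \;\leadsto\; -x - \tfrac{1}{2} > 0,
\]
and the sum of the two scaled inequalities gives $-\tfrac{1}{2} > 0$, which is
false. Hence $x > 0$ and $2x + 1 < 0$ cannot hold simultaneously, so the
disjunction of their negations is valid.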
The Z3 simplifier applies standard simplification rules for the
supported theories. For example, terms using the arithmetical operations,
whether for integer, real, or bit-vector arithmetic, are normalized into sums of monomials.
A single proof rule called \emph{rewrite} is used to record the simplification steps.
For example, $\mbox{\emph{rewrite}}(x + x = 2 x)$ is a proof for $x + x = 2x$.
Notice that Z3 does not axiomatize the legal rewrites.
Instead, to check the rewrite steps, one must rely on a
proof checker that is able to apply similar inferences for the set of built-in theories.
Thus, Z3 proofs are coarse-grained and also fall into the ``proof trace''
category, similarly to veriT.
Since Z3 proofs are terms, they can be traversed using the Z3 API.
Proofs can also be output in the SMT-LIB 2.0 language. A sample proof
for our running example~\ref{eq:example} is given in
Figure~\ref{fig:proofz3}.
\begin{figure}
{\scriptsize
\begin{verbatim}
(let (($x82 (q b)) (?x49 (* (- 1.0) b)) (?x50 (+ a ?x49))
($x51 (<= ?x50 0.0)) (?x35 (f b)) (?x34 (f a))
($x36 (= ?x34 ?x35)) ($x37 (not $x36))
($x43 (or $x37 (and (q a) (not (q (+ b x))))))
($x33 (= x 0.0)) (?x57 (+ a ?x49 x)) ($x56 (>= ?x57 0.0))
($x44 (and (<= a b) (<= b (+ a x)) $x33 $x43))
(@x60 (monotonicity (rewrite (= (<= a b) $x51))
(rewrite (= (<= b (+ a x)) $x56))
(= $x44 (and $x51 $x56 $x33 $x43))))
(@x61 (mp (asserted $x44) @x60 (and $x51 $x56 $x33 $x43)))
(@x62 (and-elim @x61 $x51)) ($x71 (>= ?x50 0.0)))
(let ((@x70 (trans (monotonicity (and-elim @x61 $x33) (= ?x57 (+ a ?x49 0.0)))
(rewrite (= (+ a ?x49 0.0) ?x50)) (= ?x57 ?x50))))
(let ((@x74 (mp (and-elim @x61 $x56) (monotonicity @x70 (= $x56 $x71)) $x71)))
(let ((@x121 (monotonicity (symm ((_ th-lemma arith eq-propagate 1 1) @x74 @x62 (= a b))
(= b a))
(= $x82 (q a)))))
(let (($x38 (q a)) ($x96 (or (not $x38) $x82)) ($x97 (not $x96)))
(let ((@x115 (monotonicity (symm ((_ th-lemma arith eq-propagate 1 1) @x74 @x62 (= a b))
(= b a))
(= ?x35 ?x34))))
(let (($x100 (or $x37 $x97)))
(let ((@x102 (monotonicity (rewrite (= (and $x38 (not $x82)) $x97))
(= (or $x37 (and $x38 (not $x82))) $x100))))
(let (($x85 (not $x82)))
(let (($x88 (and $x38 $x85)))
(let (($x91 (or $x37 $x88)))
(let ((@x81 (trans (monotonicity (and-elim @x61 $x33) (= (+ b x) (+ b 0.0)))
(rewrite (= (+ b 0.0) b)) (= (+ b x) b))))
(let ((@x87 (monotonicity (monotonicity @x81 (= (q (+ b x)) $x82)) (= (not (q (+ b x)))
$x85))))
(let ((@x93 (monotonicity (monotonicity @x87 (= (and $x38 (not (q (+ b x)))) $x88))
(= $x43 $x91))))
(let ((@x103 (mp (mp (and-elim @x61 $x43) @x93 $x91) @x102 $x100)))
(let ((@x119 (unit-resolution (def-axiom (or $x96 $x38))
(unit-resolution @x103 (symm @x115 $x36) $x97) $x38)))
(let ((@x118 (unit-resolution (def-axiom (or $x96 $x85))
(unit-resolution @x103 (symm @x115 $x36) $x97) $x85)))
(unit-resolution @x118 (mp @x119 (symm @x121 (= $x38 $x82)) $x82) false)))))))))))))))))
\end{verbatim}
}
\caption{\label{fig:proofz3} The proof output by Z3 for example Formula \ref{eq:example}. The output has been slightly edited to cut and indent long lines.}
\end{figure}
\section{The Lean Prover}
\label{sec:lean}
Lean is a new open-source theorem prover being developed by Leonardo
de Moura and Soonho Kong~\cite{lean}. Lean is not a standard SMT
solver; it can be used as an automatic prover like SMT solvers, but it
can also be used as a proof assistant. The Lean kernel is based on
dependent type theory, and is implemented in two layers. The first
layer contains the type checker and APIs for creating and
manipulating the terms, the declarations, and the environment. The first
layer has several configuration options. For example, developers may
instantiate the kernel with or without an impredicative Prop sort.
They may also select whether Prop is proof-irrelevant or not. The
first layer consists of 5k lines of C++ code. The second layer
provides additional components such as inductive families (500
additional lines of code). When the kernel is instantiated, one
selects which of these components should be used. The current
components are already sufficient for producing an implementation of
the Calculus of Inductive Constructions (CIC). The main difference is
that Lean does not have universe cumulativity; it instead provides
universe polymorphism. The Lean CIC-based kernel is treated as the standard
kernel. Another design goal is to support the new Homotopy Type System
(HTS) proposed by Vladimir Voevodsky; HTS will be implemented
as another kernel instantiation.
Lean is meant to be used as a standalone system and as a software
library. It provides an extensive API and can be easily embedded in
other systems. SMT solvers can use the Lean API to create proof terms
that can be independently checked. The API can also be used to export
Lean proofs to other systems based on CIC (e.g., Coq and Matita).
Having a more expressive language for encoding proofs provides several
advantages. First, we can easily add new ``proof rules'' without
modifying the proof checker (i.e., type checker). Proof rules such as
\emph{mp} and \emph{monotonicity} used in Z3 are just theorems in Lean.
When a new decision procedure (or any other form of
automation) is implemented, the developer must first prove the theorems
needed to justify the results produced by the automatic
procedure. For example, a developer implementing a procedure for
Presburger arithmetic will probably use a theorem such as:
{\scriptsize
\begin{verbatim}
theorem add_comm (n m:nat) : n + m = m + n
:=
induction_on m
(trans (add_zero_right _) (symm (add_zero_left _)))
(take k IH,
calc
n + succ k = succ (n+k) : add_succ_right _ _
... = succ (k + n) : {IH}
... = succ k + n : symm (add_succ_left _ _))
\end{verbatim}
}
Preprocessing steps such as Skolemization can be supported in
a similar way. Whenever a preprocessing procedure applies a
Skolemization step, it uses the following theorem to justify it.
{\scriptsize
\begin{verbatim}
theorem skolem_th {A : Type} {B : A -> Type} {P : forall x : A, B x -> Bool} :
(forall x, exists y, P x y) = (exists f, (forall x, P x (f x)))
:= iff_intro
(assume H : (forall x, exists y, P x y), @axiom_of_choice _ _ P H)
(assume H : (exists f, (forall x, P x (f x))),