Many hyperlinks are disabled.

Use anonymous login
to enable hyperlinks.

Overview

Comment: | Penultimate draft of the report. After spell checking and such, this will become the final draft. |
---|---|

Downloads: | Tarball | ZIP archive | SQL archive |

Timelines: | family | ancestors | descendants | both | trunk |

Files: | files | file ages | folders |

SHA1: | 98924328add2f2586fae104b095ae6bdbe806d8f |

User & Date: | andy 2015-04-18 19:48:55 |

Context

2015-04-18
| ||

21:14 | Final draft of report. (Unless I think of something else to add...) check-in: 04f1cd0696 user: andy tags: trunk | |

19:48 | Penultimate draft of the report. After spell checking and such, this will become the final draft. check-in: 98924328ad user: andy tags: trunk | |

18:47 | Added a TODO comment describing the problems around applying the substitution to the body of an expanded clause, instead of appending them as unification goals. check-in: b5c5523821 user: andy tags: trunk | |

Changes

Changes to report/arend-report.bib.

48 48 author={Page, Rex and Eastlund, Carl and Felleisen, Matthias}, 49 49 booktitle={Proceedings of the 2008 international workshop on Functional and declarative programming in education}, 50 50 pages={21--30}, 51 51 year={2008}, 52 52 organization={ACM} 53 53 } 54 54 55 + 56 +@inproceedings{Ford:2004:PEG:964001.964011, 57 + author = {Ford, Bryan}, 58 + title = {Parsing Expression Grammars: A Recognition-based Syntactic Foundation}, 59 + booktitle = {Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages}, 60 + series = {POPL '04}, 61 + year = {2004}, 62 + isbn = {1-58113-729-X}, 63 + location = {Venice, Italy}, 64 + pages = {111--122}, 65 + numpages = {12}, 66 + url = {http://doi.acm.org/10.1145/964001.964011}, 67 + doi = {10.1145/964001.964011}, 68 + acmid = {964011}, 69 + publisher = {ACM}, 70 + address = {New York, NY, USA}, 71 + keywords = {BNF, GTDPL, TDPL, context-free grammars, lexical analysis, packrat parsing, parsing expression grammars, regular expressions, scannerless parsing, syntactic predicates, unified grammars}, 72 +} 55 73 56 74 @inproceedings{reichelt2010types, 57 75 title={Treating Sets as Types in a Proof Assistant for Ordinary Mathematics}, 58 76 author={Reichelt, Sebastian}, 59 77 date={September 4, 2010}, 60 78 organization={Institute of Informatics, University of Warsaw, Warszawa, Poland}, 61 79 url={http://hlm.sourceforge.net/types.pdf},

Changes to report/arend-report.pdf.

cannot compute difference between binary files

Changes to report/arend-report.tex.

69 69 % All premises/conclusions in inference rules should be in math mode 70 70 \renewcommand{\predicate}[1]{$ #1 $} 71 71 72 72 % Setup fonts. 73 73 \setmainfont[Mapping=tex-text,Numbers=OldStyle]{Junicode} 74 74 \setsansfont[Mapping=tex-text,Numbers=OldStyle,Scale=MatchLowercase]{Gill Sans MT} 75 75 \setmonofont[Mapping=tex-text,Scale=MatchLowercase]{Fantasque Sans Mono} 76 + 77 +% The \lnum{} command displays numbers normally (not old style). The font 78 +% setting above sets them to old-style by default, and you cannot "turn off" 79 +% a font feature, only create a new font with the feature turned off. 80 +\newfontface\liningnum{Junicode} 81 +\newcommand{\lnum}[1]{{\liningnum #1}} 76 82 77 83 % Hyperlinks. We might want these inline, or in the sidebar. 78 84 \newcommand{\link}[2]{#2\footnote{\url{#1}}} 79 85 80 86 % Citations. Both in the text and in the sidebar. The NoHyper makes the actual 81 87 % citation link to the bibliography at the end, rather than to the bibentry 82 88 % in the margin. It also has the unfortunate side effect of making any URLs ................................................................................ 308 314 proof-trees, as the traditional paragraph-proof notation tends to obscure 309 315 the fact that a proof is composed of subproofs. 310 316 311 317 \item Because proofs are hierarchical, their construction is very similar to 312 318 that of other kinds of programming. A proof is iteratively broken down into 313 319 smaller parts, lemmas can be used, like functions, to abstract out 314 320 commonly-used subproofs, or to break down proofs that would otherwise be 315 - too large and unweildy to understand. 321 + too large and unwieldy to understand. 316 322 317 323 \item The lack of traditional negation eliminates some of the more-confusing 318 324 elements of traditional logic programming. In particular, Arend has nothing 319 325 corresponding to ``negation as failure''; it is not possible in Arend to 320 326 know, in a computational sense, when a proposition $P$ fails. Thus reasoning 321 327 in Arend is entirely about things that are known to be true, never about 322 328 things that are assumed to be false. ................................................................................ 385 391 \item \emph{Proof assistants} aim to aid in the development and checking of 386 392 formal proofs. While some systems only check proofs for completeness, more 387 393 modern systems will typically also aid the user in the development of a 388 394 proof. Recent examples 389 395 include Coq \mcitep{Coq:manual}, Twelf \mcitep{pfenning1999system}, and 390 396 Abella \mcitep{gacek08ijcar} Several element of Arend's design 391 397 were influenced by Abella, most notably the use of a constructive two-level 392 - logic for describing and reasoning about systems. 398 + logic for describing and reasoning about systems. For a full history of the 399 + development of computer-assisted proof systems, see 400 + \mcitet{geuvers2009proof}. 393 401 394 402 \item More recently, several systems have emerged which try to bridge the gap 395 403 between traditional functional programming and theorem proving. Systems 396 404 such as Agda \mcitep{norell:thesis}, Idris \mcitep{brady2011idris}, 397 405 and Beluga \mcitep{pientka2010beluga} are based on the 398 406 Curry-Howard isomorphism (see below) and thus represent propositions as 399 407 types, and proofs as functional programs. These systems offer varying ................................................................................ 449 457 support for visualizing proof structure, its consistency, and in the 450 458 degree of ``assistance'' it gives to the user. 451 459 452 460 \item \mcitet{suppes1981future} offers an interesting perspective, that of an 453 461 instructor of mathematics, not computer science, at Stanford University, 454 462 and furthermore, one who had, at the time of this publication, been using 455 463 interactive theorem proving in undergraduate coursework for almost twenty 456 - years. Suppes raises several points of concern which are still revelant, 457 - most notably the conflict between providing a system which is ``intelligent'' 464 + years. Suppes raises several points of concern which are still relevant, 465 + most notably the conflict between providing a system which is 466 + ``intelligent'' 458 467 and, hence, easy to use, and one which meets the pedagogical goals of 459 468 teaching \emph{introductory} material (the sort of material which a more 460 469 intelligent proof assistant would tend to elide). Thus, although Arend is 461 470 actually capable of automatically proving a reasonable large set of 462 471 propositions, during proof construction it intentionally refrains from 463 472 doing so, thus forcing the user to carry out --- and, hopefully, to 464 473 learn --- the elementary steps of proof construction. ................................................................................ 529 538 \inference[Assume-A] 530 539 {} 531 540 {A} 532 541 \] 533 542 However, this assumption only applies within the ``scope'' of the subproof of 534 543 $A -> B$. 535 544 536 -In order to express this, we introduce the \emph{hypothetical judgements} using 545 +In order to express this, we introduce the \emph{hypothetical judgments} using 537 546 the symbol $|-$. 538 -$A_1, A_2, \ldots A_n |- C$ should be read ``$C$ is true \emph{assuming} $A_1$, etc.'' We call the $A_i$ the \emph{antecedants} of the judgement, and 547 +$A_1, A_2, \ldots A_n |- C$ should be read ``$C$ is true \emph{assuming} $A_1$, etc.'' We call the $A_i$ the \emph{antecedents} of the judgment, and 539 548 $C$ the \emph{consequent}. (Although not strictly correct, we will sometimes 540 -refer to the antecedants and consequent of a \emph{rule}, since a rule can have 541 -at most one hypothetical judgement as its conclusion.) 549 +refer to the antecedents and consequent of a \emph{rule}, since a rule can have 550 +at most one hypothetical judgment as its conclusion.) 542 551 \footnote{Note that $|-$ is \emph{not} a logical connective; in particular, it 543 552 is not valid to say, e.g., $A \land (B |- C)$.} We use $\Gamma$ to signify any 544 553 collection of $P_i$. To enable the use of $|-$ in 545 554 our derivations we add the following axiom, known as the \emph{assumption rule}: 546 555 \[ 547 556 \inference[Assume] 548 557 {} ................................................................................ 782 791 derivations produced by queries are a restricted form of the derivations 783 792 constructed as proofs. Thus, queries against the specification can serve as an 784 793 introduction to the creation of simple proofs. 785 794 786 795 Since Arend's specification logic is a restricted form of the Horn-clause 787 796 classical logic 788 797 used in traditional Prolog, a resolution-style proof search procedure is 789 -sufficient to execute queries against a specification. The operational semantics 790 -of this procedure are presented, in an abbreviated form, in 791 -\hyperref[spec-execute]{Appendix~1}. 798 +sufficient to execute queries against a specification. 792 799 Note that in keeping with its intuitionistic foundation, Arend has no negation 793 800 operator. (Intuitionistically, $\neg P$ can be defined as $P \rightarrow \bot$, 794 801 but Arend's specification logic lacks the rightward $\rightarrow$ implication 795 802 operator. This operator \emph{is} present in the reasoning logic, which thus 796 803 requires a more sophisticated handling.) 804 +% spec-execute -- Mention operational semantics here. 797 805 798 806 \subsection{Reasoning logic} 799 807 800 808 \newthought{Arend's reasoning logic} is closer to a full 801 809 expression of first-order 802 810 intuitionistic logic, with extensions to support proofs by single-induction. 803 811 The inference rules defining the reasoning logic are 804 812 presented in Figure \ref{reasoning-rules}. It supports rightward implication 805 813 ($\rightarrow$), with the restriction that the premise cannot contain nested 806 814 implications, only conjunctions, disjunctions, and atomic 807 815 goals.\footnote{This restriction subsumes the \emph{stratification} 808 816 restriction in Abella; stratification in some form is required to ensure 809 -monotonicity of the logic.} This implies that antecedants cannot contain 817 +monotonicity of the logic.} This implies that antecedents cannot contain 810 818 implications, since they cannot be added by $\rightarrow_R$, and are not allowed 811 819 in definitions in the specification logic. 812 820 813 821 %%% ------------------- Reasoning logic rules -------------------- 814 822 \begin{figure}[!p] 815 823 816 824 % Assumption rule ................................................................................ 855 863 % Implication rule 856 864 \[ 857 865 \inference[${->}_R$] 858 866 {\Gamma,P |- Q} 859 867 {\Gamma |- P -> Q} 860 868 \qquad\qquad 861 869 \text{\small\parbox{5cm}{(There is no ${->}_L$ rule, because implications are 862 -not allowed in antecedants.)}} 870 +not allowed in antecedents.)}} 863 871 \] 864 872 865 873 \vspace{1em} 866 874 867 875 % Forall rules 868 876 \[ 869 877 \inference[$\forall_R$] ................................................................................ 944 952 derivation (because this query has no free variables, no substitution is 945 953 displayed). The rules of the specification (the $\mathbb{N}$ specification, 946 954 given in full in \hyperref[sec:nat-spec]{appendix II}) are displayed below 947 955 the input line. 948 956 949 957 \item The interactive \emph{proof assistant} allows the user to construct 950 958 proofs for given propositions, in the context of a particular 951 - specification. 959 + specification. Figure~\ref{fig:passist1} illustrates the proof assistant 960 + interface: the rules of the current specification are displayed in the 961 + pane on the left; since the current proof is inductive, the inductive 962 + hypothesis is included. (Lemmas, once proved, are also displayed as 963 + rules.) 964 + 965 + The right pane displays the current proof statement, and the proof, 966 + here, in progress. The user can double-click on any antecedent, or any goal, 967 + to perform the default elaboration for that element. For example, 968 + double-clicking an antecedent $P \land Q$ would apply the $\land$-L rule, 969 + producing a subproof with antecedents $P,Q$. For situations in which 970 + there is more than one possible action (e.g., the $\lor$-R rules, or when 971 + backchaining against the IH), keyboard interaction is used to select the 972 + appropriate action. 952 973 953 974 \end{itemize} 954 975 955 976 956 977 \begin{figure*}[ht!] 957 978 958 979 \begin{centering} ................................................................................ 960 981 \end{centering} 961 982 962 983 \caption{The browser-based run-eval-print-loop interface} 963 984 \label{fig:repl} 964 985 \end{figure*} 965 986 966 987 967 -\begin{figure*}[ht!] 988 +\begin{figure*}[t!] 968 989 969 990 \begin{centering} 970 -\includegraphics[width=6.5in]{proof-start-sample.png} 991 +\includegraphics[width=6.5in]{proof-sample.png} 971 992 \end{centering} 972 993 973 994 \caption{The proof assistant interface, with an incomplete inductive proof} 974 995 \label{fig:passist1} 975 996 \end{figure*} 976 997 977 998 978 999 \clearpage 979 1000 \section{Implementation} 980 1001 981 -\newthought{Arend currently exists} in two semi-overlapping implementations: 1002 +\newthought{Arend is implemented as} a web-based system, with a server 1003 +component, written in Prolog and running in the 1004 +\link{http://swi-prolog.org}{SWI-Prolog} and a browser-based frontend. 1005 +Currently, the implementation of Arend consists of 1006 + 1007 +\begin{itemize} 1008 +\item \lnum{1,401} lines of Prolog 1009 +\item \lnum{6,198} lines of Javascript (of which \lnum{442} lines are test 1010 + code) 1011 +\item \lnum{493} lines of PEG grammar specification 1012 +\item \lnum{501} lines of HTML 1013 +\item \lnum{129} lines of CSS 1014 +\item \lnum{41} source code files in total 1015 +\end{itemize} 1016 + 1017 +The following libraries and applications are used in the development of Arend; 1018 +these will be described in detail in the following section. 982 1019 \begin{itemize} 983 -\item A \emph{client-based} implementation, written in JavaScript and running 984 -largely in the browser. (The core JavaScript proof-checker can also run 985 -offline, via a JavaScript engine such as \link{https://nodejs.org/}{node.js} or 986 -\link{https://iojs.org/en/index.html}{io.js}.) This implementation is currently 987 -incomplete. 988 -\item A \emph{server-based} implementation written in Prolog, running on top of 989 -the \link{http://swi-prolog.org}{SWI-Prolog} HTTP server. 1020 +\item \link{https://nodejs.org/}{Node.js} 1021 +\item \link{http://www.swi-prolog.org/}{SWI-Prolog} 1022 +\item \link{http://pengines.swi-prolog.org/docs/index.html}{Pengines} 1023 +\item \link{https://jquery.com/}{jQuery} --- General utilities for 1024 + Javascript in a browser environment 1025 +\item \link{https://lodash.com/}{Lodash} --- Collection utilities for Javascript 1026 +\item \link{http://pegjs.org/}{PEG.js} --- Parsing Expression Grammar parser 1027 + generator for Javascript. 1028 +\item \link{https://qunitjs.com/}{QUnit} --- Javascript test framework 1029 +\item \link{http://www.jscheck.org/}{JSCheck} --- Randomized testing 1030 + engine for Javascript. 990 1031 \end{itemize} 991 1032 992 -Both implementations share some front-end code, mostly related to the 993 -representation of terms as JavaScript objects, and to the rendering of rules and 994 -proofs as HTML. 995 - 996 -Details... 997 - 998 -\subsection{Client-based implementation (JavaScript)} 999 - 1000 -Lines of code (UI vs. checker), number of test cases, portability layer for 1001 -browser/offline compatibility, etc. 1002 - 1003 -Libraries used: 1033 +Arend's development is tracked using the \link{http://fossil-scm.org}{Fossil} 1034 +source control management system. As of \lnum{April 18, 2015,} the project 1035 +history consisted of \lnum{294} commits spanning eight months of development. 1036 +Arend, its source code and project history, can be found on the web at 1037 +\url{http://fossil.twicetwo.com/arend.pl}. 1038 + 1039 +\subsection{Automated testing} 1040 +Arend's Javascript code is run through a test suite consisting of \lnum{133} 1041 +automated tests, however, of these, \lnum{12} central tests are randomized, 1042 +running, by default, \lnum{20} randomly generated tests each. Thus, a total 1043 +of \lnum{361} individual tests are run. The test suite can be run in the 1044 +browser, or offline, via Node.js. Testing is handled via the 1045 +\link{https://qunitjs.com/}{QUnit} test framework; a small compatibility 1046 +layer was written to allow QUnit tests to run offline. Randomized testing is 1047 +provided by \link{http://www.jscheck.org/}{JSCheck} a ``port'' of the 1048 +Haskell QuickCheck framework to Javascript. We have extended JSCheck with 1049 +support for randomized generation of Arend data types: atoms, variables, 1050 +and ground and non-ground terms up to a limited depth. 1051 + 1052 +QUnit is designed for automated testing in a browser environment. Arend's 1053 +development, as far as is possible, tries to target both browser and offline 1054 +environments; this is true even for Javascript code. Thus, we wrote ``QNodeit'', 1055 +a small compatibility layer that allows the complete suite of QUnit tests 1056 +to run offline, via Node.js. All non-user-interface tests run successfully in 1057 +both the browser and offline environments. QNodeit also integrates JSCheck into 1058 +QUnit, allowing JSCheck's randomized tests to be used naturally within QUnit. 1059 + 1060 +\subsection{Client-side implementation} 1061 + 1062 +Although Arend's client-side code only serves to provide a user-interface to 1063 +the backend's proof-checking and manipulation engine, it contains a significant 1064 +amount of logic itself. Early in Arend's development we felt that the best 1065 +course would be to implement the entire proof checking engine in Javascript, 1066 +thus allowing proof-handling with no server at all. Although this did not 1067 +prove feasible, a significant amount of logic-handling code in Javascript was 1068 +written, and, so far from being a redundancy, this has proved to be an asset. 1069 +Arend's client is not ``dumb'', but in fact understands a great deal of the 1070 +structure of the proofs it is presenting. This allows for richer user-interface 1071 +possibilities, and more flexible coordination between front- and back-end. 1072 + 1073 +The \link{https://lodash.com/}{Lodash} Javascript library provides a set of 1074 +general utilities, 1075 +mostly aimed at manipulation of collections (arrays and objects) and enabling 1076 +a functional style of programming; Lodash is used extensively throughout 1077 +Arend. Lodash makes its facilities available 1078 +as methods of a global \texttt{\_} object. Thus, filtering out the odd elements 1079 +of an array in ``stock'' Lodash would take the form 1080 +\begin{verbatim} 1081 +_.filter([1,2,3,4], function(e) { return e % 2 == 0; }); 1082 +\end{verbatim} 1083 + 1084 +We have created a wrapper library around Lodash called ``Nodash'' which 1085 +integrates Lodash's utility methods into the global prototypes of the datatypes 1086 +which they operate on. Thus, for example, the \texttt{filter} method, which 1087 +can be applied to any ``collection'' -- array, object, or string --- would 1088 +be installed on the global \texttt{Array.prototype}, \texttt{Object.prototype}, 1089 +and \texttt{String.prototype} objects, so that it is accessible simply as 1090 +\begin{verbatim} 1091 +[1,2,3,4].filter(function(e) { return e % 2 == 0; }); 1092 +\end{verbatim} 1093 +\marginnote[-1in]{Historically, ``monkey-patching'' the global prototypes was 1094 +considered highly unsafe, as new properties would be ``enumerable'' and would 1095 +thus become visible in, for example, \texttt{for-in} loops over the properties 1096 +of \emph{any} object. However, all modern Javascript implementations support 1097 +the \texttt{defineProperty} method, which allows the creation of 1098 +\emph{non-enumerate} properties on objects which do not have this problem. 1099 +Nodash uses \texttt{defineProperty} to safely add Lodash's methods to the 1100 +global prototypes.} 1101 +Note that in Nodash the global \texttt{\_} object is still available, for 1102 +those methods which do not fit with any of the global prototypes. 1103 + 1104 +The client-side implementation includes a complete parser for the 1105 +specification language. This allows user input to be fully syntax-checked 1106 +before being sent to the server, and allows for faster feedback to the user 1107 +when syntax errors occur. The parser is written using 1108 +\link{http://pegjs.org/}{PEG.js}, a parsing expression 1109 + grammar (PEG) \mcitep{Ford:2004:PEG:964001.964011} parser-generator. The 1110 +specification grammar in PEG form consists of \lnum{28} non-terminals and 1111 +\lnum{53} terminal tokens. Of note is the fact that the grammar uses the 1112 +standard 1113 +Unicode character classes in its definition of ``identifier'' and ``operator''; 1114 +an operator, in particular, is defined as any sequence of characters from the 1115 +symbolic or punctuation classes\footnote{Classes Sc, Sm, Sk, So, Pc, Pd, and 1116 +Po.}. This allows for traditional mathematical operators such as $\in$ or 1117 +$\leq$ to be used directly in specifications. 1118 + 1119 +Other modules of note in the client-side implementation include: 1004 1120 \begin{itemize} 1005 -\item jQuery 1121 +\item \texttt{core} --- contains a small 1122 + amount of compatibility code that allows the other non-user-interface 1123 + modules to operate transparently in either the browser or Node. In 1124 + particular, it provides a browser implementation of the \texttt{require()} 1125 + function in Node, used to load modules. In the browser, these modules must 1126 + have already been loaded via standard \texttt{<script>} tags (i.e., dynamic 1127 + loading is not provided), but, once loaded, they are installed into a 1128 + global module repository; \texttt{require} then simply returns a reference 1129 + to the appropriate module. This allows Javascript code to transparently 1130 + access other modules, without knowing whether they are being dynamically 1131 + loaded within Node, or have already been loaded in the browser. 1006 1132 1007 -\item Lo-Dash 1133 +\item \texttt{terms} --- contains a complete representation of \emph{terms}, 1134 + the fundamental datatype of logic programming. The \texttt{terms} module 1135 + supports walking the structure of terms, converting Prolog-style 1136 + term-lists to Javascript Arrays and vice versa, rendering terms to either 1137 + specification or Prolog-compatible strings, and enumerating the variables 1138 + in a term. 1008 1139 1009 - No-dash, our Lo-dash wrapper 1140 +\item \texttt{unify} --- contains a fully-functional implementation of the 1141 + Robinson unification algorithm \mcitep{robinson1965machine}. Although 1142 + this module is exhaustively tested, it is currently minimally used. In the 1143 + future, we hope to build a pattern-matching utility library on it, to allow 1144 + for more natural examination and manipulation of term-structures on the 1145 + client-side. 1010 1146 1011 -\item jsCheck 1012 - 1013 -\item PEG.js 1014 - 1015 -\item qunit 1147 +\item \texttt{term\_render} --- contains the rendering engine for converting 1148 + terms, rules, and derivations to HTML structures. Note that derivation 1149 + rendering is extensible via a number of ``hook'' functions, which allow 1150 + client code to extend or replace the rendering of the various 1151 + proof components: antecedents, consequents, disjunctive consequents, 1152 + and consequents which could be the target of the inductive hypothesis. 1016 1153 \end{itemize} 1017 1154 1018 -Module structure: 1019 -\begin{itemize} 1020 -\item Browser/Node compatibiltiy layer 1021 -\item terms 1022 -\item unify 1023 -\item parser + spec grammar 1024 -\item term rendering 1025 -\end{itemize} 1155 +\subsection{Server-side implementation} 1026 1156 1157 +The core of Arend's proof-checking engine is written in Prolog, and is made 1158 +available to the client via a HTTP interface. SWI-Prolog provides both a 1159 +standard HTTP server module, as well as a specialized ``app engine'' module 1160 +called \textsc{Pengines}, both of which are used in Arend. HTML, CSS, and 1161 +Javascript files for the client-side interface are served via the standard 1162 +HTTP server, while two PEngine ``applications'', \texttt{repl} and 1163 +\texttt{passist} provide the interface to the proof-checker itself. 1027 1164 1028 -\subsection{Server-based implementation (SWI-Prolog)} 1029 - 1030 -Lines of code, for the HTTP server component and the checker component, 1031 -separately. Hopefully the LoC for the checker will be significantly less than 1032 -that for the JS implementation! If we get any tests running, number of test 1033 -cases. 1034 - 1035 -(Development details? Num. of files, SCM used, total hours, etc.) 1165 +PEngines provides a transparent interface between Javascript code running 1166 +in the browser, and Prolog code running on the server. With it, our 1167 +client-side code can directly execute queries against the exported 1168 +predicates of the two aforementioned applications. The results (success, 1169 +failure, and substitution) of those queries can then be enumerated. Terms 1170 +are encoded as JSON objects (our \texttt{terms} module can decode JSON 1171 +terms into its own types). 1036 1172 1037 1173 It should be noted that the Prolog implementation of Arend is mostly portable, 1038 1174 relying only on ISO-Prolog predicates, commonly-available libraries, and a few 1039 -SWI-Prolog-specific extensions (mostly dealing with limiting the depth of search 1040 -trees). 1175 +SWI-Prolog-specific extensions and modules (most notably the HTTP server 1176 +module). It would not be difficult to port the proof-checker itself to another 1177 +Prolog system. 1041 1178 1042 -It is interesting to directly compare the two implementations of unification 1043 -(one of the few modules which is semantically similar in both 1044 -implementations). Both systems implement the Robinson unification 1045 -algorithm \mcitep{robinson1965machine}. The Javascript implementation consists of 1046 -136 lines of non-comment code and is quite difficult to follow; the Prolog 1047 -implementation requires only 69 lines, 1048 -an almost 50\% reduction! This despite the fact that the Prolog implementation 1049 -does \emph{not} use the built-in unification to simplify the algorithm at all; 1050 -it would be largely the same if only traditional assignment were used. 1051 - 1052 -Our future goal is for these two implementations to be built off the same 1053 -underlying codebase, for example, by moving the client implementation towards 1054 -a full implementation of the Warren Abstract Machine \mcitep{warren1983abstract}. 1055 -This would allow the 1056 -Prolog code currently used in the server implementation to transparently run 1057 -in a JavaScript environment, whether the browser or one of the aforementioned 1058 -offline engines. 1059 - 1060 -Module structure: 1179 +The Prolog core proof-checker implementation consists of the following modules, 1180 +the functionality of which will be described in detail in the following 1181 +sections: 1061 1182 \begin{itemize} 1062 -\item explicit substitutions 1063 -\item specification execution 1064 -\item proof checking and elaboration 1065 -\item server, Pengines, etc. 1183 +\item \texttt{subst} --- provides support for working with \emph{explicit 1184 + substitutions}. As described below, Arend cannot use Prolog's own 1185 + substitution, as the substitutions that are applied to a proof object 1186 + may differ in subtrees; Prolog applies a substitution globally. Thus, we 1187 + re-implement both variables and substitutions for our own use. 1188 + 1189 +\item \texttt{program} --- provides support for expanding atomic goals in the 1190 + specification language, and for executing goals in the specification. 1191 + The \texttt{run/3} predicate produces proof objects compatible with those 1192 + produced by the full proof checker, but is an independent implementation. 1193 + 1194 +\item \texttt{checker} --- the heart of the proof checking system, provides 1195 + routines for dealing with proof objects, elaborating incomplete proofs, 1196 + and checking and generating proof objects corresponding to particular 1197 + statements. 1066 1198 \end{itemize} 1067 1199 1068 1200 \subsection{Proof representation} 1069 1201 1070 1202 Informally, the content of a derivation is relatively simple: a tree of 1071 1203 \textsc{-Left} and/or \textsc{-Right} rule applications, drawn from the rules 1072 1204 of the reasoning logic in figure \ref{reasoning-rules}. In practice, a more 1073 -detailed representation is required, one which, in particular, necessates the 1205 +detailed representation is required, one which, in particular, necessitates the 1074 1206 use of \emph{explicit substitution}, rather than the implicit substitution that 1075 1207 would result from naive use of (for example) the system unification in Prolog. 1076 1208 1077 -For derivations purely derived from the specifcation language, explicit 1209 +For derivations purely derived from the specification language, explicit 1078 1210 substitution is not required. This is because for any valid derivation of a 1079 1211 proposition expressed in the specification language, the substitution is 1080 1212 \emph{consistent} throughout the derivation tree; it is not possible for 1081 1213 different ``branches'' to have different substitutions. (Disjunction may, of 1082 1214 course, result in the generation of multiple distinct derivations, each with its 1083 1215 own unifying substitution, but within any single derivation there is always a 1084 1216 single consistent substitution.) 1085 1217 1086 -However, because the reasoning language posesses rightward implication, a new 1087 -element is introduced to derivations: the list of \emph{antecedants}. Case 1088 -analysis on a disjunction antecedant is superficially similar applying the 1218 +However, because the reasoning language possesses rightward implication, a new 1219 +element is introduced to derivations: the list of \emph{antecedents}. Case 1220 +analysis on a disjunction antecedent is superficially similar applying the 1089 1221 $\land$-R rule, with an important distinction: the branches produced each have 1090 1222 \emph{independent} bindings. Consider, for example, case analysis on 1091 1223 $\mathsf{nat}(X)$ in the course of a proof: 1092 1224 \[ 1093 1225 \inference[] 1094 1226 {X = z, \mathsf{nat}(z) |- \ldots & 1095 1227 X = s(X'), \mathsf{nat}(X') |- \ldots} ................................................................................ 1102 1234 inconsistent, and actually impossible to express, because Horn clauses forbid 1103 1235 such conjunctions as conclusions. Conversely, given the conclusion 1104 1236 $X = z \lor X = s(x')$ we have 1105 1237 the choice of which unification to use, and thus there is no inconsistency.} 1106 1238 1107 1239 Because of this, each branch of a derivation must maintain its own substitution, 1108 1240 applied to its goals as they are expanded. 1241 + 1242 +Proofs are represented as a term of the form 1243 +\begin{verbatim} 1244 +proof(Goal,Ctx,Subst,Proof) 1245 +\end{verbatim} 1246 +where \texttt{Goal} is the goal to be proved, \texttt{Ctx} is the list of 1247 +antecedents (initially empty for most goals), \texttt{Subst} is the 1248 +substitution that results if the goal is proved, and \texttt{Proof} is a 1249 +proof term for this particular goal. 1250 + 1251 +The proof terms vary, depending on the structure of the goal and the contents 1252 +of the context. Non-axiomatic proof terms include one or more 1253 +sub-proofs. The possible proof terms are 1254 +\begin{itemize} 1255 +\item \texttt{induction(N,Proof)} --- proves a $\forall$ inductively, on 1256 + the N-th element of the context. 1257 +\item \texttt{ih(Proof)} --- Proves an atomic goal by backchaining it against 1258 + the inductive hypothesis. 1259 +\item \texttt{generic(Proof)} --- Proves a $\forall$ generically (i.e., by 1260 + substituting a unique constant for the quantified variable). 1261 +\item \texttt{instan(V,Proof)} --- Proves a $\exists$ by giving a value for 1262 + the quantified variable. 1263 +\item \texttt{product([Proofs...])} --- Proves a conjunction by providing a 1264 + subproof for each conjunct. 1265 +\item \texttt{choice(N,Proof)} --- Proves the N-th branch of a disjunction. 1266 +\item \texttt{implies(Proof)} --- Proves an implication, by adding the 1267 + left-hand-side to the context as an assumption and then proving the 1268 + right-hand-side. 1269 +\item \texttt{ctxmem(N)} --- Implements the ``assumption rule''; i.e., proves 1270 + an atomic goal by unifying it with the N-th element of the context. 1271 +\item \texttt{backchain(Proof)} --- Proves an atomic goal by backchaining it; 1272 + i.e., by expanding it into its definition, the disjunction of its clauses. 1273 +\item \texttt{unify(A,B)} --- Proves a goal of the form \texttt{A = B} 1274 + (this has the side-effect of adding the substitution produced by 1275 + $A = B$ to the \texttt{Subst} for this proof subtree). 1276 +\item \texttt{case(T,N,Keep,[Proofs...])} --- Performs case-analysis on an 1277 + element of the context. The type \texttt{T} corresponds to the 1278 + various -Left rules of the reasoning logic. 1279 +\item \texttt{trivial} --- Proves $\top$. 1280 +\item \texttt{hole} --- Proves any goal, but represents an incomplete proof. 1281 +\end{itemize} 1109 1282 1110 1283 \subsection{Capture avoidance} 1111 1284 1112 1285 The use of explicit substitution means that the proof-checker must also contend 1113 1286 with another troublesome problem which Prolog conceals: avoiding \emph{accidental 1114 1287 capture} when expanding a call to a goal. Consider the goal $\mathsf{nat}(X)$. 1115 1288 We would expect this goal to succeed twice, first with the solution $X = z$. ................................................................................ 1139 1312 $X \mapsto Y$ to $\forall Y, P(X,Y)$; clearly, one of the $Y$'s will need to 1140 1313 be renamed. Since the $Y$ in the substitution is most likely bound somewhere 1141 1314 else in the derivation, we choose to rename the $Y$ bound within the $\forall$. 1142 1315 1143 1316 \subsection{Unification} 1144 1317 1145 1318 The Robinson unification algorithm is presented, in natural deduction style, 1146 - in figure 1147 -\ref{unification-rules}. Note that the same algorithm is used, with minor 1319 + in figure~\ref{unification-rules}. 1320 + Note that the same algorithm is used, with minor 1148 1321 modification, in both the Javascript and Prolog implementations (for a direct 1149 -comparison of these two implementations, see comments below). 1322 +comparison of these two implementations, see our comments below). 1150 1323 1151 1324 \begin{figure}[h!] 1152 1325 \[ 1153 1326 \inference[Wildcard] 1154 1327 {} 1155 1328 {\_ \equiv t \mid \varepsilon} 1156 1329 \] ................................................................................ 1184 1357 {} 1185 1358 {\mathtt{[]} \equiv \mathtt{[]} \mid \varepsilon} 1186 1359 \qquad\qquad 1187 1360 \inference[Cons] 1188 1361 {t \equiv s \mid \theta_1 & \hat{t} \theta_1 \equiv \hat{s} \theta_1 \mid \theta_2} 1189 1362 {(t:\hat{t}) \equiv (s:\hat{s}) \mid \theta_2} 1190 1363 \] 1364 +\caption{Rules for the Robinson unification algorithm} 1191 1365 \label{unification-rules} 1192 1366 \end{figure} 1193 1367 1194 1368 (This presentation is based in part on that in \mcitet{pfenning2006logic}.) 1369 + 1370 +It is interesting to directly compare the two implementations of unification 1371 +(one of the few modules which is semantically similar in both 1372 +implementations). Both systems implement the same algorithm. The Javascript 1373 +implementation consists of 1374 +136 lines of non-comment code and is quite difficult to follow; the Prolog 1375 +implementation requires only 69 lines, 1376 +an almost 50\% reduction! This despite the fact that the Prolog implementation 1377 +does \emph{not} use the built-in unification to simplify the algorithm at all; 1378 +it would be largely the same if only traditional assignment were used. 1195 1379 1196 1380 1197 1381 \subsection{Proof Checking} 1198 1382 1199 1383 The proof checker is implemented in approximately 450 lines of Prolog code. It 1200 1384 essentially implements the rules of the reasoning logic presented in figure 1201 1385 \ref{reasoning-rules}, by recursively checking the proof tree. That is, it 1202 1386 first checks the root node to ensure that the proof object is consistent with 1203 -the conclusion (premises and succedant) by ensuring that it has the correct 1387 +the conclusion (antecedents and consequent) by ensuring that it has the correct 1204 1388 number and type of subproofs. The subproofs are then recursively checked, 1205 1389 working through the tree until the axiomatic rules are reached (or, in the 1206 1390 case of an incomplete proof, a \texttt{hole} is found, indicating an unproved 1207 1391 subtree). 1208 1392 1209 1393 There are two complications to a straightforward top-down recursive 1210 1394 implementation: ................................................................................ 1253 1437 \subsection{Proof Elaboration} 1254 1438 1255 1439 The heart of the incremental proof checking algorithm is the predicate 1256 1440 \texttt{elaborate} (defined in file \texttt{checker.pl}). \texttt{elaborate} 1257 1441 works in conjunction with \texttt{check}, the proof checker itself. The purpose 1258 1442 of elaborate is to take an incomplete proof, together with a reference to one 1259 1443 of its leaves (i.e., to a $?$), and to ``elaborate'' it with respect to either 1260 -its succedant, or one of its premises. As described above, the form of the 1444 +its consequent, or one of its antecedents. As described above, the form of the 1261 1445 selected element dictates the forms of the subproofs. \texttt{elaborate} 1262 1446 ``fills in'' these subproofs, to a single level only (i.e., all the subproofs 1263 1447 it constructs are themselves $?$). 1264 1448 1265 1449 In a logic which did not include unification or the ability to call defined 1266 1450 predicates (atomic goals), we could elaborate any node of the tree, 1267 1451 independent of the rest of the tree. ................................................................................ 1454 1638 1455 1639 \begin{figure}[h!] 1456 1640 1457 1641 All left-rules instantiate the Proof to a term of the form 1458 1642 \[ 1459 1643 \text{case}(\text{Type},N,\text{Keep},[\text{Proofs}\ldots]) 1460 1644 \] 1461 -In the rules below we leave $N$ (the index in the list of antecedants of the 1645 +In the rules below we leave $N$ (the index in the list of antecedents of the 1462 1646 judgment to be cased upon) implicit and omit Keep (a Boolean flag indicating 1463 -whether the targeted antecedant should remain in the list, or be 1647 +whether the targeted antecedent should remain in the list, or be 1464 1648 removed), and write them as 1465 1649 \[ 1466 1650 \text{Type}(\text{Proofs}\ldots) 1467 1651 \] 1468 1652 1469 1653 \[ 1470 1654 \inference[False-Left] ................................................................................ 1553 1737 it has the form $P \land Q$, then the output proof will be elaborated to 1554 1738 \[ 1555 1739 \proof{P \land Q}{\Gamma}{\text{product}([ 1556 1740 \proof{P}{\Gamma}{\text{hole}}, 1557 1741 \proof{Q}{\Gamma}{\text{hole}} 1558 1742 ])} 1559 1743 \] 1560 -The subproofs of the product will have the correct succedants and premises, 1744 +The subproofs of the product will have the correct consequents and antecedents, 1561 1745 but their own proofs will be empty, ready for further elaboration. 1562 1746 1563 1747 1564 1748 \subsection{Inductive reasoning} 1565 1749 1566 1750 Arend's reasoning logic supports proofs of a universal quantification both 1567 1751 generically and by \emph{induction}. Arend's induction is technically on the 1568 1752 height of derivations, although from the perspective of the user it supports 1569 1753 full structural induction on terms; this subsumes the usual natural number 1570 1754 induction. 1571 1755 1572 1756 Arend supports only single-induction proofs (i.e., induction on 1573 -multiple premisses is not allowed) and supports only induction global to a 1757 +multiple antecedents is not allowed) and supports only induction global to a 1574 1758 proof (i.e., nested 1575 1759 inductions are not allowed, although they can be ``faked'' by using lemmas). 1576 1760 These restrictions imply that the induction hypothesis can be regarded as 1577 1761 being global to a proof, thus eliminating the need to restrict the scope 1578 1762 of difference induction hypotheses to different branches of the proof tree. 1579 1763 1580 1764 Internally, induction is implemented by \emph{goal tagging}. When an inductive 1581 -proof is declared, a particular goal in the antecedants is selected, by the 1765 +proof is declared, a particular goal in the antecedents is selected, by the 1582 1766 user, to be the target of the induction.\marginnote{For example, in a proof of 1583 1767 $\mathsf{nat}(X) |- \mathsf{add}(X,0,X)$ we would induct on $\mathsf{nat}(X)$.} 1584 1768 This goal must be a user goal; it cannot be a built-in operator such as 1585 1769 conjunction, disjunction, or unification. The functor of the goal is internally 1586 1770 flagged as being ``big'' (indicated as $\uparrow$) and the induction 1587 1771 hypothesis is defined in terms of the same goal, but flagged as 1588 -``small'' ($\downarrow$). For example, to prove that $\mathsf{nat}(X) -> \mathsf{add}(X,0,X)$ our proof would proceed by induction on $\mathsf{nat}(X)$, and 1772 +``small'' ($\downarrow$). For example, to prove that 1773 +$\mathsf{nat}(X) -> \mathsf{add}(X,0,X)$ our proof would proceed 1774 +by induction on $\mathsf{nat}(X)$, and 1589 1775 thus we would have the inductive hypothesis 1590 1776 $\mathsf{nat}^\downarrow(X) -> \mathsf{add}(X,0,X)$. 1591 1777 1592 1778 When a $\uparrow$ goal is expanded by case analysis or backchaining, 1593 1779 any sub-goals in its expansion 1594 1780 are flagged as $\downarrow$, indicating that they are ``smaller'' than the original 1595 1781 goal. The induction hypothesis can only be applied to goals which are ``small'', ................................................................................ 1673 1859 being a flaw in our system, this surprising result reflects the fact that one 1674 1860 computational interpretation of $\bot$ is that of a non-terminating computation. 1675 1861 In fact, this result is simply the computational analogue of the well-known 1676 1862 \emph{ex falso quodlibet} principle of classical logic. 1677 1863 1678 1864 \clearpage 1679 1865 \section{Future work} 1866 + 1867 +There are many areas in which Arend could be enhanced and extended. We 1868 +examine ten possibilities here, including one, equational reasoning, in some 1869 +depth. 1680 1870 1681 1871 \begin{itemize} 1682 1872 \item Arend's specification logic is simple enough that formalization of 1683 1873 its properties should not be difficult, nonetheless, for formal correctness 1684 1874 and consistency 1685 1875 work needs to be done to show that it is both \emph{sound} (everything it 1686 1876 proves true is true) and \emph{non-deterministically complete} (when it ................................................................................ 1758 1948 conclusion. Since one of the difficulties of the derivation-tree proof 1759 1949 format is its often-extravagant use of horizontal space, it should be 1760 1950 possible to automatically \emph{extract} a sub-proof as a lemma. This 1761 1951 could be done even when the lemma is not generally useful, but simply 1762 1952 as a way of ``naming'' a portion of the proof or commenting on its 1763 1953 purpose. 1764 1954 1765 -\item The implementaiton of Arend is primitively minimal, suitable only for 1955 +\item The implementation of Arend is primitively minimal, suitable only for 1766 1956 small-scale usage. Real-world usage would require integration with existing 1767 1957 grading systems and automatic checks of student-submitted proofs. On a higher 1768 1958 level we could easily imagine a set of features aimed at 1769 1959 creating an ecosystem around student-created specifications, proofs, and 1770 1960 lemmas, perhaps with collaboration features for multi-student use. 1771 1961 1772 1962 \item Although we believe Arend's derivation-tree presentation of proofs has ................................................................................ 1801 1991 \subsection{Equational reasoning} 1802 1992 1803 1993 One very large extension to the logic which we would like to investigate would 1804 1994 be the incorporation of \emph{equational reasoning}. Arend's current concept 1805 1995 of equality is based entirely on unification: two terms are equal if they can 1806 1996 be made equal by some substitution. Equational reasoning allows equality to 1807 1997 be defined via rules, for example \marginnote{For computational purposes equational rules are often regarded as 1808 -unidirectional \emph{rewite rules}: $a = b$ becomes $a \mapsto b$.} 1998 +unidirectional \emph{rewrite rules}: $a = b$ becomes $a \mapsto b$.} 1809 1999 1810 2000 \[ 1811 2001 \inference[$=$-Refl] 1812 2002 {} 1813 2003 {x = x} 1814 2004 \qquad 1815 2005 \inference[$=$-Symm] ................................................................................ 1830 2020 \inference[$\times$-dist-$+$] 1831 2021 {} 1832 2022 {a \times (b + c) = a \times b + a \times c} 1833 2023 \] 1834 2024 1835 2025 Equational reasoning is a particularly powerful mechanism for reasoning about 1836 2026 \emph{functional programs} and programming languages; we can state 1837 -equivalencies between expressions and 2027 +equivalences between expressions and 1838 2028 then prove that two programs, or two classes of programs, are equivalent under 1839 2029 those rules. Even more powerfully, if we wish to prove some property of an 1840 -entire language, it is sufficient to show that the equivalencies preserve that 2030 +entire language, it is sufficient to show that the equivalences preserve that 1841 2031 property. 1842 2032 1843 2033 Although the additional of equational reasoning to Arend would give the system 1844 2034 a marked increase in logical power, it would also correspondingly increase the 1845 2035 complexity of its implementation and presentation. Equivalence rules require 1846 2036 a different treatment of terms and variables from unification, and it is not 1847 2037 entirely clear how the two will interact, logically. Equivalences over programs ................................................................................ 1850 2040 logic variables, but can both contain, and be contained by, them. Although 1851 2041 logics exist which provide support for reasoning about binding structures 1852 2042 \mcitep{miller05tocl} their usage in proofs involves additional complexity that 1853 2043 may be at odd with our goal of use in \emph{introductory} curriculum. 1854 2044 1855 2045 \clearpage 1856 2046 \section{Appendix I: Syntax of the Specification Logic} 1857 -\label{spec-syntax} 2047 +\label{sec:spec-syntax} 1858 2048 1859 2049 \newcommand{\nterm}[1]{\mathop{\langle\text{#1}\rangle}} 1860 2050 \newcommand{\sterm}[1]{\mathop{\text{\emph{#1}}}} 1861 2051 \newcommand{\term}[1]{\mathop{\text{\texttt{``#1''}}}} 1862 2052 1863 2053 \newthought{The following is a simplified presentation} of the grammar of the 1864 2054 specification 1865 2055 language. In particular, precedence rules for infix operators have been 1866 2056 omitted, but are consistent with normal usage. Note that while the grammar 1867 2057 includes rules defining infix arithmetic and comparison operators, these 1868 2058 operators have no \emph{semantic} significance within the specification logic. 1869 2059 They are present under the assumption that specifications may want to use them, 1870 -and will expect them to have their normal precendence ordering. 2060 +and will expect them to have their normal precedence ordering. 1871 2061 1872 2062 \begin{align*} 1873 2063 \nterm{start} &<- \nterm{definitions}? \\ 1874 2064 \\ 1875 2065 \nterm{definitions} &<- \nterm{definition} \nterm{definitions}* \\ 1876 2066 \nterm{definition} &<- \sterm{Rulename}? \nterm{predicate} \\ 1877 2067 &\mid \nterm{infix-decl} \\ ................................................................................ 1900 2090 &\mid \term{(} \nterm{term} \term{)} \\ 1901 2091 &\mid \sterm{Atom} \\ 1902 2092 &\mid \sterm{Variable} \\ 1903 2093 \\ 1904 2094 \nterm{compound} &<- \sterm{Atom} \term{(} \nterm{termlist} \term{)} \\ 1905 2095 \end{align*} 1906 2096 1907 -\subsection{Operational Semantics}\label{spec-execute} 2097 +%\subsection{Operational Semantics}\label{spec-execute} 1908 2098 1909 -This may or may not get written. If it does, it will be based on the 1910 -stack-based semantics of \mcitet{pfenning2006logic}. 2099 +%This may or may not get written. If it does, it will be based on the 2100 +%stack-based semantics of \mcitet{pfenning2006logic}. 1911 2101 1912 2102 \clearpage 1913 2103 \section[Appendix II: Specification for N, lists]{Appendix II: Specification for $\mathbb{N}$, lists} 1914 2104 \label{sec:nat-spec} 1915 2105 1916 2106 \newthought{The following is an example specification} included with Arend, 1917 2107 one which