Check-in [98924328ad]

Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Penultimate draft of the report. After spell checking and such, this will become the final draft.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1:98924328add2f2586fae104b095ae6bdbe806d8f
User & Date: andy 2015-04-18 19:48:55
Context
2015-04-18
21:14
Final draft of report. (Unless I think of something else to add...) check-in: 04f1cd0696 user: andy tags: trunk
19:48
Penultimate draft of the report. After spell checking and such, this will become the final draft. check-in: 98924328ad user: andy tags: trunk
18:47
Added a TODO comment describing the problems around applying the substitution to the body of an expanded clause, instead of appending them as unification goals. check-in: b5c5523821 user: andy tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to report/arend-report.bib.

48
49
50
51
52
53
54


















55
56
57
58
59
60
61
  author={Page, Rex and Eastlund, Carl and Felleisen, Matthias},
  booktitle={Proceedings of the 2008 international workshop on Functional and declarative programming in education},
  pages={21--30},
  year={2008},
  organization={ACM}
}




















@inproceedings{reichelt2010types,
  title={Treating Sets as Types in a Proof Assistant for Ordinary Mathematics},
  author={Reichelt, Sebastian},
  date={September 4, 2010},
  organization={Institute of Informatics, University of Warsaw, Warszawa, Poland},
  url={http://hlm.sourceforge.net/types.pdf},







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
  author={Page, Rex and Eastlund, Carl and Felleisen, Matthias},
  booktitle={Proceedings of the 2008 international workshop on Functional and declarative programming in education},
  pages={21--30},
  year={2008},
  organization={ACM}
}


@inproceedings{Ford:2004:PEG:964001.964011,
 author = {Ford, Bryan},
 title = {Parsing Expression Grammars: A Recognition-based Syntactic Foundation},
 booktitle = {Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages},
 series = {POPL '04},
 year = {2004},
 isbn = {1-58113-729-X},
 location = {Venice, Italy},
 pages = {111--122},
 numpages = {12},
 url = {http://doi.acm.org/10.1145/964001.964011},
 doi = {10.1145/964001.964011},
 acmid = {964011},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {BNF, GTDPL, TDPL, context-free grammars, lexical analysis, packrat parsing, parsing expression grammars, regular expressions, scannerless parsing, syntactic predicates, unified grammars},
} 

@inproceedings{reichelt2010types,
  title={Treating Sets as Types in a Proof Assistant for Ordinary Mathematics},
  author={Reichelt, Sebastian},
  date={September 4, 2010},
  organization={Institute of Informatics, University of Warsaw, Warszawa, Poland},
  url={http://hlm.sourceforge.net/types.pdf},

Changes to report/arend-report.pdf.

cannot compute difference between binary files

Changes to report/arend-report.tex.

69
70
71
72
73
74
75






76
77
78
79
80
81
82
...
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
...
385
386
387
388
389
390
391
392


393
394
395
396
397
398
399
...
449
450
451
452
453
454
455
456
457

458
459
460
461
462
463
464
...
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
...
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796

797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
...
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
...
944
945
946
947
948
949
950
951













952
953
954
955
956
957
958
...
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981




982
983
984












985
986
987
988
989










990
991
992
993
994





995
996












997








998
999
1000
1001










1002
1003







1004
1005


1006
1007



















1008
1009














1010
1011













1012
1013






1014
1015
1016






1017
1018
1019
1020
1021
1022
1023
1024






1025
1026
1027
1028
1029
1030
1031
1032
1033







1034
1035







1036
1037
1038
1039
1040

1041
1042
1043
1044
1045
1046
1047
1048
1049
1050

1051
1052
1053
1054
1055
1056
1057
1058
1059


1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
....
1102
1103
1104
1105
1106
1107
1108









































1109
1110
1111
1112
1113
1114
1115
....
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
....
1184
1185
1186
1187
1188
1189
1190

1191
1192
1193
1194










1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
....
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
....
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
....
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587

1588

1589
1590
1591
1592
1593
1594
1595
....
1673
1674
1675
1676
1677
1678
1679




1680
1681
1682
1683
1684
1685
1686
....
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
....
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
....
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
....
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
....
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
% All premises/conclusions in inference rules should be in math mode
\renewcommand{\predicate}[1]{$ #1 $}

% Setup fonts.
\setmainfont[Mapping=tex-text,Numbers=OldStyle]{Junicode}
\setsansfont[Mapping=tex-text,Numbers=OldStyle,Scale=MatchLowercase]{Gill Sans MT}
\setmonofont[Mapping=tex-text,Scale=MatchLowercase]{Fantasque Sans Mono}







% Hyperlinks. We might want these inline, or in the sidebar.
\newcommand{\link}[2]{#2\footnote{\url{#1}}}

% Citations. Both in the text and in the sidebar. The NoHyper makes the actual
% citation link to the bibliography at the end, rather than to the bibentry
% in the margin. It also has the unfortunate side effect of making any URLs
................................................................................
    proof-trees, as the traditional paragraph-proof notation tends to obscure
    the fact that a proof is composed of subproofs. 

\item Because proofs are hierarchical, their construction is very similar to 
    that of other kinds of programming. A proof is iteratively broken down into
    smaller parts, lemmas can be used, like functions, to abstract out 
    commonly-used subproofs, or to break down proofs that would otherwise be
    too large and unweildy to understand. 

\item The lack of traditional negation eliminates some of the more-confusing 
    elements of traditional logic programming. In particular, Arend has nothing
    corresponding to ``negation as failure''; it is not possible in Arend to 
    know, in a computational sense, when a proposition $P$ fails. Thus reasoning
    in Arend is entirely about things that are known to be true, never about 
    things that are assumed to be false.
................................................................................
\item \emph{Proof assistants} aim to aid in the development and checking of
    formal proofs. While some systems only check proofs for completeness, more
    modern systems will typically also aid the user in the development of a
    proof. Recent examples 
    include Coq \mcitep{Coq:manual}, Twelf \mcitep{pfenning1999system}, and 
    Abella \mcitep{gacek08ijcar} Several element of Arend's design
    were influenced by Abella, most notably the use of a constructive two-level
    logic for describing and reasoning about systems.



\item More recently, several systems have emerged which try to bridge the gap
    between traditional functional programming and theorem proving. Systems
    such as Agda \mcitep{norell:thesis}, Idris \mcitep{brady2011idris}, 
    and Beluga \mcitep{pientka2010beluga} are based on the
    Curry-Howard isomorphism (see below) and thus represent propositions as 
    types, and proofs as functional programs. These systems offer varying 
................................................................................
    support for visualizing proof structure, its consistency, and in the 
    degree of ``assistance'' it gives to the user.

\item \mcitet{suppes1981future} offers an interesting perspective, that of an
    instructor of mathematics, not computer science, at Stanford University,
    and furthermore, one who had, at the time of this publication, been using
    interactive theorem proving in undergraduate coursework for almost twenty
    years. Suppes raises several points of concern which are still revelant, 
    most notably the conflict between providing a system which is ``intelligent''

    and, hence, easy to use, and one which meets the pedagogical goals of 
    teaching \emph{introductory} material (the sort of material which a more
    intelligent proof assistant would tend to elide). Thus, although Arend is
    actually capable of automatically proving a reasonable large set of
    propositions, during proof construction it intentionally refrains from 
    doing so, thus forcing the user to carry out --- and, hopefully, to 
    learn --- the elementary steps of proof construction.
................................................................................
\inference[Assume-A]
    {}
    {A}
\]
However, this assumption only applies within the ``scope'' of the subproof of 
$A -> B$. 

In order to express this, we introduce the \emph{hypothetical judgements} using
the symbol $|-$. 
$A_1, A_2, \ldots A_n |- C$ should be read ``$C$ is true \emph{assuming} $A_1$, etc.'' We call the $A_i$ the \emph{antecedants} of the judgement, and 
$C$ the \emph{consequent}. (Although not strictly correct, we will sometimes
refer to the antecedants and consequent of a \emph{rule}, since a rule can have
at most one hypothetical judgement as its conclusion.)
\footnote{Note that $|-$ is \emph{not} a logical connective; in particular, it
is not valid to say, e.g., $A \land (B |- C)$.} We use $\Gamma$ to signify any
collection of $P_i$. To enable the use of $|-$ in
our derivations we add the following axiom, known as the \emph{assumption rule}:
\[
\inference[Assume]
    {}
................................................................................
derivations produced by queries are a restricted form of the derivations 
constructed as proofs. Thus, queries against the specification can serve as an
introduction to the creation of simple proofs. 

Since Arend's specification logic is a restricted form of the Horn-clause 
classical logic
used in traditional Prolog, a resolution-style proof search procedure is 
sufficient to execute queries against a specification. The operational semantics
of this procedure are presented, in an abbreviated form, in 
\hyperref[spec-execute]{Appendix~1}.
Note that in keeping with its intuitionistic foundation, Arend has no negation
operator. (Intuitionistically, $\neg P$ can be defined as $P \rightarrow \bot$,
but Arend's specification logic lacks the rightward $\rightarrow$ implication 
operator. This operator \emph{is} present in the reasoning logic, which thus
requires a more sophisticated handling.)


\subsection{Reasoning logic}

\newthought{Arend's reasoning logic} is closer to a full 
expression of first-order
intuitionistic logic, with extensions to support proofs by single-induction. 
The inference rules defining the reasoning logic are 
presented in Figure \ref{reasoning-rules}. It supports rightward implication 
($\rightarrow$), with the restriction that the premise cannot contain nested 
implications, only conjunctions, disjunctions, and atomic 
goals.\footnote{This restriction subsumes the \emph{stratification} 
restriction in Abella; stratification in some form is required to ensure 
monotonicity of the logic.} This implies that antecedants cannot contain
implications, since they cannot be added by $\rightarrow_R$, and are not allowed
in definitions in the specification logic.

%%% ------------------- Reasoning logic rules --------------------
\begin{figure}[!p]

% Assumption rule
................................................................................
% Implication rule
\[
\inference[${->}_R$]
    {\Gamma,P |- Q}
    {\Gamma |- P -> Q}
\qquad\qquad
\text{\small\parbox{5cm}{(There is no ${->}_L$ rule, because implications are 
not allowed in antecedants.)}}
\]

\vspace{1em}

% Forall rules
\[
\inference[$\forall_R$]
................................................................................
    derivation (because this query has no free variables, no substitution is
    displayed). The rules of the specification (the $\mathbb{N}$ specification,
    given in full in \hyperref[sec:nat-spec]{appendix II}) are displayed below
    the input line.

\item The interactive \emph{proof assistant} allows the user to construct
    proofs for given propositions, in the context of a particular 
    specification.














\end{itemize}


\begin{figure*}[ht!]

\begin{centering}
................................................................................
\end{centering}

\caption{The browser-based run-eval-print-loop interface}
\label{fig:repl}
\end{figure*}


\begin{figure*}[ht!]

\begin{centering}
\includegraphics[width=6.5in]{proof-start-sample.png}
\end{centering}

\caption{The proof assistant interface, with an incomplete inductive proof}
\label{fig:passist1}
\end{figure*}


\clearpage
\section{Implementation}

\newthought{Arend currently exists} in two semi-overlapping implementations:




\begin{itemize}
\item A \emph{client-based} implementation, written in JavaScript and running
largely in the browser. (The core JavaScript proof-checker can also run 












offline, via a JavaScript engine such as \link{https://nodejs.org/}{node.js} or
\link{https://iojs.org/en/index.html}{io.js}.) This implementation is currently
incomplete.
\item A \emph{server-based} implementation written in Prolog, running on top of
the \link{http://swi-prolog.org}{SWI-Prolog} HTTP server. 










\end{itemize}

Both implementations share some front-end code, mostly related to the 
representation of terms as JavaScript objects, and to the rendering of rules and
proofs as HTML.






Details...





















\subsection{Client-based implementation (JavaScript)}

Lines of code (UI vs. checker), number of test cases, portability layer for 
browser/offline compatibility, etc.











Libraries used:







\begin{itemize}
\item jQuery



\item Lo-Dash




















    No-dash, our Lo-dash wrapper















\item jsCheck














\item PEG.js







\item qunit
\end{itemize}







Module structure:
\begin{itemize}
\item Browser/Node compatibiltiy layer
\item terms
\item unify
\item parser + spec grammar
\item term rendering






\end{itemize}


\subsection{Server-based implementation (SWI-Prolog)}

Lines of code, for the HTTP server component and the checker component, 
separately. Hopefully the LoC for the checker will be significantly less than
that for the JS implementation! If we get any tests running, number of test 
cases.








(Development details? Num. of files, SCM used, total hours, etc.)








It should be noted that the Prolog implementation of Arend is mostly portable,
relying only on ISO-Prolog predicates, commonly-available libraries, and a few
SWI-Prolog-specific extensions (mostly dealing with limiting the depth of search
trees).


It is interesting to directly compare the two implementations of unification
(one of the few modules which is semantically similar in both
implementations). Both systems implement the Robinson unification 
algorithm \mcitep{robinson1965machine}. The Javascript implementation consists of
136 lines of non-comment code and is quite difficult to follow; the Prolog 
implementation requires only 69 lines,
an almost 50\% reduction! This despite the fact that the Prolog implementation
does \emph{not} use the built-in unification to simplify the algorithm at all;
it would be largely the same if only traditional assignment were used.


Our future goal is for these two implementations to be built off the same 
underlying codebase, for example, by moving the client implementation towards
a full implementation of the Warren Abstract Machine \mcitep{warren1983abstract}. 
This would allow the 
Prolog code currently used in the server implementation to transparently run
in a JavaScript environment, whether the browser or one of the aforementioned
offline engines.



Module structure:
\begin{itemize}
\item explicit substitutions
\item specification execution
\item proof checking and elaboration
\item server, Pengines, etc.
\end{itemize}

\subsection{Proof representation}

Informally, the content of a derivation is relatively simple: a tree of 
\textsc{-Left} and/or \textsc{-Right} rule applications, drawn from the rules
of the reasoning logic in figure \ref{reasoning-rules}. In practice, a more
detailed representation is required, one which, in particular, necessates the
use of \emph{explicit substitution}, rather than the implicit substitution that
would result from naive use of (for example) the system unification in Prolog.

For derivations purely derived from the specifcation language, explicit 
substitution is not required. This is because for any valid derivation of a
proposition expressed in the specification language, the substitution is 
\emph{consistent} throughout the derivation tree; it is not possible for 
different ``branches'' to have different substitutions. (Disjunction may, of
course, result in the generation of multiple distinct derivations, each with its
own unifying substitution, but within any single derivation there is always a
single consistent substitution.)

However, because the reasoning language posesses rightward implication, a new
element is introduced to derivations: the list of \emph{antecedants}. Case
analysis on a disjunction antecedant is superficially similar applying the
$\land$-R rule, with an important distinction: the branches produced each have
\emph{independent} bindings. Consider, for example, case analysis on 
$\mathsf{nat}(X)$ in the course of a proof:
\[
\inference[]
    {X = z,     \mathsf{nat}(z)  |- \ldots &
     X = s(X'), \mathsf{nat}(X') |- \ldots}
................................................................................
inconsistent, and actually impossible to express, because Horn clauses forbid
such conjunctions as conclusions. Conversely, given the conclusion 
$X = z \lor X = s(x')$ we have
the choice of which unification to use, and thus there is no inconsistency.}

Because of this, each branch of a derivation must maintain its own substitution,
applied to its goals as they are expanded.










































\subsection{Capture avoidance}

The use of explicit substitution means that the proof-checker must also contend
with another troublesome problem which Prolog conceals: avoiding \emph{accidental
capture} when expanding a call to a goal. Consider the goal $\mathsf{nat}(X)$.
We would expect this goal to succeed twice, first with the solution $X = z$. 
................................................................................
$X \mapsto Y$ to $\forall Y, P(X,Y)$; clearly, one of the $Y$'s will need to 
be renamed. Since the $Y$ in the substitution is most likely bound somewhere
else in the derivation, we choose to rename the $Y$ bound within the $\forall$.

\subsection{Unification}

The Robinson unification algorithm is presented, in natural deduction style,
 in figure 
\ref{unification-rules}. Note that the same algorithm is used, with minor 
modification, in both the Javascript and Prolog implementations (for a direct
comparison of these two implementations, see comments below).

\begin{figure}[h!]
\[
\inference[Wildcard]
    {}
    {\_ \equiv t \mid \varepsilon}
\]
................................................................................
    {}
    {\mathtt{[]} \equiv \mathtt{[]} \mid \varepsilon}
\qquad\qquad
\inference[Cons]
    {t \equiv s \mid \theta_1  &  \hat{t} \theta_1 \equiv \hat{s} \theta_1 \mid \theta_2}
    {(t:\hat{t}) \equiv (s:\hat{s}) \mid \theta_2}
\]

\label{unification-rules}
\end{figure}

(This presentation is based in part on that in \mcitet{pfenning2006logic}.)












\subsection{Proof Checking}

The proof checker is implemented in approximately 450 lines of Prolog code. It
essentially implements the rules of the reasoning logic presented in figure
\ref{reasoning-rules}, by recursively checking the proof tree. That is, it
first checks the root node to ensure that the proof object is consistent with
the conclusion (premises and succedant) by ensuring that it has the correct
number and type of subproofs. The subproofs are then recursively checked,
working through the tree until the axiomatic rules are reached (or, in the 
case of an incomplete proof, a \texttt{hole} is found, indicating an unproved
subtree). 

There are two complications to a straightforward top-down recursive 
implementation:
................................................................................
\subsection{Proof Elaboration}

The heart of the incremental proof checking algorithm is the predicate
\texttt{elaborate}  (defined in file \texttt{checker.pl}). \texttt{elaborate}
works in conjunction with \texttt{check}, the proof checker itself. The purpose
of elaborate is to take an incomplete proof, together with a reference to one
of its leaves (i.e., to a $?$), and to ``elaborate'' it with respect to either
its succedant, or one of its premises. As described above, the form of the
selected element dictates the forms of the subproofs. \texttt{elaborate} 
``fills in'' these subproofs, to a single level only (i.e., all the subproofs
it constructs are themselves $?$). 

In a logic which did not include unification or the ability to call defined
predicates (atomic goals), we could elaborate any node of the tree, 
independent of the rest of the tree.
................................................................................

\begin{figure}[h!]

All left-rules instantiate the Proof to a term of the form
\[
\text{case}(\text{Type},N,\text{Keep},[\text{Proofs}\ldots])
\]
In the rules below we leave $N$ (the index in the list of antecedants of the 
judgment to be cased upon) implicit and omit Keep (a Boolean flag indicating 
whether the targeted antecedant should remain in the list, or be 
removed), and write them as
\[
\text{Type}(\text{Proofs}\ldots)
\]

\[
\inference[False-Left]
................................................................................
it has the form $P \land Q$, then the output proof will be elaborated to
\[
\proof{P \land Q}{\Gamma}{\text{product}([
    \proof{P}{\Gamma}{\text{hole}},
    \proof{Q}{\Gamma}{\text{hole}}
])}
\]
The subproofs of the product will have the correct succedants and premises,
but their own proofs will be empty, ready for further elaboration.


\subsection{Inductive reasoning}

Arend's reasoning logic supports proofs of a universal quantification both
generically and by \emph{induction}. Arend's induction is technically on the
height of derivations, although from the perspective of the user it supports
full structural induction on terms; this subsumes the usual natural number
induction. 

Arend supports only single-induction proofs (i.e., induction on
multiple premisses is not allowed) and supports only induction global to a 
proof (i.e., nested 
inductions are not allowed, although they can be ``faked'' by using lemmas). 
These restrictions imply that the induction hypothesis can be regarded as
being global to a proof, thus eliminating the need to restrict the scope 
of difference induction hypotheses to different branches of the proof tree.

Internally, induction is implemented by \emph{goal tagging}. When an inductive
proof is declared, a particular goal in the antecedants is selected, by the 
user, to be the target of the induction.\marginnote{For example, in a proof of
$\mathsf{nat}(X) |- \mathsf{add}(X,0,X)$ we would induct on $\mathsf{nat}(X)$.}
This goal must be a user goal; it cannot be a built-in operator such as 
conjunction, disjunction, or unification. The functor of the goal is internally
flagged as being ``big'' (indicated as $\uparrow$) and the induction 
hypothesis is defined in terms of the same goal, but flagged as 

``small'' ($\downarrow$). For example, to prove that $\mathsf{nat}(X) -> \mathsf{add}(X,0,X)$ our proof would proceed by induction on $\mathsf{nat}(X)$, and

thus we would have the inductive hypothesis
$\mathsf{nat}^\downarrow(X) -> \mathsf{add}(X,0,X)$.

When a $\uparrow$ goal is expanded by case analysis or backchaining,
 any sub-goals in its expansion
are flagged as $\downarrow$, indicating that they are ``smaller'' than the original
goal. The induction hypothesis can only be applied to goals which are ``small'',
................................................................................
being a flaw in our system, this surprising result reflects the fact that one
computational interpretation of $\bot$ is that of a non-terminating computation.
In fact, this result is simply the computational analogue of the well-known 
\emph{ex falso quodlibet} principle of classical logic.

\clearpage
\section{Future work}





\begin{itemize}
\item Arend's specification logic is simple enough that formalization of
    its properties should not be difficult, nonetheless, for formal correctness
    and consistency
    work needs to be done to show that it is both \emph{sound} (everything it
    proves true is true) and \emph{non-deterministically complete} (when it 
................................................................................
    conclusion. Since one of the difficulties of the derivation-tree proof 
    format is its often-extravagant use of horizontal space, it should be
    possible to automatically \emph{extract} a sub-proof as a lemma. This 
    could be done even when the lemma is not generally useful, but simply
    as a way of ``naming'' a portion of the proof or commenting on its 
    purpose. 

\item The implementaiton of Arend is primitively minimal, suitable only for 
    small-scale usage. Real-world usage would require integration with existing
    grading systems and automatic checks of student-submitted proofs. On a higher
    level we could easily imagine a set of features aimed at 
    creating an ecosystem around student-created specifications, proofs, and 
    lemmas, perhaps with collaboration features for multi-student use.

\item Although we believe Arend's derivation-tree presentation of proofs has
................................................................................
\subsection{Equational reasoning}

One very large extension to the logic which we would like to investigate would
be the incorporation of \emph{equational reasoning}. Arend's current concept
of equality is based entirely on unification: two terms are equal if they can 
be made equal by some substitution. Equational reasoning allows equality to 
be defined via rules, for example \marginnote{For computational purposes equational rules are often regarded as 
unidirectional \emph{rewite rules}: $a = b$ becomes $a \mapsto b$.}

\[
\inference[$=$-Refl]
    {}
    {x = x}
\qquad
\inference[$=$-Symm]
................................................................................
\inference[$\times$-dist-$+$]
    {}
    {a \times (b + c) = a \times b + a \times c}
\]

Equational reasoning is a particularly powerful mechanism for reasoning about
\emph{functional programs} and programming languages; we can state 
equivalencies between expressions and
then prove that two programs, or two classes of programs, are equivalent under
those rules. Even more powerfully, if we wish to prove some property of an 
entire language, it is sufficient to show that the equivalencies preserve that
property.

Although the additional of equational reasoning to Arend would give the system
a marked increase in logical power, it would also correspondingly increase the
complexity of its implementation and presentation. Equivalence rules require
a different treatment of terms and variables from unification, and it is not
entirely clear how the two will interact, logically. Equivalences over programs
................................................................................
logic variables, but can both contain, and be contained by, them. Although 
logics exist which provide support for reasoning about binding structures
\mcitep{miller05tocl} their usage in proofs involves additional complexity that
may be at odd with our goal of use in \emph{introductory} curriculum.

\clearpage
\section{Appendix I: Syntax  of the Specification Logic}
\label{spec-syntax}

\newcommand{\nterm}[1]{\mathop{\langle\text{#1}\rangle}}
\newcommand{\sterm}[1]{\mathop{\text{\emph{#1}}}}
\newcommand{\term}[1]{\mathop{\text{\texttt{``#1''}}}}

\newthought{The following is a simplified presentation} of the grammar of the 
specification
language. In particular, precedence rules for infix operators have been 
omitted, but are consistent with normal usage. Note that while the grammar 
includes rules defining infix arithmetic and comparison operators, these 
operators have no \emph{semantic} significance within the specification logic.
They are present under the assumption that specifications may want to use them,
and will expect them to have their normal precendence ordering.

\begin{align*}
\nterm{start} &<- \nterm{definitions}? \\
\\
\nterm{definitions} &<- \nterm{definition} \nterm{definitions}* \\
\nterm{definition}  &<- \sterm{Rulename}? \nterm{predicate} \\
                    &\mid \nterm{infix-decl} \\
................................................................................
                &\mid \term{(} \nterm{term} \term{)} \\
                &\mid \sterm{Atom} \\
                &\mid \sterm{Variable} \\
\\
\nterm{compound} &<- \sterm{Atom} \term{(} \nterm{termlist} \term{)} \\
\end{align*}

\subsection{Operational Semantics}\label{spec-execute}

This may or may not get written. If it does, it will be based on the 
stack-based semantics of \mcitet{pfenning2006logic}.

\clearpage
\section[Appendix II: Specification for N, lists]{Appendix II: Specification for $\mathbb{N}$, lists}
\label{sec:nat-spec}

\newthought{The following is an example specification} included with Arend, 
one which 







>
>
>
>
>
>







 







|







 







|
>
>







 







|
|
>







 







|

|

|
|







 







|
<
<





>












|







 







|







 







|
>
>
>
>
>
>
>
>
>
>
>
>
>







 







|


|










|
>
>
>
>

<
<
>
>
>
>
>
>
>
>
>
>
>
>
|
<
<
<
<
>
>
>
>
>
>
>
>
>
>


<
<
<
>
>
>
>
>

<
>
>
>
>
>
>
>
>
>
>
>
>

>
>
>
>
>
>
>
>
|

<
<
>
>
>
>
>
>
>
>
>
>

<
>
>
>
>
>
>
>
|
<
>
>

<
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

<
>
>
>
>
>
>
>
>
>
>
>
>
>
>

<
>
>
>
>
>
>
>
>
>
>
>
>
>

<
>
>
>
>
>
>

<
<
>
>
>
>
>
>

<
<
<
<
<
<
<
>
>
>
>
>
>


<
|

<
<
<
<
>
>
>
>
>
>
>

<
>
>
>
>
>
>
>



|
|
>

<
|
|
|
|
|
|
|
|
>

<
|
|
|
|
<
<

>
>
|
|
<
<
<
<







|



|








|
|
|







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







|
|

|







 







>




>
>
>
>
>
>
>
>
>
>








|







 







|







 







|

|







 







|












|







|






>
|
>







 







>
>
>
>







 







|







 







|







 







|


|







 







|












|







 







|

|
|







69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
...
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
...
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
...
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
...
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
...
791
792
793
794
795
796
797
798


799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
...
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
...
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
...
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007


1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020




1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032



1033
1034
1035
1036
1037
1038

1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061


1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072

1073
1074
1075
1076
1077
1078
1079
1080

1081
1082
1083

1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103

1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118

1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132

1133
1134
1135
1136
1137
1138
1139


1140
1141
1142
1143
1144
1145
1146







1147
1148
1149
1150
1151
1152
1153
1154

1155
1156




1157
1158
1159
1160
1161
1162
1163
1164

1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178

1179
1180
1181
1182
1183
1184
1185
1186
1187
1188

1189
1190
1191
1192


1193
1194
1195
1196
1197




1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
....
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
....
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
....
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
....
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
....
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
....
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
....
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
....
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
....
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
....
2020
2021
2022
2023
2024
2025
2026
2027
2028
2029
2030
2031
2032
2033
2034
2035
2036
2037
....
2040
2041
2042
2043
2044
2045
2046
2047
2048
2049
2050
2051
2052
2053
2054
2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
....
2090
2091
2092
2093
2094
2095
2096
2097
2098
2099
2100
2101
2102
2103
2104
2105
2106
2107
% All premises/conclusions in inference rules should be in math mode
\renewcommand{\predicate}[1]{$ #1 $}

% Setup fonts.
\setmainfont[Mapping=tex-text,Numbers=OldStyle]{Junicode}
\setsansfont[Mapping=tex-text,Numbers=OldStyle,Scale=MatchLowercase]{Gill Sans MT}
\setmonofont[Mapping=tex-text,Scale=MatchLowercase]{Fantasque Sans Mono}

% The \lnum{} command displays numbers normally (not old style). The font
% setting above sets them to old-style by default, and you cannot "turn off"
% a font feature, only create a new font with the feature turned off. 
\newfontface\liningnum{Junicode}
\newcommand{\lnum}[1]{{\liningnum #1}}

% Hyperlinks. We might want these inline, or in the sidebar.
\newcommand{\link}[2]{#2\footnote{\url{#1}}}

% Citations. Both in the text and in the sidebar. The NoHyper makes the actual
% citation link to the bibliography at the end, rather than to the bibentry
% in the margin. It also has the unfortunate side effect of making any URLs
................................................................................
    proof-trees, as the traditional paragraph-proof notation tends to obscure
    the fact that a proof is composed of subproofs. 

\item Because proofs are hierarchical, their construction is very similar to 
    that of other kinds of programming. A proof is iteratively broken down into
    smaller parts, lemmas can be used, like functions, to abstract out 
    commonly-used subproofs, or to break down proofs that would otherwise be
    too large and unwieldy to understand. 

\item The lack of traditional negation eliminates some of the more-confusing 
    elements of traditional logic programming. In particular, Arend has nothing
    corresponding to ``negation as failure''; it is not possible in Arend to 
    know, in a computational sense, when a proposition $P$ fails. Thus reasoning
    in Arend is entirely about things that are known to be true, never about 
    things that are assumed to be false.
................................................................................
\item \emph{Proof assistants} aim to aid in the development and checking of
    formal proofs. While some systems only check proofs for completeness, more
    modern systems will typically also aid the user in the development of a
    proof. Recent examples 
    include Coq \mcitep{Coq:manual}, Twelf \mcitep{pfenning1999system}, and 
    Abella \mcitep{gacek08ijcar} Several element of Arend's design
    were influenced by Abella, most notably the use of a constructive two-level
    logic for describing and reasoning about systems. For a full history of the 
    development of computer-assisted proof systems, see 
    \mcitet{geuvers2009proof}.

\item More recently, several systems have emerged which try to bridge the gap
    between traditional functional programming and theorem proving. Systems
    such as Agda \mcitep{norell:thesis}, Idris \mcitep{brady2011idris}, 
    and Beluga \mcitep{pientka2010beluga} are based on the
    Curry-Howard isomorphism (see below) and thus represent propositions as 
    types, and proofs as functional programs. These systems offer varying 
................................................................................
    support for visualizing proof structure, its consistency, and in the 
    degree of ``assistance'' it gives to the user.

\item \mcitet{suppes1981future} offers an interesting perspective, that of an
    instructor of mathematics, not computer science, at Stanford University,
    and furthermore, one who had, at the time of this publication, been using
    interactive theorem proving in undergraduate coursework for almost twenty
    years. Suppes raises several points of concern which are still relevant, 
    most notably the conflict between providing a system which is 
    ``intelligent''
    and, hence, easy to use, and one which meets the pedagogical goals of 
    teaching \emph{introductory} material (the sort of material which a more
    intelligent proof assistant would tend to elide). Thus, although Arend is
    actually capable of automatically proving a reasonable large set of
    propositions, during proof construction it intentionally refrains from 
    doing so, thus forcing the user to carry out --- and, hopefully, to 
    learn --- the elementary steps of proof construction.
................................................................................
\inference[Assume-A]
    {}
    {A}
\]
However, this assumption only applies within the ``scope'' of the subproof of 
$A -> B$. 

In order to express this, we introduce the \emph{hypothetical judgments} using
the symbol $|-$. 
$A_1, A_2, \ldots A_n |- C$ should be read ``$C$ is true \emph{assuming} $A_1$, etc.'' We call the $A_i$ the \emph{antecedents} of the judgment, and 
$C$ the \emph{consequent}. (Although not strictly correct, we will sometimes
refer to the antecedents and consequent of a \emph{rule}, since a rule can have
at most one hypothetical judgment as its conclusion.)
\footnote{Note that $|-$ is \emph{not} a logical connective; in particular, it
is not valid to say, e.g., $A \land (B |- C)$.} We use $\Gamma$ to signify any
collection of $P_i$. To enable the use of $|-$ in
our derivations we add the following axiom, known as the \emph{assumption rule}:
\[
\inference[Assume]
    {}
................................................................................
derivations produced by queries are a restricted form of the derivations 
constructed as proofs. Thus, queries against the specification can serve as an
introduction to the creation of simple proofs. 

Since Arend's specification logic is a restricted form of the Horn-clause 
classical logic
used in traditional Prolog, a resolution-style proof search procedure is 
sufficient to execute queries against a specification. 


Note that in keeping with its intuitionistic foundation, Arend has no negation
operator. (Intuitionistically, $\neg P$ can be defined as $P \rightarrow \bot$,
but Arend's specification logic lacks the rightward $\rightarrow$ implication 
operator. This operator \emph{is} present in the reasoning logic, which thus
requires a more sophisticated handling.)
% spec-execute -- Mention operational semantics here.

\subsection{Reasoning logic}

\newthought{Arend's reasoning logic} is closer to a full 
expression of first-order
intuitionistic logic, with extensions to support proofs by single-induction. 
The inference rules defining the reasoning logic are 
presented in Figure \ref{reasoning-rules}. It supports rightward implication 
($\rightarrow$), with the restriction that the premise cannot contain nested 
implications, only conjunctions, disjunctions, and atomic 
goals.\footnote{This restriction subsumes the \emph{stratification} 
restriction in Abella; stratification in some form is required to ensure 
monotonicity of the logic.} This implies that antecedents cannot contain
implications, since they cannot be added by $\rightarrow_R$, and are not allowed
in definitions in the specification logic.

%%% ------------------- Reasoning logic rules --------------------
\begin{figure}[!p]

% Assumption rule
................................................................................
% Implication rule
\[
\inference[${->}_R$]
    {\Gamma,P |- Q}
    {\Gamma |- P -> Q}
\qquad\qquad
\text{\small\parbox{5cm}{(There is no ${->}_L$ rule, because implications are 
not allowed in antecedents.)}}
\]

\vspace{1em}

% Forall rules
\[
\inference[$\forall_R$]
................................................................................
    derivation (because this query has no free variables, no substitution is
    displayed). The rules of the specification (the $\mathbb{N}$ specification,
    given in full in \hyperref[sec:nat-spec]{appendix II}) are displayed below
    the input line.

\item The interactive \emph{proof assistant} allows the user to construct
    proofs for given propositions, in the context of a particular 
    specification. Figure~\ref{fig:passist1} illustrates the proof assistant
    interface: the rules of the current specification are displayed in the 
    pane on the left; since the current proof is inductive, the inductive
    hypothesis is included. (Lemmas, once proved, are also displayed as 
    rules.) 

    The right pane displays the current proof statement, and the proof,
    here, in progress. The user can double-click on any antecedent, or any goal,
    to perform the default elaboration for that element. For example, 
    double-clicking an antecedent $P \land Q$ would apply the $\land$-L rule,
    producing a subproof with antecedents $P,Q$. For situations in which 
    there is more than one possible action (e.g., the $\lor$-R rules, or when
    backchaining against the IH), keyboard interaction is used to select the
    appropriate action. 

\end{itemize}


\begin{figure*}[ht!]

\begin{centering}
................................................................................
\end{centering}

\caption{The browser-based run-eval-print-loop interface}
\label{fig:repl}
\end{figure*}


\begin{figure*}[t!]

\begin{centering}
\includegraphics[width=6.5in]{proof-sample.png}
\end{centering}

\caption{The proof assistant interface, with an incomplete inductive proof}
\label{fig:passist1}
\end{figure*}


\clearpage
\section{Implementation}

\newthought{Arend is implemented as} a web-based system, with a server 
component, written in Prolog and running in the 
\link{http://swi-prolog.org}{SWI-Prolog} and a browser-based frontend. 
Currently, the implementation of Arend consists of

\begin{itemize}


\item \lnum{1,401} lines of Prolog
\item \lnum{6,198} lines of Javascript (of which \lnum{442} lines are test
    code)
\item \lnum{493} lines of PEG grammar specification
\item \lnum{501} lines of HTML
\item \lnum{129} lines of CSS
\item \lnum{41} source code files in total
\end{itemize}

The following libraries and applications are used in the development of Arend;
these will be described in detail in the following section.
\begin{itemize}
\item \link{https://nodejs.org/}{Node.js}




\item \link{http://www.swi-prolog.org/}{SWI-Prolog}
\item \link{http://pengines.swi-prolog.org/docs/index.html}{Pengines}
\item \link{https://jquery.com/}{jQuery} --- General utilities for 
    Javascript in a browser environment
\item \link{https://lodash.com/}{Lodash} --- Collection utilities for Javascript
\item \link{http://pegjs.org/}{PEG.js} --- Parsing Expression Grammar parser
    generator for Javascript.
\item \link{https://qunitjs.com/}{QUnit} --- Javascript test framework
\item \link{http://www.jscheck.org/}{JSCheck} --- Randomized testing
    engine for Javascript.
\end{itemize}




Arend's development is tracked using the \link{http://fossil-scm.org}{Fossil}
source control management system. As of \lnum{April 18, 2015,} the project
history consisted of \lnum{294} commits spanning eight months of development.
Arend, its source code and project history, can be found on the web at
\url{http://fossil.twicetwo.com/arend.pl}.


\subsection{Automated testing}
Arend's Javascript code is run through a test suite consisting of \lnum{133}
automated tests, however, of these, \lnum{12} central tests are randomized,
running, by default, \lnum{20} randomly generated tests each. Thus, a total
of \lnum{361} individual tests are run. The test suite can be run in the
browser, or offline, via Node.js. Testing is handled via the 
\link{https://qunitjs.com/}{QUnit} test framework; a small compatibility
layer was written to allow QUnit tests to run offline. Randomized testing is
provided by \link{http://www.jscheck.org/}{JSCheck} a ``port'' of the
Haskell QuickCheck framework to Javascript. We have extended JSCheck with
support for randomized generation of Arend data types: atoms, variables, 
and ground and non-ground terms up to a limited depth. 

QUnit is designed for automated testing in a browser environment. Arend's 
development, as far as is possible, tries to target both browser and offline
environments; this is true even for Javascript code. Thus, we wrote ``QNodeit'',
a small compatibility layer that allows the complete suite of QUnit tests
to run offline, via Node.js. All non-user-interface tests run successfully in
both the browser and offline environments. QNodeit also integrates JSCheck into
QUnit, allowing JSCheck's randomized tests to be used naturally within QUnit.

\subsection{Client-side implementation}



Although Arend's client-side code only serves to provide a user-interface to 
the backend's proof-checking and manipulation engine, it contains a significant
amount of logic itself. Early in Arend's development we felt that the best
course would be to implement the entire proof checking engine in Javascript,
thus allowing proof-handling with no server at all. Although this did not
prove feasible, a significant amount of logic-handling code in Javascript was
written, and, so far from being a redundancy, this has proved to be an asset.
Arend's client is not ``dumb'', but in fact understands a great deal of the 
structure of the proofs it is presenting. This allows for richer user-interface
possibilities, and more flexible coordination between front- and back-end.


The \link{https://lodash.com/}{Lodash} Javascript library provides a set of 
general utilities, 
mostly aimed at manipulation of collections (arrays and objects) and enabling
a functional style of programming; Lodash is used extensively throughout
Arend. Lodash makes its facilities available
as methods of a global \texttt{\_} object. Thus, filtering out the odd elements
of an array in ``stock'' Lodash would take the form
\begin{verbatim}

_.filter([1,2,3,4], function(e) { return e % 2 == 0; });
\end{verbatim}


We have created a wrapper library around Lodash called ``Nodash'' which 
integrates Lodash's utility methods into the global prototypes of the datatypes
which they operate on. Thus, for example, the \texttt{filter} method, which
can be applied to any ``collection'' -- array, object, or string --- would
be installed on the global \texttt{Array.prototype}, \texttt{Object.prototype},
and \texttt{String.prototype} objects, so that it is accessible simply as
\begin{verbatim}
[1,2,3,4].filter(function(e) { return e % 2 == 0; });
\end{verbatim}
\marginnote[-1in]{Historically, ``monkey-patching'' the global prototypes was 
considered highly unsafe, as new properties would be ``enumerable'' and would
thus become visible in, for example, \texttt{for-in} loops over the properties
of \emph{any} object. However, all modern Javascript implementations support
the \texttt{defineProperty} method, which allows the creation of 
\emph{non-enumerate} properties on objects which do not have this problem. 
Nodash uses \texttt{defineProperty} to safely add Lodash's methods to the
global prototypes.}
Note that in Nodash the global \texttt{\_} object is still available, for 
those methods which do not fit with any of the global prototypes.


The client-side implementation includes a complete parser for the 
specification language. This allows user input to be fully syntax-checked 
before being sent to the server, and allows for faster feedback to the user
when syntax errors occur. The parser is written using 
\link{http://pegjs.org/}{PEG.js}, a parsing expression 
 grammar (PEG) \mcitep{Ford:2004:PEG:964001.964011} parser-generator. The
specification grammar in PEG form consists of \lnum{28} non-terminals and 
\lnum{53} terminal tokens. Of note is the fact that the grammar uses the 
standard
Unicode character classes in its definition of ``identifier'' and ``operator'';
an operator, in particular, is defined as any sequence of characters from the
symbolic or punctuation classes\footnote{Classes Sc, Sm, Sk, So, Pc, Pd, and 
Po.}. This allows for traditional mathematical operators such as $\in$ or 
$\leq$ to be used directly in specifications.


Other modules of note in the client-side implementation include:
\begin{itemize}
\item \texttt{core} --- contains a small
    amount of compatibility code that allows the other non-user-interface 
    modules to operate transparently in either the browser or Node. In 
    particular, it provides a browser implementation of the \texttt{require()} 
    function in Node, used to load modules. In the browser, these modules must 
    have already been loaded via standard \texttt{<script>} tags (i.e., dynamic 
    loading is not provided), but, once loaded, they are installed into a 
    global module repository; \texttt{require} then simply returns a reference 
    to the appropriate module. This allows Javascript code to transparently 
    access other modules, without knowing whether they are being dynamically 
    loaded within Node, or have already been loaded in the browser.


\item \texttt{terms} --- contains a complete representation of \emph{terms},
    the fundamental datatype of logic programming. The \texttt{terms} module
    supports walking the structure of terms, converting Prolog-style
    term-lists to Javascript Arrays and vice versa, rendering terms to either
    specification or Prolog-compatible strings, and enumerating the variables
    in a term.



\item \texttt{unify} --- contains a fully-functional implementation of the
    Robinson unification algorithm \mcitep{robinson1965machine}. Although
    this module is exhaustively tested, it is currently minimally used. In the
    future, we hope to build a pattern-matching utility library on it, to allow
    for more natural examination and manipulation of term-structures on the 
    client-side.








\item \texttt{term\_render} --- contains the rendering engine for converting
    terms, rules, and derivations to HTML structures. Note that derivation
    rendering is extensible via a number of ``hook'' functions, which allow
    client code to extend or replace the rendering of the various 
    proof components: antecedents, consequents, disjunctive consequents, 
    and consequents which could be the target of the inductive hypothesis.
\end{itemize}


\subsection{Server-side implementation}





The core of Arend's proof-checking engine is written in Prolog, and is made
available to the client via a HTTP interface. SWI-Prolog provides both a
standard HTTP server module, as well as a specialized ``app engine'' module
called \textsc{Pengines}, both of which are used in Arend. HTML, CSS, and 
Javascript files for the client-side interface are served via the standard
HTTP server, while two PEngine ``applications'', \texttt{repl} and 
\texttt{passist} provide the interface to the proof-checker itself.


PEngines provides a transparent interface between Javascript code running
in the browser, and Prolog code running on the server. With it, our 
client-side code can directly execute queries against the exported 
predicates of the two aforementioned applications. The results (success,
failure, and substitution) of those queries can then be enumerated. Terms
are encoded as JSON objects (our \texttt{terms} module can decode JSON
terms into its own types). 

It should be noted that the Prolog implementation of Arend is mostly portable,
relying only on ISO-Prolog predicates, commonly-available libraries, and a few
SWI-Prolog-specific extensions and modules (most notably the HTTP server
module). It would not be difficult to port the proof-checker itself to another
Prolog system.


The Prolog core proof-checker implementation consists of the following modules,
the functionality of which will be described in detail in the following 
sections:
\begin{itemize}
\item \texttt{subst} --- provides support for working with \emph{explicit
    substitutions}. As described below, Arend cannot use Prolog's own 
    substitution, as the substitutions that are applied to a proof object
    may differ in subtrees; Prolog applies a substitution globally. Thus, we
    re-implement both variables and substitutions for our own use.


\item \texttt{program} --- provides support for expanding atomic goals in the 
    specification language, and for executing goals in the specification. 
    The \texttt{run/3} predicate produces proof objects compatible with those
    produced by the full proof checker, but is an independent implementation.



\item \texttt{checker} --- the heart of the proof checking system, provides
    routines for dealing with proof objects, elaborating incomplete proofs,
    and checking and generating proof objects corresponding to particular
    statements.




\end{itemize}

\subsection{Proof representation}

Informally, the content of a derivation is relatively simple: a tree of 
\textsc{-Left} and/or \textsc{-Right} rule applications, drawn from the rules
of the reasoning logic in figure \ref{reasoning-rules}. In practice, a more
detailed representation is required, one which, in particular, necessitates the
use of \emph{explicit substitution}, rather than the implicit substitution that
would result from naive use of (for example) the system unification in Prolog.

For derivations purely derived from the specification language, explicit 
substitution is not required. This is because for any valid derivation of a
proposition expressed in the specification language, the substitution is 
\emph{consistent} throughout the derivation tree; it is not possible for 
different ``branches'' to have different substitutions. (Disjunction may, of
course, result in the generation of multiple distinct derivations, each with its
own unifying substitution, but within any single derivation there is always a
single consistent substitution.)

However, because the reasoning language possesses rightward implication, a new
element is introduced to derivations: the list of \emph{antecedents}. Case
analysis on a disjunction antecedent is superficially similar applying the
$\land$-R rule, with an important distinction: the branches produced each have
\emph{independent} bindings. Consider, for example, case analysis on 
$\mathsf{nat}(X)$ in the course of a proof:
\[
\inference[]
    {X = z,     \mathsf{nat}(z)  |- \ldots &
     X = s(X'), \mathsf{nat}(X') |- \ldots}
................................................................................
inconsistent, and actually impossible to express, because Horn clauses forbid
such conjunctions as conclusions. Conversely, given the conclusion 
$X = z \lor X = s(x')$ we have
the choice of which unification to use, and thus there is no inconsistency.}

Because of this, each branch of a derivation must maintain its own substitution,
applied to its goals as they are expanded.

Proofs are represented as a term of the form
\begin{verbatim}
proof(Goal,Ctx,Subst,Proof)
\end{verbatim}
where \texttt{Goal} is the goal to be proved, \texttt{Ctx} is the list of
antecedents (initially empty for most goals), \texttt{Subst} is the 
substitution that results if the goal is proved, and \texttt{Proof} is a 
proof term for this particular goal. 

The proof terms vary, depending on the structure of the goal and the contents
of the context. Non-axiomatic proof terms include one or more 
sub-proofs. The possible proof terms are
\begin{itemize}
\item \texttt{induction(N,Proof)} --- proves a $\forall$ inductively, on
     the N-th element of the context.
\item \texttt{ih(Proof)} --- Proves an atomic goal by backchaining it against 
    the inductive hypothesis.
\item \texttt{generic(Proof)} --- Proves a $\forall$ generically (i.e., by 
    substituting a unique constant for the quantified variable).
\item \texttt{instan(V,Proof)} --- Proves a $\exists$ by giving a value for 
    the quantified variable.
\item \texttt{product([Proofs...])} --- Proves a conjunction by providing a
    subproof for each conjunct.
\item \texttt{choice(N,Proof)} --- Proves the N-th branch of a disjunction.
\item \texttt{implies(Proof)} --- Proves an implication, by adding the 
    left-hand-side to the context as an assumption and then proving the 
    right-hand-side.
\item \texttt{ctxmem(N)} --- Implements the ``assumption rule''; i.e., proves
    an atomic goal by unifying it with the N-th element of the context.
\item \texttt{backchain(Proof)} --- Proves an atomic goal by backchaining it;
    i.e., by expanding it into its definition, the disjunction of its clauses.
\item \texttt{unify(A,B)} --- Proves a goal of the form \texttt{A = B}
    (this has the side-effect of adding the substitution produced by 
    $A = B$ to the \texttt{Subst} for this proof subtree).
\item \texttt{case(T,N,Keep,[Proofs...])} --- Performs case-analysis on an 
    element of the context. The type \texttt{T} corresponds to the 
    various -Left rules of the reasoning logic.
\item \texttt{trivial} --- Proves $\top$.
\item \texttt{hole} --- Proves any goal, but represents an incomplete proof.
\end{itemize}

\subsection{Capture avoidance}

The use of explicit substitution means that the proof-checker must also contend
with another troublesome problem which Prolog conceals: avoiding \emph{accidental
capture} when expanding a call to a goal. Consider the goal $\mathsf{nat}(X)$.
We would expect this goal to succeed twice, first with the solution $X = z$. 
................................................................................
$X \mapsto Y$ to $\forall Y, P(X,Y)$; clearly, one of the $Y$'s will need to 
be renamed. Since the $Y$ in the substitution is most likely bound somewhere
else in the derivation, we choose to rename the $Y$ bound within the $\forall$.

\subsection{Unification}

The Robinson unification algorithm is presented, in natural deduction style,
 in figure~\ref{unification-rules}. 
 Note that the same algorithm is used, with minor 
modification, in both the Javascript and Prolog implementations (for a direct
comparison of these two implementations, see our comments below).

\begin{figure}[h!]
\[
\inference[Wildcard]
    {}
    {\_ \equiv t \mid \varepsilon}
\]
................................................................................
    {}
    {\mathtt{[]} \equiv \mathtt{[]} \mid \varepsilon}
\qquad\qquad
\inference[Cons]
    {t \equiv s \mid \theta_1  &  \hat{t} \theta_1 \equiv \hat{s} \theta_1 \mid \theta_2}
    {(t:\hat{t}) \equiv (s:\hat{s}) \mid \theta_2}
\]
\caption{Rules for the Robinson unification algorithm}
\label{unification-rules}
\end{figure}

(This presentation is based in part on that in \mcitet{pfenning2006logic}.)

It is interesting to directly compare the two implementations of unification
(one of the few modules which is semantically similar in both
implementations). Both systems implement the same algorithm. The Javascript 
implementation consists of
136 lines of non-comment code and is quite difficult to follow; the Prolog 
implementation requires only 69 lines,
an almost 50\% reduction! This despite the fact that the Prolog implementation
does \emph{not} use the built-in unification to simplify the algorithm at all;
it would be largely the same if only traditional assignment were used.


\subsection{Proof Checking}

The proof checker is implemented in approximately 450 lines of Prolog code. It
essentially implements the rules of the reasoning logic presented in figure
\ref{reasoning-rules}, by recursively checking the proof tree. That is, it
first checks the root node to ensure that the proof object is consistent with
the conclusion (antecedents and consequent) by ensuring that it has the correct
number and type of subproofs. The subproofs are then recursively checked,
working through the tree until the axiomatic rules are reached (or, in the 
case of an incomplete proof, a \texttt{hole} is found, indicating an unproved
subtree). 

There are two complications to a straightforward top-down recursive 
implementation:
................................................................................
\subsection{Proof Elaboration}

The heart of the incremental proof checking algorithm is the predicate
\texttt{elaborate}  (defined in file \texttt{checker.pl}). \texttt{elaborate}
works in conjunction with \texttt{check}, the proof checker itself. The purpose
of elaborate is to take an incomplete proof, together with a reference to one
of its leaves (i.e., to a $?$), and to ``elaborate'' it with respect to either
its consequent, or one of its antecedents. As described above, the form of the
selected element dictates the forms of the subproofs. \texttt{elaborate} 
``fills in'' these subproofs, to a single level only (i.e., all the subproofs
it constructs are themselves $?$). 

In a logic which did not include unification or the ability to call defined
predicates (atomic goals), we could elaborate any node of the tree, 
independent of the rest of the tree.
................................................................................

\begin{figure}[h!]

All left-rules instantiate the Proof to a term of the form
\[
\text{case}(\text{Type},N,\text{Keep},[\text{Proofs}\ldots])
\]
In the rules below we leave $N$ (the index in the list of antecedents of the 
judgment to be cased upon) implicit and omit Keep (a Boolean flag indicating 
whether the targeted antecedent should remain in the list, or be 
removed), and write them as
\[
\text{Type}(\text{Proofs}\ldots)
\]

\[
\inference[False-Left]
................................................................................
it has the form $P \land Q$, then the output proof will be elaborated to
\[
\proof{P \land Q}{\Gamma}{\text{product}([
    \proof{P}{\Gamma}{\text{hole}},
    \proof{Q}{\Gamma}{\text{hole}}
])}
\]
The subproofs of the product will have the correct consequents and antecedents,
but their own proofs will be empty, ready for further elaboration.


\subsection{Inductive reasoning}

Arend's reasoning logic supports proofs of a universal quantification both
generically and by \emph{induction}. Arend's induction is technically on the
height of derivations, although from the perspective of the user it supports
full structural induction on terms; this subsumes the usual natural number
induction. 

Arend supports only single-induction proofs (i.e., induction on
multiple antecedents is not allowed) and supports only induction global to a 
proof (i.e., nested 
inductions are not allowed, although they can be ``faked'' by using lemmas). 
These restrictions imply that the induction hypothesis can be regarded as
being global to a proof, thus eliminating the need to restrict the scope 
of difference induction hypotheses to different branches of the proof tree.

Internally, induction is implemented by \emph{goal tagging}. When an inductive
proof is declared, a particular goal in the antecedents is selected, by the 
user, to be the target of the induction.\marginnote{For example, in a proof of
$\mathsf{nat}(X) |- \mathsf{add}(X,0,X)$ we would induct on $\mathsf{nat}(X)$.}
This goal must be a user goal; it cannot be a built-in operator such as 
conjunction, disjunction, or unification. The functor of the goal is internally
flagged as being ``big'' (indicated as $\uparrow$) and the induction 
hypothesis is defined in terms of the same goal, but flagged as 
``small'' ($\downarrow$). For example, to prove that 
$\mathsf{nat}(X) -> \mathsf{add}(X,0,X)$ our proof would proceed 
by induction on $\mathsf{nat}(X)$, and
thus we would have the inductive hypothesis
$\mathsf{nat}^\downarrow(X) -> \mathsf{add}(X,0,X)$.

When a $\uparrow$ goal is expanded by case analysis or backchaining,
 any sub-goals in its expansion
are flagged as $\downarrow$, indicating that they are ``smaller'' than the original
goal. The induction hypothesis can only be applied to goals which are ``small'',
................................................................................
being a flaw in our system, this surprising result reflects the fact that one
computational interpretation of $\bot$ is that of a non-terminating computation.
In fact, this result is simply the computational analogue of the well-known 
\emph{ex falso quodlibet} principle of classical logic.

\clearpage
\section{Future work}

There are many areas in which Arend could be enhanced and extended. We 
examine ten possibilities here, including one, equational reasoning, in some
depth.

\begin{itemize}
\item Arend's specification logic is simple enough that formalization of
    its properties should not be difficult, nonetheless, for formal correctness
    and consistency
    work needs to be done to show that it is both \emph{sound} (everything it
    proves true is true) and \emph{non-deterministically complete} (when it 
................................................................................
    conclusion. Since one of the difficulties of the derivation-tree proof 
    format is its often-extravagant use of horizontal space, it should be
    possible to automatically \emph{extract} a sub-proof as a lemma. This 
    could be done even when the lemma is not generally useful, but simply
    as a way of ``naming'' a portion of the proof or commenting on its 
    purpose. 

\item The implementation of Arend is primitively minimal, suitable only for 
    small-scale usage. Real-world usage would require integration with existing
    grading systems and automatic checks of student-submitted proofs. On a higher
    level we could easily imagine a set of features aimed at 
    creating an ecosystem around student-created specifications, proofs, and 
    lemmas, perhaps with collaboration features for multi-student use.

\item Although we believe Arend's derivation-tree presentation of proofs has
................................................................................
\subsection{Equational reasoning}

One very large extension to the logic which we would like to investigate would
be the incorporation of \emph{equational reasoning}. Arend's current concept
of equality is based entirely on unification: two terms are equal if they can 
be made equal by some substitution. Equational reasoning allows equality to 
be defined via rules, for example \marginnote{For computational purposes equational rules are often regarded as 
unidirectional \emph{rewrite rules}: $a = b$ becomes $a \mapsto b$.}

\[
\inference[$=$-Refl]
    {}
    {x = x}
\qquad
\inference[$=$-Symm]
................................................................................
\inference[$\times$-dist-$+$]
    {}
    {a \times (b + c) = a \times b + a \times c}
\]

Equational reasoning is a particularly powerful mechanism for reasoning about
\emph{functional programs} and programming languages; we can state 
equivalences between expressions and
then prove that two programs, or two classes of programs, are equivalent under
those rules. Even more powerfully, if we wish to prove some property of an 
entire language, it is sufficient to show that the equivalences preserve that
property.

Although the additional of equational reasoning to Arend would give the system
a marked increase in logical power, it would also correspondingly increase the
complexity of its implementation and presentation. Equivalence rules require
a different treatment of terms and variables from unification, and it is not
entirely clear how the two will interact, logically. Equivalences over programs
................................................................................
logic variables, but can both contain, and be contained by, them. Although 
logics exist which provide support for reasoning about binding structures
\mcitep{miller05tocl} their usage in proofs involves additional complexity that
may be at odd with our goal of use in \emph{introductory} curriculum.

\clearpage
\section{Appendix I: Syntax  of the Specification Logic}
\label{sec:spec-syntax}

\newcommand{\nterm}[1]{\mathop{\langle\text{#1}\rangle}}
\newcommand{\sterm}[1]{\mathop{\text{\emph{#1}}}}
\newcommand{\term}[1]{\mathop{\text{\texttt{``#1''}}}}

\newthought{The following is a simplified presentation} of the grammar of the 
specification
language. In particular, precedence rules for infix operators have been 
omitted, but are consistent with normal usage. Note that while the grammar 
includes rules defining infix arithmetic and comparison operators, these 
operators have no \emph{semantic} significance within the specification logic.
They are present under the assumption that specifications may want to use them,
and will expect them to have their normal precedence ordering.

\begin{align*}
\nterm{start} &<- \nterm{definitions}? \\
\\
\nterm{definitions} &<- \nterm{definition} \nterm{definitions}* \\
\nterm{definition}  &<- \sterm{Rulename}? \nterm{predicate} \\
                    &\mid \nterm{infix-decl} \\
................................................................................
                &\mid \term{(} \nterm{term} \term{)} \\
                &\mid \sterm{Atom} \\
                &\mid \sterm{Variable} \\
\\
\nterm{compound} &<- \sterm{Atom} \term{(} \nterm{termlist} \term{)} \\
\end{align*}

%\subsection{Operational Semantics}\label{spec-execute}

%This may or may not get written. If it does, it will be based on the 
%stack-based semantics of \mcitet{pfenning2006logic}.

\clearpage
\section[Appendix II: Specification for N, lists]{Appendix II: Specification for $\mathbb{N}$, lists}
\label{sec:nat-spec}

\newthought{The following is an example specification} included with Arend, 
one which