Arend's specification syntax is based on Prolog, but is more flexible, with fewer predefined operators.
#Lexical syntax
Arend's lexical syntax is largely defined in terms of Unicode character categories. Only a few characters are treated specially.
Whitespace is ignored, but required to separate some tokens. Comments normally count as whitespace.
Comments come in two forms:
// Single-line comment
/* Multi-line comment */
Tokens are broken into three groups:
Atomic tokens start with a lowercase alphabetic (Ll or Lo) or numeric (Nd, Nl, or No) character, and are followed by any contiguous sequence of zero or more non-separator (see below) characters. As a special case, the empty list
[]
is considered to be an atomic token.Variable tokens start with an uppercase (Lu) alphabetic character or an underscore, and are followed by any contiguous sequence of zero or more non-separator (see below) characters.
Symbolic tokens are those that start with a character in one of the following classes: Sc, Sm, Sk, So, Pc, Pd, Po. This includes most symbolic and punctuation characters. This can be followed by any contiguous sequence of zero or more non-separator characters. (Grammatically, all symbolic tokens are assumed to be infix operators.)
Separator tokens are the following:
:-
(
)
[
]
{
}
:
,
and.
This list may be extended to include all bracketing (Ps, Pe) and paired quoting (Pi, Pf) characters.
A raw token is any other type of token enclosed in backquotes. A raw token is exactly equivalent to the unquoted token, except that for symbolic tokens it is not assumed to be infix. Thus, the following are equivalent (assuming the term-based parsing described below):
a + b
`+`(a,b)
#Grammatical syntax
Although Arend's specification language has only a handful of built-in operations, it's syntax includes support for a number of common extensions, under the assumption that specifications will want to use them in their natural forms. Thus, e.g., arithmetic operators are parsed according to the usual precedence, even though there are no built-in arithmetic operations.
The specification parser outputs a list of terms, from the terms
module. Atomic and variable tokens are translated into Atoms
and Variables
, respectively. Compound terms of the form head(args...)
or the infix form lhs OP rhs
are translated into Compounds
; for infix operators, the OP
becomes the head.
Bracket-enclosed lists are parsed specially. The internal syntax for lists is
a:b:c:[]
, being equivalent to [a,b,c]
. The elements of the list can be any term.
Curly-braces are currently reserved for future extension, as are single- and double-quoted strings.
The grammatical syntax of Arend's specification language is roughly
specification = definition*
definition = head ":-" body "."
/ head "."
head = term
body = term
term = atomicTerm
/ infixTerm
/ prefixTerm
/ listTerm
/ "(" term ")"
atomicTerm = atom
infixTerm = term OP term // oversimplification
termList = (term ("," term)*)?
prefixTerm = atom "(" termList ")"
listTerm = "[" termList "]"
Note that, due to the ambiguity with ,
, infix terms with ,
as their head inside lists must either be enclosed in parentheses or use the raw form of ,
.
Operator precedence
Infix operators are parsed according to the following precedence table, from lowest to highest. The second column lists associativity.
Operator Assoc. Description
================= ======= ==============================
h(...) (t) Compounds, parenthesized terms
- ! Prefix minus, not
^ right Exponentiation
* / % left Multiplication, division, mod
+ - left Addition, subtraction
< <= > >= == != non Comparison
: right Cons
<user-defined infix operators>
= other* Unification
, & r,l Conjunction
| ; left Disjunction
:- left Predicate definition
Note that in the first group of operators, the descriptions are just the "standard" interpretations. These operators have no predefined meaning in Arend. The second group of operators includes those which have some semantic meaning in Arend.
Associativity of unification: A = B = C
is parsed as A = B, B = C
, except that the entire unification is setup all at once, so it's slightly more efficient.
Also note that while ,
is right associative, &
is left associative.
Note that, in its current state, Arend's grammar requires memoization to be enabled in order to avoid exponential parsing time in some cases. Hopefully this will be remedied in the future.