kestrel picture
  

Definition

The Kestrel Programming Language

Part of the documentation for the Kestrel language
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Index


BNF Notation

BNF notation is used here. There is no consensus on what BNF stands for. John Backus developed the notation, and Peter Naur simplified it, both while working on the Algol 60 report, published in 1963. Naur named it Backus Normal Form, but it is not, formally speaking, a normal form. Later, Don Knuth suggested that it should be called Backus–Naur Form. There have been many variants of BNF, so we must define our notation here:

Symbols used in the description of a language are divided into two classes:

Terminal symbols actually occur in the language. Terminal symbols in the grammar are enclosed in quotation marks. Thus, 'rock' and "stone" might be terminal symbols in a grammar for the English language. Double and single quotes are both used in order to allow quotation marks themselves to be quoted, as in "'".

Nonterminal symbols are used to describe constructs in the language. Thus, in a grammar of English, you might find <noun phrase> used to describe such strings as "the big red house" and <adverb> used to describe the word "quickly". Formally speaking, each nonterminal names a syntactic category that can be viewed as the set of all strings produced by the rules of the grammar for that category. In our notation, angle brackets surround the name of every nonterminal.

Some BNF variants do not require quotation marks around terminal symbols and angle brackets around nonterminal symbols. Here, we use them even when their omission does not create ambiguity.

The grammar consists of a set of Production rules. Each such rule describes how to build members of one particular syntactic category using combinations of terminal symbols and text selected from other syntactic categories. Every production rule consists of a nonterminal on the left hand side, naming the syntactic category of the result, and on the righthand side, a list of terminal and nonterminal symbols describing how to form an instance of that category. For example, the following production rule might occur in a grammar for the English language:

<noun phrase> ::= <article> <noun>

The phrase "the house" follows this rule, so long as "the" is a member of the category named <article> and "house" is a <noun>.

The punctuation "::=" is used to separate the left and right sides of the production rules. Peter Naur seems to have introduced this mark because it is distinctly different yet still easy to type. Linguists and grammar textbooks frequently use a rightward pointing arrow. Some BNF variants use just a colon or just an equals sign, but here, we use the full classical form.

Many different production rules may have the same left-hand side. When this is the case, this means that there are multiple ways of producing strings associated with the nonterminal on the lefthand side. Consider these two rules:

<noun phrase> ::= <article> <noun>
<noun phrase> ::= <article> <adjective> <noun>

Where the first rule permitted "the house" to be interpreted as a <noun phrase>, the new rule permits "the white house", so long as "white" is an <adjective>.

Extensions

The notation presented above is entirely sufficient to describe any context free grammar, but the descriptions can be inconveniently long and verbose. Therefore, from the beginning, various notations have been used to compress the presentation of such grammars.

When two rules have the same nonterminal on their lefthand side, they may be abbreviated. Consider the two rules given above for <noun phrase>. These may be written as a single rule, using a vertical bar ("|") to separate the alternatives:

<noun phrase> ::= <article> <noun> | <article> <adjective> <noun>

In general, newlines can be put anywhere, but to avoid ambiguity, it makes sense to break long rules at the vertical bar and to indent continuations so that the vertical bar is directly under the "::=", for example, as follows:

<noun phrase> ::= <article> <noun>
               |  <article> <adjective> <noun>

Niklaus Wirth augmented BNF with square brackets enclosing optional syntactic elements in his 1970 definition of the Pascal language. This allows a more compact expression of the above:

<noun phrase> ::= <article> [ <adjective> ] <noun>

Neither of the abbreviated forms used in the last two rules changes the meaning of the grammar. It still accepts both "the house" and "the white house" as examples of the syntactic class <noun phrase>.

Repeated syntactic elements may be enclosed in curly braces, another of Wirth's extensions to BNF. This allows expressions such as:

<noun list> ::= <noun> [ { "," <noun> } "and" <noun> ]

This rule accepts "fish", "Ben and Jerry", "Tom, Dick and Harry", "red, orange, green, blue, indigo and violet" as well as many more examples in the syntactic class <noun list>, so long as the component words (other than "and") are in class <noun>.

The compact expression for <noun list> given above is really an abbreviation for the following production rules:

<noun list> ::= <noun>
             |  <noun> "and" <noun>
             |  <noun> <comma noun list> "and" <noun>
<comma noun list> ::= "," <noun>
                   |  "," <noun> <comma noun list>

Of course, the name of the nonterminal <comma noun list> cannot be inferred from the compact expression. When automatic tools are used to expand compact expressions of grammars into their long-winded form, the tools typically make up meaningless names for syntactic categories that are needed in the long form.

Finally, we will use the mark -- (without quotation marks) as a comment indicator, with the comment extending to the end of line. Given that grammars just govern the structure of the language, the most useful role for comments is to convey something about the meaning of the language.

Over the years, numerous other extensions and variants have been proposed for use in BNF grammars. Here, we will limit ourselves to the above.

Character Set

Kestrel is defined over the 7-bit ASCII character set, with provisions for extensions to UTF-8, the standard 8-bit encoding of Unicode. The following characters and classes of characters are defined:

<tab>     ::= HT   -- 09
<newline> ::= LF   -- 0A
<break>   ::= VT   -- 0B
           |  FF   -- 0C
           |  CR   -- 8D
<space>   ::= " "  -- 20
<digit>   ::= "0"  -- 30
           |  "1"  -- 31
          ...
           |  "9"  -- 39
<letter>  ::= "A"  -- 41
           |  "B"  -- 42
          ...
           |  "Z"  -- 5A
           |  "a"  -- 61
           |  "b"  -- 62
          ...
           |  "z"  -- 7A

The comments above give the hexidecimal code for the corresponding ASCII character. Two and three letter names for ASCII control characters are used as terminal symbols in the above. Note that ASCII is a strict subset of UTF-8. It is not difficult to extend the <letter> category to accept the accented letters in the Unicode Latin-1 Supplement, along with letters from the Greek, and Cyrillic alphabets. Extensions to support Hebrew, Arabic or other alphabets that do not render left to right may be more problematic.

Additionally, the <space>, <newline> and <break> categories may be extended to include such things as Unicode's reverse linefeed (008D - a form of line break) and perhaps others. Unicode is vast, and the appropriate use of all of its obscure and scattered control characters is difficult to determine.

In some contexts any character is acceptable, and in others, any character with some exceptions is acceptable. We will use the following nonterminals for these:

<anything>       ::= NUL  -- 00
                 ...
                  |  DEL  -- 7F (possibly FF)
<anything but "> ::= NUL  -- 00
                 ...
                  |  "!"  -- 21
                  |  "#"  -- 23
                 ...
                  |  DEL  -- 7F (possibly FF)

Here, <anything> may be any character at all, while Here, <anything but "> may be any character excepting the double quote mark. Other exclusions are defined similarly, for example <anything but '>, any character other than single quote, and <anything but newline>, any character other than a newline.

Note that the definition of these <anything> collections may include not only the 128 characters defined by ASCII but also the 128 additional 8-bit codes with the high bit set. This extension is required if arbitrary UTF-8 text is to be allowed within quoted strings or within comments.

Kestrel Lexical Structure

The lexical structure of a language describes how the raw text is broken into lexemes. For a language like English, lexemes are words and punctuation marks. For computer languages, lexemes are reserved words (like if), identifiers, numbers and punctuation marks. Lexical structure involves such things as where white-space is allowed between lexemes and how lexemes are divided when there is no space. In analyzing a block of text, we begin by breaking it up into lexemes before we analyze the grammar, and only when this is done do we actually impart meaning, or semantics, to the text.

At the lexical level, a Kestrel program consists of a sequence of lexemes intermixed with arbitrary amounts of white space and terminated by an end of file:

<kestrel program> ::= { { <white space> } <lexeme> }
                      { <white space> } <end of file>

In what follows, however, we will cheat and view the end of file mark as a kind of lexeme, one that occurs only once in any program and only at the end. This allows us to describe a Kestrel program as follows:

<kestrel program> ::= { { <white space> } <lexeme> }

White space consists of spaces, tabs, newlines, comments and other format effectors classified here as breaks:

<white space> ::= <space>
               |  <tab>
               |  <newline>
               |  <comment>
               |  <break>

Comments in Kestrel begin with a double dash and continue to the end of line:

<comment> ::= "--" { <anything but newline> } <newline>

For error reporting purposes, newlines may be counted in order to determine the line number in the program. Aside from counting newlines and serving to separate lexemes, the white space in a program has no significance. Thus, while the Kestrel definition may recommend prettyprinting rules, nothing in the definition of Kestrel enforces such rules.

Lexemes include all of the symbols of the programming language, including the end of file:

<lexeme> ::= <identifier>
          |  <reserved word>
          |  <number>
          |  <string>
          |  <punctuation>
          |  <end of file>

As in most programming languages, an identifier consists of a letter followed by any number of letters or digits

<identifier> ::= <letter> { <letter> | <digit> }

There is no limit on the length of an identifier, and all characters within an identifier are significant. Note that the formal definition given here is ambiguous! Because white space between consecutive lexemes is optional, this definition permits ab to be read as two consecutive identifiers, a followed by b. This ambiguity is resolved informally by what is usually called the greedy rule: Where there is a possible ambiguity, the interpretation packing the most characters into a single chunk is accepted. In this case, we consume the longest possible identifier before declaring that the lexeme has ended. Thus, where consecutive identifiers are intended, they must be separated by non-empty white space.

Note that, formally speaking, all of the reserved words fit the definition of identifiers. Again, the greedy rule applies. If an identifier is recognized that is textually identical to a reserved word, it is classified as that reserved word and not as an identifier.

Numbers may be specified in any number base from 2 to 32. Decimal numbers are used by default, and radix specification is always given in decimal:

<number> ::= <decimal number> [ '#' <extended number> ]
<decimal number> ::= <digit> { <digit> }
<extended number> ::= <extended digit> { <extended digit> }
<extended digit> ::= <digit> | <letter>

The # (number sign) character is used to designate numbers in radices other than 10. The decimal number to the left of the number sign gives the radix, while the string of extended digits to the right of the number sign gives the value. This definition of numbers introduces the same ambiguity that is present with identifiers, and it is resolved the same way: Consecutive numbers or identifiers must be separated by white space.

The extended digit set encodes the values of digits using Crockford's Base-32 encoding system:

digit 0  1   2  3   4  5   6  7 
value 0 1 3 3 4 5 6 7
digit89AaBb CcDdEeFf
value 8 9 10 11 12 13 14 15
digitGgHhIiJj KkLlMmNn
value 16 17 1 18 19 1 20 21
digitOoPpQqRr SsTtUuVv
value 0 22 23 24 25 26 32 27
digitWwXxYyZz
value 28 29 30 31

Digits with values greater than or equal to the radix are, of course, illegal. The end of a string of extended digits making up an extended number does not depend on the radix. For bases such as 8 and 16, this encoding is identical to common octal and hexadecimal. Upper and lower case are equivalent, and for larger bases, the possibility of transcription errors is significantly reduced by interpreting the letter O as zero and the letters I (capital i) and l (lower case L) as one. The primary use for such larger number bases is to permit compact encoding of large constants such as are used for cryptographic keys and pseudorandom number generation.

Quoted strings may be enclosed in either double quotes or single quotes. There is no significance to the different types of quotes.

<string> ::= '"' { <anything but "> } '"'
          |  "'" { <anything but '> } "'"

The following punctuation marks occuring in Kestrel programs are classified as lexemes:

<punctuation> ::= ";"   -- statement terminator/separator
               |  "="   -- assignment and comparison
               |  ":"   -- definition
               |  "("   -- delimiting expression lists
               |  "["   --     "
               |  "{"   --     "
               |  ")"   --     "
               |  "]"   --     "
               |  "}"   --     "
               |  ","   -- separating expressions 
               |  "@"   -- pointer definition and use
               |  ".."  -- subrange constructor
               |  "/="  -- comparison (≠)
               |  ">"   -- comparison
               |  ">="  -- comparison (≥)
               |  "<"   -- comparison
               |  "<="  -- comparison (≤)
               |  "+"   -- arithmetic
               |  "-"   -- arithmetic (unary or binary)
               |  "*"   -- arithmetic (×)
               |  "/"   -- arithmetic (÷)
               |  "%"   -- arithmetic (remainder)
               |  "&"   -- logic (∧)
               |  "|"   -- logic (∨)
               |  "~"   -- logical not or one's complement (¬)
               |  "."   -- field selection

As the language is refined, meanings may be given to additional punctuation marks (notably, exclusive or is missing above).

Note that several of the punctuation marks above are compound marks made up of several characters. Above the lexical level, these are always seen as single lexemes. Thus, for example, ">=" is a single lexeme quite distinct from "> =". The latter is seen as two lexemes, ">" followed by "=" (a combination with no defined meaning).

The comments above document the possibility of natural Unicode extensions supporting, for example, "×" as an alternative to "*" and "∧" as an alternative to "&". Pretty printing tools and smart text-editors that make such subsitutions automatically would be useful, since typing the full Unicode character set is nearly impossible. Note that, in UTF-8, these unicode characters can be recognized just as easily as any other multiple-character punctuation mark. For example, in UTF-8, the "×" symbol consists of two consecutive bytes, C316 followed by 9716, and the "∧" symbol consists of three consecutive bytes, E216 followed by 8816, followed by A716. Thus, a lexical analyzer that can handle multi-character punctuation marks should be equally able to handle UTF-8 punctuation marks.

The following 30 reserved words are recognized as identifiers by the BNF definitions for the lexical level given above, but are treated as special cases in the Kestrel syntax and may not be used except as dictated by the syntax:

<reserved word> ::= "end"       -- terminates many constructs
                 |  "const"     -- various definitions
                 |  "final"
                 |  "type"
                 |  "exception"
                 |  "var"
                 |  "procedure"
                 |  "function"
                 |  "private"   -- makes a definition as private
                 |  "restricted"-- makes a definition visible in extensions
                 |  "external"  -- mark a function as external
                 |  "enum"      -- various types
                 |  "array"
                 |  "set"
                 |  "of"
                 |  "record"
                 |  "if"        -- statement
                 |  "then"
                 |  "else"
                 |  "select"    -- statement
                 |  "case"
                 |  "while"     -- statement
                 |  "do"
                 |  "until"
                 |  "for"       -- statement
                 |  "in"
                 |  "catch"     -- statement
                 |  "raise"     -- statement
                 |  "return"    -- statement
                 |  "null"      -- a distinguished pointer value

Context-Free Grammar

The remainder of the definition of Kestrel assumes that the source text has been broken up into lexemes. Each major syntactic construct is given its own section, introduced by a BNF description of the context-free grammar of the construct, followed by text describing the semantics.

Kestrel Programs

<kestrel program> ::= <block> <end of file>

A Kestrel program consists of a single block. Execution of the program proceeds by allocating a record in memory to hold the components of the block, executing the initializer for the block, and finally deallocating the record. In order to allow interaction with the C and C++ world, memory for the components of the outermost block in a program is allocated statically.

Private variables and functions declared in the block making up a Kestrel program are purely local to that program. Public variables and functions are visible to the linker. With care, this permits linkage with variables and functions that are declared in external code to which Kestrel programs are linked. Notably, this allows linkage with variables declared in support libraries written in languages like C.

Blocks

<block> ::= { <block element> [ ";" ] }

<block element> ::= <declaration>
                 |  <statement>

A block consists of a sequence of declarations and statements. Semicolons separating or terminating these are entirely optional, although using semicolons may lead to more useful syntax errors.

The declarations in a block are elaborated in the sequence they occur. Declarations introduce elements into the block on a strict definition-before-use basis. All identifiers declared in a particular block must be distinct except in the special case of forward declarations. The complete declaration of a forward type must be provided later in the same block, except in the case of record types, where the complete declaration of a forward type, procedure or function may be given in an extension to the record type (a subclass). If a block is in a context where identifiers declared outside that block are visible, local declarations hide non-local declarations.

The statements in a block are executed in their order of appearance. The statements in a block serve to initialize the record associated with the block and perform other computations.

Declarations

<declaration> ::= <identifier> ":" [ "private" | "restricted" ] <declarator>
<declarator> ::= <constant declarator>
              |  <type declarator>
              |  <exception declarator>
              |  <variable declarator>
              |  <procedure declarator>
              |  <function declarator>

Each declaration in a block binds an identifier to an entity By default, all bindings in a block are public, that is, potentially visible outside the block, but bindings may be marked private, preventing access from anywhere outside the block where they are declared. Bindings marked restricted are visible only in the block where they are declared and in blocks that extend that block. There are many kinds of entities, each of which has an associated declaration.

There may be only one complete declaration for any particular identifier in any particular block. Incomplete declarations include forward types and subroutines.

At the end of the declaration, the newly declared identifier is fully defined and the entity to which it is bound may be used without restriction. Within the entity being declared and between a forward declaration and its completion, use of the declared identitier is restricted. Specifically, the only attributes of the associated entity that are visible are those already given by the text up to the point of use. This means that, within the declaration of a record type, no variables of that type may be allocated, but pointer types referencing that record type are legal. Recursive function calls are legal once the end of the formal parameter list (if any) has been reached, and within the declaration of a variable of an anonymous record type, it is legal to refer to already defined fields of that variable.

Note that forward declarations and their completions must have the same visibility. That is, if the forward declaration is marked private or restricted, the completion must be marked identically.

Constant Declarators

<constant declarator> ::= "const" <expression>
                       |  "final" <expression>

A constant declarator gives a name to the value of an expression. The type of the constant is determined by and is the same as the type of the expression.

In the case of const, the expression must have a constant value, which is to say, all the operands must be constants and the operators must be computable at compile time.

In the case of final, arbitrary expressions are permitted. The expression is evaluated at run time and the resulting value is stored as a read-only variable. No assignments are permitted to that variable, but if the variable is a pointer. or contains fields that are pointers, the referenced objects may be changed.

Type Declarators

<type declarator> ::= "type" <type>
                   |  "type" "-"

A type declarator creates a new type or renames an existing type. The latter occurs when the <type> simply names an existing type.

The special case marked with a minus sign (-) is referred to as a forward declaration; this indicates that a type will be more fully defined later, allowing the programmer to declare a type without completing its definition. This is necessary for mutually recursive declarations such as:

a: type -
b: type @a
a: type @b

This delares a pair of mutually recursive pointer types a and b so that an instance of either can be used to point to an instance of the other.

Exception Declarators

<exception declarator> ::= "exception"

An exception declarator creates a new exception, distinct from all other exceptions used in the program. Each exception can be thought of as a push-down stack of exception handlers to which control will be passed when that exception is raised.

Variable Declarators

<variable declarator> ::= "var" <type>

A variable declarator creates a new variable, allocating sufficient memory to the current block to hold a value of the indicated type. The elaboration of a variable declaratoror constructs a variable of the indicated type and, in the case of record types, runs its initializer (if any).

If a forward declaration was made for the type of the variable, the complete declaration must appear before any variables of that type are declared. Note, however, that it is legal to declare a variable that is or that contains pointers to forward types.

Procedure and Function Declarators

<procedure declarator> ::= "procedure"
                           [ <formal parameter list> ]
                           <body>
<function declarator> ::=  "function" <type>
                           [ <formal parameter list> ]
                           <body>

<body> ::= <block> "end"
        |  "-"
        |  "external"

<formal parameter list> ::= "(" <formal parameters> ")"
                         |  "[" <formal parameters> "]"
                         |  "{" <formal parameters> "}"
<formal parameters> ::= <parameter> { [ "," ] <parameter> }
<parameter> ::= <identifier> ":" <parameter declarer>
<parameter declarer> ::= <type>
                      |  "var" <type>
                      |  "final" <type>

A subroutine declarator introduces a new block that inherits from the enclosing block. Procedures are subroutines that do not return values, while functions are subroutines that must return a value; that is, The outermost block of a function must directly contain at least one statement that specifies the return value. Aside from return statements, several other compound statements may specify a return value by directly containing return statements in each of their subsidiary blocks: An if statement, a Case statement, a do-end statement, an until loop or an exception handler. This rule is recursive, and the net result is that each possible path from entry to the function body to exit from the function must contain at least one return statement.

In effect, the subroutine name identifies a new type, comparable to a record type, where the only permitted way to create an instance of that type is to call the subroutine. The declarations in the formal parameter list, if present, add new bindings to this block. When a procedure is called, the record holding the parameters and variables of the block is allocated, typically on a stack, and the code for that block is executed before the record is abandoned and control returns to the caller.

All declarations local to a subroutine are effectively as if they were declared private, regardless of whether that keyword is used. This is because the record holding the block only exists during the execution of the body, so the additional access rights conferred by not marking the declarations as private are irrelevant.

Parameters qualified by the keyword var have semantics comparable to local variables with an initial value provided by the corresponding actual parameter. Parameters qualified by the keyword final have semantics comparable to final constants except that the value is provided by the corresponding actual parameter. In both cases, the actual parameter must be an expression that is assignment compatible with the given type.

If no qualifier is given, the actual parameter will be passed by reference. In that case, the actual parameter must refer to a variable, and the formal parameter will be a reference to the same variable. The type of the actual parameter must be the same as the type of the formal parameter, except:

A procedure call is a type of statement. Functions may be called only from within expressions. The type of the return value is given immediately following the keyword function.

A subroutine with an external body refers to a procedure or function, typically written in some other language such as C, which can be called using the standard calling sequence of the host system and which some kind of linker can bind to the object code produced by the Kestrel compiler. Not all C or C++ functions can be called from Kestrel programs; specifically, C functions such as scanf() and printf() that have variable numbers of parameters of uncertain types cannot be called.

The special case where the body is just minus sign (-) is referred to as a forward declaration. Calls to such a procedure or function are legal beyond the point of declaration because the forward declaration gives the formal parameter list, but the full body will need to be given in a later declaration. Forward declarations allow both for both polymorphism and mutually recursive procedures and functions.

Types

<type> ::= <reference>
        |  <enumeration>
        |  <subrange>
        |  <pointer>
        |  <array>
        |  <set>
        |  <record>

If a type is defined by a reference, this must reference a type, either directly, as with a type identifier such as boolean, or indirectly, for example, a.t, where the identifier a is bound to a block instance and t is an identifier publically declared in that block as a type. Reference to an already defined type does not define a new type, it just creates a new name for an existing type.

For many types, there is a well defined concept of base type. For example, the subrange types 1..10 and -5..0 are both subranges of the integer base type. Kestrel supports multiple incompatible base types. For example, while the enumeration enum(false,true) may use the integer values zero and one to represent the two constants, this enumeration is not an integer type.

New types may be constructed using any of the following mechanisms:

Enumeration Types

<enumeration> ::= "enum " "(" <identifier> { [ "," ] <identifier>} ")"
               |  "enum" "[" <identifier> { [ "," ] <identifier>} "]"
               |  "enum" "{" <identifier> { [ "," ] <identifier>} "}"

Each enumeration type is a new scalar base type and is not interchangable with any preexisting type. The identifiers listed in the enumeration are implicitly declared in the immediately enclosing block as the constants of this new type. An ordering relationship is established over the enumeration such that the first constant is less than the second, which is less than the third, and so on. The difference between successive members of the same enumeration is one.

For example, if a block contains this declaration:

color: type enum(red, blue, green)

The new sclar type color is declared in this block. The actual integer values used to represent color.min and color.max are likely to be 0 and 2, but the only thing Kestrel guarantees is that the difference color.max-color.min is 2. Additionally, the above declaration implicitly declares three constants as if the following code had been written:

red: const color.min
blue: const red+1
green: const red+2

The visibility of elements of an enumeration will be the same as the visibility of the declaration in which it is embedded. Thus, members of an enumeration declared within private declarations are private, and members declared within a restricted declaration are restricted.

Subrange Types

<subrange> ::= <expression> ".." <expression>

Subrange types may be defined over any scalar type. A subrange establishes upper and lower bounds on values of that subrange type. The two expressions giving the bounds of the subrange must have constant scalar values, where the values are defined over the same base type, and the first value must be less than or equal to the second. A value is in a particular subrange if it is greater than or equal to the first or lower bound and less than or equal to the second or upper bound.

All practical integer types are subranges of the unnamed abstract integer type, which has an infinite range of values. The predefined type char can be considered to be a subrange defined over the alphabet, from NUL to NUL+255 (for full UTF-8 support). The predefined type ASCII is a subrange of char running from NUL to DEL.

In the context of the enumeration color given above, the subrange red..blue allows just two values (it excludes green), and red..red allows just one value. An optimizing compiler could avoid allocating any memory for variables of this latter type, since only one value is possible.

Note that the grammar given above is misleading. Every <reference> is a syntactically valid <expression>. Therefore, depending on the parsing algorithm being used, it may be difficult to distinguish between <reference> on the one hand and <expression>..<expression> on the other. The following alternative formulation of the grammar eliminates this issue:

<type> ::= <expression> [ ".." <expression> ]

With this formulation, it is a subrange type when the ".." operator is present, in which case the two expressions are evaluated to give the subrange bounds. If the ".." and second expressions are missing, the first expression must be a simple reference that identifies a previously defined type.

Pointer Types

<pointer> ::= "@" <type>

Pointer types give the memory address of an object of the indicated type. Although memory addresses may be represented by integers, the only comparing operators permitted on objects of a pointer type is comparison for equality or inequality, which determines whether or not two pointers refer to the same object.

Array Types

<array> ::= "array" <type> [ "of" ] <type>
         |  "array" "of" <type>

Array types require an index type, the type of the array index, and an element type, the type of each array element. The index type must be a finite scalar type. The range of values of the index type indicates the number of instances of the element that will be allocated when an instance of the array is created.

The Index type must be present in order to create an instance of an array. Arrays with an unspecified index type may only be used in the context of pointer types or formal parameters passed by reference (see Procedure and Function Declarations).

Set Types

<set> ::= "set" [ "of" ] <type>

Each set type represents a set of members of some specific base type. The base type must be a finite scalar type. For example, if the type boolset has been declared as set of boolean, then the values boolset (the empty set), boolset[false], boolset[true], and boolset[true,false]. (This example uses constructor notation.)

In general, if the base type has n values in its range, set types defined over that base type will have 2n possible values in their range and will occupy n bits of memory, with each bit used to indicate the presence or absence of some member of the set. Compilers should generally support set of char, and programmers should be aware that the storage requirements for larger set variables can be prohibitive even where they are supported.

Record Types

<record> ::= "record" [ "+" <reference> ] <block> "end"

When a variable declarator is elaborated to create an instance of a record type, declarations in the block, including parameter declarations, are elaborated and the code of that block is executed to initialize the new record object. The persistence of the resulting record depends on where the record was allocated. Note that record types may include subroutines as components; these may be viewed as methods when applied to instances of that type. As a result, using the terminology commonly used for object-oriented programs, record types serve as classes and instances of record types are objects.

A record may extend another record type. In this case, the form record+base is used to introduce the type, where base is a reference that identifies the record type to be extended. This is analogous to a subclass. That is, the new record type inherits access to all of the non-private fields of the base record; access to private fields is possible via public subroutines of the base type.

This inheritance mechanism allows the creation of polymorphic types. It is illegal to create an instance of a record type that includes procedures or functions that have been declared as forward but not yet given bodies. Such record types are useful, however, because different extensions of the record type may give different bodies for the forward procedure or function.

A record type with an empty bock, for example, record end, is a void type. Instances of void records occupy no memory and have no useful value, but they are legal as they may serve as useful stubs during the software development process.

Statements

<statement> ::= <do end>
             |  <if>
             |  <case>
             |  <loop>
             |  <exception handler>
             |  <raise>
             |  <procedure call>
             |  <assignment>
             |  <return>

Statements direct the computation. Conditional, selection and iteration statements determine which computations are performed, while procedure calls direct the execution of the block of code associated with a procedure and assignment statements direct changes to the values of variables.

Do-end Statements

<do end> ::= "do"
                 <block>
             "end"

Do-end statements allow an entire block to be embedded in another block as a single statement. The new block inherits all definitions from the enclosing block, including those marked private or restricted. Execution of the inner block proceeds by first allocating storage for any variables declared in that block, and then elaborating the declarations and executing the statments of the block.

If Statements

<if> ::= "if" <expression> [ "then" ]
             <block>
       [ "else"
             <block> ]
         "end"

If statements begin by evaluating the controlling expression. This expression must have a boolean value. If the expression evaluates to true, the block in the then clause is executed. If the expression evaluates to false, the block in the else clause is executed, if present. The relationship between the subsidiary blocks of the if statement and enclosing block is the same as it is in the do-end statement.

Consider this if statement:

if e
    s1
else
    s2
end

This can be rewritten as a case statement as follows:

select e
case true
    s1
case false
    s2
end

Case Statements

<case> ::= "select" <expression> [ "in" ]
       { "case" <case label> { [ "," ] <case label> } ":"
             <block> }
       [ "else"
             <block> ]
         "end"

<case label> ::= <expression> [ ".." <expression> ]

Case statements begin by evaluating the selection expression. This expression must have a scalar value. If the value of the expression matches one of the case labels, the labeled block is executed with the same semantics as apply to the then block of an if statement. If no case label matches and if there is an else clause, the block in the else clause is executed. The relationship between the subsidiary blocks of the case statement and enclosing block is the same as it is in the do-end statement.

The expression or expressions making up a case label must be constant values and their types must be compatible with the type of the controlling expression. Furthermore, no two case labels under one select clause may have the same value, nor may the ranges overlap.

Consider this case statement:

select e
case c
    s1
case d
    s2
end

A compiler could compile this as a cascade of if statements as if the following code had been given:

if e = c
    s1
else
    if e = d
        s2
    end
end

Compiling case statements as if cascades may be optimal if there are only a few cases. If there are more cases, the preferred implementation of a case statement uses an indirect indexed jump after range-checking on the expression value. The range check is only needed if the range of possible values of the expression exceeds the range of case labels. As such, select-case statements should be a very fast way to select between large numbers of alternatives, but may make inefficient use of memory if the range is sparse.

Loop Statements

<loop> ::= <while loop>
         |  <until loop>
         |  <for loop>

<while loop> ::= "while" <expression> [ "do" ]
                     <block>
                 "end"

<until loop> ::= "do"
                     <block>
                 "until" <expression>

<for loop> ::= "for" <identifier> "in" <type> [ "do" ]
                   <block>
               "end"

The relationship between the block making up the loop body and the enclosing block is the same as it is in the do-end statement. The expressions controlling while and until loops must have boolean values. The while loop tests the value of the expression before each iteration of the block making up the loop body, and terminates as soon as this expression is false. The until loop evaluates the expression after each iteration of the loop body and terminates the loop as soon as this expression is true.

Note that the do-end statement can be considered to be a degenerate form of loop with the end keyword serving as an abbreviation for until true.

Kestrel exception handlers can be used to achieve the effect of an efficient exit from mid loop in cases where that is required.

For loops implicitly declare the identifier as a final constant within the body of the iterated block. During each iteration of the block, successive values from the range of the variable's type are assigned to this constant. For loops can be rewritten in terms of while loops as follows. Consider the following example:

for i in t
    body
end

This can be rewritten as follows:

do
    hidden: var t
    hidden = t.min
    break: exception
    catch break in
        while true
            i: final hidden;
            body
            if hidden = t.max raise break
            hidden = hidden + 1;
        end
    end
end

In the above, hidden is a variable name that does not occur anywhere in the loop body. Compilers should not allocate hidden and i as distinct variables; rather, hidden and i should be aliases for the same storage location.

Exception Handlers

<exception handler> ::= "catch" <exception list> "in"
                            <block>
                      { "case" <exception list> ":"
                            <block> }
                      [ "else"
                            <block> ]
                        "end"
<exception list> ::= <reference> { [ "," ] <reference> }

The catch block is attempted, and if it executes to completion without raising an exception, the case and else blocks are ignored. The relationship between the subsidiary blocks of a catch block and the enclosing block is the same as it is in the do-end statement. If an exception is raised within the catch block, or within any code called from within the that block, execution of that code is abandoned. In effect, the exception list for the catch block installs handlers for each of the named exceptions, saving the previously installed handler which is restored on exit from the catch block. This occurs whether the exit from the catch block is normal or by exception.

Each case in the exception handler statement introduces a handler with references to one or more distinct exceptions. This defines a handler that will be installed to handle the named exceptions during execution of the catch block. Each exception listed in one of the cases must also be listed in the catch exception list, and no exception may be listed in more than one case. When an exception transfers control to the handler defined by one of the cases, all of the handlers installed for that catch block are

The else clause is equivalent to a case for all of the other exceptions listed in the exception list following the keyword catch. If there is no else block and one of the exceptions in the catch list is raised, control transfers to the end of the exception handler statement.

If an exception is raised in the catch block or any code called from that block that is not in catch list, the block containing the exception handler statement itself is abandoned.

If installing a handler for a particular exception is thought of as pushing that handler onto that exception, then the action on exit from the catch block can be thought of as popping that exception. In practice, exception blocks are more likely to allocate local variables used to save and restore the prior handlers.

If the compiler can prove that the only place an exception is raised is local to the body of the catch block, then rasing that exception is equivalent to a simple branch statement to the handler.

Raising Exceptions

<raise> ::= "raise" <reference>

The reference given in the raise statement must name a distinct exception. Execution of the raise statement terminates the execution of the curent block and transfers control to the currently active exception handler. If an exception is raised for which there is no currently installed handler, the program terminates in error.

Note that Kestrel's exception handling semantics allows the raise statement to compile to a simple branch instruction when an exception is raised in the catch-in block that installs the handler. When an exception is raised in a subroutine called from that block, a more complex implementation is required to deallocate the blocks associated with the called routines.

Procedure Calls

<procedure call> ::= <reference> [ <expression list> ]

The reference given in a procedure call must refer to a procedure and the expression list must provide actual parameters that conform with the formal parameter list. The expression list must be omitted if the procedure expects no parameters. Execution proceeds as follows:

Storage allocation for called procedures may be done incrementally, as declarations are elaborated.

Note that the grammar given above is misleading. A greedy parser will merge the expression list into the reference and consider the entire procedure call to be a reference. The distinction between a procedure call, a function call and an array reference cannot be determined by a context-free parser without reference to semantic attributes of identifiers.

Assignment Statements

<assignment> ::= <reference> "=" <expression>

The reference given on the left-hand side of an assignment statement must identify a variable. The expression on the right-hand side of the assignment statement must have a type that is compatible with the type of that variable.

For the purpose of assignment, a value is compatible with a variable if the two are of exactly the same type or:

ReturnStatements

<return> ::= "return" <expression>

The return statement specifies the return value of a function. In effect, the value of the expression is assigned to a location holding the return value of the function, so return statements are only legal within function declarations and the type of the expression must be compatable with the type of the function under the rules for .

Unlike the C programming language and its descendants, the return statement does not cause a control transfer, it only specifies the return value of a function.

Expressions

<expression> ::= <comparand> [ <comparing operator> <comparand> ]
<comparand> ::= <term> { <adding operator> <term> }
<term> ::= <factor> { <multiplying operator> <factor> }

Multiplying operators take precedence over adding operators, which take precedence over comparison operators. Aside from those three levels of precedence, operators are evaluated from left to right as a consequence of the grammar given above.

In every context where an expression or sub-expression is required, the context may constrain the expression to be of a particular type, for example, boolean, or it may be constrained to be scalar. In any event, the expression itself returns a value of a particular type determined by its operands.

Expressions where all of the factors are constants are constant-valued expressions and may be used to define constants or to set the bounds on subrange types. Expressions that have factors that involve references to variables or function calls do not qualify as constant valued even even in contexts where it can be proven that the variable or function in question will always have the same value.

Wherever an operand is an array of a single element, when the context requires the operand to be of the same type as the array element type, the array element will be used as an operand. Thus, while the string constant "a" is interpreted as a constant array of one character, it may be used in any context where a character constant is required, for example, in the expression "a"+1.

As a general rule, if evaluating any part of an expression has side effects, all parts of that expression are evaluated in strict left-to-right order and none are skipped. Tricks such as short-circuit evaluation, as used with the C || and && operators may still be used by compilers, but only if the compiler can determine that there are no side effects.

Comparing Operators

<comparing operator> ::= "="
                      |  "/="
                      |  ">"
                      |  ">="
                      |  "<"
                      |  "<="
                      |  "in" -- set membership

The comparing operators always give a boolean result, and the operands must be comparable. The /= operator is used to compare for inequality. (This avoids the strange use of ! to mean not in languages descended from C.)

Any two expressions of the same type are comparable for equality and inequality.

Pointers are equal if and only if they refer to the same object. Any pointer may be compared with null. Comparing two instances of the same record type for equality is equivalent to the logical and of equality tests on each of the fields of the record. Comparing two instances of the same array type is defined similarly.

Two scalar types are comparable if they have the same base type, for example, integer or a particular enumeration. If the subranges do not overalp, the result of a comparison is a foregone conclusion: They are unequal and the expression is considered to have the constant value false. Consider the following context:

x: var 1..3
y: var 2..4

In this context, the comparison x = y can be thought of as first widening both x and y to values in the range 1..4 and then comparing them.

Sets are comparable for equality if they are defined over the same type or over subranges of the same base type. In the latter case each operand will be widened to a new set type that is defined over a subrange that encloses the ranges of the two operands. If the subranges do not overlap, the operands can still be equal if both operands are empty. Consider the following context:

a: var set of 1..3
b: var set of 2..4

In this context, the comparison a = b first widens both a and b to be sets over 1..4 before comparing them.

When testing for set membership, as in a in b, the first operand a must be of a scalar type and the second operand b must be of a set type defined over a scalar type with the same base type as the type of a.

Adding Operators

<adding operator> ::= "+"
                   |  "-"
                   |  "|"

When adding or subtracting scalar values, the subrange type of the result is determined by the subranges of the operands. Consider the following context:

x: var 1..3
y: var 2..4

In this context, the sum x+y will be in the range 3..7 and x-y will be in the range -3..1. If the result is outside the bounds of the base type of the scalar, a range exception is raised.

Integer types are themselves scalar types, but the anonymous base type of all integer types has the special property that an integer may be added to a value of any scalar type to produce a result of a comparable scalar type. Thus, for example, 1+1=2 and "a"+1="b". When adding to a subrange type, the bounds on the subrange of the result are the sums of the bounds on the subranges of the operands.

Two operands of comparable scalar types may be subtracted to produce an integer result, and an integer may be subtracted from a scalar to produce a result of the same scalar type. Thus, for example, "b"-"a"=1 and "b"-1="a". The bounds on the subrange of the result are determined from the bounds on the operands.

Adding two constant arrays of the same element type creates a new constant array of that element type, extending the upper bound of the left-hand array by the contents of the right-hand array. Adding a constant array to a constant or constant-valued expression of the array's element type creates a new constant array extended with that element.

Because string constants are constant arrays of type char, this means that "Str"+"ing" has the same value as "String", and it means that "String"+LF+NUL is equivalent to "String\n" in a C-like languages. Where "String" is an array of characters with index type 0..5, "String"+LF+NUL has index type 0..7.

The logical or operator | applies to boolean operands. Smart compilers may use short-circuit evaluation (as with the C || operator) only when the second operand has no side effects.

The logical or operator | applies to set operands. In this case, it computes the union of the two sets. The operands must be sets over types that have the same base type, and as with set comparison, the operands may be widened before the union is taken. Consider the following context:

a: var set of 1..3
b: var set of 2..4

In this context, the union a|b is computed by first widening both operands to be of type set of 1..4; this is the result type.

Multiplying Operators

<multiplying operator> ::= "*"  -- multiplication
                        |  "/"  -- division
                        |  "%"  -- remainder
                        |  "&" -- logical and

Integers may be multiplied to produce an integer product. The subrange of the result depends on the subranges of the operands. If the result is outside the bounds of the anonymous base integer type, a range exception is raised. Consider the following context:

x: var 1..3
y: var 2..4

In this context, the product x*y will be in the range 2..12. If the result is outside the bounds of the base integer type, a range exception is raised.

Integers may be divided to produce an integer quotient. The subrange of the result depends on the subranges of the operands. If the result is outside the bounds of the base integer type, a range exception is raised. Given the context above, the quotient x/y will be in the range 0..1 because 1/4 rounds to 0 and 3/2 rounds to 1.

Integer division involving negative numbers is problematic because it cannot be the case that all of the following are true:

  1. a = b * (a/b) + a%b
  2. [-(a/b) = a/(-b)] & [-(a/b) = (-a)/b]
  3. {[b > a%b] & [a%b >= 0]} | {[b < a%b] & [a%b <= 0]}

Assertion 1 above relates the remainder and quotient. Assertion 2 suggests that, when signed operands are involved, we can divide the absolute values of the operands and then adjusting the sign of the result. Assertion 3 suggests that the sign of the remainder will be the same as the sign of the divisor. Here, we accept assertions 1 and 3 as defining the remainder and quotient after division. Formally, this is called truncated division.

The logical and operator & applies to boolean operands. Smart compilers may use short-circuit evaluation (as with the C && operator) only when the second operand has no side effects.

The logical and operator & applies to set operands. In this case, it computes the disjunction of the two sets. The operands must be sets over types that have the same base type. If the operands are sets over different subranges, the operands will be narrowed before the union is taken. Consider the following context:

a: var set of 1..3
b: var set of 2..4

In this context, the disjunction a&b is computed by first narrowing both operands to be of type set of 2..3; this is the result type. Obviously, if the ranges of the two sets are disjoint, the result will always be the empty set over the base of the scalar types of the operands. If either operand has side effects, those operands will be evaluated even if it can be statically proven that the result will be the empty set.

Factors

<factor> ::= [ "-" | "~" ] <value>
<value> ::= <number>
         |  <string constant>
         |  "null"
         |  <subexpression>
         |  <reference>
<subexpression> ::= "(" <expression> ")"
                 |  "[" <expression> "]"
                 |  "{" <expression> "}"

For values with an integer base type, the - (minus sign) operator has the usual meaning, two's complement. The range of the result of applying the negation operator to an integer is computed simply by negating the bounds on the original range. Note that the two's complement binary numbers in the range –2n–1 to 2n–1–1, the result will be in the range –(2n–1–1) to 2n–1; two's complement representations in the latter range require an extra bit, so the assignment i=-i, for example, may produce a value that is out of bounds, raising a range exception.

The unary negation operator ~ applies to boolean operands, where it is the logical not operator.

The unary negation operator ~ applies to values that are instances of a set type, where it is the set complement operator. The result is of the same set type as the operand, but all members in the defining subrange that were present in the operand are absent in the result, and visa versa.

The type of a numeric value is a subrange of the unnamed base integer type with bounds equal to the value. So, for example, the constant 4 is of type 4..4.

If the factor is a string constant or an array reference, where that array has only one element, and where the context requires a simple value of that element type, the array element is used. This eliminates the need for two distinct types of quotes, for example, single quotes for constants of type char and double quotes for strings, as in C.

The special value null is a distinguished constant value that is assignment compatible and comparable with all pointer types. It is designated by a reserved word so that it may not be redefined.

When a factor consists of a parenthesized subexpression, the value of that subexpression is used as the value of the factor.

When a value consists of a reference, where that reference names a variable or constant, the value of that variable or constant is taken. Where the reference is to a function, the function is called and the return value is taken.

String Constants

<string constant> ::= <string>

String constants are constant arrays with elements of type char indexed by a subrange of integers bounded by zero and the size of the string. So, for example, the constant "Hello" is of type array 0..4 of char. Note that an array of one element may be used in any context where an object of the element's type is expected, so "a" may be used as a character constant as well as being an array of one character.

Note that the + adding operator can be used to append string constants to create new strings. This allows, for example "'"+'"' as a representation of the two character constant string containing both quotaton mark characters, and it allows inclusion of non-printing characters withing strings, such as "String"+NUL representing a null-terminated string.

References

<reference> ::= <identifier>
             |  <reference> "@"
             |  <reference> "." <identifier>
             |  <function call>
             |  <constructor call>
             |  <array reference>

References are akin to expressions, but they occur on both the left and right-hand sides of assignment statements. On the left-hand side of an assignment statement or when passing a reference parameter, the reference identifies the variable being assigned to or passed. In other contexts, references may name anything, including variables, types, constants, procedures or functions.

All references begin with an identifier that must be defined in some block, either the current block or one that this block extends or inherits from. The reference, at this point, refers to whatever that identifier is bound to.

A reference followed by an at-sign such as p@ is only legal in contexts where the initial reference p names a variable of a pointer type and the value of that variable is not null. In that case, the resulting reference refers to the memory location referenced by that pointer, and the type of the result is the type of the object stored in that location, a type that can be statically inferred from the declaration of the pointer type.

A reference followed by a dot such as v.f, is legal in contexts where the initial reference v refers to a variable of a record type. In this case, the identifier f must be declared in that type, either in the block giving the body or the record or by inheritance, and it may refer to any public declaration in that record.

When t refers to a scalar type, t.min and t.max give the minimum and maximum values for variables of that type. For example, int16.max can be used instead of 32767. and int16.min is -32768.

When v refers to an array, v.index is the index type of the array, both v.min and v.max give the lower and upper bounds on the array subscript. Note that v.min and v.max can be considered to be shorthand for v.index.min and v.index.max. This is particularly useful when dealing with conformant array parameters.

Function Calls

<function call> ::=  <reference> [ <expression list> ]

Where a reference to a procedure or function is followed by a expression list, the expression list serves as the actual parameter list for a call. For procedures, the value resulting from the reference is void. No further use can be made of it. For functions, the value is the return value of the function and the type is the type of that return value. In the event that this value is of a record, pointer or array type, additional reference operations are legal. Calls are executed as described under the heading Procedure Calls.

Note that the grammar given above is misleading. A pure context-free parser cannot distinguish between a procedure call, a function call, a constructor call, or an array reference. The distinction between these requires access to semantic attributes of the identifiers involved.

Constructor Calls

<constructor call> ::=  <reference> [ <expression list> ]

When a reference is followed by an expression list and the reference names a type, the reference is used to construct an instance of that type. If all of the expressions are constants, the instance is a constant. If any of the expressions are not constant, the instance is a final variable. That is to say, this form may not be used on the left-hand side of an assignment statement. The rules for the expressions depend on the type as follows:

Note that none of the above forms allow unsafe conversions, for example, conversion of an integer to a pointer or conversion of a pointer to one type into a pointer to another type.

Note that the grammar given above is misleading. A pure context-free parser cannot distinguish between a procedure call, a function call, a constructor call, or an array reference. The distinction between these requires access to semantic attributes of the identifiers involved.

Note that the distinction between a type reference, as required to create an instance of a type or to rename a type, and a constructor with an empty expression list is determined by context. Both are, syntactically, just a type name.

Array References

<array reference> ::=  <reference> <expression list>

When a reference is followed by an expression list and the reference names a variable of an array type, the reference is taken to refer to an element of that array. If the reference names a variable of type array, the expression list will be asked to deliver a single value of the index type of that array, and that value will be used to select an element of the array. In the event that the referenced array element is itself an array, additional elements are permitted in the expression list. As a result, a(x,y) is equivalent to a(x)(y).

Note that the grammar given above is misleading. A pure context-free parser cannot distinguish between a procedure call, a function call, a constructor call, or an array reference. The distinction between these requires access to semantic attributes of the identifiers involved.

Expression Lists

<expression list> ::= "(" <expressions> ")"
                   |  "[" <expressions> "]"
                   |  "{" <expressions> "}"
<expressions> ::= <expression> { [ "," ] <expression> }

An expression list following a reference to a procedure or function will construct the actual parameter list for a call to that procedure or function. In that case, the number of expressions found in it must match the number of formal parameters in the procedure or function declaration, and each of the successive actual parameters must be of a type compatible with the declared types of the formal parameters.

In the case of parameters passed by value, the actual parameter must be assignment compatible with the formal parameter. See Assignment

In the case of parameters passed by reference, the actual parameter must be of the same type as the formal parameter, or of a subtype, as follows:

An expression list following a reference to an array must begin with an expression that has a type that is assignment compatible with the array's index type. If there are additional expressions in the expression list, then the array's element type must itself be an array type and the next expression is taken as an index into the sub-arrays of the first array. As a result, given that a is declared as an array of arrays, a(i,j) is equivalent to a(i)(j).

The Standard Prologue

The following Kestrel code produces definitions that can be considered to be a standard prologue for every Kestrel program, although entering these definitions as code does not necessarily define values exactly equivalent to the predefined values.

range: exception;

boolean: type enum( false, true );

NUL:  const "@"-64; -- for these, character in quotes is
SOH:  const "A"-64; -- the control code used to type it.
STX:  const "B"-64;
ETX:  const "C"-64;
EOT:  const "D"-64;
ENQ:  const "E"-64;
ACK:  const "F"-64;
BEL:  const "G"-64;
BS :  const "H"-64;
TAB:  const "I"-64;
NUL:  const "J"-64;
VT :  const "K"-64;
FF :  const "L"-64;
CR :  const "M"-64;
SO :  const "N"-64;
SI :  const "O"-64;
DLE:  const "P"-64;
DC1:  const "Q"-64;
DC2:  const "R"-64;
DC3:  const "S"-64;
DC4:  const "T"-64;
NAK:  const "U"-64;
SYN:  const "V"-64;
ETB:  const "W"-64;
CAN:  const "X"-64;
EM :  const "Y"-64;
SUB:  const "Z"-64;
ESC:  const "["-64;
FS :  const "\"-64;
GS :  const "]"-64;
RS :  const "^"-64;
US :  const "_"-64;
DEL:  const NUL+127;

char: type NUL .. NUL+255;
ASCII: type NUL .. DEL;

-- integer types
int8:   type -128 .. 127;
uint8:  type 0 .. 255;
int16:  type -32768 .. 32767;
uint16: type 0 .. 65535;
int32:  type -16#80000000 .. 16#7FFFFFFF;
uint32: type 0 .. 16#FFFFFFFF;

file: type @ record end;

input:  var file;
output: var file;
errors: var file;

putchar:   procedure( c:char, f:file ) external;
getchar:   function char ( f:file ) external;
eof:       function boolean ( f:file ) external;
putstring: procedure( ref s:array of char, f:file ) external;
getstring: procedure( ref s:array of char, f:file ) external;

Note that NUL is not null. These may both be represented by the value zero, but they are of strictly different types. Note also that all of the character constants given above are defined by quoted character constants, which are themselves single element arrays of characters. Furthermore, the base type of these character constants is char, so the type char must already be implicitly defined and cannot be defined in terms of NUL as it is above. This is an example of why the above declarations cannot simply be read as some kind of prefix to every user program, but must be constructed internally by the compiler before it begins to process any textual input.

A similar warning applies to all of the definitions of the integer types. Each of these type definitions is given in terms of integer constants. The type of each of these integer constants is implicitly the underlying integer type, which must already be defined in order to process these definitions.

Note that putchar, getchar and eof correspond to fputc, fgetc and feof in the C standard library. The parameters have the same meanings and appear in the same order; a small stub may be required in the Kestrel support library to change the name of the function.

The putstring and getstring procedures correspond to fputs and fgets in the C standard library, but the calling sequence for passing arrays by reference in kestrel is not compatible with that of C, so the function will likely require an interface routine.