11. SMAL Installation and Maintenance

Part of the SMAL Manual
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Index

General
Installation
Efficiency
Adding Operations

11.1. General

The SMAL32 assembler was originally written in Pascal with the intent that it be easily transportable. The current version is written in C, the result of passing the original through ptoc, a widely available Pascal to C converter. The code was then minimally edited to free it from the ptoc runtime library, and further edited in the years that followed to add new features; the revision history at the head of the source file records these changes. The following sections introduce the problems that will be encountered in installing the assembler on a new machine, and in removing machine independence to improve efficiency.

The SMAL32 assembler is a two-pass assembler in which the function onepass() is called twice, once for each pass. The assembler uses a top-down parsing model with one lexeme of lookahead; each call to nextlex() advances the lexical analyzer by one lexeme. There are two symbol tables: optab, which holds opcodes (including macro names), and symtab, which holds labels and other defined symbols. Both tables are hashed, and the actual text of identifiers and strings is stored in strpool, the string pool. The string pool and opcode table are initialized before the first call to onepass(). On the first encounter with any opcode name or symbol, new entries are added to the appropriate table and to the string pool. The string pool is filled with symbol text from the low end up to poolpos; the high end of the pool, down to poolsp, holds a stack that is used for macro actual parameters.
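As an illustration of the string pool layout just described, the following C sketch lets symbol text grow upward from index 0 while a stack grows downward from the top of the same array. It is not the assembler's code: the pool size, the helper names pool_addtext(), pool_push() and pool_pop(), and the choice of one character per stack entry are assumptions made for the example; only strpool, poolpos and poolsp are names taken from the text.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical pool size; the assembler uses the constant poolsize. */
#define POOLSIZE 64

static char strpool[POOLSIZE];
static int poolpos = 0;         /* next free slot at the low end (symbol text) */
static int poolsp = POOLSIZE;   /* top of the stack growing down from the high end */

/* Hypothetical helper: append symbol text at the low end.
   Returns the index where the text starts, or -1 if the pool is full. */
static int pool_addtext(const char *s)
{
    int len = (int)strlen(s);
    if (poolpos + len > poolsp)     /* text would run into the stack */
        return -1;
    memcpy(&strpool[poolpos], s, (size_t)len);
    int start = poolpos;
    poolpos += len;
    return start;
}

/* Hypothetical helpers: push and pop single characters on the stack at
   the high end, where macro actual parameters would be kept. */
static bool pool_push(char c)
{
    if (poolsp - 1 < poolpos)       /* stack would run into the symbol text */
        return false;
    strpool[--poolsp] = c;
    return true;
}

static bool pool_pop(char *c)
{
    if (poolsp >= POOLSIZE)         /* stack is empty */
        return false;
    *c = strpool[poolsp++];
    return true;
}
```

Because both regions share one array, the pool overflows only when they meet, so a program with many symbols and few macro parameters (or the reverse) can use the whole pool.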

11.2. Installation

The interface between the assembler and the file system of the host machine will probably have to be changed if the assembler is moved out of the Unix environment. In order for the USE directive (see Section 7.1) to be properly executed, the assembler must run in an environment where it has direct access to the file system.

The function getfiles() encapsulates the problem of getting the name of the source file from the user and using it to synthesize the names of the object and listing files. These textual file names are stored in the global variables infile[0], outfile, and objfile, which are used, respectively, to open the text files inp[0], output, and obj. The text files inp[1] to inp[3] are opened by insert(), which is called whenever a USE directive is encountered. The text files inp[i] can thus be considered a short stack of input files. If more levels of nested USE directives are to be allowed, this stack must be enlarged by increasing the defined constant maxgetl.
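The short stack of input files can be sketched as follows. The FILE * representation, the depth counter getlevel, the helper names use_file() and close_current(), and the value 4 for maxgetl (inp[0] plus three nested levels, matching inp[1] to inp[3] above) are assumptions for illustration; the assembler's own insert() differs in detail.

```c
#include <stdio.h>

#define MAXGETL 4   /* assumed: inp[0] plus three nested USE levels */

/* inp[0] holds the main source file, opened elsewhere. */
static FILE *inp[MAXGETL];
static int getlevel = 0;   /* hypothetical: index of the file being read */

/* Hypothetical counterpart of insert(): called for a USE directive,
   opens the named file one level deeper on the stack.
   Returns 0 on success, -1 if nesting is too deep or the open fails. */
static int use_file(const char *name)
{
    if (getlevel + 1 >= MAXGETL)
        return -1;                  /* too many nested USE directives */
    FILE *f = fopen(name, "r");
    if (f == NULL)
        return -1;                  /* file could not be opened */
    inp[++getlevel] = f;
    return 0;
}

/* Called at end of file on a nested level: close it and resume the file
   that contained the USE directive. Returns 0 when a level was popped,
   -1 when the main source file inp[0] itself has ended. */
static int close_current(void)
{
    if (getlevel == 0)
        return -1;
    fclose(inp[getlevel--]);
    return 0;
}
```

In this sketch, raising MAXGETL is the only change needed to permit deeper nesting.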

All matters concerning listing generation are encapsulated within listline() and newpage(); these routines will require changes if the system being used does not respond to an ASCII FF character by starting a new page in printed listings. Typical changes might include the introduction of Fortran-style carriage control characters at the start of each listing line, or the printing of sequences of blank lines instead of form feed characters.
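The alternative page-break strategies mentioned above might look like this in C. The routine names list_line() and list_newpage(), the style selector, and the 60-line page length are hypothetical; they are simplified stand-ins for listline() and newpage(), whose real arguments and behavior are not reproduced here.

```c
#include <stdio.h>

/* Hypothetical page-break strategies; only PAGE_FF matches the
   behavior described in the text. */
enum pagestyle { PAGE_FF, PAGE_FORTRAN, PAGE_BLANKS };

#define PAGELEN 60   /* assumed number of lines per printed page */

static enum pagestyle style = PAGE_FF;
static int lineonpage = 0;      /* lines written on the current page */
static int newpagepending = 0;  /* Fortran style: next line starts a page */

/* Simplified stand-in for listline(): write one listing line. */
static void list_line(FILE *out, const char *text)
{
    if (style == PAGE_FORTRAN) {
        /* column 1 carriage control: '1' = new page, ' ' = single space */
        putc(newpagepending ? '1' : ' ', out);
        newpagepending = 0;
    }
    fputs(text, out);
    putc('\n', out);
    lineonpage++;
}

/* Simplified stand-in for newpage(): start a new printed page. */
static void list_newpage(FILE *out)
{
    switch (style) {
    case PAGE_FF:
        putc('\f', out);                /* ASCII FF, form feed */
        break;
    case PAGE_FORTRAN:
        newpagepending = 1;             /* deferred to the next line */
        break;
    case PAGE_BLANKS:
        while (lineonpage < PAGELEN) {  /* pad out the current page */
            putc('\n', out);
            lineonpage++;
        }
        break;
    }
    lineonpage = 0;
}
```

Note that the Fortran-style variant cannot emit anything at the moment of the page break; it must wait and mark the first line of the new page.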

In addition to converting the SMAL32 assembler to use the appropriate file system, the installer must specify the sizes of the various static data structures. These are all set with defined constants, but in the machine independent version the constants are set low enough that the static data structures are easily filled for testing purposes. The constants symsize, opcodes, and linelen may be set as large as desired with no change to other parts of the assembler. The constant poolsize is limited to 32767 by the code for pushint() and popint(), which push and pop string pool pointers on the stack held within the pool itself. A number of other constants can be modified to do such things as change the listing format or change the number of allowed macro parameters. Limits on the values of these constants are documented in comments on their definitions.
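The 32767 limit is consistent with storing each pool pointer as two characters on the pool's stack. The sketch below shows one plausible encoding and is not the actual pushint() and popint(): it assumes the high byte is kept in the range 0 to 127, so that it would fit even in a signed character, which caps the stored value at 32767, the largest positive 16-bit two's complement integer. The pool size, the use of unsigned char, and the simplified full-stack test are assumptions.

```c
#include <stdbool.h>

#define POOLSIZE 32767   /* the documented upper limit for poolsize */

/* Only the stack end of the pool is modeled here. */
static unsigned char strpool[POOLSIZE];
static int poolsp = POOLSIZE;   /* stack grows down from the high end */

/* Hypothetical pushint(): store v as two characters, high byte on top.
   Keeping the high byte in 0..127 caps v at 32767. */
static bool pushint(int v)
{
    /* a real pool would compare poolsp against poolpos, not 2 */
    if (v < 0 || v > 32767 || poolsp < 2)
        return false;
    strpool[--poolsp] = (unsigned char)(v & 0xFF);  /* low byte */
    strpool[--poolsp] = (unsigned char)(v >> 8);    /* high byte, 0..127 */
    return true;
}

/* Hypothetical popint(): the inverse of pushint(). */
static bool popint(int *v)
{
    if (poolsp > POOLSIZE - 2)
        return false;               /* fewer than two characters on the stack */
    int hi = strpool[poolsp++];
    int lo = strpool[poolsp++];
    *v = (hi << 8) | lo;
    return true;
}
```

Since every index into the pool, including poolsp itself, must survive this round trip, the pool cannot be larger than the largest encodable value.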

In some environments, it may be reasonable to add an EBCDIC directive and a supporting translation array in order to allow use of the EBCDIC character set. When this is done, quoted strings should retain their ASCII interpretation as values unless some special new quote mark (perhaps `) is used to indicate an EBCDIC interpretation.
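A translation array of the kind suggested here could be initialized as below. The name toebcdic, the function init_ebcdic(), and the choice to fill in only space, digits, and letters are illustrative assumptions; the values used are those shared by common EBCDIC code pages such as 037, and a usable table would need an entry for every character the assembler accepts.

```c
#include <string.h>

/* Hypothetical ASCII-to-EBCDIC translation array for a proposed EBCDIC
   directive. Only space, digits and letters are filled in; everything
   else maps to 0x3F, the EBCDIC SUB (substitute) character. */
static unsigned char toebcdic[128];

static void init_ebcdic(void)
{
    memset(toebcdic, 0x3F, sizeof toebcdic);
    toebcdic[' '] = 0x40;
    for (int i = 0; i < 10; i++)
        toebcdic['0' + i] = (unsigned char)(0xF0 + i);
    for (int i = 0; i < 26; i++) {
        /* EBCDIC letters are not contiguous: A-I, J-R and S-Z start at
           0xC1, 0xD1 and 0xE2; lowercase is 0x40 lower than uppercase */
        int base = i < 9 ? 0xC1 : i < 18 ? 0xD1 - 9 : 0xE2 - 18;
        toebcdic['A' + i] = (unsigned char)(base + i);
        toebcdic['a' + i] = (unsigned char)(base + i - 0x40);
    }
}
```

The gaps between letter runs are one reason a table, rather than a simple offset, is needed for this translation.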

11.3. Efficiency

The machine independent version of the SMAL32 assembler was written to be easily moved to a new machine, but this does not imply that it will run efficiently in that environment. In fact, transportability and efficiency are in direct conflict, especially in the areas of bit packing and integer arithmetic. If efficiency becomes an issue, the following steps outline the order in which different changes to the assembler should be made to remove machine independence and speed things up:

The lexical analysis routine number(), used to parse numeric constants, contains a significant amount of cumbersome code that allows it to detect and report 32-bit overflow using a 32-bit unsigned arithmetic model that ignores overflow. If the SMAL assembler is moved to a machine with a larger word size, this code can be significantly simplified.
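The difference between the two arithmetic models can be shown with a single digit-accumulation step. Both helpers below are hypothetical and are not taken from number(): accum32() must test for overflow before multiplying, because 32-bit unsigned arithmetic silently wraps, while accum64() simply computes in 64 bits and checks the result afterwards. Each assumes that base is at least 2 and that digit is less than base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable style: only 32-bit unsigned arithmetic is available, so the
   test must be made before value * base + digit can wrap around.
   Leaves *value unchanged and returns false on overflow. */
static bool accum32(uint32_t *value, unsigned base, unsigned digit)
{
    if (*value > (UINT32_MAX - digit) / base)
        return false;
    *value = *value * base + digit;
    return true;
}

/* Wide-word style: compute in 64 bits, then check against 32 bits. */
static bool accum64(uint32_t *value, unsigned base, unsigned digit)
{
    uint64_t v = (uint64_t)*value * base + digit;
    if (v > UINT32_MAX)
        return false;
    *value = (uint32_t)v;
    return true;
}
```

The 64-bit version needs no division and no reasoning about wraparound, which is the kind of simplification a larger word size permits.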

11.4. Adding Operations

To add additional directives or machine instruction symbolic opcodes to the assembler, note the following:

a) Each symbolic opcode or directive for which there is a distinct assembly rule has its own opcode type that must be included in the enumerated type optypes. (Instructions that share the same assembly format may share an opcode type.)

b) Each symbolic opcode or directive must be included in the opcode table; this is done by including a line in opinit() calling the procedure op() to make the definition and bind the symbolic name to an opcode type and an opcode value.

c) For each opcode type, a case must be added to the case statement in onepass() that parses the operand field of opcodes of that type and generates the appropriate object code. The code generated may depend on values in the operand field and on the value associated with the opcode itself.

d) Operand field parsing is supported by expresbal() to parse an expression and force balanced parentheses, getcomma() to read and skip a required comma, and nextlex() to read the next lexeme.

e) Object code is generated by calls to putobj(), specifying the format of the value being generated, the absolute value to be generated, and the relocation base to be used (both of the latter are provided by the expression parsing routines). If new formats are added, it will be necessary to modify both putobj() and the listing routine listline() to handle the new formats; in each case, there are case statements depending on the format.
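The steps above can be sketched in miniature. Everything in this C example is a simplified, hypothetical stand-in: the opcode types, the format names, the linear table search (the real optab is hashed), the dispatch() function in place of the case statement in onepass(), and a putobj() that merely records its arguments. The operand is assumed to have been evaluated already, as expresbal() would do. The hypothetical instruction LDI shows object code that depends on both the opcode's value and the operand.

```c
#include <string.h>

/* a) Hypothetical opcode types: one per distinct assembly rule. */
enum optypes { opb, opw, opimm };

/* Hypothetical object formats understood by putobj(). */
enum formats { fbyte, fhalf, fword };

/* b) Opcode table entry binding a name to an opcode type and value. */
struct opentry {
    const char *name;
    enum optypes type;
    unsigned value;      /* value bound to the opcode, e.g. a machine opcode */
};

#define OPCODES 16
static struct opentry optab[OPCODES];
static int nops = 0;

static void op(const char *name, enum optypes type, unsigned value)
{
    if (nops < OPCODES)
        optab[nops++] = (struct opentry){ name, type, value };
}

static void opinit(void)
{
    op("B", opb, 0);
    op("W", opw, 0);
    op("LDI", opimm, 0x5A);   /* newly added instruction (hypothetical) */
}

/* e) Stand-in for putobj(): records its arguments instead of
   writing object code. */
static enum formats lastformat;
static unsigned lastvalue;
static int lastreloc;

static void putobj(enum formats format, unsigned value, int reloc)
{
    lastformat = format;
    lastvalue = value;
    lastreloc = reloc;
}

/* c) Stand-in for the case statement in onepass(). */
static void dispatch(const char *name, unsigned operand, int reloc)
{
    for (int i = 0; i < nops; i++) {
        if (strcmp(optab[i].name, name) != 0)
            continue;
        switch (optab[i].type) {
        case opb:
            putobj(fbyte, operand & 0xFFu, reloc);
            break;
        case opw:
            putobj(fword, operand, reloc);
            break;
        case opimm:
            /* opcode value in the high byte, operand in the low byte */
            putobj(fhalf, (optab[i].value << 8) | (operand & 0xFFu), reloc);
            break;
        }
        return;
    }
}
```

Adding a further instruction with an existing format touches only the enumeration, opinit(), and the switch; a new object format would also require changes to putobj() and to the listing code.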