11. SMAL Installation and Maintenance
The SMAL32 assembler was originally written in Pascal with the intent that it be easily transportable. The current version is written in C, the result of passing the original through ptoc, a widely available Pascal to C converter, and then minimally editing the result to free it from the ptoc runtime library. The following sections introduce the problems that will be encountered in installing it on a new machine, and in removing machine independence in order to improve efficiency.
The interface between the assembler and the file system of the host machine will probably have to be changed if the assembler is moved out of the UNIX environment. In order for the USE directive to be properly executed, the assembler must run in an environment where it has direct access to the file system.
The global procedure "getfiles" encapsulates the problem of getting the name of the source file from the user and using it to synthesize the names of the object and listing files. These textual file names are stored in the global variables "infile", "outfile", and "objfile", which are used, respectively, to open the text files "inp", "output", and "obj". Further "inp" files are opened by the procedure "insert", which is called whenever a USE directive is encountered. The text files "inp[i]" can thus be considered to be a short stack of files used for input. If more levels of nested USE directives are to be allowed, this stack must be enlarged, using the defined constant "getlevels".
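As an illustration of the kind of name synthesis "getfiles" performs, the following is a minimal sketch and not the assembler's own code. The helper name "derive_name", the limit "NAMELEN", and the ".o" and ".l" extensions used in the usage note are assumptions made for the example; the real conventions should be taken from the source of "getfiles".

```c
#include <stdio.h>
#include <string.h>

#define NAMELEN 256   /* hypothetical limit on file name length */

/* Copy src into dst, replacing any extension with ext (for example ".o").
   Returns 0 on success, or -1 if the result would not fit in dstsize.
   This helper is hypothetical; it is not part of the assembler. */
static int derive_name(char *dst, size_t dstsize,
                       const char *src, const char *ext)
{
    const char *slash = strrchr(src, '/');   /* UNIX path separator */
    const char *dot = strrchr(src, '.');
    size_t stem;

    /* A dot marks an extension only if it follows the last slash and
       is not the first character of the file name itself. */
    if (dot == NULL || dot == src
        || (slash != NULL && dot <= slash + 1)) {
        stem = strlen(src);
    } else {
        stem = (size_t)(dot - src);
    }

    if (stem + strlen(ext) + 1 > dstsize) {
        return -1;                           /* would overflow dst */
    }
    memcpy(dst, src, stem);
    strcpy(dst + stem, ext);
    return 0;
}
```

Under these assumptions, a source name such as "prog.a" would yield "prog.o" for the object file and "prog.l" for the listing file. Outside the UNIX environment, the path separator and the extension rules are the parts most likely to need changing.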
All matters concerning listing generation are encapsulated within the procedures "listline" and "newpage"; these will require changes if the system being used does not respond to an ASCII FF character to initiate a formfeed in printed listings. Typical changes might include the introduction of FORTRAN style carriage control characters at the start of each listing line, or the printing of sequences of blank lines instead of special page feed characters.
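The blank-line alternative mentioned above can be sketched as follows. This is an illustrative stand-in for the page-advance logic, not the body of "newpage"; the names "page_break", "pagestyle", and "PAGELINES", and the figure of 66 lines per page, are assumptions.

```c
#include <stdio.h>

/* Hypothetical choice of page-advance mechanism. */
enum pagestyle { PAGE_FF, PAGE_BLANKS };

#define PAGELINES 66   /* assumed physical lines per page */

/* Advance the listing to the top of the next page.  lines_used is the
   number of lines already printed on the current page.  Returns the
   number of characters written. */
static int page_break(FILE *f, enum pagestyle style, int lines_used)
{
    int emitted = 0;

    if (style == PAGE_FF) {
        fputc('\f', f);              /* ASCII FF, for devices that honor it */
        emitted = 1;
    } else {
        /* pad the remainder of the page with blank lines */
        for (; lines_used < PAGELINES; lines_used++) {
            fputc('\n', f);
            emitted++;
        }
    }
    return emitted;
}
```

The blank-line variant requires the listing code to keep an accurate count of lines printed on the current page, which is one reason to keep all such logic confined to "listline" and "newpage".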
In addition to converting the SMAL32 assembler to use the appropriate file system, the installer must specify the sizes of the various static data structures. These are all set with defined constants, but the values of the constants in the machine independent version are all set low enough that the static data structures are easily filled for testing purposes. The constants "symsize", "opcodes", and "linelen" may be set as large as desired with no change to other parts of the assembler. The constant "poolsize" is limited to 32767 by the code for "pushint" and "popint", which are used to push and pop pointers into the string pool on the stack within the pool. A number of other constants can be modified to do such things as change the listing format or change the number of allowed macro parameters. Limits on the values of these constants are documented in comments on their declarations.
In some environments, it may be reasonable to add an EBCDIC directive and a supporting translation array in order to allow use of the EBCDIC character set. When this is done, quoted strings should retain their ASCII interpretation as values unless some special new quote mark, such as `, is used to indicate an EBCDIC interpretation.
The machine independent version of the SMAL32 assembler was written to be easily moved to a new machine, but this does not imply that it will run efficiently in that environment. In fact, transportability and efficiency are in direct conflict, especially in the areas of bit packing and integer arithmetic. If efficiency becomes an issue, the following steps outline the order in which different changes to the assembler should be made to remove machine independence and speed things up:
The lexical analysis routine for parsing numeric constants has a significant amount of cumbersome code to allow it to handle unsigned 32 bit values on a machine using a signed 32 bit representation. If the SMAL assembler is run on a machine with a larger word size or if it is run on a machine where arithmetic overflow is not checked, this code can be significantly simplified.
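To suggest what the simplified code might look like when an unsigned 32 bit type is available, here is a minimal sketch. It is not the assembler's lexical analysis routine: the function name "parse_u32" and its interface are assumptions, and it ignores SMAL's own notation for radixes.

```c
#include <stdint.h>

/* Accumulate digits of the given radix (2 to 16) from s into *value.
   Returns the number of characters consumed, or -1 if the value does
   not fit in 32 bits.  Hypothetical helper, not taken from SMAL32. */
static int parse_u32(const char *s, unsigned radix, uint32_t *value)
{
    uint32_t acc = 0;
    int n = 0;

    for (;; n++) {
        char c = s[n];
        unsigned d;

        if (c >= '0' && c <= '9') {
            d = (unsigned)(c - '0');
        } else if (c >= 'A' && c <= 'F') {
            d = (unsigned)(c - 'A') + 10;
        } else {
            break;                   /* not a digit at all */
        }
        if (d >= radix) {
            break;                   /* not a digit in this radix */
        }
        /* acc * radix + d fits exactly when acc <= (MAX - d) / radix */
        if (acc > (UINT32_MAX - d) / radix) {
            return -1;
        }
        acc = acc * radix + d;
    }
    *value = acc;
    return n;
}
```

With an unsigned type, the whole range check reduces to a single comparison before each multiply, which is the kind of simplification the paragraph above refers to.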
A key cause of inefficiency in the SMAL32 assembler is the implementation of 32 bit two's complement arithmetic in a manner which will run consistently on any machine which supports the desired range of integer values. The routines responsible for this are grouped at the head of the assembler and may easily be changed to use external routines coded in machine code. On versions of C supporting unchecked 32 bit arithmetic, it may be sufficient to replace "add(a,b)" with "a+b". Although this change will have a significant effect on the speed of these operations, they may not be used frequently enough to justify the change.
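The cost of the portable approach can be seen in a sketch of a wrapping add that never lets an intermediate result leave the 32 bit range. This is an illustration of the technique only; the assembler's own "add" may be written differently, and the constant names "MAXINT32" and "MININT32" are assumptions.

```c
#define MAXINT32 2147483647L
#define MININT32 (-MAXINT32 - 1L)

/* Return a + b wrapped to 32 bit two's complement.  Both arguments are
   assumed to lie in MININT32..MAXINT32.  No intermediate value leaves
   that range, so the code is safe even where overflow is checked. */
static long add(long a, long b)
{
    if (a > 0 && b > 0) {
        if (a > MAXINT32 - b) {
            /* true sum exceeds MAXINT32: subtract 2^32, half from
               each operand, before adding */
            return (a - MAXINT32 - 1L) + (b - MAXINT32 - 1L);
        }
    } else if (a < 0 && b < 0) {
        if (a < MININT32 - b) {
            /* true sum is below MININT32: add 2^32, half to each
               operand, before adding */
            return (a + MAXINT32 + 1L) + (b + MAXINT32 + 1L);
        }
    }
    return a + b;                    /* no overflow is possible here */
}
```

Every call pays for the sign tests and range comparisons, where an unchecked machine add would be a single instruction.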
To add additional directives or machine instruction symbolic opcodes to the assembler, note the following:
a) Each symbolic opcode or directive for which there is a distinct assembly rule has its own opcode type that must be included in the enumerated type optypes.
b) Each symbolic opcode or directive must be included in the opcode table; this is done by including a line in the procedure opinit calling the procedure op to make the definition and bind the symbolic name to an opcode type and an opcode value.
c) For each opcode type, there must be an entry in the case statement in procedure onepass to parse the operand field of opcodes of that type and generate appropriate object code. The code generated may depend on values in the operand field and on the value associated with the opcode itself.
d) Operand field parsing is supported by the procedures expresbal to parse an expression and force balanced parentheses, getcomma to read and skip a required comma, and nextlex to read the next lexeme.
e) Object code is generated by calls to procedure putobj, specifying the format of the value being generated, the absolute value to be generated, and the relocation base to be used (both of the latter are provided by the expression parsing routines). Currently only one format is supported, word. If new formats are added, it will be necessary to modify both putobj and the listing routine listline to handle the new formats; in each case, there are case statements on the format.
A typical change might involve adding an instruction to assemble a word consisting of two half-words, each of which may be separately relocated. This would involve adding a new directive, perhaps H with two comma separated expressions as operands. The object code generated by H would itself be an H directive (in the same way that the object code generated by W is a W directive), but with less symbolic content in the operands. This would require adding a new format to putobj and listline, and it would require adding modifications to the loader to support the new directive.
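Steps a) through c) for the H directive can be pictured with the miniature below. It is a sketch only: the names optypes, op, and opinit come from the text above, but every declaration, signature, and value shown here is an assumption, as are the helpers "lookup" and "operands_for" and the claim that W takes a single expression. The real declarations in the assembler should be consulted before making the change.

```c
#include <string.h>

/* a) each distinct assembly rule has its own opcode type */
enum optypes { opw, oph /* new: word made of two half-words */ };

/* Hypothetical opcode table entry and table. */
struct opentry {
    const char *name;
    enum optypes typ;
    long val;
};

#define OPCODES 8                    /* assumed table size */
static struct opentry optab[OPCODES];
static int opcount = 0;

/* Bind a symbolic name to an opcode type and value (assumed signature). */
static void op(const char *name, enum optypes typ, long val)
{
    if (opcount < OPCODES) {
        optab[opcount].name = name;
        optab[opcount].typ = typ;
        optab[opcount].val = val;
        opcount++;
    }
}

/* b) one added line in opinit per new symbol */
static void opinit(void)
{
    opcount = 0;
    op("W", opw, 0L);
    op("H", oph, 0L);                /* the new directive */
}

/* Hypothetical linear search of the table. */
static const struct opentry *lookup(const char *name)
{
    int i;
    for (i = 0; i < opcount; i++) {
        if (strcmp(optab[i].name, name) == 0) {
            return &optab[i];
        }
    }
    return NULL;
}

/* c) stand-in for the case statement in onepass: reports how many
   comma separated expressions each opcode type would parse */
static int operands_for(enum optypes typ)
{
    switch (typ) {
    case opw: return 1;              /* assumed: W takes one expression */
    case oph: return 2;              /* H takes two, one per half-word */
    }
    return -1;
}
```

In the real assembler, the oph case would call expresbal, getcomma, and expresbal again, and then pass both values and relocation bases to putobj using the new format.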