Exam 1: Midterm

Solutions and Commentary

Part of the homework for 22C:60, Fall 2007
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Grade Distributions

Exam Distribution

Median = 4.6
Mean   = 4.49           X   X
              X X X     X   X
        X     X X X     X   X
 _______X_____X_X_X___X_X_X_X_____X_____X_________________________
   0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15

Machine Problem Distribution (MP1-MP2)

Mean   = 7.16                             X  
Median = 7.5                X X           X X
                        X   X X   X       X X
 ___X_________X_____X___X___X_X___X_X_X_X_X_X__
   0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10

Homework Distribution (HW1-HW4)

                                                  X
Mean   = 10.45                                    X
Median = 11.0                                 X   X
                                              X   X
                                              X   X X
                                        X     X X X X
 _________________________X_X_____X_____X_X___X_X_X_X_
   0 . 1 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12

Overall Distribution

Mean   = 22.09
Median = 23.0                               X     X 
                                  X       X X   X X X X   X
 _________________X_____________X_X_______X_X___X_X_X_X_X_X_X_____X____
   0 . 2 . 4 . 6 . 8 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32
                    - - D D + + - - C C + + - - B B + + - - A A + +

Exam and Solutions

  1. Background: Consider this shell script:
    #!/bin/tcsh
    #  examscript -- a shell script for the exam
    @ number = 1
    foreach parameter ($argv)
            echo $parameter > file$number
    	@ number = $number + 1
    end
    

    For the purpose of this question, consider a substantial shell command to be one that has effects that are observable from the outside world, either because they produce output or because they make changes that persist after the script completes.

    a) Ignoring commands pertaining to control structures, what substantial commands does this script execute if called with examscript a b c

    echo a > file1
    echo b > file2
    echo c > file3
    

    This was based on HW2 problem 1, and the shell script given in the problem statement is only slightly different from the solution distributed on the web. Many students had difficulty with the concept of "substantial" command as defined here.

    6 did well, 5 earned no credit, partial credit was available for merely underestanding that the echo command was the only substantial command, and there were small penalties for failure to get the number of iterations right (some wanted a zeroth iteration).

    b) What are the observable effects of these commands.

    Files named file1, file2 and file3 are created, containing the text a, b, and c, respectively.

    5 did well, 4 earned no credit. Half credit was available for those who thought the echo commands output the parameters to the screen. Partial credit was available for merely mentioning the creation of files.

  2. Background: Suppose we added program parameter type checking to Unix as an upward compatable extension. Object files are extended with a new type of segment, the parameter type segment. Launching a program from a file with such a segment causes the parameter types to be checked before launch, while launching a program that lacks this segment behaves just like current Unix.

    The simplest format for the parameter type segment would have it consist of a list of integers, one per parameter, where the integer gives the parameter type.

    A more complex format would check each parameter by matching it against a regular expression.

    a) Which of the following parameter types would cause problems for the simple system? A valid filename, the name of a data file, an integer, an option to the current command, a real number, a user-name. Explain, for each one that causes problems, the nature of the problem.

    The name of a data file could pose problems because it is difficult to determine if a file contains data, particularly if there is an application dependent notion of data type.

    Options to the current command cause a bigger problem, since ther set of possible legal option values depends on the program, and the set of programs is open ended.

    In sum, to go well beyond a minimum answer, the problem with the integer encoding scheme is that it only works for a closed finite set of program options. We can imagine an encoding like this:

    1. Integer number
    2. Real number
    3. Syntactically valid file name
    4. Name of an already existing file
    5. Syntactically valid user name
    6. Currently registered user name
    7. Arbitrary string

    This could be extended, for example, to include types for file names ending with specific extensions.

    Only 2 did well, while 14 earned no credit. Many students seemed to have no idea how to interepret the specification "where the integer gives the parameter type".

    b) Which of the following parameter types would cause problems for the regular-expression system? A valid filename, the name of a data file, an integer, an option to the current command, a real number, a user-name. Explain, for each one that causes problems, the nature of the problem.

    Regular expressions cannot determine if a file exists or if it is a data file.

    Regular expressions cannot determine if a potential user name is the name of a real user.

    In sum, to go beyond a minimum answer, regular expressions can only test the syntactic validity of the parameter, and as a syntax checking mechanism, they can only check that the parameter conforms to simple constraints.

    Here are some example regular expressions, using the Unix regular expression notation, with comments indicating what they recognize:

    • [0-9]+ strings of one or more digits (integers)
    • [0-9]+\.[0-9]+ numbers with a decimal point
    • [A-Za-z][0-9A-Za-z]* a letter followed by letters or digits
    • [0-9/A-Za-z]*.bin file names that end in .bin
    • -big|-normal|-small a specific list of literal options

    Only two did well here, 15 earned no credit. It appears that the biggest problem was lack of knowledge of what a regular expression is. Regular expressions are not just tricks in the Unix grep command, they are a fundamental concept in computer science. A literal string is a regular expression, two regular expressions concateated is a regular expression, a regular expression iterated zero or more times is a regular expression. The choice between two regular expressions is a regular expression.

    c) Suppose we combined the simple system (a finite set of pre-programmed checks) with the regular expresson scheme. What useful checks could this perform that could not be performed by either simple system alone?

    This combined scheme can both apply a finite set of programmed checks such as asking whether a user or file exists and also checking open-ended syntactic constraints. For example, checking not only that a particular file exists, but also that the name of the file ends with a particular (application dependent) suffix.

    To go beyond the minimal answer, this scheme greatly simplifies the finite list of programmed checks that may be associated with each parameter, since the regular expression mechanism can easily check such details as syntactic correctness of integers, real numbers or file names, eliminating the need for many of the special cases that would be required if there were just a finite list of parameter types.

    Only 1 got this right, while 15 earned no credit. The problems people had with parts a) and b) were obviously compounded in part c.

  3. Background: Consider a computer system where a call to prog=load(file) simply loads the contents of the indicated file into any block of memory large enough to hold that file and then returns the address of the program. If this is C code and the variable is declared correctly, the call prog(argc,argv) could be used to run the loaded program.

    a) What constraint is there on the addressing modes used within the loaded program for references to other parts of the loaded program?

    If the program is "simply loaded", eg, just read without going through a relocating loader, then all the addressing used within the loaded code to reference other parts of the loaded code must be position independent code using relative or self-relocating forms of addressing.

    8 earned full credit, 8 earned no credit. Performance here was disappointing, as many of the wrong answers suggested that the term addressing mode was unknown, yet a course on computer organization or architecture of some kind is a prerequisite to this course, and that is a standard topic. Also, several answers suggested that there was a problem with storing operands in the program segment, yet essentially all post 1970 computer architectures separate programs from data, storing data either on a heap or on the stack.

    b) Give a concise English description of the type of the variable prog. Note: A syntactically correct C declaration is not sufficient, since what is requested is an explanation of the type.

    The variable prog is a pointer to a function that returns void and takes two arguments, an integer and an array of pointers to char. Although not required, here is the C declaration.

    void (*prog) (int argc, char *argv[]);
    

    This definition is so bizarre that you really can't tell that it's defining prog and not something else. It is, however, well defined standard C and C++.

    4 got this right, 9 got it wrong, while a number of others made other errors. Declaring prog as simply a void pointer or a character pointer, for example, would not allow the function to be called.

  4. Background: Unix files are manipulated (primarily) with the read(), write() and seek() operations. For now, we will focus just on read(fd,buf,len), where fd is the integer number of an open file. buf is the address of the buffer into which data is to be read, and len is the number of data bytes to be read.

    Deep in the system, each device driver must include a routine (a method) to read from the device, and for disk devices, to read from a file on that device. If dd points to a device driver description record or an open file description record, dd->read(dd,buf,len) will read that file.

    a) From the above, you can infer a minimal data structure that allows a user call to read() to map to the a call to the appropriate method of the appropriate driver. Concisely explain this structure.

    The integer file descriptor fd must map to a pointer to a device descriptor dd. Device descriptors must then contain pointers to each applicable method.

    int read( int dd, char * buf, int len)
    {
    	struct device * dd = file_table[dd];
    	return dd->read(dd, buf, len);
    }
    

    1 did well here, 19 had wrong answers. Common errors involved assuming some kind of case-select construct or completely forgetting that the read system call takes an integer file descriptor. An attempt to do MP2 should have driven home the fact that read takes an integer.

    b) Explain why dd is mentioned twice in the C code dd->read(dd,buf,len).

    The first use, dd->read, is used to select the read method of the device descriptor object. The second use read(dd,buf,len) is used to inform this method what specific device it is being used to operate on, since the device descriptor contains not only the pointers to the methods, but also instance variables the device interface may require, such as pointers to blocking-deblocking buffers, pointers to device interface registers, and so on.

    6 got this right, 7 earned no credit. Partial credit was given for a coherent description of either half of the answer, or for poor but broad attempts at addressing both halves of the answer.

    c) The read() system call returns the number of bytes actually read. Explain why the value of num returned by the call num=(fd,buf,len) may differ from the value of len passed to the call. Warning: There are two distinct reasons this could occur.

    Reason one: The end of file was reached before filling the user's buffer to the limit.

    Reason two: There was a device error that terminated the read early.

    Reason three: The file was configured for line-at-a-time reading, and the user hit the return key before the line had filled the buffer.

    Reason four: The file was configured for non-blocking input, and there were fewer characters in the input queue than would fill the buffer.

    5 did well here, 8 earned no credit. This was dissapointing, since a successful solution to Machine Problem 2 was only likely if someone had studied read and the O_NONBLOCK option. The first two reasons given above were the most popular.

  5. Background: Consider an output handler for an old-style printer connected to the parallel port. This port has an interrupt-enable bit and a ready-bit in the status/control register, and a data register. It is a low-performance device on a system with many higher performance devices.

    The driver includes a fairly large output queue and an interrupt service routine as well as the suite of routines called directly by application programs. The output queue is constructed from a linked list of buffers, each with a capacity of several hundred bytes. Buffers are allocated from a large system buffer pool that is also used for other devices. When a new buffer is needed, enqueue calls b=getbuf() which returns a pointer to a buffer, or null if there is no buffer available. When a buffer has been emptied, dequeue calls putbuf(b) to return the empty buffer to the buffer pool.

    a) When and by what software component(s) should the printer's interrupt-enable bit be reset?

    The enable bit should be reset when there are no characters enqueued in the output queue. This is done by the interrupt service routine.

    4 did well here, 9 did badly. Partial credit was offered for correct answers to either half of the question. There were two common problems: First was confusing the interrupt enable bit with the interrupt request bit, while the second involved the unreasonable assumption that each buffer holding several hundred bytes represented a print job. This second assumption is totally unwarranted by the problem statement and quite unreasonable unless the documents to be printed are only several hundred bytes long -- perhaps it would work for theater tickets.

    b) When and by what software component(s) should the printer's interrupt-enable bit be set?

    The enable bit should be set whenever a character is enqueued in the output queue. This is done by the write() routine in the device driver. (A more intelligent solution might be to set the interrupt enable bit only when write() discovers that the queue had been empty, but the logic needed to be selective probably just adds computational overhead.)

    Two got this right, 15 earned no credit. Partial credit was given for correctly answering either half of the question. The answeers to parts a and b do not depend on whether the queue is implemented as a simple bounded buffer, a linked list of characters or a linked list of larger buffers. So long as we are dealing with a parallel port that handles character-at a time output, the answer remains the same. In short, changes in the implementation of the abstraction don't change anything in the context of the use of the abstraction.

    c) A simple dequeue routine on a bounded buffer needed just two varialbes, head and tail, plus a single array, call it buf, the buffer itself. Describe the variables needed to implement the new linked-list buffer used here.

    1. headbuf -- points to to the first buffer in the queue
    2. headchar -- points to to the first character in the first buffer
    3. tailbuf -- points to to the last buffer in the queue
    4. tailchar -- points to to the last character in the last buffer

    1 did well, 12 earned no credit. A surprising number suggested data structures suggesting an elaborate circular linked list. Many of these were worth partial credit, although most of these answers implied algorithms far more comlex than a correct answer.

    d) After finding that the buffer is not empty, the dequeue routine on a simple bounded buffer dequeued a character with the code ch=buf[head++]. Describe the corresponding code for the new more complex linked list of buffers, using the variables you described in part c. Don't forget to complete the job far enough that the next call to dequeue would detect an empty queue, and in this case, the next call to enqueue would make the queue non-empty again.

    ch = headbuf->buf[headchar++];
    if ((headbuf == tailbuf)&&(headchar == tailchar)) {
            /* we are working on the last and have consumed the last character */
            putbuf(headbuf);
    	headbuf == NULL;
    	tailbuf == NULL;
    } else if (headchar > buflim) {
            /* we consumed the last character of a full buffer */
    	struct buffer * temp = headbuf;
            headbuf = headbuf->next;
            putbuf(headbuf);
            if (headbuf == NULL) tailbuf = NULL;
    }
    

    Nobody did well here, 19 earned no credit. A sensible answer here was contingent on a sensible answer to part c, so this is not surprising. The most remarkable wrong answers involved loops that attempted to output the entire "job" in response to a single interrupt.

    e) Does the code for part d include any critical sections were interrupts must be disabled to prevent trouble?

    There are indeeed critical sections here, but there in no need to guard them because this code is exeucted within the interrupt service routine. If the interrupt service routine itself is interruptable, it will only be interrupted by higher priority interrupts; these involve different devices and therefore different data structures, so no special care is required.

    4 did well here, 18 earned no credit. Many skipped this question -- it was the last on the exam, so that is not really a surprise.