Exam 1: Midterm

Solutions and Commentary

Part of the assignments for 22C:169, Spring 2010
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Grade Distributions

Exam

          X               X
          X           X   X X
      X X X     X   X X   X X X     X
____X_X_X_X_X___X___X_X_X_X_X_X___X_X_________X_____X_____________
 . 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10. 11. 12. 13. 14. 15. 16. 18

Homework

                                                              X
                                                              X
                                                              X
                                                              X
                                                              X
                                                            X X
                                                            X X
                                    X     X       X   X     X X
______X___X_____X_____X_______X_X_X_X___X_X_______X_X_X_X___X_X_X_
 . 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25

Overall

mean   = 27.69                            X
median = 28.2                           X X
                      X   X     X X X   X X   X X   X
____________X_______X_X_X_X___X_X_X_X_X_X_X_X_X_X_X_X_____X___X___
 . 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30. 32. 34. 36. 38. 40
       D               C               B               A

Solutions and Commentary

Background: Consider the problem of testing a program prog that reads all of its input from standard input. Consider the following test script, under a Unix system:
```
 > fuzztest | prog
```
The program fuzztest outputs an infinite string of random 8-bit bytes. On the average, each possible byte value occurs 1/256 of the time. The technical term for this type of test is a fuzz test.
a) What major potential behaviors of prog does this fail to test. Hint: When you launch an application, what are the different ways you can communicate with that applicaiton. Are all of these tested? (1 point).

First and foremost, this only tests standard input. If prog accepts parameters or opens files, this test ignores them.
9 did well, 6 more mentioned parameters or injection without saying enough to connect the issue with the problem -- in fact, for example, this fuzz test will test some injection vulnerabilities involving input from standard input. 5 more got partial credit for intelligent but non-specific comments about the limitations of fuzz testing.

b) Suppose you were testing a sort utility. What is wrong with this approach? Hint: Think about both the nature of sorting, on the one hand, and the features you want to find in a sort utility. (1 point).

The input stream is infinite, but the sort program cannot really do anything until it has a finite set of data to sort, regardless of whether it sorts characters, lines of text, or whatever.
Only 3 did well and only 6 earned partial credit. Many commented, uselessly, about the input being syntactically invalid to be sorted, while, in fact, any decent sort program should survive such syntactic problems. As an aside, note that the Unix sort utility, by default, sorts lines of text into alphabetical order, and it is generally happy to sort random nonsense formed into lines of random length.

c) Give criteria by which you would judge that an application has passed a fuzz test. (1 point)

The program should not hang or terminate with an unhandled exception, but should either terminate normally or with sensible error messages.
This was easy. 21 did well and there were 3 more who earned partial credit.
Background: There are two general classes of objects in programming languages. For the purpose of this question, we'll call them first and second class objects. First-class objects are free-floating, referenced only by a handle or pointer, and not embedded within any other object. A second-class objects exists as part of some other larger object and is referenced as components of that larger object. If it is possible to create a handle or pointer to the second-class object, this handle or pointer is itself a second-class pointer. All objects in Python are first class. Native integers in Java are second class, although Java also supports first-class integers, if you want to use them. In C, most objects are second class, but programmers may create first class objects.
a) Explain how the presence of second-class objects naturally leads to the possibility of buffer-overflow attacks. (1 point)

If all objects are first class, it is hard to predict the relationship between different objects. In contrast, second-class objects are embedded in fixed locations within another objects, so an attack that overflows a second-class object can have a predictable effect on the enclosing object.
9 did well, there was very little partial credit. Most good answers were more specific to buffers embedded in activation records on the stack. Wrong answers frequently suggested serious problems with fundamental prerequisite material, as elaborated on in the comments on part b.

b) In the successful buffer-overflow attack demonstrated in class, the buffer was a second-class object in some larger object that was itself a second-class object in some yet larger first-class object. Identify these larger objects, their relationship to the buffer and to the object that was attacked. (1 point)

The buffer was embedded in an activation record (or stack frame) that was itself embedded in the stack.
2 answered well, 9 earned partial credit. 6 left this question blank. Among wrong answers, a large number demonstrated that they were seriously confused about fundamental material they should have learned in an elementary computer architecture or assembly language course. Objects are represented in memory as pure data, with no embedded code. Even second-class objects are not embedded in code, except if they are constant.
Background: Most of the computers we discuss use Von Neumann architectures where the code and data are in the same address space. An old alternative to this is the Harvard architecture, where code and data are in distinct address spaces. The code address space in a Harvard architecture is frequently read-only, while all of the RAM is in the data address space. Obviously, with a Harvard archtecture, there is no way to inject machine code, but some classes of attacks may still be open.
a) In the context of a Harvard architecture, how could an attacking program set an arbitrary value in the program counter. (1 point)
A Harvard architecture still permits buffer overflow attacks that change the return address when it is stored in the activation record of a vulnerable function.

9 did well, 10 more earned partial credit by saying things like "attack the return address" without saying how this might be possible. Many wrong answers made strange assumptions about operating systems (none were mentioned in the problem statement) or explicitly stated that they would change code in RAM, even though, by the definition given, all code on a Harvard machine is in ROM.

b) Under what circumstances could injecting an arbitary value into the return address on a Harvard architecture allow some kind of injection attack? (1 point)

If the application includes any kind of interpreter that can interpret some kind of code in RAM, then there may be attacks that can inject arbitrary code for interpretation by this interpreter.
1 did well, the remainder earned no credit. Many wrong answers attempted to change the ground rules by assuming that the CPU could be made to execute codein RAM or by introducing some kind of operating system.
Background: Consider a computer system where system calls are handled as follows:
- Our system has virtual memory. Given an address a, the functions page(a) and word(a) extract the page field of the address and the byte-in-page field.
- Each application runs in its own address space, with the MMU turned on in order to define the properties of that address space. You can assume that the global variable pagetable points to the page table, which is an array of structures, each with fields rights, the access rights, and frame the page frame holding that page. The contents of a page frame can be addressed as an array of bytes. The minimum access is zero, while the maximum is READ|WRITE|EXEC.
- System calls are all done by causing traps from the running application program. Within any handler, the varialbe SAVEAREA points to a structure with fields pc and sp as well as others that do not concern us.
- The code to return from a system call restores the registers, turns on the MMU, and resumes the calling program.
- Users pass parameters to system calls by pushing them on the user stack. The stack pointer always points to the last byte pushed on the stack, and the stack grows by decrementing the stack pointer.
- System calls operate on their own stack within the operating system's memory region.
- As generally true in C, sizeof(int) and sizeof(void*) tell you the sizes of integers and pointers in bytes.
Consider the Unix system call read(fd,buf,len) with three parameters.
a) Within the code for the system call, how would the system compute the physical address of the first byte of the parameter len as lenpa. This is a simple one-liner in C using the information given above. (1 point)
```
lenpa = pagetable[page(saveeara->sp)].frame + word(savearea->sp);
```
None gave perfect answers. 26 earned no credit. Among those who earned partial credit, 5 did not use the page table, 5 (not necessarily the same) did not use the register save area, a few failed to use either the page or the word field of the virtual address.
b) Assume that lenpa has been calculated as above, and assume that the system used mylen=*(int*)lenpa to get the length parameter. This will usually work, but once in a while, even if the user had legitimate access to the stack, it will fail spectacularly. What mistake did the system programmer make? (1 point)

If the word holding the parameter happens to straddle a page boundary, this will fail.
4 did well. Among wrong answers, some were apparently very confused, commenting that the asterisks in the *(int*) construct were wildcards in the Unix shell language. This was all C code and it is an extraordinarily strange reading of the problem to imagine that the shell has any relevance here!

c) Since the user just pushed the parameters on the stack, there is no need for the system to check the access rights to the memory pointed to by the user's stack pointer. Is this statement true? If not, why not? (1 point)

The statement is false. A malicious program could set the stack pointer to a random memory address and then force a system-call trap, so the system must defend itself against this.
21 recognized the statement as false but gave reasons that were far from convincing. As a result, nobody earned full credit.

d) Assume that bufpa has been calculated more or less as suggested in part a, and assume that the system correctly computed mybuf, the virtual address of the user's buffer, avoiding the error that was made in part b. How would the system compute the address of the first byte of the user's buffer? (1 point)
```
bufad = pagetable[page(mybuf)].frame + word(mybuf);
```
1 did well, 3 made errors comparable to those discussed in part a that lead to partial credit. Of the rest, who earned no credit, 10 left this blank.
e) Explain why a correct and safe system is likely to apply the translation described in part d to the address of each successive byte in the user's buffer. Part b hints at the problem. (1 point)

If the buffer spans a page boundary, the computation needs to be redone as the new page is entered.
2 did well, the rest earned no credit, including 13 who left this blank.

f) What access rights should the system check for to the pages making up the user's buffer? (1 point)

All that is needed is write access.
15 did well, including those who said read-write access. 3 got partial credit for saying read-write-execute. The remainder earned no credit.
Background: When the exec family of system services is used in Unix, the system looks at the first bytes of the file being exeucted to see what interpreter to use. One magic number launches the physical CPU, while another (the # character) causes exec to launch the named interpreter (frequently but not always a shell).
Suppose we decided to permit the SUID and SGID bits to be operational when launching an interpreter. One reason that this is not done in most Unixes is that the Unix shells are almost all very vulnerable to injection attacks. There are other reasons we are interested in here.
Suppose bob launches myscript, owned by carol which begins #/bin/yourinterp, where the interpreter is owned by alice
a) Given that myscript has rights rwx-----x and /bin/yourinterp has rights rwx-----x, what user's domain does myscript execute in? (1 point)

Bob, because nothing changed the user ID after bob launched the script.
14 did well, and an approximately equal number earned no credit. Nobody left this blank.

b) Given that myscript has rights rws-----x and /bin/yourinterp has rights rwx-----x, what user's domain does myscript execute in? (1 point)

Carol, because the script has the set-UID bit set, and the interpreter does not change anything.
12 did well and 3 got partial credit for saying Alice because that wrong answer made much more sense than Bob.

c) Given that myscript has rights rwx-----x and /bin/yourinterp has rights rws-----x, what user's domain does myscript execute in? (1 point)

Alice, because the interpreter has the set-UID bit set, and the script does not change anything.
9 did well, and 4 got partial credit for saying Bob, based on the author of the script's natural assumption that the rights set by the author would matter.

d) Given that myscript has rights rws-----x and /bin/yourinterp has rights rws-----x, is there a problem? (1 point)

The problem is that both the script and interpreter have the SUID bit set and it is not obvious which one should win.
7 did well. 2 more earned partial credit for answers that presumed that both the interpreter and the script would somehow set the ID (only one can succeed). 5 more earned partial credit by stating that there was a conflict without an explanation or with an incorrect explanation.
Background: Consider the following list of files and access rights on a Unix system:
```
FILE OR DIRECTORY   OWNER  GROUP RIGHTS
/bob/dropbox/       bob    nice  rwx-wx-wx
/bob/public/        bob    nice  rwxr-xr--
/alice/dropbox/     alice  nice  rwx-wx-wx
/alice/public/      alice  nice  rwxr-xr--
/ted/dropbox/       ted    bad   rwx-wx-wx
/ted/public/        ted    bad   rwxr-xr--
```
The nice group contains bob and alice. The bad group contains ted.
a) Give the access matrix describing this system. (1 point)

bob/
dropbox bob/
public alice/
dropbox alice/
public ted/
dropbox ted/
public
Bob rwx rwx -wx r-x -wx r--
Alice -wx r-x rwx rwx -wx r--
Ted -wx r-- -wx r-- rwx rwx

b) Give the set of access control lists describing this system. (1 point)
```
/bob/dropbox/    Bob:rwx Alice:-wx Ted:-wx
/bob/public/     Bob:rwx Alice:r-x Ted:r--
/alice/dropbox/  Bob:-wx Alice:rwx Ted:-wx
/alice/public/   Bob:r-x Alice:rwx Ted:r--
/ted/dropbox/    Bob:-wx Alice:-wx Ted:rwx
/ted/public/     Bob:r-- Alice:r-- Ted:rwx
```
c) Give the set of capability lists describing this system. (1 point)
```
Bob    /bob/dropbox/:rwx   /bob/public/:rwx
       /alice/dropbox/:-wx /alice/public/:r-x
       /ted/dropbox/:-wx   /ted/public/:r--

Alice  /bob/dropbox/:-wx   /bob/public/:r-x
       /alice/dropbox/:rwx /alice/public/:rwx
       /ted/dropbox/:-wx   /ted/public/:r--

Bob    /bob/dropbox/:-wx   /bob/public/:r--
       /alice/dropbox/:-wx /alice/public/:r--
       /ted/dropbox/:rwx   /ted/public/:rwx
```
For problem 6 as a whole, 3 earned full credit, 8 earned 2/3 credit or more, and 6 earned 1/3 credit or more. A significant number who did not complete this problem ran out of time, earning full credit for as much as they completed. Others had serious difficulty. A significant number did not seem to know how to lay out an access matrix, either adding the groups to the matrix (they play no part in the result) or trying to include the entire 9-bit Unix style access rights field in each intersection of the matrix. Access matrices with only 3 rows and 3 columns showed up too frequently.

	bob/ dropbox	bob/ public	alice/ dropbox	alice/ public	ted/ dropbox	ted/ public
Bob	rwx	rwx	-wx	r-x	-wx	r--
Alice	-wx	r-x	rwx	rwx	-wx	r--
Ted	-wx	r--	-wx	r--	rwx	rwx