Assignment 3 Solutions

Part of the homework for 22C:112, Spring 2009
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Always, on every assignment, please write your name legibly as it appears on your University ID and on the class list! All assignments will be due at the start of class on the day indicated (usually a Friday). The only exceptions to this rule will be by advance arrangement unless there is what insurance companies call "an act of God" - something outside your control. Homework must be turned in on paper and in class! Late work may be turned in to the teaching assistant's mailbox, but see the late work policy. Never push late work under someone's door!

Background: The Unix fork mechanism assumes that each process has its own address space, so that a forked process can be created by duplicating the memory of the original.
Consider a system whith no memory protection, where all memory could be addressed by any process, with no restrictions. Obviously this system would be insecure, but ignore that.
a) When a process is forked, would it be necessary to copy the code? In answering this, you may find it useful to consider each type of address that might be embedded in the code. (0.5 points)

Code may contain absolute addresses of variables. If the variables are copied to a new location, the absolute addresses, must be edited, and therefore, the code in which those addresses are embedded must be duplicated.

b) When a process is forked, what kind of relocation is required? Assume that the stack contains local variables and return addresses, assume that there are local variables that point to linked lists, trees etc. For simplicity, you can assume that each process has its own heap for allocation of dynamic data structures. Do not worry about feasibility, worry hear about what would be needed. (0.5 points)

All absolute addresses of material that is duplicated in the process must be relocated to point to the new location. This includes all pointers embedded in data structures, all return addresses (pointing to code that is duplicated) and all addresses of variables embedded in the code.

c) Now, discuss what information the operating system would need to fork a process. (0.5 points)

For each process, there would need to be some kind of description allowing the system to find all the addresses in the code and data. Perhaps a bit-vector with one bit per address in the process indicating which addresses contain addresses.
Background: Consider trying to design a better program launching mechanism than the exec() services of Unix. Specifically, focus on the goal of a general parameter passing mechanism. Our goal is to eliminate the odd idea that the parameters to the main program must be text strings, as described by argv and argc in the C/Unix model. Ideally, what we want is the ability to pass any mix of data types to a main program, just as we can do so within a program.
Suppose we wanted to write a main program that looked like main( int a; char b; widget c) We cannot just call newexec( myprogram, a, b, c), where a, b and c have the appropriate types, because newexec() does not know the types of these parameters.
a) First, we need a way to specify the parameters in the calling environment. We will need to call our new newexec() service with a parameter list that describe the parameters to be passed. Suggest appropriate parameters for newexec() to describe these parameters. (0.5 points)

Imagine, for example newexec( argc, argv, argt ... ) where argc gives the argument count, argv gives the addresses of all the arguments, and argt gives the types. This would be easy for a short list of built in types such as char, int, long int, but the problem is, any short list of types is uninteresting. What we really want, for each argument, is a description of its type, including not only the types of the primitive components, but also how they are arranged in arrays, structures, etc. This would require, for each parameter, a full type description.

b) Second, we need a way for the called program to check that the parameters passed were the right type. If the result is that the called program does not look like main( int a; char b; widget c), as we had hoped, do not be too worried so long as it is possible, in theory, to translate such a high-level specification of the main program interface into your new scheme. (0.5 points)

There are two basic classes of solution. One is to have each binary file associated with a vector of type descriptions (the same kind of data structure suggested for argt in part a). In this case, the system would check the correctness of the types and then pass an array of pointers to the arguments as the receiving program's argv.
The second solution would be to pass the raw data provided by the caller to the new program, and leave it to the new program to make sense of things. In this case, the system library would most likely be augmented with routines the program could call to check that the parameters were of the desired types. This is more flexible than the first alternative, but for most applications, more cumbersome.

c) Finally, consider the consequence of this for the command language. The Unix model of an array of strings made it fairly easy to write a shell. Suggest how the syntax of a shell might allow specification of parameters of a variety of types. (0.5 points)

If we go with a system-mandated affix on each binary file, the shell could read this and use it as instructions for parsing the command line arguments. If we go with letting the receiving program take whatever it gets, then we need a self-evident syntax the shell can use to let it construct arbitrary types. This would be easy for simple types such as char and int, where there are reasonably self-evident notations, but we would need notations for constructing arrays and records that allowed arbitrary parameter types to be constructed.
Note that we need these notations for the system affix also, but in that case, the program being launched tells what it expects, directing the parsing of the parameter list. In the second case, the parameter list must completely describe itself.