CS:2820 Notes, Lecture 41

Consider what happens when you open a file in Java. You write something like this:

File f = new File( "filename" );

From a Java perspective, the result is that f is a handle on (or pointer to) an object of class File, where methods operating on that class do things like reading or writing the file.

It is not that simple! Files are operating system objects. That means that you can open the same file (in this case, a file named filename) whether you are writing your program in machine language, FORTRAN, C, COBOL, C++, Java or Perl. Some of these languages aren't at all object oriented, and some of them are not type safe. C++, for example, is object oriented but it is not type-safe. A C++ programmer can create an object and then trick the programming environment into treating it as an object of a completely different class.

Non-type-safe languages are required for some applications. For example, to implement the storage manager and garbage collector that underlie Java, you need a language that allows you to do arbitrary arithmetic on pointers. You can do this in C or C++.

In operating systems on modern computers that have memory-management units (MMUs), system objects such as files and timers are created and managed inside a region of memory that is inaccessible to the user program. When you open a file, the operating system allocates a file object in this system memory, and then it hands the user program a handle for this object.

If you are writing a Java program, the Java run-time system allocates a Java object of class File in your user space, but this object is not the system file object. Instead, it is an interface or wrapper around the system object. One field of this object is the handle for the system object, and the methods of class File serve primarily to translate Java's file access methods into those of the underlying operating system.
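The wrapper idea can be sketched in a few lines. This is only an illustration: FakeKernel, sysOpen, sysRead and WrappedFile are hypothetical names, and FakeKernel is a stand-in written in Java for an operating system that a real Java run-time system would reach through native code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the operating system; not a real API.
class FakeKernel {
    // System file objects live here, out of the user program's reach.
    private static final List<byte[]> systemFiles = new ArrayList<>();
    private static final List<Integer> positions = new ArrayList<>();

    // Create a system file object and return a handle for it.
    static int sysOpen(byte[] contents) {
        systemFiles.add(contents.clone());
        positions.add(0);
        return systemFiles.size() - 1;
    }

    // Read one byte through a handle; returns -1 at end of file.
    static int sysRead(int handle) {
        byte[] data = systemFiles.get(handle);
        int pos = positions.get(handle);
        if (pos >= data.length) return -1;
        positions.set(handle, pos + 1);
        return data[pos] & 0xFF;
    }
}

// The user-space object: a wrapper whose only state is the handle.
class WrappedFile {
    private final int handle; // handle for the system object

    WrappedFile(byte[] contents) {
        handle = FakeKernel.sysOpen(contents);
    }

    // Translate the Java-level method into the underlying system call.
    int read() {
        return FakeKernel.sysRead(handle);
    }
}
```

The user program only ever holds a WrappedFile; the bytes and the current position stay on the FakeKernel side.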

The problem with this idea is that the file handle is in the user's address space. If you are writing your code in assembly language, FORTRAN, C or C++, none of which are type safe, your program could freely use any memory address as a file handle, causing the operating system to access random areas of its own memory and try to use them as files. Protecting against abuse of this is very difficult. What if, by chance, a malicious user manages to invent the memory address of a real file object in the operating system's address space that does not belong to that user?

This scheme allows the user to mess up, but the worst that the user can do is access a file that that user already is permitted to use.
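One scheme with this property can be sketched as follows, assuming that the system keeps a separate table of open files for each user and hands out small integer indexes into that table as handles. The class and method names are hypothetical, and the file objects are just strings.

```java
// Hypothetical sketch: handles are indexes into a per-user table that
// the system owns, so a bad handle can only reach that user's own files.
class OpenFileTable {
    private final String[] slots; // this user's open files, indexed by handle

    OpenFileTable(int maxOpen) {
        slots = new String[maxOpen];
    }

    // Enter a file the user is permitted to use; return its handle,
    // or -1 if the table is full.
    int open(String fileObject) {
        for (int h = 0; h < slots.length; h++) {
            if (slots[h] == null) {
                slots[h] = fileObject;
                return h;
            }
        }
        return -1;
    }

    // Validate the handle before touching anything.
    String lookup(int handle) {
        if (handle < 0 || handle >= slots.length || slots[handle] == null) {
            throw new IllegalArgumentException("bad file handle: " + handle);
        }
        return slots[handle];
    }
}
```

Because every handle is checked against the table, a wrong handle is either rejected or names another file that the same user already opened.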

While most of us think of cookies as a phenomenon of the World-Wide Web, where web sites leave cookies on your machine, forcing you to carry information about yourself on behalf of the web site (such as tracking information or personal browsing histories), the term appears to have originated in the documentation for the standard library of the C programming language. Specifically, the library includes services called ftell() and fseek() that can be used to inquire about the current position of an input/output stream and to set the position in that stream.

In the original Unix implementation of the C stream file model, positions in the stream were simply the integer number of bytes from the start of the stream, but when C was ported to other operating systems, the developers quickly learned that some file systems made it difficult to count bytes from the start of a file. Here is how they documented file positions in the Unix Programmer's Manual, Vol I, part 3, published by Bell Laboratories in 1979, revised 1983:

The general definition of a magic cookie you can infer from this brief mention is that it is a value returned to the user by some system where the system understands the construction and use of that value but the user is not expected to be able to interpret or manipulate it. The only useful thing the user can do with a cookie is give it back to the system so that the system can interpret it.
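The same discipline can be followed in Java. The sketch below uses getFilePointer() and seek() from java.io.RandomAccessFile in the roles of ftell() and fseek(). Java documents the value as a byte offset, so treating it as an opaque cookie is a choice made here for illustration; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class PositionCookieDemo {
    // Save a position, read past it, hand the position back, read again.
    // Returns the byte read just after saving and the byte read after
    // returning to the saved position.
    static int[] rereadThroughCookie(byte[] contents) {
        try {
            File tmp = File.createTempFile("cookie", ".dat");
            tmp.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.write(contents);
                raf.seek(0);
                raf.read();                          // skip the first byte
                long cookie = raf.getFilePointer();  // like ftell(): save the position
                int first = raf.read();              // read onward from there
                raf.read();                          // wander further into the file
                raf.seek(cookie);                    // like fseek(): give the cookie back
                int second = raf.read();             // the same byte again
                return new int[] { first, second };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The code never does arithmetic on the saved value; it only stores it and hands it back, which is all a cookie permits.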

So, how do we use cookies to represent objects in a world where we still have something akin to classes, but all fields and methods are static? We use cookies as object handles, and we have each class maintain a collection of all objects of that class. Here is a rewrite of the example class given above under these constraints, using integer cookies:

static class C {
	static private int alloc = -1;   // used to allocate instances
	static private Field[] f = new Field[maxSize]; // all of the instances
	static public int C() { // the initializer
		alloc = alloc + 1;
		f[alloc] = ??
		return alloc;   // return a handle for the new instance
	}
	static public Field M( int handle ) { // an example method
		return f[handle];
	}
}

The above implementation assumes that the constant maxSize gives the maximum number of members of class C that will ever be needed. More sophisticated implementations will allow users to deallocate objects as well as allocate them, and will include (in the allocator) a search for a free object so that storage can be reused.
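A minimal sketch of such an allocator follows, assuming a linear search for a free slot and using String in place of the Field type. The names CookiePool, create, destroy and get are hypothetical, and MAX_SIZE is kept tiny for illustration.

```java
class CookiePool {
    private static final int MAX_SIZE = 4;   // assumed small for illustration
    private static final String[] f = new String[MAX_SIZE];       // all of the instances
    private static final boolean[] inUse = new boolean[MAX_SIZE]; // which slots are allocated

    // Allocate: search for a free slot and return its index as the handle.
    static int create(String value) {
        for (int h = 0; h < MAX_SIZE; h++) {
            if (!inUse[h]) {
                inUse[h] = true;
                f[h] = value;
                return h;
            }
        }
        throw new IllegalStateException("no free objects");
    }

    // Deallocate: mark the slot free so a later create() can reuse it.
    static void destroy(int handle) {
        check(handle);
        inUse[handle] = false;
        f[handle] = null;
    }

    // An example method on the object named by the handle.
    static String get(int handle) {
        check(handle);
        return f[handle];
    }

    // Reject handles that are out of range or not currently allocated.
    private static void check(int handle) {
        if (handle < 0 || handle >= MAX_SIZE || !inUse[handle]) {
            throw new IllegalArgumentException("bad handle: " + handle);
        }
    }
}
```

Note that once a slot is reused, a stale copy of the old handle silently names the new object, since the handle is nothing more than the slot index.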

In the late 1960s and early 1970s, when object-oriented programming was just being invented, people like David Parnas developed methods of modularizing programs, creating software architectures that were almost object-oriented while writing code in languages like Fortran IV. Fortran IV does not support classes, but it is straightforward to write compilation units (separately compiled pieces of a large program) where each unit contains one callable subroutine per method of a logical class and a common block holding the representations of all instances of that class.

This method remains in use today in distributed systems where code is divided between clients and servers. In such a system, it is natural to have one server to implement each class, where the server holds all values of that class in its internal memory and offers clients the ability to perform actions on members of the class.

The biggest weakness of the basic cookie idea is that it has no security. Handles are integers, and if a program accidentally (or intentionally) uses the wrong handle, it will access the wrong object. Nothing in the type checking of a typical programming language prevents the interchange of an integer handle for an object of class C with the handle for an object of some other class, so errors (or security violations) will be very difficult to catch.
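The weakness is easy to reproduce. In this sketch, Accounts and Printers are two hypothetical handle-based classes with made-up contents; both take plain int handles, so the compiler has no way to object when a handle for one is passed to the other.

```java
// Two unrelated "classes" in the static, handle-based style.
class Accounts {
    private static final String[] owner = { "alice", "bob" };

    static String ownerOf(int handle) {
        return owner[handle];
    }
}

class Printers {
    private static final String[] name = { "lobby", "lab" };

    static String nameOf(int handle) {
        return name[handle];
    }
}

class MixUp {
    // Passes a printer handle to Accounts; this compiles without complaint.
    static String wrongLookup() {
        int printerHandle = 1;                   // meant for Printers.nameOf()
        return Accounts.ownerOf(printerHandle);  // silently reads an account
    }
}
```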

Andrew Tanenbaum invented a very nice solution to this problem. In effect, his solution creates a range of values for handles that is huge compared to the number of valid handles. As a result, if a user accidentally or intentionally uses the wrong value of a handle, the likelihood of that handle referencing a valid object is minimal.

His solution involves adding salt to the integer handle we have used above (regardless of whether an improved allocation scheme is used). The allocator adds the salt to the handle, and whenever the handle is used, the class removes the salt, but only after checking to see that it is the right salt for this object. Here is the code:

static class C {
	static private int alloc = -1;    // used to allocate instances
	static private int[] salt = new int[maxSize]; // the salt for each object
	static private Field[] f = new Field[maxSize];  // all of the instances
	static public int C() { // the initializer
		alloc = alloc + 1;
		f[alloc] = ??
		// random() is assumed to return a non-negative value small
		// enough that the multiplication below does not overflow
		salt[alloc] = random();
		return alloc + salt[alloc] * maxSize; // return salted handle
	}
	static public Field M( int handle ) { // an example method
		// first break the handle into the salt and the real handle
		int mySalt = handle / maxSize;
		int realHandle = handle % maxSize;
		// check that the salt is right; if not, complain loudly
		if (mySalt != salt[realHandle]) throw new SecurityException( "bad handle" );
		// finally, do the job
		return f[realHandle];
	}
}
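Here is a runnable variant of the same idea, offered as a sketch under several assumptions: handles are widened to long so that the salted value cannot overflow, salts come from java.security.SecureRandom, String stands in for the Field type, and the names SaltedPool, create and get are hypothetical.

```java
import java.security.SecureRandom;

class SaltedPool {
    private static final int MAX_SIZE = 8;   // assumed for illustration
    private static final SecureRandom rng = new SecureRandom();
    private static int alloc = -1;           // index of the last allocated instance
    private static final long[] salt = new long[MAX_SIZE];  // the salt for each object
    private static final String[] f = new String[MAX_SIZE]; // all of the instances

    // Allocate an instance and return its salted handle.
    static long create(String value) {
        if (alloc + 1 >= MAX_SIZE) throw new IllegalStateException("pool full");
        alloc = alloc + 1;
        f[alloc] = value;
        salt[alloc] = rng.nextInt(Integer.MAX_VALUE); // non-negative salt
        return alloc + salt[alloc] * MAX_SIZE;        // fits easily in a long
    }

    // An example method: check the salt, then do the job.
    static String get(long handle) {
        if (handle < 0) throw new SecurityException("bad handle");
        long mySalt = handle / MAX_SIZE;
        int realHandle = (int) (handle % MAX_SIZE);
        if (realHandle > alloc || mySalt != salt[realHandle]) {
            throw new SecurityException("bad handle");
        }
        return f[realHandle];
    }
}
```

With roughly 2^31 possible salts per object in this sketch, a blind guess at the handle of an allocated slot succeeds with probability of about 1 in 2^31.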

Tanenbaum built an operating system called Amoeba where object handles are implemented in essentially this way. The Amoeba system was designed (with funding from the European Space Agency) to create supercomputers from clusters of inexpensive small computers. Amoeba uses this framework extensively, and since it implements the basic "salted cookie" mechanism in standard "boilerplate" code that is distributed with the system, users can think in entirely object-oriented terms when developing applications on Amoeba, without ever worrying about building their own "salted cookies".

Amoeba is more complex than indicated here because it adds, to each object handle, a set of access rights. The salting scheme is further modified to prevent users with a limited-access handle from increasing their access rights. For example, given a read-only file handle, a user cannot convert it into a read-write handle without knowing the correct value of the salt for that handle, and this is cryptographically hidden in a very clever way.
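The following sketch shows one way rights can be tied to a hidden salt. It is not Amoeba's actual algorithm, only an illustration in the same spirit: it assumes a single object, an 8-bit rights field, and SHA-256 as the one-way function, and the names RightsDemo, issue and valid are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class RightsDemo {
    static final int READ = 0x01;   // assumed rights bits
    static final int WRITE = 0x02;

    // The object's salt; it never leaves the server.
    private static final long secret = new SecureRandom().nextLong();

    // One-way function built from SHA-256 (an assumption for this sketch).
    private static long oneWay(long x) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(ByteBuffer.allocate(Long.BYTES).putLong(x).array());
            return ByteBuffer.wrap(digest).getLong(); // keep the first 8 bytes
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Server side: compute the check value that accompanies a rights set.
    static long issue(int rights) {
        return oneWay(secret ^ rights);
    }

    // Server side: accept a (rights, check) pair only if they belong together.
    static boolean valid(int rights, long check) {
        return check == oneWay(secret ^ rights);
    }
}
```

A user holding the pair for READ cannot produce the matching check value for READ | WRITE, because doing so requires the secret, and the one-way function does not reveal it.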

41. Operating Systems, Networks and Cookies

Objects in Operating Systems

Cookies