Homework 10 Solutions

22C:116, Spring 1999

Douglas W. Jones
  1. One way to determine empirically whether a thread package is a user-level or a kernel-level package would be to compare the cost of some simple thread operations with the costs of known kernel operations and known library operations. For example, a thread could compare the time taken by a sequence of several thousand simple nonblocking thread operations with the time taken by a similar number of equally simple nonblocking kernel calls, and with the time taken by a similar number of calls to a computationally trivial library function. Consider using wait() and signal() operations on a thread-package semaphore initialized so that they never block, and comparing this with a sequence of lseek() operations that cause no change in the file position and with a sequence of calls to abs(). If the time taken by the thread operations is comparable to the time taken by the kernel calls, they are kernel threads. If it is comparable to the time taken by the library calls, they are user threads, and if the times cannot be clearly distinguished, more experiments are needed.
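
    A sketch of such a timing experiment follows. The lseek() and abs() baselines use only standard calls; the thread-package operations are shown commented out, since the names assumed here (thread_wait() and thread_signal()) depend entirely on the package being tested.

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    #define TRIALS 100000

    static double now( void )                      /* wall-clock time in seconds */
    {
       struct timeval t;
       gettimeofday( &t, NULL );
       return t.tv_sec + t.tv_usec / 1e6;
    }

    static double time_loop( void (*op)( void ) )  /* time TRIALS calls to op() */
    {
       double start = now();
       for (int i = 0; i < TRIALS; i++) op();
       return now() - start;
    }

    static int fd;                    /* scratch file for the no-op seeks */
    static volatile int sink;         /* keeps abs() from being optimized away */

    static void kernel_op( void )  { lseek( fd, 0L, SEEK_CUR ); }
    static void library_op( void ) { sink = abs( -1 ); }
    /* static void thread_op( void ) { thread_wait( &sem ); thread_signal( &sem ); } */

    int main( void )
    {
       fd = open( "scratch.tmp", O_RDWR | O_CREAT, 0600 );
       printf( "kernel  calls: %f seconds\n", time_loop( kernel_op ) );
       printf( "library calls: %f seconds\n", time_loop( library_op ) );
       /* printf( "thread  calls: %f seconds\n", time_loop( thread_op ) ); */
       return 0;
    }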

    Another experiment would involve launching multiple threads and having one thread make a kernel call that is known to block. If the effect is to block all of the threads, the thread package is a user-level thread package. If the other threads continue to run, there must be some kind of cooperation between the thread package and the kernel, and it is likely that the kernel implements the threads directly.
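
    The following sketch shows the shape of this experiment. POSIX pthread calls stand in for the thread package under test, since the actual thread-creation primitive depends on the package; the telltale sign is whether the dots stop appearing while the main thread is blocked in read().

    #include <unistd.h>
    #include <pthread.h>

    static void *ticker( void *arg )
    {
       /* emit a dot every so often, busy-waiting in between so that this
          thread makes no blocking kernel calls of its own */
       (void) arg;
       for (long i = 0; ; i++) {
          if (i % 100000000 == 0) write( 1, ".", 1 );
       }
    }

    int main( void )
    {
       pthread_t t;
       char buf[100];

       pthread_create( &t, NULL, ticker, NULL );

       /* this read() blocks in the kernel until a line is typed; if the
          dots stop during the wait, the blocking call suspended every
          thread, the signature of a user-level thread package */
       read( 0, buf, sizeof buf );
       return 0;
    }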

  2. Part a: The quantifiable aspects of system use that determine how CPUs are best distributed are: the balance of computing load from user to user, the communication bandwidth required between user processes and the file system, the communication bandwidth required between user processes and the display or display server, the communication bandwidth required between the processes of different users, and the communication bandwidth required among the processes of a single user.

    Part b: If the user-to-user load balance is good, there is little reason to share CPUs between users. In this case, centralized CPU clusters make sense if the communication demand between user processes and the file system is higher than the communication demand between user processes and the display or display server, or if inter-user communication requires more bandwidth than the communication between user processes and the display or display server.

    The use of a display server at the desktop, as opposed to a long video cable, is justified if the video display is relatively static. If the display contains large amounts of real-time animation, the bandwidth needed to update the display server's data structures would actually exceed the bandwidth of a standard video signal, so it would be less expensive to centralize the display servers and only use long wires for the video signals themselves.

    If the user-to-user load balance is poor, there is reason to share CPUs between users. In this case, if the processes of a single user need higher communication bandwidth than is required for communication between user processes and displays or display servers, then such balancing will be most economical within a single computer. If this is not the case, it may be reasonable to use local clusters of CPUs at each user's desk.

  3. Problem: Before explaining the difference between supply driven load balancing and demand driven load balancing, it is useful to note that the resources we are dealing with are things like available CPU power and memory. The computers in the net are suppliers of these resources, and processes are the consumers.

    Thus, a supply-driven load-balancing system places the responsibility for initiating load balancing with the computers that have a surplus of CPU power or memory. Such machines would seek processes that could consume these resources.

    A demand-driven load-balancing scheme, in contrast, puts the responsibility for initiating load balancing on the kernels of the machines that are attempting to serve an excessive load. In this case, these kernels, acting as agents for the processes that are creating this excessive demand, seek places for those processes to run.
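
    The difference is easiest to see in terms of which side initiates the search for a partner. The toy simulation below makes the contrast concrete; the machines, loads, and threshold used here are invented purely for illustration.

    #include <stdio.h>

    #define MACHINES 4
    static int load[MACHINES] = { 9, 1, 7, 2 };  /* runnable processes per machine */

    /* supply driven: an underloaded machine seeks a process to consume its surplus */
    static void supply_driven( int threshold )
    {
       for (int m = 0; m < MACHINES; m++) {
          if (load[m] < threshold) {                /* m has spare capacity */
             for (int n = 0; n < MACHINES; n++) {
                if (load[n] > threshold) {          /* n has work to give away */
                   load[n]--; load[m]++;
                   printf( "machine %d pulled a process from machine %d\n", m, n );
                   break;
                }
             }
          }
       }
    }

    /* demand driven: an overloaded machine seeks a home for one of its processes */
    static void demand_driven( int threshold )
    {
       for (int m = 0; m < MACHINES; m++) {
          if (load[m] > threshold) {                /* m is overloaded */
             for (int n = 0; n < MACHINES; n++) {
                if (load[n] < threshold) {          /* n can absorb a process */
                   load[m]--; load[n]++;
                   printf( "machine %d pushed a process to machine %d\n", m, n );
                   break;
                }
             }
          }
       }
    }

    int main( void )
    {
       supply_driven( 4 );
       demand_driven( 4 );
       return 0;
    }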

  4. Part a: The read() system call fails with EWOULDBLOCK or EAGAIN if the read would have blocked and the file allows nonblocking I/O, and the fcntl(d,F_SETFL,f) system call allows the FASYNC flag to be set, causing a SIGIO signal to be delivered when input is available. These two hints, found in the man pages, suggest solutions; I only followed up on the first.

    To make reads from a file non-blocking, the O_NONBLOCK flag can be set, either as part of the flags argument to open() or as part of the flags to fcntl(d,F_SETFL,f).

    When a call to read() would otherwise block, if O_NONBLOCK has been set and characters are available, read() returns those characters and its return value is positive. If no characters are available, read() returns -1 and indicates that it would have blocked by setting errno to EWOULDBLOCK or EAGAIN. If the number of characters requested is greater than the number available, so that read() would normally block, it simply returns the available characters and does not set errno. In this case, distinguishing the would-block condition from a normal return requires comparing the count requested with the count returned, and knowing whether the read() was being done on a line-sequential file or a character-sequential file.
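
    As a concrete sketch of these mechanics (reading from standard input, which is arbitrary), the mode change and the error-code test look something like this:

    #include <stdio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main( void )
    {
       char buf[128];

       int flags = fcntl( 0, F_GETFL, 0 );          /* put stdin into */
       fcntl( 0, F_SETFL, flags | O_NONBLOCK );     /* nonblocking mode */

       ssize_t got = read( 0, buf, sizeof buf );
       if (got > 0) {
          printf( "read %ld characters\n", (long) got );
       } else if (got < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
          printf( "read would have blocked\n" );
       } else if (got == 0) {
          printf( "end of file\n" );
       } else {
          perror( "read" );
       }

       fcntl( 0, F_SETFL, flags );                  /* restore the original mode */
       return 0;
    }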

    Part b: Nonblocking read is valuable to user-level thread packages so that other threads can be scheduled during the execution of a thread-package read. So, for example, the top-level outline of a thread-read operation might be:

    thread_read(...)
       force file into nonblocking mode
       loop
          return = read( ... )
          if read completed normally, exit loop
          relinquish
       end loop
       restore file mode
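
    Filled out in C, the outline might look like the following. The thread_relinquish() call stands for whatever yield operation the thread package provides; here it is given a placeholder definition so that the sketch compiles on its own.

    #include <errno.h>
    #include <fcntl.h>
    #include <sched.h>
    #include <unistd.h>

    static void thread_relinquish( void )
    {
       /* placeholder: a real thread package would switch to another ready
          thread here; sched_yield() merely stands in for that operation */
       sched_yield();
    }

    ssize_t thread_read( int fd, void *buf, size_t count )
    {
       int flags = fcntl( fd, F_GETFL, 0 );         /* remember the old mode */
       fcntl( fd, F_SETFL, flags | O_NONBLOCK );    /* force nonblocking mode */

       ssize_t result;
       for (;;) {
          result = read( fd, buf, count );
          if (result >= 0) break;                   /* read completed normally */
          if (errno != EWOULDBLOCK && errno != EAGAIN) break;   /* a real error */
          thread_relinquish();                      /* let other threads run */
       }

       fcntl( fd, F_SETFL, flags );                 /* restore the file mode */
       return result;
    }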