======================================================================
                    P O O 
                    doc: Thu Apr  2 11:59:51 1992
                    dlm: Fri Jun 26 15:32:58 1992
                    (c) 1992 ant@ips.id.ethz.ch
======================================================================

This file describes some internals of the inetray package. Its name
derives from "Principles of Operation".

Overview
--------
The program inetray is responsible for dispatching and scheduling the
rayshade requests. In the usual terminology it acts as the client
requesting services from a number of remotely running servers. It does
that using SUN RPC. Rendering requests do not block; therefore inetray
also listens continuously on a socket for incoming results. The data
received from the workers is written to the output file whenever
possible.

The program rpc.inetrayd serves two purposes: it services a number of
RPC requests dealing with initialization and management, and whenever
it receives a rendering request, it spawns off a worker child and
continues to service RPC requests (a restricted number).
The worker then renders a part of a frame and directly contacts the
dispatcher to send it the result. This is done using an XDR/TCP
connection.

inetray.start is a simple RPC daemon servicing requests.

Authentication & Security
-------------------------
If the servers (rpc.inetrayd) are started as root, they try to change to
the user id supplied to them. This is usually the user id of the user
running the dispatcher (inetray). Any user, however, can set a
different user id for servers started by inetray.start. No server can
run as root (uid == 0).
If the uid is invalid on the server, the server exits with an error
message in the syslog.
No server ever produces an output file. This limits the security
concerns to changing the access times of files. Of course it is
possible that there are loopholes in this concept; I just haven't found
one yet.
If the server is not started as root, it continues to run under the uid
it was started as. In that case one has to check that this user has
read access to the files in question.
The actual usernames under which the servers run are displayed by both
inetray and inetray.ping.

Session Keys
------------
Whenever a started server receives its first request, a session key is
sent along with it. Once a session key is installed, only requests with
the same key are serviced. In practice this means that only the person
who issued an inetray call can kill the running servers and workers.
The key is stored in the file .inetray.key in the directory where
inetray was issued. Any existing file of that name is renamed to
.inetray.key.old.
inetray displays the current session key on startup.
inetray.ping uses the special key 0. Therefore, if servers hang after
an inetray.ping, they can be killed with inetray.kill 0.
The program inetray.kill needs a session key. If one is given as an
argument, it takes precedence; otherwise inetray.kill looks for one in
the file .inetray.key.
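
The install-on-first-request behaviour can be sketched like this; the
names key_ok and KEY_NONE are illustrative, not taken from the source:

```c
/* Sketch of the session-key check in the server.  The first request
 * installs its key; all later requests must carry the same key. */
#define KEY_NONE 0L     /* no key installed yet; 0 is also the special
                         * key used by inetray.ping / inetray.kill 0 */

static long session_key = KEY_NONE;

int key_ok(long key)
{
    if (session_key == KEY_NONE) {
        session_key = key;      /* install key from the first request */
        return 1;
    }
    return key == session_key;  /* reject requests with a foreign key */
}
```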

Pathnames
---------
Since servers can be running on machines with totally different
filesystems but must access the input files locally, some pathname
substitution is supported.
All filenames are transferred as-is to the servers. Therefore, absolute
pathnames must exist on all machines (including the client, where the
input file is first processed to check for errors).
The home part is stripped, if possible, from the working directory in
which the client was started. This stripped directory is then sent to
the server, which in turn prepends the home directory of the uid it is
to run as. Note that if nothing was stripped on the client side, then
nothing is added on the server side. Note also that the right directory
is chosen even when the server cannot run under that user id. The
server tries to chdir to the directory so constructed. If that fails,
it continues to run in the current directory.
The working directories of the servers are displayed by both inetray and
inetray.ping.
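
The substitution can be sketched in two halves; strip_home() stands for
the client side, rebase() for the server side (both names, and the
example paths used in comments, are made up):

```c
/* Sketch of the home-directory substitution.  strip_home() runs on the
 * client, rebase() on the server. */
#include <stdio.h>
#include <string.h>

/* Return the part of `cwd` below `home`, or NULL if `cwd` does not
 * start with `home` (in that case nothing is stripped and the server
 * will use the path verbatim). */
const char *strip_home(const char *cwd, const char *home)
{
    size_t n = strlen(home);
    if (strncmp(cwd, home, n) == 0 && (cwd[n] == '/' || cwd[n] == '\0'))
        return cwd[n] == '/' ? cwd + n + 1 : cwd + n;
    return NULL;
}

/* Server side: prepend the home directory of the uid the server is to
 * run as.  If nothing was stripped on the client, nothing is added. */
void rebase(char *out, size_t outsz, const char *stripped,
            const char *server_home)
{
    if (stripped != NULL)
        snprintf(out, outsz, "%s/%s", server_home, stripped);
}
```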

Port Numbers
------------
The rendered portions of a frame are sent back using an XDR/TCP
connection. The port number for this is defined in config.h
(RESULTPORT) but can be overridden per user in the .inetrayrc file.

Registering Servers
-------------------
Whenever inetray or inetray.ping are started, they try to register ready
servers.
First, the servers managed by inetray.start are started; the servers
started by inetd come up automatically when an INIT-request arrives.
The machines are contacted in the following order:
	1: All simple hosts given in the Use List (if any)
	2: All directed broadcast addresses in the Use List (if any)
	3: The local network (unless option N=0 is set in the Use List)
After starting, an INIT-request is sent to all machines, in the same
order.
Servers reply by opening a TCP-connection on the result-port and sending
back status info.

Work Scheduling
---------------
A frame is divided into blocks encompassing one or more lines. This is
done according to a simple heuristic whose parameters can be controlled
by editing config.h and/or overriding those values in a .inetrayrc file
(see INSTALL/Appendix B for details).
After n workers have been registered, the block size is calculated as
follows: blockSize = ySize / blocksPerServer / n. After that, the size
is checked against the lower and upper limits (MINBLOCKSIZE and
MAXBLOCKSIZE, respectively). If it exceeds a limit, it is adjusted
accordingly. Finally, the size of the last, possibly incomplete, block
is calculated and the information printed.
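
The heuristic above can be sketched as follows; the limit values here
are made up, the real ones come from config.h or .inetrayrc:

```c
/* Sketch of the block-size heuristic.  The numeric limits stand in
 * for MINBLOCKSIZE and MAXBLOCKSIZE from config.h (values invented). */
#define MINBLOCKSIZE 4
#define MAXBLOCKSIZE 50

/* blockSize = ySize / blocksPerServer / n, clamped to the limits. */
int block_size(int ySize, int blocksPerServer, int n)
{
    int size = ySize / blocksPerServer / n;
    if (size < MINBLOCKSIZE) size = MINBLOCKSIZE;
    if (size > MAXBLOCKSIZE) size = MAXBLOCKSIZE;
    return size;
}

/* Size of the last, possibly incomplete, block of the frame. */
int last_block(int ySize, int size)
{
    int rest = ySize % size;
    return rest ? rest : size;
}
```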

In early versions (up to [0.2.0]), a simple round-robin scheduling was
used: subsequent machines got subsequent blocks to trace; whenever the
end of a frame was reached, the whole process started over with only
the non-terminated blocks.
This could lead to quite bad behaviour towards the end. Consider, for
example, the example file mole.ray. Early blocks (bottom half) take
much longer to trace than later ones. If one machine is now heavily
loaded, it won't ever complete its block. This means that one early
block will be outstanding for a very long time, which will inhibit
concurrent writing. Furthermore, with a little bit of bad luck, this
block will be the last one outstanding, which means that in the end a
lot of machines will calculate just one block. This block will take a
long time to calculate.
Starting with version [0.2.1], a rescheduling step is inserted in the
middle of a frame. The number of machines that have not yet returned a
result is counted, and the first n blocks (n being the number of those
machines) not yet calculated are given priority over the other blocks.
These blocks are exactly those residing on the slow machines; this way
they are, hopefully, redistributed to faster machines.
In my setting, this modification led to quite a decrease in the time
needed to complete the last block.
Notes: - The scheme presented here also works nicely if workers crash
	 during the first half of a frame (which they seem to tend to
	 do).
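
The mid-frame rescheduling can be sketched like this; the queue layout
and the function name reschedule() are illustrative assumptions:

```c
/* Sketch of the mid-frame rescheduling introduced in 0.2.1.
 * `done` flags which blocks are finished, `slow` is the number of
 * machines that have not yet returned a result. */

/* Fill `out` with the indices of the first `slow` unfinished blocks;
 * these are exactly the blocks residing on the slow machines and are
 * handed out again with priority over fresh blocks. */
int reschedule(const int *done, int nblocks, int slow, int *out)
{
    int i, k = 0;
    for (i = 0; i < nblocks && k < slow; i++)
        if (!done[i])
            out[k++] = i;
    return k;   /* number of blocks reprioritized */
}
```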

Concurrent Servers & RPC Program Numbers
----------------------------------------
It is possible for one machine to have more than one server (and worker)
running at a time. This feature is implemented to allow multiprocessor
machines to have as many workers running as processors. A machine
starting more than one worker cannot start them using inetd.
Concurrent servers have different RPC program numbers. The first server
gets the program number IRNUM defined in prognum.h; subsequent servers
get subsequent program numbers.
This way, registering with the portmapper works correctly. It must be
noted, though, that all broadcasts to servers must now be repeated for
all program numbers.
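
The assignment is simply sequential; the value of IRNUM below is made
up, the real one is defined in prognum.h:

```c
/* Sketch of the RPC program-number assignment for concurrent servers.
 * IRNUM stands in for the value from prognum.h (invented here). */
#define IRNUM 0x20000099L

/* Server instance `i` (counting from 0) registers program IRNUM + i,
 * so every instance can register with the portmapper independently.
 * A broadcast therefore has to be repeated for each program number. */
long prognum(int i)
{
    return IRNUM + i;
}
```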

Error Logging
-------------
All daemons (rpc.inetrayd and inetray.start) log their errors using
syslog calls, unless NOASYNCIO_QUIRK was set during compilation, in
which case errors are written to a temp file (see SUPPORT for details).
If errors are logged with syslogd, they use the LOG_ERR level of the
daemon facility.
Additionally, rpc.inetrayd logs some info on the LOG_NOTICE level.
Please note that all errors produced by the rayshade routines are
logged as well. This is done by redirecting stderr to the syslog using
socketpairs and asynchronous IO. For this to work under AUX I had to
implement the socketpair() syscall there, since the built-in one does
not work (at least in our version).
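
The redirection can be sketched as follows. This is a simplified,
synchronous version: the real daemon dup2()s one end of the socketpair
onto stderr and does the read in a SIGIO handler, forwarding the text
to syslog. The function name is made up:

```c
/* Simplified sketch of the stderr-to-syslog plumbing: a socketpair
 * replaces stderr, the other end is read back and (in the daemon)
 * forwarded with syslog(LOG_ERR, ...).  No SIGIO here, one direct
 * write and one synchronous read, to keep the demo self-contained. */
#include <sys/socket.h>
#include <unistd.h>
#include <string.h>

int capture_stderr_once(const char *msg, char *buf, size_t bufsz)
{
    int sv[2];
    ssize_t n;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    /* daemon: dup2(sv[1], 2), so rayshade's stderr output lands in
     * the socketpair; here we write to sv[1] directly */
    if (write(sv[1], msg, strlen(msg)) < 0) {
        close(sv[0]); close(sv[1]);
        return -1;
    }
    /* daemon: this read happens in the SIGIO handler, and the text is
     * then passed to syslog() */
    n = read(sv[0], buf, bufsz - 1);
    buf[n < 0 ? 0 : n] = '\0';
    close(sv[0]); close(sv[1]);
    return n < 0 ? -1 : 0;
}
```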

Error Termination
-----------------
Roughly once a minute, every server checks whether the dispatcher is
still running. If it is not, the server kills its associated worker (if
it has one) and then exits with an entry in the syslog.

rpc.inetrayd startup
--------------------
The server can be started by inetd or inetray.start (or, for debugging
purposes, by hand). It checks its number of arguments to decide how it
was started. If it is called without any arguments, it assumes that it
was started by inetd. Therefore you have to supply the necessary
arguments if you want to start it by hand: it requires two arguments,
a worker id and a user id. The first server always gets id 0. The
effects of the user id are described above.
Before accepting RPC requests, the server redirects its own stderr to
the syslog using a socketpair and asynchronous IO (unless
NOASYNCIO_QUIRK was set during compilation).
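
The startup-mode detection can be sketched like this; the struct and
function names are illustrative, and the assumption that the uid
arrives later with the INIT-request in the inetd case is inferred from
the request list below:

```c
/* Sketch of the argument check at rpc.inetrayd startup: no arguments
 * means "started by inetd"; a manual start needs exactly two
 * arguments, a worker id and a user id. */
#include <stdlib.h>

typedef struct { int by_inetd; int worker_id; int uid; } StartInfo;

/* Return 0 on success, -1 on a usage error. */
int parse_start(int argc, char **argv, StartInfo *si)
{
    if (argc == 1) {            /* inetd passes no arguments */
        si->by_inetd = 1;
        si->worker_id = 0;      /* first server always gets id 0 */
        si->uid = -1;           /* uid is set by the INIT-request */
        return 0;
    }
    if (argc != 3)
        return -1;              /* manual start needs both arguments */
    si->by_inetd = 0;
    si->worker_id = atoi(argv[1]);
    si->uid = atoi(argv[2]);
    return 0;
}
```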

rpc.inetrayd requests
---------------------
The parameter and result definitions can be found in inetray.x.

INIT:		- exit if already active
	        - set session key
	        - set user id
	        - chdir to appropriate dir
	        - set flag to perform raytrace init when TRACEBLOCK is called
	          for the first time
	        - setup result socket and send back status info
		- DOES NOT RETURN ANYTHING
	
STARTFRAME:	- exit if not initialized
		- check session key
		- get old worker status (clean process table)
		- set flag to perform RSStartFrame when TRACEBLOCK is
		  called for the first time (deferred StartFrame) if
		  it's the first STARTFRAME in this session. If not,
		  then do RSStartFrame immediately.
		- return

TRACEBLOCK:	- exit if not initialized
		- check session key
		- get old worker status (clean proc tab)
		- execute deferred init & startFrame if applicable
		- fork new worker:
			worker:	- set nicevalue
				- Raytrace one block
				- send back result
				- exit
			server: - return

KILL:		- exit if not initialized
	        - check key
	        - kill started worker if exists
		- return

WAIT:		- exit if not initialized
	        - check key
	        - wait for worker (get status)
		- return

TERMINATE:	- check key
		- KILL
		- WAIT
		- clean up
		- exit
		- DOES NOT RETURN ANYTHING
		
