
                          The Inform Technical Manual

                     for revision 5.4, last updated 24/9/94


        1      Introduction
        2      Recondite directives
        3      Unusual constant forms
        4      String indirection and low strings
        5      Game control commands and keyboard reading
        6      Obsolete commands
        7      The abbreviations optimiser
        8      Dictionary and parsing table formats



Section 1      Introduction

This is a short collection of notes on low-level matters, covering what
is found neither in The Inform Designer's Manual nor in the
assembly-language documentation in The Specification of the Z-machine.

The Designer's Manual is, however, intended to be entirely self-contained
for all practical purposes.  If this document contains nothing either
interesting or useful, I feel I shall have achieved my purpose.

It contains much of the commentary which used to be in the source code's
header, such as its modification history, notes on porting the Inform
compiler to new machines and documentation of obsolete or internally-used
features.  I anticipate revising this (though not necessarily the
Designer's Manual) each time the source code is updated.

Section 2      Recondite directives

These are the directives airily dismissed as 'recondite' in Appendix A1 to
the Designer's Manual.

Default <cname> <value>;

If the constant has not yet been defined, define it with this value.
(In Verblib this is used to give constants like MAX_CARRIED their
default values if the main game source has not already set them;
hence the name.)


Stub <rname> <n>;

If the routine has not yet been defined, define one which has <n> local
variables and simply returns false.  (Setting the number of local variables
prevents the game from calling a routine with more arguments than it has
local variables to put them in; this should not do any harm to the
interpreter, but neither does a little caution.)  This is how "entry
point" routines are handled: the Grammar library file stubs out any
undeclared entry points.


Dictionary <name> <text>;

Enters <text> in the dictionary, and makes a new constant for its address.
This is not so much recondite as obsolete; nowadays one would write
something like

Constant frog_word 'frog';

but in any case now that one can write simply 'frog' the need has gone
away.


System_file;

Declares the present file to be a 'system file'.  The only way in which
these differ from other files is that if Inform has been told to Replace
a given routine, it will ignore a definition of this routine in a 'system
file'.  Thus Parser and Verblib are system files, and conceivably
other user-written library extensions (for magic, say) might want to be.


Lowstring <name> <string>;

Puts the string in the "low strings" area of the Z-machine (an area in the
lowest 64K of memory which holds static strings, usually to hold
abbreviations), and creates a constant with the given name to hold its
word address.  Any string which is to be used with the @ string escape
must be declared in this low strings area.  (But the use of the @
string escape is clumsy and there are probably better ways to get the
effect in Inform 5.)


Version <v>;

Sets the game file version (3 for Standard games, 5 for Advanced; 4 and 6
are present for completeness).  This directive isn't so much recondite
as redundant; the preferred way is to either set -v3 or some such at
the command line, or to include a switches directive, e.g.

Switches v3;


The remaining directives are for debugging Inform only:

Listsymbols;
Listdict;
Listverbs;
Listobjects;

are fairly self-explanatory (be warned: they can produce a lot of output).
In addition, a number of tracing modes can be turned on and off in
mid-pass:

Trace    Btrace    Ltrace    Etrace
NoTrace  NoBtrace  NoLtrace  NoEtrace

Trace is an assembly-language style trace, with addresses and bytes
as compiled; Btrace is the same, but produced on both passes, not just
on pass 2; Ltrace traces each internal line of code; and Etrace, the
highest-level of these, traces the expression evaluator at work by
printing out the expression trees made and the assembly source these are
reduced to.  (A more vehement, less legible version is etrace full,
which shows the process in minute detail.)


Section 3      Unusual constant forms

There are more constant forms in Inform 5 than are dreamt of in the
Designer's Manual.  Some are obsolete, others obscure.  To begin with,
Inform predefines a number of constants which are used by the library:

  adjectives_table   (byte address)
  preactions_table   (byte address)
  actions_table      (byte address)
  code_offset        (packed address of code)
  strings_offset     (packed address of strings)
  version_number     (3 or 5 as appropriate)
  largest_object     (the number of the largest created object + 255)
  dict_par1
  dict_par2
  dict_par3

which can be read by something like

lookup = #adjectives_table;

One does occasionally want to know the largest object number in
high-level code, but the library provides a variable top_object
such that the legal object numbers are
        1 <= n <= top_object
and using this is preferable.

The dict_par constants are byte offsets into a dictionary entry
of the three bytes of data about the word, and are provided because
these offsets are different between Standard and Advanced games;
thus, the parser uses these constants to ensure portability between
the two.


A constant beginning #a$ means "the action number of this action
routine".  Thus, #a$TakeSub is equivalent to the more usual ##Take.


A constant beginning #w$, followed by a word of text, has as value the
address of the given word in the dictionary (Inform will give an
error at compile time if no such word is present).  Largely obsolete.


A constant beginning #n$, followed by a word of text, has as value the
address of the given word in the dictionary (Inform adds it to the
dictionary as a new word if it is not already there).  Thus,
#n$leopard is equivalent to 'leopard'.  However, this constant form
is still useful to enter single-letter words into the dictionary
(like y, which the parser defines as an abbreviation for "yes")
since 'y' would instead mean the ASCII value of the character 'y'.


A constant beginning #r$, followed by a routine name, gives the (packed)
address of the given routine.  This is chiefly useful for changing the
routine-valued properties of an object in mid-game, e.g.

lamp.before = #r$NewBeforeRoutine;

where NewBeforeRoutine is defined as a global routine somewhere.


Section 4      String indirection and low strings

Inside a static string (in double-quotes), the string escape @nn,
an @ sign followed by a two-digit number, means "print the
nn-th string variable here".  nn is a decimal number from 00
to 31.  Such a string variable can be set with the command

String <number> <low-string-constant>;

which means that any string to be used in this way has to have been
defined as a "low string" (see above).  For example,

Lowstring L_Frog "little green frog";
...
String 0 #L_Frog;
"You notice a @00!^";

will result in the output

You notice a little green frog!

Actually, since the first 32 entries of the "synonyms table" in the
Z-machine are reserved for this purpose, the command String n x
is in fact equivalent to

(0-->12)-->n=x;

Due to a minor design infelicity of the Z-machine, the more friendly-looking
usage

String 0 "illegal frog";

will work in a Standard game but may unpredictably fail in an
Advanced one exceeding 128K in length; hence the need to ensure
all relevant strings are "low" (in the bottom 128K of memory).
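
The assignment (0-->12)-->n=x can be pictured with a toy model of
memory.  What follows is a Python sketch, not Inform source: the table
address $40 and the string address $100 are made-up values, and the two
helper names are mine.  It assumes only what is said above, that word 12
of the header holds the address of the synonyms table, that entries are
two-byte words, and that a low string is known by its word address (half
its byte address, which is why 16 bits reach no further than 128K).

```python
# Toy model of Z-machine memory, just enough to show (0-->12)-->n = x.
# The addresses used at the bottom are hypothetical.

def read_word(mem, addr):
    """Read a big-endian 16-bit word, as the Z-machine stores them."""
    return (mem[addr] << 8) | mem[addr + 1]

def write_word(mem, addr, value):
    """Write a big-endian 16-bit word."""
    mem[addr] = (value >> 8) & 0xFF
    mem[addr + 1] = value & 0xFF

def word_address_of(byte_address):
    """A word address is half the byte address, so 16 bits reach 128K."""
    if byte_address % 2 or not 0 <= byte_address < 2 * 0x10000:
        raise ValueError("not addressable as a low string")
    return byte_address // 2

def set_string_variable(mem, n, word_address):
    """Model of  String n x;  that is, (0-->12)-->n = x."""
    if not 0 <= n <= 31:
        raise ValueError("string variables are numbered 00 to 31")
    if not 0 <= word_address <= 0xFFFF:
        raise ValueError("a word address must fit in 16 bits")
    table = read_word(mem, 2 * 12)            # 0-->12: header word 12
    write_word(mem, table + 2 * n, word_address)

mem = bytearray(0x200)
write_word(mem, 24, 0x0040)                   # hypothetical table address
set_string_variable(mem, 0, word_address_of(0x0100))
```

A byte address of 128K or more has no 16-bit word address at all, which
is the limit referred to above.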


Section 5      Game control commands and keyboard reading


quit;

(Actually an assembly language opcode.)  This quits the game
(at once, with no confirmatory question to the user): all games must
end this way, since it is illegal to return from the Main routine.

restart;

(Similarly an opcode.)  Restarts the game exactly to its initial state,
losing the previous state for good.

save <label>;
restore <label>;
verify <label>;

Tries to save a saved game file, to restore one, or to verify that the
existing story file is not corrupted (by calculating a checksum and
comparing it against the one in the header).  In each case, jumps
to the given label if successful (otherwise runs on into the next
statement as usual).  save and restore are actually commands and
not opcodes because the relevant opcodes function differently between
Standard and Advanced games; these commands ensure portability.
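
For the curious, the check made by verify can be imitated outside the
interpreter.  The Python sketch below is not Inform source, and its
details come from the Z-machine specification as I understand it rather
than from anything above: the file length is stored at header bytes $1A
and $1B (divided by 2 in Standard games and by 4 in Advanced ones), the
checksum at $1C and $1D, and the checksum is the sum of all bytes from
$40 to the end of the file, modulo $10000.  The tiny "story file" built
at the end is of course fictitious.

```python
def story_checksum_ok(story, version):
    """Recompute the checksum of a story file image and compare it
    with the one held in the header."""
    # The stored length is scaled down by a version-dependent factor.
    scale = 2 if version <= 3 else 4 if version <= 5 else 8
    length = ((story[0x1A] << 8) | story[0x1B]) * scale
    stored = (story[0x1C] << 8) | story[0x1D]
    # Sum every byte after the 64-byte header, keeping 16 bits.
    computed = sum(story[0x40:length]) & 0xFFFF
    return computed == stored

# A fictitious 128-byte Standard (version 3) story file.
story = bytearray(0x80)
story[0x1A:0x1C] = (0x80 // 2).to_bytes(2, "big")
story[0x40:0x80] = bytes(range(64))
story[0x1C:0x1E] = (sum(range(64)) & 0xFFFF).to_bytes(2, "big")
intact = story_checksum_ok(story, 3)
```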



Read <a> <b> [<routine>];

This reads from the keyboard (printing no prompt: it is assumed this
has already been done) into buffer a and tokenises it into buffer b.
(a and b are expected to point to global string variables, defined
by something like

Global a string 120;

meaning that a->0 contains the number 120, and that a->1 to
a->120 are bytes of available read/write memory.)  In Standard games,
this command automatically redisplays the status line.  In Advanced ones,
if no routine is given then Inform compiles code to emulate the
Standard game status line automatically; if a routine is given, this
is called instead, and is expected to update the status line itself.
See the Designer's Manual for an example of such a routine.

After read has taken place:
   o       a->1 holds the number of characters typed;
   o       the text, unterminated, is held in a->2 to a->(a->1 + 1);
   o       b->1 holds the number of words typed (note that commas
           and full stops become separate words in their own right);
   o       from byte 2 onward, b contains 4-byte blocks, one for
           each word, in the form
                   byte address of dictionary entry if word is known,
                       0 otherwise (2 bytes);
                   number of letters in word (1 byte);
                   position of first character of word in the a buffer
                       (1 byte).

More flexible tokenising and keyboard-reading methods are available
by resorting to assembly language; see the aread opcode and the
'special effects' section of the Designer's Manual.
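
The layout of the two buffers after a read can be modelled as follows.
This is a Python sketch rather than Inform source; it assumes the layout
exactly as listed above, takes the "first character" field to be an
index into the a buffer, and fills in b by hand (the dictionary address
$1234 is made up, and b->0, which is not described above, is simply left
at zero).

```python
def make_text_buffer(typed, capacity=120):
    """Build buffer a: a->0 is the capacity, a->1 the number of
    characters typed, and the text itself starts at a->2."""
    data = typed.encode("ascii")
    if len(data) > capacity - 1:
        raise ValueError("text too long for buffer")
    a = bytearray(capacity + 1)               # a->0 up to a->capacity
    a[0] = capacity
    a[1] = len(data)
    a[2:2 + len(data)] = data
    return a

def decode_parse_buffer(a, b):
    """Return a (dictionary address, text) pair for each word in b."""
    words = []
    for i in range(b[1]):                     # b->1: number of words
        block = 2 + 4 * i                     # 4-byte blocks from byte 2
        dict_addr = (b[block] << 8) | b[block + 1]
        length = b[block + 2]
        start = b[block + 3]
        text = bytes(a[start:start + length]).decode("ascii")
        words.append((dict_addr, text))
    return words

a = make_text_buffer("take frog")
b = bytearray(2 + 4 * 2)
b[1] = 2                                      # two words typed
b[2:6] = bytes([0x12, 0x34, 4, 2])            # "take", known (hypothetical address)
b[6:10] = bytes([0x00, 0x00, 4, 7])           # "frog", not in the dictionary
words = decode_parse_buffer(a, b)
```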

Section 6      Obsolete commands

Inform 5 continues to provide a number of out-dated features from Inform 1
to 4; 'out-dated' in the sense that there are now much better ways to do
the same things.  The old features have not been removed because the
largest Inform program in existence ('Curses') still makes use of them;
their further use is not encouraged.

The put command takes the form:

put <addr> byte <index> <v>;
put <addr> word <index> <v>;

which are the old way to use arrays, now superseded by

addr->index=v;
addr-->index=v;


The write command can be used to write to many properties of an object
at once:

write <object> <p1> <v1> [<p2> <v2>...];

and was useful in the days when the only alternative was using the
@put_prop assembly opcode, but is now superseded by lines like

lamp.time_left = 0;

which are clearer and more consistent.

Before Inform provided C-style for loops, it had BASIC-style ones:
these were the so-called 'old-style for loops',

for <var> <start> to <finish> { ...code... }

which were restricted in having only simple finish values (i.e., not
compound expressions) and in requiring braces around the code (even
if it contained only a single statement).  The effect can be duplicated
with

for (<var>=<start> : <var> <= <finish> : <var>++) ...code...

one form of a much more general and flexible construct.

Section 7      The abbreviations optimiser

When the game becomes full, 8 to 10% of its length can be saved by making
use of text abbreviations: a method under which up to 64 commonly occurring
phrases can be abbreviated whenever they occur.  This makes no difference
to text as seen by the player.  Because checking for these causes a speed
overhead (again, of about 10%) and it isn't worthwhile unless a game is
very large, Inform does not do so except in economy mode (compiling with
the switch -e on).  Abbreviations must be declared explicitly, before
any other text appears, by a directive such as:

Abbreviate "the ";

This causes "the " to be stored internally as only 2 text chunks (5-bit
segments), rather than 4, whenever it occurs: which is very often.
Only 64 may be declared (the remaining 32 slots in the Z-machine's
"synonyms table" being kept for string indirection).
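
The arithmetic of the saving can be sketched as follows, in Python
rather than Inform.  The costs per character are those of the
Z-machine's text format as I understand it (one chunk for a lower-case
letter or space, two for a capital or common punctuation mark, four for
anything else), and three 5-bit chunks pack into two bytes.  Discounting
one occurrence is an assumption of mine, chosen because with it the
estimate agrees with rows of the -f output reproduced later in this
section (you, _the_, The and Aunt_Jemima among them); how Inform itself
arrives at its figures is not stated here.

```python
# Characters which cost two text chunks: the punctuation alphabet.
TWO_CHUNK_PUNCTUATION = set("\n0123456789.,!?_#'\"/\\-:()")

def zchar_cost(text):
    """Number of 5-bit text chunks needed to store text unabbreviated."""
    cost = 0
    for ch in text:
        if ch == " " or "a" <= ch <= "z":
            cost += 1                         # lower case and space
        elif "A" <= ch <= "Z" or ch in TWO_CHUNK_PUNCTUATION:
            cost += 2                         # shift chunk plus the character
        else:
            cost += 4                         # full escape sequence
    return cost

def estimated_saving(text, occurrences):
    """Rough number of bytes saved by abbreviating text."""
    saved_per_use = zchar_cost(text) - 2      # an abbreviation costs 2 chunks
    # Assumption: one occurrence is discounted; 3 chunks occupy 2 bytes.
    return (2 * saved_per_use * (occurrences - 1)) // 3

chunks_for_the = zchar_cost("the ")
saving_for_you = estimated_saving("you", 668)
```

Here "the " costs 4 chunks unabbreviated, as stated above, and so each
use of the abbreviation saves 2.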

To see how good your current choice of abbreviations is, try compiling with
the -f (frequencies) option set, which will count the number of times each
abbreviation is used, and work out how many bytes it saved.  For instance,
" the " occurs some 2445 times in 'Curses'.  Experiment soon reveals that
parts of speech and words like "there" make big savings, but that almost
any proper noun makes little difference.

Infocom's own compiler does not seem to have chosen abbreviations very
rigorously, since Infocom story files contain just such a naive list.  (This
may have been wise from the point of view of printing speed in the days of
much slower computers.)

In any case, the -u option of Inform (if your computer is large enough and
fast enough to make this feasible) will try to work out a nearly-optimal set
of abbreviations.

The algorithm for doing so is too complex to give here: see the source code.
Briefly, it runs in two phases: building a table of cross-references, and
then running a number of passes looking for good substrings and choosing
good antichains from the partially ordered set resulting.  (The main problem
being that abbreviations interfere with each other: taking both of
"the" and "the " will not give the same saving as the individual savings
added up.)  The result is not guaranteed to be optimal but seems pretty good.
The output it finally produces is a list of legal Inform Abbreviate
commands which can be pasted into source code.

Since there are something like

        2^300000

possible choices for a typical-sized game, this is bound to be an
expensive job.  A 128K game takes about 45 seconds to compile on my machine,
and slightly under two hours to optimise.  There are three passes, of which
the first is by far the longest.

Reasonable guesswork and experiment (resulting in the words suggested in
earlier editions of this manual) actually doesn't perform too badly, but
when I first optimised a 128K version of 'Curses', the -u option saved
1200 bytes over the best choices made by hand: here is the selection
produced, in the form of -f output:

    How frequently abbreviations were used, and roughly how many
    bytes they saved:  ('_' denotes spaces)
       you   668/  444         with   144/  190        which    92/  182
       urs    58/   38         tion   142/  188          ter   274/  182
       t_w   134/   88          t_s   117/   77          t_o   164/  108
       t_i   167/  110          ing   960/  639         ight   187/  248
       her   283/  188          e_w   146/   96          e_s   160/  106
       e_o   227/  150          e_i   245/  162          e_a   254/  168
       der    87/   57          d_s    61/   40          d_o   122/   80
       d_i    82/   54          d_a   122/   80          and   560/  372
       all   289/  192          You   297/  394         This    47/   92
       The   384/  510      Meldrew    28/  108        It_is    40/  104
 Aunt_Jemima  15/  102           ._   680/  452           ,_  1444/  962
       's_~    42/  109        's_no    41/  106          _un   105/   69
       _to   708/  471        _the_  1328/ 2654          _th   578/  384
       _ro   110/   72          _pr    95/   62          _po    78/   51
       _no   246/  163          _ma   165/  109          _lo   119/   78
       _ho    87/   57          _hi    99/   65          _ha   309/  205
       _gr    67/   44          _ga    60/   39        _from    94/  186
       _for   185/  245          _fi   130/   86          _fa    97/   64
       _ex    89/   58          _ea    61/   40        _door    46/   90
       _di   110/   72         _con    88/  116         _com    72/   94
       _cl    81/   53         _can   164/  217          _ba   120/   79
       _a_   587/  390

On a version of 'Curses' taking up about 240K, using abbreviations saved
about 23000 bytes and added 9 seconds to a 91-second compilation time.

It's interesting how few words in common the naive and optimised lists
have.  Only two proper nouns survived, and they provide the only longish
words.  "is " as such turned out not to be worthwhile.  " the " was
perhaps obvious in retrospect, but I didn't think of it.  The best strategy
for abbreviating seems to be to choose three-character strings which make
a fractional saving each (only one Z-character each time, for the most part)
but which occur very often indeed.

Note also that another 32 abbreviations (which could be accommodated, if the
string indirection mechanism were dropped) would be little help, as
the least worthwhile of these already saves only 38 bytes or so.

Section 8      Dictionary and parsing table formats

Some of the tables Inform writes into the Z-machine have formats which are
not imposed by the Z-machine specification but by Inform's own conventions,
and these are covered here.  These conventions are based on (but different
to) those used in the middle-period Infocom games.


Adjectives are numbered downwards from $ff in order of their appearance in
defined grammar.  The adjective table contains 4-byte entries:

       <dictionary address of word>  00  <adjective number>
       ----2 bytes-----------------  ----2 bytes-----------

To make life more interesting, these entries are stored in reverse
order (i.e., lowest adjective number first).  The address of this table is
rather difficult to deduce from the file header information, so the constant
#adjectives_table is set up by Inform to refer to it.
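
As an illustration of this layout (a Python sketch, not Inform source,
with made-up dictionary addresses), the following builds an adjective
table from dictionary addresses given in order of appearance and then
reads it back.  It assumes only what is described above: numbering
downwards from $ff, 4-byte entries, and storage in reverse order.

```python
def build_adjective_table(dict_addresses):
    """Build the table from dictionary addresses listed in order of
    appearance in the grammar."""
    entries = []
    for i, addr in enumerate(dict_addresses):
        number = 0xFF - i                     # numbered downwards from $ff
        entries.append(bytes([addr >> 8, addr & 0xFF, 0x00, number]))
    # Stored in reverse order: lowest adjective number first.
    return b"".join(reversed(entries))

def parse_adjective_table(table):
    """Map each adjective number to its word's dictionary address."""
    result = {}
    for offset in range(0, len(table), 4):
        dict_addr = (table[offset] << 8) | table[offset + 1]
        number = (table[offset + 2] << 8) | table[offset + 3]
        result[number] = dict_addr
    return result

# Three adjectives at hypothetical dictionary addresses.
table = build_adjective_table([0x0500, 0x0510, 0x0520])
adjectives = parse_adjective_table(table)
```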


The grammar table address is stored in word 7 (i.e. bytes 14 and 15)
of the header.  The table consists of a list of two-byte addresses to
the entries for each word.  This list is immediately followed by these
entries, one after another.

An entry consists of one byte giving the number of lines and then that
many 8-byte lines.  These lines have the form

       <objects>  <sequence of words>  <action number>
       --1 byte-  ----6 bytes--------  --1 byte-------

<objects> is the number of objects which need to be supplied: e.g., 0 for
"inventory", 1 for "take frog", 2 for "tie rope to dog".  The sequence
of words gives up to 6 tokens following the verb, to be matched in order.
The token values are given by the table:

   noun             0
   held             1
   multi            2
   multiheld        3
   multiexcept      4
   multiinside      5
   creature         6
   special          7
   number           8
   (noun=Routine)   16+parsing-routine-number
   (Routine)        48+parsing-routine-number
   (scope=Routine)  80+parsing-routine-number
   (attribute)      128+attribute number
   (adjective)      adjective number
   ...reserved...   9-15, 112-127

Parsing routines have addresses which are too large to store in a single
byte.  Instead they are numbered from 0, and their (packed) addresses are
stored in the preactions table of the story file.  (This is called
"preactions table" because of what the original Infocom parser used it
for; the Inform library parser has no such concept as 'preaction'.)

The sequence is padded out to 6 bytes with zeros.  (This is a tiresome
convention, as it means that the value 0 can only be understood by
looking back at what has come before, but it's too late to change it now.)
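
A grammar line can be taken apart along the following lines.  This is a
Python sketch, not Inform source.  The upper limits of the three routine
ranges are inferred from the table above, and the rule used to tell
adjectives from attributes (the caller says how many adjectives exist,
and those occupy the top of the range down from $ff) is my assumption.
The example line and its action number are invented.

```python
NAMED_TOKENS = ["noun", "held", "multi", "multiheld", "multiexcept",
                "multiinside", "creature", "special", "number"]

def describe_token(value, adjective_count):
    """Classify one token byte using the table of token values."""
    if value >= 256 - adjective_count:
        return ("adjective", value)
    if value >= 128:
        return ("attribute", value - 128)
    if value >= 112 or 9 <= value <= 15:
        return ("reserved", value)
    if value >= 80:
        return ("scope=Routine", value - 80)
    if value >= 48:
        return ("Routine", value - 48)
    if value >= 16:
        return ("noun=Routine", value - 16)
    return (NAMED_TOKENS[value], None)        # 0 may also be padding

def decode_grammar_line(line, adjective_count):
    """Split an 8-byte line into objects, six tokens and action."""
    if len(line) != 8:
        raise ValueError("a grammar line is 8 bytes long")
    tokens = [describe_token(v, adjective_count) for v in line[1:7]]
    return {"objects": line[0], "tokens": tokens, "action": line[7]}

# "tie rope to dog": noun, the adjective "to" (here $ff), noun, padding.
line = bytes([2, 0, 0xFF, 0, 0, 0, 0, 0x1A])
decoded = decode_grammar_line(line, adjective_count=1)
```

Note that the sketch makes no attempt to tell a genuine noun token from
padding, for the reason given above.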


Actions are numbered from 0 upwards in order of appearance in the
grammar.  (Whereas fake actions are numbered from $ff down, but that's
another story.)  The packed addresses of the corresponding action routines
are stored in the actions table.  Once again, Inform puts this table
in its conventional place, but its address is difficult to work out and
so the constant #actions_table is set up to hold it.


Verbs are numbered from $ff downwards in order of appearance,
with synonyms getting the same number (thus, "get" and "take" have
the same verb number); they are entered into the dictionary as they are
defined in grammar.


In the dictionary header, Inform defines only three characters as
'separators' which break up words in tokenisation: these are full stop,
comma and open-double-quote.  (In theory the Z-machine allows any list
here, but these three are conventional in old Infocom story files.)

Inform writes dictionary entries consisting of the word itself, plus
three data bytes.  (This makes them 7 bytes long in Standard games,
9 in Advanced.)
The entries are in alphabetical order, and look like:

       <the text of the word>  <flags>  <verb number>  <adjective number>
       ----4 or 6 bytes------  --1 b--  ----1 byte---  ----1 byte--------

The text is stored in the usual text format, thus allowing up to 6 or 9
characters.  These data bytes can be safely accessed (portably between
either format of game) by, e.g.

address->#dict_par1

which reads the flags byte of the word at address.

The flags (chosen once again to conform loosely to Infocom conventions, not
for any sensible reason) have the eight bits

       7      6  5  4  3     2      1      0
       <noun> .. .. .. <adj> <spec> <meta> <verb>

<verb>, <noun> and <adj> mean the word can be a verb, noun or adjective;
the <spec> bit means the word was inserted by a Dictionary command in the
program, except that <verb> words also have the <spec> bit set (ours not to
wonder why).

Verbs declared as "meta" have the <meta> bit set.  (These are such
out-of-world experiences as 'save' and 'score'.)

Note that a word can be any combination of these at once.  It can even be
simultaneously a verb, adjective and noun, and will be understood as such
in different contexts.
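
To make the layout concrete, here is a Python sketch (not Inform source)
which picks the three data bytes out of a dictionary entry and names the
flags which are set.  The offsets follow from the entry sizes given
above (4 bytes of text in Standard games, 6 in Advanced); the function
names and the sample entry are invented, and the text bytes of the
sample are left as zeros rather than properly encoded.

```python
FLAG_BITS = {"noun": 7, "adj": 3, "spec": 2, "meta": 1, "verb": 0}

def dict_par_offsets(version):
    """Offsets of the three data bytes: the values one would expect
    of #dict_par1, #dict_par2 and #dict_par3."""
    text_bytes = 4 if version == 3 else 6     # Standard or Advanced
    return (text_bytes, text_bytes + 1, text_bytes + 2)

def decode_entry(entry, version):
    """Decode the data bytes of one dictionary entry."""
    par1, par2, par3 = dict_par_offsets(version)
    flags = entry[par1]
    return {
        "flags": {name for name, bit in FLAG_BITS.items()
                  if flags & (1 << bit)},
        "verb_number": entry[par2],
        "adjective_number": entry[par3],
    }

# A Standard-game entry for a verb: <verb> and <spec> bits set,
# verb number $ff, no adjective number.
entry = bytes([0, 0, 0, 0, 0b00000101, 0xFF, 0x00])
decoded_entry = decode_entry(entry, version=3)
```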


