                       The Specification of the Z-Machine
                          and Inform assembly language

                              last updated 24/9/94


       1      Introduction
       2      Resources available
       3      History and the six versions
       4      How text is encoded
       5      How instructions are encoded
       6      The early Z-machine
       7      The late Z-machine
       8      Complete table of opcodes
       9      Dictionary of opcodes
       10     Header format through the ages
       11     A few statistics


  1  Introduction

The Z-machine is an imaginary computer originally devised by Joel Berez and
Marc Blank in 1979 to run the Infocom adventure games.  Since the demise of
Infocom much effort by many people has gone into deciphering it and
implementing it with new portable interpreters to allow modern-day players
to run the classic Infocom games.  The Z-machine is also the run-time code
format of the Inform compiler, which means that there are now more
Infocom-format games in play than the ones Infocom actually wrote.

It is well-adapted to its task.  Its behaviour is (very, very nearly)
exactly specified and it has been accurately implemented on virtually every
small computer.  It maintains a hierarchy of objects and possessions, and
does the computationally-intensive part of parsing input itself.

The purpose of this paper is to fully document the Z-machine, discuss to
what extent it is presently implemented and detail how to use Inform as an
assembler.

Only a few of the pieces in this jigsaw were placed by myself, and the
credit belongs to many people.  Old hands at the decipherment game will
no doubt find the opcode table tiresomely familiar: but, as with a chemist
finding Mendeleyev's periodic table on a laboratory wall, so will the hacker
be reassured by the sight.

I gratefully acknowledge the help of Paul David Doherty and Mark Howell, who
each read a draft of this paper and sent back detailed corrections.
Mistakes and misunderstandings remain my own.


To begin, three general points.  The fascination with the letter Z began
with 'Zork': apparently "zork" was a nonsense word used at MIT for the
current uninstalled program in progress, and stuck.  The Z-machine runs what
we shall call Z-code.  Just as we shall use the term "Z-machine" for both
the machine and its loaded program, so ZIP (Zork Implementation Program) was
used to mean either the interpreter or the object code it interpreted.  Code
was written in ZIL (Zork Implementation Language), which was derived from
MDL (informally called "muddle"), a particularly unhelpful form of LISP.
It was then compiled by ZILCH to assembly code which was passed to ZAP to
make the ZIP.  We refer to code as "Z-code" to avoid confusion with "Zip",
the name of Mark Howell's interpreter (by far the best available).

Secondly.  In talking about "the Z-machine", what do we really mean: the
design Infocom had in mind, the syntax which seems to be in their surviving
game files, or what is actually done by various interpreters, theirs or
ours?  Aided by the patient detective work of my predecessors (e.g.
disassembling Infocom-written interpreters, and going through all existing
game files) I shall try to give all three specifications.  (Inform
assembly-language programmers will need to bear in mind that it is the third
that really counts.)

For the standard format (version 3) there are many existing games and there
isn't much conflict.  But for later versions, there are few games, not all
the opcodes were ever used and the interpreters publicly available
disagree about what to do with some of the obscure ones.  To some extent
this account is an attempt to settle arguments.

Finally, note that the Z-machine does not provide the bulk of a game's
parser, or its 'operating system'.  The parser has to be coded, and the
tables it uses (which some investigators think are part of the Z-code
format) are in fact the same across different Infocom games only because
they contain similar parsers.  So those are not specified here.  An account
of the parsing tables as generated by Inform can be found in the Inform
Technical Manual.  For the usual format of Infocom's parsing tables, see
the C source code to Mark Howell's utility "Infodump".



Hexadecimal numbers are written with an initial dollar, as in $ff, while
binary numbers are written with a double-dollar as in $$11011, according
to Inform conventions.  The bits in a byte are numbered 0 to 7, 0 being
the least significant and the top bit, 7, the most.

  2  Resources available

The four publicly available interpreters that I know of are:
       o "Zip", the fastest and most accurate, which is currently
          being updated to interpret even version 6;
       o "InfoTaskForce" (henceforth ITF), which is almost as good
          for most purposes but slightly inaccurate in some
          screen-handling matters, and does not provide the necessary
          features for "undo" in Version 5 games;
       o "Pinfocom", which is competent on version 3 games but
          unable to cope with higher versions;
       o "Zterp", similarly primitive.


Bryan Scattergood has made a considerable enhancement of ITF for his
Psion and Archimedes interpreters.  However, the ITF no longer seems to
exist as such.

The only existing compiler is Inform, since Zilch no longer exists.

Mark Howell's toolkit of utility programs includes a disassembler
called "txd" and a vocabulary dumper called "infodump", together
with other less generally useful programs.

An enhanced version of Zip which will be a source-level debugger for
Inform games, called Infix, will soon be available.

The Infocom story files are, with a few exceptions (the samplers),
copyright and are currently being sold by Activision in the collections
'The Lost Treasures of Infocom'.  They represent excellent value for
money.  They should not be present at any archive site, and any copies
that are present there are illegal.

A few other story files, such as 'Curses' and 'Advent', are freely
available.


Most of the above programs have publicly available source code (in C)
and many have executables as well; the if-archive at the anonymous
ftp site ftp.gmd.de is the best place to find them.

A curse of these programs is that they almost all use different names for
the opcodes internally (that is, in their source code).  Mark Howell and
I (as authors of the disassembler and assembler, respectively) have agreed
on what we think is a reasonable standard, and these are the opcode names
documented here.  They are used from Inform 5.4 and in recent editions of
txd.


  3    History and the six versions

There were six main versions of the Z-machine, and several minor variant
forms.  These are recognisably similar but with labyrinthine differences,
like different archaic dialects of the same language.  (And, of course, the
job of decipherment is made harder by the fact that the archaeological
record suddenly stops in about 1989 when the civilisation in question
collapsed.)

Broadly, these fall into two groups: early (versions 1 to 3) and late
(4 to 6).  This paper will give an expository account of versions 3 and 5
(as representative of these two groups) but will conclude with brief tables
and specification for all versions.

The six versions are:


Version 1  Early Apple games for DOS 3.2, and the TRS-80 Models I/II
Version 2  Early Apple games for DOS 3.3, and the TRS-80 Models I/II

Version 3  "Standard" series games
Version 4  "Plus" series games
Version 5  "Advanced" series games, or, as the marketing division would
           have it, "Solid Gold Interactive Fiction" - a reference to
           the colour (though not composition) of the boxes they came in
Version 6  Later games with graphics, mouse support, sound effects, etc.


Infocom called their own interpreters ZIP (versions 1 to 3), EZIP/LZIP
(V4), XZIP (V5) and YZIP (V6).

Versions 1 and 2 are thought to be extinct, though collectors have a few
fossils and Zip and ITF implement them anyway.  Many Version 3 games are
still in circulation, and enough worthwhile Version 4 and 5 ones to make
the format important.

Most of the Infocom games exist in several different releases, and some
were written for one version and then ported to later ones.  'Zork I', for
instance, exists in at least ten editions, two early, seven in version-3
(with release numbers between 5 and 88 in chronological order) and one in
version 5 (release 52 - the releases go back to 1 when the version changes).

There are few version 6 games, and they are of (arguably) poorer quality.
Few interpreters exist for them, because they are inherently difficult to
port to different machines.  However, there will be a brief discussion of
the version-6 format here and in effect a full specification in the
dictionary which follows the opcode table.

The definitive guide to all Infocom story files known to exist is Paul David
Doherty's "fact sheet" file, which can be found at ftp.gmd.de.


The Z-machine as originally constructed was surprisingly similar to that
in use when Infocom ground to a halt.  Version 1 (1979-80) had essentially
the same object format, for instance, and a similar header, but encoded text
with a different character table and had no concept of synonyms.  Its
addresses were all word-addresses and not byte-addresses, so presumably a
small amount of memory was wasted in null bytes to fix parities everywhere.

Version 2 was quite a minor enhancement, presumably made only because a new
interpreter had to be written anyway.  Synonyms appeared, but only in one
32-word bank, and the six-digit serial number appeared in the header,
though it wasn't always the date in those days: Release 7 of 'Zork II',
for instance, is (reputedly) called UG3AU5.

Version 3 changed the text encoding alphabets again, and tripled the number
of synonyms possible.  (Consequently the previous "caps lock" style
permanent changes of alphabet were dropped.)  The "verify" code and verify
checksums appeared; and a new opcode to print the status bar at the top of
the screen was introduced.  (Previously, this was updated only when input
was taken from the keyboard.)  The earliest Version-3 releases ('Deadline',
then 'Zork I' and 'II') were in March and April 1982; the latest (the
'Minizork', a cassette-based Commodore-64 sampler of 'Zork') in November
1987.

A primitive form of screen-splitting (which, presumably, was devised in a
hurry in 1984 and then accidentally became the foundation for the character
graphics designs of later versions) was allowed by some interpreters, in
order to give 'Seastalker' a sonar display.  In order that 'Seastalker'
should run on less enlightened interpreters, the game itself contained code
to check whether this feature was available before using the opcodes.
And 'The Lurking Horror' (1987) has sound effects (on some machines) - another
sign of things to come.

Nevertheless by 1982 the Z-machine had stabilised to a reasonably clean
design.  It was very portable, contained everything reasonably necessary and
most of its complications were optimisations to squeeze a few more bytes out
of the 100K or so available on an early-1980s floppy disc.  (Actually
Zilch's code generator, although very good at exploiting these tricks, had
little larger-scale optimisation, and some of its code makes disheartening
reading.  But then the same could be said of Inform.)


By 1985 there were two basic pressures to change.  One was that home
computers were larger, and several fundamental restrictions (the game size
being only 128K, the number of objects only 255, the attributes only 32,
the properties only 30) were beginning to bite.  The other was the drive for
more gimmicks - character graphics, flashier status bars, sound effects,
different typefaces, and so on.  The former led to logical, easy to
understand structural changes in the machine.  The latter, in contrast, made
a mess of the system of opcodes.

More does not mean better: just because the price of paper falls is no
reason to double the size of the modern novel, for instance.  Nor is
literature (pace e. e. cummings) much improved by using four different
typefaces and illustrating it with typewriter pictures.  Also, the relieving
of size restrictions only increased design time - or lowered its quality.

Nonetheless, two excellent games resulted from the lifting of size
restrictions.  In August 1985 the first version-4 game ('A Mind Forever
Voyaging') reached production, and it was followed most notably by
'Trinity' (which had previously been shelved as too ambitious for the
version-3 format).  Still, most of the new 1985/6 games remained in
version-3: after all, there were still plenty of 8-bit home computers
around, too small for version-4 games; and, despite critical acclaim, the
version-4 games consequently did not sell as well.

Version 5 games began to appear in September 1987 with 'Beyond Zork' and
'Border Zone'.  Both of these games needed new features - character graphics
gone wild in the case of the former, and real-time keyboard interaction in
the latter.  The number of opcodes grew ever faster as a result.
Although five old games were re-released in Version 5 editions (with an
in-game hints system added, and benefiting from 9-letter word dictionaries,
but otherwise as written), the direction was all too clearly away from
the old text game into graphics.  Having gradually moved this way ('Beyond
Zork' can look like a parody of an early mainframe maze game, for instance)
there was nothing left but to complete the process, and so Version 6 was
born.  After something of a hiatus in 1988, the last few
increasingly-unrecognisable Infocom games appeared: 'Zork Zero', 'Shogun',
'Journey', 'Arthur'.

Infocom gradually ceased to exist during 1987-9 for financial reasons
generally said to be unrelated to their games output.  Whether they would
have continued to release text games of the classical style is arguable.

  4   How text is encoded

Text is stored as a sequence of 2-byte words.  Each of these is divided into
three 5-bit pieces, plus 1 bit left over, arranged as

   --first byte-------   --second byte---
   7    6 5 4 3 2  1 0   7 6 5  4 3 2 1 0
   bit  --first--  --second---  --third--

The bit is set only on the last 2-byte word of the text, and so marks the end.

These pieces are called 'Z-characters' and have values in the range 0 to 31.
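To make the layout concrete, here is a small sketch in C (the name unpack_text_word is our own, hypothetical choice) which splits one 2-byte word, stored high byte first, into its three Z-characters and reports whether it is the last word of the text.

```c
#include <stdint.h>

/* Split one 2-byte text word (high byte first) into its three 5-bit
   Z-characters; return nonzero if the end-of-text bit is set. */
int unpack_text_word(const uint8_t *p, uint8_t zc[3])
{
    uint16_t w = (uint16_t)((p[0] << 8) | p[1]);
    zc[0] = (w >> 10) & 0x1f;   /* bits 6-2 of the first byte */
    zc[1] = (w >> 5) & 0x1f;    /* bits 1-0 of the first, 7-5 of the second */
    zc[2] = w & 0x1f;           /* bits 4-0 of the second byte */
    return (w >> 15) & 1;       /* top bit: last word of the text */
}
```

For example, the bytes $94 $a5 unpack to the Z-characters 5, 5, 5 with the end bit set.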

There are three alphabets, in which the numbers 6 to 31 mean:

  A0     abcdefghijklmnopqrstuvwxyz
  A1     ABCDEFGHIJKLMNOPQRSTUVWXYZ
  A2      ^0123456789.,!?_#'"/\-:()

(Here the new-line character is written as a circumflex ^).

Character 0 is a space in all alphabets.  Characters 1, 2 and 3 are used for
abbreviations: thus, 1 followed by 14 means "print entry 14 in the synonym
table"; 2 followed by 5 means "print entry 32+5=37..."; 3 followed by 20
means "print entry 64+20=84..." and so on.

The Z-machine provides these for commonly occurring strings to be printed
out as if they were characters, thus saving memory.  Though they are
actually abbreviations, by accident of history they have come to be called
'synonyms'.  (Well-chosen synonyms tend to save about 10% of the space.)
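The arithmetic for finding the synonym table entry can be sketched in C as follows; synonym_entry is a hypothetical helper name, and it assumes the 96-entry table described here.

```c
/* Given an abbreviation Z-character z (1, 2 or 3) and the Z-character x
   which follows it, return the entry number (0 to 95) in the synonym
   table, or -1 if the pair is not an abbreviation. */
int synonym_entry(int z, int x)
{
    if (z < 1 || z > 3 || x < 0 || x > 31)
        return -1;
    return 32 * (z - 1) + x;
}
```

Thus synonym_entry(2, 5) gives 37 and synonym_entry(3, 20) gives 84, matching the examples above.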

By default, a character is presumed to be in A0, i.e. to be a lower-case
English letter.  However, the character 4 means that the next one (only) is
in A1; and 5 means the next is in A2.
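A minimal decoding sketch in C, assuming version 3 and the default alphabets, shows how the one-character shifts work.  It deliberately leaves out abbreviations (1 to 3) and the ASCII escape described below, writing '?' for them, and it simply lets a later shift replace an earlier one instead of following the purist rule.  The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Z-characters 6 to 31 in each default alphabet.  In A2, entry 0
   (Z-character 6) is the ASCII escape and is never printed here. */
static const char *const alphabet[3] = {
    "abcdefghijklmnopqrstuvwxyz",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    " \n0123456789.,!?_#'\"/\\-:()"
};

/* Decode n Z-characters into out; returns the number of characters
   written (no terminator is added). */
size_t decode_zchars(const uint8_t *zc, size_t n, char *out)
{
    size_t len = 0;
    int shift = 0;                 /* alphabet for the next character only */
    for (size_t i = 0; i < n; i++) {
        uint8_t c = zc[i];
        if (c == 0) {
            out[len++] = ' ';
            shift = 0;
        } else if (c == 4 || c == 5) {
            shift = c - 3;         /* 4 -> A1, 5 -> A2; a trailing 5 is harmless */
        } else if (c < 4) {
            out[len++] = '?';      /* abbreviation: not handled in this sketch */
            i++;                   /* skip the entry number */
            shift = 0;
        } else if (shift == 2 && c == 6) {
            out[len++] = '?';      /* ASCII escape: not handled in this sketch */
            i += 2;                /* skip the two 5-bit halves */
            shift = 0;
        } else {
            out[len++] = alphabet[shift][c - 6];
            shift = 0;
        }
    }
    return len;
}
```

For instance, the Z-characters 4, 13, 14, 5, 18, 5 decode to "Hi.", the final 5 being padding.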

(Note for purists: actually the full rule is

            A0      A1      A2
    4       [A1->]  [A1->]  [A0->]
    5       [A2->]  [A0->]  [A2->]

but since alphabet changes are (in versions 3 and onward) not permanent,
it seems pointless ever to use 4 and 5 in alphabets 1 and 2.)

Notice that character 6 in A2 is blank.  It isn't a space: it simply isn't
there.  The sequence 5 followed by 6 indicates that the next two characters
define an ASCII value.  This is the way to get at the characters not in any
of the three alphabets.  For example, the familiar message

  *** You are dead ***

takes four Z-characters to produce each of the asterisks.
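The escape can be sketched in C as below.  It assumes, as the later Z-Machine Standard spells out, that the first of the two Z-characters holds the top five bits of the code and the second the bottom five; the function names are hypothetical.  For '*' (42) it gives the four Z-characters 5, 6, 1, 10.

```c
#include <stdint.h>

/* Encode an ASCII code as the four Z-characters of the escape:
   shift to A2 (5), escape marker (6), then the code in two 5-bit
   halves, top half first. */
void encode_ascii_escape(uint8_t code, uint8_t zc[4])
{
    zc[0] = 5;
    zc[1] = 6;
    zc[2] = (code >> 5) & 0x1f;   /* top bits */
    zc[3] = code & 0x1f;          /* bottom five bits */
}

/* Recover the ASCII code from the two Z-characters after "5 6". */
uint8_t decode_ascii_escape(uint8_t hi, uint8_t lo)
{
    return (uint8_t)((hi << 5) | lo);
}
```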

Finally, note that the end-bit only comes up once every three characters,
so that a way is needed to safely use up any spare characters in the last
2-byte block.  This is done by padding out with 5's.  (5 followed by 5 does
nothing.)

This is especially the case with dictionary entries.  Some dictionary
entries, like "i", ought only to take one 2-byte block, but in order to make
all entries the same number of 2-byte blocks long and so alphabetically
sortable by number, they are padded out by as many 5's in a row as needed
(possibly as many as eight of them).  Dictionary entries are not permitted
to use synonyms and their letters are in lower case (though they can
contain characters from A2).
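The following C sketch builds a version-3 dictionary entry in this way: six Z-characters, truncated or padded with 5s, packed into two words with the end bit set on the second.  It is an illustration under stated assumptions: the name is hypothetical, the input is expected in lower case, characters in neither A0 nor A2 are simply skipped, and no ASCII escapes are generated.

```c
#include <stdint.h>
#include <string.h>

/* Encode a word as a 4-byte version-3 dictionary entry. */
void encode_dict_word_v3(const char *word, uint8_t out[4])
{
    static const char a2[] = "\n0123456789.,!?_#'\"/\\-:()"; /* Z-chars 7..31 */
    uint8_t zc[6];
    int n = 0;
    for (const char *p = word; *p && n < 6; p++) {
        if (*p >= 'a' && *p <= 'z') {
            zc[n++] = (uint8_t)(*p - 'a' + 6);          /* A0 letter */
        } else {
            const char *hit = strchr(a2, *p);
            if (hit == NULL)
                continue;                               /* not encodable here */
            zc[n++] = 5;                                /* one-shot shift to A2 */
            if (n < 6)
                zc[n++] = (uint8_t)(hit - a2 + 7);
        }
    }
    while (n < 6)
        zc[n++] = 5;                                    /* pad with 5s */
    uint16_t w0 = (uint16_t)((zc[0] << 10) | (zc[1] << 5) | zc[2]);
    uint16_t w1 = (uint16_t)(0x8000 | (zc[3] << 10) | (zc[4] << 5) | zc[5]);
    out[0] = (uint8_t)(w0 >> 8);  out[1] = (uint8_t)(w0 & 0xff);
    out[2] = (uint8_t)(w1 >> 8);  out[3] = (uint8_t)(w1 & 0xff);
}
```

So "i" becomes the bytes $38 $a5 $94 $a5, with five 5s of padding, and "abcdefgh" is cut down to "abcdef".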

In practice the text compression factor is not really very good: for
instance, 155000 characters of text squashes into 99000 bytes.  (Text
usually accounts for about 75% of a story file.)  But the encoding does
at least encrypt the text so that casual browsers can't read it.


Footnotes:

1. The versions 1 and 2 formats are slightly different: see below.

2. In versions 5 and 6, the three alphabet blocks need not be the
default ones A0 to A2 tabulated above, but instead can be chosen by the
story file itself by means of an entry in the game's header.

3. In version 6, it is expected that the ASCII codes for tab (9) and
control-K (11) are printed slightly differently: a tab at the start of a
line should be a paragraph indentation suitable for the font being used, but
anywhere in the middle of a line should be a space; and 11 should be
rendered as a gap between two sentences.


  5    How instructions are encoded

This account is to be read in conjunction with the opcode table and
dictionary, so it does not tabulate or individually discuss opcodes.
Experimenting with Inform as an assembler, while tracing is turned on, may
be helpful.

Except for the printing instructions print and print_ret, which are
simply opcodes followed by an encrypted string, an instruction consists of
the following:

  Opcode               1 byte (possibly 2 in versions 5-6)
  (Types of operands)  1 byte; only for VAR form opcodes
  Operands             Between 0 and 4, each taking 1-2 bytes
  (Store)              1 byte; variable to store a result
  (Branch)             1-2 bytes; offset to branch to

(Not all opcodes take "store" or "branch"; a few take both.)

Operands
--------

Z-code understands four kinds of operand, and describes these in 2-bit
fields:

  $$00    Large constant (>=256 or <0)   2 bytes
  $$01    Small constant (0 to 255)      1 byte
  $$10    Variable                       1 byte
  $$11    Omitted altogether             0 bytes

Variables are described in one byte.  $00 means the top of the stack,
$01 to $0f are the local variables of the current routine and $10 to
$ff are the global variables, 0 to 239.  Writing to the stack pointer,
or variable $00, pushes something onto the stack; and reading from it
pulls it off.  The stack can also be manipulated with the use of opcodes.
The stack is guaranteed to be at least 512 bytes long, and some interpreters
are more generous.  There isn't any way for a Z-code program to check stack
overflowing, so recursion requires care.
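A sketch in C of how an interpreter might classify a variable byte, and find the 2-byte value of a global in the table whose byte address is header word 6.  The names are hypothetical.

```c
#include <stdint.h>

enum var_kind { VAR_STACK, VAR_LOCAL, VAR_GLOBAL };

/* Classify a variable-number byte; *index receives the local
   number (1 to 15) or the global number (0 to 239). */
enum var_kind classify_variable(uint8_t v, int *index)
{
    if (v == 0x00) { *index = 0; return VAR_STACK; }
    if (v <= 0x0f) { *index = v; return VAR_LOCAL; }
    *index = v - 0x10;
    return VAR_GLOBAL;
}

/* Byte address of global g's 2-byte value, given the address of
   the global variables table. */
uint16_t global_address(uint16_t globals_table, int g)
{
    return (uint16_t)(globals_table + 2 * g);
}
```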

Opcodes
-------

In versions 1 to 4, Z-code opcodes are 1 byte only.  To begin with, look at
the top two bits.  If these are $$11, we shall call it "variable"; if
$$10, "short" (0OP or 1OP, i.e. 0 or 1 operands); and otherwise "long"
(2OP: 2 operands).  In versions 5 and 6, there are also "extended", EXT,
opcodes two bytes long.


For short opcodes, look at the next two bits (4 and 5).  These give the kind
of operand which the code has.  If this is $$11, there isn't an operand and
the opcode has no argument at all.  In this event, the opcode number is the
bottom 4 bits (see table of 0OP opcodes).

If the type wasn't $$11, then an operand follows of the given type (large
constant, small constant or variable), and the bottom four bits give the
opcode number (see table of 1OP opcodes).


Long opcodes have two operands.  The bottom 5 bits of the opcode say what
it is (see table of 2OP opcodes).

The alert reader will notice that this only leaves bits 5 and 6 spare to
hold the operand types.  As there are two operands to specify, this ought
to take up 4 bits, which obviously won't fit.  So a more economical form is
used instead.  Bit 6 refers to the first operand, and bit 5 to the second.
A value of 0 means a small constant and 1 means a variable.  Now, type $$11
(not really there) operands can't happen, so that's no problem, but there
might well be type $$00 (large constant) operands, for example in assembling

@mul x #666 sp;

In this event, the opcode must instead be assembled as a "variable" opcode.


So we must now describe the "variable" or VAR opcode form.  In addition to
the possible opcodes which can arise from overflowing "long" opcodes, there
are others which can only be "variable".  In the former case bit 6 is clear
and in the latter it is set.  In either case the bottom 5 bits contain the
opcode number: see the 2OP or VAR tables accordingly.

Some of these are only of "variable" type because the available codes for
the other types had run out; print_char, for instance.  Others, especially
call, need the flexibility to have between 1 and 4 operands.
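Putting the opcode forms together, a hypothetical C classifier for the first opcode byte might look like this.  It tests bit 5 to tell a 2OP opcode in variable form from a genuine VAR one, as the later Z-Machine Standard does (bit 6 is always set in this form, being one of the two top bits), and treats 190 as the EXT prefix only in versions 5 and 6.

```c
#include <stdint.h>

enum op_form  { FORM_LONG, FORM_SHORT, FORM_VAR, FORM_EXT };
enum op_count { OP_0, OP_1, OP_2, OP_VAR };

/* Classify the first opcode byte b.  For EXT the real opcode number
   is in the following byte, so *number is just set to 0 here. */
enum op_form classify_opcode(uint8_t b, int version,
                             enum op_count *count, int *number)
{
    if (b == 190 && version >= 5) {
        *count = OP_VAR;
        *number = 0;
        return FORM_EXT;
    }
    switch (b >> 6) {
    case 3:                                   /* $$11: variable form */
        *count = (b & 0x20) ? OP_VAR : OP_2;  /* bit 5 */
        *number = b & 0x1f;
        return FORM_VAR;
    case 2:                                   /* $$10: short form */
        *count = (((b >> 4) & 3) == 3) ? OP_0 : OP_1;
        *number = b & 0x0f;
        return FORM_SHORT;
    default:                                  /* $$00 or $$01: long form */
        *count = OP_2;
        *number = b & 0x1f;
        return FORM_LONG;
    }
}
```

In the usual numbering mul is 2OP number 22, so @mul x #666 sp; assembled in variable form would begin with the byte $d6, which this classifies as a 2OP opcode in variable form.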

In the "variable" type opcode, all eight bits of the opcode have been used
up, so we have to add another byte describing the operands.  This is divided
into four 2-bit fields.  For example, $$00101111 means large constant
followed by variable (and no third or fourth operand).

Once the opcode is out of the way, the operands are simply stored in one or
two-byte form as appropriate.
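The type byte can be unpacked as in this C sketch (the names are hypothetical).  It assumes, as the later Standard requires, that once one field says "omitted" all the following fields do too.

```c
#include <stdint.h>

/* Operand type codes, as in the 2-bit fields tabulated earlier. */
enum { OPERAND_LARGE = 0, OPERAND_SMALL = 1, OPERAND_VAR = 2, OPERAND_OMIT = 3 };

/* Split a type byte into its 2-bit fields (first operand in the top
   two bits) and return the number of operands actually present. */
int decode_type_byte(uint8_t t, int types[4])
{
    int count = 0;
    for (int i = 0; i < 4; i++) {
        types[i] = (t >> (6 - 2 * i)) & 3;
        if (types[i] == OPERAND_OMIT)
            break;                 /* the remaining fields are omitted too */
        count++;
    }
    return count;
}

/* Total number of bytes taken by the operands themselves. */
int operand_bytes(const int types[], int count)
{
    int n = 0;
    for (int i = 0; i < count; i++)
        n += (types[i] == OPERAND_LARGE) ? 2 : 1;
    return n;
}
```

With $$00101111 this reports two operands, a large constant and a variable, occupying three bytes.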

Numbers and addresses
---------------------

These are two-byte words, stored in the order high-byte then
low-byte.  The top bit is treated as the sign when needed
(e.g. for numerical comparisons) and not otherwise (e.g. for addresses).
When holding an address such a number can be a byte address, which puts
it necessarily in the bottom 64K of the memory map, or a packed address.
Routines and static strings will be at addresses in memory which can be
pointed to by packed addresses.  Given a packed address p, the formula
to obtain the corresponding 'real address' in bytes is:

 b =    2p     versions 1-3
        4p     versions 4-5
        4p+8o  version 6

where the offset o in Version 6 is given in the game header (there are
separate offsets for routines and for strings), which lets the memory map
stretch beyond the 256K that 4p alone could reach.
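For versions 1 to 5 the unpacking is a single multiplication, sketched here in C; Version 6 is left out because it also needs the offset from the header.  The function name is hypothetical.

```c
#include <stdint.h>

/* Unpack a packed address p for versions 1 to 5.
   Returns 0 for any version this sketch does not handle. */
uint32_t unpack_address(uint16_t p, int version)
{
    if (version >= 1 && version <= 3)
        return 2u * p;             /* reaches up to 128K */
    if (version == 4 || version == 5)
        return 4u * p;             /* reaches up to 256K */
    return 0;
}
```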

Strings, stores, branches
-------------------------

print and print_ret are followed by text: this is assembled in the usual
way immediately after the opcode (which may well be at an odd address, but
this doesn't matter) and execution resumes after the last 2-byte word of text
(the one with top bit set).

"Store" opcodes return a value: for example, mul multiplies its two
arguments together, and call calls a routine which must return a value.  Such
instructions are followed by a single byte giving the variable (stack
pointer, local or global as usual) to put it in.  This may look like an extra
operand but is not: there is no need to tell the Z-machine what type it has,
since it must be a variable.

Finally, there are instructions which test a condition.  More opcodes than
just the obvious branch instructions do this; e.g. save does so (in
version 3), the test in question being whether or not the save was
successful.  Branches are stored in two different ways for economy reasons:
nearby ones in a single byte at the end of the instruction, farther ones
in two such bytes.

The top bit of the first byte of a branch is the "flag".  If this is clear,
then a branch occurs when the condition came out false.  If it is set, then
the branch occurs when it was true.

If the next bit (bit 6) is set, then the branch is in abbreviated 1-byte
format and the offset is an unsigned number (0 to 63) held in the bottom 6
bits (0 to 5).  If not, the offset is in the bottom 14 bits (0 to 5 of the
first byte, and all of the second), and this offset can be positive or
negative.  (E.g., fourteen 1's means -1 in the usual way.)

In either form, an offset of 0 in fact means "return false from the current
routine" and an offset of 1 means "return true".  Neither would ever be
useful as a genuine branch, so nothing is lost; but since the 1-byte offset
cannot be negative, it is essential to use the long form for backward
branches.
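A decoding sketch in C for the branch bytes follows.  It follows the reading of the later Z-Machine Standard 1.0, in which the 1-byte offset is unsigned (0 to 63), only the 2-byte offset is signed, and offsets 0 and 1 mean "return false" and "return true" in either form; a caller would check for those two values before treating the offset as a jump.  The function name is hypothetical.

```c
#include <stdint.h>

/* Decode the branch data at p.  Sets *on_true from the flag bit and
   *offset to the offset; returns the number of branch bytes used. */
int decode_branch(const uint8_t *p, int *on_true, int *offset)
{
    *on_true = (p[0] & 0x80) != 0;     /* branch if the condition was true? */
    if (p[0] & 0x40) {                 /* 1-byte form: unsigned 6 bits */
        *offset = p[0] & 0x3f;
        return 1;
    }
    int v = ((p[0] & 0x3f) << 8) | p[1];
    if (v & 0x2000)                    /* sign bit of the 14-bit value */
        v -= 0x4000;
    *offset = v;
    return 2;
}
```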

Working out what the offset ought to be is more complicated than it appears
because the PC has already moved on from the start of the instruction when
it reaches the branch.  The bizarre formula in question is

  Offset = Destination address - Address of this instruction - Length + B

where

  Length = number of bytes in instruction (not counting the branch)

and B is 1 for short branches, 0 for long ones.
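The formula, and its inverse (where an interpreter ends up, counting from the address just after the branch data), can be checked with a short C sketch; the names are hypothetical.

```c
/* Offset to store, by the formula above.  length = bytes in the
   instruction, not counting the branch data; short_form = 1 for a
   1-byte branch (B = 1), 0 for a 2-byte one (B = 0). */
long branch_offset(long dest, long instr_addr, int length, int short_form)
{
    return dest - instr_addr - length + (short_form ? 1 : 0);
}

/* The inverse, rearranged from the same formula: the address just
   after the branch data, plus the offset, minus 2. */
long branch_target(long instr_addr, int length, int short_form, long offset)
{
    long after = instr_addr + length + (short_form ? 1 : 2);
    return after + offset - 2;
}
```

For either form, branch_target(i, len, s, branch_offset(d, i, len, s)) gives back d.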

(For its own code Inform compiles branches in the long form, considering the
economy to be not worth the nightmarish computation needed to make the
long/short decision.  (One problem is that the number of bytes in each
instruction must be the same in both passes, so that the decision needs to
be made before the value of the offset is known... in a 2-pass compiler this
is insoluble.  Another is that the offsets are affected by the size of the
branch, confusing matters on forward branches.)  However, its assembler
mode allows you to make an explicit choice.)

jump instructions similarly encode their address operand as an offset, but
always as a two-byte (signed) constant.

A few instructions both store results and branch: if so, the store comes first.


Extended set of opcodes
-----------------------

The extended (or EXT) set only applies in versions 5 and 6.  These are
two-byte opcodes, of which the first byte is always 190, the second the
opcode number.  Subsequently, they behave exactly as VAR opcodes.

Two other opcodes, call_vs2 and call_vn2, are not in the extended set at
all but are VAR opcodes with an unusual layout: they have up to 8 operands
and so have two bytes of type information instead of one.  (These are
provided for calling functions with up to 7 arguments instead of only 3,
the limit in earlier versions.)

(Inform's assembler is unable to use these two opcodes.)


  6    The early Z-machine

Since the majority of extant Infocom story files use it, this section talks
about version 3 unless otherwise stated.  The following section will indicate
how the late Z-machine differs.

The early Z-machine has a memory map at most 128K long.

       An example memory map of a small game (produced by Inform)

            Start  Contains
Dynamic     00000 header
            00040 synonym strings
            00042 synonym table
            00102 property defaults
            00140 objects
            002f0 object descriptions and properties
            006e3 global variables
            008c3 arrays
Static      00b48 grammar table
            010a7 actions table
            01153 preactions table
            01201 adjectives table
            0124d dictionary
Paged       01a0a Z-code
            05d56 static strings
            06ae6 end of file

The Header
----------

The first 64 bytes contain a header, to be detailed fully later.  It
contains (mainly) addresses of other tables and flags, and is both a
vehicle for the game to tell the interpreter what to do, and for the
interpreter to tell the game what it can do.

To briefly run through the essential points of the version-3 header:
the first 4 bytes are
      03  <Flags>  <Release Number>
                   ----2 bytes-----

(The first byte is the version number.)
Next come seven 2-byte words, at words 2 to 8:

2     <Start of Routines>    Where routines begin, in bytes

Actually, in some games, read-only data seems to continue here: this
pointer actually tells the interpreter where the "resident" data ends,
i.e. the part of the game which is kept in memory at all times rather
than loaded off disc as and when required.  (Of course modern interpreters
should almost certainly not be swapping pages from the disc anyway, now
that 128K is no longer a scandalous amount of memory.)

3     <Main Routine>         Address of main routine, in bytes, +1

(This +1 is why the Main routine cannot have local variables - it is a
peculiarity of the standard.  Note also that this is uniquely a routine
address in bytes and not a packed address: Main must occur in the lower 64K
of the file.  Inform always sets word 3 to be word 2, plus 1, because it
requires Main to be the first routine defined.)

4     <Dictionary>           The dictionary table address, in bytes
5     <Object tree>          Object table address, in bytes
6     <Variables>            Global variables address, in bytes
7     <Save area size>       The total number of bytes in a saved game

Saving the game is done by saving this many bytes from the beginning of the
machine.  (Saved games also contain the current state of the Z-machine
stack; the stack is not stored anywhere in the Z-machine's memory.)

8     <More flags>

This is followed by the six bytes from byte 18 to 23, which are the serial
number.  (By custom these hold the compilation date in the form YYMMDD.)
Then more words:

12    <Synonyms table>       Synonym table address in bytes
13    <Length>               Length of file, in words
14    <Checksum>             Sum of bytes from 64 upwards, mod $10000

The length and checksum are needed to perform "verify", something which
most games only do when explicitly asked.
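The check can be sketched in C as follows, assuming the whole story file has been loaded into memory; the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* A 2-byte word, high byte first, at byte address a. */
static uint16_t header_word(const uint8_t *mem, size_t a)
{
    return (uint16_t)((mem[a] << 8) | mem[a + 1]);
}

/* Version-3 "verify": add up the bytes from 64 to the end of the file
   (its length in words being header word 13), mod $10000, and compare
   the total with the checksum in header word 14.  size is the number
   of bytes actually loaded. */
int verify_v3(const uint8_t *mem, size_t size)
{
    if (size < 64)
        return 0;                             /* no complete header */
    size_t length = 2 * (size_t)header_word(mem, 2 * 13);
    if (length > size)
        return 0;                             /* file is truncated */
    uint16_t sum = 0;
    for (size_t a = 64; a < length; a++)
        sum = (uint16_t)(sum + mem[a]);       /* wraps round mod $10000 */
    return sum == header_word(mem, 2 * 14);
}
```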

Synonyms
--------

We are now at byte address $0040 and by convention we reach the synonyms.
Usually, the actual strings (the expansions of the synonyms) are stored
here, one after another, making up 96 strings.  When that is out of the way,
the actual table begins (and this is what the synonyms address points to).
The table contains 96 word addresses in sequence.

Note: extremely annoyingly (from the point of view of the compiler writer),
these are word addresses and not packed addresses: thus a synonym string
must lie in the bottom 128K of memory.  (Inform has to go to a considerable
amount of extra trouble because of this.)  Of course in the original design
synonym strings had to be resident (hence low in memory) anyway for speed
reasons.
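Looking up a synonym string therefore means reading a word address from the table and doubling it, as in this C sketch (the name is hypothetical; table is the address from header word 12).

```c
#include <stddef.h>
#include <stdint.h>

/* Byte address of synonym string number entry (0 to 95).  Table
   entries are word addresses, so each is doubled. */
uint32_t synonym_string_address(const uint8_t *mem, uint16_t table, int entry)
{
    size_t a = (size_t)table + 2 * (size_t)entry;
    uint16_t word_addr = (uint16_t)((mem[a] << 8) | mem[a + 1]);
    return 2u * word_addr;         /* at most $1fffe: the bottom 128K */
}
```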

Object Table
------------

Next is the object table.  In fact it begins with what is sometimes called
the "global properties table", though it is actually a table of default
values of properties.  This is a list of 31 2-byte numbers.  There is no
property 0, so the first word is always 0000.  (Recall that there are
30 properties in versions 1 to 3.)
After these 62 bytes, the objects begin, beginning from object 1.  An object
entry consists of 9 bytes, looking like:

   <the 32 attribute flags>   <parent>  <sibling>  <child>  <properties>
   ---32 bits in 4 bytes---   ---3 bytes------------------  ---2 bytes--

The three parent-sibling-child bytes are 00 when the object pointed to is
"nothing".  The properties pointer is the byte address of the list of
properties attached to the given object.
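Finding object n thus means skipping the 62 bytes of defaults and then (n-1) entries of 9 bytes.  The C sketch below does this; the names are hypothetical, and it assumes, as later specifications state, that attribute 0 is the top bit of the first attribute byte.

```c
#include <stddef.h>
#include <stdint.h>

struct object_v3 {
    uint8_t attributes[4];   /* 32 flags; attribute 0 is the top bit of byte 0 */
    uint8_t parent, sibling, child;
    uint16_t properties;     /* byte address of the property table */
};

/* Fetch object n (1 to 255) from the object table at byte address
   table (header word 5). */
struct object_v3 read_object_v3(const uint8_t *mem, uint16_t table, int n)
{
    size_t a = (size_t)table + 31 * 2 + (size_t)(n - 1) * 9;
    struct object_v3 o;
    for (int i = 0; i < 4; i++)
        o.attributes[i] = mem[a + i];
    o.parent = mem[a + 4];
    o.sibling = mem[a + 5];
    o.child = mem[a + 6];
    o.properties = (uint16_t)((mem[a + 7] << 8) | mem[a + 8]);
    return o;
}

int has_attribute(const struct object_v3 *o, int attr)
{
    return (o->attributes[attr / 8] >> (7 - attr % 8)) & 1;
}
```

With the example memory map above, the table starts at $0102 and object 1 at $0140.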

When all these 9-byte entries are out of the way, the property lists
begin.  (Inform keeps these in the same order as the objects they are
attached to but the specification does not require this.)  An individual
property table has the brief header

  <text-length>   <text of short name of object>
  -----byte----   --some even number of bytes---

(where the text-length is the number of 2-byte words making up the text,
which is stored in the usual format).

Then the properties held are listed, in descending numerical order.  (This
order is essential.)  An individual property is stored as

  <size byte>   <the actual property data>
                ---between 1 and 8 bytes--

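The size byte is arranged as 32 times (the number of data bytes minus 1),
plus the property number.  (So the top three bits hold 0 to 7 for 1 to 8
data bytes, and the bottom five bits hold the property number.)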
Each list of properties is ended by a 00 size byte.  This is why there is no
property 0.
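
As an illustration of the size byte arithmetic (size byte = 32 times (data
length minus 1), plus the property number), here is a minimal Python sketch
for Versions 1 to 3.  The function names are hypothetical.  The sketch only
checks that the property number fits in the five-bit field; it does not
enforce the limit on the number of properties mentioned above.

```python
def encode_v3_size_byte(number, length):
    """Build a Version 1-3 property size byte."""
    if not 1 <= number <= 31:
        raise ValueError("property number must fit in 5 bits and be non-zero")
    if not 1 <= length <= 8:
        raise ValueError("property data must be 1 to 8 bytes long")
    return 32 * (length - 1) + number


def decode_v3_size_byte(size_byte):
    """Return (property number, data length), or None for the 00 terminator."""
    if size_byte == 0:
        return None  # end of this object's property list
    return size_byte & 0x1F, (size_byte >> 5) + 1


print(encode_v3_size_byte(18, 2))  # 50
```

Because a stored length of 8 becomes 7 in the top three bits, the largest
possible size byte is 255 and never overflows a byte.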


Global variables
----------------

When all the property tables are done, we come to the global variable table.
Global variables are numbered from 0 to 239, and this table begins with 240
initial 2-byte values for them.  After this, space is conventionally left
for all the arrays, dynamic strings and so on which they point to.

We have now reached the top of the save area.  Everything higher in memory
than here is never altered, and so is not saved when the game is saved:
hence the name "save area" for everything below this point.
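
A minimal Python sketch of finding a global variable's initial value,
assuming the story file is held as bytes.  Two facts here come from the
standard rather than from this section: the header word at $0C gives the
byte address of the global variable table, and an instruction refers to
global g as variable number g + 16 (variable 0 being the stack and 1 to 15
the local variables).  The function names are hypothetical.

```python
def global_address(story, g):
    """Byte address of global variable g (0 to 239)."""
    if not 0 <= g <= 239:
        raise ValueError("globals are numbered 0 to 239")
    table = int.from_bytes(story[0x0C:0x0E], "big")  # header word $0C
    return table + 2 * g


def read_global(story, g):
    """Read a global's 2-byte value (the Z-machine is big-endian)."""
    addr = global_address(story, g)
    return int.from_bytes(story[addr:addr + 2], "big")


def global_from_variable_number(v):
    """Map an operand variable number (16 to 255) to a global number."""
    if not 16 <= v <= 255:
        raise ValueError("variables 0 to 15 are the stack and the locals")
    return v - 16
```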


Grammar and parsing tables
--------------------------

Next is the table of grammar, an actions table, a preactions table and then
an adjectives table.  Note that this is not a part of the specification at
all, and the Z-machine knows nothing about these tables.  The old Infocom
files have certain standards about their formats because they used roughly
similar parsers; Inform follows these conventions to some extent (see the
Inform Technical Manual for the formats it writes here).


The dictionary
--------------

And next the dictionary table, which has the following short header:

  n    <list of ASCII codes>  entry-length  number-of-entries
 byte  ------n bytes--------      byte         2-byte word

The codes listed are word-separators: typically (and under Inform
mandatorily) these are

   .   ,   "

A space character (32) does not appear in this list, since spaces always
divide words anyway.  The listed characters not only divide words but also
come out as words in their own right: thus,

  > fred,go

will be lexically analysed as three words:

"fred"  ","  "go"

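The splitting rule can be sketched in Python as follows.  This is only an
illustration: a real interpreter also records each word's position and
length in the parse buffer and looks the word up in the dictionary, which
the hypothetical split_words function does not attempt.

```python
def split_words(text, separators='.,"'):
    """Spaces only divide words; separator characters divide words and
    are also words in their own right."""
    words, current = [], ""
    for ch in text:
        if ch == " " or ch in separators:
            if current:
                words.append(current)
                current = ""
            if ch != " ":
                words.append(ch)  # the separator becomes a word itself
        else:
            current += ch
    if current:
        words.append(current)
    return words


print(split_words("fred,go"))  # ['fred', ',', 'go']
```
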
Each word entry has 4 bytes of text (i.e. 6 Z-characters, padded out
with as many "pad" characters, that is 5s, as necessary), and
then a few extra bytes of data: almost invariably (and under Inform
mandatorily) three.

Dictionary entries appear in alphabetical order (precisely, this means
in numerical order, regarding the first 4 bytes as an unsigned
integer).  They use only alphabets A0 and A2 (i.e., they don't use
upper case letters).
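
The following Python sketch shows how a word might be turned into a
dictionary key and then found by binary search.  It assumes Version 3 or
later: Z-character 5 shifts to A2 for one character, and the A2 row from
Version 2 on places 0123456789.,!?_#'"/\-:() at Z-characters 8 to 31.
Neither table is given in this section, so treat them as assumptions.
Three 5-bit characters are packed into each 2-byte word, and the top bit of
the last word is set.  Characters outside A0 and A2 are simply rejected,
and all the names are hypothetical.

```python
import bisect

# A2 row from Version 2 on: these are Z-characters 8 to 31
# (6 is the ZSCII escape and 7 is newline, neither used here).
A2_CHARS = "0123456789.,!?_#'\"/\\-:()"
SHIFT_A2 = 5   # Version 3 on: shift to A2 for the next character
PAD = 5        # the "pad" character used to fill out short words


def to_zchars(word):
    """Convert a word to Z-characters using only alphabets A0 and A2."""
    zchars = []
    for ch in word.lower():                  # dictionaries never use A1
        if "a" <= ch <= "z":
            zchars.append(ord(ch) - ord("a") + 6)        # 'a' is Z-char 6
        elif ch in A2_CHARS:
            zchars += [SHIFT_A2, A2_CHARS.index(ch) + 8]
        else:
            raise ValueError(f"{ch!r} is not handled by this sketch")
    return zchars


def encode_dict_word(word, nchars=6):
    """Encode 6 Z-chars into 4 bytes (V1-3) or 9 into 6 bytes (V4 on)."""
    zchars = to_zchars(word)[:nchars]        # truncate to the resolution
    zchars += [PAD] * (nchars - len(zchars))
    words16 = []
    for i in range(0, nchars, 3):            # three 5-bit chars per 2 bytes
        a, b, c = zchars[i:i + 3]
        words16.append((a << 10) | (b << 5) | c)
    words16[-1] |= 0x8000                    # top bit marks the last word
    return b"".join(w.to_bytes(2, "big") for w in words16)


def find_word(sorted_keys, word, nchars=6):
    """Binary search, treating each encoded key as an unsigned integer."""
    target = int.from_bytes(encode_dict_word(word, nchars), "big")
    values = [int.from_bytes(k, "big") for k in sorted_keys]
    i = bisect.bisect_left(values, target)
    return i if i < len(values) and values[i] == target else None


keys = sorted(encode_dict_word(w) for w in ("north", "go", "fred", ","))
print(find_word(keys, "go") is not None)  # True
```

Note that "lantern" and "lanter" encode identically at 6 Z-characters,
which is the resolution limit that Version 4 and above relax.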

The contents of the data bytes are not specified by the Z-machine,
which never does anything with them.  (See the Inform Technical
Manual for what Inform does with them.)


The code area and static strings
--------------------------------

Next is the code area.  (In fact some Infocom games, though no Inform
ones, put some static data next before the code begins.)
The code area simply contains a list of routines; the specification
does not require the first routine to be the 'main routine', and indeed
it is not in some existing files (though it always is under Inform).

All routines (and static strings) must occur at addresses which can
be packed addresses (meaning, at even byte addresses in Version 3).
The bytes sometimes left over in between them are unspecified (but under
Inform, always 0).

A routine begins with one byte indicating the number of local variables the
routine has (from 0 to 15), and then with that many 2-byte numbers giving
their initial values.  When a function call takes place, the arguments --
however many there are -- are written into the first few local variables,
over-riding the default values here.  Unlike global variables, these bytes
are not used for the current values of the variables: they are kept on
the stack.

(Inform never makes use of these initialisation numbers, and simply stores
zeros.)
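
A minimal Python sketch of what an interpreter does when it enters such a
routine, assuming the story file is held as bytes and the routine's byte
address has already been unpacked.  The function name is hypothetical.
This sketch simply drops any arguments beyond the number of locals.  It
applies to Versions 1 to 4 only, since (as described in section 7)
Version 5 routines carry no initial values.

```python
def enter_routine(story, addr, args):
    """Set up the locals of a Version 1-4 routine at byte address addr.

    Returns (list of local values, address of the first instruction)."""
    count = story[addr]
    if count > 15:
        raise ValueError("a routine has at most 15 local variables")
    # Initial values: 'count' big-endian 2-byte words after the count byte.
    local_vars = [
        int.from_bytes(story[addr + 1 + 2 * i:addr + 3 + 2 * i], "big")
        for i in range(count)
    ]
    # Arguments overwrite the first few locals; surplus ones are dropped.
    for i, value in enumerate(args[:count]):
        local_vars[i] = value
    return local_vars, addr + 1 + 2 * count
```

The local values would then be pushed into a new stack frame; the header
bytes themselves are never written to.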

Executable code follows this header.  There is no special marker for the end
of a routine; it is simply expected that in every case a legal return
instruction will be hit.

Finally, from the end of the code to the top of memory are the static
strings.  These are put up here to be out of the way, where they won't clog
up the bottom 64K of memory.  There's no table of their addresses, or pointer
to where they begin; each is referred to by a packed address in code or
data given earlier.


  7    The late Z-machine

  Versions 4 and 5: Architecture
  ------------------------------

The bulk of this section is given over to a detailed discussion of the
differences between version 3 and version 5, since those are the two forms
Inform can produce.  (Version 4 is nearer to version 5 than 3.)  We
begin with the architecture.

The memory map doubles to 256K, a change which is surprisingly easy to make.
But the processor remains 16-bit, so a packed address now stands for 4 times
its value as a byte address, rather than 2 times.  However, this only really
affects addresses of routines and static strings (which are now aligned to
longword boundaries, not word boundaries).

As mentioned in section 6, an annoying exception is that the synonyms table
contains word addresses still, and so assumes that the synonym strings lie
in the lower 128K.  This is understandable because the Z-machine used to
rely on virtual memory (swapping pages of memory on and off disc), and
the synonyms need to be accessed at virtually all times: keeping them
together in low memory (just after $0040) is therefore efficient, and
giving them addresses divisible by four would waste bytes in the
save-game-area.
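
A short Python sketch of the two address conversions involved: packed
addresses (for routines and static strings) are scaled by 2 in Versions 1
to 3 and by 4 in Versions 4 and 5, while word addresses (as in the synonyms
table) are always scaled by 2.  Version 6 is deliberately left out, and the
function names are hypothetical.

```python
def unpack_address(packed, version):
    """Byte address of a routine or static string from its packed address."""
    if not 0 <= packed <= 0xFFFF:
        raise ValueError("packed addresses are 16-bit")
    if version <= 3:
        return 2 * packed   # word-aligned, so up to 128K
    if version <= 5:
        return 4 * packed   # longword-aligned, so up to 256K
    raise NotImplementedError("Version 6 addressing is not covered here")


def word_address_to_byte(word_addr):
    """Synonyms-table entries are word addresses in every version."""
    return 2 * word_addr
```

The largest word address, $FFFF, gives byte address $1FFFE, which is why
synonym strings must lie in the lower 128K even when routines and static
strings can reach 256K.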

The only important change to the header, then, is that the length is in
longwords, being a packed address.

A minor new feature in Version 5 is that the game can change the alphabet
tables used for text decoding, putting a pointer to them in the header at
$34-5: this is usually left as $0000, meaning the default alphabets.  See
section 10.  Also, it seems to be expected that the interpreter tells the game
the dimensions of the screen by writing them into the header itself, in
play. Thus it is fairly safe to consult

  Byte 32 - Screen height
  Byte 33 - Screen width

and it's hard to cope without this information, since games after Version 3
have to construct their own status lines.  (It isn't clear that the various
interpreters all understand the same thing by "height" and "width", though.)

There is effectively no limit on the number of possible objects, since an
object number is no longer expected to fit into a single byte.  This has the
knock-on effect that in most games many properties will have to allow for a
word and not a byte (which is why Inform defaults property definitions as
long in version-5 mode), but the only architectural effect is that object
definitions grow in size.  Since the number of attributes is increased from
32 to 48, and of properties from 30 to 62, this would be needed anyway: and
here is the new form:

   <the 48 attribute flags>   <parent>  <sibling>  <child>  <properties>
   ---48 bits in 6 bytes---   ---3 words, i.e. 6 bytes----  ---2 bytes--

giving a 14-byte block.  As before, the properties field is the byte address
of the property table.
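
To make both object layouts concrete, here is a hedged Python sketch that
locates an object's entry and reads its links.  It assumes that the header
word at $0A holds the address of the object table (a fact from the
standard, not from this section), and that the default-values table is
31 words long in Versions 1 to 3 and 63 words long from Version 4 on.  The
function name is hypothetical, and the attribute flags are skipped rather
than decoded.

```python
def object_entry(story, obj, version):
    """Return (entry address, parent, sibling, child, property-table address)."""
    if obj < 1:
        raise ValueError("objects are numbered from 1")
    table = int.from_bytes(story[0x0A:0x0C], "big")  # header word $0A
    if version <= 3:
        defaults, entry_size, attr_bytes, link = 31, 9, 4, 1
    else:
        defaults, entry_size, attr_bytes, link = 63, 14, 6, 2
    addr = table + 2 * defaults + (obj - 1) * entry_size
    pos = addr + attr_bytes  # skip the attribute flags
    parent, sibling, child = (
        int.from_bytes(story[pos + i * link:pos + (i + 1) * link], "big")
        for i in range(3)
    )
    props = int.from_bytes(story[pos + 3 * link:pos + 3 * link + 2], "big")
    return addr, parent, sibling, child, props
```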

The property table is also altered.  A property is now stored as

  <size and number>     <the actual property data>
  --1 or 2 bytes---     --between 1 and 64 bytes--

The property number now occupies the bottom 6 bits, not 5, of the first size
byte, which is why more properties are available.  But this only leaves two
bits.  If these are $$00, the size is taken as 1, and if $$01, then it
is taken as 2.  (These are the most common sizes in practice.)  Otherwise
the top bit is set, which means that the second byte is present, and
contains the size in its bottom six bits.
However, when present the second byte must also have the top bits set to
$$10.  The reason for this is that the size must be parsable either
forwards or backwards - the Z-machine needs to be able to reconstruct the
length of a property given only the address of the first byte of its data.
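
The point about parsing in both directions can be shown with a short Python
sketch.  The first function reads the size byte(s) forwards; the second
recovers the length from the byte just before the data, as get_prop_len
has to.  Taking a stored length of 0 in the second byte as 64 follows the
standard; the function names are hypothetical, and the 00 terminator is
not handled.

```python
def read_property_header(story, pos):
    """Parse the size byte(s) at pos: (number, data length, data address)."""
    first = story[pos]
    number = first & 0x3F                       # bottom six bits
    if first & 0x80:                            # second size byte follows
        length = story[pos + 1] & 0x3F or 64    # a stored 0 is taken as 64
        return number, length, pos + 2
    return number, (2 if first & 0x40 else 1), pos + 1


def length_from_data_address(story, data_addr):
    """Work backwards from the first data byte, as get_prop_len must."""
    before = story[data_addr - 1]
    if before & 0x80:                           # $$10xxxxxx: second size byte
        return before & 0x3F or 64
    return 2 if before & 0x40 else 1            # single byte: $$00 or $$01
```

This works because the only byte ever found just before the data is either
a single size byte (top bit clear) or a second size byte (top bit set).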

There are very many (e.g. 2000) property entries in a story file, so this
optimisation is probably worthwhile.

The formats of the parsing tables are generally different in later
versions, but this isn't part of the Z-machine specification.

Whereas Version 3 dictionaries store words in 6 Z-characters, those of
Version 4 and above store 9 Z-characters (i.e., four and six bytes of
encoded text respectively).  This increases the length of entries.
Otherwise, the specification is the same.

The extra resolution makes it reasonable to include hyphenated words, which
might not have been sensible earlier because of the number of five-bit
blocks they would have needed.

These modifications appear at first sight to make much larger, less
efficient code, but this is misleading.  The original version-3 'Curses' was
only 3% larger when first compiled as version-5, and a good part of that
was the extra dictionary resolution.

There is one sensible structural change to the way actual code is written:
in Version 5 (not Version 4, though) the header of a function no longer
contains initialisation values for its local variables.  In practice these
were very often zero, wasting a large number of bytes across the whole story
file.  On the other hand, one peculiarity of the machine is that functions
can be called with 0, 1, 2 or 3 arguments, and routines in version-3 games
used to be able to put a default value in their headers for any argument not
supplied by the caller.  This they can no longer do, so that they are unable
to tell how many arguments actually were supplied: and so a new branch
instruction check_arg_count exists to test this.

Another improvement is in subroutine calls.  In Version 3 code, a call
instruction is always VAR and has a variable argument list, which wastes a
byte even when there are no parameters.  Also, every function call returns a
value, and in Version 3 this value had to be written somewhere even when it
wasn't wanted - wasting another byte.  (In fact Inform used to return this
to the stack, and then pop it from the stack - wasting another one.
Nowadays it stores unwanted return values in a scratch global variable.)  In
Version 4 (and to a greater extent in Version 5), new forms of the call
instruction are provided which automatically throw away the return value.

This leads to the nightmarish position that there are eight variant forms of
call in the Version 5 machine.  Inform christens six of these as follows:

  call_vs  <address> <0 to 3 arguments> <place to put answer>

(which is just as in version 3 call, and compatible with it),

  call_vn  <address> <0 to 3 arguments>

which is the same but throws away the answer, and

  call_1n  <address>                   address();
  call_1s  <address> <answer>          answer=address();
  call_2n  <address> <a1>              address(a1);
  call_2s  <address> <a1> <answer>     answer=address(a1);

Two of the others are called call_vs2 and call_vn2 by Inform: these are
provided for function calls with up to seven arguments, circumventing the
usual restriction on function calls to have at most three: and, uniquely,
they have two bytes of type bits, arranged as eight two-bit fields.  (Inform
does not compile these instructions, and does not make use of them when
coding function calls, because it would be extremely unportable to lower
versions.)  Note that the standard opcode name for all eight opcodes is
call, and this is what appears in disassembly, but that Inform uses these
eight names internally and for assembly.
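
To make the "eight two-bit fields" concrete, here is a hedged Python sketch
that decodes operand-type bytes.  The field meanings ($$00 large constant,
$$01 small constant, $$10 variable, $$11 omitted) come from the standard
instruction encoding rather than from this section, and the function name
is hypothetical.  Ordinary VAR instructions have one such byte; call_vs2
and call_vn2 have two.

```python
OPERAND_TYPES = {0b00: "large constant", 0b01: "small constant", 0b10: "variable"}


def operand_types(type_bytes):
    """Decode operand-type bytes, most significant field first."""
    types = []
    for byte in type_bytes:
        for shift in (6, 4, 2, 0):
            field = (byte >> shift) & 0b11
            if field == 0b11:            # omitted: no further operands
                return types
            types.append(OPERAND_TYPES[field])
    return types
```

With two type bytes, up to eight operands can be described: the routine
address plus up to seven arguments.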


Versions 4 and 5: Reliable extra features
-----------------------------------------

We now discuss those important extra features which can more or less be
relied upon to be safely interpreted.  Roughly speaking, don't rely on
interpreters other than Zip to correctly perform an opcode not actually used
in any existing Infocom game.

But we must begin with unfortunate clashes with version 3.  Chief among
these is pop which used simply to throw away the top of the stack.  In
version 5 no such instruction exists (there is less need for it anyway given
the new n form of the call opcodes).

Also, the read opcode (although it has the same basic form,

  read text_buffer parse_buffer;

as before) does a subtly different job: it appends the result of parsing the
text to the parse_buffer, rather than over-writing the parse buffer.  It
also no longer prints any kind of status bar.  (To avoid confusion of the
syntax, Inform calls the version-3 opcode sread and the version-5 opcode
aread; and its higher-level command read translates into sensible code
for either.)

And since there is no longer any Z-machine "status bar", the old opcode to
display it (show_status) disappears and in theory becomes illegal.

The random function now makes the random number generator predictable for
a while if given a negative argument (some version 3 games had a #random
opcode - so called because typing #random into the game made it happen).

Cutting and pasting bits of parse buffer is a common job for Z-code parsers,
and there are new opcodes to help with shuffling tables around.  One can
also (using tokenise) parse from any string, with any supplied dictionary
table (not necessarily the default one).  One may also encode_text to
Z-machine text format - which might be useful for constructing dictionary
entries at run-time.

A few opcodes have been moved around, irritatingly, and there have been
three casualties.  not has moved.  save and restore now appear in the
extended set, as a result of which they are no longer branch instructions
(presumably to avoid coping with branch offsets being different for extended
opcodes), and now take a less convenient syntax:

  save <variable>;
  restore <variable>;

These put return codes in the variable: 0 if the operation fails.  A
successful save returns 1, but the same save instruction can also return
2.  This is because a successful restore does not carry on after the
restore instruction: execution continues from immediately after the save
instruction which produced the save game file.  So that the program can
tell whether a restore has just taken place, or only a save after which
normal execution continued, the save then returns 2 rather than 1.

Being in the extended set does give them extra functions but not very useful
ones.  It is possible to imagine saving a "preferred settings" file, for
instance.
(Inform compiles a little code to make save and restore emulate the
version 3 opcodes, for portability between versions.  To get at the raw
opcodes, they must be assembled in @ mode.)

Character graphics before Version 6
-----------------------------------

Now for the graphics routines.  The simplest of these allows for different
text styles: boldface, underlining and reverse video (e.g. white on black if
text would normally be black on white).  These effects are modelled on the
VT100 terminal and cannot safely be combined, even though the
codes for them look like bit masks:

  set_text_style 0     Default: Inform calls this "Roman"
                 1     Reverse video
                 2     Boldface
                 4     Underlined (or italic)

An interpreter providing coloured text may implement these with colour
changes: my own represents bold as blue lettering instead of black on white,
for instance, which is quite pleasant.

Some ports of ITF paint entirely-reversed next lines when scrolling
the screen in Reverse video, but this is incorrect.  Some interpreters
do not implement "bold face".  A stone tablet with keywords picked out in
bold might be impossible for some players to decipher.

(There is another option, 8, which forces use of a fixed-spaced font,
used in 'Beyond Zork'.)

An upper (usually status line) screen can be split off from the main screen
with:

  split_window <n>

creating one which is n lines tall.  There are then two screens, 0 (the
main screen) and 1 (the upper one).  Text output can be switched between
them by

  set_window 0       to lower
             1       to upper

The lower window is just a text stream and its cursor position cannot be
set: on the other hand, when it is returned to, the cursor will be where it
was before it was left.

Within the upper window, anyway, the cursor can be moved by

  set_cursor <line> <column>

where (1,1) is the top left hand character.  Printing on the upper window
overlies printing on the lower, and is always done in a fixed-space font,
and does not appear in a printed transcript of the game.

However, before printing to the "status line" screen, it is essential to
change the printing format - this is the buffer_mode opcode alluded to
earlier.  Before printing, execute

  buffer_mode 0

and when returning to the normal screen,

  buffer_mode 1

Otherwise, if the cursor comes near the edge the interpreter may continue
trying to split lines at word breaks; some ports of ITF make a horrid mess
in this case, though Zip manages.

Also, the status line screen must be tall enough to include all the cursor
positions you want to write to.  If it is not quite tall enough, different
interpreters flounder about in different ways: some will scroll the upper
window, some won't.

A common thing to want to do is to erase areas of screen - especially a
status bar which is being redisplayed.  Opcodes

  erase_window $ffff  - erases whole screen, both windows
  erase_line          - erases from cursor to end of line  [Achtung!]

are provided for this.  If you are in reverse video mode, they erase to the
reversed colour: a particularly unpleasant effect is achieved by

  set_text_style 1; erase_window $ffff;

Unfortunately erase_window (which is intended to erase window n, or all
windows if n=-1) is not fully implemented by ITF and cannot safely be used
except in this drastic way.  (The Version 4 file 'Trinity', for instance,
only uses it thus.)

erase_line is only sometimes implemented and does slightly unpredictable
things in reverse video mode, which is a nuisance since it would otherwise
be ideal for blanking out an out-of-date status bar.  Moreover, no existing
V4 or V5 game uses this opcode and so it may not be relied upon.  (It's
interesting to note that the Version-5 edition of 'Zork I' - one of the
earliest Version 5 files - blanks out lines by looking up the screen width
and printing that many spaces.)


There are new arithmetic opcodes:

  art_shift x y z    z=x arithmetically shifted y bits
  log_shift x y z    z=x logically shifted y bits
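
The direction of the shift is left implicit above.  In the standard, a
positive number of places shifts left and a negative number shifts right;
log_shift brings in zeros at the top, while art_shift copies the sign bit
down.  A Python sketch on 16-bit words, with a hypothetical helper name:

```python
def to_signed16(value):
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def log_shift(number, places):
    """Logical shift of a 16-bit word: left if places > 0, right if < 0."""
    number &= 0xFFFF
    places = to_signed16(places)
    if places >= 0:
        return (number << places) & 0xFFFF
    return number >> -places             # zeros come in at the top


def art_shift(number, places):
    """Arithmetic shift: a right shift keeps the sign bit."""
    places = to_signed16(places)
    if places >= 0:
        return (number << places) & 0xFFFF
    return (to_signed16(number) >> -places) & 0xFFFF


print(hex(log_shift(0x8000, -1)), hex(art_shift(0x8000, -1)))  # 0x4000 0xc000
```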

Version 5 games effectively have "undo" provided for them, though the logic
is tricky to get right (from a programmer's point of view).  The two
relevant opcodes are save_undo and restore_undo, which work in exactly
the same way as save and restore except that they save the game
internally to spare memory.  The idea is that if the game is saved before
any action, then the last action can be undone by restoring this
memory-saved game.

save_undo provides one more return code than save: it returns -1 if the
interpreter is unable to manage internal saves (presumably this was provided
for machines tight on memory).  Of course, an interpreter which knows
enough about save_undo to return this code probably knows enough to
implement it fully.

Zip does provide this, but the ITF interpreter currently does not (and
save_undo returns 0).  This is probably the biggest feature it lacks.
In any case, "undo" is such a worthwhile feature and so easy to code that
games probably ought to provide it in hope.

Changing input/output streams and reading the keyboard in real time
are, similarly, more reliable under Zip.


Architecture: version 6
-----------------------

The architecture of the Version 6 Z-machine is extremely similar to that
of Version 5.  Packed addresses are expanded again and this allows the
memory map to stretch yet further.  ('Shogun', for instance, is about 335K
long.)

Pictures and sampled sounds are not stored in the Z-machine itself and it
is simply expected that the interpreter has them to hand.  They were
thus stored in different formats for different machines.
A few opcodes are changed (mostly the character graphics ones) and
many new ones are added: see the dictionary.

The graphical features are the most disheartening to interpreter writers,
but most of them seem to be optional.  For instance, the interpreter
can declare itself unable to draw pictures, or to produce sound effects.
It is not impossible to imagine that a fairly portable version-6 interpreter
could be constructed, and Zip is currently going down this road.

The display is expected to be arranged in pixels.  Coordinates are usually
given in the form (y,x), with (1,1) in the top left.  There is a
generalised colour scheme intended to look like the basic IBM PC colours
(which is to say, not very pleasant).  There are eight windows instead of
two, with more elaborate possibilities, but they are essentially similar to
the two windows of version 4 onward.

There may be a mouse, but if so it is not expected to do much beyond move
an arrow around and have one or more buttons.  Similarly, there may be a
concept of "menus" - which seems primarily furnished for Apple Macintoshes.


  8    Complete table of opcodes

This table might be called a variorum edition of the Z-machine
specification: it contains all 120 or so possible opcodes for every version
of the Z-machine, from 1 to 6 and (taken with the accompanying dictionary)
documents them and their corresponding Inform assembly language syntax.

A few opcodes do not in fact occur in any existing files, but they can
be deduced by disassembling Infocom-supplied interpreters.  This table
specifies also which opcodes occur in V1 to V5 files, at least.

Inform names (and can assemble) all the opcodes, even the version-6 ones.
This may be useful for preparing test files.  The names here are the set
used by Inform 5.4 and later, extended from a system worked out by Mark
Howell for his disassembler, which we have agreed on as a standard.  We hope
that this will provide interpreter writers and others with a common lexicon.
It would be helpful if interpreter sources used these names internally.

Reading the opcode tables
-------------------------

The two columns "St" and "Br" (store and branch) mark whether an
instruction stores a result in a variable, and whether it must provide a
label to jump to, respectively.

The "Opcode" is written

   TYPE:Decimal

where the TYPE is 2OP, 1OP, 0OP, VAR or EXT: two operands, one operand, no
operands, variable number of said, and variable number of said but occurring
in the "extended" set.  The extended set of opcodes are two-byte opcodes
where the first byte is (decimal) 190.

Briefly, single byte opcodes have types as follows:

  0 to 31, 32 to 63, 64 to 95, 96 to 127:  forms of 2OP, the opcode number
                                               being the value mod 32
  128 to 143, 144 to 159, 160 to 175:      forms of 1OP, the opcode number
                                               being the value mod 16
  176 to 191:                              0OP, the opcode number
                                               being the value mod 16
  192 to 223:                              2OP opcodes implemented in the
                                               VAR form, the opcode number
                                               being the value mod 32
  224 to 255:                              VAR, the opcode number
                                               being the value mod 32

The decimal number is the lowest possible decimal opcode value.  The hex
number is the opcode number within each TYPE.
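
The byte ranges above can be turned into a small classifier.  This Python
sketch treats byte 190 as the extended prefix only in Version 5 or later, as
the table states, and returns the TYPE together with the opcode number
within that TYPE.  The name classify_opcode is hypothetical.

```python
EXTENDED_PREFIX = 190  # first byte of every extended (EXT) opcode


def classify_opcode(first, second=None, version=5):
    """Return (TYPE, opcode number within that TYPE) for an opcode byte."""
    if not 0 <= first <= 255:
        raise ValueError("opcode bytes run from 0 to 255")
    if first == EXTENDED_PREFIX and version >= 5:
        if second is None:
            raise ValueError("an extended opcode needs its second byte")
        return "EXT", second
    if first < 128:                 # long form: 2OP
        return "2OP", first % 32
    if first < 176:                 # short form with one operand
        return "1OP", first % 16
    if first < 192:                 # short form with no operands
        return "0OP", first % 16
    if first < 224:                 # 2OP opcode in variable form
        return "2OP", first % 32
    return "VAR", first % 32


print(classify_opcode(0xE0))  # ('VAR', 0), i.e. call
```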

The "V" column gives the version information.  If nothing is specified, the
opcode is as stated from version 1 onwards.  Otherwise, it exists only from
the version quoted onwards.  Before this time, its use is illegal.  Some
opcodes change their meanings and these have more than one line of
specification.  Others become illegal again, and these are marked
[illegal].

In a few cases, the version is given as "3/4" or some such.  The first
number is the version number whose specification the opcode belongs to, and
the second is the earliest version in which the opcode is known actually to
be used.  A dash means that it is never used at all (in versions 1 to 5 at
least, though a few of the 5/- opcodes may be used in version 6).

The table explicitly marks opcodes which remain unused in all six versions
of the Z-machine as ------.  In principle, the interpreter is at liberty
to crash if it finds them, though in practice ignoring them is more polite.

However, the extended set, which could in principle run from $00 to $ff,
stops at $1c: subsequent codes $1d to $ff were never used, even in
version 6.

Inform assembly language
------------------------

An Inform line beginning with an @ is sent direct to the assembler.  The
syntax is as laid out in the tables below.  (Remember that opcodes can only
be used if the game version number is right.)

<variable> and <result> must be variables (or sp, the stack pointer);
<label> a label (not a routine name).  In a branch instruction, the
logical effect can be negated using a tilde ~ before the label name, so
for instance

  @je a b ~Different;  ! Jump to Different if a not equal to b

The programmer must specify whether a branch is in the "near" or "far"
form.  A question mark ? before the label (and before the tilde, if there
is one) forces it to be far; otherwise it is "near", which is cheaper and
more common.
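
A hedged Python sketch of what these choices mean at the byte level.  The
bit layout is taken from the standard rather than from this section: the
top bit of the branch data is set to branch on true (cleared by the tilde),
a "near" branch is one byte holding an offset from 0 to 63, and a "far"
branch is two bytes holding a signed 14-bit offset.  The function names are
hypothetical.

```python
def encode_branch(on_true, offset, far=False):
    """Encode branch data: 'near' is one byte, 'far' is two."""
    sense = 0x80 if on_true else 0x00     # top bit clear = branch on false (~)
    if not far:
        if not 0 <= offset <= 63:
            raise ValueError("near branches need an offset from 0 to 63")
        return bytes([sense | 0x40 | offset])
    if not -8192 <= offset <= 8191:
        raise ValueError("far branch offsets are 14-bit signed")
    value = offset & 0x3FFF               # 14-bit two's complement
    return bytes([sense | (value >> 8), value & 0xFF])


def decode_branch(data):
    """Return (branch on true?, offset, number of bytes used)."""
    first = data[0]
    on_true = bool(first & 0x80)
    if first & 0x40:                      # near form
        return on_true, first & 0x3F, 1
    value = ((first & 0x3F) << 8) | data[1]
    if value & 0x2000:
        value -= 0x4000                   # sign-extend the 14-bit offset
    return on_true, value, 2
```

Offsets 0 and 1 do not jump at all but mean "return false" and "return
true"; any other offset jumps to (address after the branch data) + offset
- 2.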

<string> must be literal text in quotation marks "thus" and it is
translated in the usual Inform way.  When function is listed, a constant
is expected to be a packed address of a function.  Inform assembles these in
the right way if you just name a function at the appropriate point.

Generally speaking any Inform constant term (such as 'a' or 'beetle')
can be used as an operand but a compound expression (which would obviously
incur extra assembly) cannot.

Opcode names changed since Inform 5.2
-------------------------------------

In order to bring Inform into line with the agreed standard names for opcodes,
the following changes have been made to opcode names:

     From              To
     ====================================
     compare_pobj      same_parent
     colour            set_colour
     retsp             ret_popped
     show_score        show_status
     scanw             scan_table
     aparse            tokenise
     encrypt           encode_text
     check_no_args     check_arg_count


                         Two-operand (long) opcodes 2OP

St  Br  Opcode Hex  V  Inform name and syntax

    ------   0  ------
    *   2OP:1    1     je              a b <label>
    *   2OP:2    2     jl              a b <label>
    *   2OP:3    3     jg              a b <label>
    *   2OP:4    4     dec_chk         <variable> value <label>
    *   2OP:5    5     inc_chk         <variable> value <label>
    *   2OP:6    6     same_parent     obj1 obj2 <label>
    *   2OP:7    7     test            bitmap flags <label>
*       2OP:8    8     or              a b <result>
*       2OP:9    9     and             a b <result>
    *   2OP:10   A     test_attr       object attribute <label>
        2OP:11   B     set_attr        object attribute
        2OP:12   C     clear_attr      object attribute
        2OP:13   D     store           <variable> value
        2OP:14   E     insert_obj      object destination
*       2OP:15   F     loadw           table index <result>
*       2OP:16  10     loadb           table index <result>
*       2OP:17  11     get_prop        object property <result>
*       2OP:18  12     get_prop_addr   object property <result>
*       2OP:19  13     get_next_prop   object property <result>
*       2OP:20  14     add             a b <result>
*       2OP:21  15     sub             a b <result>
*       2OP:22  16     mul             a b <result>
*       2OP:23  17     div             a b <result>
*       2OP:24  18     mod             a b <result>
*       2OP:25  19  4  call_2s         function arg1 <result>
        2OP:26  1A  5  call_2n         function arg1
        2OP:27  1B  5  set_colour      foreground background
        2OP:28  1C 5/- throw           value stack-frame
        ------  1D  ------
        ------  1E  ------
        ------  1F  ------


                            One-operand opcodes 1OP

St  Br  Opcode Hex  V  Inform name and syntax

    *   1OP:128  0     jz              a <label>
*   *   1OP:129  1     get_sibling     object <result> <label>
*   *   1OP:130  2     get_child       object <result> <label>
*       1OP:131  3     get_parent      object <result>
*       1OP:132  4     get_prop_len    property-address <result>
        1OP:133  5     inc             <variable>
        1OP:134  6     dec             <variable>
        1OP:135  7     print_addr      byte-address-of-string
*       1OP:136  8  4  call_1s         function <result>
        1OP:137  9     remove_obj      object
        1OP:138  A     print_obj       object
        1OP:139  B     ret             value
        1OP:140  C     jump            <label>
        1OP:141  D     print_paddr     packed-address-of-string
*       1OP:142  E     load            value <result>
*       1OP:143  F 1/4 not             value <result>
                    5  call_1n         function


                           Zero-operand opcodes  0OP

St  Br  Opcode Hex  V  Inform name and syntax

        0OP:176  0     rtrue
        0OP:177  1     rfalse
        0OP:178  2     print           <string>
        0OP:179  3     print_ret       <string>
        0OP:180  4 1/- nop
    *   0OP:181  5  1  save            <label>
                    5  [illegal]
    *   0OP:182  6  1  restore         <label>
                    5  [illegal]
        0OP:183  7     restart
        0OP:184  8     ret_popped
        0OP:185  9  1  pop
*                   5  catch           <result>
        0OP:186  A     quit
        0OP:187  B     new_line
        0OP:188  C  3  show_status
                    4  [illegal]
    *   0OP:189  D  3  verify
        0OP:190  E  5  [first byte of extended opcode]
    *   0OP:191  F  5  piracy


                         Variable-operand opcodes  VAR

St  Br  Opcode Hex  V  Inform name and syntax

*       VAR:224  0  1  call            function ...args... <result>
                       icall           address <result>
                    4  call_vs         function ...args... <result>
        VAR:225  1     storew          table word value
        VAR:226  2     storeb          table byte value
        VAR:227  3     put_prop        object property value
*       VAR:228  4  1  sread           text-buffer parse-buffer
                    5  aread           text parse time function
        VAR:229  5     print_char      ascii-value
        VAR:230  6     print_num       value
*       VAR:231  7     random          range <result>
        VAR:232  8     push            value
*       VAR:233  9  1  pull            <result>
                   5/- pull            stack <result>
        VAR:234  A  3  split_window    lines
        VAR:235  B  3  set_window      window
*       VAR:236  C  4  call_vs2        [not properly assembled]
        VAR:237  D  4  erase_window    window
        VAR:238  E 4/- erase_line      value
                    6  erase_line      pixels
        VAR:239  F  4  set_cursor      line column
                    6  set_cursor      line column window
        VAR:240 10 4/- get_cursor      table
        VAR:241 11  4  set_text_style  style
        VAR:242 12  4  buffer_mode     flag
        VAR:243 13  3  output_stream   number
                    5  output_stream   number table
                    6  output_stream   number table width
        VAR:244 14  3  input_stream    number
        VAR:245 15  4  beep
                   5/3 sound_effect    number effect volume
                       sound_effect    number effect repeats volume
                    6  sound_effect    number effect volume repeats
*       VAR:246 16  4  read_char       1 time function <result>
*    *  VAR:247 17  4  scan_table      x table len form <result> <label>
*       VAR:248 18 5/- not             value <result>
        VAR:249 19  5  call_vn         function ...args...
        VAR:250 1A  5  call_vn2        [not properly assembled]
        VAR:251 1B  5  tokenise        text parse dictionary flag
        VAR:252 1C  5  encode_text     ascii-text length from coded-text
        VAR:253 1D  5  copy_table      from to size
        VAR:254 1E  5  print_table     ascii-text width height skip
     *  VAR:255 1F  5  check_arg_count argument-number



                             Extended opcodes  EXT

St  Br  Opcode Hex  V  Inform name and syntax

*       EXT:256  0  5  save            table bytes name <result>
*       EXT:257  1  5  restore         table bytes name <result>
*       EXT:258  2  5  log_shift       number places <result>
*       EXT:259  3 5/- art_shift       number places <result>
*       EXT:260  4  5  set_font        font window <result>
        EXT:261  5  6  draw_picture    picture-number y x
     *  EXT:262  6  6  picture_data    picture-number table <label>
        EXT:263  7  6  erase_picture   picture-number y x
        EXT:264  8  6  set_margins     left right window
*       EXT:265  9  5  save_undo       <result>
*       EXT:266  A  5  restore_undo    <result>
        -------  B  ------
        -------  C  ------
        -------  D  ------
        -------  E  ------
        -------  F  ------
        EXT:272 10  6  move_window     window y x
        EXT:273 11  6  window_size     window y x
        EXT:274 12  6  window_style    window flags operation
*       EXT:275 13  6  get_wind_prop   window property-number <result>
        EXT:276 14  6  scroll_window   window pixels
        EXT:277 15  6  pop_stack       items stack
        EXT:278 16  6  read_mouse      table
        EXT:279 17  6  mouse_window    window
     *  EXT:280 18  6  push_stack      value stack <label>
        EXT:281 19  6  put_wind_prop   window property-number value
        EXT:282 1A  6  print_form      formatted-table
     *  EXT:283 1B  6  make_menu       number table <label>
        EXT:284 1C  6  picture_table   table


Notes: 1. The opcodes 5, 6, 7, 8 in the extended set were very likely in the
V5 specification, and are named in some interpreter sources (though only
very haphazardly implemented) but they do not occur in any existing V5 story
file.

2. The notation "5/3" for sound_effect is used because this plainly version-5
feature was also used in one solitary version-3 game, 'The Lurking Horror'
(the sound version of which was the last V3 release, in September 1987).
A V3 interpreter may ignore this but may not crash.

3. The opcode 0 (in the 2-operand set, i.e. the actual byte 00) was possibly
intended for setting break-points in debugging.  It was not nop.  (At
time of writing, the Infix debugger uses the actual nop instruction as
a break-point.)


 9    Dictionary of opcodes

This dictionary is alphabetical and includes entries on every opcode listed
in the table above, as well as brief notes on some Inform internal synonyms
which might otherwise be confused with opcodes.  Although concise, it
essentially documents correct interpreter behaviour.

The following have been corrected since the first edition: aread,
erase_line, get_cursor, get_wind_prop, input_stream, picture_data,
random, set_cursor and split_window.  picture_table, the last opcode
to be discovered, has been added.


add        Signed 16-bit addition.

and        Bitwise and.

"aparse"   Obsolete name for tokenise.

aread   Advanced form of read.  This behaves just as the standard form does
    if the last two operands are not supplied, except that: (i) the status
    line is not redisplayed, and (ii) if the parse buffer supplied is zero,
    no attempt is made to parse the input.

    The parse buffer is appended to, not over-written as in version 3.

    If all four operands are supplied, then every time/10 seconds while the
    player is working on her input, the function is called: if it returns
    1 (true) then the reading process is interrupted.  (The function
    obviously needs to run pretty quickly.)

    The function is called with one argument: the time value.

art_shift   Does an arithmetic shift of number by the given number of
    places, shifting left (i.e. increasing) if places is positive, right if negative.
    In a right shift, the sign bit is preserved as well as being shifted on
    down.  (The alternative behaviour is log_shift.)

beep   Beeps in a more or less irksome fashion and possibly flashes the display.

buffer_mode   If set to 1, text output is buffered up so that it can be
    word-wrapped properly.  If set to 0, it isn't.

call   The only call instruction in version 3.  In higher versions Inform
    reads this as call_vs: it calls the function with 0, 1, 2 or 3
    arguments as supplied and stores the resulting return value.

call_1n   Executes function(arg) and throws away result.

call_1s   Stores function(arg).

call_2n   Executes function(arg1, arg2) and throws away result.

call_2s   Stores function(arg1, arg2).

call_vn   Like call, but throws away result.

call_vs   See call.

call_vn2   Call with a variable number (from 0 to 7) of arguments, then
    throw away the result.  This and call_vs2 are unique in having an extra
    byte of operand types to specify the types of arguments 4 to 7.

call_vs2   See call_vn2.

catch   Opposite to throw, and occupying the same opcode that pop used
    to in versions 3 and 4, but now with a store argument.  catch gives the
    stack frame of the current routine: see throw for what to do with it
    subsequently.

check_arg_count   Branches if the given argument-number (1 being the first of
    these) has been provided by the function call to the current routine.
    (Default values would otherwise be difficult to provide in versions 5
    and 6.)

"check_no_args"   Obsolete name for check_arg_count.

clear_attr   Make object not have attribute.

clear_flag   A name once used for one of the not-really-present extended v5
    opcodes.

"colour"   Obsolete name for set_colour.

"compare_pobj"   Obsolete name for same_parent.

copy_table   Copies size bytes from the first table to the second.  If the
    second table is given as 0, then it zeroes the bytes in the first table.
    If the length is positive, it copies backwards:
       copy_table $1000 $1001 20

    would push the first 20 bytes forward by one.  However, if the length is
    negative, it copies forwards.  Thus the same operation with -20 would
    result in the byte at $1000 being copied into the 20 following bytes.
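    The three cases above can be sketched in C as follows.  This is a
    minimal illustration, not taken from any particular interpreter: it
    assumes Z-machine memory is a flat byte array indexed by byte address,
    and copy_table here is a hypothetical helper named after the opcode.

```c
#include <stdint.h>

/* Hypothetical model: mem is Z-machine memory indexed by byte address;
   size is the signed 16-bit third operand of copy_table. */
static void copy_table(uint8_t *mem, uint16_t from, uint16_t to, int16_t size)
{
    if (to == 0) {                       /* second table 0: zero the first */
        int n = size < 0 ? -size : size;
        for (int i = 0; i < n; i++)
            mem[from + i] = 0;
        return;
    }
    if (size > 0) {
        /* Positive size: copy backwards, last byte first, so that a move
           to a higher overlapping address is not corrupted. */
        for (int i = size - 1; i >= 0; i--)
            mem[to + i] = mem[from + i];
    } else {
        /* Negative size: copy forwards, first byte first, even if the
           regions overlap (which "smears" the first byte along). */
        for (int i = 0; i < -size; i++)
            mem[to + i] = mem[from + i];
    }
}
```

    With to one byte above from, a positive size shifts the block up by one
    byte, while a negative size fills it with copies of the first byte,
    exactly as in the $1000/$1001 example.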

dec    Decrement variable.

dec_chk   Decrement variable, and jump if it is now less than value.

div    Signed 16-bit division.

draw_picture   Displays the picture with the given number from the library of
    pictures which the interpreter is expected to have (which is not resident
    in the Z-machine itself).  The Z-machine knows nothing of what picture
    format is being used.  By default, this appears at the current cursor
    position in the current window.  Y and X pixel coordinates from the top
    left can be given instead, though (the top left having coordinates (1,1)).

    Pictures are numbered from 1 and need not be numbered contiguously.

encode_text   Translates an ASCII word to the internal (z-encoded) text format,
    suitable for dictionary use.  The text begins at offset from in
    ascii-text and is length characters long.  The length operand should
    hold the correct value, even though in fact the interpreter translates
    the word only as far as a 0 terminator.  A 6-byte z-encoded string
    results: this is the dictionary resolution in versions 4, 5 and 6 and
    usually represents 9 characters of ASCII.
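    As an illustration of where the "9 characters in 6 bytes" figure comes
    from, here is a deliberately restricted C sketch.  It handles only the
    lower-case letters a to z (z-characters 6 to 31), treats anything else
    as the end of the word, and pads with z-character 5; real encoding also
    needs shift characters for upper case, digits and punctuation, which
    are omitted.  The function name encode_text_lower is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack up to 9 lower-case letters into three big-endian 16-bit words,
   three 5-bit z-characters per word, top bit set on the last word.
   Only 'a'..'z' are handled (an assumption of this sketch). */
static void encode_text_lower(const char *ascii, size_t length, uint8_t out[6])
{
    uint8_t zchars[9];
    size_t n = 0;
    for (size_t i = 0; i < length && n < 9; i++) {
        char c = ascii[i];
        if (c < 'a' || c > 'z')
            break;                          /* 0 terminator or unsupported */
        zchars[n++] = (uint8_t)(c - 'a' + 6);
    }
    while (n < 9)
        zchars[n++] = 5;                    /* pad character */

    for (int w = 0; w < 3; w++) {
        uint16_t word = (uint16_t)((zchars[3 * w] << 10) |
                                   (zchars[3 * w + 1] << 5) |
                                    zchars[3 * w + 2]);
        if (w == 2)
            word |= 0x8000;                 /* marks the final word */
        out[2 * w]     = (uint8_t)(word >> 8);   /* high byte first */
        out[2 * w + 1] = (uint8_t)(word & 0xFF);
    }
}
```

    A word longer than nine letters is simply truncated, which is why
    dictionary words in these versions are only distinguished by their
    first nine characters.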

"encrypt"   Obsolete name for encode_text.

erase_line   Before version 6: erase the current cursor line in the current
    window.  (Badly interpreted by ITF.)  In version 6: if the value is 1, do
    just that: if not, erase the given number of pixels minus one across from
    the cursor (clipped to the window size).
    In both cases, don't move the cursor.

erase_picture   Like draw_picture, but wipes the appropriate region to
    the background colour for the given window.

erase_window   Erases window with given number (to the background colour in
    version-6), or if -1 it unsplits the screen and clears the lot.  The
    cursor moves back to top left.  (In version 6, -2 means clear the whole
    screen but don't unsplit it.)

extended   This byte (decimal 190) is not really an instruction, but
    indicates that the opcode is "extended": the next byte contains the
    number in the extended set.

get_next_prop   Gives the number of the next property provided by the
    quoted object.  This may be zero, indicating the end of the property list;
    if called with zero, it gives the first property number present.  (If
    called with the number of a property not present, the Z-machine may
    legitimately crash.)

get_prop   Read property from object (resulting in the default value if it
    had no such declared property).

get_prop_addr   Get address of property data for given object's property.

get_prop_len   Get length of property data.

get_child   Get first object contained in given object, branching if there
    is one (i.e., if the result is not nothing, or 0).

get_cursor   Puts the current cursor row into the first word of the given
    table, and the current cursor column into the second word.

get_parent   Get parent object (note that this has no "branch if nothing"
    clause).

get_sibling   Get next object in tree, branching if there is one (i.e. if
    it is not nothing, or 0).

get_wind_prop   The eight windows (in version 6) have 16 properties, numbered
    0 to 15, which can be read using this call and (mostly) written
    using put_wind_prop.  The 16 properties are:
       0  y coordinate  6   left margin size            12  font number
       1  x coordinate  7   right margin size           13  font size
       2  y size        8   newline interrupt function  14  attributes
       3  x size        9   interrupt countdown         15  line count
       4  y cursor      10  highlight mode
       5  x cursor      11  colour data

    These properties are all explained elsewhere except for 8 and 9, about
    "newline interrupts".  If the countdown is set non-zero, it begins to
    count downwards, once per new-line.  When it then hits zero, the
    interrupt function is called.  This is provided so that text can be
    shaped past crinkly margins (e.g., to roll nicely around a picture)
    because the interrupt function can fix the margins at the crucial moment.
    The interrupt function should not attempt to print anything to the same
    window!

    Window coordinates are relative to the screen; cursor coordinates are
    relative to the window.

    Font size contains two bytes: height then width, in pixels.  Colour data
    similarly gives foreground, then background colour.
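    The newline interrupt mechanism (properties 8 and 9) amounts to a small
    piece of bookkeeping in the interpreter's output code.  The C sketch
    below is hypothetical throughout: struct window, on_newline and
    call_interrupt are invented names, and call_interrupt merely counts
    calls where a real interpreter would run the Z-code routine.

```c
#include <stdint.h>

/* Hypothetical per-window state holding window properties 8 and 9. */
struct window {
    uint16_t newline_routine;   /* property 8: newline interrupt function */
    uint16_t countdown;         /* property 9: interrupt countdown */
};

/* Stand-in for running a Z-code routine: here it only counts calls. */
static int interrupts_fired = 0;
static void call_interrupt(uint16_t routine)
{
    (void)routine;
    interrupts_fired++;
}

/* Called by the output code each time a new-line is printed in w. */
static void on_newline(struct window *w)
{
    if (w->countdown != 0) {            /* zero means "not counting" */
        w->countdown--;
        if (w->countdown == 0)          /* reached zero on this new-line */
            call_interrupt(w->newline_routine);
    }
}
```

    Because the countdown stops at zero, the function fires once; to keep
    reshaping the margins, the interrupt function would have to set a new
    countdown itself.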

icall   This is an Inform internal name for "call to a function whose
    address is supplied, not its name".  It allows calculated calls, but
    takes no arguments.  It stores the result as call does.

inc    Increment variable.

inc_chk   Increment variable, and branch if it is now greater than value.

input_stream   Switches the input stream (the source of the player's commands).
    0 is the keyboard, and 1 a command file (the idea is that a list of
    commands produced by output_stream 4 can be fed back in again: Zip
    provides this useful feature).

insert_obj   Moves object to destination (it need not be removed from the tree
    first).

je     Jump if a = b.

jg     Jump if a > b  (note: not a>=b).

"jge"   Inform used to call jg this, which was rather confusing, and now it
    is withdrawn.

jl     Jump if a < b  (note: not a<=b).

"jle"   Inform used to call jl this, which was rather confusing, and now it
    is withdrawn.

jump   Jump (unconditionally) to the given label.  It is safe to jump into a
    different routine but care is advisable.  (The operand to jump is always
    a 2-byte signed offset: not an absolute routine address.)

jz     Jump if a = 0.

load   Results in the value of the given variable: so load v1 v2 actually
    does "v2 = v1".  This is better done with store or push as appropriate
    and Inform never uses it in compiled code.

loadb   Stores table->index.

loadw   Stores table-->index.

log_shift   Does a logical shift of number by the given number of places,
    shifting left (i.e. increasing) if places is positive, right if negative.
    In a right shift, the sign is zeroed instead of being shifted on.  (The
    alternative behaviour is art_shift.)
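    The difference between the two shifts is clearest on 16-bit words with
    the top bit set.  A minimal C sketch follows; it assumes places lies in
    the range -15 to 15, and it fills the sign bit by hand because right
    shifts of negative signed integers are implementation-defined in C.

```c
#include <stdint.h>

/* places > 0 shifts left, places < 0 shifts right by -places.
   Assumes -15 <= places <= 15. */
static uint16_t log_shift(uint16_t number, int places)
{
    if (places >= 0)
        return (uint16_t)(number << places);   /* bits shifted out are lost */
    return (uint16_t)(number >> -places);      /* zeros enter at the top */
}

static uint16_t art_shift(uint16_t number, int places)
{
    if (places >= 0)
        return (uint16_t)(number << places);   /* same as log_shift */
    uint16_t result = number;
    for (int i = 0; i < -places; i++)          /* one place at a time, */
        result = (uint16_t)((result >> 1) | (result & 0x8000)); /* keeping the sign bit */
    return result;
}
```

    For example, shifting $8000 right by one place gives $4000 with
    log_shift but $C000 (that is, -16384) with art_shift.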

lstore   Inform names this to force store to take the "long"
    form; it is only used internally.

make_menu   Provided for the benefit of the Apple Macintosh, and who are
    we to object.  Interpreters which don't provide menus are supposed to set
    a bit to say so in the header, but anyway this instruction can simply
    do nothing and not branch if there are no menus (or if there are too many
    already).

    The menu number to be added has to be more than 2 (since 0 is the Apple
    menu, 1 the File menu, 2 the Edit menu).  If the table supplied is 0,
    the menu is removed.  Otherwise it is a table of tables.  Each table is
    an ASCII string: the first item being a menu name, subsequent ones the
    entries.

mod    Remainder after signed 16-bit division.

mouse_window   Constrain the mouse arrow to sit inside the given window.
    By default it sits in window 1.  Setting to -1 takes all restriction away.
    (The mouse clicks are not reported if the arrow is outside the window
    and interpreters are presumably supposed to hold the arrow there by
    hardware means if possible.)

move_window   Moves the given window to pixels (y,x): (1,1) being
    the top left.  Nothing actually happens (since windows are entirely
    notional transparencies): but any future plotting happens in the new place.

mul    Signed 16-bit multiplication.

new_line   Print carriage return.

nop    Probably the official "no operation" instruction.  Ironically,
    since there is hardly ever any point in using it (self-modifying code is
    illegal in the Z-machine since the code is outside the save area)
    interpreters sometimes do not bother to implement it... and crash.
    (In any event, no V1 to V5 datafile actually uses this opcode.)

not    Bitwise not (i.e., all 16 bits reversed).  Note: in versions 3 and 4
    this was a one-operand instruction (as would be expected) but in versions
    5 and 6 it was pushed into the extended set to make room for call_1n.
    (Inform knows which to compile to.)
    (Note also that although this opcode seems to belong to V3, it is not
    in fact used until V4.)

or     Bitwise or.

output_stream   Text can be output to a variety of different "streams",
    possibly simultaneously.  0 does nothing.  +n switches stream n on,
    -n switches it off.  The output streams are: 1 (the screen),
    2 (the game transcript), 3 (memory) and 4 (script of player's commands).
    Thus, one can turn the screen off and print only to the transcript,
    for instance.  Zip does now provide 4, which is extremely useful
    in debugging games.  Other interpreters do not.

    Case 3 is more complicated.  Here the syntax is:
       output_stream 3 table

    and the text is printed into the table+2, the first word always holding
    the number of characters printed.  Printing is never buffered in this
    stream, whatever the state of buffer_mode.
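    A minimal C sketch of the memory stream, assuming Z-machine memory is a
    flat byte array and that words are stored high byte first, as they are
    everywhere in the Z-machine.  The names struct stream3, stream3_open
    and stream3_print_char are hypothetical.

```c
#include <stdint.h>

/* Output stream 3: characters go to table+2 onwards and the word at
   table always holds the number of characters printed so far. */
struct stream3 {
    uint8_t *mem;      /* Z-machine memory as a flat byte array */
    uint16_t table;    /* byte address given to output_stream 3 */
};

static void stream3_open(struct stream3 *s, uint8_t *mem, uint16_t table)
{
    s->mem = mem;
    s->table = table;
    mem[table] = 0;                 /* character count starts at zero */
    mem[table + 1] = 0;
}

static void stream3_print_char(struct stream3 *s, uint8_t c)
{
    uint16_t count = (uint16_t)((s->mem[s->table] << 8) | s->mem[s->table + 1]);
    s->mem[s->table + 2 + count] = c;       /* never buffered */
    count++;
    s->mem[s->table]     = (uint8_t)(count >> 8);
    s->mem[s->table + 1] = (uint8_t)(count & 0xFF);
}
```

    Updating the count after every character means the table is always
    consistent, so the game may inspect it as soon as the stream is closed.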

    In Version 6, the total width (in pixels) of the text printed is kept in
    a field in the game's header.  Also, the width operand may optionally be given,
    and the text will then be justified as if it were in the window with
    that number (if width is positive) or a box -width pixels wide (if
    negative).  Then the table will contain not ordinary text but formatted
    text: see print_form.

    In version 3 (which does not have this opcode) transcripting is caused
    purely by setting the header bit.  In higher versions games do this
    as well anyway, despite using the opcode.

picture_data   Asks the interpreter for data on the picture with the given
    number.  This is a branch instruction: if the picture number is not valid,
    no branch is made.  Otherwise information is written to the table and a
    branch occurs.

    If the number is zero, the first word of the table is simply written as
    the highest legal picture number, and the second word as the release
    number of the picture file.

    Otherwise, the first word of the table contains the height and the second
    the width.

piracy   Branches if the game disc is believed to be genuine by the
    interpreter (which is assumed to have some evil way of finding out).
    Earlier specifications suggested this to be an unconditional branch
    instruction... interpreter writers are urged to code it as such,
    and Z-code programmers not to use it at all.

pop    This exists only in versions 3 and 4, and simply throws away the top of
    the stack.  (The need for it was largely circumvented by the
    call-and-throw-away-result instructions.)  The same opcode was then used
    for catch, which tends to crash the machine if used naively.

pop_stack   In Version 6, an honest pop instruction was finally re-invented.
    This throws the given number of items off the system stack, unless a stack
    is given as a second argument, in which case it pops off that one instead.

print   Print the quoted (literal) string.

print_addr   Print (Z-encoded) string at given byte address.

print_char   Print ASCII character.

print_form   Prints a formatted table of the kind produced when the output
    stream is 3.  This is an elaborated version of print_table to cope with
    fonts, pixels and other impedimenta.  It is a sequence of lines,
    terminated with a zero word.  Each line is a word containing the number
    of characters, followed by that many bytes which hold the characters
    concerned.

print_num   Print (signed) number in decimal.

print_paddr   Print the (Z-encoded) string at the given packed address.

print_ret   Print the quoted (literal) string, and print a new-line, and
    then return true (i.e., 1).

print_table   Prints a rectangle of text on screen spreading right and
    down from the current cursor position, of given width and height, from
    the table of ASCII text given.  (Height is optional and defaults to 1.)
    If a skip value is given, then that many characters of text are skipped
    over in between each line and the next.  So one could make this display,
    for instance, a 2 by 3 region of a giant 40 by 40 character graphics
    map.
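    The arithmetic of the skip value can be sketched in C.  This
    illustration writes into a character buffer rather than onto the
    screen, using '\n' to stand for "move down one line, back to the
    starting column"; it also assumes a height of 0 means the operand was
    omitted.  The function is a hypothetical helper, not interpreter code.

```c
#include <stddef.h>

/* Row r of the rectangle starts (width + skip) * r characters into the
   table, since skip characters are passed over after each row. */
static size_t print_table(const char *text, int width, int height, int skip,
                          char *out)
{
    size_t n = 0;
    if (height < 1)
        height = 1;                         /* height defaults to 1 */
    for (int row = 0; row < height; row++) {
        const char *line = text + (size_t)row * (size_t)(width + skip);
        for (int col = 0; col < width; col++)
            out[n++] = line[col];
        out[n++] = '\n';                    /* next screen line */
    }
    out[n] = '\0';
    return n;                               /* characters written */
}
```

    For the 2 by 3 region of a 40 by 40 map mentioned above, width would be
    2, height 3 and skip 38, with the table address pointing at the top
    left corner of the region.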

pull   Pulls value off the stack (crashing if it underflows).  In versions
    5 and 6, the stack in question may be specified as a user one.  A user
    stack is just a table of words in the save area somewhere, whose first
    word always holds the number of spare slots on the stack (so the initial
    value is the capacity of the stack).  User stacks are not well interpreted.

push   Pushes value onto the system stack.

push_stack   Pushes the value onto the user-specified stack, and branches
    if successful.  If the stack was full already, nothing happens and no
    branch is made.

put_prop   Write value to the given property of the given object (this crashes
    the machine if the object has no such property).  The interpreter stores
    a word or a byte as appropriate.

put_wind_prop   Writes a window property (see get_wind_prop).  This should
    only be used when there is no direct command (such as move_window) to
    use instead, as some such operations may have side-effects.

quit   Exit the game.  (Any "Are you sure?" question must be asked by
    the game, not the interpreter.)  It is not legal to return from the main
    routine: this must be used.

random   Returns a random number between 1 and range (supposing range to be
    positive).
    If range is negative, it is used as a seed for the random number generator
    (different interpreters do this in different ways), to make the generator
    predictable.  Random then returns 0.
    If range is zero, some interpreters crash (though they absolutely should
    not).  Correct behaviour is to reset the generator to some suitable seed
    value (say, taken from a real-time clock).  Again, random should then
    return 0.
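    The three cases can be sketched in C.  Since interpreters differ in how
    they use the seed, the generator below is only one possible choice (a
    simple linear congruential generator, seeded with -range in the
    predictable case); z_random, rng_seed and rng_next are hypothetical
    names.

```c
#include <stdint.h>
#include <time.h>

static uint32_t rng_state = 1;

static void rng_seed(uint32_t seed) { rng_state = seed; }

/* One possible generator: a linear congruential generator. */
static uint16_t rng_next(void)
{
    rng_state = rng_state * 1103515245u + 12345u;
    return (uint16_t)((rng_state >> 16) & 0x7FFF);
}

/* The random opcode; range is the signed 16-bit operand. */
static int16_t z_random(int16_t range)
{
    if (range > 0)
        return (int16_t)(rng_next() % range + 1);   /* 1..range */
    if (range < 0)
        rng_seed((uint32_t)(-(int32_t)range));      /* predictable sequence */
    else
        rng_seed((uint32_t)time(NULL));             /* reseed from the clock */
    return 0;                                       /* both reseed cases */
}
```

    Seeding twice with the same negative value reproduces the same sequence
    of results, which is what makes test runs of a game repeatable.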

read   The two forms of read are called aread and sread by Inform,
    for the sake of clarity (Advanced and Standard read).  read is actually
    a high-level Inform command which compiles suitably portable code for
    either version.

read_char   Reads a single character.  The stream (the first operand) is
    always 1, meaning the keyboard for some reason.  Time and function are
    optional and dealt with as in aread.  Function keys return special
    values from 129 onwards:
     up  down  left  right  f1 ... f12  keypad 0...9
     menu click  double mouse click  single mouse click
     (Mice only being at play in version 6.)

read_mouse   The four words in the table are written with the mouse
    y coordinate, x coordinate, button bits (low bits on the right of the
    mouse, rising as one looks left), and a menu word.  In the menu word,
    the upper byte is the menu number (from 1) and the lower byte is the
    item number (from 0).

restore   See save.  In version 3, the branch is never actually made,
    since either the game has successfully picked up again from where it
    was saved, or it failed to load the save game file.  From version 5
    it can have optional parameters as save does, and returns the number
    of bytes loaded if so.  If the restore fails, 0 is returned, but once
    again this necessarily happens since otherwise control is already
    elsewhere.

restore_undo   Like restore, but restores from the internal RAM saved
    game made by save_undo.  (The optional parameters of restore may not be
    supplied.)

restart   Restarts the game.  (Any "Are you sure?" question must be asked
    by the game, not the interpreter.)

ret    Returns the value given.

ret_popped   Pops top of stack and returns that.  This is equivalent to
    ret sp, but is one byte cheaper.

"retsp"   Obsolete name for ret_popped.

rfalse   Return false (i.e., 0).

rtrue   Return true (i.e., 1).

same_parent   Compare parent objects of the two given: branch if equal.

save   On versions 3 and 4, this attempts to save the game (all questions
    about filenames are asked by interpreters) and branches if successful.
    From version 5 it moves to the extended set, as a result of which it is
    no longer a branch instruction, and works in a different way (see the
    explanation above).  This returns 0 for failure, 1 for "save
    succeeded" and 2 for "the game is being restored and is resuming
    execution again from here, the point where it was saved".

    The extension also has (optional) parameters, which save a region of
    the save area, whose address and length are in bytes, and provides a
    suggested filename: name is a pointer to an array of ASCII characters
    giving this name (as usual preceded by a byte giving the number of
    characters).

save_undo   Like save, except that the optional parameters may not be
    specified: it saves the game into a cache of RAM held by the interpreter.
    (This is typically done once per turn, in order to implement "UNDO", so
    it needs to be quick.)  It may also return -1, meaning that the
    interpreter is unable to offer this feature.  (Alas, most interpreters
    do not understand this opcode well enough to be able to confess to being
    unable to act on it.)

scan_table   Is x one of the words in table, which is len words
    long?  If so, return the address where it first occurs and branch.  If not,
    return 0 and don't.

    The form is optional (and only used in version 5?): bit 7 ($80) is set
    for words, clear for bytes; the rest contains the length of each field
    in the table.  (The first word or byte in each field is the one looked
    at.)  Thus $82 is the default.
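    A C sketch of the search, assuming memory is a flat byte array with
    words stored high byte first, and that the top bit ($80) of form
    selects word comparison while the low 7 bits give the field length in
    bytes, which is what the default of $82 (2-byte fields, compared as
    words) implies.  The helper names are hypothetical.

```c
#include <stdint.h>

static uint16_t read_word(const uint8_t *mem, uint32_t addr)
{
    return (uint16_t)((mem[addr] << 8) | mem[addr + 1]);  /* big-endian */
}

/* Returns the address of the first field whose leading word (or byte)
   equals x, or 0 if none does; the opcode branches only on a match. */
static uint16_t scan_table(const uint8_t *mem, uint16_t x, uint16_t table,
                           uint16_t len, uint8_t form)
{
    int words = (form & 0x80) != 0;     /* compare words or bytes */
    uint16_t field = form & 0x7F;       /* length of each field in bytes */
    for (uint16_t i = 0; i < len; i++) {
        uint32_t addr = table + (uint32_t)i * field;
        uint16_t value = words ? read_word(mem, addr) : mem[addr];
        if (value == x)
            return (uint16_t)addr;
    }
    return 0;
}
```

    Setting a larger field length lets a game search a table of records by
    their first entry, skipping over the rest of each record.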

"scanw"   Obsolete name for scan_table.

scroll_window   Scrolls the given window by the given number of pixels
    (a negative value scrolls backwards, i.e., down) writing in blank
    (background colour) pixels in the new lines.  This can be done to any
    window and is not related to the "scrolling" attribute of a window
    (which controls text scrolling, a different matter).

set_attr   Give object the attribute.

set_colour   If coloured text is available, set text to be foreground-against-
    background, where colour numbers are borrowed from the IBM PC:
    2 - black, 3 - red, 4 - green, 5 - yellow, 6 - blue, 7 - magenta, 8 - cyan,
    9 - white: in addition, 0 means keep the current colour setting, 1 means
    use the default and -1 means the colour of the pixel under the mouse arrow.

    One of the V5 games, 'Beyond Zork', uses this (Paul David Doherty reports
    it as used "76 times in 870915 and 870917, 58 times in 871221") and from
    the structure of the table it clearly logically belongs in version 5.

    Text styles such as bold and underline may also be realised with colour
    changes, if this is used.

set_cursor   Move cursor in the current window to the given line and column
    (character positions, counted from (1,1) in the top left).  (In version 6
    the window is supplied and need not be the current one.)  Each window
    remembers its own cursor position.  Using this call may result in any
    buffered text being printed out first (if word-wrapping is going on,
    for instance).

    In V6, set_cursor -1 turns the cursor off, and either set_cursor -2 or
    set_cursor -2 0 turn it back on.  It is not known what, if anything, this
    second argument means: in all known cases it is 0.

set_flag   See clear_flag.

set_font   The (text) font in the given window is changed.  All windows
    (and this includes both windows in Version 5, contrary to common
    interpreter practice) seem to be expected to start with a non-fixed-space
    font.  Anyway font 0 means "keep current one" (this seems less than
    altogether useful), font 1 means "default", font 3 refers to character
    graphics fonts (in versions 5 and 6) and font 4 means a fixed space font.

    No such opcode exists in versions 3 and 4:  turning on and off the
    fixed space font is done by altering a bit in the header as usual.  This
    remains the best way for interpreters to work even in higher versions.

set_margins   Sets the margin widths (in pixels) on the left and right
    for the given window which are by default 0.  These are only used by
    windows which have word-wrapping (i.e., buffer_mode 1) and do nothing
    for others.

set_text_style   Sets printing style: 0 means normal, 1 means inverse video,
    2 means bold, 4 means underline.  (In version 6, 8 means change to a
    fixed-width characters font.)  In principle the interpreter should
    clear flags in the header according to which of these it is unable to
    provide (in practice, few bother, and it doesn't much matter).

set_window   Moves text output to one of the windows.  0 is the default
    (lower) window and 1 means the upper one.  This only just counts as a
    version-3 instruction: it was used by 'Seastalker' on some machines.

    In version 6 this is much more fulsome.  There are 8 windows, 0 to 7,
    which can do almost anything.  In addition, the window number -3
    means "the current window", in this and all the other calls.

"show_score"   Obsolete name for show_status.

show_status   (In version 3 only)  Display and update the status line now
    (don't wait until the next keyboard input).  Ideally this should not
    crash in version 5, since the v5 release of 'Wishbringer' (V23) contains
    this opcode by accident.

sound_effect   'The Lurking Horror' used this opcode, but no other version-3
    game did: the v5 game 'Sherlock' also used its full form.  See beep, the
    Inform name for the simpler form of this opcode in versions 4 and 5.

    In Version 6, this produces the given sound (1 meaning a high-pitched
    beep, 2 meaning a low one and other values corresponding to noises
    held by the interpreter) at the given volume (1 to 8: -1 being the
    default, loudest value: Mark Howell suggests $34FB causes fade in
    and $3507, fade out) repeated the given number of times (-1 now meaning
    forever).  The "effect" can be: 1 (prepare), 2 (start), 3 (stop), 4
    (finish with).  (Preparation means in effect loading the sample file
    off disc.)

    Version 5 (and 3) is similar but the parameters seem to be less
    sensibly arranged, as shown.

split_window   Divides the screen into two windows, an upper one (of the
    stated number of lines) which is in effect a big status bar, and a
    lower one (all the rest).  This only just counts as a version-3
    instruction: it was used by 'Seastalker' on some machines.
    In V6, this seems to be used just to bound the cursor movement.  'Journey'
    creates a status region which is the whole screen and then overlays it
    with two other windows.

sread   Standard (version 3) form of read.  For details, see the read
    command's description in section (8).  Note that this automatically
    redisplays the status line before the keyboard is listened to.

store   Set variable to value.

storeb   table->byte = value.

storew   table-->word = value.

sub    Signed 16-bit subtraction.

test   Jump if all of the flags in bitmap are set
    (i.e. if bitmap & flags == flags).

test_array   See clear_flag (ITF makes this come out unconditionally false, though).

test_attr   Jump if object has attribute.

throw   Opposite of catch.  This causes the game to behave as if the
    current routine was that whose stack-frame is given (which was found
    using catch at the right moment).  Thus the next return to happen
    will return as if from the "caught" routine.  This is useful for getting
    out of large recursive tangles in a hurry, if an error has occurred.
    (This opcode plainly belongs to the V5 specification, but is not actually
    used in any V5 game.)

tokenise   The parser (strictly speaking, the lexical analyser) from aread.
    The given text is parsed into the given parse table.  Unlike in version
    3, aread appends to the parse table, not over-writes it.

    If a non-zero dictionary is supplied, it is used (if not, the ordinary
    game dictionary is).  If the flag is set, unrecognised words are not
    listed as zero in the parse table: this is presumably so that if several
    tokenises are done in a row, each fills in more slots without wiping those
    filled by the others.

    Parsing a user dictionary is slightly different.  A user dictionary
    should look just like the main one, except that it should have no
    "separator" characters listed (the ones listed in the main one are
    valid instead), and that it need not be alphabetically sorted.  If the
    number of entries is given as -n, then the interpreter reads this as
    "n entries unsorted".  This is very convenient if the table is being
    altered in play: if, for instance, the player is naming things.

verify   Some version-3 interpreters are said not to implement this.  It
    counts a (two byte, unsigned) checksum of the file from $0040 onwards and
    compares this against the value in the game header, branching if correct.
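    The checksum is simple enough to show in a few lines of C.  This sketch
    assumes the story file has been loaded into a byte array and that the
    caller knows its length; it returns the sum, and comparing it with the
    value in the header is left to the caller.  verify_checksum is a
    hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum every byte from $0040 to the end of the file, keeping the total
   as an unsigned 16-bit value so that it wraps round modulo 65536. */
static uint16_t verify_checksum(const uint8_t *file, size_t file_length)
{
    uint16_t sum = 0;
    for (size_t addr = 0x40; addr < file_length; addr++)
        sum = (uint16_t)(sum + file[addr]);
    return sum;
}
```

    The header itself (bytes $00 to $3F) is excluded, presumably because
    the interpreter writes into it and its contents vary from machine to
    machine.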

vje    Internal Inform name for the variable-length form of je (for
    compiling conditions such as a==1 or 2 or 4).

window_size   Change size of window in pixels.  (Does not change the current
    display.)

window_style   Changes the four attributes for a given window.  The bits in
    question are: 1 - word wrapping (if this is off text is clipped to the
    window size instead), 2 - scrolling, 3 - text to be sent to the printer
    (if transcripting is switched on), 4 - text is buffered.

    The operation, by default, is 0.  0 means "set to these settings".  1 means
    "set the bits supplied".  2 means "clear the ones supplied", and 3 means
    "reverse the bits supplied" (i.e. exclusive or).

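A sketch of how the four operations combine a window's old and new
settings.  It assumes that the attributes numbered 1 to 4 above occupy bits
0 to 3 of the flags (masks 1, 2, 4 and 8); that bit assignment, and the
constant and function names, are assumptions made for the example.

```python
# Attribute masks: assumes attributes 1-4 in the text are bits 0-3.
WRAP, SCROLL, TRANSCRIPT, BUFFERED = 0x01, 0x02, 0x04, 0x08


def apply_window_style(current: int, flags: int, operation: int = 0) -> int:
    """Combine a window's current attributes with the supplied flags."""
    flags &= 0x0F
    if operation == 0:       # set to exactly these settings
        return flags
    if operation == 1:       # set the bits supplied
        return current | flags
    if operation == 2:       # clear the bits supplied
        return current & ~flags & 0x0F
    if operation == 3:       # reverse the bits supplied (exclusive or)
        return current ^ flags
    raise ValueError(f"unknown window_style operation {operation}")
```
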

  10    Header format through the ages

The initial block of 64 bytes in the Z-machine, the "header", is of
particular fascination to Infocom hackers and many tables have been drawn up
of its contents.  The table here steals from its predecessors (I am
particularly indebted to Paul David Doherty) but also fills some gaps to do
with version 6.

Once again "V" refers to the earliest version in which the feature appeared;
"Dyn" is marked if the entry is dynamic, i.e. changes as the game plays;
"Int" if it is written by the interpreter (otherwise it is set in or by the
game file).

Bits in a byte are numbered from 0 ($01) up to 7 ($80).

 Hex  V  Dyn Int  Contents
  0   1           Version number (1 to 6)
  1   3           Flags 1:
      3       *     Bit 0    (unused: possibly a flag to indicate byte sex,
                              i.e. LSB-MSB or MSB-LSB in 2-byte words, at
                              a time when two different forms of game file
                              were considered: no such forms ever emerged)
                        1    Status line type: clear for score/moves,
                               set for time in hours/minutes
              *         2    (unused: set in V3?)
              *         3    The legendary "Tandy" bit (see below)
              *         4    The interpreter sets this if it cannot
                               produce a status line
              *         5    Interpreter sets if it _can_ split the screen
                               (only `Seastalker' uses this in V3)
              *         6    Interpreter sets if it uses non-fixed-space
                               fonts
                        7    (unused)
       4      *   Flags 1:   Interpreter sets bits to say what it can do:
       4      *     Bit 0    (always set)
       6      *              Colours available?
       4      *         1    (always set)
       6      *              Picture displaying available?
       4      *         2    Boldface available?
       4      *         3    Underlining available?
                             (the only one of these flags any V4/5 games
                             actually ever looked at)
       4      *         4    Fixed-space font available?
       6      *         5    Sound effects available?
                      6,7    (unused)
  2   1           Release number
  4   1           Start of code area (bytes)
  6   1           Main routine address (uniquely, a byte address which
                    points to first byte of code in routine)
  8   1           Dictionary address (bytes)
  A   1           Object table address (bytes)
  C   1           Global variables table address (bytes)
  E   1           Size of save area (bytes)
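
As an illustration of reading this first block, here is a Python sketch that
decodes the fields listed so far.  Two-byte header entries are stored most
significant byte first; the class and field names are invented for the
example, and `time_status_line` tests bit 1 of Flags 1, which has that
meaning only in version 3.

```python
from dataclasses import dataclass


def read_word(mem: bytes, addr: int) -> int:
    """Big-endian (MSB first) unsigned 16-bit word."""
    return (mem[addr] << 8) | mem[addr + 1]


@dataclass
class Header:
    version: int
    flags1: int
    release: int
    code_start: int
    main_routine: int
    dictionary: int
    object_table: int
    globals_table: int
    save_area_size: int

    @property
    def time_status_line(self) -> bool:
        # Flags 1, bit 1: set for an hours/minutes status line (version 3).
        return self.version == 3 and bool(self.flags1 & 0x02)


def parse_header(story: bytes) -> Header:
    if len(story) < 64:
        raise ValueError("story file shorter than its 64-byte header")
    return Header(
        version=story[0x00],
        flags1=story[0x01],
        release=read_word(story, 0x02),
        code_start=read_word(story, 0x04),
        main_routine=read_word(story, 0x06),
        dictionary=read_word(story, 0x08),
        object_table=read_word(story, 0x0A),
        globals_table=read_word(story, 0x0C),
        save_area_size=read_word(story, 0x0E),
    )
```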


 Hex  V  Dyn Int  Contents
 10   3   *       Flags 2:
          *         Bit 0    Printer transcripting happens when the game
                               sets this bit
          *             1    The interpreter is forced to use a fixed-space
                               font when the game sets this bit
                               (does not apply in version 6?)
      6   *   *         2    If the interpreter thinks the status line needs
                               redrawing (because, e.g., the player has
                               dragged a menu across it) it sets this bit.
                               The game should notice, redraw the status
                               line and clear the bit itself.
      6                 3    If set, game wants to use pictures
      3                 4    Set in the Amiga version of 'The Lurking Horror',
                               so presumably to do with sound effects
      5                      If set, game wants to use the UNDO opcodes
      6                 5    If set, game wants to use a mouse
      6                 6    If set, game wants to use colours
      6                 7    If set, game wants to use sound effects
      6                 8    If set, game wants to use menus
                               (In each case except bit 6, if the
                               interpreter cannot manage the given feature,
                               it should clear the relevant bit again.)
                        9    (unused)
          *   *        10    Possibly set by interpreter to indicate an error
                               with the printer during transcription
                    11-15    (unused)
 12   2           Serial number (six characters of ASCII, conventionally
                    the compilation date in the form YYMMDD)
 18   2           Synonyms table address (bytes)
 1A   3+          Length of file (in words (V3) or longwords (V4,5,6))
 1C   3+          Checksum of file (sum of bytes from $0040 to length
                    by unsigned 16-bit addition)
 1E   4       *   Interpreter number, identifying the machine as one of:
                    1   DECSystem-20     6   IBM PC
                    2   Apple IIe        7   Commodore 128
                    3   Macintosh        8   Commodore 64
                    4   Amiga            9   Apple IIc
                    5   Atari ST        10   Apple IIgs
                                        11   Tandy Color
                    The latest versions of the portable interpreters I
                    have seen are: InfoTaskForce 2 Version A
                                   Zip           6 Version B
 1F   4       *   Interpreter version (a single ASCII character,
                    conventionally running through capital letters from A)
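
Two of the interpreter's duties in this part of the header can be sketched
in Python: clearing each Flags 2 request bit for a feature it cannot provide
(every request bit except bit 6, as the note above says), and writing its
interpreter number and version letter at $1E and $1F.  The feature names are
hypothetical labels, and bit 4 is read here with its version-5 meaning
(UNDO) rather than its version-3 one.

```python
# Flags 2 request bits and illustrative feature names (bit 4 = UNDO, as in V5).
REQUEST_BITS = {3: "pictures", 4: "undo", 5: "mouse", 6: "colours",
                7: "sound", 8: "menus"}


def negotiate_flags2(story: bytearray, supported: set) -> None:
    """Clear each requested feature the interpreter cannot provide."""
    flags2 = (story[0x10] << 8) | story[0x11]
    for bit, feature in REQUEST_BITS.items():
        if bit == 6:
            continue  # the colours bit is excepted from this rule
        if flags2 & (1 << bit) and feature not in supported:
            flags2 &= ~(1 << bit)
    story[0x10] = flags2 >> 8
    story[0x11] = flags2 & 0xFF


def identify_interpreter(story: bytearray, number: int, version: str) -> None:
    """Write the interpreter number ($1E) and version letter ($1F)."""
    story[0x1E] = number
    story[0x1F] = ord(version)
```

For example, Zip as listed above would call `identify_interpreter(story, 6, "B")`.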



 Hex  V  Dyn Int  Contents
 20   4       *   Screen height (lines): 255 means "infinite", i.e. never
                    worry about screen overflow and never produce [MORE]
 21   4       *   Screen width (characters)
 22   5   *   *   Leftmost screen coordinate
 23   5   *   *   Rightmost screen coordinate
 24   5   *   *   Highest screen coordinate
 25   5   *   *   Lowest screen coordinate
 26   5   *   *   Width in these coordinate terms of a character in the
                  current font
 27   5   *   *   Similarly, font height
                  (Note: it is perfectly permissible for 22 to 25 to be
                  character grid positions, and the width and height both
                  to be 1: or they could all be in pixels.)
 22   6       *   Screen width in pixels
 24   6       *   Screen height in pixels
 26   6   *   *   Font height in pixels
 27   6   *   *   Font width in pixels (defined as width of a '0')
                  (Note: in V6, 22-27 play the same roles as in V5, but the
                  coordinates are now in pixels.  The highest and leftmost
                  slots are dropped (both values being 1) to make room for
                  2-byte values, i.e. for resolutions of more than 255 pixels.)
 28   6           Functions extra offset (longwords): this may be 0.  It is
                    added to all function addresses and effectively allows
                    the program to exceed the 256K maximum address space
                    by the size of the save area
 2A   6           Static strings extra offset (longwords): similar (needed
                    since static strings come last, after the functions)
 2C   6       *   Default background colour
 2D   6       *   Default foreground colour
 2E   6           Address of terminating characters table (bytes)
 30   6   *   *   Slot used when the output_stream is to memory, to record
                    total width of text in pixels
 32  ---          (these 2 bytes unused in any version)
 34   5           Character set table address (bytes), or 0 if the default
                    character set is to be used
 36   6           Mouse data table address (bytes)
 38   6       *   8 bytes of ASCII: the player's user-name on Infocom's
                    own mainframe, used for debugging purposes and
                    possibly allowing users access to special features.

       Some early version-3 files do not contain length and checksum data, hence
       the mysterious 3+.
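
Filling in the screen-size entries is another interpreter job.  The sketch
below covers the bytes at $20 and $21 (version 4 onwards) and the version-6
layout, where the pixel width and height are 2-byte words at $22 and $24;
the version-5 coordinate bytes are left out to keep it short.  The function
name, its defaults and the clamping of large values to 255 are assumptions
made for the example.

```python
def write_screen_fields(story: bytearray, rows: int, cols: int,
                        width_px: int = 0, height_px: int = 0,
                        font_w: int = 1, font_h: int = 1) -> None:
    """Fill in the interpreter-written screen entries of the header."""
    version = story[0x00]
    if version >= 4:
        # Clamped to a byte; 255 already means "infinite" (never [MORE]).
        story[0x20] = min(rows, 255)
        story[0x21] = min(cols, 255)
    if version == 6:
        # V6: 2-byte pixel dimensions, then one byte each for font sizes.
        story[0x22:0x24] = width_px.to_bytes(2, "big")
        story[0x24:0x26] = height_px.to_bytes(2, "big")
        story[0x26] = font_h   # font height in pixels
        story[0x27] = font_w   # font width: the width of a '0'
```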


The "Tandy" bit
---------------

Some early Infocom games were sold by the Tandy Corporation, who seem to
have been sensitive souls.  'Zork I' pretends not to have sequels if it
finds this bit set.  And to quote Paul David Doherty:

   In 'The Witness', the Tandy Flag can be set while playing the game,
   by typing $DB and then $TA.  If it is set, some of the prose will be
   less offensive. For example, "private dicks" become "private eyes",
   "bastards" are only "idiots", and all references to "slanteyes" and
   "necrophilia" are removed.

We live in an age of censorship.

The character set table
-----------------------

Is 78 bytes long, arranged as 3 blocks of 26 ASCII values for what
characters to print when translating text.  (The first two characters of
block 3 are ignored anyway as they correspond to newline and the literal
escape code.)  This feature is implemented by Zip but not ITF, which
means that the German translation of 'Zork I' (which uses the character
set for non-English letters like 'sz') is illegible on it.
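
A sketch of how an interpreter might use the table: split it into the three
alphabets and look up a Z-character.  It assumes the standard text encoding,
in which Z-characters 6 to 31 select positions 0 to 25 of the current
alphabet, and it returns character codes rather than decoding them, since
the non-English entries are not plain ASCII.  The function names are
illustrative.

```python
def load_alphabets(story: bytes, table_addr: int) -> list:
    """Split the 78-byte table into alphabets A0, A1, A2 (26 codes each)."""
    raw = bytes(story[table_addr:table_addr + 78])
    if len(raw) != 78:
        raise ValueError("character set table runs off the end of the file")
    return [raw[i * 26:(i + 1) * 26] for i in range(3)]


def zchar_code(alphabets: list, alphabet: int, zchar: int):
    """Character code printed for Z-character 6-31 in the given alphabet.

    Returns None for the first two positions of A2, which the table
    ignores (they stand for new-line and the escape code).
    """
    if not 6 <= zchar <= 31:
        raise ValueError("only Z-characters 6 to 31 come from the table")
    if alphabet == 2 and zchar in (6, 7):
        return None
    return alphabets[alphabet][zchar - 6]
```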

The terminating characters table
--------------------------------

Is a zero-terminated list of character codes which cause read to finish
(other than new-line).  An entry of 255 means that any function key
terminates input.
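
A sketch of reading the table and testing a keypress against it.  What
counts as a function key is machine-specific, so the caller passes that in;
the new-line code is taken to be 13, and a table address of 0 is treated as
"no table".  Both of these are assumptions, as are the function names.

```python
NEWLINE = 13  # assumed code for new-line, which always ends input


def read_terminators(story: bytes, table_addr: int) -> set:
    """Collect the zero-terminated list of terminating character codes."""
    codes = set()
    if table_addr == 0:          # assumption: 0 means there is no table
        return codes
    p = table_addr
    while story[p] != 0:
        codes.add(story[p])
        p += 1
    return codes


def ends_input(key: int, terminators: set, is_function_key: bool) -> bool:
    """Does this keypress finish a read?"""
    if key == NEWLINE or key in terminators:
        return True
    # An entry of 255 makes every function key a terminator.
    return is_function_key and 255 in terminators
```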

The mouse data table
--------------------

Seems to have been intended to grow at some future time, because the first
word gives its length.  But the only data are the second and third words:
the mouse x and y coordinates respectively.  The interpreter writes these,
updating them as the mouse moves.
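
A sketch of the interpreter's side of this.  It assumes that the length word
counts the data words following it (so a length of at least 2 is needed to
hold both coordinates) and that, like the rest of the Z-machine, the words
are stored most significant byte first; the function name is hypothetical.

```python
def write_mouse_position(story: bytearray, table_addr: int,
                         x: int, y: int) -> None:
    """Store the mouse x and y in words 1 and 2 of the mouse data table."""
    length = (story[table_addr] << 8) | story[table_addr + 1]
    if length < 2:
        return  # assumed: table too short to hold both coordinates
    story[table_addr + 2:table_addr + 4] = x.to_bytes(2, "big")
    story[table_addr + 4:table_addr + 6] = y.to_bytes(2, "big")
```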


  11   A few statistics

To give some idea of the sizes found in typical story files, here are a few
statistics, mostly gathered by Paul David Doherty, whose "fact sheet" file
contains many more.


(i) Length
  The shortest files are those dating from the time of the 'Zork'
  trilogy, at about 85K; middle-period version 3 games are typically 105K,
  and only the latest use the full memory map.  In versions 4 and 5, only
  'Trinity', 'A Mind Forever Voyaging' and 'Beyond Zork' use the full 256K.
  'Border Zone' and 'Sherlock', for instance, are about 180K.  (The author's
  short story 'Balances' is about 50K, an edition of 'Adventure' takes 80K,
  and 'Curses' about 240K.)


(ii) Code size
  'Zork I' uses only about 5500 opcodes, but the number rises
  steeply with later games: 'Hollywood Hijinx' has 10355 and 'Moonmist'
  15900 (both version 3).  Against this, 'A Mind Forever Voyaging' has
  only 18700, and only 'Trinity' and 'Beyond Zork' reach 32000 or so.
  (Inform games are more efficiently compiled and make better use of
  common code, namely the library, so they do much better here: release
  10 of the version-3 'Curses' (128K long, and a larger game than any
  Infocom version-3 game) has only 6720 opcodes.)


(iii) Objects and rooms
  Obviously, this varies greatly with the style of
  game.  'Zork I' has 110 rooms and 60 takeable objects, but several quite
  complex games have as few as 30 rooms (the mysteries, or 'Hitch-hikers').
  The average for version-3 games is 69 rooms, 39 takeable objects.

  'A Mind Forever Voyaging' contains many rooms (178) but few objects (30).
  'Trinity', a more typical style of game, contains 134 rooms and 49
  objects: the version-5 'Curses' has a few more of each.  Of the version-6
  games, only 'Zork Zero' scores highly here, with 215 rooms and 106
  objects.  The average for version 4/5 games is 105 rooms and 54 objects.


(iv) Dictionary
  Early games such as 'Zork I' know about 600 words, but
  again this rises steeply, to about 1000 even in v3.  Later games know
  between 1569 ('Beyond Zork') and the record, 2120 ('Trinity').  (This is achieved
  by heroic inclusion of unlikely synonyms: e.g. the Japanese lady with the
  umbrella can be called WOMAN, LADY, CRONE, MADAM, MADAME, MATRON, DAME or
  FACE with any of the adjectives OLD, AGED, ANCIENT, JAP, JAPANESE,
  ORIENTAL or YELLOW.)  V6 games have smaller dictionaries.


