Mark A. Durham : Introduction to Redcode
Rice University, Houston, TX
durham@ricevm1.rice.edu
Thu, 14 Nov 1991 09:45:13 GMT
I. Preface - Reader Beware!
II. Notation
III. MARS Peculiarities
IV. Address Modes
V. Instruction Set
I. Preface - Reader Beware!
The name "Core War" arguably can be claimed as public domain. Thus, any program
can pass itself off as an implementation of Core War. Ideally, one would like
to write a Redcode program on one system and know that it will run in exactly
the same manner on every other system. Alas, this is not the case.
Basically, Core War systems fall under one of four catagories: Non-ICWS,
ICWS'86, ICWS'88, or Extended. Non-ICWS systems are usually a variant of Core
War as described by A. K. Dewdney in his "Computer Recreations" articles
appearing in Scientific American. ICWS'86 and ICWS'88 systems conform to the
standards set out by the International Core War Society in their standards of
1986 and 1988, respectively. Extended systems generally support ICWS'86,
ICWS'88, and proprietary extensions to those standards. I will discuss
frequently common extensions as if they were available on all Extended systems
(which they most certainly are not).
I will not describe Non-ICWS systems in this article. Most Non- ICWS systems
will be easily understood if you understand the systems described in this
article however. Although called "standards", ICWS'86 and ICWS'88 (to a lesser
extent) both suffer from ambiguities and extra-standard issues which I will try
to address.
This is where the reader should beware. Because almost any interpretation of
the standard(s) is as valid as any other, I naturally prefer MY interpretation.
I will try to point out other common interpretations when ambiguities arise
though, and I will clearly indicate what is interpretation (mine or otherwise)
as such.
You have been warned!
II. Notation
"86:" will indicate an ICWS'86 feature. "88:" will indicate an ICWS'88 feature.
"X:" will indicate an Extended feature. "Durham:" will indicate my biased
interpretation. "Other:" will indicate interpretations adhered to by others.
"Commentary:" is me explaining what I am doing and why. "Editorial:" is me
railing for or against certain usages. Items without colon-suffixed prefaces
can be considered universal.
Redcode consists of assembly language instructions of the form
,
An example Recode program:
; Imp
; by A. K. Dewdney
;
imp MOV imp, imp+1 ; This program copies itself ahead one
END ; instruction and moves through memory.
The is optional.
86: begins in the first column, is one to eight characters long,
beginning with an alphabetic character and consisting entirely of
alphanumerals. Case is ignored ("abc" is equivalent to "ABC").
88: as above, except length is not limited and case is not addressed.
Only the first eight characters are considered significant.
X: can be preceded by any amount of whitespace (spaces, tabs, and
newlines), consists of any number of significant alphanumerals but must
start with an alphabetic, and case is significant ("abc" is different from
"ABC").
Commentary: I will always use lowercase letters for labels to
distinguish labels from opcodes and family operands.
The is separated from the (if there is one) by whitespace.
Opcodes may be entered in either uppercase or lowercase. The case does not
alter the instruction. DAT, MOV, ADD, SUB, JMP, JMZ, JMN, DJN, CMP, SPL,
and END are acceptable opcodes.
86: SPACE is also recognized as an opcode.
88: SLT and EQU are recognized as opcodes. SPACE is not.
X: All of the above are recognized as opcodes as well as XCH and PCT, plus
countless other extensions.
Commentary: END, SPACE, and EQU are known as pseudo-ops because
they really indicate instructions to the assembler and do not produce
executable code. I will always capitalize opcodes and pseudo-ops to
distinguish them from labels and text.
The and taken together are referred to as the A-operand.
Similarly, the combination is known as the B-operand. The
A-operand is optional for some opcodes. The B-operand is optional for some
opcodes. Only END can go without at least one operand.
86: Operands are separated by a comma.
88: Operands are separated by whitespace.
X: Operands are separated by whitespace and/or a comma. Lack of a comma can
lead to unexpected behaviour for ambiguous constructs.
Commentary: The '88 standard forces you to write an operand without
whitespace, reserving whitespace to separate the operands. I
like whitespace in my expressions, therefore I prefer to separate my
operands with a comma and will do so here for clarity.
is # (Immediate Addressing), @ (Indirect Addressing), or <
(86: Auto-Decrement Indirect, 88: Pre-Decrement Indirect).
A missing mode indicates Direct Addressing.
86: $ is an acceptable mode, also indicating Direct Addressing.
88: $ is not an acceptable mode.
X: $ is an acceptable mode as in 86:.
Commentary: The distinction between Auto-Decrement Indirect Addressing
and Pre-Decrement Indirect Addressing is semantic, not syntactic.
is any combination of labels and integers separated by the arithmetic
operators + (addition) and - (subtraction).
86: Parentheses are explicitly forbidden. "*" is defined as a special label
symbol meaning the current statement.
88: Arithmetic operators * (multiplication) and / (integer division) are added.
"*" is NOT allowed as a special label as in 86:.
X: Parentheses and whitespace are permitted in expressions.
Commentary: The use of "*" as meaning the current statement may be
useful in some real assemblers, but is completely superfluous in a
Redcode assembler. The current statement can always be referred to as 0
in Redcode.
begins with a ; (semicolon), ends with a newline, and can have any
number of intervening characters. A comment may appear on a line by
itself with no instruction preceding it.
88: Blank lines are explicitly allowed.
I will often use "A" to mean any A-operand and "B" to mean any B-operand
(capitalization is important). I use "a" to mean any A- field and "b" to mean
any B-field. For this reason, I never use "a" or "b" as an actual label.
I enclose sets of operands or instructions in curly braces. Thus "A" is
equivalent to "{ a, #a, @a, III. MARS Peculiarities
There are two things about MARS which make Redcode different from any other
assembly language. The first of these is that there are no absolute addresses
in MARS. The second is that memory is circular.
Because there are no absolute addresses, all Redcode is written using relative
addressing. In relative addressing, all addresses are interpreted as offsets
from the currently executing instruction. Address 0 is the currently executing
instruction. Address -1 was the previously executed instruction (assuming no
jumps or branches). Address +1 is the next instruction to execute (again
assuming no jumps or branches).
Because memory is circular, each instruction has an infinite number of
addresses. Assuming a memory size of M, the current instruction has the
addresses { ..., -2M, -M, 0, M, 2M, ... }. The previous instruction is { ...,
-1-2M, -1-M, -1, M-1, 2M-1, ... }. The next instruction is { ..., 1-2M, 1-M, 1,
M+1, 2M+1, ... }.
Commentary: MARS systems have historically been made to operate on
object code which takes advantage of this circularity by insisting that
fields be normalized to positive integers between 0 and M-1, inclusive.
Since memory size is often not known at the time of assembly, a loader
in the MARS system (which does know the memory size) takes care of field
normalization in addition to its normal operations of code placement and
task pointer initialization.
Commentary: Redcode programmers often want to know what the memory size
of the MARS is ahead of time. This is not always possible. Since
normalized fields can only represent integers between 0 and M-1
inclusive, we can not represent M in a normalized field. The next best
thing? M-1. But how can we write M-1 when we do not know the memory
size? Recall from above that -1 is equivalent to M-1. Final word of
caution: -1/2 is assembled as 0 (not as M/2) since the expression is
evaluated within the assembler as -0.5 and then truncated.
86: Only two assembled-Redcode programs (warriors) are loaded into MARS
memory (core).
88: Core is initialized to (filled with) DAT 0, 0 before loading any warriors.
Any number of warriors may be loaded into core.
Commentary: Tournaments almost always pit warrior versus warrior with
only two warriors in core.
MARS is a multi-tasking system. Warriors start as just one task, but can
"split" off additional tasks. When all of a warriors tasks have been killed,
the warrior is declared dead. When there is a sole warrior still executing in
core, that warrior is declared the winner.
86: Tasks are limited to a maximum of 64 for each warrior.
88: The task limit is not set by the standard.
IV. Address Modes
Addressing modes subtly (sometimes not-so-subtly) alter the behaviour of
instructions. A somewhat brief description of their general properties is given
here. Specifics will be left to the instruction set section.
An octothorpe (#) is used to indicate an operand with an Immediate Address Mode.
Immediate mode data is contained in the current instruction's field. If the A
-mode is immediate, the data is in the A-field. If the B-mode is immediate, the
data is in the B-field.
If no mode indicator is present (86: or the US dollar sign '$' is present),
Direct Address Mode is used. Direct addresses refer to instructions relative to
the current instruction. Address 0 refers to the current instruction. Direct
address -1 refers to the (physically) previous instruction. Direct address +1
refers to the (physically) next instruction.
The commercial-at (@) is used to indicate Indirect Address Mode. In indirect
addressing, the indirect address points to an instruction as in direct
addressing, except the target is not the instruction to which the indirect
address points but rather the instruction pointed to by the B-field of the
instruct pointed to by the indirect address.
Example:
x-2 DAT #0, #0 ; Target instruction.
x-1 DAT #0, #-1 ; Pointer instruction.
x MOV 0, @-1 ; Copies this instruction to location x-2.
The less-than (<) is used to indicate (86: Auto-, 88: Pre-) Decrement Indirect
Address Mode. Its behaviour is just like that of Indirect Address Mode, except
the pointer is decremented before use.
Example:
x-2 DAT #0, #0 ; Target instruction
x-1 DAT #0, #0 ; Pointer instruction. Compare to @ example.
x MOV 0, <-1 ; Copies this instruction to location x-2.
Commentary: Although Decrement Indirect addressing appears to be a
simple extension of Indirect addressing, it is really very tricky at
times - especially when combined with DJN. There are sematic
differences between the '86 and '88 standards, thus the change in name
from Auto-Decrement to Pre-Decrement. These differences are discussed
below. This discussion is non-essential for the average Redcode
programmer. I suggesting skipping to the next section for the weak
-stomached.
86: Durham: Instructions are fetched from memory into an instruction register.
Each operand is evaluated, storing a location (into an address register) and
an instruction (into a value register) for each operand. After the operands
have been evaluated, the instruction is executed.
Operand Evaluation: If the mode is immediate, the address register is loaded
with 0 (the current instruction's address) and the value register is loaded
with the current instruction. If the mode is direct, the address register
is loaded with the field value and the value register is loaded with the
instruction pointed to by the address register. If the mode is indirect,
the address register is loaded with the sum of the field value and the B
-field value of the instruction pointed to by the field value and the value
register is loaded with the instruction pointed to by the address register.
If the mode is auto-decrement, the address register is loaded with a value
one less than the sum of the field value and the B-field value of the
instruction pointed to by the field value and the value register is loaded
with the instruction pointed to by the address register. AFTER the operands
have been evaluated (but before instruction execution), if either mode was
auto-decrement, the appropriate memory location is decremented. If both
modes were auto-decrement and both fields pointed to the same pointer, that
memory location is decremented twice. Note that this instruction in memory
then points to a different instruction than either operand and also differs
from any copies of it in registers.
86: Other: As above,except there are no registers. Everything is done in memory.
Commentary: ICWS'86 clearly states the use of an instruction register,
but the other operand address and value registers are only implied.
Ambiguities and lack of strong statements delineating what takes place
in memory and what takes place in registers condemned ICWS'86 to eternal
confusion and gave birth to ICWS'88.
88: As above except everything is done in memory and Pre-Decrement Indirect
replaces Auto-Decrement Indirect. Pre-Decrement Indirect decrements memory
as it is evaluating the operands rather than after. It evaluates operand A
before evaluating operand B.
V. Instruction Set
DAT A, B
The DAT (data) instruction serves two purposes. First, it allows you to store
data for use as pointers, offsets, etc. Second, any task which executes a DAT
instruction is removed from the task queue. When all of warrior's tasks have
been removed from the queue, that warrior has lost.
86: DAT allows only one operand - the B-operand. The A-field is left undefined
(the example shows #0), but comparisons of DAT instructions with identical
B-operands must yield equality.
88: DAT allows two operands but only two modes - immediate and pre-decrement.
X: DAT takes one or two operands and accepts all modes. If only one operand is
present, that operand is considered to be the B-operand and the A-operand
defaults to #0.
Commentary: It is important to note that any decrement(s) WILL occur
before the task is removed from the queue since the instruction executes
only after the operand evaluation.
MOV A, B
The MOV (move) instruction either copies a field value (if either mode is
immediate) or an entire instruction (if neither mode is immediate) to another
location in core (from A to B).
86: Durham: MOV #a, #b changes itself to MOV #a, #a.
Commentary: There is a clear typographical error in ICWS'86 which
changes the interpretation of MOV #a, B to something non-sensical. For
those with a copy of ICWS'86, delete the term "B-field" from the next
-to-last line of the second column on page 4.
88: No immediate B-modes are allowed.
X: Immediate B-modes are allowed and have the same effect as a B-operand of 0.
ADD A, B
86: The ADD instruction adds the value at the A-location to the value at the
B-location, replacing the B-location's old contents.
88: If the A-mode is immediate,ADD is interpreted as above. If the A-mode is not
immediate, both the A-field and the B-field of the instruction pointed to by
the A-operand are added to the A-field and B-field of the instruction
pointed to by the B-operand, respectively. The B-mode can not be immediate.
X: Immediate B-modes are allowed and have the same effect as in 86:.
Example: ADD #2, #3 becomes ADD #2, #5 when executed once.
SUB A, B
The SUB (subtract) instruction is interpreted as above for all three cases,
except A is subtracted from B.
JMP A, B
The JMP (jump) instruction changes the instruction pointer to point to the
instruction pointed to by the A-operand.
86: JMP allows only one operand - the A-operand. The B-operand is shown as #0.
88: JMP allows both operands, but the A-mode can not be immediate.
X: JMP allows both operands and the A-mode can be immediate. An immediate
A-mode operand is treated just like JMP 0, B when executed.
JMZ A, B
The JMZ (jump if zero) instruction jumps to the instruction pointed to by the A
-operand only if the B-field of the instruction pointed to by the B-operand is
zero.
88: Immediate A-modes are not allowed.
JMN A, B
The JMN (jump if non-zero) instruction jumps to the instruction pointed to by
the A-operand only if the B-field of the instruction pointed to by the B-operand
is non-zero.
88: Immediate A-modes are not allowed.
DJN A, B
The DJN (decrement and jump if non-zero) instruction causes the B-field of the
instruction pointed to by the B-operand to be decremented. If the decremented
values is non-zero, a jump to the instruction pointed to by the A-operand is
taken.
88: Immediate A-modes are not allowed.
CMP A, B
The CMP (compare, skip if equal) instruction compares two fields (if either mode
is immediate) or two entire instructions (if neither mode is immediate) and
skips the next instruction if the two are equivalent.
Commentary: There is a clear typographical error in ICWS'86 which
changes the interpretation of CMP #a, B to something non-sensical. For
those with a copy of ICWS'86, delete the term "B-field" from the fifth
line from the bottom of the second column on page 5. Also, the comments
to the example on page 6 have been switched (equal is not equal and vice
versa). The labels are correct though.
88: Immediate B-modes are not allowed.
SPL A, B
The SPL (split) instruction splits the execution between this warrior's
currently running tasks and a new task. Example: A battle between two warriors,
1 and 2, where warrior 1 has two tasks (1 and 1') and warrior 2 has only one
task would look like this: 1, 2, 1', 2, 1, 2, 1', 2, etc.
86: SPL allows only one operand - the B-operand. The A-operand is shown as #0.
After executing the SPL, the next instruction to execute for this warrior is
that of the newly added task (the new task is placed at the front of the
task queue). A maximum of 64 tasks is allowed for each warrior.
88: SPL splits the A-operand, not the B-operand. After executing the SPL, the
next instruction to execute for this warrior is the same instruction which
would have executed had another task not been added (the new task is placed
at the back of the task queue). There is no explicit task limit on warriors.
Immediate A-operands are not allowed.
X: Immediate A-operands are allowed and behave as SPL 0, B when executed.
88: SLT A, B: The SLT (skip if less than) instruction skips the next instruction
if A is less than B. No Immediate B-modes are allowed.
X: Immediate B-modes are allowed.
X: XCH A, B: The XCH (exchange) instructions exchanges the A-field and the
B-field of the instruction pointed to by the A-operand.
X: PCT A, B: The PCT (protect) instruction protects the instruction pointed to
by the A-operand until the protection is removed by an instruction
attempting to copy over the protected instruction.
Pseudo-Ops: Instructions to the Assembler
END
The END pseudo-op indicates the end of the Redcode source program.
86: END takes no operands.
88: If END is followed by a label, the first instruction to be executed is that
with the label following END.
X: ORG A (origin) takes over this initial instruction function from END.
Commentary: If no initial instruction is identified, the first
instruction of your program will be the initial instruction. You can
accomplish the same effect as "END start" or "ORG start" by merely
starting your program with the instruction "JMP start".
86: SPACE A, B: The SPACE pseudo-op helps pretty-up Redcode source listings.
SPACE A, B means to skip A lines, then keep B lines on the next page. Some
assemblers do not support SPACE, but will treat it as a comment.
88: label EQU A: The EQU (equate) pseudo-op gives the programmer a macro-like
facility by replacing every subsequent occurrence of the label "label" with
the string "A".
Commentary: A normal label is a relative thing. Example:
x DAT #0, #x ; Here x is used in the B-field
x+1 DAT #0, #x ; Each instruction's B-field gives
x+2 DAT #0, #x ; the offset to x.
is the same as
x DAT #0, #0 ; Offset of zero
x+1 DAT #0, #-1 ; one
x+2 DAT #0, #-2 ; two
but
x! EQU 0 ; Equate label like #define x! 0
DAT #0, #x! ; Exclamation points can be used
DAT #0, #x! ; in labels (in Extended systems)
DAT #0, #x! ; I use them exclusively to indicate
; immediate equate labels.
is the same as
DAT #0, #0 ; A direct text replacement
DAT #0, #0 ; appears the same on every
DAT #0, #0 ; line it is used.