Diana Compiled Language Spec

I'd like to design my own CPU someday. I haven't gotten to it yet, but while playing around I occasionally write my own ISAs (Instruction Set Architectures). Then I roughly figure out how to implement them in hardware before deciding I don't feel like spending the next six months in KiCad and scrapping the idea...


I've been doing this for years. When I was 19, I decided I'd like to learn how compiler internals work, and what better way to learn than implementing an abstraction layer over one of my esoteric ISAs? I'm writing these articles two years after the fact, so excuse any technical oversights.


Chapters:

  1. Language Specification (you are here)

Below you'll find the specification for the language I'd eventually write.


The Diana-II ISA

The Diana-II is a 6-bit minimal instruction set computer designed around using NOR as a universal logic gate. NOR on its own can't permute bits, so I used rotate lookup tables to perform those operations.

Instructions

| Binary | Instruction | Description |
|--------|-------------|-------------|
| 00 | NOR [val] [val] | Performs a negated OR on the first operand. |
| 01 | PC [val] [val] | Sets the program counter to the address [val, val]. |
| 10 | LOAD [val] [val] | Loads data from the address [val, val] into C. |
| 11 | STORE [val] [val] | Stores the value in C at the address [val, val]. |

Layout:

Each instruction is 6 bits in the format [XX][YY][ZZ], where XX selects the instruction and YY and ZZ select its first and second operands.

The first operand of NOR can't be immediate, so that allows another four instructions:

| Binary | Instruction | Description |
|--------|-------------|-------------|
| 001100 | NOP | No operation; used for padding. |
| 001101 | --- | Reserved for future use. |
| 001110 | --- | Reserved for future use. |
| 001111 | HLT | Halts the CPU until the next interrupt. |

Note: Instructions and operands are both uppercase because my 6-bit character encoding doesn't support lowercase...

Operands

| Binary | Name | Description |
|--------|------|-------------|
| 00 | A | General purpose register. |
| 01 | B | General purpose register. |
| 10 | C | General purpose register. |
| 11 | - | Read next instruction as a value. |
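
As a rough illustration of the layout above, here is a minimal Python sketch that packs and unpacks a 6-bit instruction word. It assumes XX occupies the two high bits (which matches NOP being `001100`, i.e. NOR with an immediate first operand). The name `IMM` for operand `11` and the `RESERVED` marker are made up for this example.

```python
# Opcode and operand encodings copied from the tables above.
OPCODES = {"NOR": 0b00, "PC": 0b01, "LOAD": 0b10, "STORE": 0b11}
# "IMM" is a hypothetical name for operand 11 ("read next instruction as a value").
OPERANDS = {"A": 0b00, "B": 0b01, "C": 0b10, "IMM": 0b11}
SPECIAL = {"NOP": 0b001100, "HLT": 0b001111}

OPCODE_NAMES = {bits: name for name, bits in OPCODES.items()}
OPERAND_NAMES = {bits: name for name, bits in OPERANDS.items()}


def encode(mnemonic, first=None, second=None):
    """Pack one instruction into a 6-bit word laid out as [XX][YY][ZZ]."""
    mnemonic = mnemonic.upper()
    if mnemonic in SPECIAL:
        return SPECIAL[mnemonic]
    yy = OPERANDS[first.upper()]
    zz = OPERANDS[second.upper()]
    if mnemonic == "NOR" and yy == 0b11:
        # These encodings are taken by NOP, HLT and the reserved slots.
        raise ValueError("the first operand of NOR cannot be immediate")
    return (OPCODES[mnemonic] << 4) | (yy << 2) | zz


def decode(word):
    """Split a 6-bit word back into (mnemonic, first, second)."""
    for name, bits in SPECIAL.items():
        if word == bits:
            return (name, None, None)
    if word >> 2 == 0b0011:
        return ("RESERVED", None, None)
    return (OPCODE_NAMES[word >> 4],
            OPERAND_NAMES[(word >> 2) & 0b11],
            OPERAND_NAMES[word & 0b11])
```

For example, `encode("NOR", "A", "B")` gives `0b000001`, and `decode(0b111110)` gives `("STORE", "IMM", "C")`.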

Memory Layout

There are 4096 unique addresses, each holding 6 bits.

| Address | Description |
|---------|-------------|
| 0x000..=0xEFF | General purpose RAM. |
| 0xF00..=0xF3D | Reserved for future use. |
| 0xF3E..=0xF3F | Program counter (PC) (ROM). |
| 0xF40..=0xF7F | Reserved for future use. |
| 0xF80..=0xFBF | Left rotate lookup table (ROM). |
| 0xFC0..=0xFFF | Right rotate lookup table (ROM). |
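
To make the map easier to play with, here is a small Python sketch that classifies an address by region. The ranges are copied from the table; the function name and the shortened region labels are my own choices for illustration.

```python
# (first address, last address, description), both ends inclusive.
REGIONS = [
    (0x000, 0xEFF, "General purpose RAM"),
    (0xF00, 0xF3D, "Reserved"),
    (0xF3E, 0xF3F, "Program counter"),
    (0xF40, 0xF7F, "Reserved"),
    (0xF80, 0xFBF, "Left rotate lookup table"),
    (0xFC0, 0xFFF, "Right rotate lookup table"),
]


def region_of(address):
    """Return the description of the region containing a 12-bit address."""
    if not 0 <= address <= 0xFFF:
        raise ValueError("address must fit in 12 bits")
    for first, last, description in REGIONS:
        if first <= address <= last:
            return description
    raise AssertionError("the regions should cover every address")
```

Each rotate table spans 64 cells, one per possible 6-bit value, and the program counter takes two cells because a 12-bit address needs two 6-bit halves.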

Lexical Conventions

Statements

A program consists of one or more files containing statements. A statement consists of tokens separated by whitespace and terminated by a newline character.

Comments

A comment can reside on its own line or be appended to a statement. The comment consists of an octothorp (#) followed by the text of the comment and a terminating newline character.
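
A minimal sketch of how a lexer might split one statement into tokens and drop a trailing comment. This is an assumption about the implementation, not the actual compiler; in particular, the rule that a `#` inside a quoted character constant does not start a comment is my own guess, made because `#` is also part of the character set.

```python
import re

# Alternatives are tried in order: a quoted character constant, a comment
# running to the end of the line, or a run of ordinary token characters.
TOKEN = re.compile(r"'.'|#.*|[^\s#']+")


def tokenize(line):
    """Return the whitespace-separated tokens of one statement, minus comments."""
    tokens = []
    for match in TOKEN.finditer(line):
        text = match.group()
        if text.startswith("#"):
            break  # everything after the octothorp is comment text
        tokens.append(text)
    return tokens
```

With this, `tokenize("NOR A B # flip the bits")` yields `["NOR", "A", "B"]`, and a line holding only a comment yields an empty list.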

Labels

A label can be placed before the beginning of a statement. During compilation the label is assigned the address of the following statement and can be used as a keyword operand. A label consists of the LAB keyword followed by an identifier. Labels are global in scope and appear in the file's symbol table.

Tokens

There are 6 classes of tokens: identifiers, keywords, registers, numerical constants, character constants, and operators.

Identifiers

An identifier is an arbitrarily-long sequence of letters, underscores, and digits. The first character must be a letter or underscore. Uppercase and lowercase characters are equivalent.
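
The rule above can be sketched as a regular expression plus case folding. This assumes "letters" means ASCII letters and that identifiers are folded to uppercase internally; it also rejects the register names reserved later in this spec. A full implementation would reject keywords too, which this sketch leaves out.

```python
import re

# First character: letter or underscore; the rest: letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
REGISTERS = {"A", "B", "C"}


def normalize_identifier(text):
    """Validate an identifier and fold it to uppercase (case is not significant)."""
    if not IDENTIFIER.fullmatch(text):
        raise ValueError(f"not a valid identifier: {text!r}")
    name = text.upper()
    if name in REGISTERS:
        raise ValueError(f"{name} is a register and cannot be an identifier")
    return name
```

So `loop_1` and `LOOP_1` both normalize to `LOOP_1` and refer to the same symbol.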

Keywords

Keywords such as instruction mnemonics and directives are reserved and cannot be used as identifiers. For a list of keywords see the Keyword Tables.

Registers

The Diana-II architecture provides three registers [A, B, C]; these are reserved and cannot be used as identifiers. Uppercase and lowercase characters are equivalent.

Numerical Constants

Numbers in the Diana-II architecture are unsigned 6-bit integers. These can be expressed in several bases:

Character Constants

A character constant consists of a supported character enclosed in single quotes ('). A character will be converted to its numeric representation based on the table of supported characters below:

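|    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|----|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | = | - | + | * | / | ^ |
| 1x | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P |
| 2x | Q | R | S | T | U | V | W | X | Y | Z | SPACE | . | , | ' | " | ` |
| 3x | # | ! | & | ? | ; | : | $ | % | \| | > | < | [ | ] | ( | ) | \ |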

If a lowercase character is used, it will be converted to its uppercase representation.
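
As a sketch, the table can be flattened into a single 64-character string so that a character's index is its 6-bit code. The function name is hypothetical; the character order is copied from the table row by row.

```python
# One row of the table per line; the index in this string is the 6-bit code.
CHARSET = (
    "0123456789=-+*/^"    # 0x00..0x0F
    "ABCDEFGHIJKLMNOP"    # 0x10..0x1F
    "QRSTUVWXYZ .,'\"`"   # 0x20..0x2F
    "#!&?;:$%|><[]()\\"   # 0x30..0x3F
)


def char_code(ch):
    """Return the 6-bit code of a supported character, folding lowercase to uppercase."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = CHARSET.find(ch.upper())
    if code < 0:
        raise ValueError(f"unsupported character: {ch!r}")
    return code
```

For instance, both `'A'` and `'a'` map to `0x10`, and the space character maps to `0x2A`.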

Operators

The compiler supports the following operators for use in expressions. Operators have no assigned precedence. Expressions can be grouped in parentheses () to establish precedence.

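| Operator | Description |
|----------|-------------|
| ! | Logical NOT |
| & | Logical AND |
| \| | Logical OR |
| + | Addition |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| >> | Rotate right |
| << | Rotate left |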

All operators except Logical NOT require two values and parentheses ():
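
As a rough sketch of how these operators could be folded at compile time, the Python below applies a single operator to 6-bit values. Several things here are assumptions rather than part of the spec: the "logical" operators are treated as bitwise, results wrap modulo 64, division is integer division, and the second operand of a rotate is the number of bit positions.

```python
MASK = 0x3F  # six bits


def rotate_left(value, count):
    count %= 6
    return ((value << count) | (value >> (6 - count))) & MASK


def rotate_right(value, count):
    count %= 6
    return ((value >> count) | (value << (6 - count))) & MASK


BINARY = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "*": lambda a, b: (a * b) & MASK,
    "/": lambda a, b: a // b,
    ">>": rotate_right,
    "<<": rotate_left,
}


def apply_operator(op, a, b=None):
    """Evaluate one operator on unsigned 6-bit operands."""
    if op == "!":
        return ~a & MASK
    return BINARY[op](a, b)
```

Since there is no precedence, an evaluator built on this would only ever need to reduce one fully parenthesized pair at a time.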

Keywords, Operands, and Addressing

Keywords represent an instruction, set of instructions, or a directive. Operands are entities operated upon by the keyword. Addresses are the locations in memory of specified data.

Operand Types

A keyword can have zero to three operands separated by whitespace characters. For instructions with a source and a destination, this language uses Intel's notation: destination (left), then source (right).

There are 5 types of operands:

Addressing

The Diana-II architecture uses 12-bit addressing. Labels can be split into two 6-bit immediate values by appending a colon followed by a 0 or 1, where :0 is the high-order and :1 is the low-order half. If a keyword requires an address it can be provided as two 6-bit values or a single 12-bit identifier:
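
The split itself is plain bit arithmetic. Here is a small Python sketch, with hypothetical helper names and a hypothetical symbol table, showing how a 12-bit address breaks into its `:0` (high) and `:1` (low) halves.

```python
def split_address(address):
    """Split a 12-bit address into its (high, low) 6-bit halves."""
    if not 0 <= address <= 0xFFF:
        raise ValueError("address must fit in 12 bits")
    return (address >> 6) & 0x3F, address & 0x3F


def join_address(high, low):
    """Rebuild a 12-bit address from two 6-bit halves."""
    return (high << 6) | low


def resolve_half(reference, symbols):
    """Resolve a reference such as LOOP:0 or LOOP:1 against a symbol table."""
    name, _, half = reference.partition(":")
    high, low = split_address(symbols[name.upper()])
    if half == "0":
        return high
    if half == "1":
        return low
    raise ValueError("expected a :0 or :1 suffix")
```

With a label `LOOP` at `0x123`, `LOOP:0` resolves to `0x04` and `LOOP:1` to `0x23`.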

Side Effects

Any side effects will be listed in the notes of a keyword; read each carefully. If a keyword clobbers an unrelated register, it will select the first available register in reverse alphabetical order (C, then B, then A).

Keyword Tables

Operands will be displayed in square brackets [ ] using the following shorthand:

Bitwise Logic Keywords

| Keyword | Description | Notes |
|---------|-------------|-------|
| NOT [reg] | bitwise logical NOT | - |
| AND [reg] [eth] | bitwise logical AND | The second register is flipped; its value can be restored with a NOT operation. If an immediate value is used, it is flipped at compile time. |
| NAND [reg] [eth] | bitwise logical NAND | The second register is flipped; its value can be restored with a NOT operation. If an immediate value is used, it is flipped at compile time. |
| OR [reg] [eth] | bitwise logical OR | - |
| NOR [reg] [eth] | bitwise logical NOR | - |
| XOR [reg] [eth] | bitwise logical XOR | An extra register will be clobbered; this is true even if an immediate value is used. |
| NXOR [reg] [eth] | bitwise logical NXOR | An extra register will be clobbered; this is true even if an immediate value is used. |
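
Since NOR is the only logic instruction, every keyword in this table has to reduce to a sequence of NORs. The Python sketch below shows the standard identities on 6-bit values. It is my reading of why the notes look the way they do (AND needs both inputs inverted first, XOR needs somewhere to hold an intermediate result), not a confirmed listing of what the compiler emits.

```python
MASK = 0x3F  # six bits


def nor(a, b):
    return ~(a | b) & MASK


def not_(a):
    return nor(a, a)


def or_(a, b):
    return not_(nor(a, b))


def and_(a, b):
    # Both inputs are inverted before the final NOR, which would explain
    # why the second register is left flipped.
    return nor(not_(a), not_(b))


def nand_(a, b):
    return not_(and_(a, b))


def xor_(a, b):
    # Two intermediate values are live at once, hence the extra register.
    return nor(and_(a, b), nor(a, b))


def nxor_(a, b):
    return not_(xor_(a, b))
```

Checking these against Python's own `&`, `|`, and `^` over all 64 x 64 input pairs is a quick way to convince yourself the identities hold.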

Shift and Rotate Keywords

These keywords simply load the corresponding address from the right and left rotate lookup tables.

KeywordDescriptionNotes
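| Keyword | Description | Notes |
|---------|-------------|-------|
| ROL [eth] | rotate left storing the value in C | - |
| ROR [eth] | rotate right storing the value in C | - |
| SHL [eth] | shift left storing the value in C | - |
| SHR [eth] | shift right storing the value in C | - |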
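
Here is a sketch of how the rotate tables might be populated and used, assuming the entry at `base + v` holds `v` rotated by one bit. The spec gives the table addresses but not their contents, so that layout is an assumption, and the memory image here is just a Python list.

```python
LEFT_TABLE_BASE = 0xF80
RIGHT_TABLE_BASE = 0xFC0


def rotate_left_once(value):
    return ((value << 1) | (value >> 5)) & 0x3F


def rotate_right_once(value):
    return ((value >> 1) | (value << 5)) & 0x3F


# A 4096-cell memory image with both lookup tables filled in.
memory = [0] * 4096
for v in range(64):
    memory[LEFT_TABLE_BASE + v] = rotate_left_once(v)
    memory[RIGHT_TABLE_BASE + v] = rotate_right_once(v)


def rol(value):
    """ROL as a table lookup: read the cell at the left table base plus the value."""
    return memory[LEFT_TABLE_BASE + value]


def ror(value):
    """ROR as a table lookup: read the cell at the right table base plus the value."""
    return memory[RIGHT_TABLE_BASE + value]
```

One nice property of this layout is that the high half of the address is constant (`0x3E` for the left table, `0x3F` for the right), so the value being rotated can serve directly as the low half of a LOAD address.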

Arithmetic Keywords

KeywordDescriptionNotes
| Keyword | Description | Notes |
|---------|-------------|-------|
| ADD [reg] [eth] | add | All registers will be clobbered; this is true even if an immediate value is used. |
| SUB [reg] [eth] | subtract | All registers will be clobbered; this is true even if an immediate value is used. |
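
The ISA has no adder, so addition has to be built from logic and shifting. The sketch below shows one well-known way to do that (repeated XOR plus carry propagation), with results wrapping modulo 64. I'm not claiming this is the sequence the compiler actually emits; it just illustrates why a sum, a carry, and the second operand all need to be live at once, which fits the note about clobbering every register.

```python
MASK = 0x3F  # six bits


def add6(a, b):
    """Add two 6-bit values using only AND, XOR and a one-bit left shift."""
    while b:
        carry = a & b              # positions where both bits are set
        a ^= b                     # sum without the carries
        b = (carry << 1) & MASK    # carries move one position left
    return a


def sub6(a, b):
    """Subtract by adding the two's complement of b."""
    return add6(a, add6(~b & MASK, 1))
```

The loop runs at most six times, because each pass pushes the remaining carries one bit further left until they fall off the top.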

Memory Keywords

KeywordDescriptionNotes
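| Keyword | Description | Notes |
|---------|-------------|-------|
| SET [imm] | compiles to raw value [imm] | - |
| MOV [reg] [eth] | copy from second operand to first | - |
| LOD [add] | load data from [add] into C | - |
| STO [add] | stores data in C at [add] | - |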

Jump Keywords

KeywordDescriptionNotes
| Keyword | Description | Notes |
|---------|-------------|-------|
| PC [add] | set program counter to [add] | - |
| LAB [idn] | define a label pointing to the next statement | - |
| LIH [con] [add] | conditional jump if true | All registers will be clobbered. LIH stands for "logic is hard". |

Miscellaneous Keywords

KeywordDescriptionNotes
| Keyword | Description | Notes |
|---------|-------------|-------|
| NOP | No operation; used for padding | - |
| HLT | halts the CPU until the next interrupt | - |