MRISC32 – Stabilizing the Base architecture

The MRISC32 instruction set architecture recently reached a seemingly insignificant but major milestone: as of version 0.2 of the ISA, the key elements of the Base architecture have been fixed, and it is very unlikely that they will change any time soon.

Base architecture?

“Base architecture?” you say. I’m glad you asked!

The MRISC32 ISA is currently divided into the following architecture modules:

The Base architecture (mandatory).
The Vector operation module (optional).
The Floating-point module (optional).
The Packed operation module (optional).
The Saturating and halving arithmetic module (optional).

The Base architecture is the minimum subset of the MRISC32 ISA that a conforming implementation (e.g. a CPU) must support. The other modules further extend the capabilities of an MRISC32 implementation, but they are optional and are still under development.

For more info, see the MRISC32 Instruction Set Manual.

What made the cut?

In short, the Base architecture includes the basic scalar integer instructions such as arithmetic, logic, memory load/store and branch operations. Among other things, it includes scalar instructions for:

Load/store and load-effective-address (LDW, LDB, LDUB, STW, LDEA, …) with base + offset and base + scaled index addressing.
Arithmetic operations (ADD, SUB, MUL, DIV, MIN, MAX, …).
Comparison operations (SEQ, SLT, SLE, …).
Conditional and unconditional branch operations (BZ, BLT, BGE, BS, …, J, JL).
Bitwise logic operations (AND, OR, XOR).
Bit field operations (EBF, EBFU, MKBF).
Conditional select (SEL).
Bit manipulation operations that are easy to do in hardware but hard to do in software (CLZ, POPCNT, bit reverse, byte swizzle).

I believe that this is a very good and competent subset of the ISA, while still being simple enough to implement even in lightweight CPUs.

Late changes

The things that I wanted to pin down were mostly related to instruction encoding and which instructions to include in the Base architecture.

Furthermore, a few instructions had been bothering me as they did not make good use of the instruction word bits, and I also wanted to ensure that the most common PC-relative addressing operations could be done with a single instruction.

Those things were thus prioritized to be fixed before locking the Base architecture.

Replace shift instructions with bit field instructions

A neat trick that I learned from the M88k ISA (designed by Mitch Alsup in the 1980s) is that you can replace the common shift instructions (logic/arithmetic shift left/right, i.e. LSL, LSR, ASR) with more powerful and versatile bit field instructions. More details can be found in the MC88100 RISC Microprocessor User’s Manual (look up ext, extu and mak – you can find a copy of the manual over at bitsavers.org). The shift instructions were thus replaced as follows:

Old instruction	New instruction
LSL (Logic Shift Left)	MKBF (MaKe Bit Field)
LSR (Logic Shift Right)	EBFU (Extract Bit Field Unsigned)
ASR (Arithmetic Shift Right)	EBF (Extract Bit Field)

Although the new instructions require slightly more hardware, the gate delay turns out to be the same as for plain shift instructions, and the instructions can effectively perform two operations at once (mask + shift, including optional sign extension). Since most ISAs eventually end up including bit field instructions anyway, this change felt like a no-brainer.

As a side effect the 15-bit immediate field of the bit field instructions is put to better use (10 out of 15 bits are used) compared to the old shift instructions (only 5 out of 15 bits were used).
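
To make the mapping in the table concrete, here is a rough Python model of how the three bit field operations can stand in for the plain shifts. This is only a sketch of the general idea, not the exact MRISC32 definition: the function names, the (offset, width) argument order and the choice to pass an explicit width of 32 - n are assumptions made for illustration, so check the Instruction Set Manual for the real operand encoding.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with Python integers


def ebfu(x, offset, width):
    """Extract `width` bits starting at bit `offset`, zero-extended."""
    return ((x & MASK32) >> offset) & ((1 << width) - 1)


def ebf(x, offset, width):
    """Extract `width` bits starting at bit `offset`, sign-extended."""
    field = ebfu(x, offset, width)
    sign = 1 << (width - 1)          # top bit of the extracted field
    return ((field ^ sign) - sign) & MASK32


def mkbf(x, offset, width):
    """Take the low `width` bits of x and place them at bit `offset`."""
    return ((x & ((1 << width) - 1)) << offset) & MASK32


# The old shift instructions expressed with the bit field operations
# (hypothetical helpers, shift amount n in the range 0..31).
def lsl(x, n):
    return mkbf(x, n, 32 - n)


def lsr(x, n):
    return ebfu(x, n, 32 - n)


def asr(x, n):
    return ebf(x, n, 32 - n)
```

With a narrower width, the same operations do a combined mask and shift, e.g. ebfu(0x12345678, 8, 8) picks out the byte 0x56 in one step.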

Improve bitwise logic instructions

One thing that bothered me about the logic instructions (AND, OR, …) was that the two-bit T-field of the instruction word was unused. The T-field is normally used for specifying the “type” for packed operations (byte, half-word, word), but for bitwise operations no such distinction exists.

Then it occurred to me that the T-field could be repurposed to mean bitwise negation of the source operands (two source operands, two bits of information in the instruction word). That way the number of logic instructions could be reduced from five (AND, OR, XOR, BIC, NOR) to just three instructions (AND, OR, XOR), and we can perform more logic operations in a single instruction:

and     r1, r2, r3   ; r1 = r2 & r3
and.pn  r1, r2, r3   ; r1 = r2 & ~r3  (replaces BIC)
and.np  r1, r2, r3   ; r1 = ~r2 & r3
and.nn  r1, r2, r3   ; r1 = ~r2 & ~r3 = ~(r2 | r3) (replaces NOR)
or      r1, r2, r3   ; r1 = r2 | r3
or.pn   r1, r2, r3   ; r1 = r2 | ~r3
...

…and so on.

Again, this change made the instructions more powerful with negligible or no hardware costs, and we even reduced the number of instructions in the ISA. What requires two instructions in many other ISAs (bitwise negate + and/or/xor) can now be done in a single instruction in MRISC32.
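
The effect of the negation suffixes can be sketched with a small Python model. The helper name logic and the way the suffix is passed as a two-letter string are made up for this illustration (the actual T-field bit encoding is not shown here), and it assumes that XOR accepts the same suffixes as AND and OR.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers with Python integers


def logic(op, a, b, suffix="pp"):
    """Model of AND/OR/XOR with optional negation of each source operand.

    suffix[0] applies to the first source and suffix[1] to the second:
    'p' keeps the operand as it is, 'n' negates it bitwise.
    """
    if len(suffix) != 2 or any(c not in "pn" for c in suffix):
        raise ValueError("suffix must be one of pp, pn, np, nn")
    if suffix[0] == "n":
        a = ~a
    if suffix[1] == "n":
        b = ~b
    a &= MASK32
    b &= MASK32
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    raise ValueError("unknown operation: " + op)
```

For instance, logic("and", a, b, "pn") gives the old BIC result and logic("and", a, b, "nn") gives NOR. By the same reasoning, "or" with "nn" yields NAND and "xor" with "pn" yields XNOR.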

Improve PC-relative addressing

One of the most common operations in any code base is to reference code or data relative to the program counter (PC), i.e. the address of the current instruction.

The generic solution in MRISC32 is to use the ADDPCHI instruction, which adds a 21-bit immediate value to the upper bits of the PC and stores the result in a scalar register of choice. That performs the first step of a two-instruction combination, where the second step is a load/store, jump or add instruction that supplies the lower 11 bits of the PC-relative address (using the regular 15-bit immediate field of the instruction). This gives a full 21 + 11 = 32 bits of PC-relative range. Note that this mechanism is also found in other RISC ISAs (e.g. the RISC-V AUIPC instruction does the same thing).

As an example, the code for a two-instruction PC-relative load can look like this (the “+4” is due to the 4-byte address difference between the two instructions):

addpchi  r3, #foo@pchi
ldw      r3, [r3, #foo+4@pclo]
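
The arithmetic behind the 21 + 11 bit split can be sketched in Python. This is a simplified model under a couple of assumptions that are not spelled out above: the low part is treated as an unsigned 11-bit value (so no rounding of the high part is needed), and both parts are computed relative to the address of the ADDPCHI instruction, which is what the "+4" in the assembler example compensates for. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # addresses wrap around at 32 bits


def split_pc_relative(target, pc):
    """Split the distance from pc to target into a 21-bit high part
    and an 11-bit low part (assumed unsigned)."""
    offset = (target - pc) & MASK32
    hi = offset >> 11        # goes into the ADDPCHI immediate
    lo = offset & 0x7FF      # goes into the load/store/add immediate
    return hi, lo


def resolve(pc, hi, lo):
    """What the two-instruction sequence computes: first the ADDPCHI
    step, then the low bits added by the second instruction."""
    base = (pc + (hi << 11)) & MASK32   # result of ADDPCHI
    return (base + lo) & MASK32         # effective address
```

Whatever the distance between the instruction and the target, resolve(pc, *split_pc_relative(target, pc)) gives back the target address, which is why the pair covers the full 32-bit address space.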

For reference: Over 10% of the instructions in the Quake executable use PC-relative addressing, and the size of the entire program is less than 1 MB (including BSS and statically linked libc etc).

Thus, these operations are very common, and in most cases a full 32-bit PC-relative range (i.e. ± 2 GB) is not required, so it is a big win if we have instructions that can do PC-relative addressing in just a single instruction.

After some statistical studies, I modified and removed some of the existing instructions and added a few new instructions, ending up with the following instructions that all take a signed 21-bit immediate offset value that is multiplied by four, for a PC-relative range of ± 4 MB:

Instruction	Description
J	Jump (can use PC as base register)
JL	Jump and link (can use PC as base register)
LDWPC	Load word PC-relative
STWPC	Store word PC-relative
ADDPC	Add immediate to PC (load effective address)

The typical scenario is that the compiler will output a two-instruction sequence for any PC-relative operation, and then the static linker (once symbol addresses are known) will relax the operation to a single instruction whenever possible.

Provided that “foo” is within ± 4 MB from the instruction, the example above would be relaxed into:

ldwpc  r3, #foo@pc
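
The range check that the linker has to make before relaxing can be sketched as follows. This is a minimal model, assuming that the offset must be a multiple of four (since the immediate is multiplied by four) and that the signed 21-bit immediate covers -2^20 to 2^20 - 1 words. The function name is made up and the real linker logic may differ.

```python
def fits_single_instruction(target, pc):
    """Return True if target can be reached from pc with a signed
    21-bit word offset, i.e. roughly within +/- 4 MB."""
    offset = target - pc
    if offset % 4 != 0:
        return False             # not expressible as a whole word offset
    words = offset // 4
    return -(1 << 20) <= words < (1 << 20)
```

If the check fails, the two-instruction ADDPCHI sequence is simply kept as it is.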

Assembler syntax changes

A couple of changes were made to the assembler syntax to improve readability.

First, the scalar register names were changed from S0 – S31 to the more traditional R0 – R31. This should make the syntax more familiar to people who are used to other architectures, but perhaps more importantly I find that the new register names are easier to read.

The second change was that load/store address specifiers are surrounded by square brackets, like so:

ldw  r3, [r7, r1*4]   ; Load word at address R7 + R1*4 into R3

This should make the assembler code easier to read, as you can quickly identify which operands are used for defining the memory address.

Future changes

It is unlikely that the Base architecture will change significantly, but a few things are on my radar:

The CPUID instruction, which is part of the Base architecture, will change into a more generic read/write control/status register mechanism. This should not matter much to compiler/HW developers for the time being, as this instruction is currently only used in combination with the optional architecture modules.
I may add some more integer instructions to the Base architecture, such as MLA/MLS (integer multiply-accumulate and multiply-subtract) and IBF (insert bit field). These additions will be few, and they should not change the existing Base architecture instructions.
I will most likely try to implement better support for multi-precision arithmetic (i.e. some form of “carry”-functionality, but without a flags register). The current idea is that it will either take the form of a prefix instruction, or three-input/two-output instructions (of some kind).

Significance

The good part about having a stable Base architecture is that the ISA is mature enough to enable more serious work on the MRISC32 ecosystem, including:

Compilers (GCC, LLVM)
Simulators
Hardware implementations (e.g. soft microprocessors)
Debuggers

While there is already work underway in most of these areas, it is progressing slowly due to resource and time constraints. Thus it would be very exciting to see more people getting involved to take the MRISC32 project to new levels.

If you are interested, feel free to get in touch, e.g. via the discussion forums or e-mail. See:

https://gitlab.com/mrisc32
