
Open balanced ternary instruction set architecture. Generated from setnex-isa-v0.8.md on 2026-05-07.


Setnex ISA — Specification v0.8

Copyright 2026 Eric Tellier (Terias)

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this specification except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, this specification is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

In accordance with Section 3 of the Apache License 2.0, any implementation of this specification is granted a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, use, sell, and distribute implementations that comply with this specification.


Balanced ternary instruction set architecture, inspired by RISC-V. Balanced ternary {−1, 0, +1}, 27-trit word, 27 registers, fixed-length instructions.


Changelog from v0.7

# | Change | Rationale
1 | Vector extension added: dedicated 27-register vector bank v0..v26, trit-parallel datapath (each lane = 1 trit, 27 lanes per word). Eight new opcodes (+15..+22): VADD, VMUL, VLOG, VSEL, VCMP, VRED, VPERM, VMOVE. See §16. | Amortizes fetch/decode/control over 27 simultaneous trit operations and exposes ternary-native primitives (3-way mask, ternary compare, Kleene consensus reduction) that have no clean binary equivalent. Ternary ML at 1.58-bit (BitNet) maps natively.
2 | New register bank: 27 vector registers v0..v26, each 27 trits wide. Indexed using the 3-trit register fields of the existing R format; vector opcodes interpret rs1/rs2/rd as v-reg indices instead of GPRs. VMOVE is the only opcode that crosses banks. | Keeps the format set unchanged (still R, I, J, U, B). Per-opcode bank dispatch matches RVV practice; no instruction-side mode bit needed.
3 | Logic ops merged into VLOG: TAND/TOR/TNOT/TIMPL (LMODE-following) and CONS/ACONS (LMODE-bypass) all share opcode +17 with sub-mode in funct[0..2] | Same datapath, same operand shape; fusing into one opcode preserves opcode budget for v0.9 (FMA, gather/scatter, tryte/trybble lanes).
4 | Permutation and shift merged into VPERM (opcode +21), modes: rotate, shift, reverse, shuffle | Same lane-permutation network, only the control signal differs. Single opcode covers all.
5 | Inter-bank movement merged into VMOVE (opcode +22), modes: GPR↔v-reg whole-word copy, v-reg↔v-reg copy, single-trit broadcast/insert/extract | Bridges the two register files without burning three separate opcodes; lane index when relevant lives in funct[3..5].
6 | Two opcodes (+23, +24) left reserved for v0.9 | Earmarked for fused multiply-accumulate (VFMA) and gather/scatter (VGATHER/VSCATTER), or tryte/trybble lane modes if explored next.
7 | Architectural summary updated: opcode count 52 → 60; register count 27 GPR → 27 GPR + 27 v-reg | Vector bank is part of the architectural state visible to a context switch; see §16.6 for the save/restore contract.
8 | No change to scalar ISA, CSRs, exception model, MPU, or interrupt controller | v0.8 is purely additive over v0.7. Existing binaries run unmodified; vector code is opt-in.

Changelog from v0.6

# | Change | Rationale
1 | Memory Protection Unit (MPU) added: 9 indirect-access regions, NAPO3 size encoding, ternary per-axis permissions ({N = none, Z = kernel only, P = user + kernel}) for R/W/X | Turns the §2.3 convention “negative = kernel, positive = user” into a hardware-enforced rule. Prerequisite for running untrusted user code without silent kernel corruption. See §9.
2 | Asynchronous interrupt controller added: 9 IRQ lines, three CSRs (IPENDING, IENABLE, IPRIORITY), dispatch through the existing exception machinery (shared EVEC, frame-2 bank for nested cases) | Provides external-event delivery (timer, UART, etc.) independent of the instruction stream. Makes preemptive scheduling possible. See §10.
3 | Six new CSRs at addresses −1..−6: MPU_SELECT, MPU_BASE, MPU_CFG, IPENDING, IENABLE, IPRIORITY | First use of the negative CSR address half. MPU uses indirect access (select-then-access) so region count is independent of CSR budget.
4 | Three new EXC_PERM_R / EXC_PERM_W / EXC_PERM_X codes (−9, −8, −7) distinct from EXC_FAULT | EXC_FAULT = address invalid (doesn’t exist); EXC_PERM_* = address exists but MPU denies. Three separate codes let the handler dispatch on access type without re-decoding the faulting instruction. Mirrors the RISC-V load/store/instruction page-fault split.
5 | Nine new IRQ cause codes IRQ_0..IRQ_8 (+20..+28) | Hardware selects the highest-priority pending-enabled IRQ line and writes the corresponding cause code at dispatch time. A single-CSR read tells the handler which line to service.
6 | §2.3 address space: “convention” upgraded to “enforcement”; the MPU can make the user/kernel split physically binding | Documentation reflects that the split is no longer a soft contract.
7 | Spec clarifications (no semantic change vs. v0.6 simulator): EXC_OVERFLOW reclassified “reserved, not raised” (no STATUS enable trit exists); TSHIFT explicitly produces 0 for |val(rs2)| ≥ 27; ECALL imm17[1..16] ignored by the decoder (not “must be Z”); Tekum anchor |r| > 7 decodes to NaR; IPENDING/IENABLE write-N is a no-op | Align the spec text with observed v0.6 simulator behavior; close under-specified edges surfaced during the v0.7 review.
8 | MPU_CFG size range narrowed from [0, 27] to [0, 26] | n = 27 had no well-defined base under the signed NAPO3 model; kernel-mode no-match default already covers the “everything allowed” case.
9 | §10.5 handler boilerplate rewritten to select the correct bank via STATUS.depth before reading ECAUSE / ECAUSE2; §10.2 Pending/Enable semantics extended with explicit level-vs-edge rule | v0.7 first-draft example silently mis-dispatched on nested entry; edge-triggered peripherals had no documented path to coexist with the level-sensitive sampler.

Changelog from v0.5

# | Change | Rationale
1 | Nested exceptions: a second bank of exception CSRs (EPC2, ESAVE2, ECAUSE2, ETVAL2) added; STATUS.depth (trit t[3]) tracks active frame count | v0.5 machine-check reset on any synchronous fault in kernel mode was too brutal: it forced a fault-free kernel. One level of nesting is now tolerated; only a third fault (while depth = N) triggers reset.
2 | ECALL imm17[0] repurposed as flavor tag: Z = user syscall, P = hypercall, N = debug trap | Subdivides the EXC_ECALL cause without consuming a new opcode. Backward compatible with v0.5 binaries (whose imm17 = Z → user syscall, same as before).
3 | EXC_ECALL split into EXC_ECALL_U = 0, EXC_ECALL_H = +1, EXC_ECALL_D = +2; all three share a single EVEC | Lets the handler dispatch on ECAUSE alone without decoding imm17. Positive codes preserve the convention “negative = fault, positive = deliberate synchronous trap”. EXC_ECALL_U = 0 preserves v0.5 binary compatibility.
4 | Triple-fault condition (fault while depth = N) triggers machine-check reset; no ECAUSE code allocated | Condition is not observable by any handler, so reserving a cause code would be dead weight. Documented by name in §8.2.

Changelog from v0.4

# | Change | Rationale
1 | r17 repurposed: s2 (callee-saved) → a7 (argument register / syscall number) | Reserve a dedicated register for the syscall number, following the RISC-V convention; lets the handler dispatch without re-fetching the faulting instruction.
2 | Saved register range narrowed: s0–s10 (11 regs) → s0–s9 (10 regs) | r18–r25 renumbered from s3–s10 to s2–s9 so the saved range is contiguous with no gap.
3 | ECALL clarified: imm17 reserved (must be Z); syscall number passed in a7; ECAUSE ← EXC_ECALL (= 0) on entry | v0.4 was ambiguous on whether imm17 overwrote ECAUSE; register-based dispatch avoids a re-decode of the instruction at EPC.
4 | New §8.4: Syscall dispatch convention | Documents the handler-side contract and the EPC + 1 epilogue pattern.
5 | New §11.5: Syscall calling convention | ABI-level wrapper/caller contract, distinct from the standard function call convention of §11.1.
6 | §11.3 prologue/epilogue rewritten: allocate frame first, then save ra and the old s0; mirrored epilogue | v0.4’s prologue overwrote s0 without saving it, and left ra stored outside the allocated stack region between the STORE and the final sp decrement.
7 | §11.3: stack alignment stated explicitly (sp is word-aligned) | Closes an under-specified point of the v0.4 ABI; trivially satisfied by word-addressing.
8 | New CSR ETVAL at address 9 (MST-first +00); new §8.5 | Symmetric with the ECALL/a7 clarification: EXC_FAULT, EXC_ALIGN and EXC_ILLEGAL now have a documented data channel (faulting address / raw instruction) instead of forcing the handler to re-decode EPC.
9 | §8.2 extended: synchronous fault while STATUS.mode = N → machine-check reset | v0.4 had a single EPC/ESAVE save slot; nested synchronous exceptions would silently corrupt it. Proper nested handling deferred to v0.6.
10 | §8.1 exception table gains an ETVAL contents column | Makes per-exception data channel explicit.
11 | funct[0] mode selector on ADD/SUB inverted: N = saturating (was P), P = with-carry (was N) | N evokes clamping/constraint, P evokes additive chaining: a more natural ternary mnemonic.

Changelog from v0.3

# | Change | Rationale
1 | Overflow flag removed from FLAGS | Overflow is carried by FLAGS.carry (P = overflow, N = underflow); a separate overflow trit was redundant.
2 | TFP extension: FADD, FSUB, FMUL, FDIV, FCMP, FCVT (opcodes +8 to +13) | Native ternary floating-point using Tekum T26F format; 6 of 7 reserved TFP opcodes allocated.
3 | T26F format: 26-trit Tekum in 27-trit register, t[26] = Z | Even-width Tekum requirement; compatible with NEG, TABS, LOAD/STORE.
4 | New §7: Ternary floating-point format specification | Tekum anchor, regime, exponent, fraction, special values, properties.
5 | Opcodes used: 46 → 52 | 6 TFP instructions added.

Changelog from v0.2

# | Change | Rationale
1 | ADC (funct[0]=P on ADD) and SBC (funct[0]=P on SUB) added | Multi-precision carry chain with ternary carry {N,Z,P}; funct[0] is now a 3-way mode selector: Z=normal, N=saturating, P=with-carry.
2 | TSEL added (opcode −2, R format with rp in funct[0..2]) | 3-way conditional select on FLAGS.sign, the defining ternary-native instruction.
3 | BF added (opcode −10, J format, rs1 field = condition mask) | Trit-masked branch on FLAGS.sign; 6 comparison branches in 1 opcode.
4 | BRT3 added (opcode −21, new format B) | Ternary three-way branch on LST(rX): P=fall-through, Z=off_z, N=off_n.
5 | Format B added: 4t opcode + 3t rX + 10t off_z + 10t off_n | New format for BRT3; two 10-trit offsets (±29 524).
6 | Instruction formats: 4 → 5 (R, I, J, U, B) | Accommodates BRT3.
7 | Opcodes used: 43 → 46 (TSEL, BF, BRT3; ADC/SBC via funct) | Was 43 in v0.2.

1. Notation and conventions

Symbol Meaning
t trit ∈ {−1, 0, +1}, written N, Z, P
T[n] n-trit word (value in [−(3ⁿ−1)/2, +(3ⁿ−1)/2])
tryte 27-trit word (basic memory and register unit)
val(x) balanced integer value of a ternary word
enc(n, w) balanced ternary encoding of integer n on w trits
sign(x) N if x < 0, Z if x = 0, P if x > 0
t[i] trit i of a word (i=0 = least significant trit)
rd destination register
rs1, rs2 source registers
imm signed immediate (balanced ternary)
LST Least Significant Trit (t[0])
MST Most Significant Trit (t[w−1] for a w-trit word)
zero-extend(x) Extend a narrow field to 27 trits by filling higher trits with Z (preserves balanced value, since Z = 0)

Trits are numbered from 0 (LST) to 26 (MST) within a tryte.

1.1 Textual representation

Trit glyphs: - for N (−1), 0 for Z (0), + for P (+1).

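Two conventions coexist in this document:
- MST-first: most significant trit on the left, as in ordinary positional notation (e.g. 0+- = 2).
- LST-first: t[0] on the left, following trit-index order.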

Each convention is explicitly labeled where it appears. When unmarked, MST-first is assumed.

1.2 Integer value

The integer value of a w-trit field starting at t[i] is: Σ trit[i+k] × 3^k for k=0..w−1.
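The val and enc definitions of §1 can be illustrated with a minimal Python sketch. It assumes trits are held as the integers −1, 0, +1 in an LST-first list; the helper names val, enc and glyphs are illustrative, not part of the specification.

```python
GLYPHS = {-1: "-", 0: "0", 1: "+"}

def val(trits):
    """Balanced integer value of an LST-first trit list (trits[0] is t[0])."""
    return sum(t * 3**k for k, t in enumerate(trits))

def enc(n, w):
    """Balanced ternary encoding of integer n on w trits, LST-first."""
    limit = (3**w - 1) // 2
    if not -limit <= n <= limit:
        raise ValueError(f"{n} does not fit in {w} trits")
    trits = []
    for _ in range(w):
        r = (n + 1) % 3 - 1      # balanced remainder in {-1, 0, +1}
        trits.append(r)
        n = (n - r) // 3         # exact division, no rounding
    return trits

def glyphs(trits):
    """MST-first textual form using the glyphs of section 1.1."""
    return "".join(GLYPHS[t] for t in reversed(trits))
```

With these helpers, glyphs(enc(2, 3)) yields the MST-first string 0+-, matching the register address shown for r2 in §2.1.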

Words are stored in memory least significant trit first (little-endian).


2. Programming model

2.1 General-purpose registers

27 T27 registers, named r0–r26, encoded on 3 trits (address ∈ [−13, +13]).

Register ABI name Conventional role
r0 zero Always 0 (read-only by convention)
r1 ra Return address
r2 sp Stack pointer (grows toward negative addresses)
r3 gp Global pointer
r4 tp Thread pointer
r5–r7 t0–t2 Temporaries (caller-saved)
r8–r9 s0–s1 Saved registers (callee-saved), s0 = frame pointer
r10–r16 a0–a6 Arguments / return values
r17 a7 Argument register / syscall number (see §8.4, §11.5)
r18–r25 s2–s9 Saved registers (callee-saved)
r26 t3 Extra temporary

Register addresses use balanced ternary encoding naturally (shown MST-first): r0 = 0 → 000, r1 = 1 → 00+, r2 = 2 → 0+-, r3 = 3 → 0+0, …, r13 = 13 → +++, r14 = −13 → ---, …, r26 = −1 → 00-.

Note: the 27 register indices 0–26 are mapped to the 27 balanced ternary T3 values −13 to +13. The mapping is: register rN has address enc(N, 3) for N ∈ {0..13}, and enc(N−27, 3) for N ∈ {14..26}. This wraps naturally in the balanced ternary range.
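The index-to-address mapping in the note above can be sketched as follows. This is an illustration only; reg_address and t3_glyphs are hypothetical helper names.

```python
GLYPHS = {-1: "-", 0: "0", 1: "+"}

def reg_address(n):
    """T3 address value (in -13..+13) of register rN, per the wrap rule above."""
    if not 0 <= n <= 26:
        raise ValueError("register index must be in 0..26")
    return n if n <= 13 else n - 27

def t3_glyphs(value):
    """MST-first 3-trit glyph string for a value in -13..+13."""
    trits = []
    for _ in range(3):
        r = (value + 1) % 3 - 1   # balanced remainder
        trits.append(r)
        value = (value - r) // 3
    return "".join(GLYPHS[t] for t in reversed(trits))
```

The mapping is a bijection: the 27 indices land on 27 distinct T3 values, so no register field encoding is unused.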

2.1.1 Vector registers (new in v0.8)

A separate bank of 27 vector registers v0–v26, each a full 27-trit word, is introduced for the vector extension (§16). Vector registers are addressed using the same 3-trit encoding as GPRs (one of the 27 balanced ternary T3 values −13..+13); the bank that is read or written is determined by opcode, not by the register field itself.

Opcode range Bank used for rd / rs1 / rs2
Scalar opcodes (−40..+14) GPR bank r0..r26
Vector opcodes (+15..+21) v-reg bank v0..v26 (except VRED, whose rd is a GPR)
VMOVE (+22) Mixed — direction encoded in funct[0..2]; see §16.5

There is no architectural alias vzero distinct from v0; software conventionally keeps v0 cleared and treats it as the zero vector when needed (analogous to the r0/zero convention, but not hardware-enforced — v0 is a writable register).

The vector bank is part of the architectural state and must be saved/restored across context switches that wish to preserve user-mode vector code; see §16.6 for the recommended save/restore contract. There is no v-reg variant of r0’s read-as-zero contract.

2.2 Control and status registers (CSR)

CSR addresses are T3 (3 trits), providing 27 addressable slots (values −13 to +13) — a symmetric match with the 27 GPRs. Each CSR is a full T27 word. Unused slots are reserved and read as zero.

Address (T3, MST-first) Decimal Name Description
00+ 1 PC Program counter (T27, word address)
0+- 2 LMODE Logic mode (see §6)
0+0 3 FLAGS Arithmetic flags (see §5.5)
0++ 4 EPC Exception program counter
+-- 5 ECAUSE Exception cause (T27)
+-0 6 EVEC Exception vector (handler address)
+-+ 7 STATUS Processor status (see §2.4)
+0- 8 ESAVE Saved STATUS on exception entry (new in v0.2)
+00 9 ETVAL Exception trap value (see §8.5) — new in v0.5
+0+ 10 EPC2 Frame-2 exception PC (nested exception; see §8.2) — new in v0.6
++- 11 ECAUSE2 Frame-2 exception cause — new in v0.6
++0 12 ESAVE2 Frame-2 saved STATUS — new in v0.6
+++ 13 ETVAL2 Frame-2 exception trap value — new in v0.6
00- −1 MPU_SELECT MPU region index selector (new in v0.7, see §9)
0-+ −2 MPU_BASE Base address of the MPU-selected region (new in v0.7)
0-0 −3 MPU_CFG Config (size + permissions + valid) of the selected region (new in v0.7)
0-- −4 IPENDING Pending-IRQ bitvector (new in v0.7, see §10)
-++ −5 IENABLE Per-line IRQ enable mask (new in v0.7)
-+0 −6 IPRIORITY Per-line IRQ priority (new in v0.7)
others Reserved (read as zero)

The four frame-2 CSRs (EPC2, ECAUSE2, ESAVE2, ETVAL2) are the depth-2 counterparts of EPC, ECAUSE, ESAVE, ETVAL. They are written by the processor only when a synchronous fault occurs while the outer handler is still running (STATUS.depth = P); see §8.2 for the full entry rules. EVEC and STATUS are not duplicated — both frames share the single exception vector, and STATUS.depth tracks which bank is the active save target.

The six v0.7 CSRs occupy addresses −1 to −6, the first slots of the negative half of the CSR address space. The MPU CSRs (MPU_SELECT, MPU_BASE, MPU_CFG) provide indirect access to the 9-region descriptor bank: write a region index to MPU_SELECT, then read or write MPU_BASE and MPU_CFG to manipulate that region (see §9.3). The three interrupt CSRs are direct (one word each). All six CSRs are kernel-only: a user-mode CSRR / CSRW targeting any of them raises EXC_ILLEGAL with the raw instruction word in ETVAL (§8.5). The pre-existing system CSRs (EPC, ECAUSE, EVEC, STATUS, and the frame-2 bank) follow the same kernel-only rule.

At reset, all CSRs are initialized to zero, except STATUS, which resets to N (see §2.4).

2.3 Address space

The user/kernel split is a convention at the ISA level but can be made physically binding by the MPU (§9): the kernel programs regions with perm = Z (kernel-only) or P (user+kernel) as appropriate, and user-mode accesses outside any user-permitted region raise EXC_PERM_* (§8.1). Prior to v0.7 this separation was soft — any mode could reach any address.

2.4 STATUS register structure

Trit Name Values
t[0] mode N = kernel, Z = reserved, P = user
t[1] ie N = interrupts masked, Z = reserved, P = interrupts enabled
t[2] lx Logic extension for LMODE=N: N = Heyting, Z = standard Łukasiewicz, P = RM3 (see §6)
t[3] depth Exception-frame depth (new in v0.6): Z = 0 frames active, P = 1 frame active (main bank holds outer context), N = 2 frames active (main bank + frame 2)
t[4..26] Reserved (must be Z)

At reset, STATUS = N → kernel mode, interrupts masked, LMODE=N submode = Łukasiewicz, depth = Z (no frames active).

Rationale — why a trit for depth. One balanced trit holds exactly the three states the nested-exception machinery needs: no frame, one frame, two frames. A fourth state (triple fault) would be out of range anyway — the hardware treats it as a machine-check reset (§8.2) rather than a representable depth. Using depth = N as the “fully nested” state preserves the monotonic Z→P→N progression as exceptions stack up.


3. Instruction encoding

Every instruction is a fixed-length 27-trit word.

3.1 Opcode field

The 4 least significant trits (t[0]–t[3]) form the primary opcode. 3⁴ = 81 possible combinations — ample opcode space.

Placing the opcode at t[0]–t[3] allows the decoder to begin working as soon as the first trits of the word arrive, without waiting for the full word.

3.2 Instruction formats

Five formats. The format is determined solely by the opcode; the decoder does not inspect other fields to determine the format.

R format (register–register)
t:  [0-3]   [4-6]   [7-9]   [10-12]  [13-26]
    opcode   rd      rs1     rs2      funct (14 trits)
    4 trits  3 trits 3 trits 3 trits  14 trits

I format (immediate)
t:  [0-3]   [4-6]   [7-9]   [10-26]
    opcode   rd      rs1     imm17
    4 trits  3 trits 3 trits 17 trits  (imm ∈ [−(3¹⁷−1)/2, +(3¹⁷−1)/2] ≈ ±64 million)

J format (conditional branch)
t:  [0-3]   [4-6]   [7-26]
    opcode   rs1     offset20
    4 trits  3 trits 20 trits  (offset ∈ [−(3²⁰−1)/2, +(3²⁰−1)/2] ≈ ±1.7 billion)

U format (unconditional jump)
t:  [0-3]   [4-26]
    opcode   offset23
    4 trits  23 trits  (offset ∈ [−(3²³−1)/2, +(3²³−1)/2] ≈ ±4.7 × 10¹⁰)

B format (ternary three-way branch) — new in v0.3
t:  [0-3]   [4-6]   [7-16]     [17-26]
    opcode   rX      off_z      off_n
    4 trits  3 trits 10 trits   10 trits  (offsets ∈ [−(3¹⁰−1)/2, +(3¹⁰−1)/2] ≈ ±29 524)

Fields are laid out from least significant (t[0]) to most significant (t[26]).

Rationale for format U: In v0.1, JMP and CALL used J format, wasting the rs1 field (3 trits) that they do not need. Format U merges those trits into the offset, multiplying jump range by 27 at no cost. Conditional branches retain J format because they require rs1 for the test register.

Rationale for format B (new in v0.3): A ternary three-way branch needs two offsets (for Z and N outcomes; P falls through). The 20 trits after opcode+rX are split evenly into two 10-trit offset fields, each with a range of ±29 524. This is the most ternary-native branch format: one instruction, three outcomes, zero wasted trits.

The 14-trit funct field in R format allows 3¹⁴ ≈ 4.8 million variants per opcode — only a few trits are used so far (mode selectors on ADD/SUB, FCVT, TSET; register field on TSEL), leaving the rest for future extensions. Funct sub-fields are specified per-instruction.
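As an illustration of the R-format layout in §3.2, the sketch below extracts each field from a 27-trit word. It assumes the word is an LST-first list of integers in {−1, 0, +1}; FIELDS_R, field_value and decode_r are hypothetical names.

```python
# (start trit, width) of each R-format field, taken from the layout in section 3.2
FIELDS_R = {
    "opcode": (0, 4),
    "rd": (4, 3),
    "rs1": (7, 3),
    "rs2": (10, 3),
    "funct": (13, 14),
}

def field_value(word, start, width):
    """Balanced value of the w-trit field starting at t[start] (section 1.2)."""
    return sum(word[start + k] * 3**k for k in range(width))

def decode_r(word):
    """Split a 27-trit instruction word into its R-format field values."""
    if len(word) != 27:
        raise ValueError("an instruction word has exactly 27 trits")
    return {name: field_value(word, s, w) for name, (s, w) in FIELDS_R.items()}

# Example word: opcode -40 (ADD), register address values 10, 11, 12, funct all Z.
example = [-1, -1, -1, -1] + [1, 0, 1] + [-1, 1, 1] + [0, 1, 1] + [0] * 14
fields = decode_r(example)
```

Because the format is fixed by the opcode alone, a decoder in this style would first read the opcode field and only then choose which field table to apply.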


4. Instruction set

4.1 Opcode table (4 trits = value from −40 to +40)

ALU group — R format (opcode −40 to −27)

Opcode (val) Mnemonic funct Operation
−40 ADD funct[0]=Z rd ← rs1 + rs2 (balanced arithmetic)
−40 ADDS funct[0]=N rd ← rs1 + rs2 (saturating: clamps to T27 range)
−40 ADC funct[0]=P rd ← rs1 + rs2 + FLAGS.carry (new in v0.3)
−39 SUB funct[0]=Z rd ← rs1 − rs2
−39 SUBS funct[0]=N rd ← rs1 − rs2 (saturating)
−39 SBC funct[0]=P rd ← rs1 − rs2 − FLAGS.carry (new in v0.3)
−38 MUL funct[0]=Z rd ← low 27 trits of rs1 × rs2
−38 MULH funct[0]=P rd ← high 27 trits of rs1 × rs2 (54-trit product)
−37 DIV Z rd ← rs1 ÷ rs2 (symmetric Euclidean, see §5.4)
−36 MOD Z rd ← rs1 mod rs2 (symmetric remainder, see §5.4)
−35 NEG Z rd ← −rs1 (trit-by-trit inversion: P↔N, Z→Z)
−34 TAND Z rd ← rs1 AND rs2 (per LMODE, see §6)
−33 TOR Z rd ← rs1 OR rs2 (per LMODE)
−32 TNOT Z rd ← NOT rs1 (per LMODE)
−31 TIMPL Z rd ← rs1 IMPL rs2 (per LMODE)
−30 CONS Z rd ← consensus(rs1, rs2) — always Kleene
−29 ACONS Z rd ← anti-consensus(rs1, rs2) — always Kleene
−28 TSHIFT Z rd ← rs1 shifted by val(rs2) trits (left if >0, right if <0; vacated trits filled with Z). val(rs2) uses the full T27 range, no masking; shifts with |val(rs2)| ≥ 27 produce 0.
−27 TCMP Z rd ← trit-by-trit comparison: rd[i] = sign(rs1[i] − rs2[i])

Consensus: trit-by-trit, cons(a,b) = a if a==b, else Z. Anti-consensus: trit-by-trit, acons(a,b) = Z if a==b, else the absent trit: acons(N,Z) = acons(Z,N) = P; acons(Z,P) = acons(P,Z) = N; acons(N,P) = acons(P,N) = Z. CONS and ACONS are dual operations: CONS extracts agreement, ACONS extracts the absent trit. Both ignore LMODE; they are arithmetic primitives, not logic operations.

TCMP is the trit-by-trit spaceship operator. For each trit position i: rd[i] = N if rs1[i] < rs2[i], Z if rs1[i] = rs2[i], P if rs1[i] > rs2[i]. TCMP complements CONS/ACONS: CONS extracts shared values, TCMP extracts the ordering relation. Together, these three form a complete trit-comparison toolkit.
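The three per-trit primitives can be modeled in a few lines of Python. This is a sketch with trits as the integers −1, 0, +1 and words as lists. The acons shortcut relies on N + Z + P = 0, so the trit absent from two distinct inputs is the negation of their sum.

```python
def cons(a, b):
    """Consensus: the shared trit if both agree, else Z."""
    return a if a == b else 0

def acons(a, b):
    """Anti-consensus: Z if both agree, else the one trit absent from {a, b}."""
    return 0 if a == b else -(a + b)

def tcmp(a, b):
    """Per-trit spaceship: N if a < b, Z if equal, P if a > b."""
    return (a > b) - (a < b)

def lanewise(op, x, y):
    """Apply a per-trit operation across two equal-length words."""
    return [op(a, b) for a, b in zip(x, y)]
```

For example, lanewise(cons, x, y) keeps only the positions where the two words agree, which is the word-level behavior the CONS row describes.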

Saturating arithmetic (ADDS, SUBS): the result is clamped to [−(3²⁷−1)/2, +(3²⁷−1)/2] instead of wrapping. FLAGS.carry is set to Z (no overflow) regardless, since overflow is absorbed. FLAGS.sign reflects the clamped result.

Carry-chain arithmetic (ADC, SBC — new in v0.3): the value of FLAGS.carry from the previous ALU operation is added to (ADC) or subtracted from (SBC) the result. This enables multi-precision arithmetic. The balanced ternary carry {N, Z, P} is richer than the binary carry {0, 1} — one ADC propagates 3 values natively. Sequence for 54-trit addition: ADD lo, a_lo, b_lo then ADC hi, a_hi, b_hi.

funct[0] on ADD/SUB is a 3-way mode selector: Z = normal (ADD/SUB), N = saturating (ADDS/SUBS), P = with carry (ADC/SBC). This is itself a ternary exploitation — one trit selects among 3 modes.
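The three ADD modes and the 54-trit sequence can be sketched on plain Python integers. This is an illustrative model, not the reference semantics: the function names are hypothetical, and it assumes that ADC reports the carry of the full three-input sum.

```python
WIDTH = 27
MODULUS = 3**WIDTH
LIMIT = (MODULUS - 1) // 2       # largest T27 magnitude

def wrap(s):
    """Return (low 27-trit value, carry) for a sum with |s| <= 3**27."""
    if s > LIMIT:
        return s - MODULUS, 1    # carry P: overflow
    if s < -LIMIT:
        return s + MODULUS, -1   # carry N: underflow
    return s, 0

def add(a, b, mode=0, carry_in=0):
    """mode 0 = ADD, -1 = ADDS (saturating), +1 = ADC. Returns (rd, FLAGS.carry)."""
    if mode == -1:
        return max(-LIMIT, min(LIMIT, a + b)), 0   # clamp; carry forced to Z
    return wrap(a + b + (carry_in if mode == 1 else 0))

def add54(a_lo, a_hi, b_lo, b_hi):
    """54-trit addition: ADD on the low words, then ADC on the high words."""
    lo, carry = add(a_lo, b_lo)
    hi, carry = add(a_hi, b_hi, mode=1, carry_in=carry)
    return lo, hi, carry
```

Unlike a binary carry chain, the carry passed from the low word to the high word can be negative, which is why a single ADC covers both overflow and underflow of the low half.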

Note on funct indexing: funct[i] denotes the i-th trit within the funct field (local index, 0 = LST of funct). In absolute instruction-word terms, funct[0] lives at t[13] (the trit immediately after rs2). Using the LST-end of funct for mode selectors keeps the decoder logic close to the opcode and rs2 decoder.

Memory group — I format (opcode −26 to −18)

Opcode (val) Mnemonic Operation
−26 LOAD rd ← Mem[rs1 + imm17]
−25 STORE Mem[rs1 + imm17] ← rd (rd field used as source)
−24 LI rd ← zero-extend(imm17) to 27 trits
−23 LUI rd ← imm17 << 10 (load upper immediate, low 10 trits set to Z)
−22 ADDI rd ← rs1 + zero-extend(imm17)
−21 BRT3 Ternary three-way branch on LST(rX) (B format, not I; see BRT3 below)
−20 to −19 reserved
−18 CMPI FLAGS ← compare(rs1, zero-extend(imm17)) — see §5.5

Branch group — J format (opcode −17 to −10) and U format (opcode −9 to −8)

Opcode (val) Mnemonic Format Condition Semantics
−17 BEQ J rs1 == 0 PC ← PC + offset20
−16 BNE J rs1 ≠ 0 PC ← PC + offset20
−15 BLT J rs1 < 0 PC ← PC + offset20
−14 BGT J rs1 > 0 PC ← PC + offset20
−13 BLE J rs1 ≤ 0 PC ← PC + offset20
−12 BGE J rs1 ≥ 0 PC ← PC + offset20
−11 JMPA J unconditional PC ← rs1 + offset20
−10 BF J FLAGS.sign matches mask Trit-masked branch on FLAGS (new in v0.3, see below)
−9 JMP U unconditional PC ← PC + offset23
−8 CALL U unconditional ra ← PC + 1 ; PC ← PC + offset23

Branch instructions (BEQ–BGE) use rs1 (field [4-6]) as the register to test, and the offset is relative to the current PC (before increment). BLT/BGT/BLE/BGE compare val(rs1) to 0.

JMPA retains J format because it needs rs1 as the base address register. JMP and CALL use U format for maximum jump range (23 trits ≈ ±4.7 × 10¹⁰ words).

CSR and system group — I format (opcode −7 to −4)

Opcode (val) Mnemonic Operation
−7 CSRR rd ← CSR[imm17]
−6 CSRW CSR[imm17] ← rs1
−5 CSRX rd ← CSR[imm17] ; CSR[imm17] ← rs1 (atomic read-then-write)
−4 ECALL Synchronous trap: ECAUSE ← EXC_ECALL_U/H/D based on imm17[0] (Z=user syscall=0, P=hypercall=+1, N=debug trap=+2); call number is read by the handler from a7 (r17). imm17[1..16] are reserved and ignored by the decoder — assemblers emit Z for forward compatibility (see §8.4).

Special group — R format (opcode −3 to +4)

Opcode (val) Mnemonic Operation
−3 IRET Atomic exception return: PC ← EPC ; STATUS ← ESAVE (new in v0.2)
−2 TSEL 3-way conditional select on FLAGS.sign (R format, rp in funct[0..2]; see TSEL below)
−1 NOP No operation
0 HALT Halt processor
+1 TGET rd ← trit t[val(rs2)] of rs1 (result is N, Z, or P in a T27 word)
+2 TSET rd ← rs1 with trit t[val(rs2)] set to value encoded in funct[0] (see below)
+3 TSIGN rd ← sign(rs1) : N, Z, or P (as T27: −1, 0, or +1)
+4 CMP FLAGS ← compare(rs1, rs2) — see §5.5

TSET encoding (opcode +2): the value to insert is determined by funct[0]:

funct[0] Assembler mnemonic Inserted trit value
N TSETN rd, rs1, rs2 N (−1)
Z TSETZ rd, rs1, rs2 Z (0)
P TSETP rd, rs1, rs2 P (+1)

rs2 provides the trit index (0–26); the value to write comes from funct, not from rs2. The bare mnemonic TSET is accepted by the assembler as an alias for TSETZ (clear a trit).

Absolute value and trit-reduce (opcode +5 to +7)

Opcode (val) Mnemonic Operation
+5 TABS rd ← |rs1| (absolute value)
+6 TMIN rd ← minimum trit of rs1 (fold-min across all 27 trits; result is N, Z, or P as T27)
+7 TMAX rd ← maximum trit of rs1 (fold-max across all 27 trits; result is N, Z, or P as T27)

TMIN / TMAX are trit-reduce operations: they fold across all 27 trit positions and return the extremum as a single-trit value in a T27 register.
- TMIN(x) = P if and only if all trits of x are P (all-P test).
- TMAX(x) = N if and only if all trits of x are N (all-N test).
- After a subsumption check (TIMPL result, req, caps), the pattern TMIN t0, result followed by BGT t0, granted branches if all 27 capabilities are satisfied; no constant is needed.
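The fold behavior of TMIN and TMAX can be sketched on an LST-first trit list. The helper names are illustrative, and the word is assumed to hold integers in {−1, 0, +1}.

```python
def tmin(word):
    """Fold-min across all trits: P only when every trit is P."""
    return min(word)

def tmax(word):
    """Fold-max across all trits: N only when every trit is N."""
    return max(word)

all_granted = [1] * 27           # every lane P
one_unknown = [1] * 26 + [0]     # a single Z lane pulls the minimum down to Z
```

A single non-P lane is enough to make tmin fall below P, which is what lets a BGT on the reduced value act as an all-lanes test.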

New in v0.3: TSEL, BF, BRT3

TSEL — 3-way conditional select (opcode −2, R format)

TSEL rd, rn, rz, rp dispatches based on FLAGS.sign:
- FLAGS.sign = N → rd ← rn (rs1 field)
- FLAGS.sign = Z → rd ← rz (rs2 field)
- FLAGS.sign = P → rd ← rp (funct[0..2] field, register address)

This is the defining ternary-native data instruction. After a CMP, one instruction selects among three registers — binary requires 2 CMOVs or a branch. The R format accommodates 4 register fields: rd (destination), rs1=rn, rs2=rz, funct[0..2]=rp (3 trits at the LST end of funct, i.e. t[13..15] of the instruction word).

Example — clamp to range [lo, hi]:

CMP    val, lo          # FLAGS.sign: N if val < lo, Z if =, P if >
TSEL   t0, lo, val, val # t0 = lo if below, val otherwise
CMP    t0, hi           # FLAGS.sign: N if t0 < hi, Z if =, P if >
TSEL   result, t0, t0, hi  # result = hi if above, t0 otherwise
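The clamp sequence above can be checked with a small Python model of CMP and TSEL. This is a sketch: cmp_sign and tsel are hypothetical stand-ins for the FLAGS.sign update and the select, with −1, 0, +1 standing for N, Z, P.

```python
def cmp_sign(a, b):
    """FLAGS.sign after CMP a, b: N if a < b, Z if equal, P if a > b."""
    return (a > b) - (a < b)

def tsel(flags_sign, rn, rz, rp):
    """TSEL: pick rn, rz or rp according to FLAGS.sign = N, Z or P."""
    return {-1: rn, 0: rz, 1: rp}[flags_sign]

def clamp(val, lo, hi):
    """Mirror of the four-instruction clamp sequence."""
    t0 = tsel(cmp_sign(val, lo), lo, val, val)   # lo if below, val otherwise
    return tsel(cmp_sign(t0, hi), t0, t0, hi)    # hi if above, t0 otherwise
```

The model makes the operand order easy to verify: the N slot carries the replacement in the first TSEL, the P slot in the second.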

BF — trit-masked branch on FLAGS (opcode −10, J format)

BF cond, offset20 branches if FLAGS.sign matches the condition mask encoded in the rs1 field [4-6]:
- t[4] = P → match if FLAGS.sign = N
- t[5] = P → match if FLAGS.sign = Z
- t[6] = P → match if FLAGS.sign = P

The branch is taken if the mask trit selected by the current FLAGS.sign is P. This encodes all 6 standard comparison branches plus “always” (all three mask trits P) in a single opcode. In the table below, masks are written LST-first (t[4] t[5] t[6]), with 0 standing for Z:

Assembler rs1 mask (t[4] t[5] t[6]) Condition
BFLT P00 FLAGS.sign = N (less than)
BFEQ 0P0 FLAGS.sign = Z (equal)
BFGT 00P FLAGS.sign = P (greater than)
BFLE PP0 FLAGS.sign = N or Z (less or equal)
BFGE 0PP FLAGS.sign = Z or P (greater or equal)
BFNE P0P FLAGS.sign = N or P (not equal)

The pattern CMP a, b ; BFLT label replaces SUB t0, a, b ; BLT t0, label with the same instruction count but without the temporary register t0. FLAGS are not modified by BF.
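The mask test can be sketched as follows. It assumes the mask is given as the tuple (t[4], t[5], t[6]) and that only a P trit counts as set; the treatment of N in a mask trit is not stated here, so the sketch simply treats it as not set.

```python
N, Z, P = -1, 0, 1

# Masks from the table above, as (t[4], t[5], t[6])
BFLT, BFEQ, BFGT = (P, Z, Z), (Z, P, Z), (Z, Z, P)
BFLE, BFGE, BFNE = (P, P, Z), (Z, P, P), (P, Z, P)

def bf_taken(mask, flags_sign):
    """True if the mask trit selected by FLAGS.sign is P."""
    index = flags_sign + 1       # N -> t[4], Z -> t[5], P -> t[6]
    return mask[index] == P
```

Since the mask is a plain field of the instruction word, every comparison branch costs the same decode work; only the three mask trits differ.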

BRT3 — ternary three-way branch (opcode −21, B format)

BRT3 rX, off_z, off_n reads the least significant trit (LST) of register rX and dispatches:
- LST(rX) = P → fall through to PC + 1 (no branch penalty)
- LST(rX) = Z → PC ← PC + off_z
- LST(rX) = N → PC ← PC + off_n

Format B provides two 10-trit signed offsets (±29 524 each). The P-falls-through convention optimizes for the common case: loop bodies execute directly without a branch.

A while loop compiles to:

loop_start:
    ; evaluate condition → rX (P=true, Z=unknown, N=false)
    BRT3  rX, loop_start, loop_exit
    ; fall-through (P) → loop body
    ...
    JMP   loop_start
loop_exit:

Two instructions of overhead; the body executes without any branch. Variants:
- while! (optimistic): set off_z = +1 → Z falls through with P.
- while? (pessimistic): set off_z = off_n → Z treated as N, exits loop.

FLAGS are not affected by BRT3.
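The BRT3 dispatch can be modeled as a next-PC function. In this sketch, rx_value is the balanced integer value of rX and brt3_next_pc is a hypothetical name.

```python
def lst(value):
    """Least significant trit of a balanced integer value."""
    return (value + 1) % 3 - 1

def brt3_next_pc(pc, rx_value, off_z, off_n):
    """Next PC after BRT3 rX, off_z, off_n executed at address pc."""
    trit = lst(rx_value)
    if trit == 1:
        return pc + 1            # P: fall through
    if trit == 0:
        return pc + off_z        # Z: first offset
    return pc + off_n            # N: second offset
```

Only the LST of rX matters, so a register holding 3 (0+0 in MST-first glyphs) dispatches as Z even though its value is positive.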

TFP group — R format (opcode +8 to +13)

Opcode (val) Mnemonic funct Operation
+8 FADD Z rd ← rs1 +_f rs2 (T26F addition)
+9 FSUB Z rd ← rs1 −_f rs2 (T26F subtraction)
+10 FMUL Z rd ← rs1 ×_f rs2 (T26F multiplication)
+11 FDIV Z rd ← rs1 ÷_f rs2 (T26F division)
+12 FCMP Z FLAGS ← float comparison of rs1, rs2 (see §7)
+13 FCVT see below Integer ↔ T26F conversion (see §7)

FCVT encoding (opcode +13): the conversion mode is selected by funct[0]:

funct[0] Assembler mnemonic Operation
Z FICVT rd, rs1 rd ← T26F(int(rs1)) — integer to float
P FCVTI rd, rs1 rd ← int(T26F(rs1)), round to nearest
N FCVTIZ rd, rs1 rd ← int(T26F(rs1)), round toward zero

All TFP instructions use R format. rs2 is ignored for FCVT. The funct field is zero except for FCVT where funct[0] selects the conversion mode, following the same pattern as ADD/ADDS/ADC.

NaR propagation: if any input is NaR, the result is NaR. Additionally: ∞ − ∞ = NaR, 0 × ∞ = NaR, 0 ÷ 0 = NaR.

Division by zero: if rs2 = 0 and rs1 ≠ 0, FDIV produces ∞.

FLAGS after TFP operations (FADD, FSUB, FMUL, FDIV, FCMP) — unified scheme driven by the result’s class:

Result class sign carry
NaR Z N
∞ (saturated) P P
0 (true zero) Z Z
normal sign of result (N/P) Z

For FCMP, the “result” is the ordered difference rs1 −_f rs2: normal ⇒ trichotomy on sign; NaR ⇒ unordered (sign=Z, carry=N), distinct from true equality (sign=Z, carry=Z).

Free operations. These existing instructions work correctly on T26F values:
- NEG (opcode −35): trit inversion = Tekum negation (Proposition 3: θ(−t) = −θ(t))
- TABS (opcode +5): absolute value preserved since t[26] = Z is invariant under abs
- LOAD / STORE: move 27-trit words without interpretation

See §7 for the T26F format specification.

Vector group — R format (opcode +15 to +22) — new in v0.8

Opcode (val) Mnemonic funct Operation
+15 VADD funct[0]=Z vd ← vs1 + vs2, lane-wise saturating to {N,Z,P} (see §16.3)
+15 VSUB funct[0]=N vd ← vs1 − vs2, lane-wise saturating
+16 VMUL Z vd ← vs1 × vs2, lane-wise (closed in {N,Z,P} — no saturation needed)
+17 VLOG see §16.4 Lane-wise logic: TAND / TOR / TNOT / TIMPL (LMODE-following) or CONS / ACONS (LMODE-bypass) per funct[0..2]
+18 VSEL rm in funct[0..2] vd[i] ← vs1[i] if vm[i]=N ; Z if vm[i]=Z ; vs2[i] if vm[i]=P (see §16.4)
+19 VCMP Z vd[i] ← sign(vs1[i] − vs2[i]) — produces a ternary mask in {N,Z,P} per lane
+20 VRED see §16.4 Reduction vs1 → scalar GPR rd: SUM / SIGN / CONS / LST / MST / AND / OR per funct[0..2]
+21 VPERM see §16.4 Lane permutation: rotate / shift / reverse / shuffle per funct[0..2]
+22 VMOVE see §16.5 Inter-bank and intra-bank movement: GPR↔v-reg whole-word, v-reg→v-reg, broadcast / insert / extract

All vector opcodes use R format. With the exception of VMOVE (§16.5) and VRED (whose rd is a GPR), rs1, rs2, and rd are all v-reg indices.

No new format is introduced — vectors reuse R format unchanged. The bank is entirely determined by opcode value.

See §16 for the full vector specification (datapath model, lane semantics, ternary advantages, mask conventions, reduction rules, BitNet example).
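As a rough reference model of the lane-wise rows in the table, the sketch below treats a vector register as a Python list of trits (−1, 0, +1). It reads "lane-wise saturating" as clamping each lane result to {N, Z, P}, which is an interpretation of the table row; §16.3 remains authoritative. All function names are hypothetical.

```python
N, Z, P = -1, 0, 1  # trits as integers (assumption of this sketch)

def clamp(x):
    """Saturate an integer to the trit range [-1, +1]."""
    return max(N, min(P, x))

def vadd(vs1, vs2):
    return [clamp(a + b) for a, b in zip(vs1, vs2)]

def vsub(vs1, vs2):
    return [clamp(a - b) for a, b in zip(vs1, vs2)]

def vmul(vs1, vs2):
    # The product of two trits is always a trit, so no clamp is needed.
    return [a * b for a, b in zip(vs1, vs2)]

def vcmp(vs1, vs2):
    # sign(a - b) per lane: a ternary mask in {N, Z, P}
    return [(a > b) - (a < b) for a, b in zip(vs1, vs2)]

def vsel(vm, vs1, vs2):
    # mask N selects vs1, mask Z forces Z, mask P selects vs2
    return [a if m == N else (Z if m == Z else b)
            for m, a, b in zip(vm, vs1, vs2)]
```

For instance, `vsel(vcmp(x, y), x, y)` keeps `x[i]` where `x[i] < y[i]`, takes `y[i]` where `x[i] > y[i]`, and yields Z where the lanes are equal.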

Reserved extensions (opcode +23 to +40)

Opcodes +14 and +23 to +40 are reserved for future extensions:
- +14: reserved TFP (FSQRT, FMA…)
- +23 to +24: reserved Vector v0.9 (FMA, gather/scatter, tryte/trybble lane modes)
- +25 to +40: implementer-defined / custom


5. Balanced ternary arithmetic

5.1 Addition

Addition follows balanced base-3 tables. The carry is also balanced ternary:

a + b  →  sum | carry
N + N  →   P  |  N     (−1 + −1 = −2 = +1 − 3 → sum=P, carry=N)
N + Z  →   N  |  Z
N + P  →   Z  |  Z     (−1 + +1 = 0)
Z + Z  →   Z  |  Z
Z + P  →   P  |  Z
P + P  →   N  |  P     (+1 + +1 = +2 = −1 + 3 → sum=N, carry=P)

When a carry-in is present (multi-trit addition), the full 3-input sum produces sum and carry-out in the same balanced ternary system.
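A minimal sketch of the three-input adder and the ripple-carry chain built from it. Trits are modeled as −1, 0, +1 and words as least-significant-trit-first lists; both choices, and the function names, are assumptions of the sketch.

```python
N, Z, P = -1, 0, 1  # trits as integers (assumption of this sketch)

def full_add(a, b, cin=Z):
    """Add three balanced trits; return (sum_trit, carry_out)."""
    total = a + b + cin                  # lies in [-3, +3]
    carry = (total > 1) - (total < -1)   # P if total >= 2, N if total <= -2, else Z
    return total - 3 * carry, carry

def add_words(x, y):
    """Ripple-carry addition of two equal-length trit lists (LST first)."""
    out, carry = [], Z
    for a, b in zip(x, y):
        s, carry = full_add(a, b, carry)
        out.append(s)
    return out, carry
```

With no carry-in, `full_add` reproduces the two-input table above, for example `full_add(N, N)` gives sum P and carry N.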

5.2 Negation (NEG)

NEG inverts every trit: P→N, Z→Z, N→P. This is the “free” operation of balanced ternary — it requires no carry chain, just a trit-wise inversion.
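The carry-free nature of NEG can be checked with a small encoder and decoder. The helpers below are illustrative only (hypothetical names, least-significant trit first).

```python
def to_trits(value, width=27):
    """Encode an integer as `width` balanced trits, least-significant trit first."""
    trits = []
    for _ in range(width):
        r = (value + 1) % 3 - 1      # balanced remainder in {-1, 0, +1}
        trits.append(r)
        value = (value - r) // 3
    if value != 0:
        raise OverflowError("value does not fit in the requested width")
    return trits

def from_trits(trits):
    """Decode a least-significant-first trit list back to an integer."""
    return sum(t * 3**i for i, t in enumerate(trits))

def neg(trits):
    """NEG: invert every trit independently; no carry chain is involved."""
    return [-t for t in trits]
```

Each output trit of `neg` depends only on the matching input trit, which is why the operation needs no adder.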

5.3 Multiplication

Trit-by-trit extended product. MUL returns the low 27 trits (truncated). MULH returns the high 27 trits of the 54-trit product.

For a full 54-trit result: MUL rd_lo, rs1, rs2 then MULH rd_hi, rs1, rs2.

The trit-by-trit product table is:

a × b → product
N × N →  P    (−1 × −1 = +1)
N × Z →  Z    (−1 ×  0 =  0)
N × P →  N    (−1 × +1 = −1)
Z × Z →  Z
Z × P →  Z
P × P →  P    (+1 × +1 = +1)
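The MUL / MULH split can be modeled on plain integers. The low 27 trits of a balanced ternary number are its balanced remainder modulo 3²⁷, so the sketch below computes `lo` that way and derives `hi` from the identity product = hi × 3²⁷ + lo. Function and constant names are hypothetical.

```python
WORD = 27
HALF = (3**WORD - 1) // 2   # largest magnitude representable in one 27-trit word

def split_product(a, b):
    """Return (lo, hi) such that a * b == hi * 3**27 + lo and |lo| <= HALF.

    lo models the MUL result, hi models the MULH result.
    """
    product = a * b
    lo = (product + HALF) % 3**WORD - HALF   # balanced remainder
    hi = (product - lo) // 3**WORD           # exact division by construction
    return lo, hi
```

For operands within the T27 range, `hi` also fits in 27 trits, matching the 54-trit product described above.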

5.4 Integer division (symmetric Euclidean)

Setnex uses symmetric Euclidean division, the natural convention for balanced ternary:

This minimizes the magnitude of the remainder, which aligns with the balanced ternary philosophy of keeping values centered on zero.

When |b| is odd (including all powers of 3), the tie-break case does not occur and the result is unique.

Division by zero triggers the EXC_DIV0 exception.

Comparison with C convention: C truncates toward zero, which can produce larger remainders. The symmetric convention is more natural for balanced ternary and simplifies subsequent computations on the remainder.
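To make the contrast with C concrete, here is a hedged sketch of a minimum-magnitude-remainder division. It covers only the cases where the result is unique; the tie-break rule for even |b| is not reproduced in this section, so the sketch refuses those inputs rather than guess. The function name is hypothetical.

```python
def sym_divmod(a, b):
    """Return (q, r) with a == b*q + r and |r| as small as possible."""
    if b == 0:
        raise ZeroDivisionError("EXC_DIV0")
    q, r = divmod(a, b)           # Python floor division: r carries the sign of b
    if 2 * abs(r) > abs(b):       # remainder too large: step to the neighbouring quotient
        q, r = q + 1, r - b
    elif 2 * abs(r) == abs(b):    # exact tie, only possible when |b| is even
        raise NotImplementedError("tie case: rule not modeled in this sketch")
    return q, r
```

For 8 ÷ 3, C truncation gives q = 2, r = 2, while the symmetric convention gives q = 3, r = −1.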

5.5 FLAGS register

The FLAGS CSR (address 3) is updated after every ALU instruction (ADD through TCMP) and after CMP/CMPI. Both flag trits are fully ternary (N/Z/P), not binary.

Trit Name Meaning Values
t[0] sign Result sign / comparison trichotomy N = negative, Z = zero, P = positive
t[1] carry Carry-out from the most significant trit N / Z / P = outgoing carry from the balanced ternary adder, N = underflow, P = overflow
t[2..26] Reserved (always Z)

After ALU instructions (ADD, SUB, MUL, etc.):
- sign = sign(result): N if result < 0, Z if result = 0, P if result > 0.
- carry = carry-out from trit position 26 of the adder (meaningful for ADD/SUB; Z for other ALU ops). N if the true result was below −(3²⁷−1)/2 (underflow), P if above +(3²⁷−1)/2 (overflow), Z otherwise.

After CMP rs1, rs2:
- sign = sign(val(rs1) − val(rs2)). This is the trichotomy trit: N if rs1 < rs2, Z if rs1 = rs2, P if rs1 > rs2.
- carry reflects the subtraction rs1 − rs2 internally.

After CMPI rs1, imm17: same behavior with the zero-extended immediate in place of rs2.

The trichotomy trit in FLAGS.sign is the native ternary comparison result — it encodes three outcomes (less, equal, greater) in a single trit, which would require two bits in binary.
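The flag rules for ADD and CMP can be modeled on integers as follows. This is a behavioural sketch with hypothetical names, not a description of the datapath; it assumes both operands already lie in the T27 range.

```python
N, Z, P = -1, 0, 1
HALF = (3**27 - 1) // 2          # T27 values lie in [-HALF, +HALF]

def sign_trit(x):
    """Trichotomy: N if x < 0, Z if x == 0, P if x > 0."""
    return (x > 0) - (x < 0)

def flags_after_add(a, b):
    """Return (result, sign, carry) for a 27-trit ADD of in-range operands."""
    true_sum = a + b
    carry = P if true_sum > HALF else N if true_sum < -HALF else Z
    result = true_sum - carry * 3**27     # what remains in the low 27 trits
    return result, sign_trit(result), carry

def sign_after_cmp(a, b):
    """FLAGS.sign after CMP: a single trit encodes less / equal / greater."""
    return sign_trit(a - b)
```

Note that on overflow the sign flag follows the wrapped 27-trit result, in line with the rule sign = sign(result).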

5.5.1 FLAGS datapath

Although FLAGS is accessible as CSR address 3 (for context save/restore on exception), it is not a generic CSR — it has dedicated wiring to the execution units:

The CSRR/CSRW path (through the CSR file) is the slow path used only by exception prologue/epilogue (ESAVE, IRET) and by software that needs to inspect or fabricate FLAGS explicitly. Normal use is entirely implicit through the dedicated wires above.

Implementation note: FLAGS needs only 2 trits of storage (t[0], t[1]); t[2..26] are hard-wired to Z and do not require flip-flops.


6. Configurable ternary logic (LMODE)

The LMODE CSR (address 2) selects the truth tables for logic instructions TAND, TOR, TNOT, and TIMPL. LMODE holds a T27 value; only trit t[0] is significant.

When LMODE=N, the STATUS.lx trit (t[2]) selects among three sub-modes. This provides 5 logic modes total using only two trits.

6.1 Logic mode map

LMODE t[0] STATUS.lx Logic Z means… Notes
N N Heyting (HT) not provable Intuitionistic; NOT and IMPL differ from Łukasiewicz
N Z Łukasiewicz (Ł) not yet known Most tolerant; IMPL = order test
N P RM3 both true and false Paraconsistent (Routley–Meyer); IMPL differs
Z (ignored) Kleene (K) undecidable Default at reset; neutral; SQL 3VL
P (ignored) B3 (Bochvar) meaningless Z infectious: any input Z → output Z

At reset, LMODE = 0 and STATUS = 0 → Kleene active without explicit configuration.

STATUS.lx is only significant when LMODE=N. When LMODE=Z or LMODE=P, the lx trit is ignored.
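The two-trit selection in the mode map can be written as a short decision function. The mode labels returned here are hypothetical strings chosen for the sketch.

```python
N, Z, P = -1, 0, 1

def active_logic(lmode_t0, status_lx):
    """Return the logic selected by LMODE t[0] and STATUS.lx."""
    if lmode_t0 == Z:
        return "Kleene"          # lx ignored
    if lmode_t0 == P:
        return "B3"              # lx ignored
    # LMODE = N: STATUS.lx picks one of three sub-modes
    return {N: "Heyting", Z: "Lukasiewicz", P: "RM3"}[status_lx]
```

With both CSRs at their reset value of zero, `active_logic(Z, Z)` returns "Kleene".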

6.2 Mode descriptions

Kleene (LMODE=Z) — the default. AND = min, OR = max, NOT = negation, IMPL(a,b) = OR(NOT(a), b). This is the standard three-valued logic used by SQL for NULL handling. Z propagates through some operations but not all.

Łukasiewicz (LMODE=N, STATUS.lx=Z) — shares AND/OR/NOT with Kleene. Differs only on IMPL: IMPL(a,b) = min(P, −a + b + 1). The result is P if and only if a ≤ b, making TIMPL a trit-parallel subsumption test. Most tolerant of indetermination.

Heyting (LMODE=N, STATUS.lx=N) — intuitionistic logic. Shares AND/OR with Kleene/Łukasiewicz but differs on NOT: NOT_HT(Z) = N, NOT_HT(P) = N (only N maps to P). IMPL uses the Heyting algebra residual: IMPL_HT(a,b) = greatest c such that AND(a, c) ≤ b. Most conservative about the unknown state: “not provable” is treated as false.

RM3 (LMODE=N, STATUS.lx=P) — paraconsistent logic (Routley & Meyer). Shares AND/OR/NOT with Kleene/Łukasiewicz. Differs only on IMPL. Z represents “both true and false” — a contradiction that does not explode into arbitrary conclusions. Suitable for reasoning with inconsistent data.

B3 / Bochvar (LMODE=P) — Bochvar’s internal logic (1937). Z is “meaningless” and infectious: any operation with a Z input produces Z. AND, OR, NOT, and IMPL are all affected. On classical inputs (N and P only), B3 reduces to Boolean logic. Most strict: incomplete data produces no conclusion.

Hardware cost of STATUS.lx: When LMODE=Z or LMODE=P, the lx trit is ignored and the datapath is unchanged. When LMODE=N, the TNOT instruction must check STATUS.lx to select between standard NOT (−a) and Heyting NOT. This is a single AND + MUX-2→1 on the NOT output, gated by LMODE=N ∧ STATUS.lx=N. The TIMPL MUX adds one input (RM3) to the existing Łukasiewicz/Heyting selector. Total added cost: negligible.

6.3 LMODE-insensitive operations

CONS and ACONS are always evaluated using Kleene semantics, regardless of LMODE. They are arithmetic primitives, not logic operations.

CONS extracts the common trit (agreement → value, disagreement → Z). ACONS extracts the absent trit (agreement → Z, disagreement → the missing trit from {N, Z, P}). Together they form a complete dual pair for balanced ternary arithmetic circuits.

TCMP is also LMODE-insensitive — it is a pure arithmetic comparison.
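The prose definitions of CONS and ACONS translate directly into two one-line functions. The sketch models trits as −1, 0, +1; since the three trit values sum to zero, the trit missing from a disagreeing pair is −(a + b).

```python
N, Z, P = -1, 0, 1

def cons(a, b):
    """Consensus: the common trit on agreement, Z on disagreement."""
    return a if a == b else Z

def acons(a, b):
    """Anti-consensus: Z on agreement, otherwise the trit absent from {a, b}."""
    return Z if a == b else -(a + b)
```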

6.4 Complete truth tables

AND

Kleene / Łukasiewicz / Heyting / RM3 AND (identical — min):

AND N Z P
N N N N
Z N Z Z
P N Z P

B3 (Bochvar) AND — Z infectious:

AND N Z P
N N Z N
Z Z Z Z
P N Z P

OR

Kleene / Łukasiewicz / Heyting / RM3 OR (identical — max):

OR N Z P
N N Z P
Z Z Z P
P P P P

B3 (Bochvar) OR — Z infectious:

OR N Z P
N N Z P
Z Z Z Z
P P Z P

NOT

Kleene / Łukasiewicz / RM3 NOT (identical — negation):

a NOT(a)
N P
Z Z
P N

Heyting NOT:

a NOT(a)
N P
Z N
P N

B3 (Bochvar) NOT — Z infectious:

a NOT(a)
N P
Z Z
P N

Note: B3 NOT has the same table as Kleene NOT. The infectious property of B3 manifests in AND, OR, and IMPL, not in NOT (since NOT is unary and NOT(Z) = Z is already the “contaminated” result).
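The AND, OR, and NOT tables above reduce to min, max, and negation, with two exceptions: the infectious Z of B3 and the Heyting NOT. A compact sketch, using hypothetical mode labels "K", "L", "HT", "RM3", and "B3":

```python
N, Z, P = -1, 0, 1

def t_and(a, b, mode="K"):
    if mode == "B3" and Z in (a, b):
        return Z                  # Z is infectious in B3
    return min(a, b)

def t_or(a, b, mode="K"):
    if mode == "B3" and Z in (a, b):
        return Z
    return max(a, b)

def t_not(a, mode="K"):
    if mode == "HT":
        return P if a == N else N  # Heyting: only N maps to P
    return -a                      # all other modes: plain negation
```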

IMPL — all 5 modes are distinct

Kleene IMPL: IMPL(a,b) = OR(NOT(a), b) = max(−a, b)

IMPL N Z P
N P P P
Z Z Z P
P N Z P

Łukasiewicz IMPL: IMPL(a,b) = min(P, −a + b + 1)

IMPL N Z P
N P P P
Z Z P P
P N Z P

Heyting IMPL: IMPL(a,b) = greatest c such that AND(a, c) ≤ b

IMPL N Z P
N P P P
Z N P P
P N Z P

RM3 IMPL (paraconsistent):

IMPL N Z P
N P P P
Z N Z P
P N N P

B3 (Bochvar) IMPL: IMPL(a,b) = OR_B3(NOT_B3(a), b) — Z infectious:

IMPL N Z P
N P Z P
Z Z Z Z
P N Z P

All five IMPL tables are distinct. The discriminating cells:

(a, b) Kleene Łukasiewicz Heyting RM3 B3
(Z, N) Z Z N N Z
(Z, Z) Z P P Z Z
(N, Z) P P P P Z
(Z, P) P P P P Z
(P, N) N N N N N
(P, Z) Z Z Z N Z
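The five IMPL tables can be regenerated from short formulas, which is a convenient way to cross-check an implementation. The Kleene, Łukasiewicz, Heyting, and B3 functions follow the formulas given above. The RM3 closed form is an assumption of this sketch (the text gives only the table), chosen because it reproduces that table cell for cell.

```python
N, Z, P = -1, 0, 1
TRITS = (N, Z, P)

def impl_kleene(a, b):
    return max(-a, b)

def impl_lukasiewicz(a, b):
    return min(P, -a + b + 1)

def impl_heyting(a, b):
    # greatest c such that AND(a, c) <= b, with AND = min
    return max(c for c in TRITS if min(a, c) <= b)

def impl_rm3(a, b):
    # Assumed closed form; matches the RM3 table above.
    return max(-a, b) if a <= b else min(-a, b)

def impl_b3(a, b):
    return Z if Z in (a, b) else max(-a, b)

def table(f):
    """Rows indexed by a, columns by b, both in N, Z, P order."""
    return [[f(a, b) for b in TRITS] for a in TRITS]
```

Comparing `table(f)` for the five functions shows that no two modes share the same IMPL table.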

7. Ternary floating-point format (T26F)

Setnex uses the Tekum balanced ternary tapered precision format (Hunhold, arXiv:2512.10964) for floating-point arithmetic. The native width is 26 trits (T26F), stored in a 27-trit register with t[26] = Z.

7.1 Why 26 trits

Tekum requires an even trit width (the anchor midpoint pattern +-+-…+- has 2k trits). Since the Setnex word is 27 trits (odd), T26F uses the lower 26 trits and reserves t[26] = Z. This preserves full compatibility with NEG, TABS, LOAD/STORE, and integer CMP (for non-NaR values).

7.2 Special values

Three patterns are special (all 26 trits identical):

Pattern (t[0..25]) Name Float value 27-trit register
all Z Zero 0.0 all Z
all P Infinity (∞) +∞ t[26]=Z, t[0..25]=P
all N Not a Result (NaR) undefined t[26]=Z, t[0..25]=N

NaR is the Tekum analogue of IEEE 754 NaN. There is only one infinity (unsigned, as in the real wheel algebra: 1/0 = ∞).
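Detecting the three special patterns needs only a uniformity test on the low 26 trits. A sketch, assuming a register is modeled as a list of 27 trits with index i holding t[i]; the function name and return labels are hypothetical.

```python
N, Z, P = -1, 0, 1

def classify_t26f(word):
    """Classify a 27-trit register image holding a T26F value."""
    if len(word) != 27 or word[26] != Z:
        return "not T26F"            # t[26] must be Z
    low = word[:26]
    if all(t == Z for t in low):
        return "zero"
    if all(t == P for t in low):
        return "infinity"
    if all(t == N for t in low):
        return "NaR"
    return "non-special"             # regime validity is not checked here
```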

7.3 Anchor and decoding

For a non-special T26F value, the anchor is:

anc(t) = |t₂₆| − M

where t₂₆ denotes the lower 26 trits, |·| is balanced ternary absolute value, and M is the 26-trit midpoint pattern +-+-+-+-+-+-+-+-+-+-+-+-+- (13 repetitions of +-).

The anchor trits (big-endian, MST first) are partitioned as:

[regime: 3 trits] [exponent: c trits] [fraction: 26 − c − 3 trits]

Regime (r): signed integer from the 3 most significant anchor trits. r ∈ [−13, +13], though only |r| ≤ 7 encodes a valid value. Anchor patterns with |r| > 7 are reserved encodings and decode to NaR; a conforming TFP unit must not produce them as a result of any arithmetic operation.

Exponent trit count: c = max(0, |r| − 2). Near r = 0, all trits go to fraction (maximum precision). At extreme regimes, more trits go to exponent (maximum range). This is the tapered precision property.

Exponent value:

e = int(exponent_trits) + sign(r) × BIAS[|r|]

|r| 0 1 2 3 4 5 6 7
BIAS 0 1 2 4 10 28 82 244
c (exponent trits) 0 0 0 1 2 3 4 5
p (fraction trits) 23 23 23 22 21 20 19 18

Fraction value: f = Σ trit_i × 3^(−i) for i = 1..p, where p = 26 − c − 3 is the fraction trit count. f ∈ (−0.5, +0.5).

Decoded value:

θ(t) = sign × (1 + f) × 3^e

where sign is P (+1) if the T26F value is positive, N (−1) if negative.
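The field arithmetic of this section (regime, exponent count, bias, fraction, final value) can be exercised with exact rationals. The sketch starts from an already-computed anchor, given MST first, plus a separate sign; extracting the anchor from a raw register word is deliberately left out. Function names are hypothetical.

```python
from fractions import Fraction

BIAS = [0, 1, 2, 4, 10, 28, 82, 244]   # BIAS[|r|], copied from the table above

def trits_to_int(trits_mst_first):
    value = 0
    for t in trits_mst_first:
        value = 3 * value + t
    return value

def decode_anchor(anchor, sign):
    """anchor: 26 trits, MST first; sign: +1 or -1.

    Returns the decoded value as a Fraction, or None for a reserved regime.
    """
    r = trits_to_int(anchor[:3])                 # regime, in [-13, +13]
    if abs(r) > 7:
        return None                              # reserved encoding, decodes to NaR
    c = max(0, abs(r) - 2)                       # exponent trit count
    sign_r = (r > 0) - (r < 0)
    e = trits_to_int(anchor[3:3 + c]) + sign_r * BIAS[abs(r)]
    f = sum(Fraction(t, 3 ** (i + 1)) for i, t in enumerate(anchor[3 + c:]))
    return sign * (1 + f) * Fraction(3) ** e
```

For example, a regime of +3 with a zero exponent trit and zero fraction decodes to 3⁴ = 81, since BIAS[3] = 4.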

7.4 Precision and range

At regime r = 0: c = 0, all 23 fraction trits available → precision ≈ 23 × log₁₀(3) ≈ 11 decimal digits, exponent = 0 (values near 1.0).

At |r| = 7: 18 fraction trits (≈ 8.6 decimal digits), exponent range ±(244 + (3⁵−1)/2) = ±365 powers of 3 (≈ ±174 decimal decades).

7.5 Key properties

Monotonicity (Proposition 4): for non-special values, integer ordering on the raw 26-trit word corresponds to numerical ordering of the decoded float. The existing CMP instruction gives correct ordering for normal T26F values — but FCMP is required for proper NaR handling (NaR = all-N would otherwise compare as less than everything instead of unordered).

Free negation (Proposition 3): θ(−t) = −θ(t). Trit-by-trit inversion (NEG) negates the float. Since flip(Z) = Z, this works on the full 27-trit register with t[26] = Z.

Truncation is rounding: reducing precision by discarding least-significant fraction trits is equivalent to rounding, with no carry propagation needed.

7.6 T26F in a 27-trit register

The convention t[26] = Z ensures:
- T26F values occupy the integer range [−(3²⁶−1)/2, +(3²⁶−1)/2], a strict subset of the T27 range.
- LOAD/STORE transfer T26F values correctly (full 27-trit word move).
- NEG and TABS operate correctly (Z is invariant under trit inversion and absolute value).
- Software can distinguish integer and float values by testing t[26] (Z → possible float, non-Z → integer exceeding T26 range).

Integer operations on T26F values produce undefined float results. TFP instructions on non-T26F integer values produce undefined results. Type discipline is the programmer’s responsibility.


8. Exception handling

8.1 Exception causes

ECAUSE code Name Trigger ETVAL contents
−13 EXC_DIV0 Division by zero 0 (unused)
−12 EXC_ALIGN Misaligned memory access (future) Faulting effective address
−11 EXC_FAULT Invalid address Faulting effective address
−10 EXC_ILLEGAL Undefined opcode, reserved instruction, or kernel-only CSR accessed from user mode Raw 27-trit instruction word
−9 EXC_PERM_R MPU denies a LOAD (new in v0.7, see §9) Faulting effective address
−8 EXC_PERM_W MPU denies a STORE (new in v0.7) Faulting effective address
−7 EXC_PERM_X MPU denies an instruction fetch (new in v0.7) Faulting PC (same as EPC)
0 EXC_ECALL_U System call — user flavor (ECALL with imm17[0] = Z) 0 (syscall number is in a7, see §8.4)
+1 EXC_ECALL_H Hypercall (ECALL with imm17[0] = P) 0 (hypercall number is in a7, see §8.4)
+2 EXC_ECALL_D Debug trap (ECALL with imm17[0] = N) 0 (debug reason is in a7, see §8.4)
+10 EXC_OVERFLOW Reserved; not raised in v0.7 (awaits a STATUS enable trit in a future revision) 0 (unused)
+20 IRQ_0 Asynchronous interrupt, line 0 (new in v0.7, see §10) 0 (unused)
+21 IRQ_1 Asynchronous interrupt, line 1 0 (unused)
+22 IRQ_2 Asynchronous interrupt, line 2 0 (unused)
+23 IRQ_3 Asynchronous interrupt, line 3 0 (unused)
+24 IRQ_4 Asynchronous interrupt, line 4 0 (unused)
+25 IRQ_5 Asynchronous interrupt, line 5 0 (unused)
+26 IRQ_6 Asynchronous interrupt, line 6 0 (unused)
+27 IRQ_7 Asynchronous interrupt, line 7 0 (unused)
+28 IRQ_8 Asynchronous interrupt, line 8 0 (unused)

The three ECALL flavors share a single handler entry point (EVEC). Negative codes denote involuntary faults; non-negative codes in [0, +10] denote deliberate synchronous traps; positive codes in [+20, +28] denote asynchronous interrupts (§10). Code 0 is preserved for EXC_ECALL_U so that a v0.5 binary (whose imm17 is always Z) enters the handler with ECAUSE = 0 exactly as before. See §8.4 for the flavor-tag encoding.

Async interrupts (IRQ_0..IRQ_8) reuse the exception entry path of §8.2 unchanged — they differ from synchronous exceptions only in trigger (external line) and in the saved EPC, which points at the next instruction the CPU would have executed rather than a faulting one. See §10 for dispatch rules.

A triple-fault condition (a synchronous fault raised while STATUS.depth = N, i.e. both exception frames already in use) does not produce an ECAUSE code — it is not observable by any handler. Instead it triggers an immediate machine-check reset (§8.2).

8.2 Exception entry sequence

The bank selected for the save is determined by STATUS.depth at the moment the fault is taken:

STATUS.depth on entry Save target New depth
Z (0 frames active) main bank (EPC, ESAVE, ECAUSE, ETVAL) P
P (1 frame active) frame 2 bank (EPC2, ESAVE2, ECAUSE2, ETVAL2) N
N (2 frames active) — (machine-check reset; see below)

Case 1 — outer entry (depth = Z):

  1. ESAVE ← STATUS (save current processor status, including depth = Z)
  2. EPC ← PC (address of the faulting instruction)
  3. ECAUSE ← code
  4. ETVAL ← trap value (per §8.1; 0 if unused)
  5. STATUS.mode ← N (switch to kernel mode)
  6. STATUS.ie ← N (disable interrupts)
  7. STATUS.depth ← P (one frame now active)
  8. PC ← EVEC

Case 2 — nested entry (depth = P):

  1. ESAVE2 ← STATUS (save current status, including depth = P)
  2. EPC2 ← PC
  3. ECAUSE2 ← code
  4. ETVAL2 ← trap value
  5. STATUS.depth ← N (two frames now active); mode and ie are unchanged (already N)
  6. PC ← EVEC

Steps 1–8 (case 1) or 1–6 (case 2) are performed atomically: no further exception may be taken between them. The main-bank CSRs are untouched by a case-2 entry, so the outer handler’s return context is preserved.

Case 3 — triple fault (depth = N). If a synchronous fault occurs while both frames are already in use, no save is possible. The processor performs a machine-check reset: all CSRs are cleared to zero and PC ← 0, as if from power-on. No handler is invoked; ECAUSE is not written. Correct kernel code reaches this state only under a genuine hardware defect or runaway condition; a two-level-deep fault chain from user code alone is handled cleanly by case 2.
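The three entry cases can be captured in a small state model. The representation is an assumption made for illustration: CSRs live in a dict, and STATUS is a dict of named trits rather than a packed word. Function names are hypothetical.

```python
N, Z, P = -1, 0, 1

def reset_state():
    """All CSRs cleared to zero and PC = 0, as after power-on."""
    cpu = {k: 0 for k in ("PC", "EVEC", "EPC", "ESAVE", "ECAUSE", "ETVAL",
                          "EPC2", "ESAVE2", "ECAUSE2", "ETVAL2")}
    cpu["STATUS"] = {"mode": Z, "ie": Z, "lx": Z, "depth": Z}
    return cpu

def take_exception(cpu, code, tval=0):
    st = cpu["STATUS"]
    if st["depth"] == Z:                       # case 1: outer entry
        cpu["ESAVE"] = dict(st)                # snapshot includes depth = Z
        cpu["EPC"], cpu["ECAUSE"], cpu["ETVAL"] = cpu["PC"], code, tval
        st["mode"], st["ie"], st["depth"] = N, N, P
    elif st["depth"] == P:                     # case 2: nested entry
        cpu["ESAVE2"] = dict(st)               # snapshot includes depth = P
        cpu["EPC2"], cpu["ECAUSE2"], cpu["ETVAL2"] = cpu["PC"], code, tval
        st["depth"] = N
    else:                                      # case 3: triple fault
        cpu.clear()
        cpu.update(reset_state())
        return "machine-check reset"
    cpu["PC"] = cpu["EVEC"]
    return "entered"
```

Because the main-bank fields are written only in the first branch, a nested entry leaves the outer frame's EPC, ESAVE, ECAUSE, and ETVAL intact.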

The handler reads EPC / EPC2, ECAUSE / ECAUSE2, and — when relevant — ETVAL / ETVAL2 via CSRR (selecting the bank by inspecting STATUS.depth on entry), processes the exception, then returns via IRET.

Rationale — why the outer handler is not re-entered on a nested fault. EVEC is shared between both frames, so the nested entry jumps to the same vector. The handler decides, by reading STATUS.depth first, whether it is handling an outer or a nested frame and uses the corresponding CSR bank. Duplicating EVEC would have given each frame its own vector at no functional gain, since the depth is already visible in STATUS.

8.3 Exception return (IRET)

IRET restores from the bank selected by the current STATUS.depth:

STATUS.depth on IRET Restore source Effect on depth
P (1 frame active) main bank (EPC, ESAVE) → Z (via STATUS ← ESAVE)
N (2 frames active) frame 2 bank (EPC2, ESAVE2) → P (via STATUS ← ESAVE2)
Z (no frame active) undefined (illegal IRET — reserved for future trap)

Case 1 — return from outer frame (depth = P):

  1. PC ← EPC
  2. STATUS ← ESAVE

Case 2 — return from nested frame (depth = N):

  1. PC ← EPC2
  2. STATUS ← ESAVE2

Both writes are performed atomically — no further exception can be taken between them. This prevents the STATUS/PC corruption that was possible in v0.1’s two-instruction sequence.

Because ESAVE / ESAVE2 hold the full prior STATUS (with the depth trit captured as Z / P respectively at entry time), the single STATUS ← ESAVE* write simultaneously restores mode, ie, lx, and decrements depth to its pre-entry value. No separate depth-decrement step is needed.
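A matching sketch of IRET, under the same modeling assumptions as any dict-based CPU model (CSRs in a dict, STATUS as a dict of named trits; names hypothetical). It shows that restoring STATUS from the saved copy is what lowers the depth.

```python
N, Z, P = -1, 0, 1

def iret(cpu):
    """Restore PC and STATUS from the bank selected by the current depth."""
    depth = cpu["STATUS"]["depth"]
    if depth == P:                               # return from outer frame
        cpu["PC"], cpu["STATUS"] = cpu["EPC"], dict(cpu["ESAVE"])
    elif depth == N:                             # return from nested frame
        cpu["PC"], cpu["STATUS"] = cpu["EPC2"], dict(cpu["ESAVE2"])
    else:
        # depth = Z: behaviour is left undefined by the text; the sketch refuses.
        raise RuntimeError("IRET with no active frame")
```

There is no explicit decrement anywhere: the depth trit stored in ESAVE or ESAVE2 at entry time is simply written back.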

IRET uses opcode −3 (R format). The rd, rs1, rs2, and funct fields are ignored and should be set to zero. Executing IRET while depth = Z is reserved; v0.6 leaves the behavior undefined, and a future revision may assign EXC_ILLEGAL.

8.4 Syscall dispatch convention

The ECALL instruction carries a flavor tag in imm17[0] (the least-significant trit of the I-format immediate). The decoder maps the tag to one of three ECAUSE codes; all three flavors share a single EVEC.

imm17[0] Flavor ECAUSE on entry Intended use
Z user syscall EXC_ECALL_U = 0 Ordinary user→kernel transition
P hypercall EXC_ECALL_H = +1 Guest kernel → hypervisor transition (when a hypervisor is present)
N debug trap EXC_ECALL_D = +2 Breakpoint / debugger synchronous trap

imm17[1..16] are reserved. The decoder ignores them — only imm17[0] carries the flavor tag, so non-zero upper trits are silently tolerated (this keeps the decoder branch-free). Assemblers emit Z for forward compatibility and encode the flavor via dedicated mnemonics (ECALL, HCALL, DBGBRK — see §11.4) rather than through explicit immediate operands.

Backward compatibility. A v0.5 binary encodes every ECALL with imm17 = Z, which maps to imm17[0] = Z → EXC_ECALL_U = 0. This is the exact cause code v0.5 produced, so an unmodified v0.6 handler that still dispatches only on ECAUSE = 0 continues to work.
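The flavor decode is a three-way lookup on a single trit. A sketch, assuming the immediate is modeled as a list of 17 trits with index 0 the least-significant trit; names are hypothetical.

```python
N, Z, P = -1, 0, 1

ECAUSE_BY_FLAVOR = {Z: 0, P: +1, N: +2}   # EXC_ECALL_U, EXC_ECALL_H, EXC_ECALL_D

def ecall_cause(imm17):
    """Map an ECALL immediate to its ECAUSE code; trits 1..16 are ignored."""
    return ECAUSE_BY_FLAVOR[imm17[0]]
```

An all-Z immediate, as emitted by older binaries, maps to cause 0.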

User-side register contract at the point of ECALL (all flavors):

Register Role
a7 (r17) Call number (syscall / hypercall / debug reason, per flavor)
a0–a6 (r10–r16) Arguments 1–7
a0 (r10) Return value, written by the handler, visible after resume

Kernel-side handler contract:

  1. On entry, ECAUSE ∈ {0, +1, +2} identifies the flavor; EPC points at the ECALL instruction itself.
  2. Dispatch on ECAUSE to the appropriate table (syscall / hypercall / debug). No re-fetch of EPC and no decoding of imm17 is required.
  3. Read a7 for the call number within the selected table; read a0–a6 for arguments; execute; write the result to a0.
  4. Advance EPC by one word so the resumed program continues after ECALL, then IRET:
   CSRR   t0, EPC
   ADDI   t0, t0, 1
   CSRW   EPC, t0
   IRET

For a nested flavor trap (e.g. a debug breakpoint triggered while a syscall handler is running), the same epilogue applies to EPC2 instead — the handler uses the frame-2 bank (see §8.3).
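The kernel-side contract for an outer-frame ECALL can be sketched as a two-level table lookup. Everything here is illustrative: the register file and CSRs are dicts, `tables` is a hypothetical mapping from ECAUSE to a per-flavor table of Python callables, and the nested (EPC2) variant is not modeled.

```python
def handle_ecall(cpu, regs, tables):
    """Dispatch an outer-frame ECALL following the handler contract above."""
    table = tables[cpu["ECAUSE"]]                  # step 2: pick the table by flavor
    handler = table[regs["a7"]]                    # step 3: call number is in a7
    args = [regs[f"a{i}"] for i in range(7)]       # arguments a0..a6
    regs["a0"] = handler(*args)                    # result goes back in a0
    cpu["EPC"] += 1                                # step 4: resume after the ECALL word

# Hypothetical usage: syscall number 1 adds its first two arguments.
example_tables = {0: {1: lambda a0, a1, *rest: a0 + a1}}
```

The handler never looks at the instruction word itself: ECAUSE and a7 carry everything it needs.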

Register preservation across ECALL is an OS-level policy, not an ISA mandate. The recommended baseline is: the kernel preserves all callee-saved registers (s0–s9, sp, ra) and the argument registers a1–a6 (regardless of how many the specific call actually reads); it overwrites a0 with the return value and is free to clobber a7 and the temporaries t0–t3.

Rationale — why a7 rather than imm17 for the call number. Placing the call number in a register lets it be computed at runtime (libc wrappers, indirect dispatch tables) and keeps ECALL a pure trap with no decoded payload beyond the flavor tag. The handler dispatches on a value it already has in a GPR, avoiding a re-fetch and re-decode of the instruction at EPC.

Rationale — why a separate flavor tag at all. Hypercalls and debug traps have different trust and privilege semantics from ordinary syscalls; routing them through distinct ECAUSE values lets the handler pick the right dispatch table in one step, without mixing them in a single numeric space shared with OS syscall numbers (which would force each OS to carve out reserved ranges for hypercalls/debug).

8.5 Exception trap value (ETVAL, ETVAL2)

ETVAL (CSR address 9, MST-first +00) and its frame-2 counterpart ETVAL2 (CSR address 13, MST-first +++) are T27 words written by the processor on every exception entry (§8.2). They carry exception-specific context that does not fit into the 27 values available in ECAUSE / ECAUSE2. The per-exception semantics is given by the ETVAL contents column of §8.1.

Exception ETVAL (or ETVAL2) contents
EXC_FAULT, EXC_ALIGN The effective address computed by the faulting LOAD / STORE (i.e. rs1 + imm17 of the offending access)
EXC_ILLEGAL The raw 27-trit instruction word the decoder rejected
EXC_ECALL_U, EXC_ECALL_H, EXC_ECALL_D 0 — the call number is passed in a7, see §8.4
EXC_DIV0, EXC_OVERFLOW 0 — no auxiliary value needed

ETVAL is written on a case-1 (outer) entry and ETVAL2 on a case-2 (nested) entry; a nested fault never overwrites the outer handler’s ETVAL. An outer handler can therefore safely read ETVAL once at the top and rely on that value remaining valid across any depth-1 nested fault it may subsequently incur; a nested handler reads ETVAL2 by the same rule.

Rationale — dedicated CSR instead of re-decoding EPC. Reconstructing the faulting address or the offending opcode from EPC requires an instruction fetch, which may itself fault (self-modifying code, paged-out text section). Exposing the value directly in a CSR decouples the handler from whatever state the instruction stream is in. Matches the RISC-V mtval design.


9. Memory Protection Unit (MPU)

9.1 Role

The Memory Protection Unit controls which memory addresses are accessible from each privilege mode and for which access type (read, write, instruction fetch). It converts the §2.3 address-space convention into hardware-enforceable boundaries: a user-mode LOAD or STORE targeting an address without user permission raises EXC_PERM_R or EXC_PERM_W, and an instruction fetch without execute permission raises EXC_PERM_X (§8.1).

The MPU performs no address translation — the address presented by the pipeline is the address delivered to memory. It decides, on each access, whether that access may proceed. This is the “PMP” style of RISC-V, not the “MMU” style.

9.2 Region model

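The MPU holds nine region descriptors, indexed 0..8. Each descriptor specifies a base address (accessed through MPU_BASE) together with a size exponent, three permission trits (perm_R, perm_W, perm_X), and a valid trit (all packed in MPU_CFG, §9.4).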

Regions are naturally aligned power-of-three (NAPO3) blocks. The descriptor bank is internal processor state, not memory-mapped; software manipulates it through three CSRs (§9.3).

Why no “whole address space” region. The address space is symmetric ([−(3²⁷−1)/2, +(3²⁷−1)/2]) while a NAPO3 region is a one-sided half-open interval [base, base + 3ⁿ); there is no single well-aligned (base, n) pair that covers both halves without wrap. To blanket the full range, either rely on the kernel-mode no-match default (§9.5 step 4), or program two top-size regions — one anchored at a negative base and one at a non-negative base.

9.3 Indirect CSR access

Three CSRs form the programming interface:

CSR Addr Role
MPU_SELECT −1 Index of the region targeted by subsequent MPU_BASE / MPU_CFG operations. Legal values: 0..8.
MPU_BASE −2 Reads or writes the base field of the selected region.
MPU_CFG −3 Reads or writes the config field (size + permissions + valid) of the selected region; see §9.4.

Programming region i is therefore:

   LI     t0, i
   CSRW   MPU_SELECT, t0
   CSRW   MPU_BASE, base_value
   CSRW   MPU_CFG, cfg_value

All three CSRs are kernel-only: a user-mode CSRR / CSRW targeting any of them raises EXC_ILLEGAL. Writing a value outside [0, 8] to MPU_SELECT, or a malformed MPU_CFG (reserved trits non-zero, size out of range), also raises EXC_ILLEGAL, with the instruction word in ETVAL.

Rationale — indirection rather than direct mapping. A direct mapping would require 18 CSR slots for 9 × (base, cfg), which does not fit in the 3-trit CSR address space. Indirection keeps the CSR cost constant and lets the region count grow in a future revision without touching the CSR map.

9.4 MPU_CFG layout

MPU_CFG packs the non-base fields into a single 27-trit word:

Trit(s) Field Values
t[0] perm_R N = no read, Z = kernel read only, P = user + kernel read
t[1] perm_W N = no write, Z = kernel write only, P = user + kernel write
t[2] perm_X N = no execute, Z = kernel execute only, P = user + kernel execute
t[3..6] size Size exponent n (4-trit T4 integer); region covers 3ⁿ words. Legal range: n ∈ [0, 26]. n = 0 → one word; n = 26 → 3²⁶ consecutive words (one third of the 3²⁷-word address space). Values n < 0 or n ≥ 27 written via CSRW MPU_CFG raise EXC_ILLEGAL.
t[7] valid P = active, N = inactive, Z = reserved (descriptor treated as inactive)
t[8..26] Reserved (must be Z on write; read as 0)
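Packing and validating an MPU_CFG word can be sketched as below. The word is modeled as a list of 27 trits with index i holding t[i]. The sketch assumes that t[3] is the least-significant trit of the size field, which the table does not state explicitly, and it signals the EXC_ILLEGAL cases with a Python exception. All names are hypothetical.

```python
N, Z, P = -1, 0, 1

def int_to_trits(value, width):
    """Balanced ternary encoding, least-significant trit first."""
    trits = []
    for _ in range(width):
        r = (value + 1) % 3 - 1
        trits.append(r)
        value = (value - r) // 3
    return trits

def pack_cfg(perm_r, perm_w, perm_x, size, valid):
    if not 0 <= size <= 26:
        raise ValueError("EXC_ILLEGAL: size out of range")
    return [perm_r, perm_w, perm_x] + int_to_trits(size, 4) + [valid] + [Z] * 19

def unpack_cfg(word):
    if any(t != Z for t in word[8:]):
        raise ValueError("EXC_ILLEGAL: reserved trits non-zero")
    size = sum(t * 3**i for i, t in enumerate(word[3:7]))
    if not 0 <= size <= 26:
        raise ValueError("EXC_ILLEGAL: size out of range")
    return {"perm_R": word[0], "perm_W": word[1], "perm_X": word[2],
            "size": size, "valid": word[7]}
```

A 4-trit field spans −40..+40, so the range check is needed even though the field is only four trits wide.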

9.5 Matching and permission check

On every memory access — LOAD, STORE, or instruction fetch — with effective address A in current mode M, the MPU performs:

  1. Scan the 9 regions. Region i matches if valid(i) = P and A ∈ [base(i), base(i) + 3^size(i)).

  2. Select. If multiple regions match, the one with the lowest index wins (region 0 has highest priority).

  3. Evaluate the permission trit for the access type (perm_R for LOAD, perm_W for STORE, perm_X for fetch) against mode M:

Permission trit Kernel (mode = N) User (mode = P)
N (none) Deny — fault Deny — fault
Z (kernel only) Allow Deny — fault
P (user + kernel) Allow Allow
  4. No-match default:
     - Kernel mode: allow (default-permit).
     - User mode: deny (default-deny), raising EXC_PERM_*.

  5. On deny, the access is suppressed and one of EXC_PERM_R, EXC_PERM_W, EXC_PERM_X is raised via the §8.2 entry sequence. ETVAL holds the faulting effective address (or the faulting PC, identical to EPC, for EXC_PERM_X).
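The scan, select, evaluate, and no-match steps fit in one short function. In this sketch a region is a dict with the hypothetical keys `base`, `size`, `valid`, `perm_R`, `perm_W`, `perm_X`, the list order is the region index, and the function returns True when the access may proceed.

```python
N, Z, P = -1, 0, 1

def mpu_check(regions, addr, access, mode):
    """access is "R", "W" or "X"; mode is N (kernel) or P (user)."""
    for region in regions:                        # lowest index first = highest priority
        lo = region["base"]
        hi = lo + 3 ** region["size"]             # half-open interval [base, base + 3^n)
        if region["valid"] == P and lo <= addr < hi:
            perm = region["perm_" + access]
            if perm == P:
                return True                       # user + kernel
            if perm == Z:
                return mode == N                  # kernel only
            return False                          # N: nobody, not even the kernel
    return mode == N                              # no match: kernel permit, user deny
```

Returning on the first match is what makes a low-index poison region override any permissive region behind it.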

Rationale — asymmetric defaults. In kernel mode the MPU acts as a blacklist (poison regions fault even for kernel); in user mode it acts as a whitelist (user can only reach addresses explicitly granted). This mirrors RISC-V PMP semantics and fits the v0.6 model where the kernel is trusted by default and user code must be explicitly admitted.

9.6 Poison regions and defensive use

A region at low index (0 or 1) with perm_R = perm_W = perm_X = N and valid = P is a poison region: any access — even kernel — faults. Typical uses:

Because lowest index wins (§9.5 step 2), a poison region at index 0 overrides any permissive region at index 1+, even for the kernel.

9.7 NAPO3 alignment

Natural alignment requires base(i) mod 3^size(i) = 0. The hardware does not check alignment at CSRW MPU_BASE time. A misaligned base is legal syntactically; the containment test in §9.5 step 1 uses the literal integer interval [base, base + 3^n) and may match addresses the author did not intend. Alignment is software responsibility — typically an assembler or linker computes region bases as multiples of 3ⁿ for the chosen n, and a misaligned base is a programming error, not a legitimate runtime state.

9.8 Reset state

At reset, every region descriptor has valid = N. The MPU is effectively disabled: no regions match, kernel-mode accesses proceed under the no-match default of §9.5 step 4, and the boot firmware (which runs in kernel mode per §2.4) executes unimpeded. The boot firmware is expected to program whichever regions the system needs before switching to user mode.

Typical boot sequence:

  1. Kernel programs one or more regions granting user-mode perm_X = P over the user text segment, perm_R = P / perm_W = P over user data and stack.
  2. Kernel programs defensive regions (poison windows, kernel-only overlays) if desired.
  3. Kernel switches STATUS.mode to P via an IRET that restores a user-mode ESAVE.

9.9 Interactions


10. Asynchronous interrupts

10.1 Role

Asynchronous interrupts deliver external events (timer tick, peripheral completion, incoming byte) to the CPU independently of the instruction stream. Unlike a synchronous exception (§8), an interrupt is not caused by the instruction in flight; it arrives between instructions, driven by a signal outside the pipeline.

Interrupts are Setnex’s only source of preemption: without them, a user program that does not voluntarily ECALL keeps the CPU forever. A timer IRQ lets the kernel reclaim the CPU at a bounded cadence — the foundation of preemptive scheduling.

10.2 Controller model

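The controller presents nine IRQ lines, indexed k ∈ {0..8}. For each line it maintains three software-visible state elements: a pending trit (IPENDING[k]), an enable trit (IENABLE[k]), and a 3-trit priority field (in IPRIORITY); see §10.3.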

A line is eligible when pending and enabled. The controller selects the eligible line with the highest priority; ties are broken by lowest line number.

10.3 CSR layouts

IPENDING (addr −4) — pending bitvector

Trit Meaning
t[k] for k ∈ [0, 8] Line k: P = pending, Z = idle, N = reserved
t[9..26] Reserved (read as 0)

Writing Z to a trit clears a pending state that was set by software or by a now-deasserted external source; if the external source is still asserting, the controller re-raises the trit on the next cycle. Writing P to a trit whose external source is not asserting synthesizes a software-generated interrupt (implementation-defined latency). Writing N has no effect — the trit retains its previous value. The N value is reserved for a future semantic (e.g. sticky / edge-latched pending) and kept unassigned to preserve forward compatibility.
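The write rules for one IPENDING trit can be summarized as a pure function. The sketch collapses the "next cycle" re-raise into the same step and ignores the implementation-defined latency of software-generated interrupts; the function name is hypothetical.

```python
N, Z, P = -1, 0, 1

def write_ipending_trit(old, written, external_asserting):
    """Return the trit value observed after a CSR write has settled."""
    new = old if written == N else written   # N leaves the trit unchanged
    if external_asserting:
        new = P                              # a live source re-raises the trit
    return new
```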

IENABLE (addr −5) — enable mask

Same layout as IPENDING: t[k] = P enables line k, Z disables it. Writing N has no effect — the trit retains its previous value (reserved, as for IPENDING).

IPRIORITY (addr −6) — per-line priority

Nine 3-trit fields pack exactly into 27 trits:

Trit field Line Priority
t[0..2] 0 T3 integer ∈ [−13, +13]; higher = higher priority
t[3..5] 1
t[6..8] 2
t[9..11] 3
t[12..14] 4
t[15..17] 5
t[18..20] 6
t[21..23] 7
t[24..26] 8

At reset all priority fields are 0; arbitration then reduces to lowest-line-number wins.
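Unpacking the nine priority fields is a matter of slicing the word three trits at a time. The sketch models IPRIORITY as a list of 27 trits indexed by trit position and assumes the lowest-numbered trit of each field is its least-significant trit, which the layout table does not spell out.

```python
def line_priorities(ipriority):
    """Return the nine per-line priorities, each an integer in [-13, +13]."""
    return [
        sum(t * 3**i for i, t in enumerate(ipriority[3 * k:3 * k + 3]))
        for k in range(9)
    ]
```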

10.4 Dispatch rule

Between instructions, the CPU evaluates:

if STATUS.ie = P
   and STATUS.depth ≠ N                 (no save slot available → no async entry)
   and ∃ k : IPENDING[k] = P AND IENABLE[k] = P:
      k* ← argmax over eligible k of IPRIORITY[k], lowest k on ties
      take interrupt IRQ_k*              (see §10.5)

The check happens at each instruction-commit boundary. An interrupt cannot preempt a single instruction mid-way; it waits for the next commit point.
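The dispatch rule can be sketched in Python as follows. The function name `select_irq` and the list-based representation of the CSRs are assumptions made for illustration; trits are the integers −1/0/+1 and `priority[k]` is the already-decoded T3 value of line k.

```python
N, Z, P = -1, 0, 1


def select_irq(ie, depth, pending, enable, priority):
    """Return the winning line k*, or None when no interrupt is taken.

    ie, depth -- STATUS.ie and STATUS.depth trits
    pending   -- IPENDING[0..8] trits
    enable    -- IENABLE[0..8] trits
    priority  -- decoded IPRIORITY values, each in [-13, +13]
    """
    if ie != P or depth == N:
        return None                       # globally masked, or no save slot
    eligible = [k for k in range(9) if pending[k] == P and enable[k] == P]
    if not eligible:
        return None
    # Highest priority wins; ties go to the lowest line number.
    return min(eligible, key=lambda k: (-priority[k], k))
```

With all priorities at their reset value of 0, the sort key degenerates to the line number, which reproduces the lowest-line-number-wins behaviour stated in §10.3.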

10.5 Entry, handler, return

Entry reuses §8.2 verbatim. For a case-1 entry (user code running, depth = Z, line k* wins arbitration):

  1. ESAVE ← STATUS
  2. EPC ← PC_next — address of the next instruction that would have executed (not a faulting one)
  3. ECAUSE ← +20 + k*
  4. ETVAL ← 0
  5. STATUS.mode ← N, STATUS.ie ← N, STATUS.depth ← P
  6. PC ← EVEC

A case-2 (nested) entry uses the frame-2 bank (EPC2, ESAVE2, ECAUSE2, ETVAL2) as in §8.2.
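The six steps of a case-1 entry can be mirrored in a small Python model. This is a sketch under stated assumptions: the CPU state is a plain dictionary, STATUS is reduced to its mode / ie / depth fields, and the function name `enter_irq_case1` is hypothetical. The pre-entry mode value used by a caller is a placeholder, since the encoding of user mode is defined elsewhere in the specification.

```python
N, Z, P = -1, 0, 1


def enter_irq_case1(cpu, k, pc_next):
    """Apply the case-1 entry sequence for IRQ line k to the cpu dict."""
    cpu["ESAVE"] = dict(cpu["STATUS"])            # 1. ESAVE <- STATUS
    cpu["EPC"] = pc_next                          # 2. EPC <- PC_next
    cpu["ECAUSE"] = 20 + k                        # 3. ECAUSE <- +20 + k*
    cpu["ETVAL"] = 0                              # 4. ETVAL <- 0
    cpu["STATUS"].update(mode=N, ie=N, depth=P)   # 5. mode, ie, depth
    cpu["PC"] = cpu["EVEC"]                       # 6. PC <- EVEC


def irq_line_from_cause(ecause):
    """Recover the line number the way the handler's ADDI t1, t0, -20 does."""
    return ecause - 20
```

The snapshot in step 1 is taken before step 5 modifies STATUS, so IRET can restore the pre-entry ie value.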

Handler boilerplate. The handler must select the correct exception bank based on STATUS.depth before reading ECAUSE / ECAUSE2:

   LI     t2, 3                    ; trit index for depth
   CSRR   t0, STATUS
   TGET   t1, t0, t2               ; LST(t1) = depth trit (P = outer, N = nested)
   BRT3   t1, impossible, nested   ; P → fall-through (outer frame),
                                   ; Z → impossible (no frame active),
                                   ; N → nested (frame 2)
   ; --- outer frame (depth = P): cause/data in main bank ---
   CSRR   t0, ECAUSE
   JMP    dispatch
nested:
   ; --- nested frame (depth = N): cause/data in frame-2 bank ---
   CSRR   t0, ECAUSE2
dispatch:
   ADDI   t1, t0, -20              ; t1 = line number if this is an IRQ
   ; ... dispatch on t1 to per-line service routine ...
   ; service the peripheral, quiesce the line
   IRET
impossible:
   ; depth = Z inside a handler is a spec violation — halt for diagnosis
   HALT

IRET restores STATUS (which brings ie back to its pre-entry value, typically P) and jumps to the bank-appropriate EPC (see §8.3) — the instruction that was pending when the interrupt fired.

The same depth-select prologue applies to every exception handler, not only IRQ: a synchronous-fault handler that reads ETVAL / ECAUSE must branch through the same STATUS.depth test before selecting the main or frame-2 bank. Real kernels typically factor it into a single shared trampoline.

10.6 Acknowledgment

The handler must quiesce the source before IRET, typically by accessing a status or completion register on the peripheral. If the external line remains asserted on return, the controller re-raises IPENDING[k] and the interrupt re-fires immediately after ie goes back to P. Software may also clear IPENDING[k] directly by CSRW IPENDING with Z at position k, but a still-asserting external source will re-set the trit on the next cycle.

10.7 Masking

Interrupt delivery is gated by three independent mechanisms, all of which must allow it:

  1. Global enable: STATUS.ie = P. Any other value blocks all IRQs. Set to N automatically on any exception entry (§8.2).
  2. Per-line enable: IENABLE[k] = P.
  3. Frame-depth guard: STATUS.depth ≠ N. When both frames are in use, no save slot is available and async entry is suppressed until IRET frees a frame. This is automatic and cannot be overridden.

A synchronous exception entry sets ie ← N per §8.2, so an IRQ cannot preempt a fresh sync handler. A handler that wants low-latency nesting (IRQs accepted during a long syscall) must re-enable ie explicitly after saving whatever state it cares about; a subsequent IRQ would then push to frame 2.

10.8 Priority arbitration

Priorities are software-controlled 3-trit fields per line ([−13, +13]). The controller picks the eligible line with the maximum priority value; ties go to the lowest line number. Negative priorities are legal — they rank below default-zero lines without fully disabling them.

Rationale — software-controlled priority rather than line-hardwired. Hardwiring priority to the line number (line 0 always wins) is simpler but gives no knob to re-rank the timer below an urgent disk IRQ. A per-line 3-trit field costs exactly one CSR for all 9 lines and subsumes the hardwired case (all priorities = 0 → line number arbitrates).


11. Calling convention (ABI)

11.1 Argument passing

11.2 Callee-saved registers

s0–s9 (r8–r9, r18–r25), sp (r2), ra (r1).

Caller-saved (may be clobbered across a call): t0–t3 (r5–r7, r26), a0–a7 (r10–r17).

11.3 Stack

The stack grows toward negative addresses. sp points to the top of the stack (last valid word). sp must always be word-aligned — since the address space is word-addressed (§2.3), this is trivially satisfied and imposes no additional constraint on the compiler.

Frame layout. A function that needs to save ra and the caller’s s0 (the standard non-leaf case) allocates N ≥ 2 words and lays out the frame as follows (high addresses at the top):

 high addr   ┌─────────────┐ ← (caller's sp = this frame's s0)
 sp+(N−1)   │  saved ra    │
 sp+(N−2)   │  saved s0    │
 sp+(N−3)   │  local[0]    │
   …        │      …       │
 sp+0       │  local[N−3]  │ ← sp
 low addr   └─────────────┘

Standard prologue — allocates the frame first so that at no point does sp transiently reference an unallocated region; then saves ra and the old s0 into the reserved slots; finally installs the new frame pointer:

  ADDI   sp, sp, -N        # allocate frame (N = locals + 2)
  STORE  ra, sp, N-1       # save return address at top of frame
  STORE  s0, sp, N-2       # save caller's frame pointer
  ADDI   s0, sp, N         # new s0 = caller's sp

Standard epilogue — mirror of the prologue:

  LOAD   ra, sp, N-1       # restore return address
  LOAD   s0, sp, N-2       # restore caller's frame pointer
  ADDI   sp, sp, N         # deallocate frame
  RET

A leaf function that makes no further calls and does not use s0 may skip saving ra and s0 entirely, reducing the prologue to a single ADDI sp, sp, -N (or omitting it if no locals are spilled).
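The prologue and epilogue can be checked against the frame layout with a short Python simulation. This is only a sketch: memory is a dictionary keyed by word address, the register values are arbitrary, and the function name `call_frame_demo` is hypothetical.

```python
def call_frame_demo(n_locals):
    """Run the standard prologue and epilogue on a toy register/memory model.

    Returns (caller_regs, regs_inside_body, regs_after_epilogue).
    """
    frame = n_locals + 2                         # N = locals + 2
    mem = {}
    regs = {"sp": 0, "s0": -1000, "ra": 1234}    # arbitrary caller state
    caller = dict(regs)

    # Prologue
    regs["sp"] -= frame                          # ADDI  sp, sp, -N
    mem[regs["sp"] + frame - 1] = regs["ra"]     # STORE ra, sp, N-1
    mem[regs["sp"] + frame - 2] = regs["s0"]     # STORE s0, sp, N-2
    regs["s0"] = regs["sp"] + frame              # ADDI  s0, sp, N
    in_body = dict(regs)

    # Body: a nested CALL would overwrite ra
    regs["ra"] = 9999

    # Epilogue
    regs["ra"] = mem[regs["sp"] + frame - 1]     # LOAD  ra, sp, N-1
    regs["s0"] = mem[regs["sp"] + frame - 2]     # LOAD  s0, sp, N-2
    regs["sp"] += frame                          # ADDI  sp, sp, N
    return caller, in_body, regs
```

Inside the body s0 equals the caller's sp, as the frame diagram shows, and after the epilogue all three registers are back to their caller values.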

11.4 Pseudo-instructions (assembler)

Pseudo | Expansion | Notes
RET | JMPA ra, 0 | Return from subroutine
MOV rd, rs | ADD rd, rs, zero | Register copy
NEG rd, rs | native NEG rd, rs (R format, rs2/funct ignored) | Balanced ternary negation
CALL label | CALL offset23(label) | Compute offset from PC
LI rd, imm | LI if fits imm17; else LUI + ADDI | Load arbitrary immediate
TSET rd, rs1, rs2 | TSETZ rd, rs1, rs2 | Alias: clear trit to Z
TNIMPL rd, a, b | TNOT t0, b then TAND rd, a, t0 | Non-implication: a AND NOT b
TREIMPL rd, a, b | TIMPL rd, b, a | Reverse implication: b ⇒ a
NOT rd, rs | TNOT rd, rs | Mnemonic alias for clarity
BFLT label | BF P00, label | Branch if FLAGS.sign = N (less than)
BFEQ label | BF 0P0, label | Branch if FLAGS.sign = Z (equal)
BFGT label | BF 00P, label | Branch if FLAGS.sign = P (greater than)
BFLE label | BF PP0, label | Branch if FLAGS.sign ≤ Z (less or equal)
BFGE label | BF 0PP, label | Branch if FLAGS.sign ≥ Z (greater or equal)
BFNE label | BF P0P, label | Branch if FLAGS.sign ≠ Z (not equal)
ECALL | ECALL imm17=0 (i.e. imm17[0] = Z) | User syscall flavor (EXC_ECALL_U); v0.5-compatible default encoding
HCALL | ECALL imm17[0] = P, imm17[1..16] = Z | Hypercall flavor (EXC_ECALL_H); new in v0.6
DBGBRK | ECALL imm17[0] = N, imm17[1..16] = Z | Debug-trap flavor (EXC_ECALL_D); new in v0.6
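The six BF aliases in the table follow one pattern, which the Python sketch below makes explicit. It rests on an assumption inferred from the table rather than stated in it: the three mask positions correspond, in order, to FLAGS.sign = N, Z, and P, and the branch is taken when the position matching the current sign holds P. The names `bf_taken` and `BF_ALIASES` are hypothetical.

```python
# Mask strings copied from the pseudo-instruction table.
BF_ALIASES = {
    "BFLT": "P00",
    "BFEQ": "0P0",
    "BFGT": "00P",
    "BFLE": "PP0",
    "BFGE": "0PP",
    "BFNE": "P0P",
}


def bf_taken(mask, sign):
    """Return True if BF with this mask branches for the given FLAGS.sign.

    mask -- 3-character string over 'P' and '0'
    sign -- 'N', 'Z' or 'P'
    """
    position = {"N": 0, "Z": 1, "P": 2}[sign]   # assumed positional reading
    return mask[position] == "P"
```

Under that reading, BFLE is simply the union of the BFLT and BFEQ masks, and BFNE is the complement of BFEQ.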

11.5 Syscall calling convention

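Distinct from the standard function call convention of §11.1. When user code invokes a kernel service via ECALL, the syscall number is passed in a7 (r17), the arguments in a0..a6, and the result is returned in a0, as the wrappers below illustrate.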

See §8.4 for the handler-side contract and the recommended register-preservation policy.

A libc-style wrapper setting the syscall number from a symbolic constant:

; write(fd, buf, len) — fd in a0, buf in a1, len in a2 on entry
write:
    LI     a7, SYS_WRITE      ; syscall number → r17
    ECALL
    RET                        ; a0 now holds the syscall result

A pass-through wrapper (the number is already in a7):

; long syscall(long number, long arg0, ..., long arg6)
; number already in a7, args already in a0..a6
syscall:
    ECALL
    RET

12. Reference encoding (textual representation)

Trits are written using the -/0/+ convention.

In memory layout (LST-first): t[0] is stored and written first. Instruction encoding diagrams use this convention.

In human-readable display (MST-first): the most significant trit is written leftmost, as in ordinary number notation. Register addresses and integer literals use this convention.

Each convention is explicitly labeled.

Example — ADD r3, r1, r2 (R format)

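Field values (each field encoded LST-first):

- opcode ADD = −40 = enc(−40, 4): −40 ÷ 3 → q = −13, r = −1 → t[0] = N; −13 ÷ 3 → q = −4, r = −1 → t[1] = N; −4 ÷ 3 → q = −1, r = −1 → t[2] = N; −1 ÷ 3 → q = 0, r = −1 → t[3] = N. Result: ---- (LST-first). Verify: −1 − 3 − 9 − 27 = −40 ✓
- rd = r3 → 0+0 (0·1 + 1·3 + 0·9 = 3)
- rs1 = r1 → +00 (1·1 = 1)
- rs2 = r2 → -+0 (−1·1 + 1·3 = 2)
- t[13]–t[26] → all 0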

LST-first memory layout:

t[0-3]  t[4-6]  t[7-9]  t[10-12]  t[13-26]
----    0+0     +00     -+0       00000000000000

Full 27-trit word (LST-first): ----0+0+00-+000000000000000

The opcode sits at t[0]–t[3] — the decoder starts working as soon as the first trits arrive.
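The division-with-remainder procedure used for the opcode generalizes to every field. The Python sketch below reproduces the example; the function name `enc` follows the notation above, while `encode_r` and its field order (opcode, rd, rs1, rs2, then 14 remaining trits) are read off the layout diagram rather than taken from a normative definition. Register numbers are encoded as plain integers, which is only shown here for r1..r3.

```python
def enc(value, width):
    """Balanced-ternary encoding of value, LST-first, over '-', '0', '+'."""
    symbols = {-1: "-", 0: "0", 1: "+"}
    out = []
    for _ in range(width):
        r = (value + 1) % 3 - 1        # remainder in {-1, 0, +1}
        out.append(symbols[r])
        value = (value - r) // 3       # exact quotient for the next trit
    if value != 0:
        raise ValueError("value does not fit in the requested width")
    return "".join(out)


def encode_r(opcode, rd, rs1, rs2, rest=0):
    """Assemble a 27-trit R-format word, LST-first (field order assumed)."""
    return enc(opcode, 4) + enc(rd, 3) + enc(rs1, 3) + enc(rs2, 3) + enc(rest, 14)


word = encode_r(-40, 3, 1, 2)          # ADD r3, r1, r2
print(word)
```

The printed word starts with the four opcode trits, so a decoder reading LST-first sees the opcode before any operand field.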


13. Architectural summary

Parameter Value
Word width 27 trits
General-purpose registers 27 (r0–r26), T3 address
Vector registers (new in v0.8) 27 (v0–v26), T3 address, 27 trits each — bank selected by opcode
CSR registers 27 addressable (T3), 19 defined
Instruction width 27 trits (fixed)
Instruction formats 5 (R, I, J, U, B)
Opcode 4 trits (81 values, 60 used in v0.8)
Address space ±(3²⁷−1)/2 words ≈ ±3.8 × 10¹²
Max immediate (I format) 17 trits ≈ ±64 million
Max branch offset (J format) 20 trits ≈ ±1.7 billion
Max jump offset (U format) 23 trits ≈ ±4.7 × 10¹⁰
Logic modes 5: Kleene (default), Łukasiewicz, Heyting, RM3, B3 (Bochvar)
Floating-point T26F: 26-trit Tekum in 27-trit register (t[26] = Z)
Arithmetic flags 2 trits: sign, carry
Division convention Symmetric Euclidean
Memory unit 1 word = 27 trits
Endianness Least significant trit first (little-endian)
Exception frames 2 (main bank + frame 2); triple-fault = machine-check reset
MPU regions 9, indirect CSR access, NAPO3 sizing, per-axis ternary permissions
Interrupt lines 9 async IRQs, per-line enable / priority, shared EVEC with sync exceptions
Vector lanes (new in v0.8) 27 × 1-trit lanes per word (trit-parallel); 8 vector opcodes (+15..+22)
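The magnitude figures in the table all follow from the balanced-ternary range formula: an n-trit field covers the integers from −(3ⁿ − 1)/2 to +(3ⁿ − 1)/2. A short Python check, with a hypothetical helper name:

```python
def max_magnitude(trits):
    """Largest absolute value representable in a balanced-ternary field."""
    return (3**trits - 1) // 2


FIELDS = [
    ("register / CSR address (T3)", 3),
    ("opcode", 4),
    ("I-format immediate", 17),
    ("J-format branch offset", 20),
    ("U-format jump offset", 23),
    ("word / address space", 27),
]

for name, width in FIELDS:
    print(f"{name:30s} {width:2d} trits  +/-{max_magnitude(width):,}")
```

A 4-trit opcode spans −40..+40, which accounts for the 81 opcode values, and a 3-trit field spans −13..+13, that is 27 values.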

14. Complete opcode map

Quick-reference table, sorted by opcode value.

Opcode Mnemonic Format Group
−40 ADD / ADDS / ADC R ALU
−39 SUB / SUBS / SBC R ALU
−38 MUL / MULH R ALU
−37 DIV R ALU
−36 MOD R ALU
−35 NEG R ALU
−34 TAND R ALU / Logic
−33 TOR R ALU / Logic
−32 TNOT R ALU / Logic
−31 TIMPL R ALU / Logic
−30 CONS R ALU / Trit
−29 ACONS R ALU / Trit
−28 TSHIFT R ALU / Trit
−27 TCMP R ALU / Trit
−26 LOAD I Memory
−25 STORE I Memory
−24 LI I Memory
−23 LUI I Memory
−22 ADDI I Memory
−21 BRT3 B Branch
−20..−19 reserved
−18 CMPI I Memory
−17 BEQ J Branch
−16 BNE J Branch
−15 BLT J Branch
−14 BGT J Branch
−13 BLE J Branch
−12 BGE J Branch
−11 JMPA J Branch
−10 BF J Branch
−9 JMP U Jump
−8 CALL U Jump
−7 CSRR I System
−6 CSRW I System
−5 CSRX I System
−4 ECALL I System — imm17[0] = flavor tag (Z/P/N → user/hyper/debug); imm17[1..16] reserved, ignored by decoder
−3 IRET R System
−2 TSEL R Special
−1 NOP Special
0 HALT Special
+1 TGET R Trit ops
+2 TSET(N/Z/P) R Trit ops
+3 TSIGN R Trit ops
+4 CMP R Trit ops
+5 TABS R Trit ops
+6 TMIN R Trit ops
+7 TMAX R Trit ops
+8 FADD R TFP
+9 FSUB R TFP
+10 FMUL R TFP
+11 FDIV R TFP
+12 FCMP R TFP
+13 FCVT R TFP
+14 reserved (TFP)
+15 VADD / VSUB R Vector
+16 VMUL R Vector
+17 VLOG R Vector — funct[0..2] selects TAND/TOR/TNOT/TIMPL/CONS/ACONS
+18 VSEL R Vector — 3-way ternary select with mask in funct[0..2]
+19 VCMP R Vector — lane-wise sign-of-difference, produces ternary mask
+20 VRED R Vector — reduction to GPR; funct[0..2] selects SUM/SIGN/CONS/LST/MST/AND/OR
+21 VPERM R Vector — funct[0..2] selects rotate/shift/reverse/shuffle
+22 VMOVE R Vector — inter-bank movement; funct[0..2] selects mode (§16.5)
+23..+24 reserved (Vector v0.9)
+25..+40 reserved (Custom)

15. Extension roadmap

v0.9 — Vector extension follow-up

Extension Opcode range Rationale
VFMA (fused multiply-accumulate) +23 Closes the gap with TFP and ML kernels: vd ← vd + vs1 × vs2 lane-wise in one instruction. Critical for dot-product / convolution loops.
VGATHER / VSCATTER +24 Indirect lane addressing via an index v-reg. Enables sparse-data vector code. May share a single opcode via funct direction trit.
Tryte / trybble lane modes funct of existing vector opcodes Optional 9 × 3-trit and 3 × 9-trit lane partitions, selected per-instruction via funct trits. Reuses the existing v-reg bank and ALU; only the carry-cut points differ. Investigated only if a target workload (DSP, codecs) demonstrates clear gain over the trit-parallel default.
VLEN CSR (vector length) new CSR slot Optional: a runtime-configurable active-lane count for dynamic vector length, as in RVV. Deferred until a use case appears — current trit-parallel model already covers fixed-width 27-lane work.
Masked arithmetic funct trit on VADD / VMUL / VLOG Per-lane predication using a mask v-reg, separate from VSEL. Adds a one-trit funct selector and one extra register field via funct[3..5]. Defer pending design review.

v1.0 — Stabilization

ISA freeze, backed by a complete reference implementation (Python simulator + assembler), a regression test suite covering every spec section, and a worked-example application (e.g. tritlib-driven kernel + user program demonstrating syscall, MPU, IRQ, and vector loop). No new opcodes between v0.9 and v1.0.

Hardware target — post-1.0

Once the ISA reaches 1.0, the following implementation milestones are pursued outside of the versioned specification:

Milestone Notes
FPGA soft core VHDL or Verilog description of Setnex; each trit encoded as 2 bits on FPGA fabric, ternary signals on external bus (same approach as 5500FP). Target: iCE40 (open toolchain via nextpnr/yosys) or Xilinx/Intel.
Assembler tooling Setnex assembler in Python, building on tritlib.
ASIC exploration Contingent on CNT or memristor ternary gate availability; long-term.

16. Vector extension

16.1 Role and datapath model

The vector extension turns the existing 27-trit datapath into a trit-parallel SIMD unit. Each vector instruction operates simultaneously on the 27 trits of a vector register, treating the register as a vector of 27 lanes × 1 trit. One VADD performs 27 independent trit additions in the cycles a single scalar ADD would have used to compute the equivalent 27-trit sum with carry propagation — the saving comes not from the arithmetic itself but from amortizing fetch, decode, register-file access, and loop control over 27 elements.

The model is deliberately minimal:

16.2 Vector register bank

27 vector registers v0..v26, each a full 27-trit word. The bank is disjoint from the GPR bank: vector instructions reference v-regs through the same 3-trit register field they would use for GPRs in scalar instructions, with the bank determined entirely by the opcode (see §2.1.1 dispatch table).

v0 is writable by convention but software typically maintains it at zero to serve as a vector-zero source (analogous to r0 but not hardware-enforced).

The bank carries no implicit register conventions (no caller/callee-saved split is mandated at the ISA level); the v-reg ABI is left to the platform — see §16.6 for the recommended save/restore rules.

16.3 Lane arithmetic semantics

For arithmetic opcodes (VADD, VSUB, VMUL), the result is computed independently per lane and clamped to {N, Z, P}:

Operation Lane result
VADD vd, vs1, vs2 vd[i] ← clamp(vs1[i] + vs2[i]) for i = 0..26
VSUB vd, vs1, vs2 vd[i] ← clamp(vs1[i] − vs2[i])
VMUL vd, vs1, vs2 vd[i] ← vs1[i] × vs2[i] (closed in {N,Z,P}; no clamp needed)

where clamp(x) maps x ∈ [−2, +2] to {N, Z, P} as: clamp(−2) = N, clamp(−1) = N, clamp(0) = Z, clamp(+1) = P, clamp(+2) = P.

No FLAGS update. Vector arithmetic instructions do not modify the FLAGS register: the saturation policy avoids overflow signalling, and per-lane status flags would require widening FLAGS or introducing a vector flags register, both rejected for v0.8. Software needing per-lane sign tests should use VCMP vd, vs1, v0 (with v0 held at zero by the §16.2 convention) to materialize a sign mask.

Multiplication is closed. The product of any two trits in {N, Z, P} is in {N, Z, P} (see §5.3 product table); VMUL therefore needs no saturation logic at all — it is the cleanest of the three.

No vector divide. Trit-by-trit division has no useful semantics on 1-trit lanes (P / P = P, Z / Z = NaR, etc.); the operation is omitted. Division is a scalar concern in this revision.
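The lane semantics of this section fit in a few lines of Python. The sketch represents a v-reg as a list of integers in {−1, 0, +1}; the function names mirror the mnemonics but are otherwise illustrative, and the functions accept any lane count so that short examples stay readable.

```python
def clamp(x):
    """Saturate a lane sum or difference in [-2, +2] to {-1, 0, +1}."""
    return max(-1, min(1, x))


def vadd(vs1, vs2):
    return [clamp(a + b) for a, b in zip(vs1, vs2)]


def vsub(vs1, vs2):
    return [clamp(a - b) for a, b in zip(vs1, vs2)]


def vmul(vs1, vs2):
    # The product of two trits is always a trit, so no clamp is needed.
    return [a * b for a, b in zip(vs1, vs2)]
```

Only the P + P, N + N, P − N, and N − P lane cases actually saturate; every other combination is already inside {N, Z, P}.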

16.4 funct sub-mode encodings

Several vector opcodes use the low 3 trits of the funct field as a sub-mode selector. The encodings are:

VLOG (opcode +17)

funct[0..2] (LST-first) Mnemonic Operation per lane LMODE-following
Z Z Z VAND vd[i] ← TAND(vs1[i], vs2[i]) yes
P Z Z VOR vd[i] ← TOR(vs1[i], vs2[i]) yes
Z P Z VIMPL vd[i] ← TIMPL(vs1[i], vs2[i]) yes
P P Z VNOT vd[i] ← TNOT(vs1[i]) (rs2 ignored) yes
Z Z P VCONS vd[i] ← cons(vs1[i], vs2[i]) no (always Kleene)
P Z P VACONS vd[i] ← acons(vs1[i], vs2[i]) no (always Kleene)

VAND/VOR/VIMPL/VNOT follow the current LMODE (§6) — switching LMODE to Łukasiewicz, Heyting, RM3, or B3 changes the result of all four lane-wise. VCONS/VACONS are arithmetic primitives (§6.3) and ignore LMODE, exactly as their scalar counterparts.

Other funct[0..2] patterns are reserved and raise EXC_ILLEGAL.

VSEL (opcode +18) — three-way merge with vector mask

VSEL vd, vs1, vs2, vm selects per-lane among vs1, zero, and vs2 according to the corresponding lane of mask vm:

vm[i] vd[i]
N vs1[i]
Z Z (zero)
P vs2[i]

The mask register index vm is encoded in funct[0..2]. This is the vector counterpart of scalar TSEL (§4 opcode −2): one instruction, three sources fused via a ternary mask. Combined with VCMP (which produces N/Z/P = </=/>), it gives a complete lane-wise compare-and-select pair in two instructions.

Why zero on vm[i] = Z rather than preserve-vd[i]. Zero-on-Z gives the user a trit-level “blank out” semantic for free: a VCMP against a threshold followed by a VSEL produces a sparse vector with explicit zeros where the predicate was indeterminate. A merge-on-Z variant (preserving vd[i]) is recoverable in two instructions if needed; the reverse is not.

VCMP (opcode +19)

Single mode: vd[i] ← sign(vs1[i] − vs2[i]), lane-wise. The result is a ternary mask in {N, Z, P} per lane — N where vs1[i] < vs2[i], Z where equal, P where greater. The same mask can be fed directly into VSEL without re-encoding.
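The compare-and-select pairing can be sketched in Python as follows. Lanes are integers in {−1, 0, +1}, the function names mirror the mnemonics for readability only, and any lane count is accepted so the example stays short.

```python
def vcmp(vs1, vs2):
    """Lane-wise sign of difference: -1 if a < b, 0 if equal, +1 if a > b."""
    return [(a > b) - (a < b) for a, b in zip(vs1, vs2)]


def vsel(vs1, vs2, vm):
    """Three-way select: mask N picks vs1, mask P picks vs2, mask Z gives zero."""
    out = []
    for a, b, m in zip(vs1, vs2, vm):
        if m == -1:
            out.append(a)
        elif m == 1:
            out.append(b)
        else:
            out.append(0)
    return out


a = [1, 0, -1, 1]
b = [0, 0, 1, -1]
mask = vcmp(a, b)          # lanes: greater, equal, less, greater
picked = vsel(a, b, mask)  # lanes: b, zero, a, b
```

Because the mask produced by `vcmp` is already in the form `vsel` consumes, no conversion step sits between the two operations.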

VRED (opcode +20) — reduction to scalar GPR

VRED rd, vs1 reduces all 27 lanes of vs1 to a single 27-trit value written to GPR rd (note: rd is a GPR here, not a v-reg — VRED is the inverse of VBCAST in operand bank).

funct[0..2] Mnemonic Result in rd
Z Z Z VRED.SUM Σ vs1[i] for i=0..26, in [−27, +27], encoded as a T27 integer
P Z Z VRED.SIGN sign(Σ vs1[i]) ∈ {N, Z, P} (signed majority)
Z P Z VRED.CONS Kleene consensus across all 27 lanes: P if all are P or Z (and at least one P), N if all are N or Z (and at least one N), Z otherwise
P P Z VRED.LST trit-min across all 27 lanes (= scalar TMIN semantics — P iff every lane is P)
Z Z P VRED.MST trit-max across all 27 lanes (= scalar TMAX semantics — N iff every lane is N)
P Z P VRED.AND trit-fold using the current LMODE AND
Z P P VRED.OR trit-fold using the current LMODE OR

VRED.SIGN is the natural classifier output for ternary neural networks: a 27-weight × 27-input dot product reduces to one trit indicating a “negative / neutral / positive” decision. VRED.SUM yields an integer in [−27, +27], which fits in 4 trits (range ±40) and is suitable for further scalar processing.
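The LMODE-independent reductions from the table can be written down directly in Python. This is a sketch with hypothetical function names; VRED.AND and VRED.OR are left out because their result depends on the active LMODE tables defined in §6.

```python
def vred_sum(v):
    """VRED.SUM: plain integer sum of the lanes."""
    return sum(v)


def vred_sign(v):
    """VRED.SIGN: sign of the lane sum, as a trit."""
    s = sum(v)
    return (s > 0) - (s < 0)


def vred_cons(v):
    """VRED.CONS as worded in the table: P if no lane is N and some lane is P,
    N if no lane is P and some lane is N, Z otherwise."""
    has_p = 1 in v
    has_n = -1 in v
    if has_p and not has_n:
        return 1
    if has_n and not has_p:
        return -1
    return 0


def vred_min(v):
    """VRED.LST row of the table: trit-min across lanes."""
    return min(v)


def vred_max(v):
    """VRED.MST row of the table: trit-max across lanes."""
    return max(v)
```

VRED.SIGN and VRED.CONS differ on mixed inputs: a vector with ten P lanes and five N lanes has sign P but consensus Z.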

VPERM (opcode +21) — lane permutation

VPERM vd, vs1, vs2 rearranges the 27 lanes of vs1 into vd. The mode is selected by funct[0..2]:

funct[0..2] Mnemonic Operation
Z Z Z VROTL rotate left by val(vs2_LST) lanes (cyclic; vs2 read as small integer)
P Z Z VROTR rotate right by val(vs2_LST) lanes
Z P Z VSHL shift left by k lanes (vacated lanes filled with Z); k = lower 5 trits of vs2
P P Z VSHR shift right by k lanes (vacated lanes filled with Z)
Z Z P VREV reverse: vd[i] ← vs1[26−i] (vs2 ignored)
P Z P VSHUF arbitrary shuffle: vd[i] ← vs1[lane_index(vs2, i)] (lane index for each output position is read from vs2 — see below)

For VSHUF, the source lane for an output lane is encoded in three trits of vs2: output lane i takes its value from input lane val(vs2[3i..3i+2]) mod 27. A 3-trit field has exactly 27 values, one per source lane. Note that 27 such fields would occupy 81 trits, so a single 27-trit vs2 holds the fields for output lanes i = 0..8 only; the index source for lanes 9..26 is not defined by this formula.

16.5 VMOVE — inter-bank movement (opcode +22)

VMOVE is the only vector opcode that crosses the GPR / v-reg boundary. The mode is encoded in funct[0..2]; lane index, when relevant, lives in funct[3..5] (3 trits, range 0..26).

funct[0..2] Mnemonic rd bank rs1 bank Operation
Z Z Z VMOV.GV v-reg GPR v[rd] ← r[rs1] (whole 27-trit word copy)
P Z Z VMOV.VG GPR v-reg r[rd] ← v[rs1] (whole word)
Z P Z VMOV.VV v-reg v-reg v[rd] ← v[rs1] (vector–vector copy)
P P Z VBCAST v-reg GPR v[rd][i] ← r[rs1][LST] for all i — splat the LST of r[rs1] to every lane
Z Z P VINS v-reg GPR v[rd][k] ← r[rs1][LST]; other lanes preserved; k = val(funct[3..5])
P Z P VEXT GPR v-reg r[rd] ← TSIGN-extend(v[rs1][k]); k = val(funct[3..5]) (result is N, Z, or P in a T27 word)

For VINS/VEXT, lane index k outside [0, 26] raises EXC_ILLEGAL with the raw instruction word in ETVAL (§8.5).

VMOV.VV does not introduce an opcode of its own — software can also realize a vector-to-vector copy via VOR vd, vs1, vs1 or VADD vd, vs1, v0 (assuming the v0 = 0 convention), but VMOV.VV is provided as a clearer mnemonic; assemblers may emit either form.

Why VBCAST splats only the LST trit of the source GPR. The natural alternative — copying all 27 trits of the GPR to the v-reg — is already covered by VMOV.GV. VBCAST exists for the distinct case “I have one trit (typically a sign or flag) and want to fill all 27 lanes with it” — a pattern recurring in masking and in initializing constant vectors. The two operations are kept distinct because their use cases are.
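The three single-trit VMOVE modes can be illustrated in Python. In this sketch both GPRs and v-regs are 27-element lists of trits stored LST-first, the function names are illustrative, and an out-of-range lane index raises `ValueError` as a stand-in for EXC_ILLEGAL.

```python
LANES = 27


def vbcast(r):
    """VBCAST: splat the LST of GPR r to every lane."""
    return [r[0]] * LANES


def vins(v, r, k):
    """VINS: write the LST of GPR r into lane k, preserving the other lanes."""
    if not 0 <= k < LANES:
        raise ValueError("lane index out of range")   # models EXC_ILLEGAL
    out = list(v)
    out[k] = r[0]
    return out


def vext(v, k):
    """VEXT: return a GPR word whose value is the trit in lane k."""
    if not 0 <= k < LANES:
        raise ValueError("lane index out of range")   # models EXC_ILLEGAL
    return [v[k]] + [0] * (LANES - 1)
```

`vext` followed by `vins` at the same index moves one lane between two v-regs through a GPR without disturbing the other 26 lanes.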

16.6 Context save/restore

The vector bank is part of the architectural state. A context switch that wishes to preserve user-mode vector code must save and restore all 27 v-regs (27 × 27 = 729 trits = 27 words).

Because vector use is opt-in, kernels are encouraged to implement lazy save: track per-thread a “vector-touched” flag, and skip the save/restore if the thread has not executed any vector instruction since its last entry to the kernel. The ISA does not provide hardware for this — it is a kernel-level optimization.

A reference save sequence:

   ; sp points at the top of a 27-word save area
   VMOV.VG  t0, v0     ; v0 → r5 (t0)
   STORE    t0, sp, 0
   VMOV.VG  t0, v1
   STORE    t0, sp, 1
   ; … 25 more pairs …
   VMOV.VG  t0, v26
   STORE    t0, sp, 26

Restore is the symmetric sequence with VMOV.GV. A future revision may introduce a fused VLD / VST block move (gather/scatter, §15) to compress this loop.
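Since the 27 pairs are entirely regular, a kernel build can generate them instead of writing them out by hand. The Python sketch below emits both sequences as assembly text. The function names are hypothetical, and the restore form (LOAD into the scratch GPR, then VMOV.GV) is an assumption derived from the operand order shown in §11.3 and §16.5.

```python
def vector_save_sequence(scratch="t0", base="sp"):
    """Emit the 27 VMOV.VG / STORE pairs that spill v0..v26."""
    lines = []
    for i in range(27):
        lines.append(f"   VMOV.VG  {scratch}, v{i}")
        lines.append(f"   STORE    {scratch}, {base}, {i}")
    return lines


def vector_restore_sequence(scratch="t0", base="sp"):
    """Emit the symmetric LOAD / VMOV.GV pairs that reload v0..v26."""
    lines = []
    for i in range(27):
        lines.append(f"   LOAD     {scratch}, {base}, {i}")
        lines.append(f"   VMOV.GV  v{i}, {scratch}")
    return lines


if __name__ == "__main__":
    print("\n".join(vector_save_sequence()))
```

Each direction comes to 54 instructions, which is the cost a lazy-save policy avoids for threads that never touch the vector bank.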

16.7 Ternary advantages

The vector extension is not a mechanical port of binary SIMD. Five primitives have no clean equivalent in a binary ISA:

  1. Three-state mask. A v-reg of trits doubles as a predicate where each lane carries one of three meanings — N (one branch), Z (zero / inactive), P (other branch). Binary SIMD encodes the third state through a separate “zeroing-vs-merging” bit (AVX-512 style); on Setnex it is intrinsic to the data.

  2. VCMP is trichotomic in one shot. Lane-wise sign-of-difference produces <, =, > simultaneously in three distinct mask values. Binary SIMD needs two compares (one for <, one for =) to recover the same trichotomy.

  3. Kleene consensus reduction (VRED.CONS). A native three-valued voter across 27 lanes — agree-positive, agree-negative, or disagree — in one instruction. The closest binary analog is a popcount majority + sign extraction, two to three instructions and a register.

  4. Signed-sum reduction (VRED.SIGN). The signed sum of 27 ternary trits gives the sign of a dot product directly. Equivalent binary code requires two popcounts (one for +1s, one for −1s) plus a subtraction. This is the key kernel for BitNet 1.58-bit inference (§16.8).

  5. LMODE-aware logic (VLOG, VRED.AND/OR). 27 lanes of Łukasiewicz / Heyting / Bochvar / RM3 logic in one cycle. No binary ISA can express non-classical three-valued logic without a lookup-table emulation costing tens of cycles per evaluation.

Combined, these primitives make Setnex’s vector unit a natural target for: ternary neural networks (BitNet, ternary weight networks), three-valued logic SAT solvers, ternary cellular automata, and any code where a third value (unknown, null, inactive) is first-class data rather than an exception.

16.8 Worked example: BitNet 1.58-bit dot product

A BitNet-style layer multiplies a vector of activations a[0..N−1] (each in {N, Z, P}) by a weight matrix W whose rows are also in {N, Z, P}, producing one output trit per row via a sign reduction.

For one row of 27 weights and 27 activations, the inner kernel is:

   ; assume:
   ;   v1 ← row of 27 weights      (loaded via VMOV.GV from a GPR holding the row word)
   ;   v2 ← 27 input activations   (loaded similarly)
   ;
   VMUL    v3, v1, v2           ; v3[i] ← w[i] × a[i] ∈ {N, Z, P}, lane-wise
   VRED.SIGN a0, v3             ; a0 ← sign(Σ v3[i]) ∈ {N, Z, P}
                                ;   N = output −1, Z = output 0, P = output +1

Two instructions for a 27-element ternary dot-product reduced to a single output trit. The same kernel on a 64-bit binary CPU needs two popcounts (one over the +1 mask, one over the −1 mask), a subtraction, and a sign extraction — minimum 5–6 instructions plus the unpacking of the packed-2-bits-per-weight format that binary SIMD requires to handle ternary values at all.

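Scaling: a layer with 256 rows and 256 inputs is 256 × 10 such 27-wide kernels (= 2560 vector instructions for the multiplication phase), since each 256-element row splits into ⌈256 / 27⌉ = 10 chunks, the last one padded with Z. For rows wider than 27 lanes, each chunk is reduced with VRED.SUM and the partial sums are accumulated in a GPR before the sign is taken, because the sign of the full sum cannot be recovered from per-chunk signs. The output trits are then re-packed via VINS into output v-regs for the next layer.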

Why the third state matters. A weight of Z in BitNet means “this connection contributes nothing” — sparsity is encoded in the value, with no separate sparse-index bookkeeping. VMUL propagates the zero through naturally (Z × anything = Z), and VRED.SIGN ignores it. Setnex thereby supports dense storage of sparse weights with no overhead.
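A Python model of one output row makes the data flow concrete, including the case where the row is wider than 27 lanes. This is a sketch under assumptions: the function names are hypothetical, trits are integers in {−1, 0, +1}, the last chunk is padded with Z, and per-chunk sums are accumulated in a scalar before the sign is taken.

```python
LANES = 27


def split_lanes(values):
    """Split a list of trits into 27-lane chunks, padding the last with Z."""
    padded = list(values) + [0] * (-len(values) % LANES)
    return [padded[i:i + LANES] for i in range(0, len(padded), LANES)]


def bitnet_row(weights, activations):
    """Return the output trit for one weight row."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    total = 0
    for w, a in zip(split_lanes(weights), split_lanes(activations)):
        products = [x * y for x, y in zip(w, a)]   # models VMUL
        total += sum(products)                     # models VRED.SUM + scalar ADD
    return (total > 0) - (total < 0)               # scalar sign of the full sum
```

For a row of exactly 27 elements the loop runs once and the result coincides with a single VMUL followed by VRED.SIGN. Padding with Z is harmless because Z lanes contribute nothing to the sum.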


Setnex ISA v0.8 — Reference specification Setnex project / Terias — Eric Tellier