Setnex ISA — Specification v0.8
Copyright 2026 Eric Tellier (Terias)
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this specification except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, this specification is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
In accordance with Section 3 of the Apache License 2.0, any implementation of this specification is granted a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, use, sell, and distribute implementations that comply with this specification.
Balanced ternary instruction set architecture, inspired by RISC-V. Balanced ternary {−1, 0, +1}, 27-trit word, 27 registers, fixed-length instructions.
Changelog from v0.7
| # | Change | Rationale |
|---|---|---|
| 1 | Vector extension added — dedicated 27-register vector bank v0..v26, trit-parallel datapath (each lane = 1 trit, 27 lanes per word). Eight new opcodes (+15..+22): VADD, VMUL, VLOG, VSEL, VCMP, VRED, VPERM, VMOVE. See §16. | Amortizes fetch/decode/control over 27 simultaneous trit operations and exposes ternary-native primitives (3-way mask, ternary compare, Kleene consensus reduction) that have no clean binary equivalent. Ternary ML at 1.58-bit (BitNet) maps natively. |
| 2 | New register bank: 27 vector registers v0..v26, each 27 trits wide. Indexed using the 3-trit register fields of the existing R format — vector opcodes interpret rs1/rs2/rd as v-reg indices instead of GPRs. VMOVE is the only opcode that crosses banks. | Keeps the format set unchanged (still R, I, J, U, B). Per-opcode bank dispatch matches RVV practice; no instruction-side mode bit needed. |
| 3 | Logic ops merged into VLOG: TAND/TOR/TNOT/TIMPL (LMODE-following) and CONS/ACONS (LMODE-bypass) all share opcode +17 with sub-mode in funct[0..2] | Same datapath, same operand shape — fusing into one opcode preserves opcode budget for v0.9 (FMA, gather/scatter, tryte/trybble lanes). |
| 4 | Permutation and shift merged into VPERM (opcode +21) — modes: rotate, shift, reverse, shuffle | Same lane-permutation network, only the control signal differs. Single opcode covers all. |
| 5 | Inter-bank movement merged into VMOVE (opcode +22) — modes: GPR↔v-reg whole-word copy, v-reg↔v-reg copy, single-trit broadcast/insert/extract | Bridges the two register files without burning three separate opcodes; lane index when relevant lives in funct[3..5]. |
| 6 | Two opcodes (+23, +24) left reserved for v0.9 | Earmarked for fused multiply-accumulate (VFMA) and gather/scatter (VGATHER/VSCATTER), or tryte/trybble lane modes if explored next. |
| 7 | Architectural summary updated — opcode count 52 → 60; register count 27 GPR → 27 GPR + 27 v-reg | Vector bank is part of the architectural state visible to a context switch; see §16.6 for the save/restore contract. |
| 8 | No change to scalar ISA, CSRs, exception model, MPU, or interrupt controller | v0.8 is purely additive over v0.7. Existing binaries run unmodified; vector code is opt-in. |
Changelog from v0.6
| # | Change | Rationale |
|---|---|---|
| 1 | Memory Protection Unit (MPU) added — 9 indirect-access regions, NAPO3 size encoding, ternary per-axis permissions ({N = none, Z = kernel only, P = user + kernel}) for R/W/X | Turns the §2.3 convention “negative = kernel, positive = user” into a hardware-enforced rule. Prerequisite for running untrusted user code without silent kernel corruption. See §9. |
| 2 | Asynchronous interrupt controller added — 9 IRQ lines, three CSRs (IPENDING, IENABLE, IPRIORITY), dispatch through the existing exception machinery (shared EVEC, frame-2 bank for nested cases) | Provides external-event delivery (timer, UART, etc.) independent of the instruction stream. Makes preemptive scheduling possible. See §10. |
| 3 | Six new CSRs at addresses −1..−6: MPU_SELECT, MPU_BASE, MPU_CFG, IPENDING, IENABLE, IPRIORITY | First use of the negative CSR address half. MPU uses indirect access (select-then-access) so region count is independent of CSR budget. |
| 4 | Three new EXC_PERM_R / EXC_PERM_W / EXC_PERM_X codes (−9, −8, −7) distinct from EXC_FAULT | EXC_FAULT = address invalid (doesn’t exist); EXC_PERM_* = address exists but MPU denies. Three separate codes let the handler dispatch on access type without re-decoding the faulting instruction. Mirrors the RISC-V load/store/instruction page-fault split. |
| 5 | Nine new IRQ cause codes IRQ_0..IRQ_8 (+20..+28) | Hardware selects the highest-priority pending-enabled IRQ line and writes the corresponding cause code at dispatch time. A single-CSR read tells the handler which line to service. |
| 6 | §2.3 address space: “convention” upgraded to “enforcement” — the MPU can make the user/kernel split physically binding | Documentation reflects that the split is no longer a soft contract. |
| 7 | Spec clarifications (no semantic change vs. v0.6 simulator): EXC_OVERFLOW reclassified “reserved, not raised” (no STATUS enable trit exists); TSHIFT explicitly produces 0 for \|val(rs2)\| ≥ 27; ECALL imm17[1..16] ignored by the decoder (not “must be Z”); Tekum anchor \|r\| > 7 decodes to NaR; IPENDING/IENABLE write-N is a no-op | Align the spec text with observed v0.6 simulator behavior; close under-specified edges surfaced during the v0.7 review. |
| 8 | MPU_CFG size range narrowed from [0, 27] to [0, 26] | n = 27 had no well-defined base under the signed NAPO3 model; kernel-mode no-match default already covers the “everything allowed” case. |
| 9 | §10.5 handler boilerplate rewritten to select the correct bank via STATUS.depth before reading ECAUSE / ECAUSE2; §10.2 Pending/Enable semantics extended with explicit level-vs-edge rule | v0.7 first-draft example silently mis-dispatched on nested entry; edge-triggered peripherals had no documented path to coexist with the level-sensitive sampler. |
Changelog from v0.5
| # | Change | Rationale |
|---|---|---|
| 1 | Nested exceptions: a second bank of exception CSRs (EPC2, ESAVE2, ECAUSE2, ETVAL2) added; STATUS.depth (trit t[3]) tracks active frame count | v0.5 machine-check reset on any synchronous fault in kernel mode was too brutal — forced a fault-free kernel. One level of nesting is now tolerated; only a third fault (while depth = N) triggers reset. |
| 2 | ECALL imm17[0] repurposed as flavor tag: Z = user syscall, P = hypercall, N = debug trap | Subdivides the EXC_ECALL cause without consuming a new opcode. Backward compatible with v0.5 binaries (whose imm17 = Z → user syscall, same as before). |
| 3 | EXC_ECALL split into EXC_ECALL_U = 0, EXC_ECALL_H = +1, EXC_ECALL_D = +2; all three share a single EVEC | Lets the handler dispatch on ECAUSE alone without decoding imm17. Positive codes preserve the convention “negative = fault, positive = deliberate synchronous trap”. EXC_ECALL_U = 0 preserves v0.5 binary compatibility. |
| 4 | Triple-fault condition (fault while depth = N) triggers machine-check reset; no ECAUSE code allocated | Condition is not observable by any handler, so reserving a cause code would be dead weight. Documented by name in §8.2. |
Changelog from v0.4
| # | Change | Rationale |
|---|---|---|
| 1 | r17 repurposed: s2 (callee-saved) → a7 (argument register / syscall number) | Reserve a dedicated register for the syscall number, following the RISC-V convention; lets the handler dispatch without re-fetching the faulting instruction |
| 2 | Saved register range narrowed: s0–s10 (11 regs) → s0–s9 (10 regs) | r18–r25 renumbered from s3–s10 to s2–s9 so the saved range is contiguous with no gap |
| 3 | ECALL clarified: imm17 reserved (must be Z); syscall number passed in a7; ECAUSE ← EXC_ECALL (= 0) on entry | v0.4 was ambiguous on whether imm17 overwrote ECAUSE; register-based dispatch avoids a re-decode of the instruction at EPC |
| 4 | New §8.4: Syscall dispatch convention | Documents the handler-side contract and the EPC + 1 epilogue pattern |
| 5 | New §11.5: Syscall calling convention | ABI-level wrapper/caller contract, distinct from the standard function call convention of §11.1 |
| 6 | §11.3 prologue/epilogue rewritten: allocate frame first, then save ra and the old s0; mirrored epilogue | v0.4’s prologue overwrote s0 without saving it, and left ra stored outside the allocated stack region between the STORE and the final sp decrement |
| 7 | §11.3: stack alignment stated explicitly (sp is word-aligned) | Closes an under-specified point of the v0.4 ABI; trivially satisfied by word-addressing |
| 8 | New CSR ETVAL at address 9 (MST-first +00); new §8.5 | Symmetric with the ECALL/a7 clarification: EXC_FAULT, EXC_ALIGN and EXC_ILLEGAL now have a documented data channel (faulting address / raw instruction) instead of forcing the handler to re-decode EPC |
| 9 | §8.2 extended: synchronous fault while STATUS.mode = N → machine-check reset | v0.4 had a single EPC/ESAVE save slot; nested synchronous exceptions would silently corrupt it. Proper nested handling deferred to v0.6 |
| 10 | §8.1 exception table gains an ETVAL contents column | Makes per-exception data channel explicit |
| 11 | funct[0] mode selector on ADD/SUB inverted: N = saturating (was P), P = with-carry (was N) | Rationale: N evokes clamping/constraint, P evokes additive chaining — more natural ternary mnemonic. |
Changelog from v0.3
| # | Change | Rationale |
|---|---|---|
| 1 | Overflow flag removed from FLAGS | Overflow is carried by FLAGS.carry (P = overflow, N = underflow); a separate overflow trit was redundant |
| 2 | TFP extension: FADD, FSUB, FMUL, FDIV, FCMP, FCVT (opcodes +8 to +13) | Native ternary floating-point using Tekum T26F format; 6 of 7 reserved TFP opcodes allocated |
| 3 | T26F format: 26-trit Tekum in 27-trit register, t[26] = Z | Even-width Tekum requirement; compatible with NEG, TABS, LOAD/STORE |
| 4 | New §7: Ternary floating-point format specification | Tekum anchor, regime, exponent, fraction, special values, properties |
| 5 | Opcodes used: 46 → 52 | 6 TFP instructions added |
Changelog from v0.2
| # | Change | Rationale |
|---|---|---|
| 1 | ADC (funct[0]=P on ADD) and SBC (funct[0]=P on SUB) added | Multi-precision carry chain with ternary carry {N,Z,P}; funct[0] is now a 3-way mode selector: Z=normal, N=saturating, P=with-carry |
| 2 | TSEL added (opcode −2, R format with rp in funct[0..2]) | 3-way conditional select on FLAGS.sign — the defining ternary-native instruction |
| 3 | BF added (opcode −10, J format, rs1 field = condition mask) | Trit-masked branch on FLAGS.sign; 6 comparison branches in 1 opcode |
| 4 | BRT3 added (opcode −21, new format B) | Ternary three-way branch on LST(rX): P=fall-through, Z=off_z, N=off_n |
| 5 | Format B added: 4t opcode + 3t rX + 10t off_z + 10t off_n | New format for BRT3; two 10-trit offsets (±29 524) |
| 6 | Instruction formats: 4 → 5 (R, I, J, U, B) | Accommodates BRT3 |
| 7 | Opcodes used: 43 → 46 (TSEL, BF, BRT3; ADC/SBC via funct) | Was 43 in v0.2 |
1. Notation and conventions
| Symbol | Meaning |
|---|---|
| t | trit ∈ {−1, 0, +1}, written N, Z, P |
| T[n] | n-trit word (value in [−(3ⁿ−1)/2, +(3ⁿ−1)/2]) |
| tryte | 27-trit word (basic memory and register unit) |
| val(x) | balanced integer value of a ternary word |
| enc(n, w) | balanced ternary encoding of integer n on w trits |
| sign(x) | N if x < 0, Z if x = 0, P if x > 0 |
| t[i] | trit i of a word (i=0 = least significant trit) |
| rd | destination register |
| rs1, rs2 | source registers |
| imm | signed immediate (balanced ternary) |
| LST | Least Significant Trit (t[0]) |
| MST | Most Significant Trit (t[w−1] for a w-trit word) |
| zero-extend(x) | Extend a narrow field to 27 trits by filling higher trits with Z (preserves balanced value, since Z = 0) |
Trits are numbered from 0 (LST) to 26 (MST) within a tryte.
1.1 Textual representation
Trit glyphs: - for N (−1), 0 for Z (0), + for P (+1).
Two conventions coexist in this document:
- MST-first (human-readable, used in register encoding examples and inline notation): most significant trit is written leftmost, as in ordinary number notation. Example: ++0-+ means t[4]=P, t[3]=P, t[2]=Z, t[1]=N, t[0]=P → val = 81 + 27 + 0 − 3 + 1 = 106.
- LST-first (memory layout, used in instruction encoding diagrams): t[0] is written leftmost. This matches the physical trit ordering in memory and in instruction format diagrams.
Each convention is explicitly labeled where it appears. When unmarked, MST-first is assumed.
1.2 Integer value
The integer value of a w-trit field starting at t[i] is: Σ trit[i+k] × 3^k for k=0..w−1.
Words are stored in memory least significant trit first (little-endian).
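The enc(n, w) and val(x) notations above can be sketched in a few lines of Python. This is an illustrative model, not part of the specification: the function names mirror the notation table, trits are held as the integers −1, 0, +1 in an LST-first list, and the helper mst_first is a hypothetical name for the MST-first glyph rendering of §1.1.

```python
def enc(n: int, w: int) -> list[int]:
    """Balanced ternary encoding of n on w trits, LST-first (index 0 is t[0])."""
    limit = (3**w - 1) // 2
    if not -limit <= n <= limit:
        raise ValueError(f"{n} does not fit in {w} trits")
    trits = []
    for _ in range(w):
        r = n % 3            # Python's % always yields 0, 1 or 2
        if r == 2:           # digit 2 becomes -1 with a carry into the next trit
            r = -1
        trits.append(r)
        n = (n - r) // 3
    return trits


def val(trits: list[int]) -> int:
    """Balanced integer value: sum of trit[k] * 3^k."""
    return sum(t * 3**k for k, t in enumerate(trits))


def mst_first(trits: list[int]) -> str:
    """Render an LST-first trit list with the -, 0, + glyphs, MST leftmost."""
    glyph = {-1: "-", 0: "0", 1: "+"}
    return "".join(glyph[t] for t in reversed(trits))
```

For instance, mst_first(enc(106, 5)) reproduces the ++0-+ example of §1.1, and val(enc(n, w)) returns n for any n in range.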
2. Programming model
2.1 General-purpose registers
27 T27 registers, named r0–r26, encoded on 3 trits (address ∈ [−13, +13]).
| Register | ABI name | Conventional role |
|---|---|---|
| r0 | zero | Always 0 (read-only by convention) |
| r1 | ra | Return address |
| r2 | sp | Stack pointer (grows toward negative addresses) |
| r3 | gp | Global pointer |
| r4 | tp | Thread pointer |
| r5–r7 | t0–t2 | Temporaries (caller-saved) |
| r8–r9 | s0–s1 | Saved registers (callee-saved), s0 = frame pointer |
| r10–r16 | a0–a6 | Arguments / return values |
| r17 | a7 | Argument register / syscall number (see §8.4, §11.5) |
| r18–r25 | s2–s9 | Saved registers (callee-saved) |
| r26 | t3 | Extra temporary |
Register addresses use balanced ternary encoding naturally (shown MST-first):
r0 = 0 → 000, r1 = 1 → 00+, r2 = 2 → 0+-, r3 = 3 → 0+0, …, r13 = 13 → +++, r14 = −13 → ---, …, r26 = −1 → 00-.
Note: the 27 register indices 0–26 are mapped to the 27 balanced ternary T3 values −13 to +13. The mapping is: register rN has address enc(N, 3) for N ∈ {0..13}, and enc(N−27, 3) for N ∈ {14..26}. This wraps naturally in the balanced ternary range.
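The index-to-address mapping in the note above can be sketched as follows. The function names reg_address, reg_index and t3_glyphs are hypothetical helpers introduced for illustration only; the mapping itself is the one stated in the note.

```python
def reg_address(n: int) -> int:
    """Map register index N (0..26) to its balanced T3 address (-13..+13)."""
    if not 0 <= n <= 26:
        raise ValueError("register index must be in 0..26")
    return n if n <= 13 else n - 27


def reg_index(addr: int) -> int:
    """Inverse mapping: balanced T3 address back to register index 0..26."""
    if not -13 <= addr <= 13:
        raise ValueError("T3 address must be in -13..+13")
    return addr % 27  # Python's % maps -13..-1 onto 14..26


def t3_glyphs(addr: int) -> str:
    """MST-first glyph string of a T3 address, e.g. 2 -> '0+-'."""
    out = []
    for _ in range(3):
        r = addr % 3
        if r == 2:
            r = -1
        out.append({-1: "-", 0: "0", 1: "+"}[r])
        addr = (addr - r) // 3
    return "".join(reversed(out))
```

With these helpers, t3_glyphs(reg_address(14)) gives --- and t3_glyphs(reg_address(26)) gives 00-, matching the examples listed above.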
2.1.1 Vector registers (new in v0.8)
A separate bank of 27 vector registers v0–v26, each a full 27-trit word, is introduced for the vector extension (§16). Vector registers are addressed using the same 3-trit encoding as GPRs (one of the 27 balanced ternary T3 values −13..+13); the bank that is read or written is determined by opcode, not by the register field itself.
| Opcode range | Bank used for rd / rs1 / rs2 |
|---|---|
| Scalar opcodes (−40..+14) | GPR bank r0..r26 |
| Vector opcodes (+15..+21) | v-reg bank v0..v26 |
| VMOVE (+22) | Mixed — direction encoded in funct[0..2]; see §16.5 |
There is no architectural alias vzero distinct from v0; software conventionally keeps v0 cleared and treats it as the zero vector when needed (analogous to the r0/zero convention, but not hardware-enforced — v0 is a writable register).
The vector bank is part of the architectural state and must be saved/restored across context switches that wish to preserve user-mode vector code; see §16.6 for the recommended save/restore contract. There is no v-reg variant of r0’s read-as-zero contract.
2.2 Control and status registers (CSR)
CSR addresses are T3 (3 trits), providing 27 addressable slots (values −13 to +13) — a symmetric match with the 27 GPRs. Each CSR is a full T27 word. Unused slots are reserved and read as zero.
| Address (T3, MST-first) | Decimal | Name | Description |
|---|---|---|---|
| 00+ | 1 | PC | Program counter (T27, word address) |
| 0+- | 2 | LMODE | Logic mode (see §6) |
| 0+0 | 3 | FLAGS | Arithmetic flags (see §5.5) |
| 0++ | 4 | EPC | Exception program counter |
| +-- | 5 | ECAUSE | Exception cause (T27) |
| +-0 | 6 | EVEC | Exception vector (handler address) |
| +-+ | 7 | STATUS | Processor status (see §2.4) |
| +0- | 8 | ESAVE | Saved STATUS on exception entry (new in v0.2) |
| +00 | 9 | ETVAL | Exception trap value (see §8.5) — new in v0.5 |
| +0+ | 10 | EPC2 | Frame-2 exception PC (nested exception; see §8.2) — new in v0.6 |
| ++- | 11 | ECAUSE2 | Frame-2 exception cause — new in v0.6 |
| ++0 | 12 | ESAVE2 | Frame-2 saved STATUS — new in v0.6 |
| +++ | 13 | ETVAL2 | Frame-2 exception trap value — new in v0.6 |
| 00- | −1 | MPU_SELECT | MPU region index selector (new in v0.7, see §9) |
| 0-+ | −2 | MPU_BASE | Base address of the MPU-selected region (new in v0.7) |
| 0-0 | −3 | MPU_CFG | Config (size + permissions + valid) of the selected region (new in v0.7) |
| 0-- | −4 | IPENDING | Pending-IRQ bitvector (new in v0.7, see §10) |
| -++ | −5 | IENABLE | Per-line IRQ enable mask (new in v0.7) |
| -+0 | −6 | IPRIORITY | Per-line IRQ priority (new in v0.7) |
| others | — | — | Reserved (read as zero) |
The four frame-2 CSRs (EPC2, ECAUSE2, ESAVE2, ETVAL2) are the depth-2 counterparts of EPC, ECAUSE, ESAVE, ETVAL. They are written by the processor only when a synchronous fault occurs while the outer handler is still running (STATUS.depth = P); see §8.2 for the full entry rules. EVEC and STATUS are not duplicated — both frames share the single exception vector, and STATUS.depth tracks which bank is the active save target.
The six v0.7 CSRs occupy addresses −1 to −6, the first use of the negative half of the CSR address space. The MPU CSRs (MPU_SELECT, MPU_BASE, MPU_CFG) provide indirect access to the 9-region descriptor bank: write a region index to MPU_SELECT, then read or write MPU_BASE and MPU_CFG to manipulate that region (see §9.3). The three interrupt CSRs are direct (one word each). All six CSRs are kernel-only: a user-mode CSRR / CSRW targeting any of them raises EXC_ILLEGAL with the raw instruction word in ETVAL (§8.5). The pre-existing system CSRs (EPC, ECAUSE, EVEC, STATUS, and the frame-2 bank) follow the same kernel-only rule.
At reset, all CSRs are initialized to zero.
2.3 Address space
- Addressing unit: 27-trit word
- Address space: T27 → val ∈ [−(3²⁷−1)/2, +(3²⁷−1)/2] ≈ ±3.8 × 10¹²
- PC is incremented by 1 (one word) after each instruction
- Negative addresses: stack and kernel space
- Positive addresses: user code and data
The user/kernel split is a convention at the ISA level but can be made physically binding by the MPU (§9): the kernel programs regions with perm = Z (kernel-only) or P (user+kernel) as appropriate, and user-mode accesses outside any user-permitted region raise EXC_PERM_* (§8.1). Prior to v0.7 this separation was soft — any mode could reach any address.
2.4 STATUS register structure
| Trit | Name | Values |
|---|---|---|
| t[0] | mode | N = kernel, Z = reserved, P = user |
| t[1] | ie | N = interrupts masked, Z = reserved, P = interrupts enabled |
| t[2] | lx | Logic extension for LMODE=N: N = Heyting, Z = standard Łukasiewicz, P = RM3 (see §6) |
| t[3] | depth | Exception-frame depth (new in v0.6): Z = 0 frames active, P = 1 frame active (main bank holds outer context), N = 2 frames active (main bank + frame 2) |
| t[4..26] | — | Reserved (must be Z) |
At reset, STATUS = N → kernel mode, interrupts masked, LMODE=N submode = Łukasiewicz, depth = Z (no frames active).
Rationale — why a trit for depth. One balanced trit holds exactly the three states the nested-exception machinery needs: no frame, one frame, two frames. A fourth state (triple fault) would be out of range anyway — the hardware treats it as a machine-check reset (§8.2) rather than a representable depth. Using depth = N as the “fully nested” state preserves the monotonic Z→P→N progression as exceptions stack up.
3. Instruction encoding
Every instruction is a fixed-length 27-trit word.
3.1 Opcode field
The 4 least significant trits (t[0]–t[3]) form the primary opcode. 3⁴ = 81 possible combinations — ample opcode space.
Placing the opcode at t[0]–t[3] allows the decoder to begin working as soon as the first trits of the word arrive, without waiting for the full word.
3.2 Instruction formats
Five formats. The format is determined solely by the opcode; the decoder does not inspect other fields to determine the format.
R format (register–register)
t: [0-3] [4-6] [7-9] [10-12] [13-26]
opcode rd rs1 rs2 funct (14 trits)
4 trits 3 trits 3 trits 3 trits 14 trits
I format (immediate)
t: [0-3] [4-6] [7-9] [10-26]
opcode rd rs1 imm17
4 trits 3 trits 3 trits 17 trits (imm ∈ [−(3¹⁷−1)/2, +(3¹⁷−1)/2] ≈ ±64 million)
J format (conditional branch)
t: [0-3] [4-6] [7-26]
opcode rs1 offset20
4 trits 3 trits 20 trits (offset ∈ [−(3²⁰−1)/2, +(3²⁰−1)/2] ≈ ±1.7 billion)
U format (unconditional jump)
t: [0-3] [4-26]
opcode offset23
4 trits 23 trits (offset ∈ [−(3²³−1)/2, +(3²³−1)/2] ≈ ±4.7 × 10¹⁰)
B format (ternary three-way branch) — new in v0.3
t: [0-3] [4-6] [7-16] [17-26]
opcode rX off_z off_n
4 trits 3 trits 10 trits 10 trits (offsets ∈ [−(3¹⁰−1)/2, +(3¹⁰−1)/2] ≈ ±29 524)
Fields are laid out from least significant (t[0]) to most significant (t[26]).
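As a rough illustration of the five layouts above, the sketch below packs and unpacks instruction words held as LST-first lists of 27 trits. The FORMATS dictionary and the encode / decode helpers are hypothetical names, and the format is passed explicitly rather than derived from the opcode. Register fields hold balanced T3 addresses (−13..+13) as described in §2.1.

```python
# Field layout per format: name -> (first trit index, width in trits).
# Positions are copied from the format diagrams; the structure is illustrative.
FORMATS = {
    "R": {"opcode": (0, 4), "rd": (4, 3), "rs1": (7, 3), "rs2": (10, 3), "funct": (13, 14)},
    "I": {"opcode": (0, 4), "rd": (4, 3), "rs1": (7, 3), "imm17": (10, 17)},
    "J": {"opcode": (0, 4), "rs1": (4, 3), "offset20": (7, 20)},
    "U": {"opcode": (0, 4), "offset23": (4, 23)},
    "B": {"opcode": (0, 4), "rX": (4, 3), "off_z": (7, 10), "off_n": (17, 10)},
}


def to_trits(n: int, w: int) -> list[int]:
    """Balanced ternary digits of n on w trits, LST-first."""
    limit = (3**w - 1) // 2
    if not -limit <= n <= limit:
        raise ValueError(f"{n} does not fit in {w} trits")
    out = []
    for _ in range(w):
        r = n % 3
        if r == 2:
            r = -1
        out.append(r)
        n = (n - r) // 3
    return out


def from_trits(trits: list[int]) -> int:
    return sum(t * 3**k for k, t in enumerate(trits))


def encode(fmt: str, **fields: int) -> list[int]:
    """Build a 27-trit word; fields that are not given default to Z (0)."""
    word = [0] * 27
    for name, (start, width) in FORMATS[fmt].items():
        word[start:start + width] = to_trits(fields.get(name, 0), width)
    return word


def decode(fmt: str, word: list[int]) -> dict[str, int]:
    """Split a 27-trit word into the integer value of each field."""
    return {name: from_trits(word[start:start + width])
            for name, (start, width) in FORMATS[fmt].items()}
```

A round trip such as decode("B", encode("B", opcode=-21, rX=5, off_z=-3, off_n=29524)) returns the original field values, and every format's field widths add up to 27 trits.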
Rationale for format U: In v0.1, JMP and CALL used J format, wasting the rs1 field (3 trits) that they do not need. Format U merges those trits into the offset, multiplying jump range by 27 at no cost. Conditional branches retain J format because they require rs1 for the test register.
Rationale for format B (new in v0.3): A ternary three-way branch needs two offsets (for Z and N outcomes; P falls through). The 20 trits after opcode+rX are split evenly into two 10-trit offset fields, each with a range of ±29 524. This is the most ternary-native branch format: one instruction, three outcomes, zero wasted trits.
The 14-trit funct field in R format allows 3¹⁴ ≈ 4.8 million variants per opcode — only a few trits are used so far (mode selectors on ADD/SUB, FCVT, TSET; register field on TSEL), leaving the rest for future extensions. Funct sub-fields are specified per-instruction.
4. Instruction set
4.1 Opcode table (4 trits = value from −40 to +40)
ALU group — R format (opcode −40 to −27)
| Opcode (val) | Mnemonic | funct | Operation |
|---|---|---|---|
| −40 | ADD | funct[0]=Z | rd ← rs1 + rs2 (balanced arithmetic) |
| −40 | ADDS | funct[0]=N | rd ← rs1 + rs2 (saturating: clamps to T27 range) |
| −40 | ADC | funct[0]=P | rd ← rs1 + rs2 + FLAGS.carry (new in v0.3) |
| −39 | SUB | funct[0]=Z | rd ← rs1 − rs2 |
| −39 | SUBS | funct[0]=N | rd ← rs1 − rs2 (saturating) |
| −39 | SBC | funct[0]=P | rd ← rs1 − rs2 − FLAGS.carry (new in v0.3) |
| −38 | MUL | funct[0]=Z | rd ← low 27 trits of rs1 × rs2 |
| −38 | MULH | funct[0]=P | rd ← high 27 trits of rs1 × rs2 (54-trit product) |
| −37 | DIV | Z | rd ← rs1 ÷ rs2 (symmetric Euclidean, see §5.4) |
| −36 | MOD | Z | rd ← rs1 mod rs2 (symmetric remainder, see §5.4) |
| −35 | NEG | Z | rd ← −rs1 (trit-by-trit inversion: P↔N, Z→Z) |
| −34 | TAND | Z | rd ← rs1 AND rs2 (per LMODE, see §6) |
| −33 | TOR | Z | rd ← rs1 OR rs2 (per LMODE) |
| −32 | TNOT | Z | rd ← NOT rs1 (per LMODE) |
| −31 | TIMPL | Z | rd ← rs1 IMPL rs2 (per LMODE) |
| −30 | CONS | Z | rd ← consensus(rs1, rs2) — always Kleene |
| −29 | ACONS | Z | rd ← anti-consensus(rs1, rs2) — always Kleene |
| −28 | TSHIFT | Z | rd ← rs1 shifted by val(rs2) trits (left if >0, right if <0; vacated trits filled with Z). val(rs2) uses the full T27 range, no masking; shifts of magnitude ≥ 27 produce 0 |
| −27 | TCMP | Z | rd ← trit-by-trit comparison: rd[i] = sign(rs1[i] − rs2[i]) |
Consensus: trit-by-trit, cons(a,b) = a if a==b, else Z.
Anti-consensus: trit-by-trit, acons(a,b) = Z if a==b, else the absent trit: acons(N,Z) = acons(Z,N) = P — acons(Z,P) = acons(P,Z) = N — acons(N,P) = acons(P,N) = Z. CONS and ACONS are dual operations: CONS extracts agreement, ACONS extracts the absent trit. Both ignore LMODE — they are arithmetic primitives, not logic operations.
TCMP is the trit-by-trit spaceship operator. For each trit position i: rd[i] = N if rs1[i] < rs2[i], Z if rs1[i] = rs2[i], P if rs1[i] > rs2[i]. TCMP complements CONS/ACONS: CONS extracts shared values, TCMP extracts the ordering relation. Together, these three form a complete trit-comparison toolkit.
Saturating arithmetic (ADDS, SUBS): the result is clamped to [−(3²⁷−1)/2, +(3²⁷−1)/2] instead of wrapping. FLAGS.carry is set to Z (no overflow) regardless, since overflow is absorbed. FLAGS.sign reflects the clamped result.
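The CONS, ACONS and TCMP definitions above translate directly into per-trit functions. The sketch below is an illustrative model with trits as the integers −1, 0, +1 and words as LST-first lists; the function names and the lanewise helper are assumptions made for this example.

```python
def cons(a: int, b: int) -> int:
    """Consensus: the shared trit if both agree, else Z."""
    return a if a == b else 0


def acons(a: int, b: int) -> int:
    """Anti-consensus: Z if both agree, else the trit absent from {a, b}."""
    # The three trits sum to 0, so the absent one is -(a + b) when a != b.
    return 0 if a == b else -(a + b)


def tcmp(a: int, b: int) -> int:
    """Per-trit spaceship: sign(a - b)."""
    d = a - b
    return (d > 0) - (d < 0)


def lanewise(op, x: list[int], y: list[int]) -> list[int]:
    """Apply a per-trit operation across two words of equal width."""
    if len(x) != len(y):
        raise ValueError("operands must have the same width")
    return [op(a, b) for a, b in zip(x, y)]
```

For example, lanewise(acons, [-1, 0, -1, 1], [0, 1, 1, 1]) yields [1, -1, 0, 0], matching the acons table entries above.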
Carry-chain arithmetic (ADC, SBC — new in v0.3): the value of FLAGS.carry from the previous ALU operation is added to (ADC) or subtracted from (SBC) the result. This enables multi-precision arithmetic. The balanced ternary carry {N, Z, P} is richer than the binary carry {0, 1} — one ADC propagates 3 values natively. Sequence for 54-trit addition: ADD lo, a_lo, b_lo then ADC hi, a_hi, b_hi.
funct[0] on ADD/SUB is a 3-way mode selector: Z = normal (ADD/SUB), N = saturating (ADDS/SUBS), P = with carry (ADC/SBC). This is itself a ternary exploitation — one trit selects among 3 modes.
Note on funct indexing: funct[i] denotes the i-th trit within the funct field (local index, 0 = LST of funct). In absolute instruction-word terms, funct[0] lives at t[13] (the trit immediately after rs2). Using the LST-end of funct for mode selectors keeps the decoder logic close to the opcode and rs2 decoder.
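The ADD-then-ADC sequence for 54-trit addition described above can be modelled on plain integers. This sketch assumes that a non-saturating ADD wraps into the T27 range and reports the wrap direction in FLAGS.carry (P on overflow, N on underflow), so that rs1 + rs2 = result + carry × 3²⁷; the exact flag rules live in §5.5, and add27, split54 and add54 are hypothetical helper names.

```python
W = 27
M = 3 ** W                 # number of distinct T27 values
HALF = (M - 1) // 2        # largest T27 magnitude


def add27(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """Model of ADD (carry_in = 0) or ADC: returns (wrapped result, carry trit)."""
    total = a + b + carry_in
    if total > HALF:
        carry = 1           # P: overflow
    elif total < -HALF:
        carry = -1          # N: underflow
    else:
        carry = 0           # Z: result fits
    return total - carry * M, carry


def split54(x: int) -> tuple[int, int]:
    """Split a 54-trit value into balanced (lo, hi) words with x = hi * M + lo."""
    lo = (x + HALF) % M - HALF
    return lo, (x - lo) // M


def add54(a: int, b: int) -> int:
    a_lo, a_hi = split54(a)
    b_lo, b_hi = split54(b)
    lo, carry = add27(a_lo, b_lo)        # ADD lo, a_lo, b_lo
    hi, _ = add27(a_hi, b_hi, carry)     # ADC hi, a_hi, b_hi
    return hi * M + lo
```

Adding 1 to the largest T27 value shows the carry in action: the low word wraps to −HALF and the high word receives +1.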
Memory group — I format (opcode −26 to −18)
| Opcode (val) | Mnemonic | Operation |
|---|---|---|
| −26 | LOAD | rd ← Mem[rs1 + imm17] |
| −25 | STORE | Mem[rs1 + imm17] ← rd (rd field used as source) |
| −24 | LI | rd ← zero-extend(imm17) to 27 trits |
| −23 | LUI | rd ← imm17 << 10 (load upper immediate, low 10 trits set to Z) |
| −22 | ADDI | rd ← rs1 + zero-extend(imm17) |
| −21 | BRT3 | B format: ternary three-way branch on LST(rX) (new in v0.3, see below) |
| −20 to −19 | — | reserved |
| −18 | CMPI | FLAGS ← compare(rs1, zero-extend(imm17)) — see §5.5 |
Branch group — J format (opcode −17 to −10) and U format (opcode −9 to −8)
| Opcode (val) | Mnemonic | Format | Condition | Semantics |
|---|---|---|---|---|
| −17 | BEQ | J | rs1 == 0 | PC ← PC + offset20 |
| −16 | BNE | J | rs1 ≠ 0 | PC ← PC + offset20 |
| −15 | BLT | J | rs1 < 0 | PC ← PC + offset20 |
| −14 | BGT | J | rs1 > 0 | PC ← PC + offset20 |
| −13 | BLE | J | rs1 ≤ 0 | PC ← PC + offset20 |
| −12 | BGE | J | rs1 ≥ 0 | PC ← PC + offset20 |
| −11 | JMPA | J | unconditional | PC ← rs1 + offset20 |
| −10 | BF | J | FLAGS.sign matches mask | Trit-masked branch on FLAGS (new in v0.3, see below) |
| −9 | JMP | U | unconditional | PC ← PC + offset23 |
| −8 | CALL | U | unconditional | ra ← PC + 1 ; PC ← PC + offset23 |
Branch instructions (BEQ–BGE) use rs1 (field [4-6]) as the register to test, and the offset is relative to the current PC (before increment). BLT/BGT/BLE/BGE compare val(rs1) to 0.
JMPA retains J format because it needs rs1 as the base address register. JMP and CALL use U format for maximum jump range (23 trits ≈ ±4.7 × 10¹⁰ words).
CSR and system group — I format (opcode −7 to −4)
| Opcode (val) | Mnemonic | Operation |
|---|---|---|
| −7 | CSRR | rd ← CSR[imm17] |
| −6 | CSRW | CSR[imm17] ← rs1 |
| −5 | CSRX | rd ← CSR[imm17] ; CSR[imm17] ← rs1 (atomic read-then-write) |
| −4 | ECALL | Synchronous trap: ECAUSE ← EXC_ECALL_U/H/D based on imm17[0] (Z=user syscall=0, P=hypercall=+1, N=debug trap=+2); call number is read by the handler from a7 (r17). imm17[1..16] are reserved and ignored by the decoder — assemblers emit Z for forward compatibility (see §8.4). |
Special group — R format (opcode −3 to +4)
| Opcode (val) | Mnemonic | Operation |
|---|---|---|
| −3 | IRET | Atomic exception return: PC ← EPC ; STATUS ← ESAVE (new in v0.2) |
| −2 | TSEL | R format: 3-way conditional select on FLAGS.sign (new in v0.3, see below) |
| −1 | NOP | No operation |
| 0 | HALT | Halt processor |
| +1 | TGET | rd ← trit t[val(rs2)] of rs1 (result is N, Z, or P in a T27 word) |
| +2 | TSET | rd ← rs1 with trit t[val(rs2)] set to value encoded in funct[0] (see below) |
| +3 | TSIGN | rd ← sign(rs1) : N, Z, or P (as T27: −1, 0, or +1) |
| +4 | CMP | FLAGS ← compare(rs1, rs2) — see §5.5 |
TSET encoding (opcode +2): the value to insert is determined by funct[0]:
| funct[0] | Assembler mnemonic | Inserted trit value |
|---|---|---|
| N | TSETN rd, rs1, rs2 | N (−1) |
| Z | TSETZ rd, rs1, rs2 | Z (0) |
| P | TSETP rd, rs1, rs2 | P (+1) |
rs2 provides the trit index (0–26); the value to write comes from funct, not from rs2. The bare mnemonic TSET is accepted by the assembler as an alias for TSETZ (clear a trit).
Absolute value and trit-reduce (opcode +5 to +7)
| Opcode (val) | Mnemonic | Operation |
|---|---|---|
| +5 | TABS | rd ← \|rs1\| (absolute value) |
| +6 | TMIN | rd ← minimum trit of rs1 (fold-min across all 27 trits; result is N, Z, or P as T27) |
| +7 | TMAX | rd ← maximum trit of rs1 (fold-max across all 27 trits; result is N, Z, or P as T27) |
TMIN / TMAX are trit-reduce operations: they fold across all 27 trit positions and return the extremum as a single-trit value in a T27 register.
- TMIN(x) = P if and only if all trits of x are P (all-P test).
- TMAX(x) = N if and only if all trits of x are N (all-N test).
- After a subsumption check (TIMPL result, req, caps), the pattern TMIN result followed by BGT t0, granted branches if all 27 capabilities are satisfied — no constant needed.
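The fold behaviour of TMIN and TMAX can be illustrated with a minimal sketch. It assumes words are LST-first lists of the integers −1, 0, +1; the function names are illustrative only.

```python
def tmin(word: list[int]) -> int:
    """Fold-min across all trits; +1 (P) only when every trit is P."""
    return min(word)


def tmax(word: list[int]) -> int:
    """Fold-max across all trits; -1 (N) only when every trit is N."""
    return max(word)


all_p = [1] * 27
one_unknown = [1] * 26 + [0]   # a single Z among 26 P trits
all_n = [-1] * 27
```

Here tmin(all_p) is P while tmin(one_unknown) drops to Z, which is why a positive TMIN result is enough to conclude that all 27 positions are P.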
New in v0.3: TSEL, BF, BRT3
TSEL — 3-way conditional select (opcode −2, R format)
TSEL rd, rn, rz, rp dispatches based on FLAGS.sign:
- FLAGS.sign = N → rd ← rn (rs1 field)
- FLAGS.sign = Z → rd ← rz (rs2 field)
- FLAGS.sign = P → rd ← rp (funct[0..2] field, register address)
This is the defining ternary-native data instruction. After a CMP, one instruction selects among three registers — binary requires 2 CMOVs or a branch. The R format accommodates 4 register fields: rd (destination), rs1=rn, rs2=rz, funct[0..2]=rp (3 trits at the LST end of funct, i.e. t[13..15] of the instruction word).
Example — clamp to range [lo, hi]:
CMP val, lo # FLAGS.sign: N if val < lo, Z if =, P if >
TSEL t0, lo, val, val # t0 = lo if below, val otherwise
CMP t0, hi # FLAGS.sign: N if t0 < hi, Z if =, P if >
TSEL result, t0, t0, hi # result = hi if above, t0 otherwise
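The clamp sequence above can be traced with a small model. It assumes that CMP sets FLAGS.sign to sign(rs1 − rs2), as the comments in the example indicate (the full rules are in §5.5); cmp_sign, tsel and clamp are hypothetical helper names.

```python
def cmp_sign(a: int, b: int) -> int:
    """Model of CMP: FLAGS.sign as -1 (N), 0 (Z) or +1 (P)."""
    return (a > b) - (a < b)


def tsel(flags_sign: int, rn: int, rz: int, rp: int) -> int:
    """Model of TSEL rd, rn, rz, rp: pick one operand according to FLAGS.sign."""
    return {-1: rn, 0: rz, 1: rp}[flags_sign]


def clamp(val: int, lo: int, hi: int) -> int:
    sign = cmp_sign(val, lo)           # CMP  val, lo
    t0 = tsel(sign, lo, val, val)      # TSEL t0, lo, val, val
    sign = cmp_sign(t0, hi)            # CMP  t0, hi
    return tsel(sign, t0, t0, hi)      # TSEL result, t0, t0, hi
```

With lo = 0 and hi = 10, the inputs −5, 5 and 15 map to 0, 5 and 10 respectively, using four instructions and no branch.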
BF — trit-masked branch on FLAGS (opcode −10, J format)
BF cond, offset20 branches if FLAGS.sign matches the condition mask encoded in the rs1 field [4-6]:
- t[4] = P → match if FLAGS.sign = N
- t[5] = P → match if FLAGS.sign = Z
- t[6] = P → match if FLAGS.sign = P
The branch is taken if any matching trit is set. This encodes all 6 standard comparison branches plus “always” in a single opcode:
| Assembler | rs1 mask (LST-first: t[4] t[5] t[6]) | Condition |
|---|---|---|
| BFLT | P00 | FLAGS.sign = N (less than) |
| BFEQ | 0P0 | FLAGS.sign = Z (equal) |
| BFGT | 00P | FLAGS.sign = P (greater than) |
| BFLE | PP0 | FLAGS.sign = N or Z (less or equal) |
| BFGE | 0PP | FLAGS.sign = Z or P (greater or equal) |
| BFNE | P0P | FLAGS.sign = N or P (not equal) |
The pattern CMP a, b ; BFLT label replaces SUB t0, a, b ; BLT t0, label with the same instruction count but without the temporary register t0. FLAGS are not modified by BF.
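The mask test performed by BF can be sketched as below. The mask is written LST-first as (t[4], t[5], t[6]), following the three rules listed above; bf_taken and the MASKS dictionary are illustrative names, and only mask trits equal to P are treated as set.

```python
# Condition masks from the table, as (t[4], t[5], t[6]) with 1 = P and 0 = Z.
MASKS = {
    "BFLT": (1, 0, 0),
    "BFEQ": (0, 1, 0),
    "BFGT": (0, 0, 1),
    "BFLE": (1, 1, 0),
    "BFGE": (0, 1, 1),
    "BFNE": (1, 0, 1),
}


def bf_taken(mask: tuple[int, int, int], flags_sign: int) -> bool:
    """True if the mask trit selected by FLAGS.sign (-1, 0, +1) is P."""
    position = {-1: 0, 0: 1, 1: 2}[flags_sign]   # N -> t[4], Z -> t[5], P -> t[6]
    return mask[position] == 1
```

For example, bf_taken(MASKS["BFLE"], 0) is true (equal satisfies less-or-equal), while bf_taken(MASKS["BFNE"], 0) is false.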
BRT3 — ternary three-way branch (opcode −21, B format)
BRT3 rX, off_z, off_n reads the least significant trit (LST) of register rX and dispatches:
- LST(rX) = P → fall through to PC + 1 (no branch penalty)
- LST(rX) = Z → PC ← PC + off_z
- LST(rX) = N → PC ← PC + off_n
Format B provides two 10-trit signed offsets (±29 524 each). The P-falls-through convention optimizes for the common case: loop bodies execute directly without a branch.
A while loop compiles to:
loop_start:
; evaluate condition → rX (P=true, Z=unknown, N=false)
BRT3 rX, loop_start, loop_exit
; fall-through (P) → loop body
...
JMP loop_start
loop_exit:
Two instructions of overhead; the body executes without any branch. Variants:
- while! (optimistic): set off_z = +1 → Z falls through with P.
- while? (pessimistic): set off_z = off_n → Z treated as N, exits loop.
FLAGS are not affected by BRT3.
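The three-way dispatch of BRT3 can be modelled as a next-PC function. This sketch works on integer register values and takes the offsets relative to the PC of the BRT3 instruction itself, as in the rules above; lst and brt3_next_pc are hypothetical helper names.

```python
def lst(value: int) -> int:
    """Least significant trit of a balanced ternary value, as -1, 0 or +1."""
    return (value + 1) % 3 - 1


def brt3_next_pc(pc: int, rx: int, off_z: int, off_n: int) -> int:
    """Next PC after BRT3 rX, off_z, off_n located at address pc."""
    trit = lst(rx)
    if trit == 1:            # P: fall through
        return pc + 1
    if trit == 0:            # Z: first offset
        return pc + off_z
    return pc + off_n        # N: second offset
```

Note that the dispatch depends only on t[0] of rX: a register holding 2 (written +- MST-first) has LST = N and therefore takes the off_n target.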
TFP group — R format (opcode +8 to +13)
| Opcode (val) | Mnemonic | funct | Operation |
|---|---|---|---|
| +8 | FADD | Z | rd ← rs1 +_f rs2 (T26F addition) |
| +9 | FSUB | Z | rd ← rs1 −_f rs2 (T26F subtraction) |
| +10 | FMUL | Z | rd ← rs1 ×_f rs2 (T26F multiplication) |
| +11 | FDIV | Z | rd ← rs1 ÷_f rs2 (T26F division) |
| +12 | FCMP | Z | FLAGS ← float comparison of rs1, rs2 (see §7) |
| +13 | FCVT | see below | Integer ↔ T26F conversion (see §7) |
FCVT encoding (opcode +13): the conversion mode is selected by funct[0]:
| funct[0] | Assembler mnemonic | Operation |
|---|---|---|
| Z | FICVT rd, rs1 | rd ← T26F(int(rs1)) — integer to float |
| P | FCVTI rd, rs1 | rd ← int(T26F(rs1)), round to nearest |
| N | FCVTIZ rd, rs1 | rd ← int(T26F(rs1)), round toward zero |
All TFP instructions use R format. rs2 is ignored for FCVT. The funct field is zero except for FCVT where funct[0] selects the conversion mode, following the same pattern as ADD/ADDS/ADC.
NaR propagation: if any input is NaR, the result is NaR. Additionally: ∞ − ∞ = NaR, 0 × ∞ = NaR, 0 ÷ 0 = NaR.
Division by zero: if rs2 = 0 and rs1 ≠ 0, FDIV produces ∞.
FLAGS after TFP operations (FADD, FSUB, FMUL, FDIV, FCMP) — unified scheme driven by the result’s class:
| Result class | sign | carry |
|---|---|---|
| NaR | Z | N |
| ∞ (saturated) | P | P |
| 0 (true zero) | Z | Z |
| normal | sign of result (N/P) | Z |
For FCMP, the “result” is the ordered difference rs1 −_f rs2: normal ⇒ trichotomy on sign; NaR ⇒ unordered (sign=Z, carry=N), distinct from true equality (sign=Z, carry=Z).
Free operations. These existing instructions work correctly on T26F values:
- NEG (opcode −35): trit inversion = Tekum negation (Proposition 3: θ(−t) = −θ(t))
- TABS (opcode +5): absolute value preserved since t[26] = Z is invariant under abs
- LOAD/STORE: move 27-trit words without interpretation
See §7 for the T26F format specification.
Vector group — R format (opcode +15 to +22) — new in v0.8¶
| Opcode (val) | Mnemonic | funct | Operation |
|---|---|---|---|
| +15 | VADD | funct[0]=Z | vd ← vs1 + vs2, lane-wise saturating to {N,Z,P} (see §16.3) |
| +15 | VSUB | funct[0]=N | vd ← vs1 − vs2, lane-wise saturating |
| +16 | VMUL | Z | vd ← vs1 × vs2, lane-wise (closed in {N,Z,P}; no saturation needed) |
| +17 | VLOG | see §16.4 | Lane-wise logic: TAND / TOR / TNOT / TIMPL (LMODE-following) or CONS / ACONS (LMODE-bypass) per funct[0..2] |
| +18 | VSEL | rm in funct[0..2] | vd[i] ← vs1[i] if vm[i]=N ; Z if vm[i]=Z ; vs2[i] if vm[i]=P (see §16.4) |
| +19 | VCMP | Z | vd[i] ← sign(vs1[i] − vs2[i]); produces a ternary mask in {N,Z,P} per lane |
| +20 | VRED | see §16.4 | Reduction vs1 → scalar GPR rd: SUM / SIGN / CONS / LST / MST / AND / OR per funct[0..2] |
| +21 | VPERM | see §16.4 | Lane permutation: rotate / shift / reverse / shuffle per funct[0..2] |
| +22 | VMOVE | see §16.5 | Inter-bank and intra-bank movement: GPR↔v-reg whole-word, v-reg→v-reg, broadcast / insert / extract |
All vector opcodes use R format. With the exception of VMOVE (§16.5) and VRED (whose rd is a GPR), rs1, rs2, and rd are all v-reg indices.
No new format is introduced: vectors reuse R format unchanged. The bank is entirely determined by opcode value.
See §16 for the full vector specification (datapath model, lane semantics, ternary advantages, mask conventions, reduction rules, BitNet example).
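The lane-wise rows of the table can be illustrated with a short Python model. It is a sketch under stated assumptions: each register is modelled as a list of trits (N = -1, Z = 0, P = +1), "saturating" is read as clamping the per-lane result to {-1, 0, +1}, and the function names are hypothetical. §16 remains the authoritative definition.

```python
from typing import List

Trits = List[int]  # one entry per lane, each in {-1, 0, +1}


def clamp_trit(x: int) -> int:
    """Saturate an integer to the single-trit range {-1, 0, +1}."""
    return max(-1, min(1, x))


def vadd(vs1: Trits, vs2: Trits) -> Trits:
    return [clamp_trit(a + b) for a, b in zip(vs1, vs2)]


def vsub(vs1: Trits, vs2: Trits) -> Trits:
    return [clamp_trit(a - b) for a, b in zip(vs1, vs2)]


def vmul(vs1: Trits, vs2: Trits) -> Trits:
    # The product of two trits is always a trit, so no clamp is needed.
    return [a * b for a, b in zip(vs1, vs2)]


def vcmp(vs1: Trits, vs2: Trits) -> Trits:
    # sign(a - b) per lane; for trits this equals the clamped difference.
    return [clamp_trit(a - b) for a, b in zip(vs1, vs2)]


def vsel(vm: Trits, vs1: Trits, vs2: Trits) -> Trits:
    # Mask N selects vs1, mask Z yields Z, mask P selects vs2.
    return [a if m == -1 else (0 if m == 0 else b)
            for m, a, b in zip(vm, vs1, vs2)]
```

A real register has 27 lanes; the model accepts any length so that short examples stay readable.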
Reserved extensions (opcode +23 to +40)¶
Opcodes +14 and +23 to +40 are reserved for future extensions:
- +14: reserved TFP (FSQRT, FMA…)
- +23 to +24: reserved Vector v0.9 (FMA, gather/scatter, tryte/trybble lane modes)
- +25 to +40: implementer-defined / custom
5. Balanced ternary arithmetic¶
5.1 Addition¶
Addition follows balanced base-3 tables. The carry is also balanced ternary:
a + b → sum | carry
N + N → P | N (−1 + −1 = −2 = +1 − 3 → sum=P, carry=N)
N + Z → N | Z
N + P → Z | Z (−1 + +1 = 0)
Z + Z → Z | Z
Z + P → P | Z
P + P → N | P (+1 + +1 = +2 = −1 + 3 → sum=N, carry=P)
When a carry-in is present (multi-trit addition), the full 3-input sum produces sum and carry-out in the same balanced ternary system.
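The single-trit adder, including the carry-in case, can be sketched as follows. The function name `add_trits` and the integer encoding of trits (N = -1, Z = 0, P = +1) are assumptions of this example.

```python
def add_trits(a: int, b: int, carry_in: int = 0) -> tuple:
    """Balanced ternary full adder for one trit position.

    Inputs are trits in {-1, 0, +1}. Returns (sum, carry_out), both trits,
    such that a + b + carry_in == sum + 3 * carry_out.
    """
    total = a + b + carry_in      # lies in [-3, +3]
    if total > 1:
        return total - 3, 1       # e.g. P + P = +2 -> sum N, carry P
    if total < -1:
        return total + 3, -1      # e.g. N + N = -2 -> sum P, carry N
    return total, 0
```

Chaining this function from trit 0 to trit 26 reproduces a ripple-carry 27-trit adder.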
5.2 Negation (NEG)¶
NEG inverts every trit: P→N, Z→Z, N→P. This is the “free” operation of balanced ternary — it requires no carry chain, just a trit-wise inversion.
5.3 Multiplication¶
Trit-by-trit extended product. MUL returns the low 27 trits (truncated). MULH returns the high 27 trits of the 54-trit product.
For a full 54-trit result: MUL rd_lo, rs1, rs2 then MULH rd_hi, rs1, rs2.
The trit-by-trit product table is:
a × b → product
N × N → P (−1 × −1 = +1)
N × Z → Z (−1 × 0 = 0)
N × P → N (−1 × +1 = −1)
Z × Z → Z
Z × P → Z
P × P → P (+1 × +1 = +1)
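The MUL / MULH split of the 54-trit product can be modelled on plain integers. This sketch assumes register values are represented by their integer value and that the low word is the balanced residue modulo 3²⁷; the names `wrap27`, `mul` and `mulh` are hypothetical.

```python
WORD = 3 ** 27            # number of distinct 27-trit values
HALF = (WORD - 1) // 2    # largest value representable in 27 trits


def wrap27(x: int) -> int:
    """Reduce x to the balanced 27-trit range [-HALF, +HALF]."""
    return (x + HALF) % WORD - HALF


def mul(a: int, b: int) -> int:
    """Low 27 trits of the product (what MUL returns)."""
    return wrap27(a * b)


def mulh(a: int, b: int) -> int:
    """High 27 trits of the 54-trit product (what MULH returns)."""
    product = a * b
    return (product - wrap27(product)) // WORD
```

For any two in-range operands, `mul(a, b) + mulh(a, b) * WORD` reconstructs the exact product.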
5.4 Integer division (symmetric Euclidean)¶
Setnex uses symmetric Euclidean division, the natural convention for balanced ternary:
- Quotient: q = round_to_nearest(a / b), with ties rounded toward zero.
- Remainder: r = a − q × b, satisfying |r| ≤ |b| / 2.
This minimizes the magnitude of the remainder, which aligns with the balanced ternary philosophy of keeping values centered on zero.
When |b| is odd (including all powers of 3), the tie-break case does not occur and the result is unique.
Division by zero triggers the EXC_DIV0 exception.
Comparison with C convention: C truncates toward zero, which can produce larger remainders. The symmetric convention is more natural for balanced ternary and simplifies subsequent computations on the remainder.
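A minimal integer-only sketch of the symmetric convention follows. The function name `sym_divmod` is hypothetical, and a Python `ZeroDivisionError` stands in for EXC_DIV0.

```python
def sym_divmod(a: int, b: int) -> tuple:
    """Symmetric Euclidean division: q is a/b rounded to nearest,
    ties toward zero; r = a - q*b with |r| <= |b| / 2."""
    if b == 0:
        raise ZeroDivisionError("EXC_DIV0")
    q, r = divmod(a, b)          # floor division: r/b lies in [0, 1)
    twice = 2 * abs(r)
    # Move up to q + 1 when the fractional part exceeds one half, or on an
    # exact tie when q is negative (so that the tie resolves toward zero).
    if twice > abs(b) or (twice == abs(b) and q < 0):
        q += 1
        r -= b
    return q, r
```

For example, `sym_divmod(8, 3)` gives `(3, -1)`, whereas C-style truncation gives quotient 2 and remainder 2.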
5.5 FLAGS register¶
The FLAGS CSR (address 3) is updated after every ALU instruction (ADD through TCMP) and after CMP/CMPI. Both flag trits are fully ternary (N/Z/P), not binary.
| Trit | Name | Meaning | Values |
|---|---|---|---|
| t[0] | sign | Result sign / comparison trichotomy | N = negative, Z = zero, P = positive |
| t[1] | carry | Carry-out from the most significant trit | N / Z / P = outgoing carry from the balanced ternary adder, N = underflow, P = overflow |
| t[2..26] | — | Reserved (always Z) |
After ALU instructions (ADD, SUB, MUL, etc.):
- sign = sign(result) : N if result < 0, Z if result = 0, P if result > 0.
- carry = carry-out from trit position 26 of the adder (meaningful for ADD/SUB; Z for other ALU ops). N if the true result was below −(3²⁷−1)/2 (underflow), P if above +(3²⁷−1)/2 (overflow), Z otherwise.
After CMP rs1, rs2:
- sign = sign(val(rs1) − val(rs2)) — this is the trichotomy trit: N if rs1 < rs2, Z if rs1 = rs2, P if rs1 > rs2.
- carry reflects the subtraction rs1 − rs2 internally.
After CMPI rs1, imm17: same behavior with the zero-extended immediate in place of rs2.
The trichotomy trit in FLAGS.sign is the native ternary comparison result — it encodes three outcomes (less, equal, greater) in a single trit, which would require two bits in binary.
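The flag rules above can be modelled on integers. This is a sketch, assuming register values are represented by their integer value and results wrap modulo 3²⁷; the names `add_with_flags` and `cmp_flags` are hypothetical.

```python
WORD = 3 ** 27
HALF = (WORD - 1) // 2     # 27-trit range is [-HALF, +HALF]


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def add_with_flags(a: int, b: int, carry_in: int = 0) -> tuple:
    """Return (result, sign, carry) for a 27-trit addition."""
    true_sum = a + b + carry_in
    result = (true_sum + HALF) % WORD - HALF   # wrapped 27-trit result
    carry = (true_sum - result) // WORD        # -1 underflow, +1 overflow
    return result, sign(result), carry


def cmp_flags(a: int, b: int) -> tuple:
    """Return (sign, carry) after CMP a, b.

    sign is the trichotomy of the true difference, so it stays correct
    even when the internal subtraction leaves the 27-trit range.
    """
    diff = a - b
    wrapped = (diff + HALF) % WORD - HALF
    return sign(diff), (diff - wrapped) // WORD
```

The distinction matters at the extremes: comparing the smallest value with the largest still reports N in `sign`, while `carry` records that the internal subtraction underflowed.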
5.5.1 FLAGS datapath¶
Although FLAGS is accessible as CSR address 3 (for context save/restore on exception), it is not a generic CSR — it has dedicated wiring to the execution units:
- ALU ← FLAGS.carry: t[1] of FLAGS feeds the carry-in of the balanced ternary adder. The input is gated by funct[0] on ADD/SUB: funct[0]=Z (ADD/SUB) forces carry-in = Z; funct[0]=P (ADC/SBC) routes FLAGS.carry directly to the adder.
- ALU → FLAGS.{sign, carry}: both trits are written back every cycle an ALU instruction (ADD..TCMP, CMP, CMPI) retires. Other instructions leave FLAGS unchanged.
- Branch / select unit ← FLAGS.sign: t[0] of FLAGS is routed to the branch decision logic for BF (opcode −10) and to the selection MUX for TSEL (opcode −2). Neither instruction modifies FLAGS.
The CSRR/CSRW path (through the CSR file) is the slow path used only by exception prologue/epilogue (ESAVE, IRET) and by software that needs to inspect or fabricate FLAGS explicitly. Normal use is entirely implicit through the dedicated wires above.
Implementation note: FLAGS needs only 2 trits of storage (t[0], t[1]); t[2..26] are hard-wired to Z and do not require flip-flops.
6. Configurable ternary logic (LMODE)¶
The LMODE CSR (address 2) selects the truth tables for logic instructions TAND, TOR, TNOT, and TIMPL. LMODE holds a T27 value; only trit t[0] is significant.
When LMODE=N, the STATUS.lx trit (t[2]) selects among three sub-modes. This provides 5 logic modes total using only two trits.
6.1 Logic mode map¶
| LMODE t[0] | STATUS.lx | Logic | Z means… | Notes |
|---|---|---|---|---|
| N | N | Heyting (HT) | not provable | Intuitionistic; NOT and IMPL differ from Łukasiewicz |
| N | Z | Łukasiewicz (Ł) | not yet known | Most tolerant; IMPL = order test |
| N | P | RM3 | both true and false | Paraconsistent (Routley–Meyer); IMPL differs |
| Z | (ignored) | Kleene (K) | undecidable | Default at reset; neutral; SQL 3VL |
| P | (ignored) | B3 (Bochvar) | meaningless | Z infectious: any input Z → output Z |
At reset, LMODE = 0 and STATUS = 0 → Kleene active without explicit configuration.
STATUS.lx is only significant when LMODE=N. When LMODE=Z or LMODE=P, the lx trit is ignored.
6.2 Mode descriptions¶
Kleene (LMODE=Z) — the default. AND = min, OR = max, NOT = negation, IMPL(a,b) = OR(NOT(a), b). This is the standard three-valued logic used by SQL for NULL handling. Z propagates through some operations but not all.
Łukasiewicz (LMODE=N, STATUS.lx=Z) — shares AND/OR/NOT with Kleene. Differs only on IMPL: IMPL(a,b) = min(P, −a + b + 1). The result is P if and only if a ≤ b, making TIMPL a trit-parallel subsumption test. Most tolerant of indetermination.
Heyting (LMODE=N, STATUS.lx=N) — intuitionistic logic. Shares AND/OR with Kleene/Łukasiewicz but differs on NOT: NOT_HT(Z) = N, NOT_HT(P) = N (only N maps to P). IMPL uses the Heyting algebra residual: IMPL_HT(a,b) = greatest c such that AND(a, c) ≤ b. Most conservative about the unknown state: “not provable” is treated as false.
RM3 (LMODE=N, STATUS.lx=P) — paraconsistent logic (Routley & Meyer). Shares AND/OR/NOT with Kleene/Łukasiewicz. Differs only on IMPL. Z represents “both true and false” — a contradiction that does not explode into arbitrary conclusions. Suitable for reasoning with inconsistent data.
B3 / Bochvar (LMODE=P) — Bochvar’s internal logic (1937). Z is “meaningless” and infectious: any operation with a Z input produces Z. AND, OR, NOT, and IMPL are all affected. On classical inputs (N and P only), B3 reduces to Boolean logic. Most strict: incomplete data produces no conclusion.
Hardware cost of STATUS.lx: When LMODE=Z or LMODE=P, the lx trit is ignored and the datapath is unchanged. When LMODE=N, the TNOT instruction must check STATUS.lx to select between standard NOT (−a) and Heyting NOT. This is a single AND + MUX-2→1 on the NOT output, gated by LMODE=N ∧ STATUS.lx=N. The TIMPL MUX adds one input (RM3) to the existing Łukasiewicz/Heyting selector. Total added cost: negligible.
6.3 LMODE-insensitive operations¶
CONS and ACONS are always evaluated using Kleene semantics, regardless of LMODE. They are arithmetic primitives, not logic operations.
CONS extracts the common trit (agreement → value, disagreement → Z). ACONS extracts the absent trit (agreement → Z, disagreement → the missing trit from {N, Z, P}). Together they form a complete dual pair for balanced ternary arithmetic circuits.
TCMP is also LMODE-insensitive — it is a pure arithmetic comparison.
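The CONS / ACONS pair described above fits in two one-line functions. This is a sketch with trits encoded as N = -1, Z = 0, P = +1; the observation that the missing trit equals −(a + b) follows from N + Z + P = 0 and is a derivation made for this example, not a statement from the specification.

```python
def cons(a: int, b: int) -> int:
    """Consensus: the shared value on agreement, Z (0) on disagreement."""
    return a if a == b else 0


def acons(a: int, b: int) -> int:
    """Anti-consensus: Z (0) on agreement; on disagreement, the one trit
    of {-1, 0, +1} that is neither a nor b, which equals -(a + b)."""
    return 0 if a == b else -(a + b)
```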
6.4 Complete truth tables¶
AND¶
Kleene / Łukasiewicz / Heyting / RM3 AND (identical — min):
| AND | N | Z | P |
|---|---|---|---|
| N | N | N | N |
| Z | N | Z | Z |
| P | N | Z | P |
B3 (Bochvar) AND — Z infectious:
| AND | N | Z | P |
|---|---|---|---|
| N | N | Z | N |
| Z | Z | Z | Z |
| P | N | Z | P |
OR¶
Kleene / Łukasiewicz / Heyting / RM3 OR (identical — max):
| OR | N | Z | P |
|---|---|---|---|
| N | N | Z | P |
| Z | Z | Z | P |
| P | P | P | P |
B3 (Bochvar) OR — Z infectious:
| OR | N | Z | P |
|---|---|---|---|
| N | N | Z | P |
| Z | Z | Z | Z |
| P | P | Z | P |
NOT¶
Kleene / Łukasiewicz / RM3 NOT (identical — negation):
| a | NOT(a) |
|---|---|
| N | P |
| Z | Z |
| P | N |
Heyting NOT:
| a | NOT(a) |
|---|---|
| N | P |
| Z | N |
| P | N |
B3 (Bochvar) NOT — Z infectious:
| a | NOT(a) |
|---|---|
| N | P |
| Z | Z |
| P | N |
Note: B3 NOT has the same table as Kleene NOT. The infectious property of B3 manifests in AND, OR, and IMPL, not in NOT (since NOT is unary and NOT(Z) = Z is already the “contaminated” result).
IMPL — all 5 modes are distinct¶
Kleene IMPL: IMPL(a,b) = OR(NOT(a), b) = max(−a, b)
| IMPL | N | Z | P |
|---|---|---|---|
| N | P | P | P |
| Z | Z | Z | P |
| P | N | Z | P |
Łukasiewicz IMPL: IMPL(a,b) = min(P, −a + b + 1)
| IMPL | N | Z | P |
|---|---|---|---|
| N | P | P | P |
| Z | Z | P | P |
| P | N | Z | P |
Heyting IMPL: IMPL(a,b) = greatest c such that AND(a, c) ≤ b
| IMPL | N | Z | P |
|---|---|---|---|
| N | P | P | P |
| Z | N | P | P |
| P | N | Z | P |
RM3 IMPL (paraconsistent):
| IMPL | N | Z | P |
|---|---|---|---|
| N | P | P | P |
| Z | N | Z | P |
| P | N | N | P |
B3 (Bochvar) IMPL: IMPL(a,b) = OR_B3(NOT_B3(a), b) — Z infectious:
| IMPL | N | Z | P |
|---|---|---|---|
| N | P | Z | P |
| Z | Z | Z | Z |
| P | N | Z | P |
All five IMPL tables are distinct. The discriminating cells:
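| (a, b) | Kleene | Łukasiewicz | Heyting | RM3 | B3 |
|---|---|---|---|---|---|
| (Z, N) | Z | Z | N | N | Z |
| (Z, Z) | Z | P | P | Z | Z |
| (N, Z) | P | P | P | P | Z |
| (Z, P) | P | P | P | P | Z |
| (P, N) | N | N | N | N | N |
| (P, Z) | Z | Z | Z | N | Z |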
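The five IMPL tables can be regenerated from short formulas, which is a convenient cross-check. In this sketch the Kleene, Łukasiewicz, Heyting and B3 formulas are the ones stated in this section; the closed form used for RM3 (N when a > b, otherwise max(−a, b)) is an assumption of the example that happens to reproduce the RM3 table above. Trits are encoded as N = -1, Z = 0, P = +1.

```python
N, Z, P = -1, 0, 1
TRITS = (N, Z, P)


def impl_kleene(a, b):
    return max(-a, b)


def impl_lukasiewicz(a, b):
    return min(P, -a + b + 1)


def impl_heyting(a, b):
    # Greatest c such that AND(a, c) <= b, with AND = min.
    return max(c for c in TRITS if min(a, c) <= b)


def impl_rm3(a, b):
    # Compact form matching the RM3 table (assumption of this sketch).
    return N if a > b else max(-a, b)


def impl_b3(a, b):
    # Z is infectious; otherwise classical implication.
    return Z if Z in (a, b) else max(-a, b)


def table(fn):
    """Rows indexed by a, columns by b, both in the order N, Z, P."""
    return [[fn(a, b) for b in TRITS] for a in TRITS]
```

Calling `table(impl_heyting)` returns the rows in the same N, Z, P order as the tables in this section.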
7. Ternary floating-point format (T26F)¶
Setnex uses the Tekum balanced ternary tapered precision format (Hunhold, arXiv:2512.10964) for floating-point arithmetic. The native width is 26 trits (T26F), stored in a 27-trit register with t[26] = Z.
7.1 Why 26 trits¶
Tekum requires an even trit width (the anchor midpoint pattern +-+-…+- has 2k trits). Since the Setnex word is 27 trits (odd), T26F uses the lower 26 trits and reserves t[26] = Z. This preserves full compatibility with NEG, TABS, LOAD/STORE, and integer CMP (for non-NaR values).
7.2 Special values¶
Three patterns are special (all 26 trits identical):
| Pattern (t[0..25]) | Name | Float value | 27-trit register |
|---|---|---|---|
| all Z | Zero | 0.0 | all Z |
| all P | Infinity (∞) | ∞ (unsigned) | t[26]=Z, t[0..25]=P |
| all N | Not a Result (NaR) | undefined | t[26]=Z, t[0..25]=N |
NaR is the Tekum analogue of IEEE 754 NaN. There is only one infinity (unsigned, as in the real wheel algebra: 1/0 = ∞).
7.3 Anchor and decoding¶
For a non-special T26F value, the anchor is:
anc(t) = |t₂₆| − M
where t₂₆ denotes the lower 26 trits, |·| is balanced ternary absolute value, and M is the 26-trit midpoint pattern +-+-+-+-+-+-+-+-+-+-+-+-+- (13 repetitions of +-).
The anchor trits (big-endian, MST first) are partitioned as:
[regime: 3 trits] [exponent: c trits] [fraction: 26 − c − 3 trits]
Regime (r): signed integer from the 3 most significant anchor trits. r ∈ [−13, +13], though only |r| ≤ 7 encodes a valid value. Anchor patterns with |r| > 7 are reserved encodings and decode to NaR; a conforming TFP unit must not produce them as a result of any arithmetic operation.
Exponent trit count: c = max(0, |r| − 2). Near r = 0, all trits go to fraction (maximum precision). At extreme regimes, more trits go to exponent (maximum range). This is the tapered precision property.
Exponent value:
e = int(exponent_trits) + sign(r) × BIAS[|r|]
| \|r\| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| BIAS | 0 | 1 | 2 | 4 | 10 | 28 | 82 | 244 |
| c (exponent trits) | 0 | 0 | 0 | 1 | 2 | 3 | 4 | 5 |
| p (fraction trits) | 23 | 23 | 23 | 22 | 21 | 20 | 19 | 18 |
Fraction value: f = Σ trit_i × 3^(−i) for i = 1..p, where p = 26 − c − 3 is the fraction trit count. f ∈ (−0.5, +0.5).
Decoded value:
θ(t) = sign × (1 + f) × 3^e
where sign is P (+1) if the T26F value is positive, N (−1) if negative.
7.4 Precision and range¶
At regime r = 0: c = 0, all 23 fraction trits available → precision ≈ 23 × log₁₀(3) ≈ 11 decimal digits, exponent = 0 (values near 1.0).
At |r| = 7: 18 fraction trits (≈ 8.6 decimal digits), exponent range ±(244 + (3⁵−1)/2) = ±365 powers of 3 (≈ ±174 decimal decades).
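The per-regime layout and the exponent range quoted above can be recomputed from the BIAS table. This is a sketch: the function names are hypothetical, and it covers only the field widths and exponent bounds, not a full T26F decoder.

```python
BIAS = [0, 1, 2, 4, 10, 28, 82, 244]   # indexed by |r|, from the table in §7.3


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def regime_layout(r: int) -> tuple:
    """Return (c, p, bias) for regime r: exponent trits, fraction trits, BIAS."""
    m = abs(r)
    if m > 7:
        raise ValueError("reserved encoding (decodes to NaR)")
    c = max(0, m - 2)
    p = 26 - c - 3
    return c, p, BIAS[m]


def exponent_range(r: int) -> tuple:
    """Smallest and largest exponent e reachable in regime r."""
    c, _, bias = regime_layout(r)
    span = (3 ** c - 1) // 2            # a c-trit field holds [-span, +span]
    centre = sign(r) * bias
    return centre - span, centre + span
```

Under these formulas the ranges of consecutive regimes tile the exponent axis without gaps or overlaps, from −365 at r = −7 to +365 at r = +7.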
7.5 Key properties¶
Monotonicity (Proposition 4): for non-special values, integer ordering on the raw 26-trit word corresponds to numerical ordering of the decoded float. The existing CMP instruction gives correct ordering for normal T26F values — but FCMP is required for proper NaR handling (NaR = all-N would otherwise compare as less than everything instead of unordered).
Free negation (Proposition 3): θ(−t) = −θ(t). Trit-by-trit inversion (NEG) negates the float. Since flip(Z) = Z, this works on the full 27-trit register with t[26] = Z.
Truncation is rounding: reducing precision by discarding least-significant fraction trits is equivalent to rounding, with no carry propagation needed.
7.6 T26F in a 27-trit register¶
The convention t[26] = Z ensures:
- T26F values occupy the integer range [−(3²⁶−1)/2, +(3²⁶−1)/2], a strict subset of the T27 range.
- LOAD/STORE transfer T26F values correctly (full 27-trit word move).
- NEG and TABS operate correctly (Z is invariant under trit inversion and absolute value).
- Software can distinguish integer and float values by testing t[26] (Z → possible float, non-Z → integer exceeding T26 range).
Integer operations on T26F values produce undefined float results. TFP instructions on non-T26F integer values produce undefined results. Type discipline is the programmer’s responsibility.
8. Exception handling¶
8.1 Exception causes¶
| ECAUSE code | Name | Trigger | ETVAL contents |
|---|---|---|---|
| −13 | EXC_DIV0 | Division by zero | 0 (unused) |
| −12 | EXC_ALIGN | Misaligned memory access (future) | Faulting effective address |
| −11 | EXC_FAULT | Invalid address | Faulting effective address |
| −10 | EXC_ILLEGAL | Undefined opcode, reserved instruction, or kernel-only CSR accessed from user mode | Raw 27-trit instruction word |
| −9 | EXC_PERM_R | MPU denies a LOAD (new in v0.7, see §9) | Faulting effective address |
| −8 | EXC_PERM_W | MPU denies a STORE (new in v0.7) | Faulting effective address |
| −7 | EXC_PERM_X | MPU denies an instruction fetch (new in v0.7) | Faulting PC (same as EPC) |
| 0 | EXC_ECALL_U | System call, user flavor (ECALL with imm17[0] = Z) | 0 (syscall number is in a7, see §8.4) |
| +1 | EXC_ECALL_H | Hypercall (ECALL with imm17[0] = P) | 0 (hypercall number is in a7, see §8.4) |
| +2 | EXC_ECALL_D | Debug trap (ECALL with imm17[0] = N) | 0 (debug reason is in a7, see §8.4) |
| +10 | EXC_OVERFLOW | Reserved; not raised in v0.7 (awaits a STATUS enable trit in a future revision) | 0 (unused) |
| +20 | IRQ_0 | Asynchronous interrupt, line 0 (new in v0.7, see §10) | 0 (unused) |
| +21 | IRQ_1 | Asynchronous interrupt, line 1 | 0 (unused) |
| +22 | IRQ_2 | Asynchronous interrupt, line 2 | 0 (unused) |
| +23 | IRQ_3 | Asynchronous interrupt, line 3 | 0 (unused) |
| +24 | IRQ_4 | Asynchronous interrupt, line 4 | 0 (unused) |
| +25 | IRQ_5 | Asynchronous interrupt, line 5 | 0 (unused) |
| +26 | IRQ_6 | Asynchronous interrupt, line 6 | 0 (unused) |
| +27 | IRQ_7 | Asynchronous interrupt, line 7 | 0 (unused) |
| +28 | IRQ_8 | Asynchronous interrupt, line 8 | 0 (unused) |
The three ECALL flavors share a single handler entry point (EVEC). Negative codes denote involuntary faults; positive codes in [0, +10] denote deliberate synchronous traps; positive codes in [+20, +28] denote asynchronous interrupts (§10). Code 0 is preserved for EXC_ECALL_U so that a v0.5 binary (whose imm17 is always Z) enters the handler with ECAUSE = 0 exactly as before. See §8.4 for the flavor-tag encoding.
Async interrupts (IRQ_0..IRQ_8) reuse the exception entry path of §8.2 unchanged — they differ from synchronous exceptions only in trigger (external line) and in the saved EPC, which points at the next instruction the CPU would have executed rather than a faulting one. See §10 for dispatch rules.
A triple-fault condition (a synchronous fault raised while STATUS.depth = N, i.e. both exception frames already in use) does not produce an ECAUSE code — it is not observable by any handler. Instead it triggers an immediate machine-check reset (§8.2).
8.2 Exception entry sequence¶
The bank selected for the save is determined by STATUS.depth at the moment the fault is taken:
| STATUS.depth on entry | Save target | New depth |
|---|---|---|
| Z (0 frames active) | main bank (EPC, ESAVE, ECAUSE, ETVAL) | P |
| P (1 frame active) | frame 2 bank (EPC2, ESAVE2, ECAUSE2, ETVAL2) | N |
| N (2 frames active) | — | — (machine-check reset; see below) |
Case 1 — outer entry (depth = Z):
1. ESAVE ← STATUS (save current processor status, including depth = Z)
2. EPC ← PC (address of the faulting instruction)
3. ECAUSE ← code
4. ETVAL ← trap value (per §8.1; 0 if unused)
5. STATUS.mode ← N (switch to kernel mode)
6. STATUS.ie ← N (disable interrupts)
7. STATUS.depth ← P (one frame now active)
8. PC ← EVEC
Case 2 — nested entry (depth = P):
1. ESAVE2 ← STATUS (save current status, including depth = P)
2. EPC2 ← PC
3. ECAUSE2 ← code
4. ETVAL2 ← trap value
5. STATUS.depth ← N (two frames now active); mode and ie are unchanged (already N)
6. PC ← EVEC
Steps 1–8 (case 1) or 1–6 (case 2) are performed atomically: no further exception may be taken between them. The main-bank CSRs are untouched by a case-2 entry, so the outer handler’s return context is preserved.
Case 3 — triple fault (depth = N). If a synchronous fault occurs while both frames are already in use, no save is possible. The processor performs a machine-check reset: all CSRs are cleared to zero and PC ← 0, as if from power-on. No handler is invoked; ECAUSE is not written. Correct kernel code reaches this state only under a genuine hardware defect or runaway condition; a two-level-deep fault chain from user code alone is handled cleanly by case 2.
The handler reads EPC / EPC2, ECAUSE / ECAUSE2, and — when relevant — ETVAL / ETVAL2 via CSRR (selecting the bank by inspecting STATUS.depth on entry), processes the exception, then returns via IRET.
Rationale — why the outer handler is not re-entered on a nested fault.
EVEC is shared between both frames, so the nested entry jumps to the same vector. The handler decides, by reading STATUS.depth first, whether it is handling an outer or a nested frame and uses the corresponding CSR bank. Duplicating EVEC would have given each frame its own vector at no functional gain, since the depth is already visible in STATUS.
8.3 Exception return (IRET)¶
IRET restores from the bank selected by the current STATUS.depth:
| STATUS.depth on IRET | Restore source | Effect on depth |
|---|---|---|
| P (1 frame active) | main bank (EPC, ESAVE) | → Z (via STATUS ← ESAVE) |
| N (2 frames active) | frame 2 bank (EPC2, ESAVE2) | → P (via STATUS ← ESAVE2) |
| Z (no frame active) | — | undefined (illegal IRET — reserved for future trap) |
Case 1 — return from outer frame (depth = P):
1. PC ← EPC
2. STATUS ← ESAVE
Case 2 — return from nested frame (depth = N):
1. PC ← EPC2
2. STATUS ← ESAVE2
Both writes are performed atomically — no further exception can be taken between them. This prevents the STATUS/PC corruption that was possible in v0.1’s two-instruction sequence.
Because ESAVE / ESAVE2 hold the full prior STATUS (with the depth trit captured as Z / P respectively at entry time), the single STATUS ← ESAVE* write simultaneously restores mode, ie, lx, and decrements depth to its pre-entry value. No separate depth-decrement step is needed.
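The two-frame entry and return sequences of §8.2 and §8.3 can be exercised with a small state model. This is a sketch under simplifying assumptions: STATUS is modelled as a dictionary with only the mode, ie and depth trits (the real register is a 27-trit word), the CPU is a plain dictionary, and `MachineCheckReset`, `exception_entry` and `iret` are hypothetical names.

```python
N, Z, P = -1, 0, 1


class MachineCheckReset(Exception):
    """Stands in for the triple-fault reset of §8.2, case 3."""


def exception_entry(cpu: dict, code: int, tval: int = 0) -> None:
    depth = cpu["STATUS"]["depth"]
    if depth == N:                       # both frames already in use
        raise MachineCheckReset()
    bank = "" if depth == Z else "2"     # main bank or frame-2 bank
    cpu["ESAVE" + bank] = dict(cpu["STATUS"])
    cpu["EPC" + bank] = cpu["PC"]
    cpu["ECAUSE" + bank] = code
    cpu["ETVAL" + bank] = tval
    if depth == Z:
        cpu["STATUS"]["mode"] = N        # kernel mode
        cpu["STATUS"]["ie"] = N          # interrupts disabled
        cpu["STATUS"]["depth"] = P
    else:
        cpu["STATUS"]["depth"] = N
    cpu["PC"] = cpu["EVEC"]


def iret(cpu: dict) -> None:
    depth = cpu["STATUS"]["depth"]
    if depth == Z:
        raise ValueError("IRET with depth = Z is reserved")
    bank = "" if depth == P else "2"
    cpu["PC"] = cpu["EPC" + bank]
    cpu["STATUS"] = dict(cpu["ESAVE" + bank])   # also restores depth
```

Because the saved STATUS carries the pre-entry depth, the model needs no explicit decrement in `iret`, mirroring the paragraph above.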
IRET uses opcode −3 (R format). The rd, rs1, rs2, and funct fields are ignored and should be set to zero. Executing IRET while depth = Z is reserved; v0.6 leaves the behavior undefined, and a future revision may assign EXC_ILLEGAL.
8.4 Syscall dispatch convention¶
The ECALL instruction carries a flavor tag in imm17[0] (the least-significant trit of the I-format immediate). The decoder maps the tag to one of three ECAUSE codes; all three flavors share a single EVEC.
| imm17[0] | Flavor | ECAUSE on entry | Intended use |
|---|---|---|---|
| Z | user syscall | EXC_ECALL_U = 0 | Ordinary user→kernel transition |
| P | hypercall | EXC_ECALL_H = +1 | Guest kernel → hypervisor transition (when a hypervisor is present) |
| N | debug trap | EXC_ECALL_D = +2 | Breakpoint / debugger synchronous trap |
imm17[1..16] are reserved. The decoder ignores them — only imm17[0] carries the flavor tag, so non-zero upper trits are silently tolerated (this keeps the decoder branch-free). Assemblers emit Z for forward compatibility and encode the flavor via dedicated mnemonics (ECALL, HCALL, DBGBRK — see §11.4) rather than through explicit immediate operands.
Backward compatibility. A v0.5 binary encodes every ECALL with imm17 = Z, which maps to imm17[0] = Z → EXC_ECALL_U = 0. This is the exact cause code v0.5 produced, so an unmodified v0.6 handler that still dispatches only on ECAUSE = 0 continues to work.
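The flavor decoding reduces to a lookup on the least significant trit of the immediate. The sketch below models imm17 by its integer value; the function names are hypothetical.

```python
def least_significant_trit(value: int) -> int:
    """Least significant balanced-ternary trit of an integer (-1, 0 or +1)."""
    r = value % 3
    return r - 3 if r == 2 else r


def ecall_cause(imm17: int) -> int:
    """ECAUSE code selected by the flavor tag in imm17[0]."""
    tag = least_significant_trit(imm17)
    # Z -> EXC_ECALL_U (0), P -> EXC_ECALL_H (+1), N -> EXC_ECALL_D (+2)
    return {0: 0, 1: 1, -1: 2}[tag]
```

Upper trits do not influence the result: an immediate of 3 (trits P, Z) still decodes as a user syscall.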
User-side register contract at the point of ECALL (all flavors):
| Register | Role |
|---|---|
| a7 (r17) | Call number (syscall / hypercall / debug reason, per flavor) |
| a0–a6 (r10–r16) | Arguments 1–7 |
| a0 (r10) | Return value, written by the handler, visible after resume |
Kernel-side handler contract:
- On entry, ECAUSE ∈ {0, +1, +2} identifies the flavor; EPC points at the ECALL instruction itself.
- Dispatch on ECAUSE to the appropriate table (syscall / hypercall / debug). No re-fetch of EPC and no decoding of imm17 is required.
- Read a7 for the call number within the selected table; read a0–a6 for arguments; execute; write the result to a0.
- Advance EPC by one word so the resumed program continues after ECALL, then IRET:
CSRR t0, EPC
ADDI t0, t0, 1
CSRW EPC, t0
IRET
For a nested flavor trap (e.g. a debug breakpoint triggered while a syscall handler is running), the same epilogue applies to EPC2 instead — the handler uses the frame-2 bank (see §8.3).
Register preservation across ECALL is an OS-level policy, not an ISA mandate. The recommended baseline is: the kernel preserves all callee-saved registers (s0–s9, sp, ra) and the argument registers a1–a6 (regardless of how many the specific call actually reads); it overwrites a0 with the return value and is free to clobber a7 and the temporaries t0–t3.
Rationale: why a7 rather than imm17 for the call number. Placing the call number in a register lets it be computed at runtime (libc wrappers, indirect dispatch tables) and keeps ECALL a pure trap with no decoded payload beyond the flavor tag. The handler dispatches on a value it already has in a GPR, avoiding a re-fetch and re-decode of the instruction at EPC.
Rationale: why a separate flavor tag at all. Hypercalls and debug traps have different trust and privilege semantics from ordinary syscalls; routing them through distinct ECAUSE values lets the handler pick the right dispatch table in one step, without mixing them in a single numeric space shared with OS syscall numbers (which would force each OS to carve out reserved ranges for hypercalls/debug).
8.5 Exception trap value (ETVAL, ETVAL2)¶
ETVAL (CSR address 9, MST-first +00) and its frame-2 counterpart ETVAL2 (CSR address 13, MST-first +++) are T27 words written by the processor on every exception entry (§8.2). They carry exception-specific context that does not fit into the 27 values available in ECAUSE / ECAUSE2. The per-exception semantics is given by the ETVAL contents column of §8.1.
| Exception | ETVAL (or ETVAL2) contents |
|---|---|
| EXC_FAULT, EXC_ALIGN | The effective address computed by the faulting LOAD / STORE (i.e. rs1 + imm17 of the offending access) |
| EXC_ILLEGAL | The raw 27-trit instruction word the decoder rejected |
| EXC_ECALL_U, EXC_ECALL_H, EXC_ECALL_D | 0 (the call number is passed in a7, see §8.4) |
| EXC_DIV0, EXC_OVERFLOW | 0 (no auxiliary value needed) |
ETVAL is written on a case-1 (outer) entry and ETVAL2 on a case-2 (nested) entry; a nested fault never overwrites the outer handler’s ETVAL. An outer handler can therefore safely read ETVAL once at the top and rely on that value remaining valid across any depth-1 nested fault it may subsequently incur; a nested handler reads ETVAL2 by the same rule.
Rationale: dedicated CSR instead of re-decoding EPC. Reconstructing the faulting address or the offending opcode from EPC requires an instruction fetch, which may itself fault (self-modifying code, paged-out text section). Exposing the value directly in a CSR decouples the handler from whatever state the instruction stream is in. Matches the RISC-V mtval design.
9. Memory Protection Unit (MPU)¶
9.1 Role¶
The Memory Protection Unit controls which memory addresses are accessible from each privilege mode and for which access type (read, write, instruction fetch). It converts the §2.3 address-space convention into hardware-enforceable boundaries: a user-mode LOAD or STORE targeting an address without user permission raises EXC_PERM_R or EXC_PERM_W, and an instruction fetch without execute permission raises EXC_PERM_X (§8.1).
The MPU performs no address translation — the address presented by the pipeline is the address delivered to memory. It decides, on each access, whether that access may proceed. This is the “PMP” style of RISC-V, not the “MMU” style.
9.2 Region model¶
The MPU holds nine region descriptors, indexed 0..8. Each descriptor specifies:
- A base address (T27 word address).
- A size exponent n: the region covers 3ⁿ consecutive 27-trit words starting at base. Natural alignment required (base mod 3ⁿ = 0); see §9.7.
- Three permission trits (R, W, X), each carrying a ternary privilege level: N = no access, Z = kernel only, P = user + kernel.
- A valid flag.
Regions are naturally aligned power-of-three (NAPO3) blocks. The descriptor bank is internal processor state, not memory-mapped; software manipulates it through three CSRs (§9.3).
Why no “whole address space” region. The address space is symmetric ([−(3²⁷−1)/2, +(3²⁷−1)/2]) while a NAPO3 region is a one-sided half-open interval [base, base + 3ⁿ); there is no single well-aligned (base, n) pair that covers both halves without wrap. To blanket the full range, either rely on the kernel-mode no-match default (§9.5 step 4), or program two top-size regions, one anchored at a negative base and one at a non-negative base.
9.3 Indirect CSR access¶
Three CSRs form the programming interface:
| CSR | Addr | Role |
|---|---|---|
| MPU_SELECT | −1 | Index of the region targeted by subsequent MPU_BASE / MPU_CFG operations. Legal values: 0..8. |
| MPU_BASE | −2 | Reads or writes the base field of the selected region. |
| MPU_CFG | −3 | Reads or writes the config field (size + permissions + valid) of the selected region; see §9.4. |
Programming region i is therefore:
LI t0, i
CSRW MPU_SELECT, t0
CSRW MPU_BASE, base_value
CSRW MPU_CFG, cfg_value
All three CSRs are kernel-only: a user-mode CSRR / CSRW targeting any of them raises EXC_ILLEGAL. Writing a value outside [0, 8] to MPU_SELECT, or a malformed MPU_CFG (reserved trits non-zero, size out of range), also raises EXC_ILLEGAL, with the instruction word in ETVAL.
Rationale — indirection rather than direct mapping. A direct mapping would require 18 CSR slots for 9 × (base, cfg), which does not fit in the 3-trit CSR address space. Indirection keeps the CSR cost constant and lets the region count grow in a future revision without touching the CSR map.
9.4 MPU_CFG layout¶
MPU_CFG packs the non-base fields into a single 27-trit word:
| Trit(s) | Field | Values |
|---|---|---|
| t[0] | perm_R | N = no read, Z = kernel read only, P = user + kernel read |
| t[1] | perm_W | N = no write, Z = kernel write only, P = user + kernel write |
| t[2] | perm_X | N = no execute, Z = kernel execute only, P = user + kernel execute |
| t[3..6] | size | Size exponent n (4-trit T4 integer); region covers 3ⁿ words. Legal range: n ∈ [0, 26]. n = 0 → one word; n = 26 → 3²⁶ consecutive words (one third of the 3²⁷-word address space). Values n < 0 or n ≥ 27 written via CSRW MPU_CFG raise EXC_ILLEGAL. |
| t[7] | valid | P = active, N = inactive, Z = reserved (descriptor treated as inactive) |
| t[8..26] | — | Reserved (must be Z on write; read as 0) |
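The layout above can be packed and unpacked with ordinary integer arithmetic. This sketch assumes the word value is Σ t[i]·3ⁱ and that t[3] is the least significant trit of the size field; the function names are hypothetical, and a Python `ValueError` stands in for EXC_ILLEGAL.

```python
def to_trits(value: int, count: int) -> list:
    """Balanced-ternary trits of value, least significant first."""
    trits = []
    for _ in range(count):
        r = value % 3
        t = r - 3 if r == 2 else r
        trits.append(t)
        value = (value - t) // 3
    return trits


def pack_mpu_cfg(perm_r: int, perm_w: int, perm_x: int,
                 size: int, valid: int) -> int:
    if not 0 <= size <= 26:
        raise ValueError("size exponent out of range")
    # t[0..2] permissions, t[3..6] size (weight 3**3), t[7] valid (weight 3**7)
    return perm_r + 3 * perm_w + 9 * perm_x + 27 * size + 3 ** 7 * valid


def unpack_mpu_cfg(word: int) -> dict:
    t = to_trits(word, 27)
    if any(t[8:]):
        raise ValueError("reserved trits must be Z")
    size = sum(trit * 3 ** j for j, trit in enumerate(t[3:7]))
    if not 0 <= size <= 26:
        raise ValueError("size exponent out of range")
    return {"perm_r": t[0], "perm_w": t[1], "perm_x": t[2],
            "size": size, "valid": t[7]}
```

For instance, a user-readable, user-writable, non-executable region of 3⁵ words with valid = P packs to the integer 2317.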
9.5 Matching and permission check¶
On every memory access — LOAD, STORE, or instruction fetch — with effective address A in current mode M, the MPU performs:
1. Scan the 9 regions. Region i matches if valid(i) = P and A ∈ [base(i), base(i) + 3^size(i)).
2. Select. If multiple regions match, the one with the lowest index wins (region 0 has highest priority).
3. Evaluate the permission trit for the access type (perm_R for LOAD, perm_W for STORE, perm_X for fetch) against mode M:
| Permission trit | Kernel (mode = N) | User (mode = P) |
|---|---|---|
| N (none) | Deny (fault) | Deny (fault) |
| Z (kernel only) | Allow | Deny (fault) |
| P (user + kernel) | Allow | Allow |
4. No-match default:
   - Kernel mode: allow (default-permit).
   - User mode: deny (default-deny), raising EXC_PERM_*.
5. On deny, the access is suppressed and one of EXC_PERM_R, EXC_PERM_W, EXC_PERM_X is raised via the §8.2 entry sequence. ETVAL holds the faulting effective address (or the faulting PC, identical to EPC, for EXC_PERM_X).
Rationale — asymmetric defaults. In kernel mode the MPU acts as a blacklist (poison regions fault even for kernel); in user mode it acts as a whitelist (user can only reach addresses explicitly granted). This mirrors RISC-V PMP semantics and fits the v0.6 model where the kernel is trusted by default and user code must be explicitly admitted.
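The matching and permission procedure of this section can be sketched as one function. The region representation (a list of dictionaries in index order) and the name `mpu_check` are assumptions of this example; the function returns True when the access may proceed and False when an EXC_PERM_* fault would be raised.

```python
N, Z, P = -1, 0, 1


def mpu_check(regions: list, addr: int, access: str, user_mode: bool) -> bool:
    """access is "R", "W" or "X"; regions are listed from index 0 upward."""
    for region in regions:                      # lowest index wins
        if region["valid"] != P:
            continue
        base, n = region["base"], region["size"]
        if base <= addr < base + 3 ** n:
            perm = region["perm"][access]
            if perm == P:
                return True                     # user + kernel
            if perm == Z:
                return not user_mode            # kernel only
            return False                        # N: denied for everyone
    # No region matched: default-permit for kernel, default-deny for user.
    return not user_mode
```

Placing an all-N region ahead of a permissive one reproduces the poison-region behaviour: the first match decides, even for kernel accesses.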
9.6 Poison regions and defensive use¶
A region at low index (0 or 1) with perm_R = perm_W = perm_X = N and valid = P is a poison region: any access — even kernel — faults. Typical uses:
- DMA buffers owned by a peripheral (CPU accesses race the device).
- Guard words between legitimate regions to catch buffer overruns with a precise fault.
- Hardware-reserved windows (fuses, debug taps, memory-mapped configuration).
- Firmware regions temporarily locked during self-update.
Because lowest index wins (§9.5 step 2), a poison region at index 0 overrides any permissive region at index 1+, even for the kernel.
9.7 NAPO3 alignment¶
Natural alignment requires base(i) mod 3^size(i) = 0. The hardware does not check alignment at CSRW MPU_BASE time. A misaligned base is syntactically legal; the containment test in §9.5 step 1 uses the literal integer interval [base, base + 3^n) and may match addresses the author did not intend. Alignment is the software's responsibility: typically an assembler or linker computes region bases as multiples of 3ⁿ for the chosen n, and a misaligned base is a programming error, not a legitimate runtime state.
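A toolchain-side check for the alignment condition might look like the following sketch. The function names are hypothetical; the only facts taken from the text are the condition base mod 3ⁿ = 0 and the literal interval [base, base + 3ⁿ).

```python
def is_naturally_aligned(base: int, n: int) -> bool:
    """True if base is a multiple of 3**n (the natural-alignment condition)."""
    return base % (3 ** n) == 0

def region_interval(base: int, n: int) -> range:
    """Literal interval [base, base + 3**n) used by the containment test."""
    return range(base, base + 3 ** n)
```

For n = 2, a base of 9 is aligned and covers addresses 9..17. A base of 10 is misaligned and covers 10..18, so address 18, which belongs to the next aligned 9-word block, is matched as well.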
9.8 Reset state¶
At reset, every region descriptor has valid = N. The MPU is effectively disabled: no regions match, kernel-mode accesses proceed under the no-match default of §9.5 step 4, and the boot firmware (which runs in kernel mode per §2.4) executes unimpeded. The boot firmware is expected to program whichever regions the system needs before switching to user mode.
Typical boot sequence:
- Kernel programs one or more regions granting user-mode perm_X = P over the user text segment, perm_R = P / perm_W = P over user data and stack.
- Kernel programs defensive regions (poison windows, kernel-only overlays) if desired.
- Kernel switches STATUS.mode to P via an IRET that restores a user-mode ESAVE.
9.9 Interactions¶
- Nested exceptions (§8.2): an MPU fault taken while depth = P uses the frame-2 bank normally. An MPU fault while depth = N is a triple fault → machine-check reset. This matters when the outer handler itself touches an address that lacks kernel permission, for example by dereferencing a user-supplied pointer that happens to land in a kernel-only or poison region.
- ETVAL: always the faulting address for EXC_PERM_R / EXC_PERM_W; the faulting PC (= EPC) for EXC_PERM_X. The access type is carried by the ECAUSE code, not by ETVAL.
- No new opcode: all MPU programming uses existing CSRR / CSRW.
10. Asynchronous interrupts¶
10.1 Role¶
Asynchronous interrupts deliver external events (timer tick, peripheral completion, incoming byte) to the CPU independently of the instruction stream. Unlike a synchronous exception (§8), an interrupt is not caused by the instruction in flight; it arrives between instructions, driven by a signal outside the pipeline.
Interrupts are Setnex’s only source of preemption: without them, a user program that does not voluntarily ECALL keeps the CPU forever. A timer IRQ lets the kernel reclaim the CPU at a bounded cadence — the foundation of preemptive scheduling.
10.2 Controller model¶
The controller presents nine IRQ lines, indexed k ∈ {0..8}. For each line it maintains three software-visible state elements:
- Pending: the external source has requested service. The controller samples the external line each cycle and sets IPENDING[k] = P while the line is asserted, Z while idle. Software may also write it (see §10.3, §10.6).
  - Level-sensitive devices hold the line asserted until the handler acknowledges (typically by reading/writing an MMIO status register on the peripheral); the line then deasserts and IPENDING[k] falls back to Z on the next sample.
  - Edge-triggered devices must latch their “event pending” bit inside the peripheral and hold the IRQ line asserted from the edge event until the handler acks. The IRQ controller itself does not latch edges, so a pulse shorter than one sample interval can be missed. This pushes edge-latching into the peripheral, where it belongs (the peripheral knows when its event is served; the controller does not).
- Enable: per-line mask.
- Priority: per-line priority level.
A line is eligible when pending and enabled. The controller selects the eligible line with the highest priority; ties are broken by lowest line number.
10.3 CSR layouts¶
IPENDING (addr −4) — pending bitvector
| Trit | Meaning |
|---|---|
| t[k] for k ∈ [0, 8] | Line k: P = pending, Z = idle, N = reserved |
| t[9..26] | Reserved (read as 0) |
Writing Z to a trit clears a pending state that was set by software or by a now-deasserted external source; if the external source is still asserting, the controller re-raises the trit on the next cycle. Writing P to a trit whose external source is not asserting synthesizes a software-generated interrupt (implementation-defined latency). Writing N has no effect — the trit retains its previous value. The N value is reserved for a future semantic (e.g. sticky / edge-latched pending) and kept unassigned to preserve forward compatibility.
IENABLE (addr −5) — enable mask
Same layout as IPENDING: t[k] = P enables line k, Z disables it. Writing N has no effect — the trit retains its previous value (reserved, as for IPENDING).
IPRIORITY (addr −6) — per-line priority
Nine 3-trit fields pack exactly into 27 trits:
| Trit field | Line | Priority |
|---|---|---|
| t[0..2] | 0 | T3 integer ∈ [−13, +13]; higher = higher priority |
| t[3..5] | 1 | “ |
| t[6..8] | 2 | “ |
| t[9..11] | 3 | “ |
| t[12..14] | 4 | “ |
| t[15..17] | 5 | “ |
| t[18..20] | 6 | “ |
| t[21..23] | 7 | “ |
| t[24..26] | 8 | “ |
At reset all priority fields are 0; arbitration then reduces to lowest-line-number wins.
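Reading one line's priority out of the packed CSR can be sketched as follows. The helper names are hypothetical, and the sketch assumes the CSR word is modeled as a list of 27 integers in {−1, 0, +1}, LST-first, with each 3-trit field also LST-first.

```python
def t3_value(field):
    """Value of a 3-trit field given LST-first as integers in {-1, 0, +1}."""
    return field[0] + 3 * field[1] + 9 * field[2]

def line_priority(ipriority, k):
    """Priority of line k: the T3 integer stored in t[3k..3k+2]."""
    return t3_value(ipriority[3 * k : 3 * k + 3])
```

An all-zero word yields priority 0 for every line, which matches the reset behavior described above.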
10.4 Dispatch rule¶
Between instructions, the CPU evaluates:
if STATUS.ie = P
and STATUS.depth ≠ N (no save slot available → no async entry)
and ∃ k : IPENDING[k] = P AND IENABLE[k] = P:
k* ← argmax over eligible k of IPRIORITY[k], lowest k on ties
take interrupt IRQ_k* (see §10.5)
The check happens at each instruction-commit boundary. An interrupt cannot preempt a single instruction mid-way; it waits for the next commit point.
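The dispatch rule can be modeled in a few lines of Python. This is a sketch under assumptions: `select_irq` is a hypothetical name, trits are modeled as −1/0/+1, and `priority` holds the nine decoded T3 values rather than the packed CSR word.

```python
from typing import Optional, Sequence

N, Z, P = -1, 0, 1

def select_irq(ie: int, depth: int, pending: Sequence[int],
               enable: Sequence[int], priority: Sequence[int]) -> Optional[int]:
    """Return the winning line k*, or None if no interrupt is taken."""
    if ie != P or depth == N:
        return None                      # globally masked, or no save slot left
    eligible = [k for k in range(9) if pending[k] == P and enable[k] == P]
    if not eligible:
        return None
    # Highest priority wins; ties go to the lowest line number.
    return min(eligible, key=lambda k: (-priority[k], k))
```

Note that the maximum is taken over eligible lines only: a high-priority line that is disabled or idle never shadows a lower-priority pending line.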
10.5 Entry, handler, return¶
Entry reuses §8.2 verbatim. For a case-1 entry (user code running, depth = Z, line k* wins arbitration):
1. ESAVE ← STATUS
2. EPC ← PC_next (address of the next instruction that would have executed, not a faulting one)
3. ECAUSE ← +20 + k*
4. ETVAL ← 0
5. STATUS.mode ← N, STATUS.ie ← N, STATUS.depth ← P
6. PC ← EVEC
A case-2 (nested) entry uses the frame-2 bank (EPC2, ESAVE2, ECAUSE2, ETVAL2) as in §8.2.
Handler boilerplate. The handler must select the correct exception bank based on STATUS.depth before reading ECAUSE / ECAUSE2:
LI t2, 3 ; trit index for depth
CSRR t0, STATUS
TGET t1, t0, t2 ; LST(t1) = depth trit (P = outer, N = nested)
BRT3 t1, impossible, nested ; P → fall-through (outer frame),
; Z → impossible (no frame active),
; N → nested (frame 2)
; --- outer frame (depth = P): cause/data in main bank ---
CSRR t0, ECAUSE
JMP dispatch
nested:
; --- nested frame (depth = N): cause/data in frame-2 bank ---
CSRR t0, ECAUSE2
dispatch:
ADDI t1, t0, -20 ; t1 = line number if this is an IRQ
; ... dispatch on t1 to per-line service routine ...
; service the peripheral, quiesce the line
IRET
impossible:
; depth = Z inside a handler is a spec violation — halt for diagnosis
HALT
IRET restores STATUS (which brings ie back to its pre-entry value, typically P) and jumps to the bank-appropriate EPC (see §8.3) — the instruction that was pending when the interrupt fired.
The same depth-select prologue applies to every exception handler, not only IRQ: a synchronous-fault handler that reads ETVAL / ECAUSE must branch through the same STATUS.depth test before selecting the main or frame-2 bank. Real kernels typically factor it into a single shared trampoline.
10.6 Acknowledgment¶
The handler must quiesce the source before IRET, typically by accessing a status or completion register on the peripheral. If the external line remains asserted on return, the controller re-raises IPENDING[k] and the interrupt re-fires immediately after ie goes back to P. Software may also clear IPENDING[k] directly by CSRW IPENDING with Z at position k, but a still-asserting external source will re-set the trit on the next cycle.
10.7 Masking¶
Interrupt delivery is gated by three independent mechanisms, all of which must allow it:
- Global enable: STATUS.ie = P. Any other value blocks all IRQs. Set to N automatically on any exception entry (§8.2).
- Per-line enable: IENABLE[k] = P.
- Frame-depth guard: STATUS.depth ≠ N. When both frames are in use, no save slot is available and async entry is suppressed until IRET frees a frame. This is automatic and cannot be overridden.
A synchronous exception entry sets ie ← N per v0.6 §8.2, so an IRQ cannot preempt a fresh sync handler. A handler that wants low-latency nesting (IRQs accepted during a long syscall) must re-enable ie explicitly after saving whatever state it cares about — a subsequent IRQ would then push to frame 2.
10.8 Priority arbitration¶
Priorities are software-controlled 3-trit fields per line ([−13, +13]). The controller picks the eligible line with the maximum priority value; ties go to the lowest line number. Negative priorities are legal — they rank below default-zero lines without fully disabling them.
Rationale — software-controlled priority rather than line-hardwired. Hardwiring priority to the line number (line 0 always wins) is simpler but gives no knob to re-rank the timer below an urgent disk IRQ. A per-line 3-trit field costs exactly one CSR for all 9 lines and subsumes the hardwired case (all priorities = 0 → line number arbitrates).
11. Calling convention (ABI)¶
11.1 Argument passing¶
- Arguments 1–7: a0–a6 (r10–r16)
- Additional arguments: pushed on the stack (decreasing addresses from sp)
- Return values: a0 (single), a0 + a1 (pair)
- a7 (r17) carries the syscall number on ECALL (§8.4). Outside the syscall path it is an ordinary caller-saved register, available as a scratch temporary or as an 8th argument when caller and callee agree on such an extension.
11.2 Callee-saved registers¶
s0–s9 (r8–r9, r18–r25), sp (r2), ra (r1).
Caller-saved (may be clobbered across a call): t0–t3 (r5–r7, r26), a0–a7 (r10–r17).
11.3 Stack¶
The stack grows toward negative addresses. sp points to the top of the stack (last valid word). sp must always be word-aligned — since the address space is word-addressed (§2.3), this is trivially satisfied and imposes no additional constraint on the compiler.
Frame layout. A function that needs to save ra and the caller’s s0 (the standard non-leaf case) allocates N ≥ 2 words and lays out the frame as follows (high addresses at the top):
high addr ┌─────────────┐ ← (caller's sp = this frame's s0)
sp+(N−1) │ saved ra │
sp+(N−2) │ saved s0 │
sp+(N−3) │ local[0] │
… │ … │
sp+0 │ local[N−3] │ ← sp
low addr └─────────────┘
Standard prologue — allocates the frame first so that at no point does sp transiently reference an unallocated region; then saves ra and the old s0 into the reserved slots; finally installs the new frame pointer:
ADDI sp, sp, -N ; allocate frame (N = locals + 2)
STORE ra, sp, N-1 ; save return address at top of frame
STORE s0, sp, N-2 ; save caller's frame pointer
ADDI s0, sp, N ; new s0 = caller's sp
Standard epilogue — mirror of the prologue:
LOAD ra, sp, N-1 ; restore return address
LOAD s0, sp, N-2 ; restore caller's frame pointer
ADDI sp, sp, N ; deallocate frame
RET
A leaf function that makes no further calls and does not use s0 may skip saving ra and s0 entirely, reducing the prologue to a single ADDI sp, sp, -N (or omitting it if no locals are spilled).
11.4 Pseudo-instructions (assembler)¶
| Pseudo | Expansion | Notes |
|---|---|---|
| RET | JMPA ra, 0 | Return from subroutine |
| MOV rd, rs | ADD rd, rs, zero | Register copy |
| NEG rd, rs | native NEG rd, rs (R format, rs2/funct ignored) | Balanced ternary negation |
| CALL label | CALL offset23(label) | Compute offset from PC |
| LI rd, imm | LI if fits imm17; else LUI + ADDI | Load arbitrary immediate |
| TSET rd, rs1, rs2 | TSETZ rd, rs1, rs2 | Alias: clear trit to Z |
| TNIMPL rd, a, b | TNOT t0, b then TAND rd, a, t0 | Non-implication: a AND NOT b |
| TREIMPL rd, a, b | TIMPL rd, b, a | Reverse implication: b ⇒ a |
| NOT rd, rs | TNOT rd, rs | Mnemonic alias for clarity |
| BFLT label | BF P00, label | Branch if FLAGS.sign = N (less than) |
| BFEQ label | BF 0P0, label | Branch if FLAGS.sign = Z (equal) |
| BFGT label | BF 00P, label | Branch if FLAGS.sign = P (greater than) |
| BFLE label | BF PP0, label | Branch if FLAGS.sign ≤ Z (less or equal) |
| BFGE label | BF 0PP, label | Branch if FLAGS.sign ≥ Z (greater or equal) |
| BFNE label | BF P0P, label | Branch if FLAGS.sign ≠ Z (not equal) |
| ECALL | ECALL imm17=0 (i.e. imm17[0] = Z) | User syscall flavor (EXC_ECALL_U); v0.5-compatible default encoding |
| HCALL | ECALL imm17[0] = P, imm17[1..16] = Z | Hypercall flavor (EXC_ECALL_H); new in v0.6 |
| DBGBRK | ECALL imm17[0] = N, imm17[1..16] = Z | Debug-trap flavor (EXC_ECALL_D); new in v0.6 |
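The six BF aliases in the table follow one pattern, which the sketch below makes explicit. It rests on an assumption inferred from the table rather than stated in it: the three mask characters are read in the order written, one position each for FLAGS.sign = N, Z, P, and a P in a position selects that outcome. The names `BF_MASKS` and `bf_taken` are hypothetical.

```python
N, Z, P = -1, 0, 1

# Masks copied from the pseudo-instruction table.
BF_MASKS = {
    "BFLT": "P00", "BFEQ": "0P0", "BFGT": "00P",
    "BFLE": "PP0", "BFGE": "0PP", "BFNE": "P0P",
}

def bf_taken(mask: str, sign: int) -> bool:
    """True if BF with this mask branches for the given FLAGS.sign.

    Assumed reading: mask positions 0, 1, 2 correspond to sign = N, Z, P.
    """
    return mask[sign + 1] == "P"
```

Under this reading BFLE branches for N and Z but not P, and BFNE branches for N and P but not Z, which agrees with the Notes column.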
11.5 Syscall calling convention¶
Distinct from the standard function call convention of §11.1. When user code invokes a kernel service via ECALL:
- Syscall number: a7 (r17).
- Arguments 1–7: a0–a6 (r10–r16). Calls requiring more than 7 arguments pass the overflow on the stack following §11.1.
- Return value: a0 (r10). By convention, a non-negative a0 is a success result and a negative a0 (sign(a0) = N) is a balanced-ternary error code. The convention is a library-level contract, not enforced by the ISA.
See §8.4 for the handler-side contract and the recommended register-preservation policy.
A libc-style wrapper setting the syscall number from a symbolic constant:
; write(fd, buf, len) — fd in a0, buf in a1, len in a2 on entry
write:
LI a7, SYS_WRITE ; syscall number → r17
ECALL
RET ; a0 now holds the syscall result
A pass-through wrapper (the number is already in a7):
; long syscall(long number, long arg0, ..., long arg6)
; number already in a7, args already in a0..a6
syscall:
ECALL
RET
12. Reference encoding (textual representation)¶
Trits are written using the -/0/+ convention.
In memory layout (LST-first): t[0] is stored and written first. Instruction encoding diagrams use this convention.
In human-readable display (MST-first): the most significant trit is written leftmost, as in ordinary number notation. Register addresses and integer literals use this convention.
Each convention is explicitly labeled.
Example — ADD r3, r1, r2 (R format)
Field values:
- opcode ADD = −40 = enc(−40, 4):
−40 ÷ 3 → q = −13, r = −1 → t[0] = N;
−13 ÷ 3 → q = −4, r = −1 → t[1] = N;
−4 ÷ 3 → q = −1, r = −1 → t[2] = N;
−1 ÷ 3 → q = 0, r = −1 → t[3] = N.
Result: ---- (LST-first). Verify: −1 −3 −9 −27 = −40 ✓
- rd = r3 = enc(3, 3): 3 ÷ 3 → q=1, r=0 → t[0]=Z; 1 ÷ 3 → q=0, r=1 → t[1]=P; t[2]=Z → LST-first: 0+0
- rs1 = r1 = enc(1, 3): t[0]=P, t[1]=Z, t[2]=Z → LST-first: +00
- rs2 = r2 = enc(2, 3): 2 ÷ 3 → q=1, r=−1 → t[0]=N; 1 ÷ 3 → q=0, r=1 → t[1]=P; t[2]=Z → LST-first: -+0
- funct = 0 (14 trits of Z)
LST-first memory layout:
t[0-3] t[4-6] t[7-9] t[10-12] t[13-26]
---- 0+0 +00 -+0 00000000000000
Full 27-trit word (LST-first): ----0+0+00-+000000000000000
The opcode sits at t[0]–t[3] — the decoder starts working as soon as the first trits arrive.
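The repeated divide-by-3 procedure of the example can be written as a short Python routine. This is a sketch, not the project's assembler: `enc` mirrors the enc(value, width) notation used above, `encode_r` is a hypothetical helper that assumes the 4 + 3 + 3 + 3 + 14 field widths shown in the layout, and register indices are passed as plain integers, so the mapping for indices that do not fit a signed 3-trit field is not modeled.

```python
def enc(value: int, width: int) -> str:
    """Balanced-ternary encoding of value in `width` trits, LST-first, as -/0/+."""
    trits = []
    for _ in range(width):
        r = (value + 1) % 3 - 1          # remainder in {-1, 0, +1}
        trits.append("-0+"[r + 1])
        value = (value - r) // 3         # exact division after removing r
    if value != 0:
        raise ValueError("value does not fit in the requested width")
    return "".join(trits)

def encode_r(opcode: int, rd: int, rs1: int, rs2: int, funct: int = 0) -> str:
    """R-format word, LST-first: opcode(4) rd(3) rs1(3) rs2(3) funct(14)."""
    return enc(opcode, 4) + enc(rd, 3) + enc(rs1, 3) + enc(rs2, 3) + enc(funct, 14)
```

Calling `encode_r(-40, 3, 1, 2)` reproduces the field strings derived by hand for ADD r3, r1, r2.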
13. Architectural summary¶
| Parameter | Value |
|---|---|
| Word width | 27 trits |
| General-purpose registers | 27 (r0–r26), T3 address |
| Vector registers (new in v0.8) | 27 (v0–v26), T3 address, 27 trits each — bank selected by opcode |
| CSR registers | 27 addressable (T3), 19 defined |
| Instruction width | 27 trits (fixed) |
| Instruction formats | 5 (R, I, J, U, B) |
| Opcode | 4 trits (81 values, 60 used in v0.8) |
| Address space | ±(3²⁷−1)/2 words ≈ ±3.8 × 10¹² |
| Max immediate (I format) | 17 trits ≈ ±64 million |
| Max branch offset (J format) | 20 trits ≈ ±1.7 billion |
| Max jump offset (U format) | 23 trits ≈ ±4.7 × 10¹⁰ |
| Logic modes | 5: Kleene (default), Łukasiewicz, Heyting, RM3, B3 (Bochvar) |
| Floating-point | T26F: 26-trit Tekum in 27-trit register (t[26] = Z) |
| Arithmetic flags | 2 trits: sign, carry |
| Division convention | Symmetric Euclidean |
| Memory unit | 1 word = 27 trits |
| Endianness | Least significant trit first (little-endian) |
| Exception frames | 2 (main bank + frame 2); triple-fault = machine-check reset |
| MPU regions | 9, indirect CSR access, NAPO3 sizing, per-axis ternary permissions |
| Interrupt lines | 9 async IRQs, per-line enable / priority, shared EVEC with sync exceptions |
| Vector lanes (new in v0.8) | 27 × 1-trit lanes per word (trit-parallel); 8 vector opcodes (+15..+22) |
14. Complete opcode map¶
Quick-reference table, sorted by opcode value.
| Opcode | Mnemonic | Format | Group |
|---|---|---|---|
| −40 | ADD / ADDS / ADC | R | ALU |
| −39 | SUB / SUBS / SBC | R | ALU |
| −38 | MUL / MULH | R | ALU |
| −37 | DIV | R | ALU |
| −36 | MOD | R | ALU |
| −35 | NEG | R | ALU |
| −34 | TAND | R | ALU / Logic |
| −33 | TOR | R | ALU / Logic |
| −32 | TNOT | R | ALU / Logic |
| −31 | TIMPL | R | ALU / Logic |
| −30 | CONS | R | ALU / Trit |
| −29 | ACONS | R | ALU / Trit |
| −28 | TSHIFT | R | ALU / Trit |
| −27 | TCMP | R | ALU / Trit |
| −26 | LOAD | I | Memory |
| −25 | STORE | I | Memory |
| −24 | LI | I | Memory |
| −23 | LUI | I | Memory |
| −22 | ADDI | I | Memory |
| −21 | BRT3 | B | Branch |
| −20..−19 | — | — | reserved |
| −18 | CMPI | I | Memory |
| −17 | BEQ | J | Branch |
| −16 | BNE | J | Branch |
| −15 | BLT | J | Branch |
| −14 | BGT | J | Branch |
| −13 | BLE | J | Branch |
| −12 | BGE | J | Branch |
| −11 | JMPA | J | Branch |
| −10 | BF | J | Branch |
| −9 | JMP | U | Jump |
| −8 | CALL | U | Jump |
| −7 | CSRR | I | System |
| −6 | CSRW | I | System |
| −5 | CSRX | I | System |
| −4 | ECALL | I | System — imm17[0] = flavor tag (Z/P/N → user/hyper/debug); imm17[1..16] reserved, ignored by decoder |
| −3 | IRET | R | System |
| −2 | TSEL | R | Special |
| −1 | NOP | — | Special |
| 0 | HALT | — | Special |
| +1 | TGET | R | Trit ops |
| +2 | TSET(N/Z/P) | R | Trit ops |
| +3 | TSIGN | R | Trit ops |
| +4 | CMP | R | Trit ops |
| +5 | TABS | R | Trit ops |
| +6 | TMIN | R | Trit ops |
| +7 | TMAX | R | Trit ops |
| +8 | FADD | R | TFP |
| +9 | FSUB | R | TFP |
| +10 | FMUL | R | TFP |
| +11 | FDIV | R | TFP |
| +12 | FCMP | R | TFP |
| +13 | FCVT | R | TFP |
| +14 | — | — | reserved (TFP) |
| +15 | VADD / VSUB | R | Vector |
| +16 | VMUL | R | Vector |
| +17 | VLOG | R | Vector — funct[0..2] selects TAND/TOR/TNOT/TIMPL/CONS/ACONS |
| +18 | VSEL | R | Vector — 3-way ternary select with mask in funct[0..2] |
| +19 | VCMP | R | Vector — lane-wise sign-of-difference, produces ternary mask |
| +20 | VRED | R | Vector — reduction to GPR; funct[0..2] selects SUM/SIGN/CONS/LST/MST/AND/OR |
| +21 | VPERM | R | Vector — funct[0..2] selects rotate/shift/reverse/shuffle |
| +22 | VMOVE | R | Vector — inter-bank movement; funct[0..2] selects mode (§16.5) |
| +23..+24 | — | — | reserved (Vector v0.9) |
| +25..+40 | — | — | reserved (Custom) |
15. Extension roadmap¶
v0.9 — Vector extension follow-up¶
| Extension | Opcode range | Rationale |
|---|---|---|
| VFMA (fused multiply-accumulate) | +23 | Closes the gap with TFP and ML kernels: vd ← vd + vs1 × vs2 lane-wise in one instruction. Critical for dot-product / convolution loops. |
| VGATHER / VSCATTER | +24 | Indirect lane addressing via an index v-reg. Enables sparse-data vector code. May share a single opcode via funct direction trit. |
| Tryte / trybble lane modes | funct of existing vector opcodes | Optional 9 × 3-trit and 3 × 9-trit lane partitions, selected per-instruction via funct trits. Reuses the existing v-reg bank and ALU; only the carry-cut points differ. Investigated only if a target workload (DSP, codecs) demonstrates clear gain over the trit-parallel default. |
| VLEN CSR (vector length) | new CSR slot | Optional: a runtime-configurable active-lane count for dynamic vector length, as in RVV. Deferred until a use case appears; the current trit-parallel model already covers fixed-width 27-lane work. |
| Masked arithmetic | funct trit on VADD / VMUL / VLOG | Per-lane predication using a mask v-reg, separate from VSEL. Adds a one-trit funct selector and one extra register field via funct[3..5]. Defer pending design review. |
v1.0 — Stabilization¶
ISA freeze: a complete reference implementation (Python simulator + assembler), a regression test suite covering every spec section, and a worked-example application (e.g. tritlib-driven kernel + user program demonstrating syscall, MPU, IRQ, and vector loop). No new opcodes between v0.9 and v1.0.
Hardware target — post-1.0¶
Once the ISA reaches 1.0, the following implementation milestones are pursued outside of the versioned specification:
| Milestone | Notes |
|---|---|
| FPGA soft core | VHDL or Verilog description of Setnex; each trit encoded as 2 bits on FPGA fabric, ternary signals on external bus (same approach as 5500FP). Target: iCE40 (open toolchain via nextpnr/yosys) or Xilinx/Intel. |
| Assembler tooling | Setnex assembler in Python, building on tritlib. |
| ASIC exploration | Contingent on CNT or memristor ternary gate availability; long-term. |
16. Vector extension¶
16.1 Role and datapath model¶
The vector extension turns the existing 27-trit datapath into a trit-parallel SIMD unit. Each vector instruction operates simultaneously on the 27 trits of a vector register, treating the register as a vector of 27 lanes × 1 trit. One VADD performs 27 independent trit additions in the cycles a single scalar ADD would have used to compute the equivalent 27-trit sum with carry propagation — the saving comes not from the arithmetic itself but from amortizing fetch, decode, register-file access, and loop control over 27 elements.
The model is deliberately minimal:
- Lane width: 1 trit (fixed in v0.8). Tryte (3 trits) and trybble (9 trits) lane modes are reserved for v0.9 (§15).
- No carry between lanes: the existing 27-trit adder is reused with carry propagation disabled at every lane boundary (one MUX per trit). The hardware cost is negligible.
- Closure on saturation: arithmetic results are clamped to {N, Z, P} (§16.3), so a v-reg always holds 27 well-defined trits — no overflow into a wider intermediate.
- No vector length CSR: every vector op operates on all 27 lanes. Predication is achieved by writing a mask (see §16.4 / VSEL).
16.2 Vector register bank¶
27 vector registers v0..v26, each a full 27-trit word. The bank is disjoint from the GPR bank: vector instructions reference v-regs through the same 3-trit register field they would use for GPRs in scalar instructions, with the bank determined entirely by the opcode (see §2.1.1 dispatch table).
v0 is an ordinary writable register; by convention, software keeps it at zero so that it can serve as a vector-zero source (analogous to r0, but not hardware-enforced).
The bank carries no implicit register conventions (no caller/callee-saved split is mandated at the ISA level); the v-reg ABI is left to the platform — see §16.6 for the recommended save/restore rules.
16.3 Lane arithmetic semantics¶
For arithmetic opcodes (VADD, VSUB, VMUL), the result is computed independently per lane and clamped to {N, Z, P}:
| Operation | Lane result |
|---|---|
| VADD vd, vs1, vs2 | vd[i] ← clamp(vs1[i] + vs2[i]) for i = 0..26 |
| VSUB vd, vs1, vs2 | vd[i] ← clamp(vs1[i] − vs2[i]) |
| VMUL vd, vs1, vs2 | vd[i] ← vs1[i] × vs2[i] (closed in {N,Z,P}; no clamp needed) |
where clamp(x) maps x ∈ [−2, +2] to {N, Z, P} as: clamp(−2) = N, clamp(−1) = N, clamp(0) = Z, clamp(+1) = P, clamp(+2) = P.
No FLAGS update. Vector arithmetic instructions do not modify the FLAGS register: the saturation policy avoids overflow signalling, and per-lane status bits would require widening FLAGS or introducing a vector flags register, both rejected for v0.8. Software needing per-lane sign tests should use VCMP vd, vs1, vzero to materialize a sign mask.

Multiplication is closed. The product of any two trits in {N, Z, P} is in {N, Z, P} (see §5.3 product table); VMUL therefore needs no saturation logic at all, which makes it the cleanest of the three.

No vector divide. Trit-by-trit division has no useful semantics on 1-trit lanes (P / P = P, Z / Z = NaR, etc.); the operation is omitted. Division is a scalar concern in this revision.
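The lane semantics of §16.3 are small enough to model directly. The sketch below assumes a v-reg is represented as a Python list of integers in {−1, 0, +1}; the function names simply mirror the mnemonics and are not part of any published tool.

```python
N, Z, P = -1, 0, 1

def clamp(x: int) -> int:
    """Saturate a lane sum or difference in [-2, +2] to {N, Z, P}."""
    return max(N, min(P, x))

def vadd(vs1, vs2):
    return [clamp(a + b) for a, b in zip(vs1, vs2)]

def vsub(vs1, vs2):
    return [clamp(a - b) for a, b in zip(vs1, vs2)]

def vmul(vs1, vs2):
    # The product of two trits is already a trit, so no clamp is applied.
    return [a * b for a, b in zip(vs1, vs2)]
```

For example, adding P to P in a lane gives P (saturated from +2), while subtracting N from P also gives P; no lane ever affects its neighbor.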
16.4 funct sub-mode encodings¶
Several vector opcodes use the low 3 trits of the funct field as a sub-mode selector. The encodings are:
VLOG (opcode +17)¶
| funct[0..2] (LST-first) | Mnemonic | Operation per lane | LMODE-following |
|---|---|---|---|
| Z Z Z | VAND | vd[i] ← TAND(vs1[i], vs2[i]) | yes |
| P Z Z | VOR | vd[i] ← TOR(vs1[i], vs2[i]) | yes |
| Z P Z | VIMPL | vd[i] ← TIMPL(vs1[i], vs2[i]) | yes |
| P P Z | VNOT | vd[i] ← TNOT(vs1[i]) (rs2 ignored) | yes |
| Z Z P | VCONS | vd[i] ← cons(vs1[i], vs2[i]) | no (always Kleene) |
| P Z P | VACONS | vd[i] ← acons(vs1[i], vs2[i]) | no (always Kleene) |
VAND/VOR/VIMPL/VNOT follow the current LMODE (§6) — switching LMODE to Łukasiewicz, Heyting, RM3, or B3 changes the result of all four lane-wise. VCONS/VACONS are arithmetic primitives (§6.3) and ignore LMODE, exactly as their scalar counterparts.
Other funct[0..2] patterns are reserved and raise EXC_ILLEGAL.
VSEL (opcode +18) — three-way merge with vector mask¶
VSEL vd, vs1, vs2, vm selects per-lane among vs1, zero, and vs2 according to the corresponding lane of mask vm:
| vm[i] | vd[i] |
|---|---|
| N | vs1[i] |
| Z | Z (zero) |
| P | vs2[i] |
The mask register index vm is encoded in funct[0..2]. This is the vector counterpart of scalar TSEL (§4 opcode −2): one instruction, three sources fused via a ternary mask. Combined with VCMP (which produces N/Z/P = </=/>), it gives a complete lane-wise compare-and-select pair in two instructions.
Why zero on vm[i] = Z rather than preserve-vd[i]. Zero-on-Z gives the user a trit-level “blank out” semantic for free: VCMP against a threshold followed by VSEL produces a sparse vector with explicit zeros where the predicate was indeterminate. A merge-on-Z variant (preserving vd[i]) is recoverable in two instructions if needed; the reverse is not.
VCMP (opcode +19)¶
Single mode: vd[i] ← sign(vs1[i] − vs2[i]), lane-wise. The result is a ternary mask in {N, Z, P} per lane — N where vs1[i] < vs2[i], Z where equal, P where greater. The same mask can be fed directly into VSEL without re-encoding.
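The compare-and-select pair can be sketched in Python as follows, with v-regs modeled as lists of integers in {−1, 0, +1}. The function names mirror the mnemonics; this is an illustration of the stated lane semantics, not a reference implementation.

```python
N, Z, P = -1, 0, 1

def vcmp(vs1, vs2):
    """Lane-wise sign of difference: N if vs1 < vs2, Z if equal, P if greater."""
    return [(a > b) - (a < b) for a, b in zip(vs1, vs2)]

def vsel(vs1, vs2, vm):
    """Three-way merge: vs1 where the mask is N, zero where Z, vs2 where P."""
    return [a if m == N else b if m == P else Z
            for a, b, m in zip(vs1, vs2, vm)]
```

Feeding `vcmp(x, y)` into `vsel(x, y, mask)` keeps the smaller operand in each lane where the two differ and writes Z where they are equal, which shows the zero-on-Z behavior: equal lanes are blanked rather than copied.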
VRED (opcode +20) — reduction to scalar GPR¶
VRED rd, vs1 reduces all 27 lanes of vs1 to a single 27-trit value written to GPR rd (note: rd is a GPR here, not a v-reg — VRED is the inverse of VBCAST in operand bank).
| funct[0..2] | Mnemonic | Result in rd |
|---|---|---|
| Z Z Z | VRED.SUM | Σ vs1[i] for i=0..26, in [−27, +27], encoded as a T27 integer |
| P Z Z | VRED.SIGN | sign(Σ vs1[i]) ∈ {N, Z, P} (signed majority) |
| Z P Z | VRED.CONS | Kleene consensus across all 27 lanes: P if all are P or Z (and at least one P), N if all are N or Z (and at least one N), Z otherwise |
| P P Z | VRED.LST | trit-min across all 27 lanes (= scalar TMIN semantics; P iff every lane is P) |
| Z Z P | VRED.MST | trit-max across all 27 lanes (= scalar TMAX semantics; N iff every lane is N) |
| P Z P | VRED.AND | trit-fold using the current LMODE AND |
| Z P P | VRED.OR | trit-fold using the current LMODE OR |
VRED.SIGN is the natural classifier output for ternary neural networks: a 27-weight × 27-input dot product reduces to one trit indicating a “negative / neutral / positive” decision. VRED.SUM yields an integer in [−27, +27], which fits in 4 trits and is suitable for further scalar processing.
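The LMODE-independent reductions in the table can be modeled as below, with a v-reg represented as a list of integers in {−1, 0, +1}. The function names are hypothetical, and VRED.AND / VRED.OR are left out because their result depends on the current LMODE.

```python
N, Z, P = -1, 0, 1

def vred_sum(v):
    return sum(v)                        # in [-27, +27] for 27 lanes

def vred_sign(v):
    s = sum(v)
    return (s > 0) - (s < 0)             # signed majority

def vred_cons(v):
    # Consensus as worded in the table: Z lanes abstain, N and P lanes must agree.
    has_p, has_n = P in v, N in v
    if has_p and not has_n:
        return P
    if has_n and not has_p:
        return N
    return Z

def vred_lst(v):
    return min(v)                        # P only if every lane is P

def vred_mst(v):
    return max(v)                        # N only if every lane is N
```

A vector with ten P lanes and seventeen Z lanes reduces to SUM = 10, SIGN = P, CONS = P; replacing one Z with N leaves SIGN at P but drops CONS to Z.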
VPERM (opcode +21) — lane permutation¶
VPERM vd, vs1, vs2 rearranges the 27 lanes of vs1 into vd. The mode is selected by funct[0..2]:
| funct[0..2] | Mnemonic | Operation |
|---|---|---|
| Z Z Z | VROTL | rotate left by val(vs2_LST) lanes (cyclic; vs2 read as small integer) |
| P Z Z | VROTR | rotate right by val(vs2_LST) lanes |
| Z P Z | VSHL | shift left by k lanes (vacated lanes filled with Z); k = lower 5 trits of vs2 |
| P P Z | VSHR | shift right by k lanes (vacated lanes filled with Z) |
| Z Z P | VREV | reverse: vd[i] ← vs1[26−i] (vs2 ignored) |
| P Z P | VSHUF | arbitrary shuffle: vd[i] ← vs1[lane_index(vs2, i)] (lane index for each output position is read from vs2; see below) |
For VSHUF, the source lane for an output lane is encoded in three trits of vs2: output lane i takes its value from input lane val(vs2[3i..3i+2]) mod 27. A 27-trit word holds nine such 3-trit indices (9 × 3 = 27), so a single vs2 covers output lanes 0..8; a full 27-lane shuffle needs 27 × 3 = 81 index trits, and this revision does not specify where the indices for lanes 9..26 are read from.
16.5 VMOVE — inter-bank movement (opcode +22)¶
VMOVE is the only vector opcode that crosses the GPR / v-reg boundary. The mode is encoded in funct[0..2]; lane index, when relevant, lives in funct[3..5] (3 trits, range 0..26).
| funct[0..2] | Mnemonic | rd bank | rs1 bank | Operation |
|---|---|---|---|---|
| Z Z Z | VMOV.GV | v-reg | GPR | v[rd] ← r[rs1] (whole 27-trit word copy) |
| P Z Z | VMOV.VG | GPR | v-reg | r[rd] ← v[rs1] (whole word) |
| Z P Z | VMOV.VV | v-reg | v-reg | v[rd] ← v[rs1] (vector–vector copy) |
| P P Z | VBCAST | v-reg | GPR | v[rd][i] ← r[rs1][LST] for all i (splat the LST of r[rs1] to every lane) |
| Z Z P | VINS | v-reg | GPR | v[rd][k] ← r[rs1][LST]; other lanes preserved; k = val(funct[3..5]) |
| P Z P | VEXT | GPR | v-reg | r[rd] ← TSIGN-extend(v[rs1][k]); k = val(funct[3..5]) (result is N, Z, or P in a T27 word) |
For VINS/VEXT, lane index k outside [0, 26] raises EXC_ILLEGAL with the raw instruction word in ETVAL (§8.5).
VMOV.VV does not introduce an opcode of its own — software can also realize a vector-to-vector copy via VOR vd, vs1, vs1 or VADD vd, vs1, v0 (assuming the v0 = 0 convention), but VMOV.VV is provided as a clearer mnemonic; assemblers may emit either form.
Why VBCAST splats only the LST trit of the source GPR. The natural alternative (copying all 27 trits of the GPR to the v-reg) is already covered by VMOV.GV. VBCAST exists for the distinct case “I have one trit (typically a sign or flag) and want to fill all 27 lanes with it”, a pattern recurring in masking and in initializing constant vectors. The two operations are kept distinct because their use cases are.
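The three single-trit VMOVE modes can be sketched as follows. The sketch assumes both GPRs and v-regs are modeled as lists of 27 integers in {−1, 0, +1}, LST-first, and that the lane index k has already been decoded to 0..26; the function names mirror the mnemonics and are otherwise hypothetical.

```python
N, Z, P = -1, 0, 1
LANES = 27

def vbcast(r):
    """VBCAST: copy the LST of GPR word r into every lane."""
    return [r[0]] * LANES

def vins(v, r, k):
    """VINS: replace lane k of v with the LST of r; other lanes preserved."""
    out = list(v)
    out[k] = r[0]
    return out

def vext(v, k):
    """VEXT: lane k of v as a 27-trit word whose integer value is N, Z, or P."""
    return [v[k]] + [Z] * (LANES - 1)
```

Because balanced ternary needs no sign extension, the VEXT result is simply the selected trit in the LST position with all higher trits at Z.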
16.6 Context save/restore¶
The vector bank is part of the architectural state. A context switch that wishes to preserve user-mode vector code must save and restore all 27 v-regs (27 × 27 = 729 trits = 27 words).
Because vector use is opt-in, kernels are encouraged to implement lazy save: track per-thread a “vector-touched” flag, and skip the save/restore if the thread has not executed any vector instruction since its last entry to the kernel. The ISA does not provide hardware for this — it is a kernel-level optimization.
A reference save sequence:
; sp points at the top of a 27-word save area
VMOV.VG t0, v0 ; v0 → r5 (t0)
STORE t0, sp, 0
VMOV.VG t0, v1
STORE t0, sp, 1
; … 25 more pairs …
VMOV.VG t0, v26
STORE t0, sp, 26
Restore is the symmetric sequence with VMOV.GV. A future revision may introduce a fused VLD / VST block move (gather/scatter, §15) to compress this loop.
16.7 Ternary advantages¶
The vector extension is not a mechanical port of binary SIMD. Five primitives have no clean equivalent in a binary ISA:
1. Three-state mask. A v-reg of trits doubles as a predicate where each lane carries one of three meanings: N (one branch), Z (zero / inactive), P (other branch). Binary SIMD encodes the third state through a separate “zeroing-vs-merging” bit (AVX-512 style); on Setnex it is intrinsic to the data.
2. VCMP is trichotomic in one shot. Lane-wise sign-of-difference produces <, =, > simultaneously in three distinct mask values. Binary SIMD needs two compares (one for <, one for =) to recover the same trichotomy.
3. Kleene consensus reduction (VRED.CONS). A native three-valued voter across 27 lanes (agree-positive, agree-negative, or disagree) in one instruction. The closest binary analog is a popcount majority + sign extraction, two to three instructions and a register.
4. Signed-sum reduction (VRED.SIGN). The signed sum of 27 ternary trits gives the sign of a dot product directly. Equivalent binary code requires two popcounts (one for +1s, one for −1s) plus a subtraction. This is the key kernel for BitNet 1.58-bit inference (§16.8).
5. LMODE-aware logic (VLOG, VRED.AND/OR). 27 lanes of Łukasiewicz / Heyting / Bochvar / RM3 logic in one cycle. No binary ISA can express non-classical three-valued logic without a lookup-table emulation costing tens of cycles per evaluation.
Combined, these primitives make Setnex’s vector unit a natural target for: ternary neural networks (BitNet, ternary weight networks), three-valued logic SAT solvers, ternary cellular automata, and any code where a third value (unknown, null, inactive) is first-class data rather than an exception.
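To make the lane-wise behaviour of three of these primitives concrete, here is a minimal Python reference model. It is a sketch under stated assumptions, not the normative definition: trits are modeled as the integers −1, 0, +1, the function names are hypothetical, and the VRED.CONS rule (P if every lane is P, N if every lane is N, otherwise Z) is inferred from the "agree-positive, agree-negative, or disagree" description rather than taken from the opcode definition.

```python
N, Z, P = -1, 0, 1  # trit values as plain integers (modeling assumption)


def sign(x):
    """Sign of an integer as a trit: -1, 0, or +1."""
    return (x > 0) - (x < 0)


def vcmp(a, b):
    """Lane-wise sign of difference: N where a<b, Z where a==b, P where a>b."""
    return [sign(x - y) for x, y in zip(a, b)]


def vred_sign(v):
    """Sign of the signed sum of all lanes."""
    return sign(sum(v))


def vred_cons(v):
    """Assumed consensus rule: unanimous P or N wins, anything else yields Z."""
    if all(t == P for t in v):
        return P
    if all(t == N for t in v):
        return N
    return Z


print(vcmp([N, Z, P], [P, Z, N]))    # [-1, 0, 1]
print(vred_sign([P] * 14 + [N] * 13))  # 1
print(vred_cons([P] * 26 + [Z]))       # 0
```

A single `vcmp` result already distinguishes all three orderings per lane, which is the trichotomy that takes two compares in binary SIMD.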
16.8 Worked example: BitNet 1.58-bit dot product
A BitNet-style layer multiplies a vector of activations a[0..n−1] (each in {N, Z, P}) by a weight matrix W whose entries are also in {N, Z, P}, producing one output trit per row via a sign reduction.
For one row of 27 weights and 27 activations, the inner kernel is:
; assume:
; v1 ← row of 27 weights (loaded via VMOV.GV from a GPR holding the row word)
; v2 ← 27 input activations (loaded similarly)
;
VMUL v3, v1, v2 ; v3[i] ← w[i] × a[i] ∈ {N, Z, P}, lane-wise
VRED a0, v3, funct=SIGN ; a0 ← sign(Σ v3[i]) ∈ {N, Z, P}
; N = output −1, Z = output 0, P = output +1
Two instructions for a 27-element ternary dot-product reduced to a single output trit. The same kernel on a 64-bit binary CPU needs two popcounts (one over the +1 mask, one over the −1 mask), a subtraction, and a sign extraction — minimum 5–6 instructions plus the unpacking of the packed-2-bits-per-weight format that binary SIMD requires to handle ternary values at all.
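The comparison can be checked with a small Python model that computes the same output trit both ways. This is an illustrative sketch: trits are modeled as −1, 0, +1, `ternary_kernel` mirrors the VMUL + VRED.SIGN pair, and `binary_kernel` is one possible two-mask popcount formulation chosen for illustration, not a claim about any particular binary implementation.

```python
N, Z, P = -1, 0, 1


def sign(x):
    return (x > 0) - (x < 0)


def ternary_kernel(w, a):
    """Model of VMUL followed by VRED.SIGN on one 27-lane row."""
    products = [x * y for x, y in zip(w, a)]  # VMUL: lane-wise trit product
    return sign(sum(products))                # VRED.SIGN: sign of the signed sum


def masks(v):
    """Split a trit vector into (+1 bit-mask, -1 bit-mask)."""
    plus = sum(1 << i for i, t in enumerate(v) if t == P)
    minus = sum(1 << i for i, t in enumerate(v) if t == N)
    return plus, minus


def binary_kernel(w, a):
    """Same result via bit-masks, two popcounts, a subtraction, and a sign."""
    wp, wm = masks(w)
    ap, am = masks(a)
    pos = (wp & ap) | (wm & am)  # lanes whose product is +1
    neg = (wp & am) | (wm & ap)  # lanes whose product is -1
    return sign(bin(pos).count("1") - bin(neg).count("1"))


w = [P, N, Z] * 9
a = [P, N, N] * 9
print(ternary_kernel(w, a), binary_kernel(w, a))  # 1 1
```

Both paths agree on every input; the difference is that the binary path needs the mask split and four logical operations before the popcounts can even start.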
Scaling: a layer with 256 rows and 256 inputs needs ⌈256 / 27⌉ = 10 kernels per row, so 256 × 10 = 2560 such 27-wide kernels in total (2560 VMUL instructions for the multiplication phase). The output trits are then re-packed via VINS into output v-regs for the next layer.
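The kernel count follows from rounding the input width up to a whole number of 27-lane chunks. A short Python sketch of that arithmetic is given below; `kernel_count` is a hypothetical helper, and it assumes the last chunk of each row is padded with Z lanes. How the per-chunk results are combined into one output trit per row is not specified in this section and is not modeled here.

```python
def kernel_count(rows, inputs, lanes=27):
    """Return (chunks per row, total kernels, padded lanes per row)."""
    chunks = -(-inputs // lanes)       # ceiling division
    padding = chunks * lanes - inputs  # unused lanes in the last chunk (assumed Z)
    return chunks, rows * chunks, padding


print(kernel_count(256, 256))  # (10, 2560, 14)
```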
Why the third state matters. A weight of Z in BitNet means “this connection contributes nothing”: sparsity is encoded in the value, with no separate sparse-index bookkeeping. VMUL propagates the zero through naturally (Z × anything = Z), and VRED.SIGN ignores it. Setnex thereby supports dense storage of sparse weights with no overhead.
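The zero-propagation claim can be illustrated by comparing a dense evaluation against one that tracks non-zero indices explicitly. This is a Python sketch with hypothetical function names and trits modeled as −1, 0, +1; the index list in `indexed_output` stands in for the bookkeeping a separate sparse format would have to store.

```python
def sign(x):
    return (x > 0) - (x < 0)


def dense_output(weights, acts):
    """Every lane is multiplied; Z weights contribute 0 to the sum."""
    return sign(sum(w * a for w, a in zip(weights, acts)))


def indexed_output(weights, acts):
    """Same result using an explicit list of non-zero weight positions."""
    nonzero = [i for i, w in enumerate(weights) if w != 0]
    return sign(sum(weights[i] * acts[i] for i in nonzero))


weights = [0] * 24 + [1, -1, 1]  # mostly-Z row
acts = [1] * 24 + [1, 1, -1]
print(dense_output(weights, acts), indexed_output(weights, acts))  # -1 -1
```

The two results are always equal, so the dense form gives up nothing by omitting the index list.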
Setnex ISA v0.8 — Reference specification Setnex project / Terias — Eric Tellier