Setnex ISA — Specification v0.8

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this specification except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, this specification is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

In accordance with Section 3 of the Apache License 2.0, any implementation of this specification is granted a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, use, sell, and distribute implementations that comply with this specification.

A balanced ternary instruction set architecture inspired by RISC-V: trits in {−1, 0, +1}, 27-trit words, 27 general-purpose registers plus 27 vector registers (new in v0.8), and fixed-length instructions.

Changelog from v0.7

#	Change	Rationale
1	Vector extension added — dedicated 27-register vector bank `v0..v26`, trit-parallel datapath (each lane = 1 trit, 27 lanes per word). Eight new opcodes (+15..+22): `VADD`, `VMUL`, `VLOG`, `VSEL`, `VCMP`, `VRED`, `VPERM`, `VMOVE`. See §16.	Amortizes fetch/decode/control over 27 simultaneous trit operations and exposes ternary-native primitives (3-way mask, ternary compare, Kleene consensus reduction) that have no clean binary equivalent. Ternary ML at 1.58-bit (BitNet) maps natively.
2	New register bank: 27 vector registers `v0..v26`, each 27 trits wide. Indexed using the 3-trit register fields of the existing R format; vector opcodes interpret `rs1`/`rs2`/`rd` as v-reg indices instead of GPRs. `VMOVE` and `VRED` (whose `rd` is a GPR) are the only opcodes that cross banks.	Keeps the format set unchanged (still R, I, J, U, B). Per-opcode bank dispatch matches RVV practice; no instruction-side mode bit needed.
3	Logic ops merged into `VLOG`: `TAND`/`TOR`/`TNOT`/`TIMPL` (LMODE-following) and `CONS`/`ACONS` (LMODE-bypass) all share opcode +17 with sub-mode in `funct[0..2]`	Same datapath, same operand shape — fusing into one opcode preserves opcode budget for v0.9 (FMA, gather/scatter, trybble/tryte lanes).
4	Permutation and shift merged into `VPERM` (opcode +21) — modes: rotate, shift, reverse, shuffle	Same lane-permutation network, only the control signal differs. Single opcode covers all.
5	Inter-bank movement merged into `VMOVE` (opcode +22) — modes: GPR↔v-reg whole-word copy, v-reg↔v-reg copy, single-trit broadcast/insert/extract	Bridges the two register files without burning three separate opcodes; lane index when relevant lives in `funct[3..5]`.
6	Two opcodes (+23, +24) left reserved for v0.9	Earmarked for fused multiply-accumulate (`VFMA`) and gather/scatter (`VGATHER`/`VSCATTER`), or trybble/tryte lane modes if explored next.
7	Architectural summary updated — opcode count 52 → 60; register count 27 GPR → 27 GPR + 27 v-reg	Vector bank is part of the architectural state visible to a context switch; see §16.6 for the save/restore contract.
8	No change to scalar ISA, CSRs, exception model, MPU, or interrupt controller	v0.8 is purely additive over v0.7. Existing binaries run unmodified; vector code is opt-in.

Changelog from v0.6

#	Change	Rationale
1	Memory Protection Unit (MPU) added — 9 indirect-access regions, NAPO3 size encoding, ternary per-axis permissions ({N = none, Z = kernel only, P = user + kernel}) for R/W/X	Turns the §2.3 convention “negative = kernel, positive = user” into a hardware-enforced rule. Prerequisite for running untrusted user code without silent kernel corruption. See §9.
2	Asynchronous interrupt controller added — 9 IRQ lines, three CSRs (`IPENDING`, `IENABLE`, `IPRIORITY`), dispatch through the existing exception machinery (shared `EVEC`, frame-2 bank for nested cases)	Provides external-event delivery (timer, UART, etc.) independent of the instruction stream. Makes preemptive scheduling possible. See §10.
3	Six new CSRs at addresses −1..−6: `MPU_SELECT`, `MPU_BASE`, `MPU_CFG`, `IPENDING`, `IENABLE`, `IPRIORITY`	First use of the negative CSR address half. MPU uses indirect access (select-then-access) so region count is independent of CSR budget.
4	Three new `EXC_PERM_R` / `EXC_PERM_W` / `EXC_PERM_X` codes (−9, −8, −7) distinct from `EXC_FAULT`	`EXC_FAULT` = address invalid (doesn’t exist); `EXC_PERM_*` = address exists but MPU denies. Three separate codes let the handler dispatch on access type without re-decoding the faulting instruction. Mirrors the RISC-V load/store/instruction page-fault split.
5	Nine new IRQ cause codes `IRQ_0..IRQ_8` (+20..+28)	Hardware selects the highest-priority pending-enabled IRQ line and writes the corresponding cause code at dispatch time. A single-CSR read tells the handler which line to service.
6	§2.3 address space: “convention” upgraded to “enforcement” — the MPU can make the user/kernel split physically binding	Documentation reflects that the split is no longer a soft contract.
7	Spec clarifications (no semantic change vs. v0.6 simulator): `EXC_OVERFLOW` reclassified “reserved, not raised” (no STATUS enable trit exists); `TSHIFT` explicitly produces 0 for |val(rs2)| ≥ 27; `ECALL imm17[1..16]` ignored by the decoder (not “must be Z”); Tekum anchor |r| > 7 decodes to NaR; IPENDING/IENABLE write-N is a no-op	Align the spec text with observed v0.6 simulator behavior; close under-specified edges surfaced during the v0.7 review.
8	MPU_CFG `size` range narrowed from [0, 27] to [0, 26]	`n = 27` had no well-defined base under the signed NAPO3 model; kernel-mode no-match default already covers the “everything allowed” case.
9	§10.5 handler boilerplate rewritten to select the correct bank via `STATUS.depth` before reading `ECAUSE` / `ECAUSE2`; §10.2 Pending/Enable semantics extended with explicit level-vs-edge rule	v0.7 first-draft example silently mis-dispatched on nested entry; edge-triggered peripherals had no documented path to coexist with the level-sensitive sampler.

Changelog from v0.5

#	Change	Rationale
1	Nested exceptions: a second bank of exception CSRs (`EPC2`, `ESAVE2`, `ECAUSE2`, `ETVAL2`) added; `STATUS.depth` (trit `t[3]`) tracks active frame count	v0.5 machine-check reset on any synchronous fault in kernel mode was too brutal — forced a fault-free kernel. One level of nesting is now tolerated; only a third fault (while `depth = N`) triggers reset.
2	`ECALL` `imm17[0]` repurposed as flavor tag: Z = user syscall, P = hypercall, N = debug trap	Subdivides the `EXC_ECALL` cause without consuming a new opcode. Backward compatible with v0.5 binaries (whose `imm17 = Z` → user syscall, same as before).
3	`EXC_ECALL` split into `EXC_ECALL_U` = 0, `EXC_ECALL_H` = +1, `EXC_ECALL_D` = +2; all three share a single `EVEC`	Lets the handler dispatch on `ECAUSE` alone without decoding `imm17`. Positive codes preserve the convention “negative = fault, positive = deliberate synchronous trap”. `EXC_ECALL_U = 0` preserves v0.5 binary compatibility.
4	Triple-fault condition (fault while `depth = N`) triggers machine-check reset; no `ECAUSE` code allocated	Condition is not observable by any handler, so reserving a cause code would be dead weight. Documented by name in §8.2.

Changelog from v0.4

#	Change	Rationale
1	`r17` repurposed: `s2` (callee-saved) → `a7` (argument register / syscall number)	Reserve a dedicated register for the syscall number, following the RISC-V convention; lets the handler dispatch without re-fetching the faulting instruction
2	Saved register range narrowed: `s0`–`s10` (11 regs) → `s0`–`s9` (10 regs)	`r18`–`r25` renumbered from `s3`–`s10` to `s2`–`s9` so the saved range is contiguous with no gap
3	`ECALL` clarified: `imm17` reserved (must be Z); syscall number passed in `a7`; `ECAUSE ← EXC_ECALL` (= 0) on entry	v0.4 was ambiguous on whether `imm17` overwrote `ECAUSE`; register-based dispatch avoids a re-decode of the instruction at `EPC`
4	New §8.4: Syscall dispatch convention	Documents the handler-side contract and the `EPC + 1` epilogue pattern
5	New §11.5: Syscall calling convention	ABI-level wrapper/caller contract, distinct from the standard function call convention of §11.1
6	§11.3 prologue/epilogue rewritten: allocate frame first, then save `ra` and the old `s0`; mirrored epilogue	v0.4’s prologue overwrote `s0` without saving it, and left `ra` stored outside the allocated stack region between the STORE and the final `sp` decrement
7	§11.3: stack alignment stated explicitly (`sp` is word-aligned)	Closes an under-specified point of the v0.4 ABI; trivially satisfied by word-addressing
8	New CSR `ETVAL` at address 9 (MST-first `+00`); new §8.5	Symmetric with the ECALL/`a7` clarification: `EXC_FAULT`, `EXC_ALIGN` and `EXC_ILLEGAL` now have a documented data channel (faulting address / raw instruction) instead of forcing the handler to re-decode `EPC`
9	§8.2 extended: synchronous fault while `STATUS.mode = N` → machine-check reset	v0.4 had a single `EPC`/`ESAVE` save slot; nested synchronous exceptions would silently corrupt it. Proper nested handling deferred to v0.6
10	§8.1 exception table gains an `ETVAL contents` column	Makes per-exception data channel explicit
11	funct[0] mode selector on ADD/SUB inverted: N = saturating (was P), P = with-carry (was N)	N evokes clamping/constraint, P evokes additive chaining: a more natural ternary mnemonic

Changelog from v0.3

#	Change	Rationale
1	Overflow flag removed from FLAGS	Overflow is carried by FLAGS.carry (P = overflow, N = underflow); a separate overflow trit was redundant
2	TFP extension: FADD, FSUB, FMUL, FDIV, FCMP, FCVT (opcodes +8 to +13)	Native ternary floating-point using Tekum T26F format; 6 of 7 reserved TFP opcodes allocated
3	T26F format: 26-trit Tekum in 27-trit register, t[26] = Z	Even-width Tekum requirement; compatible with NEG, TABS, LOAD/STORE
4	New §7: Ternary floating-point format specification	Tekum anchor, regime, exponent, fraction, special values, properties
5	Opcodes used: 46 → 52	6 TFP instructions added

Changelog from v0.2

#	Change	Rationale
1	ADC (funct[0]=P on ADD) and SBC (funct[0]=P on SUB) added	Multi-precision carry chain with ternary carry {N,Z,P}; funct[0] is now a 3-way mode selector: Z=normal, N=saturating, P=with-carry
2	TSEL added (opcode −2, R format with rp in funct[0..2])	3-way conditional select on FLAGS.sign — the defining ternary-native instruction
3	BF added (opcode −10, J format, rs1 field = condition mask)	Trit-masked branch on FLAGS.sign; 6 comparison branches in 1 opcode
4	BRT3 added (opcode −21, new format B)	Ternary three-way branch on LST(rX): P=fall-through, Z=off_z, N=off_n
5	Format B added: 4t opcode + 3t rX + 10t off_z + 10t off_n	New format for BRT3; two 10-trit offsets (±29 524)
6	Instruction formats: 4 → 5 (R, I, J, U, B)	Accommodates BRT3
7	Opcodes used: 43 → 46 (TSEL, BF, BRT3; ADC/SBC via funct)	Was 43 in v0.2

1. Notation and conventions

Symbol	Meaning
`t`	trit ∈ {−1, 0, +1}, written `N`, `Z`, `P`
`T[n]`	n-trit balanced ternary field (value in [−(3ⁿ−1)/2, +(3ⁿ−1)/2])
`trybble`	3-trit field (T3) — smallest addressable grouping; register address, sub-opcode
`tryte`	9-trit field (T9) — byte-analogue, sub-word unit
`word`	27-trit field (T27) — basic memory and register unit
`val(x)`	balanced integer value of a ternary word
`enc(n, w)`	balanced ternary encoding of integer n on w trits
`sign(x)`	N if x < 0, Z if x = 0, P if x > 0
`t[i]`	trit i of a word (i=0 = least significant trit)
`rd`	destination register
`rs1`, `rs2`	source registers
`imm`	signed immediate (balanced ternary)
`LST`	Least Significant Trit (t[0])
`MST`	Most Significant Trit (t[w−1] for a w-trit word)
`zero-extend(x)`	Extend a narrow field to 27 trits by filling higher trits with Z (preserves balanced value, since Z = 0)

Trits are numbered from 0 (LST) to 26 (MST) within a word. The unit hierarchy (trit / trybble / tryte / word = 1 / 3 / 9 / 27 trits, i.e. 3⁰..3³) gives the natural sub-divisions of a word: a word holds 3 tryte lanes or 9 trybble lanes.

1.1 Textual representation

Trit glyphs: - for N (−1), 0 for Z (0), + for P (+1).

Two conventions coexist in this document:

MST-first (human-readable, used in register encoding examples and inline notation): most significant trit is written leftmost, as in ordinary number notation. Example: ++0-+ means t[4]=P, t[3]=P, t[2]=Z, t[1]=N, t[0]=P → val = 81 + 27 + 0 − 3 + 1 = 106.
LST-first (memory layout, used in instruction encoding diagrams): t[0] is written leftmost. This matches the physical trit ordering in memory and in instruction format diagrams.

Each convention is explicitly labeled where it appears. When unmarked, MST-first is assumed.

1.2 Integer value

The integer value of a w-trit field starting at t[i] is: Σ trit[i+k] × 3^k for k=0..w−1.

Words are stored in memory least significant trit first (little-endian).
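The two notational helpers `val(x)` and `enc(n, w)` can be modelled in a few lines of Python. This is only an illustrative sketch: the list-of-integers representation (index 0 = t[0], i.e. LST-first) and the `mst_first` rendering helper are assumptions made for the example, not part of the specification.

```python
def enc(n, w):
    """Encode integer n on w balanced trits, LST first (index 0 = t[0])."""
    limit = (3**w - 1) // 2
    if not -limit <= n <= limit:
        raise ValueError(f"{n} does not fit in {w} trits")
    trits = []
    for _ in range(w):
        r = (n + 1) % 3 - 1      # balanced remainder in {-1, 0, +1}
        trits.append(r)
        n = (n - r) // 3
    return trits


def val(trits):
    """Balanced integer value: sum of t[k] * 3**k over the field."""
    return sum(t * 3**k for k, t in enumerate(trits))


def mst_first(trits):
    """Render trits with the glyphs of §1.1, most significant trit leftmost."""
    return "".join("-0+"[t + 1] for t in reversed(trits))


print(mst_first(enc(106, 5)))    # prints ++0-+ (the §1.1 example)
```

Because Z contributes 0 to the sum, appending Z trits at the most significant end never changes `val`, which is why zero-extension preserves the balanced value.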

2. Programming model

2.1 General-purpose registers

27 T27 registers, named r0–r26, encoded on 3 trits (address ∈ [−13, +13]).

Register	ABI name	Conventional role
`r0`	`zero`	Always 0 (read-only by convention)
`r1`	`ra`	Return address
`r2`	`sp`	Stack pointer (grows toward negative addresses)
`r3`	`gp`	Global pointer
`r4`	`tp`	Thread pointer
`r5`–`r7`	`t0`–`t2`	Temporaries (caller-saved)
`r8`–`r9`	`s0`–`s1`	Saved registers (callee-saved), `s0` = frame pointer
`r10`–`r16`	`a0`–`a6`	Arguments / return values
`r17`	`a7`	Argument register / syscall number (see §8.4, §11.5)
`r18`–`r25`	`s2`–`s9`	Saved registers (callee-saved)
`r26`	`t3`	Extra temporary

Register addresses use balanced ternary encoding naturally (shown MST-first): r0 = 0 → 000, r1 = 1 → 00+, r2 = 2 → 0+-, r3 = 3 → 0+0, …, r13 = 13 → +++, r14 = −13 → ---, …, r26 = −1 → 00-.

Note: the 27 register indices 0–26 are mapped to the 27 balanced ternary T3 values −13 to +13. The mapping is: register rN has address enc(N, 3) for N ∈ {0..13}, and enc(N−27, 3) for N ∈ {14..26}. This wraps naturally in the balanced ternary range.
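The index-to-address mapping above can be sketched as follows. The function names `reg_address`, `reg_index` and `t3_glyphs` are hypothetical helpers introduced for this illustration.

```python
def reg_address(n):
    """Map register index rN (0..26) to its balanced T3 address (-13..+13)."""
    if not 0 <= n <= 26:
        raise ValueError("register index out of range")
    return n if n <= 13 else n - 27


def reg_index(addr):
    """Inverse mapping: balanced T3 address back to the register index."""
    if not -13 <= addr <= 13:
        raise ValueError("address out of range")
    return addr % 27


def t3_glyphs(addr):
    """MST-first glyph string of a T3 value, using - 0 + as in §1.1."""
    trits = []
    for _ in range(3):
        r = (addr + 1) % 3 - 1   # balanced remainder in {-1, 0, +1}
        trits.append(r)
        addr = (addr - r) // 3
    return "".join("-0+"[t + 1] for t in reversed(trits))


for n in (0, 2, 13, 14, 26):
    print(f"r{n}", reg_address(n), t3_glyphs(reg_address(n)))
```

The wrap is simply arithmetic modulo 27 re-centred on zero, which is why the inverse mapping is a single `% 27`.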

2.1.1 Vector registers (new in v0.8)

A separate bank of 27 vector registers v0–v26, each a full 27-trit word, is introduced for the vector extension (§16). Vector registers are addressed using the same 3-trit encoding as GPRs (one of the 27 balanced ternary T3 values −13..+13); the bank that is read or written is determined by opcode, not by the register field itself.

Opcode range	Bank used for `rd` / `rs1` / `rs2`
Scalar opcodes (−40..+14)	GPR bank `r0..r26`
Vector opcodes (+15..+21)	v-reg bank `v0..v26`, except `VRED` (+20), whose `rd` is a GPR
`VMOVE` (+22)	Mixed — direction encoded in `funct[0..2]`; see §16.5

There is no architectural alias vzero distinct from v0; software conventionally keeps v0 cleared and treats it as the zero vector when needed (analogous to the r0/zero convention, but not hardware-enforced — v0 is a writable register).

The vector bank is part of the architectural state and must be saved/restored across context switches that wish to preserve user-mode vector code; see §16.6 for the recommended save/restore contract. There is no v-reg variant of r0’s read-as-zero contract.

2.2 Control and status registers (CSR)

CSR addresses are T3 (3 trits), providing 27 addressable slots (values −13 to +13) — a symmetric match with the 27 GPRs. Each CSR is a full T27 word. Unused slots are reserved and read as zero.

Address (T3, MST-first)	Decimal	Name	Description
`00+`	1	`PC`	Program counter (T27, word address)
`0+-`	2	`LMODE`	Logic mode (see §6)
`0+0`	3	`FLAGS`	Arithmetic flags (see §5.5)
`0++`	4	`EPC`	Exception program counter
`+--`	5	`ECAUSE`	Exception cause (T27)
`+-0`	6	`EVEC`	Exception vector (handler address)
`+-+`	7	`STATUS`	Processor status (see §2.4)
`+0-`	8	`ESAVE`	Saved STATUS on exception entry (new in v0.2)
`+00`	9	`ETVAL`	Exception trap value (see §8.5) — new in v0.5
`+0+`	10	`EPC2`	Frame-2 exception PC (nested exception; see §8.2) — new in v0.6
`++-`	11	`ECAUSE2`	Frame-2 exception cause — new in v0.6
`++0`	12	`ESAVE2`	Frame-2 saved STATUS — new in v0.6
`+++`	13	`ETVAL2`	Frame-2 exception trap value — new in v0.6
`00-`	−1	`MPU_SELECT`	MPU region index selector (new in v0.7, see §9)
`0-+`	−2	`MPU_BASE`	Base address of the MPU-selected region (new in v0.7)
`0-0`	−3	`MPU_CFG`	Config (size + permissions + valid) of the selected region (new in v0.7)
`0--`	−4	`IPENDING`	Pending-IRQ bitvector (new in v0.7, see §10)
`-++`	−5	`IENABLE`	Per-line IRQ enable mask (new in v0.7)
`-+0`	−6	`IPRIORITY`	Per-line IRQ priority (new in v0.7)
others	—	—	Reserved (read as zero)

The four frame-2 CSRs (EPC2, ECAUSE2, ESAVE2, ETVAL2) are the depth-2 counterparts of EPC, ECAUSE, ESAVE, ETVAL. They are written by the processor only when a synchronous fault occurs while the outer handler is still running (STATUS.depth = P); see §8.2 for the full entry rules. EVEC and STATUS are not duplicated — both frames share the single exception vector, and STATUS.depth tracks which bank is the active save target.

The six v0.7 CSRs occupy the first six slots (−1 to −6) of the negative half of the CSR address space. The MPU CSRs (MPU_SELECT, MPU_BASE, MPU_CFG) provide indirect access to the 9-region descriptor bank: write a region index to MPU_SELECT, then read or write MPU_BASE and MPU_CFG to manipulate that region (see §9.3). The three interrupt CSRs are direct (one word each). All six CSRs are kernel-only: a user-mode CSRR / CSRW targeting any of them raises EXC_ILLEGAL with the raw instruction word in ETVAL (§8.5). The pre-existing system CSRs (EPC, ECAUSE, EVEC, STATUS, and the frame-2 bank) follow the same kernel-only rule.

At reset, all CSRs are initialized to zero, except STATUS, whose reset value is given in §2.4.

2.3 Address space

Addressing unit: 27-trit word
Address space: T27 → val ∈ [−(3²⁷−1)/2, +(3²⁷−1)/2] ≈ ±3.8 × 10¹²
PC is incremented by 1 (one word) after each instruction
Negative addresses: stack and kernel space
Positive addresses: user code and data

The user/kernel split is a convention at the ISA level but can be made physically binding by the MPU (§9): the kernel programs regions with perm = Z (kernel-only) or P (user+kernel) as appropriate, and user-mode accesses outside any user-permitted region raise EXC_PERM_* (§8.1). Prior to v0.7 this separation was soft — any mode could reach any address.

2.4 STATUS register structure

Trit	Name	Values
t[0]	`mode`	N = kernel, Z = reserved, P = user
t[1]	`ie`	N = interrupts masked, Z = reserved, P = interrupts enabled
t[2]	`lx`	Logic extension for LMODE=N: N = Heyting, Z = standard Łukasiewicz, P = RM3 (see §6)
t[3]	`depth`	Exception-frame depth (new in v0.6): Z = 0 frames active, P = 1 frame active (main bank holds outer context), N = 2 frames active (main bank + frame 2)
t[4..26]	—	Reserved (must be Z)

At reset, STATUS = N → kernel mode, interrupts masked, LMODE=N submode = Łukasiewicz, depth = Z (no frames active).

Rationale — why a trit for depth. One balanced trit holds exactly the three states the nested-exception machinery needs: no frame, one frame, two frames. A fourth state (triple fault) would be out of range anyway — the hardware treats it as a machine-check reset (§8.2) rather than a representable depth. Using depth = N as the “fully nested” state preserves the monotonic Z→P→N progression as exceptions stack up.

3. Instruction encoding

Every instruction is a fixed-length 27-trit word.

3.1 Opcode field

The 4 least significant trits (t[0]–t[3]) form the primary opcode. 3⁴ = 81 possible combinations — ample opcode space.

Placing the opcode at t[0]–t[3] allows the decoder to begin working as soon as the first trits of the word arrive, without waiting for the full word.

3.2 Instruction formats

Five formats. The format is determined solely by the opcode; the decoder does not inspect other fields to determine the format.

R format (register–register)
t:  [0-3]   [4-6]   [7-9]   [10-12]  [13-26]
    opcode   rd      rs1     rs2      funct (14 trits)
    4 trits  3 trits 3 trits 3 trits  14 trits

I format (immediate)
t:  [0-3]   [4-6]   [7-9]   [10-26]
    opcode   rd      rs1     imm17
    4 trits  3 trits 3 trits 17 trits  (imm ∈ [−(3¹⁷−1)/2, +(3¹⁷−1)/2] ≈ ±64 million)

J format (conditional branch)
t:  [0-3]   [4-6]   [7-26]
    opcode   rs1     offset20
    4 trits  3 trits 20 trits  (offset ∈ [−(3²⁰−1)/2, +(3²⁰−1)/2] ≈ ±1.7 billion)

U format (unconditional jump)
t:  [0-3]   [4-26]
    opcode   offset23
    4 trits  23 trits  (offset ∈ [−(3²³−1)/2, +(3²³−1)/2] ≈ ±4.7 × 10¹⁰)

B format (ternary three-way branch) — new in v0.3
t:  [0-3]   [4-6]   [7-16]     [17-26]
    opcode   rX      off_z      off_n
    4 trits  3 trits 10 trits   10 trits  (offsets ∈ [−(3¹⁰−1)/2, +(3¹⁰−1)/2] ≈ ±29 524)

Fields are laid out from least significant (t[0]) to most significant (t[26]).

Rationale for format U: In v0.1, JMP and CALL used J format, wasting the rs1 field (3 trits) that they do not need. Format U merges those trits into the offset, multiplying jump range by 27 at no cost. Conditional branches retain J format because they require rs1 for the test register.

Rationale for format B (new in v0.3): A ternary three-way branch needs two offsets (for Z and N outcomes; P falls through). The 20 trits after opcode+rX (27 − 4 − 3) are split evenly into two 10-trit offset fields, each with a range of ±29 524. This is the most ternary-native branch format: one instruction, three outcomes, zero wasted trits.

The 14-trit funct field in R format allows 3¹⁴ ≈ 4.8 million variants per opcode — only a few trits are used so far (mode selectors on ADD/SUB, FCVT, TSET; register field on TSEL), leaving the rest for future extensions. Funct sub-fields are specified per-instruction.
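As an illustration of the R-format layout, the sketch below packs and unpacks the five fields of an instruction word. The helper names (`pack_r`, `unpack_r`, `R_FIELDS`) and the LST-first list representation are assumptions of this example; only the field positions and widths come from the format diagram above.

```python
# R-format field layout from §3.2: (name, first trit, width), LST-first.
R_FIELDS = (("opcode", 0, 4), ("rd", 4, 3), ("rs1", 7, 3),
            ("rs2", 10, 3), ("funct", 13, 14))


def enc(n, w):
    """Balanced ternary encoding of n on w trits, LST first."""
    limit = (3**w - 1) // 2
    if not -limit <= n <= limit:
        raise ValueError(f"{n} does not fit in {w} trits")
    trits = []
    for _ in range(w):
        r = (n + 1) % 3 - 1      # balanced remainder in {-1, 0, +1}
        trits.append(r)
        n = (n - r) // 3
    return trits


def val(trits):
    """Balanced integer value of a trit list (LST first)."""
    return sum(t * 3**k for k, t in enumerate(trits))


def pack_r(opcode, rd, rs1, rs2, funct=0):
    """Build a 27-trit R-format word from balanced field values."""
    values = {"opcode": opcode, "rd": rd, "rs1": rs1, "rs2": rs2, "funct": funct}
    word = [0] * 27
    for name, start, width in R_FIELDS:
        word[start:start + width] = enc(values[name], width)
    return word


def unpack_r(word):
    """Split a 27-trit word back into its R-format field values."""
    return {name: val(word[start:start + width])
            for name, start, width in R_FIELDS}


# ADC r10, r11, r12: opcode -40 with funct[0] = P, i.e. funct value +1.
word = pack_r(-40, 10, 11, 12, funct=1)
print(unpack_r(word))
```

Since funct[0] is the least significant trit of the funct field, a mode selector of P, Z or N corresponds to a funct value of +1, 0 or −1 when all other funct trits are Z.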

4. Instruction set¶

4.1 Opcode table (4 trits = value from −40 to +40)

ALU group — R format (opcode −40 to −27)

Opcode (val)	Mnemonic	funct	Operation
−40	`ADD`	`funct[0]=Z`	rd ← rs1 + rs2 (balanced arithmetic)
−40	`ADDS`	`funct[0]=N`	rd ← rs1 + rs2 (saturating: clamps to T27 range)
−40	`ADC`	`funct[0]=P`	rd ← rs1 + rs2 + FLAGS.carry (new in v0.3)
−39	`SUB`	`funct[0]=Z`	rd ← rs1 − rs2
−39	`SUBS`	`funct[0]=N`	rd ← rs1 − rs2 (saturating)
−39	`SBC`	`funct[0]=P`	rd ← rs1 − rs2 − FLAGS.carry (new in v0.3)
−38	`MUL`	`funct[0]=Z`	rd ← low 27 trits of rs1 × rs2
−38	`MULH`	`funct[0]=P`	rd ← high 27 trits of rs1 × rs2 (54-trit product)
−37	`DIV`	Z	rd ← rs1 ÷ rs2 (symmetric Euclidean, see §5.4)
−36	`MOD`	Z	rd ← rs1 mod rs2 (symmetric remainder, see §5.4)
−35	`NEG`	Z	rd ← −rs1 (trit-by-trit inversion: P↔N, Z→Z)
−34	`TAND`	Z	rd ← rs1 AND rs2 (per LMODE, see §6)
−33	`TOR`	Z	rd ← rs1 OR rs2 (per LMODE)
−32	`TNOT`	Z	rd ← NOT rs1 (per LMODE)
−31	`TIMPL`	Z	rd ← rs1 IMPL rs2 (per LMODE)
−30	`CONS`	Z	rd ← consensus(rs1, rs2) — always Kleene
−29	`ACONS`	Z	rd ← anti-consensus(rs1, rs2) — always Kleene
−28	`TSHIFT`	Z	rd ← rs1 shifted by val(rs2) trits (left if >0, right if <0; vacated trits filled with Z). val(rs2) uses the full T27 range, no masking; shifts with |val(rs2)| ≥ 27 produce 0
−27	`TCMP`	Z	rd ← trit-by-trit comparison: rd[i] = sign(rs1[i] − rs2[i])

Consensus: trit-by-trit, cons(a,b) = a if a==b, else Z. Anti-consensus: trit-by-trit, acons(a,b) = Z if a==b, else the absent trit: acons(N,Z) = acons(Z,N) = P — acons(Z,P) = acons(P,Z) = N — acons(N,P) = acons(P,N) = Z. CONS and ACONS are dual operations: CONS extracts agreement, ACONS extracts the absent trit. Both ignore LMODE — they are arithmetic primitives, not logic operations.

TCMP is the trit-by-trit spaceship operator. For each trit position i: rd[i] = N if rs1[i] < rs2[i], Z if rs1[i] = rs2[i], P if rs1[i] > rs2[i]. TCMP complements CONS/ACONS: CONS extracts shared values, TCMP extracts the ordering relation. Together, these three form a complete trit-comparison toolkit.
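The three per-trit primitives can be written down directly from the definitions above. The sketch below is illustrative only; trits are modelled as the integers −1, 0, +1, and `lanewise` is a hypothetical helper that applies a per-trit function across two equal-length trit lists.

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def cons(a, b):
    """Consensus: the shared trit if both agree, otherwise Z."""
    return a if a == b else 0


def acons(a, b):
    """Anti-consensus: Z if both agree, otherwise the absent trit.

    Because N + Z + P = 0, the trit missing from {a, b} is -(a + b).
    """
    return 0 if a == b else -(a + b)


def tcmp(a, b):
    """Per-trit spaceship: N if a < b, Z if equal, P if a > b."""
    return sign(a - b)


def lanewise(op, x, y):
    """Apply a per-trit operation across two trit lists."""
    return [op(a, b) for a, b in zip(x, y)]


x = [1, 0, -1, 1]
y = [1, -1, 1, 0]
print(lanewise(cons, x, y), lanewise(acons, x, y), lanewise(tcmp, x, y))
```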

Saturating arithmetic (ADDS, SUBS): the result is clamped to [−(3²⁷−1)/2, +(3²⁷−1)/2] instead of wrapping. FLAGS.carry is set to Z (no overflow) regardless, since overflow is absorbed. FLAGS.sign reflects the clamped result.

Carry-chain arithmetic (ADC, SBC — new in v0.3): the value of FLAGS.carry from the previous ALU operation is added to (ADC) or subtracted from (SBC) the result. This enables multi-precision arithmetic. The balanced ternary carry {N, Z, P} is richer than the binary carry {0, 1} — one ADC propagates 3 values natively. Sequence for 54-trit addition: ADD lo, a_lo, b_lo then ADC hi, a_hi, b_hi.

funct[0] on ADD/SUB is a 3-way mode selector: Z = normal (ADD/SUB), N = saturating (ADDS/SUBS), P = with carry (ADC/SBC). This is itself a ternary exploitation — one trit selects among 3 modes.

Note on funct indexing: funct[i] denotes the i-th trit within the funct field (local index, 0 = LST of funct). In absolute instruction-word terms, funct[0] lives at t[13] (the trit immediately after rs2). Using the LST-end of funct for mode selectors keeps the decoder logic close to the opcode and rs2 decoder.

Memory group — I format (opcode −26 to −18)

Opcode (val)	Mnemonic	Operation
−26	`LOAD`	rd ← Mem[rs1 + imm17]
−25	`STORE`	Mem[rs1 + imm17] ← rd (rd field used as source)
−24	`LI`	rd ← zero-extend(imm17) to 27 trits
−23	`LUI`	rd ← imm17 << 10 (load upper immediate, low 10 trits set to Z)
−22	`ADDI`	rd ← rs1 + zero-extend(imm17)
−21	`BRT3`	B format: ternary three-way branch on LST(rX) (new in v0.3, see below)
−20 to −19	—	reserved
−18	`CMPI`	FLAGS ← compare(rs1, zero-extend(imm17)) — see §5.5

Branch group — J format (opcode −17 to −10) and U format (opcode −9 to −8)

Opcode (val)	Mnemonic	Format	Condition	Semantics
−17	`BEQ`	J	rs1 == 0	PC ← PC + offset20
−16	`BNE`	J	rs1 ≠ 0	PC ← PC + offset20
−15	`BLT`	J	rs1 < 0	PC ← PC + offset20
−14	`BGT`	J	rs1 > 0	PC ← PC + offset20
−13	`BLE`	J	rs1 ≤ 0	PC ← PC + offset20
−12	`BGE`	J	rs1 ≥ 0	PC ← PC + offset20
−11	`JMPA`	J	unconditional	PC ← rs1 + offset20
−10	`BF`	J	FLAGS.sign matches mask	Trit-masked branch on FLAGS (new in v0.3, see below)
−9	`JMP`	U	unconditional	PC ← PC + offset23
−8	`CALL`	U	unconditional	ra ← PC + 1 ; PC ← PC + offset23

Branch instructions (BEQ–BGE) use rs1 (field [4-6]) as the register to test, and the offset is relative to the current PC (before increment). BLT/BGT/BLE/BGE compare val(rs1) to 0.

JMPA retains J format because it needs rs1 as the base address register. JMP and CALL use U format for maximum jump range (23 trits ≈ ±4.7 × 10¹⁰ words).

CSR and system group — I format (opcode −7 to −4)

Opcode (val)	Mnemonic	Operation
−7	`CSRR`	rd ← CSR[imm17]
−6	`CSRW`	CSR[imm17] ← rs1
−5	`CSRX`	rd ← CSR[imm17] ; CSR[imm17] ← rs1 (atomic read-then-write)
−4	`ECALL`	Synchronous trap: `ECAUSE ← EXC_ECALL_U/H/D` based on `imm17[0]` (Z=user syscall=0, P=hypercall=+1, N=debug trap=+2); call number is read by the handler from `a7` (r17). `imm17[1..16]` are reserved and ignored by the decoder — assemblers emit Z for forward compatibility (see §8.4).

Special group — R format (opcode −3 to +4)

Opcode (val)	Mnemonic	Operation
−3	`IRET`	Atomic exception return: PC ← EPC ; STATUS ← ESAVE (new in v0.2)
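−2	`TSEL`	rd ← rs1 / rs2 / register named in funct[0..2], selected by FLAGS.sign = N / Z / P (new in v0.3, see below)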
−1	`NOP`	No operation
0	`HALT`	Halt processor
+1	`TGET`	rd ← trit t[val(rs2)] of rs1 (result is N, Z, or P in a T27 word)
+2	`TSET`	rd ← rs1 with trit t[val(rs2)] set to value encoded in funct[0] (see below)
+3	`TSIGN`	rd ← sign(rs1) : N, Z, or P (as T27: −1, 0, or +1)
+4	`CMP`	FLAGS ← compare(rs1, rs2) — see §5.5

TSET encoding (opcode +2): the value to insert is determined by funct[0]:

funct[0]	Assembler mnemonic	Inserted trit value
N	`TSETN rd, rs1, rs2`	N (−1)
Z	`TSETZ rd, rs1, rs2`	Z (0)
P	`TSETP rd, rs1, rs2`	P (+1)

rs2 provides the trit index (0–26); the value to write comes from funct, not from rs2. The bare mnemonic TSET is accepted by the assembler as an alias for TSETZ (clear a trit).
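A minimal model of TGET and the three TSET variants, assuming registers are held as Python integers and trits as −1, 0, +1. The behaviour for an index outside 0–26 is not described in this section, so the sketch simply rejects it; that choice is an assumption of the example, not a statement about the hardware.

```python
def to_trits(n, w=27):
    """Balanced ternary trits of n, LST first."""
    trits = []
    for _ in range(w):
        r = (n + 1) % 3 - 1      # balanced remainder in {-1, 0, +1}
        trits.append(r)
        n = (n - r) // 3
    return trits


def from_trits(trits):
    """Balanced integer value of a trit list (LST first)."""
    return sum(t * 3**k for k, t in enumerate(trits))


def tget(rs1, index):
    """TGET: trit t[index] of rs1, returned as -1, 0 or +1."""
    if not 0 <= index <= 26:
        raise ValueError("trit index outside 0..26 is not modelled here")
    return to_trits(rs1)[index]


def tset(rs1, index, trit):
    """TSETN / TSETZ / TSETP: rs1 with t[index] replaced by trit (from funct[0])."""
    if not 0 <= index <= 26:
        raise ValueError("trit index outside 0..26 is not modelled here")
    trits = to_trits(rs1)
    trits[index] = trit
    return from_trits(trits)


print(tget(106, 1), tset(106, 1, 0), tset(0, 2, 1))
```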

Absolute value and trit-reduce (opcode +5 to +7)

Opcode (val)	Mnemonic	Operation
+5	`TABS`	rd ← |rs1| (absolute value)
+6	`TMIN`	rd ← minimum trit of rs1 (fold-min across all 27 trits; result is N, Z, or P as T27)
+7	`TMAX`	rd ← maximum trit of rs1 (fold-max across all 27 trits; result is N, Z, or P as T27)

TMIN / TMAX are trit-reduce operations: they fold across all 27 trit positions and return the extremum as a single-trit value in a T27 register.

- TMIN(x) = P if and only if all trits of x are P (all-P test).
- TMAX(x) = N if and only if all trits of x are N (all-N test).
- After a subsumption check (TIMPL result, req, caps), the pattern TMIN t0, result followed by BGT t0, granted branches if all 27 capabilities are satisfied; no constant is needed.
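The two folds are easy to model. The sketch below assumes registers are Python integers and shows the all-P and all-N tests on the extreme 27-trit values; the helper names are hypothetical.

```python
def to_trits(n, w=27):
    """Balanced ternary trits of n, LST first."""
    trits = []
    for _ in range(w):
        r = (n + 1) % 3 - 1      # balanced remainder in {-1, 0, +1}
        trits.append(r)
        n = (n - r) // 3
    return trits


def tmin(x):
    """TMIN: smallest trit found anywhere in the 27-trit word."""
    return min(to_trits(x))


def tmax(x):
    """TMAX: largest trit found anywhere in the 27-trit word."""
    return max(to_trits(x))


ALL_P = (3**27 - 1) // 2     # every trit is P
ALL_N = -ALL_P               # every trit is N

print(tmin(ALL_P), tmin(ALL_P - 1), tmax(ALL_N), tmax(0))
```

A word that is all P except for a single Z or N trit already yields TMIN ≤ Z, so a following BGT on the TMIN result is not taken; this is what makes the capability check a single reduce plus a branch.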

New in v0.3: TSEL, BF, BRT3

TSEL — 3-way conditional select (opcode −2, R format)

TSEL rd, rn, rz, rp dispatches based on FLAGS.sign:

- FLAGS.sign = N → rd ← rn (rs1 field)
- FLAGS.sign = Z → rd ← rz (rs2 field)
- FLAGS.sign = P → rd ← rp (funct[0..2] field, register address)

This is the defining ternary-native data instruction. After a CMP, one instruction selects among three registers — binary requires 2 CMOVs or a branch. The R format accommodates 4 register fields: rd (destination), rs1=rn, rs2=rz, funct[0..2]=rp (3 trits at the LST end of funct, i.e. t[13..15] of the instruction word).

Example — clamp to range [lo, hi]:

CMP    val, lo          # FLAGS.sign: N if val < lo, Z if =, P if >
TSEL   t0, lo, val, val # t0 = lo if below, val otherwise
CMP    t0, hi           # FLAGS.sign: N if t0 < hi, Z if =, P if >
TSEL   result, t0, t0, hi  # result = hi if above, t0 otherwise
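The clamp sequence can be traced with a small Python model. It assumes, following the comments in the listing, that CMP sets FLAGS.sign to sign(rs1 − rs2); `cmp_sign`, `tsel` and `clamp` are hypothetical names for this illustration.

```python
def cmp_sign(a, b):
    """Model of CMP: FLAGS.sign = N (-1), Z (0) or P (+1) for a vs b."""
    return (a > b) - (a < b)


def tsel(flags_sign, rn, rz, rp):
    """Model of TSEL: pick rn, rz or rp according to FLAGS.sign."""
    return {-1: rn, 0: rz, 1: rp}[flags_sign]


def clamp(val, lo, hi):
    """Mirror of the four-instruction clamp sequence."""
    flags = cmp_sign(val, lo)            # CMP  val, lo
    t0 = tsel(flags, lo, val, val)       # TSEL t0, lo, val, val
    flags = cmp_sign(t0, hi)             # CMP  t0, hi
    return tsel(flags, t0, t0, hi)       # TSEL result, t0, t0, hi


print([clamp(v, 0, 10) for v in (-3, 0, 5, 10, 42)])   # prints [0, 0, 5, 10, 10]
```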

BF — trit-masked branch on FLAGS (opcode −10, J format)

BF cond, offset20 branches if FLAGS.sign matches the condition mask encoded in the rs1 field [4-6]:

- t[4] = P → match if FLAGS.sign = N
- t[5] = P → match if FLAGS.sign = Z
- t[6] = P → match if FLAGS.sign = P

The branch is taken if the mask trit corresponding to the current FLAGS.sign is P. This encodes all 6 standard comparison branches plus “always” (all three mask trits P) in a single opcode. Masks in the table below are written LST-first (t[4] t[5] t[6]), with `P` for a set trit and `0` for Z:

Assembler	rs1 mask (LST-first)	Condition
`BFLT`	`P00`	FLAGS.sign = N (less than)
`BFEQ`	`0P0`	FLAGS.sign = Z (equal)
`BFGT`	`00P`	FLAGS.sign = P (greater than)
`BFLE`	`PP0`	FLAGS.sign = N or Z (less or equal)
`BFGE`	`0PP`	FLAGS.sign = Z or P (greater or equal)
`BFNE`	`P0P`	FLAGS.sign = N or P (not equal)

The pattern CMP a, b ; BFLT label replaces SUB t0, a, b ; BLT t0, label. Both sequences are two instructions long, but the CMP/BF form needs no temporary register. FLAGS are not modified by BF.
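The mask test reduces to a single indexed lookup. In the sketch below, masks are tuples (t[4], t[5], t[6]) with P = 1 and Z = 0, matching the rows of the BF table; the dictionary `MASKS` and the function `bf_taken` are hypothetical names, and mask trits set to N are treated as non-matching, which is an assumption since only P is described.

```python
# rs1 masks from the BF table, as (t[4], t[5], t[6]) with P = 1 and Z = 0.
MASKS = {
    "BFLT": (1, 0, 0),
    "BFEQ": (0, 1, 0),
    "BFGT": (0, 0, 1),
    "BFLE": (1, 1, 0),
    "BFGE": (0, 1, 1),
    "BFNE": (1, 0, 1),
}


def bf_taken(mask, flags_sign):
    """True if the mask trit selected by FLAGS.sign (N, Z, P -> t[4], t[5], t[6]) is P."""
    return mask[flags_sign + 1] == 1


for name, mask in MASKS.items():
    outcomes = [bf_taken(mask, s) for s in (-1, 0, 1)]
    print(name, outcomes)
```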

BRT3 — ternary three-way branch (opcode −21, B format)

BRT3 rX, off_z, off_n reads the least significant trit (LST) of register rX and dispatches:

- LST(rX) = P → fall through to PC + 1 (no branch penalty)
- LST(rX) = Z → PC ← PC + off_z
- LST(rX) = N → PC ← PC + off_n

Format B provides two 10-trit signed offsets (±29 524 each). The P-falls-through convention optimizes for the common case: loop bodies execute directly without a branch.

A while loop compiles to:

loop_start:
    ; evaluate condition → rX (P=true, Z=unknown, N=false)
    BRT3  rX, loop_start, loop_exit
    ; fall-through (P) → loop body
    ...
    JMP   loop_start
loop_exit:

Two instructions of overhead; the body executes without any branch. Variants:

- while! (optimistic): set off_z = +1 → Z falls through with P.
- while? (pessimistic): set off_z = off_n → Z treated as N, exits loop.
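The dispatch rule, including the two loop variants, can be sketched as a next-PC function. `brt3_next_pc` is a hypothetical helper; the register is modelled as a Python integer and the offsets as plain signed integers checked against the 10-trit range.

```python
OFFSET_LIMIT = (3**10 - 1) // 2      # 29 524, range of off_z and off_n


def lst(x):
    """Least significant balanced trit of x, as -1, 0 or +1."""
    return (x + 1) % 3 - 1


def brt3_next_pc(pc, rx, off_z, off_n):
    """Next PC after BRT3 rX, off_z, off_n."""
    for off in (off_z, off_n):
        if not -OFFSET_LIMIT <= off <= OFFSET_LIMIT:
            raise ValueError("offset does not fit in 10 trits")
    trit = lst(rx)
    if trit == 1:
        return pc + 1                # P: fall through
    if trit == 0:
        return pc + off_z            # Z: first offset
    return pc + off_n                # N: second offset


pc = 100
print(brt3_next_pc(pc, 1, -5, 7), brt3_next_pc(pc, 0, -5, 7), brt3_next_pc(pc, -1, -5, 7))
```

With off_z = +1 the Z outcome lands on the same address as the P fall-through (the optimistic variant); with off_z equal to off_n it lands on the exit path (the pessimistic variant).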

FLAGS are not affected by BRT3.

TFP group — R format (opcode +8 to +13)

Opcode (val)	Mnemonic	funct	Operation
+8	`FADD`	Z	rd ← rs1 +_f rs2 (T26F addition)
+9	`FSUB`	Z	rd ← rs1 −_f rs2 (T26F subtraction)
+10	`FMUL`	Z	rd ← rs1 ×_f rs2 (T26F multiplication)
+11	`FDIV`	Z	rd ← rs1 ÷_f rs2 (T26F division)
+12	`FCMP`	Z	FLAGS ← float comparison of rs1, rs2 (see §7)
+13	`FCVT`	see below	Integer ↔ T26F conversion (see §7)

FCVT encoding (opcode +13): the conversion mode is selected by funct[0]:

funct[0]	Assembler mnemonic	Operation
Z	`FICVT rd, rs1`	rd ← T26F(int(rs1)) — integer to float
P	`FCVTI rd, rs1`	rd ← int(T26F(rs1)), round to nearest
N	`FCVTIZ rd, rs1`	rd ← int(T26F(rs1)), round toward zero

All TFP instructions use R format. rs2 is ignored for FCVT. The funct field is zero except for FCVT where funct[0] selects the conversion mode, following the same pattern as ADD/ADDS/ADC.

NaR propagation: if any input is NaR, the result is NaR. Additionally: ∞ − ∞ = NaR, 0 × ∞ = NaR, 0 ÷ 0 = NaR.

Division by zero: if rs2 = 0 and rs1 ≠ 0, FDIV produces ∞.

FLAGS after TFP operations (FADD, FSUB, FMUL, FDIV, FCMP) — unified scheme driven by the result’s class:

Result class	`sign`	`carry`
NaR	Z	N
∞ (saturated)	P	P
0 (true zero)	Z	Z
normal	sign of result (N/P)	Z

For FCMP, the “result” is the ordered difference rs1 −_f rs2: normal ⇒ trichotomy on sign; NaR ⇒ unordered (sign=Z, carry=N), distinct from true equality (sign=Z, carry=Z).

Free operations. These existing instructions work correctly on T26F values:

- NEG (opcode −35): trit inversion = Tekum negation (Proposition 3: θ(−t) = −θ(t))
- TABS (opcode +5): absolute value preserved since t[26] = Z is invariant under abs
- LOAD / STORE: move 27-trit words without interpretation

See §7 for the T26F format specification.

Vector group — R format (opcode +15 to +22) — new in v0.8¶

Opcode (val)	Mnemonic	funct	Operation
+15	`VADD`	`funct[0]=Z`	vd ← vs1 + vs2, lane-wise saturating to {N,Z,P} (see §16.3)
+15	`VSUB`	`funct[0]=N`	vd ← vs1 − vs2, lane-wise saturating
+16	`VMUL`	Z	vd ← vs1 × vs2, lane-wise (closed in {N,Z,P} — no saturation needed)
+17	`VLOG`	see §16.4	Lane-wise logic: TAND / TOR / TNOT / TIMPL (LMODE-following) or CONS / ACONS (LMODE-bypass) per `funct[0..2]`
+18	`VSEL`	rm in `funct[0..2]`	vd[i] ← vs1[i] if vm[i]=N ; Z if vm[i]=Z ; vs2[i] if vm[i]=P (see §16.4)
+19	`VCMP`	Z	vd[i] ← sign(vs1[i] − vs2[i]) — produces a ternary mask in {N,Z,P} per lane
+20	`VRED`	see §16.4	Reduction vs1 → scalar GPR rd: SUM / SIGN / CONS / LST / MST / AND / OR per `funct[0..2]`
+21	`VPERM`	see §16.4	Lane permutation: rotate / shift / reverse / shuffle per `funct[0..2]`
+22	`VMOVE`	see §16.5	Inter-bank and intra-bank movement: GPR↔v-reg whole-word, v-reg→v-reg, broadcast / insert / extract

All vector opcodes use R format. With the exception of VMOVE (§16.5) and VRED (whose rd is a GPR), rs1, rs2, and rd are all v-reg indices.

No new format is introduced — vectors reuse R format unchanged. The bank is entirely determined by opcode value.

See §16 for the full vector specification (datapath model, lane semantics, ternary advantages, mask conventions, reduction rules, BitNet example).
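As a rough illustration of the lane-wise semantics in the table above, the following Python sketch models a vector register as a list of trits (−1, 0, +1). The function names and the list representation are assumptions made for this example; §16 remains the authoritative definition.

```python
# Illustrative lane-wise model; a v-reg is a list of trits in {-1, 0, +1}.
def clamp(x):
    """Saturate an integer to the trit range {N, Z, P}."""
    return max(-1, min(1, x))


def sign(x):
    return (x > 0) - (x < 0)


def vadd(vs1, vs2):
    return [clamp(a + b) for a, b in zip(vs1, vs2)]


def vsub(vs1, vs2):
    return [clamp(a - b) for a, b in zip(vs1, vs2)]


def vmul(vs1, vs2):
    # Closed in {N, Z, P}: no saturation needed.
    return [a * b for a, b in zip(vs1, vs2)]


def vsel(vm, vs1, vs2):
    # vm[i] = N selects vs1[i], Z yields Z, P selects vs2[i].
    return [a if m == -1 else (0 if m == 0 else b)
            for m, a, b in zip(vm, vs1, vs2)]


def vcmp(vs1, vs2):
    return [sign(a - b) for a, b in zip(vs1, vs2)]
```

A real register has 27 lanes; the functions work on any length, so short lists are enough to experiment with.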

Reserved extensions (opcode +23 to +40)¶

Opcodes +14 and +23 to +40 are reserved for future extensions:

- +14: reserved TFP (FSQRT, FMA…)
- +23 to +24: reserved Vector v0.9 (FMA, gather/scatter, trybble/tryte lane modes)
- +25 to +40: implementer-defined / custom

5. Balanced ternary arithmetic¶

5.1 Addition¶

Addition follows balanced base-3 tables. The carry is also balanced ternary:

a + b  →  sum | carry
N + N  →   P  |  N     (−1 + −1 = −2 = +1 − 3 → sum=P, carry=N)
N + Z  →   N  |  Z
N + P  →   Z  |  Z     (−1 + +1 = 0)
Z + Z  →   Z  |  Z
Z + P  →   P  |  Z
P + P  →   N  |  P     (+1 + +1 = +2 = −1 + 3 → sum=N, carry=P)

When a carry-in is present (multi-trit addition), the full 3-input sum produces sum and carry-out in the same balanced ternary system.
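The three-input case can be sketched as follows. This is a minimal Python model, assuming trits are the integers −1, 0, +1 and words are least-significant-trit-first lists; `add_trits` and `add_words` are illustrative names, not ISA mnemonics.

```python
def add_trits(a, b, carry_in=0):
    """Balanced ternary full adder: returns (sum_trit, carry_out)."""
    total = a + b + carry_in            # ranges over -3..+3
    carry = 1 if total > 1 else (-1 if total < -1 else 0)
    return total - 3 * carry, carry


def add_words(x, y):
    """Ripple-carry addition of two equal-length trit lists (LST first)."""
    out, carry = [], 0
    for a, b in zip(x, y):
        s, carry = add_trits(a, b, carry)
        out.append(s)
    return out, carry                   # final carry is the carry-out
```

For example, `add_trits(-1, -1)` reproduces the first row of the table: sum P, carry N.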

5.2 Negation (NEG)¶

NEG inverts every trit: P→N, Z→Z, N→P. This is the “free” operation of balanced ternary — it requires no carry chain, just a trit-wise inversion.

5.3 Multiplication¶

Trit-by-trit extended product. MUL returns the low 27 trits (truncated). MULH returns the high 27 trits of the 54-trit product.

For a full 54-trit result: MUL rd_lo, rs1, rs2 then MULH rd_hi, rs1, rs2.

The trit-by-trit product table is:

a × b → product
N × N →  P    (−1 × −1 = +1)
N × Z →  Z    (−1 ×  0 =  0)
N × P →  N    (−1 × +1 = −1)
Z × Z →  Z
Z × P →  Z
P × P →  P    (+1 × +1 = +1)
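The relationship between MUL and MULH can be checked with a small Python model. It assumes register values are ordinary integers in the T27 range and splits the exact product into two 27-trit halves; the helper names are hypothetical.

```python
WIDTH = 27


def to_trits(n, width):
    """Balanced ternary digits of n, least significant trit first."""
    trits = []
    for _ in range(width):
        t = (n + 1) % 3 - 1
        trits.append(t)
        n = (n - t) // 3
    return trits


def from_trits(trits):
    return sum(t * 3**i for i, t in enumerate(trits))


def mul(a, b):
    """Low 27 trits of the 54-trit product."""
    return from_trits(to_trits(a * b, 2 * WIDTH)[:WIDTH])


def mulh(a, b):
    """High 27 trits of the 54-trit product."""
    return from_trits(to_trits(a * b, 2 * WIDTH)[WIDTH:])
```

In this model the two halves recombine as `mul(a, b) + mulh(a, b) * 3**27 == a * b` for any pair of T27 operands.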

5.4 Integer division (symmetric Euclidean)¶

Setnex uses symmetric Euclidean division, the natural convention for balanced ternary:

Quotient: q = round_to_nearest(a / b), with ties rounded toward zero.
Remainder: r = a − q × b, satisfying |r| ≤ |b| / 2.

This minimizes the magnitude of the remainder, which aligns with the balanced ternary philosophy of keeping values centered on zero.

When |b| is odd (including all powers of 3), the tie-break case does not occur and the result is unique.

Division by zero triggers the EXC_DIV0 exception.

Comparison with C convention: C truncates toward zero, which can produce larger remainders. The symmetric convention is more natural for balanced ternary and simplifies subsequent computations on the remainder.
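A minimal Python sketch of the symmetric convention follows. It starts from Python's floor division and corrects the quotient when the remainder is too large; `symdiv` is an illustrative name, and `ZeroDivisionError` merely stands in for EXC_DIV0.

```python
def symdiv(a, b):
    """Symmetric Euclidean division: nearest quotient, ties toward zero."""
    if b == 0:
        raise ZeroDivisionError("EXC_DIV0")
    q, r = divmod(a, b)                 # floor division: r has the sign of b
    twice, mag = 2 * abs(r), abs(b)
    # Move to q + 1 when it is strictly nearer, or on a tie when that
    # moves the quotient toward zero (q is negative).
    if twice > mag or (twice == mag and q < 0):
        q += 1
        r -= b
    return q, r
```

For instance, `symdiv(8, 3)` gives `(3, -1)` where C truncation would give quotient 2 and remainder 2.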

5.5 FLAGS register¶

The FLAGS CSR (address 3) is updated after every ALU instruction (ADD through TCMP) and after CMP/CMPI. Both flag trits are fully ternary (N/Z/P), not binary.

Trit	Name	Meaning	Values
t[0]	`sign`	Result sign / comparison trichotomy	N = negative, Z = zero, P = positive
t[1]	`carry`	Carry-out from the most significant trit of the balanced ternary adder	N = underflow, Z = no carry, P = overflow
t[2..26]	—	Reserved	Always Z

After ALU instructions (ADD, SUB, MUL, etc.):

- sign = sign(result): N if result < 0, Z if result = 0, P if result > 0.
- carry = carry-out from trit position 26 of the adder (meaningful for ADD/SUB; Z for other ALU ops). N if the true result was below −(3²⁷−1)/2 (underflow), P if above +(3²⁷−1)/2 (overflow), Z otherwise.

After CMP rs1, rs2:

- sign = sign(val(rs1) − val(rs2)). This is the trichotomy trit: N if rs1 < rs2, Z if rs1 = rs2, P if rs1 > rs2.
- carry reflects the subtraction rs1 − rs2 internally.

After CMPI rs1, imm17: same behavior with the zero-extended immediate in place of rs2.

The trichotomy trit in FLAGS.sign is the native ternary comparison result — it encodes three outcomes (less, equal, greater) in a single trit, which would require two bits in binary.
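The flag rules above can be modelled in Python as follows. This is a sketch under the assumption that operands are integers already within the T27 range; the function names are illustrative and the carry returned for CMP is one plausible reading of "reflects the subtraction internally".

```python
MAX = (3**27 - 1) // 2   # largest T27 value


def sign(x):
    return (x > 0) - (x < 0)


def flags_after_add(a, b, carry_in=0):
    """Return (result, sign, carry) for ADD/ADC on T27 operands."""
    true = a + b + carry_in
    carry = 1 if true > MAX else (-1 if true < -MAX else 0)
    result = true - carry * 3**27       # value kept in the 27-trit register
    return result, sign(result), carry


def flags_after_cmp(a, b):
    """Return (sign, carry) for CMP: sign is the trichotomy trit."""
    diff = a - b
    carry = 1 if diff > MAX else (-1 if diff < -MAX else 0)
    return sign(diff), carry
```

Note the difference between the two: after an overflowing ADD the `sign` trit follows the wrapped register value, while after CMP it follows the true ordering of the operands.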

5.5.1 FLAGS datapath¶

Although FLAGS is accessible as CSR address 3 (for context save/restore on exception), it is not a generic CSR — it has dedicated wiring to the execution units:

ALU ← FLAGS.carry: t[1] of FLAGS feeds the carry-in of the balanced ternary adder. The input is gated by funct[0] on ADD/SUB: funct[0]=Z (ADD/SUB) forces carry-in = Z; funct[0]=P (ADC/SBC) routes FLAGS.carry directly to the adder.
ALU → FLAGS.{sign, carry}: both trits are written back every cycle an ALU instruction (ADD..TCMP, CMP, CMPI) retires. Other instructions leave FLAGS unchanged.
Branch / select unit ← FLAGS.sign: t[0] of FLAGS is routed to the branch decision logic for BF (opcode −10) and to the selection MUX for TSEL (opcode −2). Neither instruction modifies FLAGS.

The CSRR/CSRW path (through the CSR file) is the slow path used only by exception prologue/epilogue (ESAVE, IRET) and by software that needs to inspect or fabricate FLAGS explicitly. Normal use is entirely implicit through the dedicated wires above.

Implementation note: FLAGS needs only 2 trits of storage (t[0], t[1]); t[2..26] are hard-wired to Z and do not require flip-flops.

6. Configurable ternary logic (LMODE)¶

The LMODE CSR (address 2) selects the truth tables for logic instructions TAND, TOR, TNOT, and TIMPL. LMODE holds a T27 value; only trit t[0] is significant.

When LMODE=N, the STATUS.lx trit (t[2]) selects among three sub-modes. This provides 5 logic modes total using only two trits.

6.1 Logic mode map¶

LMODE t[0]	STATUS.lx	Logic	Z means…	Notes
N	N	Heyting (HT)	not provable	Intuitionistic; NOT and IMPL differ from Łukasiewicz
N	Z	Łukasiewicz (Ł)	not yet known	Most tolerant; IMPL = order test
N	P	RM3	both true and false	Paraconsistent (Routley–Meyer); IMPL differs
Z	(ignored)	Kleene (K)	undecidable	Default at reset; neutral; SQL 3VL
P	(ignored)	B3 (Bochvar)	meaningless	Z infectious: any input Z → output Z

At reset, LMODE = 0 and STATUS = 0 → Kleene active without explicit configuration.

STATUS.lx is only significant when LMODE=N. When LMODE=Z or LMODE=P, the lx trit is ignored.

6.2 Mode descriptions¶

Kleene (LMODE=Z) — the default. AND = min, OR = max, NOT = negation, IMPL(a,b) = OR(NOT(a), b). This is the standard three-valued logic used by SQL for NULL handling. Z propagates through some operations but not all.

Łukasiewicz (LMODE=N, STATUS.lx=Z) — shares AND/OR/NOT with Kleene. Differs only on IMPL: IMPL(a,b) = min(P, −a + b + 1). The result is P if and only if a ≤ b, making TIMPL a trit-parallel subsumption test. Most tolerant of indetermination.

Heyting (LMODE=N, STATUS.lx=N) — intuitionistic logic. Shares AND/OR with Kleene/Łukasiewicz but differs on NOT: NOT_HT(Z) = N, NOT_HT(P) = N (only N maps to P). IMPL uses the Heyting algebra residual: IMPL_HT(a,b) = greatest c such that AND(a, c) ≤ b. Most conservative about the unknown state: “not provable” is treated as false.

RM3 (LMODE=N, STATUS.lx=P) — paraconsistent logic (Routley & Meyer). Shares AND/OR/NOT with Kleene/Łukasiewicz. Differs only on IMPL. Z represents “both true and false” — a contradiction that does not explode into arbitrary conclusions. Suitable for reasoning with inconsistent data.

B3 / Bochvar (LMODE=P) — Bochvar’s internal logic (1937). Z is “meaningless” and infectious: any operation with a Z input produces Z. AND, OR, NOT, and IMPL are all affected. On classical inputs (N and P only), B3 reduces to Boolean logic. Most strict: incomplete data produces no conclusion.

Hardware cost of STATUS.lx: When LMODE=Z or LMODE=P, the lx trit is ignored and the datapath is unchanged. When LMODE=N, the TNOT instruction must check STATUS.lx to select between standard NOT (−a) and Heyting NOT. This is a single AND + MUX-2→1 on the NOT output, gated by LMODE=N ∧ STATUS.lx=N. The TIMPL MUX adds one input (RM3) to the existing Łukasiewicz/Heyting selector. Total added cost: negligible.

6.3 LMODE-insensitive operations¶

CONS and ACONS are always evaluated using Kleene semantics, regardless of LMODE. They are arithmetic primitives, not logic operations.

CONS extracts the common trit (agreement → value, disagreement → Z). ACONS extracts the absent trit (agreement → Z, disagreement → the missing trit from {N, Z, P}). Together they form a complete dual pair for balanced ternary arithmetic circuits.
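Read as two-input functions on trits (−1, 0, +1), the pair can be sketched in Python as below. The closed form used for ACONS relies on the fact that the three trit values sum to zero, so the value missing from two distinct trits is the negated sum; that derivation is an observation of this example, not a statement from the specification.

```python
def cons(a, b):
    """Consensus: the common trit on agreement, Z on disagreement."""
    return a if a == b else 0


def acons(a, b):
    """Anti-consensus: Z on agreement, otherwise the trit absent from {a, b}."""
    return 0 if a == b else -(a + b)
```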

TCMP is also LMODE-insensitive — it is a pure arithmetic comparison.

6.4 Complete truth tables¶

AND¶

Kleene / Łukasiewicz / Heyting / RM3 AND (identical — min):

AND	N	Z	P
N	N	N	N
Z	N	Z	Z
P	N	Z	P

B3 (Bochvar) AND — Z infectious:

AND	N	Z	P
N	N	Z	N
Z	Z	Z	Z
P	N	Z	P

OR¶

Kleene / Łukasiewicz / Heyting / RM3 OR (identical — max):

OR	N	Z	P
N	N	Z	P
Z	Z	Z	P
P	P	P	P

B3 (Bochvar) OR — Z infectious:

OR	N	Z	P
N	N	Z	P
Z	Z	Z	Z
P	P	Z	P

NOT¶

Kleene / Łukasiewicz / RM3 NOT (identical — negation):

a	NOT(a)
N	P
Z	Z
P	N

Heyting NOT:

a	NOT(a)
N	P
Z	N
P	N

B3 (Bochvar) NOT — Z infectious:

a	NOT(a)
N	P
Z	Z
P	N

Note: B3 NOT has the same table as Kleene NOT. The infectious property of B3 manifests in AND, OR, and IMPL, not in NOT (since NOT is unary and NOT(Z) = Z is already the “contaminated” result).

IMPL — all 5 modes are distinct¶

Kleene IMPL: IMPL(a,b) = OR(NOT(a), b) = max(−a, b)

IMPL	N	Z	P
N	P	P	P
Z	Z	Z	P
P	N	Z	P

Łukasiewicz IMPL: IMPL(a,b) = min(P, −a + b + 1)

IMPL	N	Z	P
N	P	P	P
Z	Z	P	P
P	N	Z	P

Heyting IMPL: IMPL(a,b) = greatest c such that AND(a, c) ≤ b

IMPL	N	Z	P
N	P	P	P
Z	N	P	P
P	N	Z	P

RM3 IMPL (paraconsistent):

IMPL	N	Z	P
N	P	P	P
Z	N	Z	P
P	N	N	P

B3 (Bochvar) IMPL: IMPL(a,b) = OR_B3(NOT_B3(a), b) — Z infectious:

IMPL	N	Z	P
N	P	Z	P
Z	Z	Z	Z
P	N	Z	P

All five IMPL tables are distinct. The discriminating cells:

(a, b)	Kleene	Łukasiewicz	Heyting	RM3	B3
(Z, N)	Z	Z	N	N	Z
(Z, Z)	Z	P	P	Z	Z
(N, Z)	P	P	P	P	Z
(Z, P)	P	P	P	P	Z
(P, N)	N	N	N	N	N
(P, Z)	Z	Z	Z	N	Z
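The five IMPL tables can be regenerated from short closed forms, which is a convenient cross-check for an emulator. The Python sketch below encodes N, Z, P as −1, 0, +1. The Kleene, Łukasiewicz and B3 forms are the formulas given in this section; the one-line forms used for Heyting and RM3 are reformulations chosen for this example and should be treated as assumptions validated only against the tables. `select_logic` mirrors the mode map of §6.1.

```python
N, Z, P = -1, 0, 1
TRITS = (N, Z, P)

IMPL = {
    "Kleene":      lambda a, b: max(-a, b),
    "Lukasiewicz": lambda a, b: min(P, -a + b + 1),
    "Heyting":     lambda a, b: P if a <= b else b,
    "RM3":         lambda a, b: max(-a, b) if a <= b else N,
    "B3":          lambda a, b: Z if Z in (a, b) else max(-a, b),
}


def select_logic(lmode_t0, status_lx):
    """Map LMODE t[0] and STATUS.lx to the active logic name (see §6.1)."""
    if lmode_t0 == Z:
        return "Kleene"
    if lmode_t0 == P:
        return "B3"
    return {N: "Heyting", Z: "Lukasiewicz", P: "RM3"}[status_lx]


def table(name):
    """Rows indexed by a, columns by b, both in the order N, Z, P."""
    f = IMPL[name]
    return [[f(a, b) for b in TRITS] for a in TRITS]
```

Calling `table(select_logic(N, P))` returns the RM3 rows in the same row and column order as the tables above.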

7. Ternary floating-point format (T26F)¶

Setnex uses the Tekum balanced ternary tapered precision format (Hunhold, arXiv:2512.10964) for floating-point arithmetic. The native width is 26 trits (T26F), stored in a 27-trit register with t[26] = Z.

7.1 Why 26 trits¶

Tekum requires an even trit width (the anchor midpoint pattern +-+-…+- has 2k trits). Since the Setnex word is 27 trits (odd), T26F uses the lower 26 trits and reserves t[26] = Z. This preserves full compatibility with NEG, TABS, LOAD/STORE, and integer CMP (for non-NaR values).

7.2 Special values¶

Three patterns are special (all 26 trits identical):

Pattern (t[0..25])	Name	Float value	27-trit register
all Z	Zero	0.0	all Z
all P	Infinity (∞)	+∞	t[26]=Z, t[0..25]=P
all N	Not a Result (NaR)	undefined	t[26]=Z, t[0..25]=N

NaR is the Tekum analogue of IEEE 754 NaN. There is only one infinity (unsigned, as in the real wheel algebra: 1/0 = ∞).

7.3 Anchor and decoding¶

For a non-special T26F value, the anchor is:

anc(t) = |t₂₆| − M

where t₂₆ denotes the lower 26 trits, |·| is balanced ternary absolute value, and M is the 26-trit midpoint pattern +-+-+-+-+-+-+-+-+-+-+-+-+- (13 repetitions of +-).

The anchor trits (big-endian, MST first) are partitioned as:

[regime: 3 trits] [exponent: c trits] [fraction: 26 − c − 3 trits]

Regime (r): signed integer from the 3 most significant anchor trits. r ∈ [−13, +13], though only |r| ≤ 7 encodes a valid value. Anchor patterns with |r| > 7 are reserved encodings and decode to NaR; a conforming TFP unit must not produce them as a result of any arithmetic operation.

Exponent trit count: c = max(0, |r| − 2). Near r = 0, all trits go to fraction (maximum precision). At extreme regimes, more trits go to exponent (maximum range). This is the tapered precision property.

Exponent value:

e = int(exponent_trits) + sign(r) × BIAS[|r|]

|r|	0	1	2	3	4	5	6	7
BIAS	0	1	2	4	10	28	82	244
c (exponent trits)	0	0	0	1	2	3	4	5
p (fraction trits)	23	23	23	22	21	20	19	18

Fraction value: f = Σ trit_i × 3^(−i) for i = 1..p, where p = 26 − c − 3 is the fraction trit count. f ∈ (−0.5, +0.5).

Decoded value:

θ(t) = sign × (1 + f) × 3^e

where sign is P (+1) if the T26F value is positive, N (−1) if negative.

7.4 Precision and range¶

At regime r = 0: c = 0, all 23 fraction trits available → precision ≈ 23 × log₁₀(3) ≈ 11 decimal digits, exponent = 0 (values near 1.0).

At |r| = 7: 18 fraction trits (≈ 8.6 decimal digits), exponent range ±(244 + (3⁵−1)/2) = ±365 powers of 3 (≈ ±174 decimal decades).
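The field widths and the range figures quoted above follow from the formulas of §7.3 and can be recomputed with a short Python sketch. The BIAS values are copied from the table in §7.3; the function names are illustrative only.

```python
import math

BIAS = [0, 1, 2, 4, 10, 28, 82, 244]   # indexed by |r|, from the §7.3 table


def exponent_trits(r):
    """c = max(0, |r| - 2); only |r| <= 7 is a valid regime."""
    if abs(r) > 7:
        raise ValueError("reserved regime: decodes to NaR")
    return max(0, abs(r) - 2)


def fraction_trits(r):
    """p = 26 - c - 3."""
    return 26 - exponent_trits(r) - 3


def max_exponent(r):
    """Largest exponent magnitude reachable at regime magnitude |r|."""
    c = exponent_trits(r)
    return BIAS[abs(r)] + (3**c - 1) // 2


def decimal_digits(r):
    """Approximate decimal precision of the fraction field."""
    return fraction_trits(r) * math.log10(3)
```

Evaluating `max_exponent(7)` gives 365, and `365 * math.log10(3)` is about 174, matching the range stated above.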

7.5 Key properties¶

Monotonicity (Proposition 4): for non-special values, integer ordering on the raw 26-trit word corresponds to numerical ordering of the decoded float. The existing CMP instruction gives correct ordering for normal T26F values — but FCMP is required for proper NaR handling (NaR = all-N would otherwise compare as less than everything instead of unordered).

Free negation (Proposition 3): θ(−t) = −θ(t). Trit-by-trit inversion (NEG) negates the float. Since flip(Z) = Z, this works on the full 27-trit register with t[26] = Z.

Truncation is rounding: reducing precision by discarding least-significant fraction trits is equivalent to rounding, with no carry propagation needed.

7.6 T26F in a 27-trit register¶

The convention t[26] = Z ensures:

- T26F values occupy the integer range [−(3²⁶−1)/2, +(3²⁶−1)/2], a strict subset of the T27 range.
- LOAD/STORE transfer T26F values correctly (full 27-trit word move).
- NEG and TABS operate correctly (Z is invariant under trit inversion and absolute value).
- Software can distinguish integer and float values by testing t[26] (Z → possible float, non-Z → integer exceeding T26 range).

Integer operations on T26F values produce undefined float results. TFP instructions on non-T26F integer values produce undefined results. Type discipline is the programmer’s responsibility.

8. Exception handling¶

8.1 Exception causes¶

ECAUSE code	Name	Trigger	`ETVAL` contents
−13	`EXC_DIV0`	Division by zero	0 (unused)
−12	`EXC_ALIGN`	Misaligned memory access (future)	Faulting effective address
−11	`EXC_FAULT`	Invalid address	Faulting effective address
−10	`EXC_ILLEGAL`	Undefined opcode, reserved instruction, or kernel-only CSR accessed from user mode	Raw 27-trit instruction word
−9	`EXC_PERM_R`	MPU denies a LOAD (new in v0.7, see §9)	Faulting effective address
−8	`EXC_PERM_W`	MPU denies a STORE (new in v0.7)	Faulting effective address
−7	`EXC_PERM_X`	MPU denies an instruction fetch (new in v0.7)	Faulting PC (same as `EPC`)
0	`EXC_ECALL_U`	System call — user flavor (`ECALL` with `imm17[0] = Z`)	0 (syscall number is in `a7`, see §8.4)
+1	`EXC_ECALL_H`	Hypercall (`ECALL` with `imm17[0] = P`)	0 (hypercall number is in `a7`, see §8.4)
+2	`EXC_ECALL_D`	Debug trap (`ECALL` with `imm17[0] = N`)	0 (debug reason is in `a7`, see §8.4)
+10	`EXC_OVERFLOW`	Reserved; not raised in v0.7 (awaits a STATUS enable trit in a future revision)	0 (unused)
+20	`IRQ_0`	Asynchronous interrupt, line 0 (new in v0.7, see §10)	0 (unused)
+21	`IRQ_1`	Asynchronous interrupt, line 1	0 (unused)
+22	`IRQ_2`	Asynchronous interrupt, line 2	0 (unused)
+23	`IRQ_3`	Asynchronous interrupt, line 3	0 (unused)
+24	`IRQ_4`	Asynchronous interrupt, line 4	0 (unused)
+25	`IRQ_5`	Asynchronous interrupt, line 5	0 (unused)
+26	`IRQ_6`	Asynchronous interrupt, line 6	0 (unused)
+27	`IRQ_7`	Asynchronous interrupt, line 7	0 (unused)
+28	`IRQ_8`	Asynchronous interrupt, line 8	0 (unused)

The three ECALL flavors share a single handler entry point (EVEC). Negative codes denote involuntary faults; non-negative codes in [0, +10] denote deliberate synchronous traps; positive codes in [+20, +28] denote asynchronous interrupts (§10). Code 0 is preserved for EXC_ECALL_U so that a v0.5 binary (whose imm17 is always Z) enters the handler with ECAUSE = 0 exactly as before. See §8.4 for the flavor-tag encoding.

Async interrupts (IRQ_0..IRQ_8) reuse the exception entry path of §8.2 unchanged — they differ from synchronous exceptions only in trigger (external line) and in the saved EPC, which points at the next instruction the CPU would have executed rather than a faulting one. See §10 for dispatch rules.

A triple-fault condition (a synchronous fault raised while STATUS.depth = N, i.e. both exception frames already in use) does not produce an ECAUSE code — it is not observable by any handler. Instead it triggers an immediate machine-check reset (§8.2).

8.2 Exception entry sequence¶

The bank selected for the save is determined by STATUS.depth at the moment the fault is taken:

`STATUS.depth` on entry	Save target	New `depth`
Z (0 frames active)	main bank (`EPC`, `ESAVE`, `ECAUSE`, `ETVAL`)	P
P (1 frame active)	frame 2 bank (`EPC2`, `ESAVE2`, `ECAUSE2`, `ETVAL2`)	N
N (2 frames active)	—	— (machine-check reset; see below)

Case 1 — outer entry (depth = Z):

1. ESAVE ← STATUS (save current processor status, including depth = Z)
2. EPC ← PC (address of the faulting instruction)
3. ECAUSE ← code
4. ETVAL ← trap value (per §8.1; 0 if unused)
5. STATUS.mode ← N (switch to kernel mode)
6. STATUS.ie ← N (disable interrupts)
7. STATUS.depth ← P (one frame now active)
8. PC ← EVEC

Case 2 — nested entry (depth = P):

1. ESAVE2 ← STATUS (save current status, including depth = P)
2. EPC2 ← PC
3. ECAUSE2 ← code
4. ETVAL2 ← trap value
5. STATUS.depth ← N (two frames now active); mode and ie are unchanged (already N)
6. PC ← EVEC

Steps 1–8 (case 1) or 1–6 (case 2) are performed atomically; no further exception may be taken between them. The main-bank CSRs are untouched by a case-2 entry, so the outer handler’s return context is preserved.

Case 3 — triple fault (depth = N). If a synchronous fault occurs while both frames are already in use, no save is possible. The processor performs a machine-check reset: all CSRs are cleared to zero and PC ← 0, as if from power-on. No handler is invoked; ECAUSE is not written. Correct kernel code reaches this state only under a genuine hardware defect or runaway condition; a two-level-deep fault chain from user code alone is handled cleanly by case 2.

The handler reads EPC / EPC2, ECAUSE / ECAUSE2, and — when relevant — ETVAL / ETVAL2 via CSRR (selecting the bank by inspecting STATUS.depth on entry), processes the exception, then returns via IRET.

Rationale — why the outer handler is not re-entered on a nested fault. EVEC is shared between both frames, so the nested entry jumps to the same vector. The handler decides, by reading STATUS.depth first, whether it is handling an outer or a nested frame and uses the corresponding CSR bank. Duplicating EVEC would have given each frame its own vector at no functional gain, since the depth is already visible in STATUS.

8.3 Exception return (IRET)¶

IRET restores from the bank selected by the current STATUS.depth:

`STATUS.depth` on IRET	Restore source	Effect on `depth`
P (1 frame active)	main bank (`EPC`, `ESAVE`)	→ Z (via `STATUS ← ESAVE`)
N (2 frames active)	frame 2 bank (`EPC2`, `ESAVE2`)	→ P (via `STATUS ← ESAVE2`)
Z (no frame active)	—	undefined (illegal IRET — reserved for future trap)

Case 1 — return from outer frame (depth = P):

PC ← EPC
STATUS ← ESAVE

Case 2 — return from nested frame (depth = N):

PC ← EPC2
STATUS ← ESAVE2

Both writes are performed atomically — no further exception can be taken between them. This prevents the STATUS/PC corruption that was possible in v0.1’s two-instruction sequence.

Because ESAVE / ESAVE2 hold the full prior STATUS (with the depth trit captured as Z / P respectively at entry time), the single STATUS ← ESAVE* write simultaneously restores mode, ie, lx, and decrements depth to its pre-entry value. No separate depth-decrement step is needed.
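The entry sequence of §8.2 and the return described here can be exercised together with a small state model. The Python sketch below is a simplification under stated assumptions: STATUS is reduced to a dictionary with the `mode`, `ie` and `depth` trits, the two CSR banks are dictionaries, P is assumed to mean "interrupts enabled", and the class and function names are invented for this example.

```python
from dataclasses import dataclass, field

N, Z, P = -1, 0, 1


@dataclass
class Cpu:
    pc: int = 0
    evec: int = 0
    status: dict = field(
        default_factory=lambda: {"mode": P, "ie": P, "depth": Z})
    main: dict = field(default_factory=dict)    # EPC, ESAVE, ECAUSE, ETVAL
    frame2: dict = field(default_factory=dict)  # EPC2, ESAVE2, ECAUSE2, ETVAL2


def take_exception(cpu, cause, tval=0):
    depth = cpu.status["depth"]
    if depth == N:                      # case 3: machine-check reset
        cpu.pc, cpu.evec = 0, 0
        cpu.status = {"mode": Z, "ie": Z, "depth": Z}
        cpu.main, cpu.frame2 = {}, {}
        return "reset"
    frame = {"ESAVE": dict(cpu.status), "EPC": cpu.pc,
             "ECAUSE": cause, "ETVAL": tval}
    if depth == Z:                      # case 1: outer entry
        cpu.main = frame
        cpu.status.update(mode=N, ie=N, depth=P)
    else:                               # case 2: nested entry
        cpu.frame2 = frame
        cpu.status["depth"] = N
    cpu.pc = cpu.evec
    return "entered"


def iret(cpu):
    depth = cpu.status["depth"]
    if depth == Z:
        raise RuntimeError("IRET with depth = Z is reserved")
    frame = cpu.main if depth == P else cpu.frame2
    cpu.pc = frame["EPC"]
    cpu.status = dict(frame["ESAVE"])   # restores mode, ie and depth at once
```

Because each saved STATUS carries the depth it had before entry, two nested entries followed by two `iret` calls bring the model back to its starting state without any explicit depth arithmetic.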

IRET uses opcode −3 (R format). The rd, rs1, rs2, and funct fields are ignored and should be set to zero. Executing IRET while depth = Z is reserved; v0.6 leaves the behavior undefined, and a future revision may assign EXC_ILLEGAL.

8.4 Syscall dispatch convention¶

The ECALL instruction carries a flavor tag in imm17[0] (the least-significant trit of the I-format immediate). The decoder maps the tag to one of three ECAUSE codes; all three flavors share a single EVEC.

`imm17[0]`	Flavor	`ECAUSE` on entry	Intended use
Z	user syscall	`EXC_ECALL_U` = 0	Ordinary user→kernel transition
P	hypercall	`EXC_ECALL_H` = +1	Guest kernel → hypervisor transition (when a hypervisor is present)
N	debug trap	`EXC_ECALL_D` = +2	Breakpoint / debugger synchronous trap

imm17[1..16] are reserved. The decoder ignores them — only imm17[0] carries the flavor tag, so non-zero upper trits are silently tolerated (this keeps the decoder branch-free). Assemblers emit Z for forward compatibility and encode the flavor via dedicated mnemonics (ECALL, HCALL, DBGBRK — see §11.4) rather than through explicit immediate operands.

Backward compatibility. A v0.5 binary encodes every ECALL with imm17 = Z, which maps to imm17[0] = Z → EXC_ECALL_U = 0. This is the exact cause code v0.5 produced, so an unmodified v0.6 handler that still dispatches only on ECAUSE = 0 continues to work.
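The flavor decode can be sketched in Python as a lookup on the least significant trit of the immediate. The immediate is modelled as a plain integer, and `lst` and `ecall_cause` are hypothetical helper names.

```python
def lst(value):
    """Least significant trit of a balanced ternary integer."""
    return (value + 1) % 3 - 1


# imm17[0] -> ECAUSE, per the flavor table: Z -> 0, P -> +1, N -> +2
ECALL_CAUSE = {0: 0, 1: 1, -1: 2}


def ecall_cause(imm17):
    """ECAUSE for an ECALL; imm17[1..16] are ignored."""
    return ECALL_CAUSE[lst(imm17)]
```

An immediate of 3 (trits `+0`) still decodes as a user syscall, since only the lowest trit is inspected.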

User-side register contract at the point of ECALL (all flavors):

Register	Role
`a7` (r17)	Call number (syscall / hypercall / debug reason, per flavor)
`a0`–`a6` (r10–r16)	Arguments 1–7
`a0` (r10)	Return value, written by the handler, visible after resume

Kernel-side handler contract:

On entry, ECAUSE ∈ {0, +1, +2} identifies the flavor; EPC points at the ECALL instruction itself.
Dispatch on ECAUSE to the appropriate table (syscall / hypercall / debug). No re-fetch of EPC and no decoding of imm17 is required.
Read a7 for the call number within the selected table; read a0–a6 for arguments; execute; write the result to a0.
Advance EPC by one word so the resumed program continues after ECALL, then IRET:

   CSRR   t0, EPC
   ADDI   t0, t0, 1
   CSRW   EPC, t0
   IRET

For a nested flavor trap (e.g. a debug breakpoint triggered while a syscall handler is running), the same epilogue applies to EPC2 instead — the handler uses the frame-2 bank (see §8.3).

Register preservation across ECALL is an OS-level policy, not an ISA mandate. The recommended baseline is: the kernel preserves all callee-saved registers (s0–s9, sp, ra) and the argument registers a1–a6 (regardless of how many the specific call actually reads); it overwrites a0 with the return value and is free to clobber a7 and the temporaries t0–t3.

Rationale — why a7 rather than imm17 for the call number. Placing the call number in a register lets it be computed at runtime (libc wrappers, indirect dispatch tables) and keeps ECALL a pure trap with no decoded payload beyond the flavor tag. The handler dispatches on a value it already has in a GPR, avoiding a re-fetch and re-decode of the instruction at EPC.

Rationale — why a separate flavor tag at all. Hypercalls and debug traps have different trust and privilege semantics from ordinary syscalls; routing them through distinct ECAUSE values lets the handler pick the right dispatch table in one step, without mixing them in a single numeric space shared with OS syscall numbers (which would force each OS to carve out reserved ranges for hypercalls/debug).

8.5 Exception trap value (ETVAL, ETVAL2)¶

ETVAL (CSR address 9, MST-first +00) and its frame-2 counterpart ETVAL2 (CSR address 13, MST-first +++) are T27 words written by the processor on every exception entry (§8.2). They carry exception-specific context that does not fit into the 27 values available in ECAUSE / ECAUSE2. The per-exception semantics is given by the ETVAL contents column of §8.1.

Exception	`ETVAL` (or `ETVAL2`) contents
`EXC_FAULT`, `EXC_ALIGN`	The effective address computed by the faulting `LOAD` / `STORE` (i.e. `rs1 + imm17` of the offending access)
`EXC_ILLEGAL`	The raw 27-trit instruction word the decoder rejected
`EXC_ECALL_U`, `EXC_ECALL_H`, `EXC_ECALL_D`	0 — the call number is passed in `a7`, see §8.4
`EXC_DIV0`, `EXC_OVERFLOW`	0 — no auxiliary value needed

ETVAL is written on a case-1 (outer) entry and ETVAL2 on a case-2 (nested) entry; a nested fault never overwrites the outer handler’s ETVAL. An outer handler can therefore safely read ETVAL once at the top and rely on that value remaining valid across any depth-1 nested fault it may subsequently incur; a nested handler reads ETVAL2 by the same rule.

Rationale — dedicated CSR instead of re-decoding EPC. Reconstructing the faulting address or the offending opcode from EPC requires an instruction fetch, which may itself fault (self-modifying code, paged-out text section). Exposing the value directly in a CSR decouples the handler from whatever state the instruction stream is in. Matches the RISC-V mtval design.

9. Memory Protection Unit (MPU)¶

9.1 Role¶

The Memory Protection Unit controls which memory addresses are accessible from each privilege mode and for which access type (read, write, instruction fetch). It converts the §2.3 address-space convention into hardware-enforceable boundaries: a user-mode LOAD or STORE targeting an address without user permission raises EXC_PERM_R or EXC_PERM_W, and an instruction fetch without execute permission raises EXC_PERM_X (§8.1).

The MPU performs no address translation — the address presented by the pipeline is the address delivered to memory. It decides, on each access, whether that access may proceed. This is the “PMP” style of RISC-V, not the “MMU” style.

9.2 Region model¶

The MPU holds nine region descriptors, indexed 0..8. Each descriptor specifies:

A base address (T27 word address).
A size exponent n — the region covers 3ⁿ consecutive 27-trit words starting at base. Natural alignment required (base mod 3ⁿ = 0); see §9.7.
Three permission trits (R, W, X), each carrying a ternary privilege level: N = no access, Z = kernel only, P = user + kernel.
A valid flag.

Regions are naturally aligned power-of-three (NAPO3) blocks. The descriptor bank is internal processor state, not memory-mapped; software manipulates it through three CSRs (§9.3).

Why no “whole address space” region. The address space is symmetric ([−(3²⁷−1)/2, +(3²⁷−1)/2]) while a NAPO3 region is a one-sided half-open interval [base, base + 3ⁿ); there is no single well-aligned (base, n) pair that covers both halves without wrap. To blanket the full range, either rely on the kernel-mode no-match default (§9.5 step 4), or program several large regions anchored at negative and non-negative bases; a single top-size region (n = 26) spans only one third of the range.

9.3 Indirect CSR access¶

Three CSRs form the programming interface:

CSR	Addr	Role
`MPU_SELECT`	−1	Index of the region targeted by subsequent `MPU_BASE` / `MPU_CFG` operations. Legal values: 0..8.
`MPU_BASE`	−2	Reads or writes the base field of the selected region.
`MPU_CFG`	−3	Reads or writes the config field (size + permissions + valid) of the selected region; see §9.4.

Programming region i is therefore:

   LI     t0, i
   CSRW   MPU_SELECT, t0
   CSRW   MPU_BASE, base_value
   CSRW   MPU_CFG, cfg_value

All three CSRs are kernel-only: a user-mode CSRR / CSRW targeting any of them raises EXC_ILLEGAL. Writing a value outside [0, 8] to MPU_SELECT, or a malformed MPU_CFG (reserved trits non-zero, size out of range), also raises EXC_ILLEGAL, with the instruction word in ETVAL.

Rationale — indirection rather than direct mapping. A direct mapping would require 18 CSR slots for 9 × (base, cfg), which does not fit in the 3-trit CSR address space. Indirection keeps the CSR cost constant and lets the region count grow in a future revision without touching the CSR map.

9.4 MPU_CFG layout¶

MPU_CFG packs the non-base fields into a single 27-trit word:

Trit(s)	Field	Values
t[0]	`perm_R`	N = no read, Z = kernel read only, P = user + kernel read
t[1]	`perm_W`	N = no write, Z = kernel write only, P = user + kernel write
t[2]	`perm_X`	N = no execute, Z = kernel execute only, P = user + kernel execute
t[3..6]	`size`	Size exponent `n` (4-trit T4 integer); region covers `3ⁿ` words. Legal range: `n ∈ [0, 26]`. `n = 0` → one word; `n = 26` → `3²⁶` consecutive words (one third of the address space). Values `n < 0` or `n ≥ 27` written via `CSRW MPU_CFG` raise `EXC_ILLEGAL`.
t[7]	`valid`	P = active, N = inactive, Z = reserved (descriptor treated as inactive)
t[8..26]	—	Reserved (must be Z on write; read as 0)
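Because balanced ternary is positional, the layout above can be packed with ordinary integer arithmetic: a field placed at trit position k is simply its value multiplied by 3^k. The Python sketch below illustrates this; the function names are hypothetical and `ValueError` stands in for EXC_ILLEGAL.

```python
def to_trits(n, width):
    """Balanced ternary digits of n, least significant trit first."""
    trits = []
    for _ in range(width):
        t = (n + 1) % 3 - 1
        trits.append(t)
        n = (n - t) // 3
    return trits


def from_trits(trits):
    return sum(t * 3**i for i, t in enumerate(trits))


def pack_mpu_cfg(perm_r, perm_w, perm_x, size, valid):
    for t in (perm_r, perm_w, perm_x, valid):
        if t not in (-1, 0, 1):
            raise ValueError("field is not a trit")
    if not 0 <= size <= 26:
        raise ValueError("size out of range (EXC_ILLEGAL)")
    return perm_r + 3 * perm_w + 9 * perm_x + 27 * size + 3**7 * valid


def unpack_mpu_cfg(word):
    t = to_trits(word, 27)
    size = from_trits(t[3:7])
    if any(t[8:]) or not 0 <= size <= 26:
        raise ValueError("malformed MPU_CFG (EXC_ILLEGAL)")
    return {"perm_r": t[0], "perm_w": t[1], "perm_x": t[2],
            "size": size, "valid": t[7]}
```

A region readable by user and kernel, writable by the kernel only, never executable, covering 3⁵ words and marked active would be built with `pack_mpu_cfg(1, 0, -1, 5, 1)`.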

9.5 Matching and permission check¶

On every memory access — LOAD, STORE, or instruction fetch — with effective address A in current mode M, the MPU performs:

1. Scan the 9 regions. Region i matches if valid(i) = P and A ∈ [base(i), base(i) + 3^size(i)).
2. Select. If multiple regions match, the one with the lowest index wins (region 0 has highest priority).
3. Evaluate the permission trit for the access type (perm_R for LOAD, perm_W for STORE, perm_X for fetch) against mode M:

Permission trit	Kernel (`mode = N`)	User (`mode = P`)
N (none)	Deny — fault	Deny — fault
Z (kernel only)	Allow	Deny — fault
P (user + kernel)	Allow	Allow

4. No-match default:
   - Kernel mode: allow (default-permit).
   - User mode: deny (default-deny); raises EXC_PERM_*.
5. On deny, the access is suppressed and one of EXC_PERM_R, EXC_PERM_W, EXC_PERM_X is raised via the §8.2 entry sequence. ETVAL holds the faulting effective address (or the faulting PC, identical to EPC, for EXC_PERM_X).

Rationale — asymmetric defaults. In kernel mode the MPU acts as a blacklist (poison regions fault even for kernel); in user mode it acts as a whitelist (user can only reach addresses explicitly granted). This mirrors RISC-V PMP semantics and fits the v0.6 model where the kernel is trusted by default and user code must be explicitly admitted.
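The matching and permission rules of this section can be condensed into one function. The Python sketch below is an illustrative model only: the `Region` class, the `"r"`/`"w"`/`"x"` access tags and the boolean return value are assumptions of this example, and a real implementation would raise the corresponding EXC_PERM_* exception instead of returning `False`.

```python
from dataclasses import dataclass

N, Z, P = -1, 0, 1
KERNEL, USER = N, P          # STATUS.mode encodings


@dataclass
class Region:
    base: int
    size: int                # exponent n: the region covers 3**n words
    perm_r: int
    perm_w: int
    perm_x: int
    valid: int = P


def mpu_allows(regions, addr, access, mode):
    """Return True if the access may proceed; access is 'r', 'w' or 'x'."""
    for region in regions:                       # index order: lowest wins
        if region.valid != P:
            continue
        if region.base <= addr < region.base + 3**region.size:
            perm = getattr(region, "perm_" + access)
            if perm == P:
                return True                      # user + kernel
            if perm == Z:
                return mode == KERNEL            # kernel only
            return False                         # N: nobody
    return mode == KERNEL                        # no-match default
```

Placing an all-N region first in the list reproduces the poison behaviour: the scan stops at that region and denies the access even in kernel mode.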

9.6 Poison regions and defensive use¶

A region at low index (0 or 1) with perm_R = perm_W = perm_X = N and valid = P is a poison region: any access — even kernel — faults. Typical uses:

DMA buffers owned by a peripheral (CPU accesses race the device).
Guard words between legitimate regions to catch buffer overruns with a precise fault.
Hardware-reserved windows (fuses, debug taps, memory-mapped configuration).
Firmware regions temporarily locked during self-update.

Because lowest index wins (§9.5 step 2), a poison region at index 0 overrides any permissive region at index 1+, even for the kernel.

9.7 NAPO3 alignment¶

Natural alignment requires base(i) mod 3^size(i) = 0. The hardware does not check alignment at CSRW MPU_BASE time. A misaligned base is legal syntactically; the containment test in §9.5 step 1 uses the literal integer interval [base, base + 3^n) and may match addresses the author did not intend. Alignment is software responsibility — typically an assembler or linker computes region bases as multiples of 3ⁿ for the chosen n, and a misaligned base is a programming error, not a legitimate runtime state.
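Since the hardware does not check alignment, a toolchain or kernel helper can do it before programming a region. The following Python sketch shows one way such a check might look; the helper names are hypothetical.

```python
def is_napo3_aligned(base, n):
    """True if base is a multiple of 3**n (natural alignment)."""
    return base % 3**n == 0


def align_down(base, n):
    """Largest naturally aligned address not above base."""
    return base - base % 3**n


def region_interval(base, n):
    """Half-open interval [lo, hi) that the containment test actually uses."""
    return base, base + 3**n
```

For example, `region_interval(10, 2)` is `(10, 19)`, which straddles the two natural blocks [9, 18) and [18, 27); this is the kind of unintended coverage the paragraph above warns about.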

9.8 Reset state¶

At reset, every region descriptor has valid = N. The MPU is effectively disabled: no regions match, kernel-mode accesses proceed under the no-match default of §9.5 step 4, and the boot firmware (which runs in kernel mode per §2.4) executes unimpeded. The boot firmware is expected to program whichever regions the system needs before switching to user mode.

Typical boot sequence:

Kernel programs one or more regions granting user-mode perm_X = P over the user text segment, perm_R = P / perm_W = P over user data and stack.
Kernel programs defensive regions (poison windows, kernel-only overlays) if desired.
Kernel switches STATUS.mode to P via an IRET that restores a user-mode ESAVE.

9.9 Interactions¶

Nested exceptions (§8.2): an MPU fault taken while depth = P uses the frame-2 bank normally. An MPU fault while depth = N is a triple fault → machine-check reset. This matters when the outer handler itself touches an address that lacks kernel permission — for example, dereferencing a user-supplied pointer that happens to land in a kernel-only or poison region.
ETVAL: always the faulting address for EXC_PERM_R / EXC_PERM_W; the faulting PC (= EPC) for EXC_PERM_X. The access type is carried by the ECAUSE code, not by ETVAL.
No new opcode: all MPU programming uses existing CSRR / CSRW.

10. Asynchronous interrupts¶

10.1 Role¶

Asynchronous interrupts deliver external events (timer tick, peripheral completion, incoming byte) to the CPU independently of the instruction stream. Unlike a synchronous exception (§8), an interrupt is not caused by the instruction in flight; it arrives between instructions, driven by a signal outside the pipeline.

Interrupts are Setnex’s only source of preemption: without them, a user program that does not voluntarily ECALL keeps the CPU forever. A timer IRQ lets the kernel reclaim the CPU at a bounded cadence — the foundation of preemptive scheduling.

10.2 Controller model¶

The controller presents nine IRQ lines, indexed k ∈ {0..8}. For each line it maintains three software-visible state elements:

Pending: the external source has requested service. The controller samples the external line each cycle and sets IPENDING[k] = P while the line is asserted, Z while idle. Software may also write it (see §10.3, §10.6).
Level-sensitive devices hold the line asserted until the handler acknowledges (typically by reading/writing an MMIO status register on the peripheral); the line then deasserts and IPENDING[k] falls back to Z on the next sample.
Edge-triggered devices must latch their “event pending” bit inside the peripheral and hold the IRQ line asserted from the edge event until the handler acks. The IRQ controller itself does not latch edges — a pulse shorter than one sample interval can be missed. This pushes edge-latching into the peripheral, where it belongs (the peripheral knows when its event is served; the controller does not).
Enable: per-line mask.
Priority: per-line priority level.

A line is eligible when pending and enabled. The controller selects the eligible line with the highest priority; ties are broken by lowest line number.

10.3 CSR layouts¶

IPENDING (addr −4) — pending bitvector

Trit	Meaning
t[k] for k ∈ [0, 8]	Line `k`: P = pending, Z = idle, N = reserved
t[9..26]	Reserved (read as 0)

Writing Z to a trit clears a pending state that was set by software or by a now-deasserted external source; if the external source is still asserting, the controller re-raises the trit on the next cycle. Writing P to a trit whose external source is not asserting synthesizes a software-generated interrupt (implementation-defined latency). Writing N has no effect — the trit retains its previous value. The N value is reserved for a future semantic (e.g. sticky / edge-latched pending) and kept unassigned to preserve forward compatibility.

IENABLE (addr −5) — enable mask

Same layout as IPENDING: t[k] = P enables line k, Z disables it. Writing N has no effect — the trit retains its previous value (reserved, as for IPENDING).
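
The "writing N has no effect" rule shared by IPENDING and IENABLE amounts to a per-trit masked write. A minimal sketch follows, assuming trits are modelled as integers (N = −1, Z = 0, P = +1) and the CSR as a list of line trits; it models only the write itself, not the controller's later re-sampling of external lines.

```python
N, Z, P = -1, 0, +1


def csr_masked_write(old, written):
    """Apply a CSRW to IPENDING or IENABLE trit by trit.

    A written N leaves the old trit unchanged; Z and P replace it.
    """
    return [o if w == N else w for o, w in zip(old, written)]
```

This lets software update a single line without a read-modify-write: write Z or P at position k and N everywhere else.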

IPRIORITY (addr −6) — per-line priority

Nine 3-trit fields pack exactly into 27 trits:

Trit field	Line	Priority
t[0..2]	0	T3 integer ∈ [−13, +13]; higher = higher priority
t[3..5]	1	“
t[6..8]	2	“
t[9..11]	3	“
t[12..14]	4	“
t[15..17]	5	“
t[18..20]	6	“
t[21..23]	7	“
t[24..26]	8	“

At reset all priority fields are 0; arbitration then reduces to lowest-line-number wins.
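
For simulator or debugger work, the nine priorities can be unpacked from the 27-trit CSR value as shown below. This is a sketch under two assumptions: the word is given as a list of 27 integers in {−1, 0, +1} in LST-first order, and each 3-trit field is itself LST-first (t[3k] is the least significant trit of line k).

```python
def unpack_priorities(trits):
    """Return the nine per-line priorities (each in -13..+13) from IPRIORITY."""
    if len(trits) != 27:
        raise ValueError("IPRIORITY is a 27-trit word")
    return [
        trits[3 * k] + 3 * trits[3 * k + 1] + 9 * trits[3 * k + 2]
        for k in range(9)
    ]
```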

10.4 Dispatch rule¶

Between instructions, the CPU evaluates:

if STATUS.ie = P
   and STATUS.depth ≠ N                 (no save slot available → no async entry)
   and ∃ k : IPENDING[k] = P AND IENABLE[k] = P:
      k* ← argmax over eligible k (pending and enabled) of IPRIORITY[k], lowest k on ties
      take interrupt IRQ_k*              (see §10.5)

The check happens at each instruction-commit boundary. An interrupt cannot preempt a single instruction mid-way; it waits for the next commit point.
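
The dispatch rule can be modelled in a few lines. The sketch below assumes integer trits (N = −1, Z = 0, P = +1), per-line lists of length 9 for IPENDING and IENABLE, and a list of nine integer priorities; the function name is hypothetical.

```python
N, Z, P = -1, 0, +1


def select_irq(ie, depth, ipending, ienable, ipriority):
    """Return the line number to service at this commit boundary, or None."""
    if ie != P or depth == N:
        return None  # globally masked, or no save slot available
    eligible = [k for k in range(9) if ipending[k] == P and ienable[k] == P]
    if not eligible:
        return None
    # Highest priority wins; among equal priorities the lowest line number wins.
    return max(eligible, key=lambda k: (ipriority[k], -k))
```

Restricting the maximum to eligible lines matters: a high-priority line that is idle or disabled must not shadow a lower-priority line that is actually requesting service.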

10.5 Entry, handler, return¶

Entry reuses §8.2 verbatim. For a case-1 entry (user code running, depth = Z, line k* wins arbitration):

ESAVE ← STATUS
EPC ← PC_next — address of the next instruction that would have executed (not a faulting one)
ECAUSE ← +20 + k*
ETVAL ← 0
STATUS.mode ← N, STATUS.ie ← N, STATUS.depth ← P
PC ← EVEC

A case-2 (nested) entry uses the frame-2 bank (EPC2, ESAVE2, ECAUSE2, ETVAL2) as in §8.2.
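
As a cross-check of the case-1 sequence, the state update can be written as a pure function. This is an illustrative model only: the CSR file is a Python dict, STATUS is a dict of named fields, and the key names are assumptions made for the sketch.

```python
N, Z, P = -1, 0, +1


def take_irq_case1(csr, pc_next, k):
    """Return the CSR state after a case-1 entry for IRQ line k."""
    new = dict(csr)
    new["ESAVE"] = dict(csr["STATUS"])      # ESAVE <- STATUS (pre-entry copy)
    new["EPC"] = pc_next                    # resume at the next instruction
    new["ECAUSE"] = 20 + k                  # IRQ causes start at +20
    new["ETVAL"] = 0
    new["STATUS"] = {**csr["STATUS"], "mode": N, "ie": N, "depth": P}
    new["PC"] = csr["EVEC"]
    return new
```

The handler recovers the line number as ECAUSE − 20, which is what the `ADDI t1, t0, -20` in the handler boilerplate computes.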

Handler boilerplate. The handler must select the correct exception bank based on STATUS.depth before reading ECAUSE / ECAUSE2:

   LI     t2, 3                    ; trit index for depth
   CSRR   t0, STATUS
   TGET   t1, t0, t2               ; LST(t1) = depth trit (P = outer, N = nested)
   BRT3   t1, impossible, nested   ; P → fall-through (outer frame),
                                   ; Z → impossible (no frame active),
                                   ; N → nested (frame 2)
   ; --- outer frame (depth = P): cause/data in main bank ---
   CSRR   t0, ECAUSE
   JMP    dispatch
nested:
   ; --- nested frame (depth = N): cause/data in frame-2 bank ---
   CSRR   t0, ECAUSE2
dispatch:
   ADDI   t1, t0, -20              ; t1 = line number if this is an IRQ
   ; ... dispatch on t1 to per-line service routine ...
   ; service the peripheral, quiesce the line
   IRET
impossible:
   ; depth = Z inside a handler is a spec violation — halt for diagnosis
   HALT

IRET restores STATUS (which brings ie back to its pre-entry value, typically P) and jumps to the bank-appropriate EPC (see §8.3) — the instruction that was pending when the interrupt fired.

The same depth-select prologue applies to every exception handler, not only IRQ: a synchronous-fault handler that reads ETVAL / ECAUSE must branch through the same STATUS.depth test before selecting the main or frame-2 bank. Real kernels typically factor it into a single shared trampoline.

10.6 Acknowledgment¶

The handler must quiesce the source before IRET, typically by accessing a status or completion register on the peripheral. If the external line remains asserted on return, the controller re-raises IPENDING[k] and the interrupt re-fires immediately after ie goes back to P. Software may also clear IPENDING[k] directly by CSRW IPENDING with Z at position k, but a still-asserting external source will re-set the trit on the next cycle.

10.7 Masking¶

Interrupt delivery is gated by three independent mechanisms, all of which must allow it:

Global enable: STATUS.ie = P. Any other value blocks all IRQs. Set to N automatically on any exception entry (§8.2).
Per-line enable: IENABLE[k] = P.
Frame-depth guard: STATUS.depth ≠ N. When both frames are in use, no save slot is available and async entry is suppressed until IRET frees a frame. This is automatic and cannot be overridden.

A synchronous exception entry sets ie ← N per v0.6 §8.2, so an IRQ cannot preempt a fresh sync handler. A handler that wants low-latency nesting (IRQs accepted during a long syscall) must re-enable ie explicitly after saving whatever state it cares about — a subsequent IRQ would then push to frame 2.

10.8 Priority arbitration¶

Priorities are software-controlled 3-trit fields per line ([−13, +13]). The controller picks the eligible line with the maximum priority value; ties go to the lowest line number. Negative priorities are legal — they rank below default-zero lines without fully disabling them.

Rationale — software-controlled priority rather than line-hardwired. Hardwiring priority to the line number (line 0 always wins) is simpler but gives no knob to re-rank the timer below an urgent disk IRQ. A per-line 3-trit field costs exactly one CSR for all 9 lines and subsumes the hardwired case (all priorities = 0 → line number arbitrates).

11. Calling convention (ABI)¶

11.1 Argument passing¶

Arguments 1–7: a0–a6 (r10–r16)
Additional arguments: pushed on the stack (decreasing addresses from sp)
Return values: a0 (single), a0+a1 (pair)
a7 (r17) carries the syscall number on ECALL (§8.4). Outside the syscall path it is an ordinary caller-saved register, available as a scratch temporary or as an 8th argument when caller and callee agree on such an extension.

11.2 Callee-saved registers¶

s0–s9 (r8–r9, r18–r25), sp (r2), ra (r1).

Caller-saved (may be clobbered across a call): t0–t3 (r5–r7, r26), a0–a7 (r10–r17).

11.3 Stack¶

The stack grows toward negative addresses. sp points to the top of the stack (last valid word). sp must always be word-aligned — since the address space is word-addressed (§2.3), this is trivially satisfied and imposes no additional constraint on the compiler.

Frame layout. A function that needs to save ra and the caller’s s0 (the standard non-leaf case) allocates N ≥ 2 words and lays out the frame as follows (high addresses at the top):

 high addr   ┌─────────────┐ ← (caller's sp = this frame's s0)
 sp+(N−1)   │  saved ra    │
 sp+(N−2)   │  saved s0    │
 sp+(N−3)   │  local[0]    │
   …        │      …       │
 sp+0       │  local[N−3]  │ ← sp
 low addr   └─────────────┘

Standard prologue — allocates the frame first so that at no point does sp transiently reference an unallocated region; then saves ra and the old s0 into the reserved slots; finally installs the new frame pointer:

  ADDI   sp, sp, -N        # allocate frame (N = locals + 2)
  STORE  ra, sp, N-1       # save return address at top of frame
  STORE  s0, sp, N-2       # save caller's frame pointer
  ADDI   s0, sp, N         # new s0 = caller's sp

Standard epilogue — mirror of the prologue:

  LOAD   ra, sp, N-1       # restore return address
  LOAD   s0, sp, N-2       # restore caller's frame pointer
  ADDI   sp, sp, N         # deallocate frame
  RET

A leaf function that makes no further calls and does not use s0 may skip saving ra and s0 entirely, reducing the prologue to a single ADDI sp, sp, -N (or omitting it if no locals are spilled).
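
The slot offsets of the standard non-leaf frame follow mechanically from the number of locals. A small sketch, with a hypothetical helper name, that a code generator might use to compute them:

```python
def frame_layout(num_locals):
    """Return (N, slots) for the standard non-leaf frame.

    N is the frame size in words (locals + 2); slots maps each slot name
    to its word offset from the new sp.
    """
    n = num_locals + 2
    slots = {"ra": n - 1, "s0": n - 2}
    for j in range(num_locals):
        slots[f"local[{j}]"] = n - 3 - j   # local[0] sits just below saved s0
    return n, slots
```

For three locals this gives N = 5, with ra at sp+4, s0 at sp+3, and local[2] at sp+0, matching the diagram.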

11.4 Pseudo-instructions (assembler)¶

Pseudo	Expansion	Notes
`RET`	`JMPA ra, 0`	Return from subroutine
`MOV rd, rs`	`ADD rd, rs, zero`	Register copy
`NEG rd, rs`	native `NEG rd, rs` (R format, rs2/funct ignored)	Balanced ternary negation
`CALL label`	`CALL offset23(label)`	Compute offset from PC
`LI rd, imm`	`LI` if fits imm17; else `LUI` + `ADDI`	Load arbitrary immediate
`TSET rd, rs1, rs2`	`TSETZ rd, rs1, rs2`	Alias: clear trit to Z
`TNIMPL rd, a, b`	`TNOT t0, b` then `TAND rd, a, t0`	Non-implication: a AND NOT b (clobbers t0)
`TREIMPL rd, a, b`	`TIMPL rd, b, a`	Reverse implication: b ⇒ a
`NOT rd, rs`	`TNOT rd, rs`	Mnemonic alias for clarity
`BFLT label`	`BF P00, label`	Branch if FLAGS.sign = N (less than)
`BFEQ label`	`BF 0P0, label`	Branch if FLAGS.sign = Z (equal)
`BFGT label`	`BF 00P, label`	Branch if FLAGS.sign = P (greater than)
`BFLE label`	`BF PP0, label`	Branch if FLAGS.sign ≤ Z (less or equal)
`BFGE label`	`BF 0PP, label`	Branch if FLAGS.sign ≥ Z (greater or equal)
`BFNE label`	`BF P0P, label`	Branch if FLAGS.sign ≠ Z (not equal)
`ECALL`	`ECALL imm17=0` (i.e. `imm17[0] = Z`)	User syscall flavor (`EXC_ECALL_U`); v0.5-compatible default encoding
`HCALL`	`ECALL imm17[0] = P`, `imm17[1..16] = Z`	Hypercall flavor (`EXC_ECALL_H`); new in v0.6
`DBGBRK`	`ECALL imm17[0] = N`, `imm17[1..16] = Z`	Debug-trap flavor (`EXC_ECALL_D`); new in v0.6
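
The six BF pseudo-instructions differ only in their three-position mask. The sketch below reads the mask exactly as it is written in the table, assuming the three positions correspond to FLAGS.sign = N, Z, P in that order and that a P in a position selects that outcome; the real BF encoding is defined elsewhere in the specification.

```python
BF_MASKS = {
    "BFLT": "P00", "BFEQ": "0P0", "BFGT": "00P",
    "BFLE": "PP0", "BFGE": "0PP", "BFNE": "P0P",
}


def bf_taken(mask, sign):
    """True if a BF with this textual mask branches for sign in {-1, 0, +1}."""
    return mask[sign + 1] == "P"
```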

11.5 Syscall calling convention¶

Distinct from the standard function call convention of §11.1. When user code invokes a kernel service via ECALL:

Syscall number: a7 (r17).
Arguments 1–7: a0–a6 (r10–r16). Calls requiring more than 7 arguments pass the overflow on the stack following §11.1.
Return value: a0 (r10). By convention, a non-negative a0 is a success result and a negative a0 (sign(a0) = N) is a balanced-ternary error code. The convention is a library-level contract, not enforced by the ISA.

See §8.4 for the handler-side contract and the recommended register-preservation policy.

A libc-style wrapper setting the syscall number from a symbolic constant:

; write(fd, buf, len) — fd in a0, buf in a1, len in a2 on entry
write:
    LI     a7, SYS_WRITE      ; syscall number → r17
    ECALL
    RET                        ; a0 now holds the syscall result

A pass-through wrapper (the number is already in a7):

; long syscall(long number, long arg0, ..., long arg6)
; entry contract differs from §11.1: number already in a7, args already in a0..a6
syscall:
    ECALL
    RET

12. Reference encoding (textual representation)¶

Trits are written using the -/0/+ convention.

In memory layout (LST-first): t[0] is stored and written first. Instruction encoding diagrams use this convention.

In human-readable display (MST-first): the most significant trit is written leftmost, as in ordinary number notation. Register addresses and integer literals use this convention.

Each convention is explicitly labeled.

Example — ADD r3, r1, r2 (R format)

Field values:

opcode ADD = −40 = enc(−40, 4): −40 ÷ 3 → q = −13, r = −1 → t[0] = N; −13 ÷ 3 → q = −4, r = −1 → t[1] = N; −4 ÷ 3 → q = −1, r = −1 → t[2] = N; −1 ÷ 3 → q = 0, r = −1 → t[3] = N. Result: ---- (LST-first). Verify: −1 − 3 − 9 − 27 = −40 ✓
rd = r3 = enc(3, 3): 3 ÷ 3 → q=1, r=0 → t[0]=Z; 1 ÷ 3 → q=0, r=1 → t[1]=P; t[2]=Z → LST-first: 0+0
rs1 = r1 = enc(1, 3): t[0]=P, t[1]=Z, t[2]=Z → LST-first: +00
rs2 = r2 = enc(2, 3): 2 ÷ 3 → q=1, r=−1 → t[0]=N; 1 ÷ 3 → q=0, r=1 → t[1]=P; t[2]=Z → LST-first: -+0
funct = 0 (14 trits of Z)

LST-first memory layout:

t[0-3]  t[4-6]  t[7-9]  t[10-12]  t[13-26]
----    0+0     +00     -+0       00000000000000

Full 27-trit word (LST-first): ----0+0+00-+000000000000000

The opcode sits at t[0]–t[3] — the decoder starts working as soon as the first trits arrive.
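
The worked example can be reproduced with a short balanced-ternary encoder. This is a sketch rather than the reference assembler: `enc` mirrors the enc(value, width) notation used above, and the field order (opcode, rd, rs1, rs2, funct) is taken from the layout diagram.

```python
def enc(value, width):
    """Balanced-ternary encoding of value on width trits, LST-first, as -/0/+ text."""
    trits = []
    for _ in range(width):
        r = value % 3          # Python remainder is 0, 1 or 2
        if r == 2:
            r = -1             # digit 2 becomes -1, with a carry into the quotient
        trits.append(r)
        value = (value - r) // 3
    if value != 0:
        raise ValueError("value does not fit in the requested width")
    return "".join("-0+"[t + 1] for t in trits)


# ADD r3, r1, r2 in R format: opcode | rd | rs1 | rs2 | funct
word = enc(-40, 4) + enc(3, 3) + enc(1, 3) + enc(2, 3) + enc(0, 14)
```

The `ValueError` path catches values outside the field range, for example a priority of 14 in a 3-trit field.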

13. Architectural summary¶

Parameter	Value
Word width	27 trits
General-purpose registers	27 (r0–r26), T3 address
Vector registers (new in v0.8)	27 (v0–v26), T3 address, 27 trits each — bank selected by opcode
CSR registers	27 addressable (T3), 19 defined
Instruction width	27 trits (fixed)
Instruction formats	5 (R, I, J, U, B)
Opcode	4 trits (81 values, 60 used in v0.8)
Address space	±(3²⁷−1)/2 words ≈ ±3.8 × 10¹²
Max immediate (I format)	17 trits ≈ ±64 million
Max branch offset (J format)	20 trits ≈ ±1.7 billion
Max jump offset (U format)	23 trits ≈ ±4.7 × 10¹⁰
Logic modes	5: Kleene (default), Łukasiewicz, Heyting, RM3, B3 (Bochvar)
Floating-point	T26F: 26-trit Tekum in 27-trit register (t[26] = Z)
Arithmetic flags	2 trits: sign, carry
Division convention	Symmetric Euclidean
Memory unit	1 word = 27 trits
Endianness	Least significant trit first (little-endian)
Exception frames	2 (main bank + frame 2); triple-fault = machine-check reset
MPU regions	9, indirect CSR access, NAPO3 sizing, per-axis ternary permissions
Interrupt lines	9 async IRQs, per-line enable / priority, shared `EVEC` with sync exceptions
Vector lanes (new in v0.8)	27 × 1-trit lanes per word (trit-parallel); 8 vector opcodes (+15..+22)
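
The magnitude figures in the table all derive from the largest value representable on n balanced trits, (3^n − 1)/2. A quick check, with a hypothetical helper name:

```python
def max_magnitude(trits):
    """Largest absolute value representable on the given number of balanced trits."""
    return (3 ** trits - 1) // 2


address_space = max_magnitude(27)   # word-address range, symmetric around 0
imm_i = max_magnitude(17)           # I-format immediate
offset_j = max_magnitude(20)        # J-format branch offset
offset_u = max_magnitude(23)        # U-format jump offset
opcodes = 3 ** 4                    # number of distinct 4-trit opcode values
```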

14. Complete opcode map¶

Quick-reference table, sorted by opcode value.

Opcode	Mnemonic	Format	Group
−40	ADD / ADDS / ADC	R	ALU
−39	SUB / SUBS / SBC	R	ALU
−38	MUL / MULH	R	ALU
−37	DIV	R	ALU
−36	MOD	R	ALU
−35	NEG	R	ALU
−34	TAND	R	ALU / Logic
−33	TOR	R	ALU / Logic
−32	TNOT	R	ALU / Logic
−31	TIMPL	R	ALU / Logic
−30	CONS	R	ALU / Trit
−29	ACONS	R	ALU / Trit
−28	TSHIFT	R	ALU / Trit
−27	TCMP	R	ALU / Trit
−26	LOAD	I	Memory
−25	STORE	I	Memory
−24	LI	I	Memory
−23	LUI	I	Memory
−22	ADDI	I	Memory
−21	BRT3	B	Branch
−20..−19	—	—	reserved
−18	CMPI	I	Memory
−17	BEQ	J	Branch
−16	BNE	J	Branch
−15	BLT	J	Branch
−14	BGT	J	Branch
−13	BLE	J	Branch
−12	BGE	J	Branch
−11	JMPA	J	Branch
−10	BF	J	Branch
−9	JMP	U	Jump
−8	CALL	U	Jump
−7	CSRR	I	System
−6	CSRW	I	System
−5	CSRX	I	System
−4	ECALL	I	System — `imm17[0]` = flavor tag (Z/P/N → user/hyper/debug); `imm17[1..16]` reserved, ignored by decoder
−3	IRET	R	System
−2	TSEL	R	Special
−1	NOP	—	Special
0	HALT	—	Special
+1	TGET	R	Trit ops
+2	TSET(N/Z/P)	R	Trit ops
+3	TSIGN	R	Trit ops
+4	CMP	R	Trit ops
+5	TABS	R	Trit ops
+6	TMIN	R	Trit ops
+7	TMAX	R	Trit ops
+8	FADD	R	TFP
+9	FSUB	R	TFP
+10	FMUL	R	TFP
+11	FDIV	R	TFP
+12	FCMP	R	TFP
+13	FCVT	R	TFP
+14	—	—	reserved (TFP)
+15	VADD / VSUB	R	Vector
+16	VMUL	R	Vector
+17	VLOG	R	Vector — `funct[0..2]` selects TAND/TOR/TNOT/TIMPL/CONS/ACONS
+18	VSEL	R	Vector — 3-way ternary select with mask in `funct[0..2]`
+19	VCMP	R	Vector — lane-wise sign-of-difference, produces ternary mask
+20	VRED	R	Vector — reduction to GPR; `funct[0..2]` selects SUM/SIGN/CONS/LST/MST/AND/OR
+21	VPERM	R	Vector — `funct[0..2]` selects rotate/shift/reverse/shuffle
+22	VMOVE	R	Vector — inter-bank movement; `funct[0..2]` selects mode (§16.5)
+23..+24	—	—	reserved (Vector v0.9)
+25..+40	—	—	reserved (Custom)

15. Extension roadmap¶

v0.9 — Vector extension follow-up¶

Extension	Opcode range	Rationale
`VFMA` (fused multiply-accumulate)	+23	Closes the gap with TFP and ML kernels: `vd ← vd + vs1 × vs2` lane-wise in one instruction. Critical for dot-product / convolution loops.
`VGATHER` / `VSCATTER`	+24	Indirect lane addressing via an index v-reg. Enables sparse-data vector code. May share a single opcode via `funct` direction trit.
Trybble / tryte lane modes	funct of existing vector opcodes	Optional 9 × 3-trit (trybble) and 3 × 9-trit (tryte) lane partitions, selected per-instruction via funct trits. Reuses the existing v-reg bank and ALU; only the carry-cut points differ. Investigated only if a target workload (DSP, codecs) demonstrates clear gain over the trit-parallel default.
`VLEN` CSR (vector length)	new CSR slot	Optional: a runtime-configurable active-lane count for dynamic vector length, as in RVV. Deferred until a use case appears — current trit-parallel model already covers fixed-width 27-lane work.
Masked arithmetic	funct trit on `VADD` / `VMUL` / `VLOG`	Per-lane predication using a mask v-reg, separate from `VSEL`. Adds a one-trit funct selector and one extra register field via funct[3..5]. Defer pending design review.

v1.0 — Stabilization¶

ISA freeze: a complete reference implementation (Python simulator + assembler), regression test suite at every spec section, and a worked-example application (e.g. tritlib-driven kernel + user program demonstrating syscall, MPU, IRQ, and vector loop). No new opcodes between v0.9 and v1.0.

Hardware target — post-1.0¶

Once the ISA reaches 1.0, the following implementation milestones are pursued outside of the versioned specification:

Milestone	Notes
FPGA soft core	VHDL or Verilog description of Setnex; each trit encoded as 2 bits on FPGA fabric, ternary signals on external bus (same approach as 5500FP). Target: iCE40 (open toolchain via nextpnr/yosys) or Xilinx/Intel.
Assembler tooling	Setnex assembler in Python, building on tritlib.
ASIC exploration	Contingent on CNT or memristor ternary gate availability; long-term.

16. Vector extension¶

16.1 Role and datapath model¶

The vector extension turns the existing 27-trit datapath into a trit-parallel SIMD unit. Each vector instruction operates simultaneously on the 27 trits of a vector register, treating the register as a vector of 27 lanes × 1 trit. One VADD performs 27 independent trit additions in the cycles a single scalar ADD would have used to compute the equivalent 27-trit sum with carry propagation — the saving comes not from the arithmetic itself but from amortizing fetch, decode, register-file access, and loop control over 27 elements.

The model is deliberately minimal:

Lane width: 1 trit (fixed in v0.8). Trybble (3 trits, 9 lanes) and tryte (9 trits, 3 lanes) lane modes are reserved for v0.9 (§15).
No carry between lanes: the existing 27-trit adder is reused with carry propagation disabled at every lane boundary (one MUX per trit). The hardware cost is negligible.
Closure on saturation: arithmetic results are clamped to {N, Z, P} (§16.3), so a v-reg always holds 27 well-defined trits — no overflow into a wider intermediate.
No vector length CSR: every vector op operates on all 27 lanes. Predication is achieved by writing a mask (see §16.4 / VSEL).

16.2 Vector register bank¶

27 vector registers v0..v26, each a full 27-trit word. The bank is disjoint from the GPR bank: vector instructions reference v-regs through the same 3-trit register field they would use for GPRs in scalar instructions, with the bank determined entirely by the opcode (see §2.1.1 dispatch table).

v0 is an ordinary writable register; by convention, software keeps it at zero so that it can serve as a vector-zero source (analogous to r0, but not hardware-enforced).

The bank carries no implicit register conventions (no caller/callee-saved split is mandated at the ISA level); the v-reg ABI is left to the platform — see §16.6 for the recommended save/restore rules.

16.3 Lane arithmetic semantics¶

For arithmetic opcodes (VADD, VSUB, VMUL), the result is computed independently per lane and clamped to {N, Z, P}:

Operation	Lane result
`VADD vd, vs1, vs2`	`vd[i] ← clamp(vs1[i] + vs2[i])` for i = 0..26
`VSUB vd, vs1, vs2`	`vd[i] ← clamp(vs1[i] − vs2[i])`
`VMUL vd, vs1, vs2`	`vd[i] ← vs1[i] × vs2[i]` (closed in {N,Z,P}; no clamp needed)

where clamp(x) maps x ∈ [−2, +2] to {N, Z, P} as: clamp(−2) = N, clamp(−1) = N, clamp(0) = Z, clamp(+1) = P, clamp(+2) = P.
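
The lane semantics are simple enough to model directly. A minimal sketch, assuming a v-reg is a Python list of 27 integers in {−1, 0, +1}; the function names mirror the mnemonics but are otherwise illustrative.

```python
def clamp(x):
    """Saturate a lane result in [-2, +2] to {-1, 0, +1}."""
    return max(-1, min(1, x))


def vadd(vs1, vs2):
    return [clamp(a + b) for a, b in zip(vs1, vs2)]


def vsub(vs1, vs2):
    return [clamp(a - b) for a, b in zip(vs1, vs2)]


def vmul(vs1, vs2):
    # Products of trits are already in {-1, 0, +1}; no clamp needed.
    return [a * b for a, b in zip(vs1, vs2)]
```

Note that saturation makes VADD non-invertible: P + P and P + Z both yield P, so a following VSUB does not recover the original operand.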

No FLAGS update. Vector arithmetic instructions do not modify the FLAGS register: the saturation policy avoids overflow signalling, and per-lane status would require widening FLAGS or introducing a vector flags register, both rejected for v0.8. Software needing per-lane sign tests should use VCMP vd, vs1, v0 (with v0 held at zero per the §16.2 convention) to materialize a sign mask.

Multiplication is closed. The product of any two trits in {N, Z, P} is in {N, Z, P} (see §5.3 product table); VMUL therefore needs no saturation logic at all — it is the cleanest of the three.

No vector divide. Trit-by-trit division has no useful semantics on 1-trit lanes (P / P = P, Z / Z = NaR, etc.); the operation is omitted. Division is a scalar concern in this revision.

16.4 funct sub-mode encodings¶

Several vector opcodes use the low 3 trits of the funct field as a sub-mode selector. The encodings are:

`VLOG` (opcode +17)¶

`funct[0..2]` (LST-first)	Mnemonic	Operation per lane	LMODE-following
`Z Z Z`	`VAND`	`vd[i] ← TAND(vs1[i], vs2[i])`	yes
`P Z Z`	`VOR`	`vd[i] ← TOR(vs1[i], vs2[i])`	yes
`Z P Z`	`VIMPL`	`vd[i] ← TIMPL(vs1[i], vs2[i])`	yes
`P P Z`	`VNOT`	`vd[i] ← TNOT(vs1[i])` (vs2 ignored)	yes
`Z Z P`	`VCONS`	`vd[i] ← cons(vs1[i], vs2[i])`	no (always Kleene)
`P Z P`	`VACONS`	`vd[i] ← acons(vs1[i], vs2[i])`	no (always Kleene)

VAND/VOR/VIMPL/VNOT follow the current LMODE (§6) — switching LMODE to Łukasiewicz, Heyting, RM3, or B3 changes the result of all four lane-wise. VCONS/VACONS are arithmetic primitives (§6.3) and ignore LMODE, exactly as their scalar counterparts.

Other funct[0..2] patterns are reserved and raise EXC_ILLEGAL.
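
For the default Kleene LMODE only, the four LMODE-following sub-modes can be sketched lane-wise as below. The sketch assumes the standard strong Kleene definitions (AND = min, OR = max, NOT = negation, a ⇒ b = max(−a, b)) with N = −1, Z = 0, P = +1; §6 remains the authority for the scalar operators, and the other LMODEs, VCONS, and VACONS are not modelled here.

```python
def vand(vs1, vs2):
    return [min(a, b) for a, b in zip(vs1, vs2)]


def vor(vs1, vs2):
    return [max(a, b) for a, b in zip(vs1, vs2)]


def vnot(vs1):
    return [-a for a in vs1]


def vimpl(vs1, vs2):
    # Strong Kleene implication: a => b is (NOT a) OR b.
    return [max(-a, b) for a, b in zip(vs1, vs2)]
```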

`VSEL` (opcode +18) — three-way merge with vector mask¶

VSEL vd, vs1, vs2, vm selects per-lane among vs1, zero, and vs2 according to the corresponding lane of mask vm:

`vm[i]`	`vd[i]`
N	`vs1[i]`
Z	`Z` (zero)
P	`vs2[i]`

The mask register index vm is encoded in funct[0..2]. This is the vector counterpart of scalar TSEL (§4 opcode −2): one instruction, three sources fused via a ternary mask. Combined with VCMP (which produces N/Z/P = </=/>), it gives a complete lane-wise compare-and-select pair in two instructions.

Why zero on vm[i] = Z rather than preserve-vd[i]. Zero‑on‑Z gives the user a trit-level “blank out” semantic for free: VCMP against a threshold and VSEL produces a sparse vector with explicit zeros where the predicate was indeterminate. A merge-on-Z variant (preserving vd[i]) is recoverable in two instructions if needed; the reverse is not.

`VCMP` (opcode +19)¶

Single mode: vd[i] ← sign(vs1[i] − vs2[i]), lane-wise. The result is a ternary mask in {N, Z, P} per lane — N where vs1[i] < vs2[i], Z where equal, P where greater. The same mask can be fed directly into VSEL without re-encoding.
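
The compare-and-select pairing of VCMP and VSEL can be modelled as follows. This is a sketch with v-regs as Python lists of integers in {−1, 0, +1}; only the lane semantics stated above are modelled, not the encoding of vm in funct.

```python
def sign(x):
    return (x > 0) - (x < 0)


def vcmp(vs1, vs2):
    """Lane-wise sign of difference: -1 where vs1 < vs2, 0 where equal, +1 where greater."""
    return [sign(a - b) for a, b in zip(vs1, vs2)]


def vsel(vs1, vs2, vm):
    """Three-way select: mask N picks vs1, Z yields zero, P picks vs2."""
    return [a if m == -1 else (b if m == 1 else 0)
            for a, b, m in zip(vs1, vs2, vm)]


# Mask from a compare, then select between two sources with zero on equality.
mask = vcmp([1, 0, -1, 1], [0, 0, 0, 1])
picked = vsel([-1, -1, -1, -1], [1, 1, 1, 1], mask)
```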

`VRED` (opcode +20) — reduction to scalar GPR¶

VRED rd, vs1 reduces all 27 lanes of vs1 to a single 27-trit value written to GPR rd (note: rd is a GPR here, not a v-reg — VRED is the inverse of VBCAST in operand bank).

`funct[0..2]`	Mnemonic	Result in `rd`
`Z Z Z`	`VRED.SUM`	Σ vs1[i] for i=0..26, in [−27, +27], encoded as a T27 integer
`P Z Z`	`VRED.SIGN`	sign(Σ vs1[i]) ∈ {N, Z, P} (signed majority)
`Z P Z`	`VRED.CONS`	Kleene consensus across all 27 lanes: P if all are P or Z (and at least one P), N if all are N or Z (and at least one N), Z otherwise
`P P Z`	`VRED.LST`	trit-min across all 27 lanes (= scalar `TMIN` semantics — P iff every lane is P)
`Z Z P`	`VRED.MST`	trit-max across all 27 lanes (= scalar `TMAX` semantics — N iff every lane is N)
`P Z P`	`VRED.AND`	trit-fold using the current `LMODE` AND
`Z P P`	`VRED.OR`	trit-fold using the current `LMODE` OR

VRED.SIGN is the natural classifier output for ternary neural networks: a 27-weight × 27-input dot product reduces to one trit indicating a “negative / neutral / positive” decision. VRED.SUM yields an integer in [−27, +27], which fits in 4 trits (range ±40) and is suitable for further scalar processing.
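
Five of the VRED modes can be modelled directly from the table. The sketch below takes a v-reg as a list of integers in {−1, 0, +1} and returns the reduced value as a Python integer; `vred_cons` follows the rule exactly as worded in the table row, and the LMODE-dependent AND/OR folds are not modelled.

```python
def vred_sum(v):
    return sum(v)                       # in [-27, +27] for a 27-lane register


def vred_sign(v):
    s = sum(v)
    return (s > 0) - (s < 0)            # signed majority


def vred_min(v):                        # VRED.LST in the table
    return min(v)


def vred_max(v):                        # VRED.MST in the table
    return max(v)


def vred_cons(v):
    has_p, has_n = 1 in v, -1 in v
    if has_p and not has_n:
        return 1                        # only P and Z lanes, at least one P
    if has_n and not has_p:
        return -1                       # only N and Z lanes, at least one N
    return 0                            # mixed signs, or all Z
```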

`VPERM` (opcode +21) — lane permutation¶

VPERM vd, vs1, vs2 rearranges the 27 lanes of vs1 into vd. The mode is selected by funct[0..2]:

`funct[0..2]`	Mnemonic	Operation
`Z Z Z`	`VROTL`	rotate left by `val(vs2_LST)` lanes (cyclic; vs2 read as small integer)
`P Z Z`	`VROTR`	rotate right by `val(vs2_LST)` lanes
`Z P Z`	`VSHL`	shift left by k lanes (vacated lanes filled with Z); k = lower 5 trits of vs2
`P P Z`	`VSHR`	shift right by k lanes (vacated lanes filled with Z)
`Z Z P`	`VREV`	reverse: `vd[i] ← vs1[26−i]` (vs2 ignored)
`P Z P`	`VSHUF`	arbitrary shuffle: `vd[i] ← vs1[lane_index(vs2, i)]` (lane index for each output position is read from vs2 — see below)

For VSHUF, the source lane for each output lane is encoded in three trits of vs2: output lane i takes its value from input lane val(vs2[3i..3i+2]) mod 27. A 27-trit word holds only nine such 3-trit indices (i = 0..8); indexing all 27 output lanes would take 27 × 3 = 81 trits, so a single vs2 word cannot carry a full 27-lane permutation, and the source of output lanes 9..26 is not defined by this encoding.

16.5 VMOVE — inter-bank movement (opcode +22)¶

VMOVE is the only vector opcode that crosses the GPR / v-reg boundary. The mode is encoded in funct[0..2]; lane index, when relevant, lives in funct[3..5] (3 trits, range 0..26).

`funct[0..2]`	Mnemonic	rd bank	rs1 bank	Operation
`Z Z Z`	`VMOV.GV`	v-reg	GPR	`v[rd] ← r[rs1]` (whole 27-trit word copy)
`P Z Z`	`VMOV.VG`	GPR	v-reg	`r[rd] ← v[rs1]` (whole word)
`Z P Z`	`VMOV.VV`	v-reg	v-reg	`v[rd] ← v[rs1]` (vector–vector copy)
`P P Z`	`VBCAST`	v-reg	GPR	`v[rd][i] ← r[rs1][LST]` for all i — splat the LST of `r[rs1]` to every lane
`Z Z P`	`VINS`	v-reg	GPR	`v[rd][k] ← r[rs1][LST]`; other lanes preserved; `k = val(funct[3..5])`
`P Z P`	`VEXT`	GPR	v-reg	`r[rd] ← TSIGN-extend(v[rs1][k])`; `k = val(funct[3..5])` (result is N, Z, or P in a T27 word)

For VINS/VEXT, lane index k outside [0, 26] raises EXC_ILLEGAL with the raw instruction word in ETVAL (§8.5).

VMOV.VV is a sub-mode of VMOVE, not a separate opcode. Software can also realize a vector-to-vector copy via VOR vd, vs1, vs1 or VADD vd, vs1, v0 (assuming the v0 = 0 convention); VMOV.VV is provided as a clearer mnemonic, and assemblers may emit either form.

Why VBCAST splats only the LST trit of the source GPR. The natural alternative — copying all 27 trits of the GPR to the v-reg — is already covered by VMOV.GV. VBCAST exists for the distinct case “I have one trit (typically a sign or flag) and want to fill all 27 lanes with it” — a pattern recurring in masking and in initializing constant vectors. The two operations are kept distinct because their use cases are.
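
The three single-trit VMOVE modes can be sketched as follows. The model assumes both GPRs and v-regs are lists of 27 integers in {−1, 0, +1} in LST-first order, so that index 0 of a GPR is its LST; the `ValueError` stands in for EXC_ILLEGAL and the function names are illustrative.

```python
def vbcast(r):
    """Splat the LST of GPR r to all 27 lanes."""
    return [r[0]] * 27


def vins(v, r, k):
    """Insert the LST of GPR r into lane k of v; other lanes are preserved."""
    if not 0 <= k <= 26:
        raise ValueError("lane index out of range")
    out = list(v)
    out[k] = r[0]
    return out


def vext(v, k):
    """Extract lane k of v as a T27 integer word holding -1, 0 or +1."""
    if not 0 <= k <= 26:
        raise ValueError("lane index out of range")
    return [v[k]] + [0] * 26
```

In balanced ternary the values −1, 0, and +1 are a single trit followed by zeros, so the "sign extension" in `vext` reduces to zero-filling the upper 26 trits.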

16.6 Context save/restore¶

The vector bank is part of the architectural state. A context switch that wishes to preserve user-mode vector code must save and restore all 27 v-regs (27 × 27 = 729 trits = 27 words).

Because vector use is opt-in, kernels are encouraged to implement lazy save: track per-thread a “vector-touched” flag, and skip the save/restore if the thread has not executed any vector instruction since its last entry to the kernel. The ISA does not provide hardware for this — it is a kernel-level optimization.

A reference save sequence:

   ; sp points at the top of a 27-word save area
   VMOV.VG  t0, v0     ; v0 → r5 (t0)
   STORE    t0, sp, 0
   VMOV.VG  t0, v1
   STORE    t0, sp, 1
   ; … 25 more pairs …
   VMOV.VG  t0, v26
   STORE    t0, sp, 26

Restore is the symmetric sequence with VMOV.GV. A future revision may introduce a fused VLD / VST block move (gather/scatter, §15) to compress this loop.

16.7 Ternary advantages¶

The vector extension is not a mechanical port of binary SIMD. Five primitives have no clean equivalent in a binary ISA:

Three-state mask. A v-reg of trits doubles as a predicate where each lane carries one of three meanings — N (one branch), Z (zero / inactive), P (other branch). Binary SIMD encodes the third state through a separate “zeroing-vs-merging” bit (AVX-512 style); on Setnex it is intrinsic to the data.
VCMP is trichotomic in one shot. Lane-wise sign-of-difference produces <, =, > simultaneously in three distinct mask values. Binary SIMD needs two compares (one for <, one for =) to recover the same trichotomy.
Kleene consensus reduction (VRED.CONS). A native three-valued voter across 27 lanes — agree-positive, agree-negative, or disagree — in one instruction. The closest binary analog is a popcount majority + sign extraction, two to three instructions and a register.
Signed-sum reduction (VRED.SIGN). The signed sum of 27 trits gives the sign of a dot product directly. Equivalent binary code requires two popcounts (one for +1s, one for −1s) plus a subtraction. This is the key kernel for BitNet 1.58-bit inference (§16.8).
LMODE-aware logic (VLOG, VRED.AND/OR). 27 lanes of Łukasiewicz / Heyting / Bochvar / RM3 logic in one instruction. A binary ISA has no native encoding for non-classical three-valued logic and must emulate it, typically with a two-bit-per-trit encoding and either a lookup table or a multi-instruction bitwise sequence per evaluation.

Combined, these primitives make Setnex’s vector unit a natural target for: ternary neural networks (BitNet, ternary weight networks), three-valued logic SAT solvers, ternary cellular automata, and any code where a third value (unknown, null, inactive) is first-class data rather than an exception.

16.8 Worked example: BitNet 1.58-bit dot product¶

A BitNet-style layer multiplies a vector of activations a[0..N−1] (each in {N, Z, P}) by a weight matrix W whose rows are also in {N, Z, P}, producing one output trit per row via a sign reduction.

For one row of 27 weights and 27 activations, the inner kernel is:

   ; assume:
   ;   v1 ← row of 27 weights      (loaded via VMOV.GV from a GPR holding the row word)
   ;   v2 ← 27 input activations   (loaded similarly)
   ;
   VMUL    v3, v1, v2           ; v3[i] ← w[i] × a[i] ∈ {N, Z, P}, lane-wise
   VRED.SIGN a0, v3             ; a0 ← sign(Σ v3[i]) ∈ {N, Z, P}
                                ;   N = output −1, Z = output 0, P = output +1

Two instructions for a 27-element ternary dot-product reduced to a single output trit. The same kernel on a 64-bit binary CPU needs two popcounts (one over the +1 mask, one over the −1 mask), a subtraction, and a sign extraction — minimum 5–6 instructions plus the unpacking of the packed-2-bits-per-weight format that binary SIMD requires to handle ternary values at all.

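Scaling: a layer with 256 rows and 256 inputs needs ⌈256 / 27⌉ = 10 kernels per row, i.e. 256 × 10 = 2560 such 27-wide kernels (one VMUL each, so 2560 vector instructions for the multiplication phase). Because each row then spans several 27-lane chunks, the chunks should be reduced with VRED.SUM and the partial sums accumulated with scalar ADD, taking the sign only once per row; applying VRED.SIGN per chunk would discard the magnitudes needed for the row total. The last chunk is padded with Z lanes (256 = 9 × 27 + 13), which contribute nothing to the sum. The output trits are then re-packed via VINS into output v-regs for the next layer.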

Why the third state matters. A weight of Z in BitNet means “this connection contributes nothing” — sparsity is encoded in the value, with no separate sparse-index bookkeeping. VMUL propagates the zero through naturally (Z × anything = Z), and VRED.SIGN ignores it. Setnex thereby supports dense storage of sparse weights with no overhead.
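
A reference model is useful for checking the kernel against a simulator. The sketch below models the 27-wide VMUL + VRED.SIGN pair and, as an assumption not spelled out in the kernel above, a multi-chunk row that accumulates per-chunk sums and takes the sign once at the end; weights and activations are lists of integers in {−1, 0, +1}.

```python
def sign(x):
    return (x > 0) - (x < 0)


def dot_sign_27(weights, activations):
    """One 27-lane kernel: VMUL followed by VRED.SIGN."""
    v3 = [w * a for w, a in zip(weights, activations)]   # VMUL
    return sign(sum(v3))                                  # VRED.SIGN


def row_output(weights, activations, lanes=27):
    """Row wider than 27: accumulate per-chunk sums, then take the sign once."""
    total = 0
    for i in range(0, len(weights), lanes):
        w = weights[i:i + lanes]
        a = activations[i:i + lanes]
        total += sum(x * y for x, y in zip(w, a))         # VMUL + VRED.SUM + ADD
    return sign(total)
```

Zero weights drop out of the sum without any special handling, which is the sparsity property described above.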

Setnex ISA v0.8 — Reference specification Setnex project / Terias — Eric Tellier