Thirteen agents.
Zero evaluation.
One decision model.
The Decision Capture Platform records how learners make spelling decisions using Merriam-Webster pronunciation as the fixed reference. Built on Dr. Robert T. Nash's Pure and Complete Phonics: 103 phonemes, 70+ years of research. Every agent observes. None evaluates. The teacher decides everything.
Student Interface
4-stage workflow. Typed input only — no voice. MW pronunciation symbols as fixed anchor. Dialect-neutral by design.
Server · Node.js + Express
3,619-line server.js. Redis rate limiting, magic-link auth, Anthropic Claude Haiku for Socratic Coach AI. 30s timeout on coach proxy. RLS on every Supabase table.
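A minimal sketch of the 30-second coach cap, assuming Node 18+ global fetch. The route path mirrors the /api/guide endpoint referenced later in this page; the model string, system prompt, and request payload shape are illustrative, not lifted from server.js.

```js
// Sketch of the Coach proxy timeout. Route, model string, and prompt are
// assumptions -- the real logic lives in server.js.
const express = require('express');
const app = express();
app.use(express.json());

app.post('/api/guide', async (req, res) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 30_000); // 30s hard cap

  try {
    const upstream = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      signal: controller.signal,
      headers: {
        'x-api-key': process.env.ANTHROPIC_API_KEY,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: 'claude-3-haiku-20240307', // Haiku, per the spec above
        max_tokens: 300,
        system: 'Ask one Socratic question. Never evaluate the student.',
        messages: req.body.messages,
      }),
    });
    res.status(upstream.status).json(await upstream.json());
  } catch (err) {
    const timedOut = err.name === 'AbortError';
    res.status(timedOut ? 504 : 502).json({
      error: timedOut ? 'coach timeout' : 'coach unavailable',
    });
  } finally {
    clearTimeout(timer);
  }
});
```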
Agent Layer · 13 Agents
Scheduled cron + fire-and-forget. Pattern observation, student profiling, IEP goal drafting, difficulty tracking, live alerts. All enforce the Decision Capture Model constraint.
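The two trigger modes might be wired like this; a sketch assuming node-cron, with illustrative agent stubs. The point of fire-and-forget: the promise is never awaited, so a student-facing response never blocks on agent work.

```js
const cron = require('node-cron'); // assumed scheduler; any cron lib works

// Illustrative agent stubs -- real agents read Supabase and write drafts.
async function observePatterns() { /* scan records for spelling patterns */ }
async function runSessionDebrief(record) { /* summarize the session */ }
async function runLiveAlerts(record) { /* surface alerts to the teacher */ }

// Scheduled: nightly pattern observation across all students.
cron.schedule('0 2 * * *', () => {
  observePatterns().catch((err) => console.error('[agent:patterns]', err.message));
});

// Fire-and-forget: triggered by a record submit. No await -- failures are
// logged server-side, and the student never waits on an agent.
function dispatchAgents(record) {
  for (const agent of [runSessionDebrief, runLiveAlerts]) {
    agent(record).catch((err) => console.error(`[agent:${agent.name}]`, err.message));
  }
}
```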
Supabase · Postgres
Users, records, student_profiles, iep_drafts, live_alerts, session_debriefs, vocabulary_bank. Service role key. Row Level Security on every table.
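Server-side writes could look like the sketch below, assuming supabase-js v2; env variable names and column handling are illustrative. The service role key bypasses RLS, which is exactly why it stays in server.js and never reaches a client.

```js
const { createClient } = require('@supabase/supabase-js');

// Server-only client. Env names are assumptions.
const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);

// Capture a decision exactly as made. Note what is absent: no score,
// no correctness flag -- the schema has nowhere to put an evaluation.
async function saveRecord(record) {
  const { data, error } = await supabase
    .from('records')
    .insert(record)
    .select()
    .single();
  if (error) throw error;
  return data;
}
```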
Four stages. Every decision recorded.
Thirteen agents. Three layers.
The Decision Capture Model
Every agent, every API response, every robot behavior operates under one inviolable constraint: observe and record. Never evaluate. The platform captures spelling decisions exactly as made. The teacher interprets everything.
No agent or interface may use: correct, wrong, right, close, almost, good job, great, nice, perfect, exactly, or well done. No evaluation in any form.
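Enforced in code, the ban could be a simple outbound filter; a sketch with the word list taken verbatim from the constraint above and an illustrative function name.

```js
// Evaluative-language guard. Any coach reply or UI string that trips this
// check is discarded and regenerated -- never shown to the student.
const BANNED = [
  'correct', 'wrong', 'right', 'close', 'almost', 'good job',
  'great', 'nice', 'perfect', 'exactly', 'well done',
];

function violatesDCM(text) {
  const lower = text.toLowerCase();
  // Word boundaries so "rightmost" or "closet" don't false-positive.
  return BANNED.some((term) =>
    new RegExp(`\\b${term.replace(' ', '\\s+')}\\b`).test(lower)
  );
}

violatesDCM('Great, that is the right letter!'); // true -- blocked
violatesDCM('What sound do you hear first?');    // false -- allowed
```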
The same brain.
A physical presence.
No new API endpoints. No new agents. The robot is a client — like the browser, but with a body. It points, nods, tilts its head, pulses its chest LEDs for high-variation phonemes, and delivers Coach responses with a voice. The intelligence stays server-side. The robot adds the relationship.
The brain, embodied.
This is NOT the web app in a robot shell. The robot IS the interface. It presents sounds, holds up phoneme cards, displays the word on its chest screen, and coaches through gesture and voice. The student's tablet is mounted on the robot — every interaction goes through the companion.
Stage-by-stage behavior map.
Every robot gesture, posture, speech act, and LED cue mapped to the four-stage Decision Capture workflow. The robot is a client of the existing API — the intelligence stays server-side.
Stage 1 · Word Entry & Syllables
| Trigger | Robot Behavior | Platform Connection |
|---|---|---|
| Student opens app | Sits up straight, eyes brighten (LED), turns head toward student. Says: "Ready when you are." | Robot client loads /api/session-context for student profile. |
| Student types word | Leans slightly forward (interest posture). Watches screen. Silent. | No API call — local screen monitoring only. |
| Student enters MW pronunciation | Nods once slowly. Eyes track from screen to student briefly. | Robot parses pronunciation locally to pre-load sound sheet data. |
| Student clicks Create Syllables | Gentle "let's go" gesture (one arm forward). Says: "Syllables are set. Time to map the sounds." | Stage transition logged locally. Timer starts for engagement tracking. |
| Student idle > 30 seconds | Tilts head quizzically. Says: "What step are you on?" | Sends idle signal to /api/guide with context for Socratic prompt. |
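The idle row above, sketched as a client-side timer on the robot. The payload fields, response shape, and hardware helpers are assumptions; only the 30-second threshold and the /api/guide call come from the table.

```js
// Hardware stubs -- real versions drive servos and TTS.
function speak(text) { console.log('[tts]', text); }
function tiltHeadQuizzically() { console.log('[gesture] head tilt'); }

let idleTimer = null;

// Reset on every keystroke or tap; fire a Socratic prompt at 30s of silence.
function resetIdleTimer(sessionContext) {
  clearTimeout(idleTimer);
  idleTimer = setTimeout(async () => {
    // The robot holds no coaching logic -- it asks the server-side Coach.
    const res = await fetch('/api/guide', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ event: 'idle', stage: sessionContext.stage }),
    });
    const { question } = await res.json();
    tiltHeadQuizzically();
    speak(question);
  }, 30_000);
}
```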
Stage 2 · Sound Mapping
| Trigger | Robot Behavior | Platform Connection |
|---|---|---|
| Student selects phoneme | Points toward screen's phoneme area. Plays matching .mp3 from /audio/ through speaker. | Maps selected phoneme to local audio file (e.g. \ə\ → schwa.mp3). |
| Student types grapheme | Watches. No reaction. (DCM: observe, never evaluate) | None — platform records, never grades. |
| Student selects position 3+ | Raises eyebrows slightly (curiosity, not judgment). Stays silent. | None. Higher position is recorded, not evaluated. |
| Student marks letter silent | Slow single blink (acknowledgment). No verbal response. | None. |
| Student pauses > 45 seconds | Says: "Say the sound slowly. What do you hear?" Leans in slightly. | Fires /api/guide with stuck-student context. Coach returns Socratic question. |
| Frustration detected (camera) | Leans back (gives space). Eyes soften. Says: "Take your time. There is no rush." | On-device ML only. Categorical label. Never streamed or stored. |
| High-variation phoneme (e.g. schwa) | Chest LED pulses teal — subtle cue that this sound has many spellings. No verbal comment. | Checks sound-sheets.js: if totalGraphemes ≥ 5, triggers LED pulse. |
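The last row above, sketched. sound-sheets.js and the totalGraphemes ≥ 5 rule are named in the spec; the module's export shape and the helper names are assumptions.

```js
const soundSheets = require('./sound-sheets.js'); // export shape assumed

// Hardware stub -- the real version pulses the RGB strip.
function pulseChestLeds(color) { console.log('[led] pulse', color); }

function onPhonemeSelected(phoneme) {
  const sheet = soundSheets[phoneme]; // e.g. soundSheets['ə'] for schwa
  // Five or more graphemes marks a high-variation sound. The cue is an
  // LED pulse only -- no words, per the Decision Capture Model.
  if (sheet && sheet.totalGraphemes >= 5) {
    pulseChestLeds('teal');
  }
}
```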
Stage 3 · Build Word Output
| Trigger | Robot Behavior | Platform Connection |
|---|---|---|
| Student clicks Build Word Output | Sits up, both hands come together (anticipation gesture). | None. |
| Built word matches original | Slow affirming nod. Eyes brighten. Says: "The letters line up. Ready for the last part?" | Structural match — not evaluative. Robot never says "correct." |
| Built word does NOT match | Tilts head. Says: "The built word looks different. Which row might need a second look?" | Sends mismatch context to /api/guide for targeted Socratic question. |
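The match check above is a character comparison, not a judgment; a sketch with illustrative helper names and an assumed /api/guide payload.

```js
// Hardware stubs -- real versions drive servos and TTS.
function speak(text) { console.log('[tts]', text); }
function nodSlowly() { console.log('[gesture] slow nod'); }
function tiltHead() { console.log('[gesture] head tilt'); }

// Structural, not evaluative: do the letters line up, yes or no.
function builtWordMatches(original, built) {
  return original.trim().toLowerCase() === built.trim().toLowerCase();
}

async function onBuildWordOutput(original, built) {
  if (builtWordMatches(original, built)) {
    nodSlowly();
    speak('The letters line up. Ready for the last part?');
  } else {
    tiltHead();
    // Hand the mismatch to the Coach for a targeted Socratic question.
    await fetch('/api/guide', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ event: 'mismatch', original, built }),
    });
  }
}
```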
Stage 4 · Reflection & Record
| Trigger | Robot Behavior | Platform Connection |
|---|---|---|
| Student writes spelling reflection | Leans in slightly — this is the most valuable data point. | None. |
| Student submits record | Celebratory micro-gesture: arms rise briefly, eyes flash (3 seconds max). Says: "That one is recorded. Want to do another word?" | POST /api/records. Server triggers session-debrief and live-alerts agents. |
| Student ends session | Relaxes posture, eyes dim gently. Says: "Good session. See you next time." | Session end logged. Robot client disconnects. |
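The submit flow, from the robot client's side; a sketch with assumed payload fields. The server does the rest: the debrief and alert agents fire after the POST lands.

```js
// Stubs -- enterState is sketched in full after the state table below.
function speak(text) { console.log('[tts]', text); }
function enterState(state) { console.log('[state]', state); }

// Submit the record exactly as the student made it. The server-side
// session-debrief and live-alerts agents fire after this POST lands.
async function submitRecord(record) {
  const res = await fetch('/api/records', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(record),
  });
  if (res.ok) {
    enterState('CELEBRATING'); // capped at 3 seconds, then back to READY
    speak('That one is recorded. Want to do another word?');
  }
}
```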
Five states. Camera optional.
| State | Physical Expression | Verbal Tone | Transition Trigger |
|---|---|---|---|
| READY | Upright, eyes forward, calm breathing animation | Warm, even pacing | Session start or return from break |
| ENGAGED | Slight forward lean, eyes tracking screen | Slightly more animated | Student actively typing/selecting |
| CURIOUS | Head tilt, eyebrow raise, gaze shifts | Inquisitive inflection | High-variation phoneme or unusual selection |
| PATIENT | Lean back, slow breathing, eyes soft | Slower, gentler pacing | Frustration detected or idle > 60s |
| CELEBRATING | Brief arm raise, eye flash, slight bounce | Upbeat but brief (3 seconds max) | Record submitted successfully |
The CELEBRATING state lasts a maximum of 3 seconds. The robot celebrates completion of effort, not correctness. This aligns with the platform's absolute ban on evaluative language.
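The five states as a minimal state machine; the transitions and the 3-second cap come straight from the table above, everything else is a sketch.

```js
// Minimal behavior state machine. Real transitions also drive posture,
// eye display, and voice pacing; here they just log.
const STATES = ['READY', 'ENGAGED', 'CURIOUS', 'PATIENT', 'CELEBRATING'];

let current = 'READY';
let celebrationTimer = null;

function enterState(next) {
  if (!STATES.includes(next) || next === current) return;
  current = next;
  console.log('[state]', next);

  if (next === 'CELEBRATING') {
    // Celebrate completion of effort, briefly -- never correctness,
    // never longer than 3 seconds.
    clearTimeout(celebrationTimer);
    celebrationTimer = setTimeout(() => enterState('READY'), 3_000);
  }
}

// Event hooks wired to the triggers in the table:
const onSessionStart        = () => enterState('READY');
const onStudentTyping       = () => enterState('ENGAGED');
const onHighVariationSound  = () => enterState('CURIOUS');
const onFrustrationOrIdle60 = () => enterState('PATIENT');
const onRecordSubmitted     = () => enterState('CELEBRATING');
```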
| Component | Spec | Purpose |
|---|---|---|
| Touchscreen | 10" capacitive (1920×1200) | Runs DCP student interface |
| Compute | Raspberry Pi 5 (8GB) | Robot Behavior Layer (RBL) client, local ML, behavior state machine |
| Camera | 720p, on-device only | Engagement detection (never streamed/stored) |
| Speakers | 2W stereo | TTS coach responses + phoneme audio |
| Head Servos | 2-axis pan/tilt | Nod, tilt, turn toward student |
| Eye Display | 2× OLED 1.3" | Emotional expression |
| Chest LEDs | RGB addressable strip | Status + high-variation phoneme pulse |
| Battery | 10,000 mAh + UPS HAT | 4+ hours active use |
| Enclosure | 12-18" tall, soft-touch | Child-safe, desk-sized companion |
Robot Behavior Layer
The same thirteen agents. The same Decision Capture Model. Embodied in a physical coaching companion. The robot adds presence, not intelligence — the brain already exists.