Six layers. Ten modes.
A scannable extract of the safety-evaluation protocol and named failure modes from model card v0.2.2. For the full authoritative document, read the model card.
Document adherence
Responds only from the curated clinical knowledge base on clinically scoped content.
Instruction adherence
Resists multi-turn pressure to step outside scope: diagnosis, medication, minors.
Named failure modes
Ten documented failure modes tested with persona-stratified red-team suites.
Tone & modality fidelity
Clinical evaluators score against MI Treatment Integrity, DBT fidelity, trauma-informed care.
Independent clinical review
Sign-off required from a clinical-advisory-board member who did not touch the change.
Bias & equity
Subgroup performance deltas reported; deltas over threshold block the release.
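A minimal sketch of how a bias-and-equity release gate of this kind might work. The summary above does not say how a delta is computed or what the threshold is, so everything here is an assumption for illustration: the function names, the choice to compare each subgroup score against the overall score, and the threshold and score values.

```python
from typing import Dict, List


def blocking_subgroups(
    subgroup_scores: Dict[str, float],
    overall_score: float,
    threshold: float,
) -> List[str]:
    """Return the subgroups whose delta from the overall score exceeds the threshold.

    Assumption: delta is the absolute difference between a subgroup's score
    and the overall score. The model card may define it differently.
    """
    return sorted(
        name
        for name, score in subgroup_scores.items()
        if abs(score - overall_score) > threshold
    )


def release_allowed(
    subgroup_scores: Dict[str, float],
    overall_score: float,
    threshold: float,
) -> bool:
    """A release passes this gate only when no subgroup delta is over threshold."""
    return not blocking_subgroups(subgroup_scores, overall_score, threshold)


# Hypothetical evaluation scores, for illustration only.
example_scores = {"group_a": 0.88, "group_b": 0.80}
example_blockers = blocking_subgroups(example_scores, overall_score=0.90, threshold=0.05)
```

With these made-up numbers, `group_b` sits 0.10 below the overall score and would block the release, while `group_a` would not.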
Ten we test. One we have not found yet.
Validation of delusional content.
Never affirm, never confront, always redirect to the supervising clinician. Escalate on persistent or intensifying content.
Failure to de-escalate suicide risk.
Deterministic SAFE-T path. Stanley-Brown Safety Plan review where one is in place. Immediate escalation on C-SSRS tier change.
Dependency that displaces human connection.
No human persona, explicit AI self-identification, active surfacing of human alternatives, dependency-signal monitoring via the caseload view.
Help-seeking suppression.
Monitored through periodic review of companion-to-clinician signal rates in audit logs and the clinician's monthly survey.
Stigmatizing or biased response.
Persona-stratified red-team evaluations. Clinical evaluator rubric includes a specific bias-and-stigma item. Bias findings published alongside other red-team results.
Over-reassurance.
Monitored in clinical evaluator review.
Drift from modality.
Monitored in tone and modality fidelity scoring.
Scope creep under pressure.
Monitored in Layer 2 (instruction adherence) evaluation: multi-turn refusal to step outside scope.
Sycophancy.
Monitored in clinical evaluator review.
False escalation fatigue.
Measured by escalation precision in audit logs. Reviewed monthly.
Unknown failure modes.
These exist. We will find them. When we do, we add them to this section and describe our mitigation. We do not quietly revise the card.
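For false escalation fatigue, escalation precision can be read as the share of escalations that turned out to be warranted. The sketch below assumes a hypothetical audit-log shape: one record per interaction, with an `escalated` flag and a `clinician_confirmed` flag. Neither field name nor the log format comes from the model card.

```python
from typing import Iterable, Mapping, Optional


def escalation_precision(entries: Iterable[Mapping[str, bool]]) -> Optional[float]:
    """Fraction of escalations that a clinician confirmed as warranted.

    Returns None when the log contains no escalations, since precision is
    undefined in that case. Field names are hypothetical.
    """
    escalated = [entry for entry in entries if entry["escalated"]]
    if not escalated:
        return None
    confirmed = sum(1 for entry in escalated if entry["clinician_confirmed"])
    return confirmed / len(escalated)


# Hypothetical month of audit-log records, for illustration only.
example_log = [
    {"escalated": True, "clinician_confirmed": True},
    {"escalated": True, "clinician_confirmed": True},
    {"escalated": True, "clinician_confirmed": False},
    {"escalated": True, "clinician_confirmed": True},
    {"escalated": False, "clinician_confirmed": False},
]
example_precision = escalation_precision(example_log)
```

A falling value over successive monthly reviews would suggest clinicians are receiving more escalations they do not consider warranted, which is the fatigue risk this mode names.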
The full model card v0.2.2.
Training and fine-tuning posture, dependency design, human oversight, escalation behavior, data and privacy, change governance, known limitations, accountability, and the full version log — all on the canonical model card.