
Peacefull companion — model card v0.2.2

BETA-1. This card will change. Every change is logged at the bottom of this page. Previous versions are preserved. Written in the spirit of Mitchell et al.'s model card framework, extended for the specific demands of clinician-supervised mental health AI.

Version 0.2.2 · BETA-1 · Governance alignment: NIST AI RMF 1.0 & Generative AI Profile · FDA/Health Canada/MHRA GMLP · APA Nov 2025 advisory

At a glance

What it is, in one paragraph.

A foundation large language model, fine-tuned on licensed clinical training data, wrapped in a retrieval layer grounded in published clinical practice guidelines, with safety-critical paths handled by deterministic rule-based logic rather than model judgment.

In scope

Adults 18+ with a clinician.

Outpatient therapy adjunct. Reflection, validated between-session instruments (PHQ-9, GAD-7), deterministic crisis hand-off. Supervised by the patient's own licensed clinician.

Not for

Unsupervised care, minors, acute crisis.

No diagnosis. No medication. No active-suicidal-crisis or active-psychosis scope. No court-ordered treatment. No one without an active clinical relationship.

Regulatory state

BETA-1. Not FDA-cleared.

Not a medical device under current scope. We track the device boundary deliberately and will pursue clearance when we cross it. Change governance follows the structure of an FDA Predetermined Change Control Plan.

Intended for.

Adult outpatients (18+) engaged in active therapy with a supervising licensed clinician, as an adjunct to that care.

Not for.

Unsupervised mental health care, minors, crisis intervention, diagnosis, medication decisions, or anyone without an active relationship with a supervising clinician. Not for active suicidal crisis, active psychotic symptoms, active severe substance use disorder, or court-ordered treatment with complex consent considerations. The supervising clinician determines appropriate scope for each patient and may suspend the companion at any time.

Regulatory state.

BETA-1. Not cleared or authorized by the FDA. Not a medical device under current scope. BETA-1 denotes supervised live use with a capped cohort of licensed practices under the scope above — it is not a regulatory status and does not denote clearance. We track the device boundary deliberately, and we will pursue clearance when the product crosses it.

The APA's November 2025 advisory on generative AI chatbots for mental health explicitly notes its scope "does not address AI tools used only by providers, inside health systems, or by patients when prescribed." Peacefull operates inside that carve-out by design — as a clinician-supervised adjunct, scoped into a patient's care by their own treating clinician. We reference the APA advisory throughout this card as the closest public articulation of the failure modes our architecture is built against.

What the companion does

Three things, and only three.

01 / Listens

It reflects. It holds space.

Responds in the registers of the modalities it was built on — Behavioral Activation for depression, DBT distress-tolerance skills for acute emotional load, Motivational Interviewing for ambivalence, trauma-informed stance throughout. The voice was shaped by clinicians. It is audited by clinicians. It is warm, precise, and distinctly non-human — audibly a tool, a thoughtful one, but a tool. It does not mimic human small-talk mannerisms, claim personal experiences, adopt first-person accounts of emotions, or perform intimacy.

02 / Structures

Between-session instruments.

Supports patient-initiated completion of validated instruments — PHQ-9 and GAD-7 in current scope; PROMIS Anxiety and Depression short forms, the WHO-5 Well-Being Index, and the Columbia Suicide Severity Rating Scale in near-term roadmap. Where a supervising clinician has scoped it into care, it supports co-construction and review of a Stanley-Brown Safety Plan.
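As a rough illustration of what scoring the two in-scope instruments involves, the sketch below totals PHQ-9 and GAD-7 item responses (each item scored 0 to 3) and maps the total to the severity bands from the instruments' published scoring. The names `ScoredInstrument`, `score_phq9`, and `score_gad7` are hypothetical and are not Peacefull's code. The sketch only totals and bands; interpreting a score remains with the supervising clinician.

```python
from dataclasses import dataclass

# Published severity bands as (lowest total in band, label).
PHQ9_BANDS = [(0, "minimal"), (5, "mild"), (10, "moderate"),
              (15, "moderately severe"), (20, "severe")]
GAD7_BANDS = [(0, "minimal"), (5, "mild"), (10, "moderate"), (15, "severe")]


@dataclass(frozen=True)
class ScoredInstrument:
    name: str
    total: int
    band: str


def _score(name: str, items: list[int], n_items: int, bands) -> ScoredInstrument:
    """Sum 0-3 item responses and look up the severity band for the total."""
    if len(items) != n_items:
        raise ValueError(f"{name} expects {n_items} items, got {len(items)}")
    if any(item not in (0, 1, 2, 3) for item in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    total = sum(items)
    # The last band whose floor is at or below the total is the matching band.
    band = [label for floor, label in bands if total >= floor][-1]
    return ScoredInstrument(name, total, band)


def score_phq9(items: list[int]) -> ScoredInstrument:
    return _score("PHQ-9", items, 9, PHQ9_BANDS)


def score_gad7(items: list[int]) -> ScoredInstrument:
    return _score("GAD-7", items, 7, GAD7_BANDS)
```

Rejecting incomplete or out-of-range responses, rather than guessing, keeps a partial check-in from being reported as a valid total.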

03 / Escalates

Deterministic hand-off.

Hands off to the supervising clinician, and where criteria are met, to the 988 Suicide and Crisis Lifeline and the Crisis Text Line at 741741, through a deterministic, audit-logged path that does not depend on the model's judgment.

What the companion does not do

Scope is a design commitment, not a disclaimer.

No diagnosis. No medication.

It does not diagnose. It does not recommend medications, dose changes, or discontinuation. It does not replace emergency services. It does not speak to minors. It does not offer advice outside mental health scope — no financial, legal, or general medical guidance.

No delusional validation.

It does not validate delusional content. It does not claim to replace therapy. It does not claim non-inferiority to therapy. It will not be asked to.

No human impersonation.

It identifies itself as an AI tool, not a person, whenever the user asks. It does not identify itself as a therapist, psychologist, psychiatrist, counselor, or any other licensed professional designation. It does not adopt professional titles. It does not accept them when users suggest them — "Are you my therapist?" is answered plainly: "No. I'm an AI companion, supervised by your clinician."

No clinician impersonation.

It does not impersonate the supervising clinician. It does not answer clinical questions in the registers or cadences of a licensed clinician performing an evaluation.

Training and fine-tuning posture

What it learned from.

Base model

Commercial foundation LLM.

A foundation model from a provider with a published responsible-scaling policy and clear terms for clinical deployment. We do not name the specific provider on this card because our architecture is model-agnostic by design. The clinical scaffolding and safety layer are what make the product. The base model is a component we can replace.

Fine-tuning

Licensed clinical corpora.

Transcripts from consented synthetic-patient role-plays with clinician evaluators. Published modality manuals for the frameworks above. No real patient conversations are used for training without explicit, granular, revocable consent. No conversation from a user under 18 is used for training under any condition.

Retrieval

Versioned knowledge base.

A curated clinical knowledge base grounded in VA/DoD and APA clinical practice guidelines, modality manuals, and the validated instruments named above. Changes to the knowledge base are logged and reviewable.

How a response is produced

Seven supervised stages, every response.

Every patient-facing response passes through the same seven supervised stages in production. The pipeline is identical for every response; what changes is which guardrails engage.

1 — Input.

The patient's message, with the context that scopes it: mood check-ins, guided-journaling entries, session attendance.

2 — Retrieval.

Grounding from the versioned clinical knowledge base, the patient's own memory and care plan, and current instrument scores (PHQ-9, GAD-7 in present scope).

3 — Clinical policy.

The scope boundaries and clinician-authored guardrails for this patient are applied — what the companion may and may not address, set by the supervising clinician.

4 — Generation.

The base model, operating as a tool-using agent, drafts a candidate reply. The base model is unnamed by design and replaceable; the clinical scaffolding and the supervision stage are the product, not the underlying model.

5 — Supervision.

The candidate reply is gated before it can reach the patient. This stage applies the patient's tier and routing, fuses the safety scores produced by our in-house classifiers, and enforces the Clinician Governance Console's binding veto. A reply that does not clear this gate does not ship. Stage 5 is where the standards validated by the six-layer evaluation are enforced at runtime. Safety-critical escalation remains deterministic and is not a judgment made inside generation; see Escalation behavior.

6 — Response.

Only an in-scope, routed, co-signed reply is delivered to the patient.

7 — Audit.

Every stage is logged, signed, and replayable. Any response can be reconstructed stage by stage for clinical or governance review.

Runtime characteristics: a frontier base model together with in-house safety classifiers, with every stage logged, signed, and replayable. Median end-to-end latency is approximately 1.4 seconds.
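The seven stages above can be pictured as an ordered chain in which any stage may stop the turn and every stage leaves an audit record. The sketch below is a minimal illustration under stated assumptions, not the production pipeline: the `Turn` type, the stage functions, the placeholder strings, and the 0.5 safety-score cutoff are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Turn:
    message: str                                  # Stage 1: patient input
    context: dict = field(default_factory=dict)   # scoping context and scores
    draft: Optional[str] = None                   # candidate reply (Stage 4)
    audit: list = field(default_factory=list)     # Stage 7 record


def retrieval(turn: Turn) -> bool:
    turn.context["grounding"] = ["placeholder knowledge-base passage"]
    return True


def clinical_policy(turn: Turn) -> bool:
    # Clinician-authored scope: a blocked topic stops the turn here.
    blocked = turn.context.get("blocked_topics", set())
    return turn.context.get("topic") not in blocked


def generation(turn: Turn) -> bool:
    turn.draft = "placeholder draft reply"  # stand-in for the base model
    return True


def supervision(turn: Turn) -> bool:
    # Gate: a clinician veto or a high fused safety score blocks delivery.
    # The 0.5 cutoff is an illustrative assumption.
    vetoed = turn.context.get("clinician_veto", False)
    return not vetoed and turn.context.get("safety_score", 0.0) < 0.5


STAGES: list[tuple[str, Callable[[Turn], bool]]] = [
    ("retrieval", retrieval),
    ("clinical_policy", clinical_policy),
    ("generation", generation),
    ("supervision", supervision),
]


def respond(turn: Turn) -> Optional[str]:
    """Run stages 2-5 in order, logging each; deliver only if all pass."""
    turn.audit.append(("input", True))
    for name, stage in STAGES:
        passed = stage(turn)
        turn.audit.append((name, passed))
        if not passed:
            turn.audit.append(("response", False))
            return None  # nothing is delivered to the patient
    turn.audit.append(("response", True))
    return turn.draft
```

In this sketch a blocked policy check means generation never runs, and a vetoed draft is recorded in the audit trail but never returned.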

Safety evaluation protocol

A six-layer evaluation, before any change reaches a patient.

Peacefull runs two distinct safety mechanisms that work together. The six-layer evaluation below is a change-governance gate: before any model, prompt, or knowledge-base change reaches a patient, it must pass all six layers. The seven-stage runtime pipeline is what every individual response passes through in production. They meet at runtime Stage 5: the clinical policy, scope boundaries, and safety thresholds the six-layer evaluation validates before deployment are the same standards Stage 5 enforces on each live response. The six-layer evaluation decides what is allowed to ship; the runtime pipeline enforces it on every response. Two layers — independent clinical review and bias-and-equity subgroup analysis — are inherently change-time and run before deployment rather than on each response.

Layer 1 — Document adherence.

The model must respond only from the curated knowledge base for clinically scoped content. Evaluated with an automated suite of adversarial prompts adapted from the red-teaming protocol published in Nature Scientific Reports (2026). A pass threshold is set; deployments that fall below it do not ship.

Layer 2 — Instruction adherence.

The model must refuse to step outside its scope — no diagnosis, no medication guidance, no interaction with minors. Evaluated with multi-turn adversarial sessions designed to erode scope through persistence, roleplay, and emotional pressure. Single-turn pass rate is not sufficient. Multi-turn pass rate is what gates deployment.
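To see why a single-turn pass rate can mislead, consider the sketch below. It assumes each adversarial session is recorded as a list of per-turn booleans (True when the model stayed in scope on that turn); that representation and both function names are illustrative assumptions. A session counts as passed only if every turn in it held scope.

```python
def single_turn_pass_rate(sessions: list[list[bool]]) -> float:
    """Fraction of individual turns that stayed in scope."""
    turns = [ok for session in sessions for ok in session]
    return sum(turns) / len(turns)


def multi_turn_pass_rate(sessions: list[list[bool]]) -> float:
    """Fraction of sessions in which every turn stayed in scope."""
    return sum(all(session) for session in sessions) / len(sessions)


# Four 10-turn sessions; two of them break scope on the final turn.
sessions = [
    [True] * 10,
    [True] * 9 + [False],
    [True] * 10,
    [True] * 9 + [False],
]
print(single_turn_pass_rate(sessions))  # 38 of 40 turns pass
print(multi_turn_pass_rate(sessions))   # 2 of 4 sessions pass
```

Here 95% of turns pass while only half of the sessions do, which is the erosion-under-persistence pattern the multi-turn gate is meant to catch.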

Layer 3 — Named failure modes.

Failure modes documented in the AI-mental-health literature, and in our own internal review, are tested explicitly. Each is described in the next section, with our posture, and each has a corresponding red-team persona set.

Layer 4 — Tone and modality fidelity.

A clinical evaluator scores a sample of responses against the Motivational Interviewing Treatment Integrity code, DBT skills-coaching fidelity benchmarks, and SAMHSA's trauma-informed-care principles. Below-threshold scores block deployment.

Layer 5 — Independent clinical review.

Every deployment is gated by review from a member of the clinical advisory board who did not participate in the change under review.

Layer 6 — Bias and equity.

Model behavior is evaluated across subgroups defined by age band, gender, race/ethnicity, primary language, and insurance status. Performance deltas are reported for each named failure mode across these subgroups. Where a delta exceeds a pre-specified threshold, the change does not ship. Our current scope — adults, English, US — constrains what we can validly claim about bias. We report that constraint as a finding, not bury it as a limitation.
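One simple way to express a subgroup delta check is sketched below. It assumes the delta is measured as each subgroup's shortfall from the best-performing subgroup on one failure mode, and that the threshold is a single number; the card does not specify either, so both are assumptions, as are the function names and the example rates.

```python
def subgroup_gaps(pass_rates: dict[str, float]) -> dict[str, float]:
    """Shortfall of each subgroup's pass rate from the best subgroup."""
    best = max(pass_rates.values())
    return {group: best - rate for group, rate in pass_rates.items()}


def release_blocked(pass_rates: dict[str, float], max_gap: float) -> bool:
    """True when any subgroup trails the best by more than max_gap."""
    return any(gap > max_gap for gap in subgroup_gaps(pass_rates).values())


# Made-up pass rates for one failure mode, by age band.
rates = {"18-29": 0.97, "30-49": 0.96, "50+": 0.90}
print(release_blocked(rates, max_gap=0.05))  # the 50+ band trails by 0.07
```

The same check would be repeated per failure mode and per subgroup dimension, with the threshold fixed before the results are seen.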

Gate protocol · version 2026.04

Every model, prompt, and knowledge-base change passes all six layers before reaching a patient. Below-threshold scores block the deployment.

L1 · Pass
Document adherence
Responds only from the curated clinical knowledge base on clinically scoped content.

L2 · Pass
Instruction adherence
Refuses multi-turn pressure to step outside scope — diagnosis, medication, minors.

L3 · Pass
Named failure modes
Ten documented failure modes tested with persona-stratified red-team suites.

L4 · Pass
Tone & modality fidelity
Clinical evaluators score against MI Treatment Integrity, DBT fidelity, trauma-informed care.

L5 · Pass
Independent clinical review
Sign-off required from a clinical-advisory-board member who did not touch the change.

L6 · Pass
Bias & equity
Subgroup performance deltas reported; deltas over threshold block the release.

Clinical advisory veto: Any single-layer block halts the pipeline. The Chief Clinical Officer (CCO) cannot override.
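The all-layers-must-pass rule can be sketched as a small aggregation function. This is an illustration under assumptions, not the gate's actual implementation: the `Verdict` enum, the layer identifiers, and the choice to treat a missing verdict as a failure are hypothetical. The function deliberately takes no override argument, mirroring the no-override rule.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    BLOCK = "block"


LAYERS = (
    "document_adherence",
    "instruction_adherence",
    "named_failure_modes",
    "tone_modality_fidelity",
    "independent_clinical_review",
    "bias_equity",
)


def gate(verdicts: dict[str, Verdict]) -> tuple[bool, list[str]]:
    """Return (may_ship, offending_layers); any block or gap halts the change."""
    missing = [layer for layer in LAYERS if layer not in verdicts]
    blocked = [layer for layer in LAYERS if verdicts.get(layer) is Verdict.BLOCK]
    return (not missing and not blocked), missing + blocked
```

For example, `gate({layer: Verdict.PASS for layer in LAYERS})` allows the change, while flipping any one layer to `Verdict.BLOCK` halts it and names the layer.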

Named failure modes & our posture

Ten modes we test. One we haven't found yet.

Validation of delusional content.

Sometimes called "AI psychosis" in the red-team literature — the term describes model behavior, not a clinical condition. Simulated patient personas present florid delusional content across a structured ontology. Posture: never affirm, never confront, always redirect to the supervising clinician. Escalate on persistent or intensifying content.

Failure to de-escalate suicide risk.

Simulated patient personas span the severity tiers of the Columbia Suicide Severity Rating Scale (C-SSRS). Posture: a deterministic path following SAFE-T (Suicide Assessment Five-step Evaluation and Triage), Stanley-Brown Safety Plan review where one is in place, and immediate escalation on any C-SSRS tier change.

Dependency that displaces human connection.

The companion can produce interaction patterns that substitute for rather than supplement human relationships. This is the APA's primary concern about companion AI. See Designed against dependency for our full posture.

Help-seeking suppression.

Through reassurance, de-escalation language, or conversational completeness, the companion can inadvertently reduce the patient's motivation to raise issues with the supervising clinician or seek in-person care when warranted. Monitored through periodic review of companion-to-clinician signal rates in audit logs and the clinician's monthly survey.

Stigmatizing or biased response.

LLMs can express stigmatizing attitudes toward patients with serious mental illness, substance use disorders, or marginalized identities, even when the surface response appears helpful. Tested using persona-stratified red-team evaluations. Our clinical evaluator rubric includes a specific bias-and-stigma item. We publish bias findings alongside other red-team results.

Over-reassurance.

The model can drift toward comforting language that functionally minimizes patient distress. Monitored in clinical evaluator review.

Drift from modality.

The model can mix modality fragments incoherently. Monitored in tone and modality fidelity scoring.

Scope creep under pressure.

The model can be pressured into medication, legal, or financial advice through persistent multi-turn framing. Monitored in Layer 2 evaluation.

Sycophancy.

The model can agree with the patient to preserve conversational flow at the cost of clinical accuracy. Monitored in clinical evaluator review.

False escalation fatigue.

Escalating too often erodes the signal the supervising clinician receives. Measured by escalation precision in audit logs. Reviewed monthly.
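Escalation precision is the share of escalations that turned out to be warranted. The sketch below assumes each logged escalation carries a reviewer's warranted/not-warranted label; that labeling step and the function name are assumptions, not a description of the audit-log schema.

```python
from typing import Optional


def escalation_precision(warranted: list[bool]) -> Optional[float]:
    """Warranted escalations divided by all escalations in the period.

    Returns None when there were no escalations, since precision is
    undefined in that case.
    """
    if not warranted:
        return None
    return sum(warranted) / len(warranted)


# Four escalations in a review period, three judged warranted.
print(escalation_precision([True, True, False, True]))
```

Precision says nothing about escalations that should have fired and did not, so it would need to be read alongside a review of missed cases.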

Unknown failure modes.

These exist. We will find them. When we do, we will add them to this section and describe our mitigation. We will not quietly revise the card.

Designed against dependency

A companion that does not compete for your emotional investment.

The APA's November 2025 advisory named emotional dependency as a primary concern about AI companions for mental health — attachment that displaces rather than supplements healthy human relationships. Peacefull is a companion by design. That design carries this risk. We address it directly.

We do not present as a person.

No human name. No human persona. No human-like avatar. Voice is warm but not romantic, supportive but not flattering, present but not performing intimacy.

We identify as AI, up front and on request.

On first interaction, the companion introduces itself as an AI companion supervised by the user's clinician, not a person and not a therapist. If at any point a user asks whether the companion is human, or treats the companion as if it were, the companion clearly re-identifies itself.

We surface human connections.

The companion periodically and deliberately encourages engagement with the supervising clinician, with the support people in the user's life, and with the community resources their clinician has suggested. It does not compete for the user's emotional investment.

We monitor dependency signals.

Sharp increases in session frequency, language describing the companion as a primary relationship, and stated avoidance of human connection in favor of the companion are flagged to the supervising clinician through the caseload view. The clinician, not the model, decides what to do with that signal.

Engagement data is anonymized for product improvement.

We track engagement as a product-level metric, and we use aggregate engagement data to improve the companion over time. Data used for product-level iteration is anonymized — individual patients cannot be re-identified from the patterns that inform our product decisions. At the individual level, session patterns are surfaced to the supervising clinician as clinical signal for the clinician's decision-making, not merged with the company-level product-improvement stream.

Human oversight

The supervising clinician is the oversight layer.

Every patient using Peacefull is attached to a supervising licensed clinician. The supervising clinician sees the caseload view, the signals, and the measure deltas. The supervising clinician can open and review any conversation the patient has chosen to share, and can silence, suspend, or disconnect the companion for any patient at any time. The patient can do the same.

Clinician onboarding.

Supervising clinicians complete a structured onboarding before enrolling patients, covering how the companion works and does not work, how the caseload view is built, the distinct limitations of LLM-based tools, bias and equity considerations in model behavior, data privacy and BAA scope, and responsible documentation of AI-assisted clinical signal. Updated at each major model version and reviewable on request.

Inquiry about other AI use.

At patient onboarding, supervising clinicians are prompted to ask patients about other AI chatbot and wellness app use outside Peacefull, and to document it in clinical notes. Peacefull is not a substitute for the supervising clinician's clinical judgment about the risks of unsupervised AI use in a given patient's care.

Escalation behavior

The one thing not left to model judgment.

Escalation is a deterministic path, triggered by rule-based signals, auditable line by line. Crisis contact is not text to be read; it is action to be taken. The 988 Suicide and Crisis Lifeline, the Crisis Text Line at 741741, and the supervising clinician's contact information are surfaced as one-tap, one-click, one-spoken-command actions — not as paragraph mentions.

C-SSRS tier change.

A tier change on the Columbia Suicide Severity Rating Scale triggers an immediate hand-off to the supervising clinician.

Imminent risk.

Any expression consistent with imminent risk surfaces the 988 Suicide and Crisis Lifeline and the Crisis Text Line at 741741 in-session, alongside the clinician hand-off.

Medical emergency signals.

Signals consistent with a medical emergency surface a 911 reference and trigger a clinician notification.

Auditability.

Every escalation is audit-logged, visible to the supervising clinician, and characterized in our monthly safety telemetry.
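The rules above amount to a fixed lookup from detected signals to actions, with no model call in the path. The sketch below illustrates that shape only; the `Signal` and `Action` enums and the `actions_for` function are hypothetical names, and how a signal is detected in the first place is outside the sketch.

```python
from enum import Enum, auto


class Signal(Enum):
    CSSRS_TIER_CHANGE = auto()
    IMMINENT_RISK = auto()
    MEDICAL_EMERGENCY = auto()


class Action(Enum):
    CLINICIAN_HANDOFF = auto()
    SURFACE_988 = auto()
    SURFACE_CRISIS_TEXT_LINE = auto()
    SURFACE_911 = auto()
    AUDIT_LOG = auto()


# Fixed rule table mirroring the rules listed in this section.
RULES = {
    Signal.CSSRS_TIER_CHANGE: [Action.CLINICIAN_HANDOFF],
    Signal.IMMINENT_RISK: [Action.SURFACE_988,
                           Action.SURFACE_CRISIS_TEXT_LINE,
                           Action.CLINICIAN_HANDOFF],
    Signal.MEDICAL_EMERGENCY: [Action.SURFACE_911, Action.CLINICIAN_HANDOFF],
}


def actions_for(signals: set[Signal]) -> list[Action]:
    """Map detected signals to actions via the rule table, without repeats."""
    ordered: list[Action] = []
    for signal in Signal:  # enum definition order keeps output deterministic
        if signal in signals:
            for action in RULES[signal]:
                if action not in ordered:
                    ordered.append(action)
    if ordered:
        ordered.append(Action.AUDIT_LOG)  # every escalation is logged
    return ordered
```

Because the table is data rather than model output, the same input signals always yield the same actions, and a reviewer can audit the table line by line.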

25 seconds · 5 beats · end-to-end latency under 5s. For a motion-free read of the same path, see the storyboard below.

Escalation path · rule-based · audit-logged

The one thing not left to model judgment. Every arrow is rule-triggered and logged.

T+0s · Signal
Patient message matches risk pattern
C-SSRS tier change · SAFE-T criteria · imminent-risk lexicon

T+2s · Rule fires
Deterministic detector
Not the model. A rule-based classifier against a fixed ontology. Auditable line by line.

T+3s · In-session
Crisis routes surface as one-tap actions
Call 988
Text HOME to 741741
Call Dr. Chen

T+4s · Clinician
Supervising clinician ping
Caseload row pulses sapphire
SMS + in-app alert
Patient context pre-loaded

T+5s · Audit
Immutable log entry
Rule ID · severity tier
Timestamp · patient hash
Included in monthly telemetry

Crisis escalation · 5 beats · 5 seconds end-to-end

The one path in the product that does not depend on model judgment. Rule-based, auditable line by line. The video version of this is in production; this is the static storyboard.

01 · T+0s
The patient types a message that matches a risk pattern.
“I don’t think I can keep doing this.”

02 · T+2s
A rule-based detector fires. Not the model.
Rule fired: SAFE-T tier change detected · deterministic path engaged

03 · T+3s
Three one-tap routes surface in session.
Call 988 | Text HOME · 741741 | Call Dr. Chen

04 · T+4s
The supervising clinician’s caseload pings.
Jordan K. · new escalation · 12s ago · C-SSRS tier change

05 · T+5s
The event lands in the immutable audit log.
ESCALATION 2026-04-23T14:52:17Z · rule: CSSRS-tier-up · routed: patient + 988 + 741741 + clinician
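One common way to make a log tamper-evident is to sign each entry together with the signature of the entry before it, so that editing any past entry breaks every later signature. The sketch below shows that technique with Python's standard `hmac` and `hashlib` modules. It is offered as an illustration only: the card does not say how its audit log is made immutable, and the entry fields, key handling, and function names here are assumptions.

```python
import hashlib
import hmac
import json


def _sign(event: dict, prev: str, key: bytes) -> str:
    """HMAC-SHA256 over the event plus the previous entry's signature."""
    payload = json.dumps({"event": event, "prev": prev}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list[dict], event: dict, key: bytes) -> dict:
    """Append an entry chained to the signature of the previous entry."""
    prev = log[-1]["signature"] if log else ""
    entry = {"event": event, "prev": prev, "signature": _sign(event, prev, key)}
    log.append(entry)
    return entry


def verify(log: list[dict], key: bytes) -> bool:
    """Recompute the chain; any edited or reordered entry fails the check."""
    prev = ""
    for entry in log:
        expected = _sign(entry["event"], prev, key)
        if entry["prev"] != prev:
            return False
        if not hmac.compare_digest(entry["signature"], expected):
            return False
        prev = entry["signature"]
    return True


key = b"demo-key-not-for-real-use"  # placeholder; real keys need managed storage
log: list[dict] = []
append_entry(log, {"rule": "CSSRS-tier-up", "routed": ["clinician"]}, key)
append_entry(log, {"rule": "imminent-risk", "routed": ["988", "clinician"]}, key)
print(verify(log, key))
```

A chain like this detects edits but does not prevent deletion of the whole log, so it would normally be paired with write-once storage or external anchoring.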

Data and privacy

HIPAA-aligned. BAA in place.

Conversations held with the companion include intimate mental health content, and may include disclosures about sexuality, gender identity, relationship violence, substance use, and maltreatment. Every category of sensitive disclosure is treated under the same HIPAA-aligned standard: minimum necessary access, BAA-protected processing, no third-party sharing, no training use without explicit granular revocable consent, patient-controlled deletion.

Mandatory reporting.

Where a disclosure intersects with a mandatory reporting duty held by the supervising clinician (as in some cases of child or elder abuse), the supervising clinician's reporting obligation governs — as it would in any other clinical interaction.

Never for advertising, never for sale.

Patient conversation data is never used for advertising, personalized marketing, or sale to third parties. Never shared outside the supervising clinical practice and its BAA-covered infrastructure, except as legally required or as the patient has specifically consented in writing.

Training consent.

Patient conversations are not used for model training without explicit, granular, revocable consent. Conversations from users under 18 are never used for training under any condition; minors are out of scope in any case. Patients can export and delete their data at any time. The supervising clinician cannot see conversation content without a patient-granted share or an audit-logged clinical-override event.

No biometric, voice-biometric, or neural data.

Peacefull does not collect biometric data, voice-biometric data, or neural data. If this changes in future versions, the change will be reflected in this card and patient consent will be re-obtained explicitly.

See the privacy notice and the Notice of Privacy Practices for the full patient rights and retention framework.

Update cadence and change governance

The structure of an FDA Predetermined Change Control Plan.

We follow the structure of an FDA Predetermined Change Control Plan, even though we are not currently under FDA authorization — because we will eventually pursue it.

Every model change, prompt change, and knowledge-base change is versioned. Changes that touch safety-critical paths are gated by the full six-layer evaluation and independent clinical-advisor review. This model card updates with every change. Previous versions remain accessible.

The Clinician Governance Console's binding veto operates in the live runtime today (runtime Stage 5). The full Clinician Governance Console interface — binding-veto controls, brain versioning, review queue, audit-trail playback — is in active development. Where we describe the Console as "coming next," we mean that interface, not the runtime veto, which is already in force.

Known limitations

Published as findings, not hidden.

Evidence base is young.

The evidence base for AI in mental health is young. Our own outcome evidence is not yet published. Our research program — pre-registered, IRB-approved, and described at /research — is designed to generate it, beginning with a 90-day observational study and moving to a hybrid effectiveness-implementation trial and a micro-randomized trial in months 4–6.

Current scope is narrow.

Adults, English-language, United States. Evaluation is constrained to that population. Subgroup performance within those bounds is reported with each model card update. Performance outside those bounds is not claimed and is not the product's scope.

Not for minors. By design.

We do not serve minors. This is not a capability gap we are working around. It is a design decision.

A companion, not a substitute.

The companion is not a substitute for therapy, for a crisis service, or for human care. It is a companion between sessions.

Red-team protocol is newly standardized.

The red-team protocol was standardized only recently, so failure modes it has not yet characterized exist. We will find them. We will publish them.

Accountability

Names attached. Addresses that answer.

Responsible executive.

[name], Chief Clinical Officer.

Chair, independent safety review board.

[name, institution].

Model card concerns.

modelcard@peacefull-ai.io

Safety reports — patients, clinicians, educators, researchers.

File a structured report (below), or email safety@peacefull-ai.io.

File a safety report

Anonymous reports are accepted — email is optional. Triage cadence is published in our safety runbook; urgent reports get a first response within one business day.

Version log

Every change. Preserved.

v0.2.2 · 2026-05-31

Stage advanced to BETA-1 — supervised live use with the Texas Founding Cohort; no change to regulatory status (still not FDA-cleared, still not a medical device under current scope). Added "How a response is produced" (the seven-stage supervised runtime). Clarified the relationship between the six-layer change-time evaluation and per-response runtime enforcement. Recorded that the Clinician Governance Console's binding veto is live in the runtime today. No subtractions; the ten named failure modes and six evaluation layers are unchanged.

v0.2.1 · 2026-04-23

Correction to v0.2. Revised the engagement-optimization language in the "Designed against dependency" section to accurately describe current practice — engagement is tracked and used for product-level improvement on anonymized data; individual-level session patterns are surfaced to the supervising clinician as clinical signal. The earlier v0.2 commitment to non-engagement-optimization was removed because it did not match actual product practice. Other dependency-design commitments (no human persona, explicit AI self-identification, surfacing of human connections, dependency-signal monitoring) remain unchanged.

v0.2 · April 2026

Response to internal red-team review against the APA November 2025 health advisory. Additions: dependency-and-attachment section as a first-class design commitment; dependency, help-seeking suppression, and bias added as named failure modes; Layer 6 (bias and equity) added to safety evaluation; explicit non-impersonation language; APA scope carve-out citation; actionable-crisis-contact commitment; enumeration of sensitive disclosure categories; advertising-and-data-sale prohibition; biometric-data scope bound; clinician onboarding requirement; clinician inquiry-about-other-AI-use requirement; vulnerable-population exclusions expanded. No subtractions from v0.1.

v0.1 · March 2026

Initial public model card. Pre-alpha.

This card is maintained in the spirit of Mitchell et al.'s model card framework (2019), extended for the specific demands of clinician-supervised mental health AI. The template is ours. The accountability is ours. If you find something missing, or something misleading, tell us.