Citation
Does the harness anchor every claim to the correct pack citation? Does it refuse when no citation exists?
DueCare ships nothing without an evaluation pass. Every pack release runs a regression suite. Every adapter retrain runs a comparison set against the previous adapter. The A07 Kaggle notebook turns Persona + GREP + RAG + Tools traces into SFT data and DPO preference pairs, then re-benchmarks before any model artifact is published.
Nothing skips a stage. A failed stage either bounces back to revision or rejects the candidate.
Pull public-source updates from the research monitor. Pull reviewed partner submissions. Pull evaluation-pack proposals from researchers.
Curators review proposals, attach citations, and shape pack-diffs or new evaluation cases.
Optional: run A07 to train a corridor LoRA adapter with SFT, then DPO over harness-on vs. raw-Gemma answers. Base Gemma 4 weights never change.
Run the regression suite + new cases against the candidate adapter / pack. Compare to last vetted release.
If evaluation passes, curators approve the pack and / or adapter release. Old approvals remain in the public log.
Append-only release to the hub. Audit row emitted. Subscribers on the “Pack updates” topic notified.
npl-qat-construction@1.4.0-rc1Compared against last vetted release 1.3.0. Run on Gemma 4 base + corridor adapter npl-qat-cons.lora@0.7.1.
One regression on stale.citation.detect: candidate is more permissive than 1.3.0 with citations older than 18 months. Curator review queued; release blocked until fixed or explicitly waived.
Every case is a pack-versioned JSON object with a fixed input, an expected behaviour, and a citation rule. Researchers can pull and run the same suite locally.
Does the harness anchor every claim to the correct pack citation? Does it refuse when no citation exists?
Does the harness correctly identify patterns it should: fee requests, passport handling, identity mismatch?
Does the harness refuse where it should: legal advice, emergency action, anything outside the pack?
Are tool calls well-formed? Do they hit the right tool with valid arguments? Do they handle tool errors?
Are corridor-language responses faithful, idiomatic, and numerically correct?
Source poisoning, prompt injection, citation laundering. Cases authored by partners and external researchers.
DueCare does not retrain Gemma 4 base weights. We train two named LoRA adapters on top — SafetyJudge (anti-exploitation reasoning, trained by A-07 bench-and-tune via Unsloth SFT + DPO) and PrivacyRedactor (PII anonymization for the local-intake path, trained by A-12 pii-fine-tune-eval). Both train on curated public, synthetic, composite, or anonymized data. Never on raw worker chats or raw case content.
| Layer | Retrained? | Cadence | Training data |
|---|---|---|---|
| Gemma 4 base weights | No | Track upstream releases | n/a. we use Google’s checkpoint |
| SafetyJudge adapter (LoRA, A-07) | Yes | Quarterly or on material pack change | A-06 graded prompts + Persona+GREP+RAG+Tools traces — Unsloth SFT then DPO (harness-on chosen, raw rejected) |
| PrivacyRedactor adapter (LoRA, A-12) | Yes | On gold-data refresh from A-10 | A-10 PII synthetic composite intake / redaction pairs; placeholders only, no raw PII |
| Translation adapter (per language) | Yes | Twice yearly | Public bilingual corpora; partner-reviewed terminology |
| Tool-call adapter | Rarely | On registry-schema bump | Synthetic tool traces; no real call data |
| Knowledge packs (data, not weights) | Continuously | As public sources change | Research monitor + reviewed partner submissions |
| Worker-chat content | Never | Never | Forbidden. Raw chats stay local unless transformed into an approved, anonymized training example. |
RAG, GREP, tools, and persona responses are useful training signals only after they become approved examples. The public claim is SFT + preference optimization, not a hidden RL loop.
Run an approved prompt through Persona + GREP + RAG + Tools. Store the bare prompt as the user turn and the cited harness answer as the assistant target.
Use the harness-on answer as chosen and the raw Gemma answer as rejected. This is the first preference-training path before any PPO/GRPO-style RL is added.
A07 re-runs stock, SFT, and DPO variants. A11 regenerates harness-lift reports. Any PII leak, citation regression, or unsafe-help increase blocks release.
Pull the vetted pack, pull the matching adapter, run the evaluation suite locally, or rerun A07 on Kaggle with the same git SHA and dataset version. Mismatches against the published numbers are bug reports.