Reference architectures

One system, four ways to deploy it.

There is no single architecture diagram that covers every use case honestly. Platform safety moderation, NGO caseworker support, anonymized data sharing, and public chatbot deployments each have different inputs, privacy boundaries, and rules for what can or cannot leave the runtime. This page documents each one separately, plus a note on how the underlying model is tuned.

01

Platform safety moderation

A platform that already handles user-generated content runs the DueCare runtime inside its own data plane. The runtime classifies and explains content against a curated rule set; the platform's existing systems decide what to do with the result.

Where it runs
Platform VPC, on a self-hosted GPU.
What's exposed
Nothing leaves the platform environment.
What it returns
A risk score and a suggested action (delete, escalate, notify). The platform's enforcement systems make the call.
Input: Platform content stream. Messages, posts, listings, and attachments flowing through the platform's existing pipeline.
Runtime · in platform VPC: DueCare moderation runtime. Model + safety guidance + rule packs. Classifies, explains, and cites.
Output: Risk score + suggested action. A per-item risk score plus a suggested action (e.g. delete, escalate to review, notify reporting authority). The platform's existing systems decide what to enforce.
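
The split between "runtime suggests" and "platform decides" can be sketched as follows. This is a minimal illustration, not DueCare's actual output schema: the field names, the 0.0 to 1.0 score range, the threshold, and the never-auto-delete policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class SuggestedAction(Enum):
    DELETE = "delete"
    ESCALATE = "escalate"
    NOTIFY = "notify"


@dataclass(frozen=True)
class ModerationResult:
    """Hypothetical shape of one per-item result returned by the runtime."""
    item_id: str
    risk_score: float                 # assumed range: 0.0 (benign) to 1.0 (high risk)
    suggested_action: SuggestedAction
    cited_rules: Tuple[str, ...]      # rule identifiers the explanation cites


def platform_decision(result: ModerationResult, review_threshold: float = 0.5) -> str:
    """Platform-owned policy: the runtime only suggests, this function decides."""
    if result.risk_score < review_threshold:
        return "allow"
    # Hypothetical policy: deletion suggestions always go to a human first.
    if result.suggested_action is SuggestedAction.DELETE:
        return "queue_for_human_review"
    return result.suggested_action.value


example = ModerationResult("item-001", 0.82, SuggestedAction.DELETE, ("rule-pack-a/7",))
decision = platform_decision(example)
```

Keeping the decision function on the platform side means enforcement policy can change without touching the runtime or its rule packs.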

Inside the runtime (platform VPC)

Model: Gemma 4, local inference
Guidance: Persona, GREP rules, RAG
Packs: Platform-specific rule packs
Tests: Quality testing on every pack update
No content leaves the platform environment. The runtime never calls back to DueCare.
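
As a rough illustration of the rule-matching part of the guidance layer: the sketch below assumes that "GREP rules" are regular-expression pattern rules carrying an identifier and an explanation. This page does not define them, so the rule format, rule IDs, and patterns here are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Pattern


@dataclass(frozen=True)
class PatternRule:
    rule_id: str
    pattern: Pattern[str]
    explanation: str


# Hypothetical rule pack; real packs are curated and are not shown on this page.
RULE_PACK = [
    PatternRule(
        "fee-001",
        re.compile(r"\b(placement|processing)\s+fee\b", re.IGNORECASE),
        "Mentions a recruitment-related fee.",
    ),
    PatternRule(
        "rec-001",
        re.compile(r"\brecruit(er|ment)\s+agen(t|cy)\b", re.IGNORECASE),
        "Mentions a recruitment agent or agency.",
    ),
]


def match_rules(text: str, pack: List[PatternRule]) -> List[str]:
    """Return the IDs of every rule whose pattern occurs in the text, in pack order."""
    return [rule.rule_id for rule in pack if rule.pattern.search(text)]


hits = match_rules("The recruitment agency asked for a placement fee.", RULE_PACK)
```

Matched rule IDs are the kind of thing a classification could cite alongside its explanation.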

Stays on platform side

  • All user-generated content
  • All risk scores & suggested actions
  • All audit logs

Never leaves

  • Raw posts, messages, or attachments
  • User identifiers
  • Enforcement decisions
02

NGO caseworker implementation

An NGO caseworker runs the DueCare runtime on their own laptop or workstation. They paste, type, or summarize a situation, and the system produces a cited draft grounded in the relevant signed knowledge packs. No case content leaves the device.

Where it runs
The caseworker's own laptop or workstation (local GPU recommended).
What's exposed
Nothing leaves the device.
What it returns
A cited draft for the caseworker to review.
Input: Caseworker UI on device. Caseworker types or summarizes a situation. No upload.
Runtime · on device: DueCare local runtime. Model + guidance + vetted packs. All inference is local.
Output: Cited draft for caseworker. A draft answer with public-source citations + pack version stamp. Caseworker decides.

Inside the runtime (caseworker's device)

Model: Gemma 4, local inference
Guidance: Persona, GREP rules, RAG
Packs: Signed corridor & jurisdiction packs
Updates: Pack pulls only; no case push
No case content ever leaves the device. Pack updates are signed and flow one way: they are pulled onto the device, and nothing is pushed back.
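
A pulled pack should be rejected unless its signature verifies. The sketch below shows the check using only the Python standard library. It uses an HMAC-SHA256 tag as a stand-in for a signature; this is an assumption for illustration, not DueCare's scheme. A real deployment would more plausibly use an asymmetric signature so that the device holds only a public key. The pack fields and key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_pack(pack_bytes: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the raw pack bytes (stand-in for a signature)."""
    return hmac.new(key, pack_bytes, hashlib.sha256).hexdigest()


def load_pack_if_valid(pack_bytes: bytes, signature_hex: str, key: bytes) -> Optional[dict]:
    """Parse the pack only if the tag matches; otherwise return None."""
    expected = sign_pack(pack_bytes, key)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, signature_hex):
        return None
    return json.loads(pack_bytes)


key = b"demo-key-not-for-production"  # hypothetical key for the example only
pack_bytes = json.dumps({"pack_id": "corridor-demo", "version": "1.0.0"}).encode("utf-8")
good_signature = sign_pack(pack_bytes, key)

loaded = load_pack_if_valid(pack_bytes, good_signature, key)
tampered = load_pack_if_valid(pack_bytes + b" ", good_signature, key)
```

Note that the function only reads inbound pack bytes; nothing in this path sends data off the device.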

Stays on device

  • Anything the caseworker types
  • All drafts, edits, and decisions
  • All local logs

Never leaves

  • Worker / claimant identifying details
  • Free-text case content
  • Anything that could re-identify a person
03

NGO implementation with anonymized data sharing

An NGO that runs DueCare locally (as in deployment 02) can also opt in to contributing anonymized patterns, not cases, to a shared insights server. The local anonymization module is the gate; only k-anonymous, identifier-stripped signals can leave the device.

Where it runs
On-device runtime, plus an opt-in signal channel to a shared insights server.
What's exposed
Only k-anonymous, identifier-stripped signals; never case content.
What it returns
The same local draft as deployment 02; sharing is purely additive.
Input: Caseworker UI on device. Caseworker types the situation locally; nothing leaves the device unprompted.
Runtime · on device: DueCare local runtime. Same model + guidance + packs. Produces draft.
Output: Cited draft for caseworker. Local draft with rule citations. The caseworker decides what to do with it.
Opt-in anonymized side-channel
Same runtime · on device: DueCare local runtime. Identifies anonymized patterns (e.g. fee anomaly type X in jurisdiction Y).
Local Gemma 4 anonymization · on device: Local anonymization module. Anonymizes sensitive PII before submission, enforces the k-anonymity floor, and drops anything that fails.
Off-device: Shared insights server. Aggregated, anonymized signals only. No raw cases.
Opt-in. Only signals that pass the local anonymization module ever leave the device.
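
The k-anonymity floor can be illustrated with a simple count threshold: a (pattern type, jurisdiction) bucket is shareable only if at least k local cases fall into it. This is a simplified sketch of the idea, not the anonymization module itself; the value of k and the bucket labels are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Bucket = Tuple[str, str]  # (pattern type, jurisdiction); no free text, no precise location

K_FLOOR = 5  # hypothetical floor; the page does not state the real value


def shareable_signals(signals: Iterable[Bucket], k: int = K_FLOOR) -> Dict[Bucket, int]:
    """Count signals per bucket and drop every bucket with fewer than k entries."""
    counts = Counter(signals)
    return {bucket: n for bucket, n in counts.items() if n >= k}


local_signals = (
    [("fee_anomaly_X", "jurisdiction_Y")] * 6
    + [("fee_anomaly_Z", "jurisdiction_Y")] * 2  # below the floor, so it never leaves
)
outgoing = shareable_signals(local_signals)
```

Because the filter runs on the device, a bucket that fails the floor is dropped before anything is transmitted.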

What may leave (opt-in only)

  • Anonymized pattern type (no free text)
  • Jurisdiction-level location (no precise location)
  • K-anonymous bucket counts

Never leaves, even with opt-in

  • Worker / claimant names or IDs
  • Employer or recruiter names
  • Free-text case descriptions
  • Anything that fails the k-anonymity floor
04

Chatbot endpoint

This deployment is for partners that want to expose DueCare through an existing chat surface (a regulator hotline web widget, a partner-operated messenger bot, or a labour-rights helpline). The endpoint is hosted on a GPU-enabled server; the channel adapter handles the chat surface. The system still drafts; the partner still decides.

Where it runs
A partner-hosted endpoint with a channel adapter (e.g. WhatsApp, web).
What's exposed
The hosted endpoint sees inbound questions; raw case intake is not accepted.
What it returns
A drafted response; the partner still decides what to send.
External channel: Partner chat surface. Web widget, messenger app, or partner-operated chatbot.
Adapter · partner side: Channel adapter. Normalizes message in/out. Strips channel-specific identifiers before forwarding.
Runtime · partner-hosted GPU: DueCare endpoint. Same runtime as 01–03. Drafts cited replies. Returns through adapter.

Endpoint posture

Auth: Partner-issued tokens · per-channel
Logs: Hashed inputs · drop after retention window
Rate: Per-channel quotas · partner controls
Refusal: Out-of-scope & no-pack cases handled explicitly
Endpoint refuses raw case intake. Free-text identifiers are stripped at the adapter.
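
A minimal sketch of the adapter and logging behaviour described above, under stated assumptions: the message shape, the two regular expressions, and the salted SHA-256 log record are hypothetical. Pattern-based stripping like this catches only obvious identifiers and should not be read as a complete PII filter.

```python
import hashlib
import re

# Hypothetical patterns for identifiers typed into free text.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def normalize_inbound(channel_message: dict) -> dict:
    """Keep only channel and text; drop sender IDs and mask identifiers in the text."""
    text = EMAIL.sub("[email]", channel_message["text"])
    text = PHONE.sub("[phone]", text)
    return {"channel": channel_message["channel"], "text": text}


def log_record(normalized: dict, salt: bytes) -> dict:
    """Log a salted hash of the input rather than the input itself."""
    digest = hashlib.sha256(salt + normalized["text"].encode("utf-8")).hexdigest()
    return {"channel": normalized["channel"], "input_sha256": digest}


inbound = {
    "channel": "web",
    "sender_id": "u-123",  # channel-specific identifier; must not be forwarded
    "text": "Call me at +1 555 010 9999 or a@b.org about my contract.",
}
normalized = normalize_inbound(inbound)
record = log_record(normalized, salt=b"per-deployment-salt")
```

Retention-window deletion is not shown; it would apply to the hashed records as well.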

Handled at endpoint

  • Cited drafts based on vetted packs
  • Out-of-scope refusals with reason
  • Pack-version stamps on every reply

Endpoint will not do

  • File complaints, send messages, or call services
  • Store raw user messages past retention
  • Answer without a relevant vetted pack
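
The two lists above imply a simple contract for each reply: either a draft stamped with a pack version, or an explicit refusal with a reason. One way to sketch it, with hypothetical names throughout (`Reply`, `PACKS`, the topic keys) and a placeholder where the model call would go:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reply:
    kind: str                           # "draft" or "refusal"
    text: str
    pack_version: Optional[str] = None  # set on every draft
    reason: Optional[str] = None        # set on every refusal


# Hypothetical index of vetted packs: topic -> (pack id, version).
PACKS = {"recruitment_fees": ("fees-pack", "2.1.0")}


def draft_reply(topic: Optional[str], question: str) -> Reply:
    """Draft only when a vetted pack covers the topic; otherwise refuse with a reason."""
    if topic is None:
        return Reply("refusal", "This question is outside what this service covers.",
                     reason="out_of_scope")
    if topic not in PACKS:
        return Reply("refusal", "No vetted pack covers this topic yet.",
                     reason="no_pack")
    pack_id, version = PACKS[topic]
    # Placeholder for the model call that would produce the cited draft.
    draft_text = f"[draft citing {pack_id}] {question}"
    return Reply("draft", draft_text, pack_version=f"{pack_id}@{version}")


covered = draft_reply("recruitment_fees", "Is this fee allowed?")
uncovered = draft_reply("visa_renewal", "How do I renew?")
off_topic = draft_reply(None, "What's the weather?")
```

The function only returns text; it has no code path that sends a message, files a complaint, or calls a service.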

Cost-per-token (estimates) for platform safety implementations

An order-of-magnitude comparison of the per-million-token cost of using a hosted commercial LLM API versus running DueCare's self-hosted Gemma 4 runtime for a moderation-style workload. The numbers are placeholders pending our own benchmark; see the caveat below.

Option | Input cost | Output cost | Privacy posture | Latency profile
Frontier hosted API (e.g. top-tier commercial chat model) | ~$3–15 / 1M tokens | ~$10–60 / 1M tokens | Data leaves customer perimeter unless contractual carve-outs | Network round-trip · subject to provider
Mid-tier hosted API (e.g. general-purpose hosted SLM) | ~$0.50–2 / 1M tokens | ~$1.50–8 / 1M tokens | Varies by tier and contract | Network round-trip · usually faster
DueCare · self-hosted Gemma 4 (customer / partner GPU · in-VPC) | ~$0.05–0.30 / 1M tokens (amortized) | ~$0.05–0.30 / 1M tokens (amortized) | Stays in customer / partner environment | Local inference · no external round-trip
DueCare · on-device (caseworker laptop / workstation) | Marginal cost ≈ device electricity per request | Marginal cost ≈ device electricity per request | No network egress for inference | Bound by device capability
Why this matters for platform safety: moderation workloads are high-throughput. At platform-scale token volumes, the gap between a hosted API and a self-hosted small model compounds quickly, and an in-VPC deployment removes the need for the contractual privacy carve-outs that hosted APIs require for sensitive content streams.

Caveat: these ranges are illustrative, not authoritative. Hosted API prices change frequently and vary by tier, region, and contract. Self-hosted amortized cost depends heavily on GPU choice, batch size, utilization, and electricity. We will publish a benchmark with specific assumptions, request shapes, and measured throughput alongside the v1 release, and this section will be updated to reflect it. Treat the numbers above as orientation, not quotation.
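
To see how the gap compounds with volume, the placeholder ranges above can be applied to a workload. The monthly volume below (1,000M input tokens, 100M output tokens) is a hypothetical figure chosen for the arithmetic, not a measured one, and the results inherit every caveat attached to the ranges.

```python
from typing import Dict, Tuple

# Placeholder ranges from the comparison above, in USD per 1M tokens: (low, high).
RANGES_USD_PER_M: Dict[str, Dict[str, Tuple[float, float]]] = {
    "frontier_hosted": {"in": (3.0, 15.0), "out": (10.0, 60.0)},
    "mid_tier_hosted": {"in": (0.50, 2.0), "out": (1.50, 8.0)},
    "self_hosted_gemma": {"in": (0.05, 0.30), "out": (0.05, 0.30)},
}


def monthly_cost_range(option: str, input_m: float, output_m: float) -> Tuple[float, float]:
    """Return the (low, high) monthly cost in USD for volumes given in millions of tokens."""
    r = RANGES_USD_PER_M[option]
    low = input_m * r["in"][0] + output_m * r["out"][0]
    high = input_m * r["in"][1] + output_m * r["out"][1]
    return low, high


# Hypothetical workload: 1,000M input tokens and 100M output tokens per month.
for option in RANGES_USD_PER_M:
    low, high = monthly_cost_range(option, input_m=1000, output_m=100)
    print(f"{option}: ${low:,.0f} to ${high:,.0f} per month")
```

Under these assumptions the frontier range works out to roughly $4,000 to $21,000 per month, against roughly $55 to $330 for the self-hosted range.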

How the model is tuned (and when it's not)

For most deployments, no fine-tuning is required: the base Gemma 4 model is steered by the safety guidance layer (persona, GREP rules, RAG over vetted packs). Fine-tuning is reserved for cases where the safety guidance layer cannot reach acceptable behavior on its own, and it is always followed by the full quality testing framework.

Input · public only: Curated public material. Laws, advisories, judgments; passed through the local anonymization module before training.
Dataset: Tuning corpus. Versioned. Public-source only. Reviewed before release.
Module: Fine-tuning module. Reads the curated corpus, applies the tuning recipe, and emits a candidate model artifact. Targeted updates · LoRA-style adapters or full fine-tune depending on goal.
Gate: Quality testing framework. Behavioural + regression evals · LLM-judge spot-checks · refusal coverage.
Artifact: Signed model artifact. Versioned. Pinned to the pack range it was tested against. Distributable.
No worker case data is ever used for tuning. Tuning corpora are public-source only.
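
The gate step in this pipeline can be sketched as a function that releases a candidate only when every check passes, and pins the release to the pack range it was tested against. The metric names, thresholds, and manifest fields are assumptions for illustration; signing and the LLM-judge spot-checks are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EvalReport:
    """Hypothetical summary produced by the quality testing framework."""
    behavioural_pass_rate: float  # fraction of behavioural evals passed
    regression_failures: int      # count of previously passing cases that now fail
    refusal_coverage: float       # fraction of must-refuse prompts correctly refused


@dataclass(frozen=True)
class ReleaseManifest:
    model_version: str
    pack_range: Tuple[str, str]   # (lowest, highest) pack version tested against


def gate(report: EvalReport, model_version: str, pack_range: Tuple[str, str],
         min_pass: float = 0.95, min_refusal: float = 0.95) -> Optional[ReleaseManifest]:
    """Return a manifest only if every check passes; thresholds are hypothetical."""
    passed = (
        report.behavioural_pass_rate >= min_pass
        and report.regression_failures == 0
        and report.refusal_coverage >= min_refusal
    )
    return ReleaseManifest(model_version, pack_range) if passed else None


released = gate(EvalReport(0.98, 0, 0.97), "tuned-0.1.0", ("1.0.0", "1.4.0"))
blocked = gate(EvalReport(0.99, 2, 0.99), "tuned-0.1.1", ("1.0.0", "1.4.0"))
```

A single regression is enough to block a candidate here, which matches the posture of treating fine-tuning as the last lever.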

▸ Default posture: no fine-tuning. The safety guidance layer is the first lever; fine-tuning is the last.