Safety rules · DueCare AI

01 · How a single turn works

Paste → rules fire → context injected → grounded answer.

Every turn passes through the same four stages. The diagram below is annotated against a labeled composite worker message.

01 · Paste

User input arrives locally

A worker, caseworker, or moderator pastes raw text into the local DueCare runtime. Nothing leaves the device.

Recruiter: pay 120,000 NPR before flight. We keep your passport until you sign. Job is construction in Doha, 6-day weeks.

→ stored locally, never transmitted

02 · Fire

Grep rules classify in <1 ms

Pattern rules + small classifiers tag the message. Each tag is a citation handle for the next step.

fee_request · matched “120,000 NPR before flight”

rule R-104 · confidence 0.97

passport_handover · matched “keep your passport”

rule R-211 · confidence 0.99

corridor:qa-np · matched “Doha … flight”

rule R-061 · context
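The Fire stage can be sketched as a plain regex pass over the pasted text. This is a minimal illustration, not the shipped rule engine: the `Rule` class, the `fire` function, and both regex patterns are assumptions written only to match the composite message above, and confidence scoring and corridor detection are left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    tag: str
    pattern: re.Pattern


# Hypothetical patterns; the real rule files may use different expressions.
RULES = [
    Rule("R-104", "fee_request",
         re.compile(r"\d[\d,]*\s*NPR before flight", re.IGNORECASE)),
    Rule("R-211", "passport_handover",
         re.compile(r"keep your passport", re.IGNORECASE)),
]


def fire(text):
    """Return (rule_id, tag, matched_text) for every rule that matches."""
    hits = []
    for rule in RULES:
        m = rule.pattern.search(text)
        if m:
            hits.append((rule.rule_id, rule.tag, m.group(0)))
    return hits


message = ("Recruiter: pay 120,000 NPR before flight. We keep your passport "
           "until you sign. Job is construction in Doha, 6-day weeks.")
hits = fire(message)
for rule_id, tag, matched in hits:
    print(f"{tag} · matched “{matched}” · rule {rule_id}")
```

Each hit carries the rule ID and the matched span, which is what lets the next stage treat a tag as a citation handle.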

03 · Inject

Grounded context attached

The runtime pulls vetted corridor pack snippets matching the fired tags, and constrains the model to cite them.

# injected into system prompt
corridor: qa-np v2026.05.06
rules_fired: [fee_request, passport_handover]
cite_from: [pack/qa-np §3.2, ILO C181]
refuse_if: "legal counsel"
draft_only: true
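One way to picture the Inject stage is a small function that turns fired tags into the context block shown above. This is a sketch under assumptions: `build_context` and `CITE_INDEX` are hypothetical names, and the tag-to-citation mapping is invented here purely to reproduce the example; the real runtime resolves citations from the corridor pack.

```python
# Hypothetical lookup from rule tag to vetted citation handles.
CITE_INDEX = {
    "fee_request": ["pack/qa-np §3.2", "ILO C181"],
    "passport_handover": ["pack/qa-np §3.2", "ILO C181"],
}


def build_context(corridor, pack_version, fired_tags, cite_index):
    """Assemble the text block injected into the system prompt."""
    # Corridor tags supply context only; they are not listed as fired rules.
    rules = [t for t in fired_tags if not t.startswith("corridor:")]

    # Collect citations in first-seen order, without duplicates.
    cites = []
    for tag in rules:
        for cite in cite_index.get(tag, []):
            if cite not in cites:
                cites.append(cite)

    lines = [
        "# injected into system prompt",
        f"corridor: {corridor} v{pack_version}",
        f"rules_fired: [{', '.join(rules)}]",
        f"cite_from: [{', '.join(cites)}]",
        'refuse_if: "legal counsel"',
        "draft_only: true",
    ]
    return "\n".join(lines)


context = build_context(
    "qa-np",
    "2026.05.06",
    ["fee_request", "passport_handover", "corridor:qa-np"],
    CITE_INDEX,
)
print(context)
```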

04 · Answer

Gemma 4 responds, grounded

The model writes a draft using the injected context. Citations are required; refusal phrases are honored.

DueCare draft · grounded

Two patterns in this message look like recruitment violations under your corridor pack: worker-paid fees and passport retention. Both are prohibited under the Qatar–Nepal corridor rules.

↗ pack/qa-np §3.2 ↗ ILO C181

Draft only. DueCare drafts; the user or trusted caseworker decides.
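The "citations are required" constraint can be checked after generation. The sketch below assumes citations appear on one line, each prefixed with `↗` as in the draft above; `extract_citations` and `check_draft` are hypothetical helpers, not part of a documented DueCare API.

```python
def extract_citations(citation_line):
    """Split a line such as '↗ pack/qa-np §3.2 ↗ ILO C181' into handles."""
    return [part.strip() for part in citation_line.split("↗") if part.strip()]


def check_draft(citation_line, cite_from):
    """A draft passes only if it cites something and nothing outside cite_from."""
    cited = extract_citations(citation_line)
    unknown = [c for c in cited if c not in cite_from]
    return {"ok": bool(cited) and not unknown, "cited": cited, "unknown": unknown}


allowed = ["pack/qa-np §3.2", "ILO C181"]
result = check_draft("↗ pack/qa-np §3.2 ↗ ILO C181", allowed)
print(result["ok"], result["cited"])
```

A draft with no citations, or with a citation outside the injected `cite_from` list, would fail this check and could be regenerated or withheld.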

02 · Rule categories

Multiple families. 100+ rules.

Rules are versioned alongside the corridor packs they cite. Each rule has a public ID, a deterministic match, and a required citation.

Recruitment 56 rules

Fees, contracts, identity documents.

Catches the most common pre-departure violations: worker-paid fees, contract substitution, passport retention, identity-document handover.

R-104 · fee_request

R-128 · contract_substitution

R-211 · passport_handover

R-219 · id_card_retention

R-235 · deceptive_role_label

Workplace 43 rules

Wage, hours, conditions.

In-country patterns: unpaid overtime, withheld wages, hours beyond the legal cap, denial of a rest day, dormitory conditions.

R-302 · withheld_wages

R-318 · overtime_unpaid

R-322 · hours_above_cap

R-340 · rest_day_denied

R-355 · dorm_conditions

Coercion 33 rules

Patterns indicating force or threat.

Highest-priority signals. When one fires, the runtime widens citation scope and biases output toward referral language; it never auto-actions.

R-401 · movement_restriction

R-414 · debt_bondage_signal

R-422 · threat_of_deportation

R-431 · communications_blocked

Anti-jailbreak 29 rules

Keep the model on topic and honest.

Refuse legal-counsel framing, reject auto-report requests, and keep generation within the cited pack scope.

J-501 · legal_counsel_refusal

J-512 · auto_report_refusal

J-520 · out_of_pack_refusal

J-538 · role_impersonation

03 · Anatomy of a rule

Every rule is one auditable file.

Rules are TOML files in the public repo. Each file holds the pattern, scope, citations, confidence threshold, and required model behavior, all in one place.

rules/recruitment/R-211.passport_handover.toml

# R-211 · passport_handover
# Catches recruiter or employer language indicating the worker
# must surrender their passport. Prohibited under most receiving
# jurisdictions and ILO C181.

id = "R-211"
family = "recruitment"
version = "2026.05.06"

match.any = [
  "keep your passport",
  "hold (your |the )?passport",
  "hand over (your )?passport",
  "surrender passport",
]
match.confidence_floor = 0.85

scope = ["corridor:*"]   # applies to all corridors
cite_from = [
  "pack/<corridor>.recruitment",
  "ilo/C181",
]

required_in_answer = [
  "draft_only",
  "caseworker_handoff_option",
]
refuses = [
  "legal_counsel",
  "auto_report",
]

Sensitive data handling

The grep layer is local, always.

No grep rule fires on the hub. No matched text, classification result, or confidence score is ever sent upstream. The only thing that can optionally be submitted is an anonymized aggregate tuple (pattern_id × corridor × sector), and only after local Gemma 4 anonymization and the server-side PII detector.

The grep layer never sends matched text to the public hub.

× Matched message text crosses the boundary

× Per-user confidence scores are stored centrally

× Rules are pulled from a private hub registry

✓ Rules ship with the public open-source release

✓ Tuples cross only at k-anonymity ≥ 30
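The k-anonymity threshold amounts to a count-and-suppress step over the aggregate tuples. This is a sketch under assumptions: it takes "k" to be the number of reports sharing an identical (pattern_id, corridor, sector) tuple, and `releasable` and `K_ANON_MIN` are hypothetical names. Where the count is actually computed is not stated in the source.

```python
from collections import Counter

K_ANON_MIN = 30  # threshold taken from the checklist above


def releasable(tuples, k=K_ANON_MIN):
    """Keep only tuples observed at least k times; suppress the rest."""
    counts = Counter(tuples)
    return {t: n for t, n in counts.items() if n >= k}


# Illustrative data only: 30 reports of one pattern, 29 of another.
reports = (
    [("R-211", "qa-np", "construction")] * 30
    + [("R-104", "qa-np", "construction")] * 29
)
released = releasable(reports)
print(released)
```

With this data only the R-211 tuple clears the threshold; the R-104 tuple stays suppressed until one more matching report exists.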

Rules fire before the model speaks.