Model-choice rationale

Why DueCare runs on Gemma 4.

DueCare is shared infrastructure that has to run cheaply at NGOs, support worker-controlled and on-device-oriented deployments, behave well in many corridors and many languages, and be inspectable end-to-end. Gemma 4 lets us target all four bars in one stack.

Open weights

Yes

Apache-2.0-style terms; redistributable.

Inference cost

Low

Designed for frequent NGO, platform, and notebook runs.

Native tool calling

Built-in

Structured tool-call format from base.

On-device path

Targeted

Quantized small-model builds are the mobile packaging path.

01 · The capability gap

Why a general model alone falls short on this domain.

A useful rule of thumb for where AI gets dramatically better, fast.

capability spike ≈ verifiability × training attention × data coverage × economic value

Migrant worker exploitation scores low on every factor. Outcomes are hard to verify. Training corpora rarely emphasize labour law or recruitment-fee scams. Public data is thin and scattered across regulators, NGOs, and court filings. The economic incentive to optimize a frontier model for this domain is small relative to coding or search.

Stock models, no matter how large, give plausible-sounding answers that miss the relevant statute, underestimate corridor risk, or paraphrase a recruiter euphemism instead of flagging it.

Until inherent capability arrives, the gap has to be closed by structure: deterministic grep rules that fire before the model speaks, retrieval against vetted corridor packs, tool calls that ground claims in verified sources, and a harness ecosystem that can refuse, narrow, anonymize, evaluate, or train around an answer when needed. Gemma 4 makes that structure practical: it ships with native tool-calling and supports local, edge, and on-device-oriented deployment paths.
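The "rules fire before the model speaks" ordering can be sketched as follows. This is a minimal illustration, not DueCare's rule set: the rule IDs, regex patterns, and the `call_model` callback are all hypothetical names introduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GrepRule:
    rule_id: str         # hypothetical identifier
    pattern: re.Pattern  # compiled, case-insensitive regex
    note: str            # short human-readable explanation


# Hypothetical rules for illustration only; real packs would carry vetted ones.
RULES = [
    GrepRule("fee-deduction",
             re.compile(r"\b(placement|processing)\s+fee\b", re.I),
             "Mentions a recruitment-related fee."),
    GrepRule("passport-hold",
             re.compile(r"\b(keep|hold|surrender)\w*\s+(your\s+)?passport\b", re.I),
             "Mentions holding a worker's passport."),
]


def run_grep_rules(text: str) -> list[dict]:
    """Return one hit per matching rule, with no inference call involved."""
    hits = []
    for rule in RULES:
        match = rule.pattern.search(text)
        if match:
            hits.append({"rule_id": rule.rule_id,
                         "match": match.group(0),
                         "note": rule.note})
    return hits


def answer(text: str, call_model):
    """Run the deterministic rules first, then hand text and hits to the model."""
    hits = run_grep_rules(text)
    return call_model(text, hits)
```

Because the rules are plain regular expressions, their hits are reproducible and can be passed to the model as grounded context rather than left for it to infer.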

02 · Six reasons

Why this family, specifically.

Each reason maps to something concrete a partner or worker actually feels.

Reason 01

Open weights keep sensitive work local

Open weights mean an NGO can run DueCare on its own server, and a worker/mobile deployment can target device-local inference where the selected quantized build is available. With Gemma 4, raw casework can stay inside worker-controlled, NGO-controlled, or tenant-controlled environments, and sensitive PII can be anonymized locally before any optional submission.
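A minimal sketch of the "anonymize locally before any optional submission" step, assuming simple regex redaction. The two patterns and the `anonymize` helper are hypothetical and far narrower than real PII detection would need to be.

```python
import re

# Illustrative patterns only; real PII handling needs much broader coverage
# (names, addresses, ID numbers, locale-specific phone formats, and so on).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def anonymize(text: str) -> str:
    """Replace each matched span with a bracketed label, entirely on-device."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(anonymize("Call +63 917 555 0100 or mail ana@example.org today"))
```

The point of the sketch is the ordering: redaction runs in the controlled environment, so only the redacted text is ever a candidate for submission.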

Reason 02

Efficient inference makes the math work

Gemma 4 gives DueCare a practical open-weights inference path at the sizes the project uses. Most users of DueCare are NGOs and labour-ministry teams; their tooling budget is small. Efficient local inference is what makes “run the relevant harnesses on every job post” or “answer every worker question with a cited draft” plausible in real deployments.

Reason 03

Native tool calling, not bolt-on

Gemma 4 emits a structured tool-call format out of the base model, and the harness uses it directly. That is how grep-rule outputs, knowledge-pack lookups, and license-registry checks get composed into a single answer, and it removes a whole category of brittle prompt engineering between the model and the tools.
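One way a harness might consume a structured tool call is sketched below. It assumes the call has already been extracted as a JSON object with `name` and `arguments` keys; that shape, the `lookup_license` tool, and its registry data are assumptions for illustration, not Gemma 4's documented wire format or DueCare's registry.

```python
import json


def lookup_license(agency: str) -> dict:
    """Hypothetical stand-in for a license-registry check."""
    registry = {"Example Agency": "active"}  # illustrative data only
    return {"agency": agency, "status": registry.get(agency, "not found")}


# Only tools named here can ever be executed.
ALLOWED_TOOLS = {"lookup_license": lookup_license}


def dispatch(tool_call_json: str) -> dict:
    """Parse one structured tool call and run it if it is allow-listed."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allow-listed: {name}")
    return ALLOWED_TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "lookup_license", '
                   '"arguments": {"agency": "Example Agency"}}'))
```

With a structured call there is no free-text parsing step: the harness either finds an allow-listed name and well-formed arguments or rejects the call.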

Reason 04

Open weights are fine-tunable

The harness can pair base packs for general behaviour with per-corridor adapters for the language and idioms a corridor actually uses. A closed API would force every deployment to carry the same broad context. With Gemma 4, the training and adapter path can stay aligned to the corridor and deployment environment.

Reason 05

On-device is the worker chat target

The small quantized model path is what makes worker-controlled mobile packaging plausible: a worker can receive cited guidance through a trusted app or partner channel without sending raw chats to the public hub. The current Kaggle and local runtimes demonstrate the same harness and sensitive-data handling path while mobile packaging is integrated.

Reason 06

Inspectable end-to-end

Researchers can reproduce our evals against the same weights we ran. They can replay an audit feed, pull the same pack version, run it through the same model, and see whether the answer matches. With a closed API, “reproducible” degrades to “snapshot of someone else’s output”.
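The replay check described above could look roughly like this. It is a sketch under stated assumptions: the audit record fields (`pack_sha256`, `answer_sha256`) and the JSON-based fingerprint are hypothetical, and an exact match also presumes the replay uses the same weights with deterministic decoding settings.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Stable SHA-256 over a JSON-serializable object (keys sorted)."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def matches_audit(record: dict, pack: dict, replayed_answer: str) -> bool:
    """True when both the pack and the replayed answer match the audit record."""
    return (record["pack_sha256"] == fingerprint(pack)
            and record["answer_sha256"] == fingerprint(replayed_answer))


if __name__ == "__main__":
    pack = {"id": "corridor-x", "version": "1.0"}  # hypothetical pack metadata
    record = {"pack_sha256": fingerprint(pack),
              "answer_sha256": fingerprint("Cited draft.")}
    print(matches_audit(record, pack, "Cited draft."))
```

Hashing the pack as well as the answer separates two failure modes: a reviewer who pulled a different pack version, and a model run that diverged on the same inputs.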

03 · Versus alternatives

What we considered.

Each row is something DueCare needs. The Gemma 4 column is what we get; the other columns show what we’d give up.

What we need	Gemma 4	Closed frontier API	Older small open model
Open weights, redistributable	Yes	No	Often yes
On-device-oriented deployment	Small quantized path	N/A (cloud-only)	Sometimes; quality drops
Inference cost at NGO scale	Efficient local path	Recurring API cost	Cheap, but capability gap
Native tool-call format	Built-in	Yes (proprietary)	Bolt-on, brittle
Fine-tune per corridor	LoRA adapters; supported	Limited; closed pipeline	Yes, but starts further behind
Multilingual coverage of corridors	Strong on target languages	Strong, opaque	Spotty; needs heavy adapter work
Reviewer can reproduce outputs	Yes (same weights)	Snapshots only	Yes
Vendor-lock risk	None	High	None

04 · What DueCare adds on top

The base model is a starting point, not the product.

Gemma 4 is the substrate. The harness, knowledge packs, tools, evals, and corridor adapters are what make it useful.

Layer 01

Knowledge packs

Signed corridor knowledge with public-source citations. The model never claims a fact a pack can’t cite.
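The "never claims a fact a pack can’t cite" rule can be pictured as a filter over draft claims. The sketch below assumes a hypothetical structure in which each claim carries a `source_id` and each pack exposes a `sources` mapping; it does not cover signature verification.

```python
def enforce_citations(claims: list[dict], pack: dict) -> tuple[list[dict], list[dict]]:
    """Split claims into those the pack can cite and those it cannot."""
    kept, dropped = [], []
    for claim in claims:
        # A claim survives only if its source_id exists in the pack.
        if claim.get("source_id") in pack["sources"]:
            kept.append(claim)
        else:
            dropped.append(claim)
    return kept, dropped


if __name__ == "__main__":
    pack = {"sources": {"src-1": "Example public regulation"}}  # illustrative
    claims = [
        {"text": "Claim backed by the pack.", "source_id": "src-1"},
        {"text": "Claim with no citation."},
    ]
    kept, dropped = enforce_citations(claims, pack)
    print(len(kept), "kept;", len(dropped), "dropped")
```

Returning the dropped claims, rather than silently discarding them, leaves room for the harness to refuse, narrow, or log the answer.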

Layer 02

Grep rules

Cheap deterministic detectors that run before the model. Catch obvious patterns without an inference call.

Layer 03

Tools registry

Allow-listed lookups (license registers, embassy contacts) called via Gemma 4’s native tool-call format.

Layer 04

Corridor adapters

Small LoRA adapters per corridor + sector, layered on top of the same base weights.

Layer 05

Evaluation suite

Reproducible evaluation packs: reviewers run the harness against pinned pack versions and replay outputs.

Layer 06

Harness ecosystem

Refusal logic, citation enforcement, sensitive-data handling, judging, training, and audit emission. Gemma 4 doesn’t do this on its own; the harnesses do.

05 · Gemma 4 features demonstrated

Three reference kernels that exercise the Gemma 4 capabilities the rubric names.

Each is a zero-inference kernel under kaggle/A-2N-*/ that lands the canonical 4-file v1.0 bundle via duecare.appendix_primitives.write_v1_bundle(). Reviewers can run all three in under three minutes with no GPU.

A-21

Long context (128K)

Five-statute compliance corpus loaded into a single Gemma 4 context window. Three cross-statute QA pairs each correlate 2–3 statutes in one thinking step. See kernel →

A-22

Token streaming

Server-Sent Events replay at real Gemma 4 E4B-IT latencies (~500 ms first token, ~25 ms subsequent). Live first-token + token-rate stats. See kernel →
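A replay of this kind can be approximated with a small generator. This is a sketch, not the A-22 kernel: the `sse_replay` name, the injectable `sleep` parameter, and the `[DONE]` sentinel are assumptions, while the default delays mirror the approximate latencies quoted above. Tokens are assumed to contain no newlines, since a newline would end the SSE `data:` field.

```python
import time
from typing import Callable, Iterable, Iterator


def sse_replay(tokens: Iterable[str],
               first_token_s: float = 0.5,    # ~500 ms before the first token
               next_token_s: float = 0.025,   # ~25 ms between later tokens
               sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield pre-recorded tokens as Server-Sent Events frames with replayed delays."""
    for i, token in enumerate(tokens):
        sleep(first_token_s if i == 0 else next_token_s)
        yield f"data: {token}\n\n"   # one SSE event per token
    yield "data: [DONE]\n\n"         # hypothetical end-of-stream sentinel


if __name__ == "__main__":
    for frame in sse_replay(["Hel", "lo", "!"]):
        print(frame, end="")
```

Passing `sleep` as a parameter keeps the replay testable without waiting on real time, which fits a zero-inference kernel.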

A-23

Native function calling

One Gemma 4 thinking step emits a 3–4-tool plan; runtime fans out; one synthesized response returns. ~3× speedup vs the chat-loop equivalent. See kernel →
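The fan-out step can be illustrated with `asyncio.gather`. In this sketch the tool names and delays are hypothetical and `asyncio.sleep` stands in for real lookups; it shows the concurrency pattern only and does not reproduce the ~3× figure.

```python
import asyncio


async def run_tool(name: str, delay_s: float) -> str:
    """Stand-in for one tool lookup; the sleep simulates I/O latency."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def fan_out(plan: list[tuple[str, float]]) -> list[str]:
    """Run every tool in the plan concurrently; results keep the plan's order."""
    return await asyncio.gather(*(run_tool(name, delay) for name, delay in plan))


if __name__ == "__main__":
    # Hypothetical three-tool plan emitted by a single model step.
    plan = [("license_check", 0.05), ("pack_lookup", 0.05), ("embassy_contact", 0.05)]
    print(asyncio.run(fan_out(plan)))
```

Run this way, total wall time tracks the slowest tool rather than the sum of all tools, and only one synthesis call is needed once the results return.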

06 · What this means in practice

Run DueCare on your laptop, in a notebook, on an edge box, or through a mobile-oriented deployment.

The same harness ecosystem, the same packs, the same audit feed, on infrastructure you control.

Setup guides →