Public-source acquisition

Public-source scraping. Proposals, never automatic truth.

The research monitor watches a curated list of public sources for changes, drafts proposed pack diffs from what it finds, and routes them to the curator review queue. Crawlers never publish; curators do.

01 · Source list

What the monitor watches.

Public, citable URLs only. No private channels, no scraping behind login walls, no third-party aggregators that strip provenance.

Government

Labour ministries & consulates

Source-country and host-country labour-ministry pages, embassy advisories, official wage-protection notices.

Inter-governmental

ILO, UN, IOM

Public reports, corridor advisories, regional bilateral agreements. Cited by document id.

NGO

Verified advisory orgs

Allow-listed NGO publications with public, dated advisories. Allow-list is curator-controlled.

Court & policy

Court & policy archives

Public court rulings, policy documents, gazettes affecting recruitment or labour conditions.

Platform

Platform policy pages

Recruitment-platform terms-of-service and policy pages, used to detect platform-side rule changes.

News

News on corridor changes

News only as a discovery signal. The actual citation is always the underlying public document.
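The source categories above could be modelled as a small registry. This is a minimal sketch, not the monitor's actual data model: the `Source` type, the `citable` property, the `is_watchable` helper, and the example.org hosts are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Category(Enum):
    GOVERNMENT = "government"
    INTER_GOVERNMENTAL = "inter-governmental"
    NGO = "ngo"
    COURT_POLICY = "court-policy"
    PLATFORM = "platform"
    NEWS = "news"


@dataclass(frozen=True)
class Source:
    url: str
    category: Category

    @property
    def citable(self) -> bool:
        # News is a discovery signal only; the citation must be the
        # underlying public document, so news entries are never citable.
        return self.category is not Category.NEWS


def is_watchable(url: str, allow_list: list[Source]) -> bool:
    """True only if the URL's host matches the host of an allow-listed source."""
    host = urlparse(url).hostname
    return host is not None and any(
        urlparse(source.url).hostname == host for source in allow_list
    )
```

Matching on host alone is a simplification; a stricter check could also compare path prefixes.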

02 · Crawler flow

Discover, extract, propose.

The crawler runs on a schedule and emits proposals. Every proposal carries a source URL, content hash, and detected change.

01

Discovery

Pull each allow-listed source on schedule. Compute a content hash; compare against the last seen hash.
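A minimal sketch of the hash comparison, assuming SHA-256 over the raw response body (the page does not name the hash algorithm) and a hypothetical in-memory `last_seen` mapping. Fetching is left to the caller.

```python
import hashlib
from typing import Dict


def content_hash(body: bytes) -> str:
    """Hex digest of the fetched body. SHA-256 is an assumption."""
    return hashlib.sha256(body).hexdigest()


def check_source(url: str, body: bytes, last_seen: Dict[str, str]) -> bool:
    """Return True if the body differs from the last seen hash, then record the new hash.

    A source seen for the first time counts as changed (an assumption).
    """
    new_hash = content_hash(body)
    changed = last_seen.get(url) != new_hash
    last_seen[url] = new_hash
    return changed
```

Hashing raw bytes means cosmetic changes (whitespace, rotating banners) also trigger a diff; normalizing the body before hashing would be one way to reduce that noise.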

02

Change detection

If the hash changed, run a structural diff to summarize what moved (new section, removed clause, updated number).
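One way to sketch a structural diff, assuming each document version has already been parsed into a mapping of section heading to section text. The parsing step and the `added` / `updated` / `removed` labels are illustrative, not the monitor's actual vocabulary.

```python
from typing import Dict, List, Tuple


def structural_diff(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Summarize section-level changes as (kind, heading) pairs."""
    changes: List[Tuple[str, str]] = []
    for heading, text in new.items():
        if heading not in old:
            changes.append(("added", heading))
        elif text != old[heading]:
            changes.append(("updated", heading))
    for heading in old:
        if heading not in new:
            changes.append(("removed", heading))
    return changes


old_doc = {"Fees": "Maximum fee: 100", "Scope": "All sectors"}
new_doc = {"Fees": "Maximum fee: 120", "Appeals": "File within 30 days"}
print(structural_diff(old_doc, new_doc))
# [('updated', 'Fees'), ('added', 'Appeals'), ('removed', 'Scope')]
```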

03

Fact extraction

The harness extracts dated, citable facts from the diff with a strict schema: claim, source URL, source hash, date.
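The four-field schema could be enforced with a validating record type. This is a sketch under assumptions: the field names follow the list above, but the specific validation rules (absolute http(s) URL, 64-character hex digest) are illustrative choices, not the harness's documented behaviour.

```python
import datetime
from dataclasses import dataclass
from urllib.parse import urlparse

_HEX_DIGITS = set("0123456789abcdef")


@dataclass(frozen=True)
class Fact:
    claim: str
    source_url: str
    source_hash: str
    date: datetime.date

    def __post_init__(self) -> None:
        if not self.claim.strip():
            raise ValueError("claim must not be empty")
        parsed = urlparse(self.source_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("source_url must be an absolute http(s) URL")
        # Assumes a SHA-256 hex digest; adjust if another hash is used.
        if len(self.source_hash) != 64 or not set(self.source_hash) <= _HEX_DIGITS:
            raise ValueError("source_hash must be a 64-character lowercase hex digest")
        if not isinstance(self.date, datetime.date):
            raise ValueError("date must be a datetime.date")
```

Rejecting at construction time means a malformed fact never reaches the later gates.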

04

Privacy & policy gate

Reject if the document or extracted facts contain PII or anything resembling a private case. Only public, structural facts pass.
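A deliberately small sketch of such a gate. The two patterns (email addresses and international-format phone numbers) are illustrative assumptions only; a real PII check would need far broader coverage and human review of edge cases.

```python
import re
from typing import Iterable

# Illustrative patterns only, not a complete PII detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+\d[\d\s().-]{7,}\d")


def passes_privacy_gate(texts: Iterable[str]) -> bool:
    """Return False as soon as any text matches a PII-like pattern."""
    for text in texts:
        if EMAIL.search(text) or PHONE.search(text):
            return False
    return True
```

The gate fails closed on a match: one hit in the document or in any extracted fact rejects the whole item rather than redacting it.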

05

Pack-diff proposal

Facts that pass the gate become a draft pack diff against the relevant corridor pack. Status: proposed.
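A proposal record might look like the following sketch. The source URL, content hash, and detected change come from the description of what every proposal carries; the `corridor_pack` identifier, the `facts` list, and the JSON serialization are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Proposal:
    corridor_pack: str        # hypothetical pack identifier
    source_url: str
    content_hash: str
    detected_change: str
    facts: List[str] = field(default_factory=list)
    status: str = "proposed"  # every draft starts as proposed


def to_json(proposal: Proposal) -> str:
    """Serialize a proposal for the review queue (wire format is an assumption)."""
    return json.dumps(asdict(proposal), sort_keys=True)
```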

06

Curator review

A curator reviews the diff and approves it, requests revisions, or rejects it. No automatic publish path exists.
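The review step can be sketched as a small state machine in which only a curator may move a proposal. The status names match the queue filters; the transition table, including the choice to send a revision request back to `proposed`, is an assumption.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    IN_REVIEW = "in-review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Assumed transitions; a revision request returns the proposal to PROPOSED.
ALLOWED = {
    Status.PROPOSED: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.APPROVED, Status.REJECTED, Status.PROPOSED},
    Status.APPROVED: set(),
    Status.REJECTED: set(),
}


def transition(current: Status, target: Status, actor_is_curator: bool) -> Status:
    """Move a proposal to a new status, refusing non-curators and illegal jumps."""
    if not actor_is_curator:
        raise PermissionError("only curators may change proposal status")
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Because the crawler never acts as a curator, there is no code path here by which it can reach `approved`.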

07

Regression run

If approved, the new pack version runs through the test suite before publication.
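A sketch of the publication gate, assuming the test suite can be expressed as named checks over a pack. The `Pack` shape and the two example checks are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Pack = Dict[str, Any]
Check = Callable[[Pack], bool]


def run_regression(pack: Pack, checks: Dict[str, Check]) -> Tuple[bool, List[str]]:
    """Run every check; the pack may be published only if none fail."""
    failures = [name for name, check in checks.items() if not check(pack)]
    return (not failures, failures)


# Hypothetical checks for illustration.
EXAMPLE_CHECKS: Dict[str, Check] = {
    "has_version": lambda pack: "version" in pack,
    "facts_have_sources": lambda pack: all(
        fact.get("source_url") for fact in pack.get("facts", [])
    ),
}
```

Returning the names of failed checks alongside the verdict gives the curator something concrete to act on when a pack is held back.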

03 · Endpoints

What partners can call.

Partners with a partner key can submit proposals directly; anyone can read the queue and review status.

| Method | Path | Description | Auth |
| --- | --- | --- | --- |
| POST | /api/hub/opencrawl/updates | Submit a public-source URL with a proposed change summary. | Partner key |
| GET | /api/hub/opencrawl/updates | Read the queue with status filters: proposed, in-review, approved, rejected. | Public |
| GET | /api/hub/opencrawl/sources | Read the allow-listed source list and last-seen hashes. | Public |
| GET | /api/hub/opencrawl/diff/<id> | Read the human-readable diff for a single proposal. | Public |
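A client for these endpoints might build requests along the following lines. Only the methods and paths come from the table; the base URL, the JSON field names, the `status` query parameter, and the `Authorization: Bearer` header are assumptions, since the page does not specify the request format. The sketch constructs the requests without sending them.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://hub.example.org"  # hypothetical host


def build_submit_request(source_url: str, change_summary: str, partner_key: str) -> Request:
    """POST a proposal. Field names and auth header are assumptions."""
    body = json.dumps(
        {"source_url": source_url, "change_summary": change_summary}
    ).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/hub/opencrawl/updates",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {partner_key}",
        },
    )


def build_queue_request(status: Optional[str] = None) -> Request:
    """GET the public queue, optionally filtered by status (parameter name assumed)."""
    query = f"?{urlencode({'status': status})}" if status else ""
    return Request(f"{BASE_URL}/api/hub/opencrawl/updates{query}", method="GET")
```

Passing either request to `urllib.request.urlopen` would send it; the public read endpoints need no key.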