Every data source the JIL platform ingests, by LOB and refresh cadence.
JIL is a verification network. The integrity of every CREB™ we seal traces back to the data we ingested, when we ingested it, where the source is, and which line of business consumed it. This page lists every source: federal public datasets (no contract required), commercial subscriptions (when the customer engagement requires depth a public source cannot reach), and customer-supplied records (under BAA / GLBA basis). Replay-grade transparency, not vague positioning.
Legend: LIVE = ingested in production. WIRED = code in place; pending DUA, key, or live data. PENDING = not yet implemented. PAID = commercial subscription required.
Why so many WIRED rows? Three reasons, all by design. (1) Several public portals (USCIS, DOJ, UN Comtrade) refuse anonymous bulk pulls; the integration is coded and a synthetic backstop runs the POC, but full ingest waits on a DUA or paid endpoint. (2) Commercial subscriptions (Bloomberg, Refinitiv, Plaid, IRS IVES, ATTOM, Etherscan Pro) are engagement-funded. We do not pay speculatively. The integration is built so a customer can flip a key on Day 1 of an engagement. (3) Customer-owned tools (Chainalysis KYT, TRM Labs) federate via webhook. Customers ride their own subscriptions; we never proxy. WIRED is a credibility signal, not a gap.
2026-05-04 update: NHTSA FARS (60,762 vehicle-crash rows, FY2022 National) and BLS SOII (1,848 NAICS x year x state injury rows) flipped to LIVE; pc-poc and wc-poc are now backed by real public-data feeds.
2026-05-03 update: Pulled 8 federal/free sources to LIVE - SEC EDGAR (32,971 filings across 10 institutions), TreasuryDirect (493 active securities), FFIEC bulk-data index, FINRA BrokerCheck + disciplinary index, CFTC press-release scrape, CMS PECOS provider enrollment + ownership (10 datasets, 2,000 row sample) - ~35,700 records seeded. Plus a Pre-Clearance Stage 1 address-intelligence sweep: 6 sources merged into a single in-memory lookup (33,547 unique labeled addresses across sanctioned / DPRK / mixer / scam / DeFi / bridge / exchange categories, loaded by the preclearance service at boot). All anonymously pullable, no key required. See the pre-clearance architecture doc for how Stage 1 uses these.
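The Stage 1 merge described above (several labeled-address feeds folded into one in-memory lookup) can be sketched roughly as follows. This is a minimal illustration, not the preclearance service's code: the function names, the feed names, the sample addresses, and the rule of lowercasing only `0x`-prefixed addresses are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(address: str) -> str:
    """Lowercase 0x-prefixed hex addresses; leave other formats untouched."""
    cleaned = address.strip()
    # Assumption: only 0x-style addresses are safe to case-fold. Other chains
    # may use case-sensitive encodings, so those are kept as published.
    return cleaned.lower() if cleaned[:2].lower() == "0x" else cleaned


def build_lookup(feeds: dict[str, list[tuple[str, str]]]) -> dict[str, set[str]]:
    """Merge (address, category) rows from every feed into address -> categories."""
    lookup: dict[str, set[str]] = defaultdict(set)
    for rows in feeds.values():
        for address, category in rows:
            lookup[normalize(address)].add(category)
    return dict(lookup)


# Hypothetical sample rows; not real labeled addresses.
SAMPLE_FEEDS = {
    "sanctions_mirror": [("0xAbC0000000000000000000000000000000000001", "sanctioned")],
    "label_cloud": [
        ("0xabc0000000000000000000000000000000000001", "mixer"),
        ("0xdef0000000000000000000000000000000000002", "exchange"),
    ],
}
LOOKUP = build_lookup(SAMPLE_FEEDS)
```

Because an address that appears in more than one feed collapses to a single key carrying several categories, the unique-address count can be lower than the sum of the per-source counts.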
Public, free, no contract required.
Every row below pulls live from a federal data publisher. No subscription, no DUA, no per-record licensing. JIL ingests, hashes the source file for replay, indexes into Postgres, and runs the LOB-specific check pack. These are the sources behind the eight live POC pages and the CMS attestation backbone.
| Source | Provider | Refresh | Format | LOB(s) | Status | Live POC |
|---|---|---|---|---|---|---|
| Fails-to-Deliver Register | SEC FOIA | Monthly (a/b half-files) | pipe-delimited | capmarkets | LIVE | capmarkets-poc (339K rows) |
| USAspending.gov API | Treasury / OMB | Real-time / daily | JSON REST | grants, federal-investigator | LIVE | grants-poc (1K awards / $2.96T) |
| DOL OFLC LCA Disclosure | DOL ETA | Quarterly | XLSX | h1b | LIVE | h1b-poc (337K real LCAs) |
| UN Comtrade API | UN Statistics Division | Annual / Quarterly | JSON REST | trade-finance | WIRED | trade-finance-poc (rate-limited; synthetic backstop) |
| USCIS Regional Centers | USCIS | Ad-hoc | HTML / PDF | eb5, federal-investigator | WIRED | eb5-poc (anon-blocked; synthetic backstop) |
| USCIS Data Hub processing-times | USCIS | Quarterly | JSON REST | eb5 | WIRED | eb5-poc |
| NHTSA FARS (Fatality Analysis Reporting System) | NHTSA | Annual | CSV (zipped) | pc | LIVE | pc-poc (60,762 vehicle-crash rows, FY2022 National) |
| BLS Occupational Injuries (SOII) | Bureau of Labor Statistics | Annual | CSV / API | wc | LIVE | wc-poc (1,848 NAICS x year x state injury rows) |
| CMS Medicare Inpatient by Provider+Service | CMS | Annual | CSV | MCO, federal-investigator | LIVE | ava-poc (145K rows / $90.94B) |
| CMS Outpatient by Provider+APC | CMS | Annual | CSV | MCO | LIVE | ava-poc (117K rows) |
| CMS DMEPOS by Referring Provider | CMS | Annual | CSV | MCO | LIVE | ava-poc (498K rows) |
| CMS Part D Prescriber by Drug | CMS | Annual | CSV | MCO | LIVE | ava-poc (476K rows) |
| CMS Provider of Services (POS) file | CMS | Quarterly | CSV | MCO, federal-investigator | LIVE | ava-poc (44K rows) |
| CMS Hospice utilization | CMS | Annual | CSV | MCO | LIVE | ava-poc (5,772 rows) |
| NPPES (NPI Registry) | CMS | Weekly | CSV bulk | MCO, all KYC | LIVE · ~9.37M providers | ava-poc |
| CERT FY2024 detector library | CMS | Annual | internal seed | MCO, federal-investigator | LIVE | ava-poc |
| CMS Owners file (regional centers, ownership) | CMS | Quarterly | data.cms.gov API | MCO | LIVE · 5 ownership datasets seeded 2026-05-03 | UBO graph |
| PECOS (Provider Enrollment Chain & Ownership) | CMS | Quarterly | data.cms.gov API | MCO, federal-investigator | LIVE · 5 enrollment datasets seeded 2026-05-03 | UBO graph |
| MAC jurisdiction map | CMS | Quarterly | internal seed | MCO | LIVE | ava-poc |
| Etherscan public API (token transfers) | Etherscan | Block-level (~12s) | JSON REST | p2p, wallet-intel | LIVE | p2p-poc (1K USDC transfers) |
| SEC EDGAR (filings) | SEC | Real-time | JSON / XBRL | capmarkets, asset-intel | LIVE · 32,971 filings across 10 institutions seeded 2026-05-03 | capmarkets-poc |
| TreasuryDirect auctioned securities | US Treasury | Real-time | JSON REST | capmarkets | LIVE · 493 active Bills/Bonds/TIPS/FRN seeded 2026-05-03 | capmarkets-poc |
| FFIEC bulk-data download (call reports) | FFIEC | Quarterly | CSV bulk | capmarkets | LIVE · index seeded 2026-05-03; full bulk pull on customer engagement | capmarkets-poc |
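The "hashes the source file for replay" step in this section's introduction could look something like the sketch below. SHA-256, the chunked read, and the record's field names are assumptions chosen for illustration; the page does not state which hash or schema JIL uses.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large bulk downloads fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_record(path: Path, source: str) -> dict:
    """Build a hypothetical ingest record to store next to the indexed rows."""
    return {
        "source": source,
        "file_name": path.name,
        "sha256": sha256_file(path),
        "size_bytes": path.stat().st_size,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing the file exactly as downloaded, before any parsing, means a later replay can confirm it starts from the same bytes.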
Cross-vertical compliance feeds.
These feed every LOB. Identity, sanctions, exclusions, beneficial-ownership lookups. Most are free; OpenCorporates carries a free tier for low volume and a paid tier for entity-resolution at scale.
| Source | Provider | Refresh | Type | LOB(s) | Status |
|---|---|---|---|---|---|
| OFAC SDN List | Treasury OFAC | Daily | Public free | all KYC, p2p, trade-finance | LIVE · ~37.9K + ~720 crypto addresses |
| UN Consolidated Sanctions | UN Security Council | Daily | Public free | all KYC | LIVE · ~1K entries |
| HMT (UK) Consolidated List | HM Treasury UK | Daily | Public free | all KYC | LIVE · ~39.5K |
| US Consolidated Screening List (CSL) | Commerce + State + Treasury (trade.gov) | Daily | Public free | all KYC, all vendor | LIVE · ~25.6K |
| EU Consolidated Financial Sanctions | EU Council (FSF/FSD direct) | Daily | Public free | all KYC | LIVE · ~6K (direct + OpenSanctions federation) |
| OpenSanctions / Yente | OpenSanctions | Daily | Public free | all KYC | LIVE · ~74.6K |
| FATF High-Risk + Monitored Jurisdictions | FATF | Triannual | Public free | all KYC, p2p, trade-finance | LIVE · ~20 jurisdictions |
| OIG LEIE (excluded individuals) | HHS OIG | Monthly | Public free | MCO, federal-investigator | LIVE · ~83K |
| SAM.gov exclusions | GSA | Daily | API-keyed (approved 2026-05-11; key valid 72 days; daily expiry alerter on Hetzner cron) | grants, federal-investigator, all vendor | LIVE · 110,000 of 167,456 records in Postgres (66% coverage; daily incremental catches up); /entity-information/v4/exclusions; uplift to Snowflake in flight |
| SAM.gov entity registration (UEI, CAGE, registration status) | GSA | On-demand lookup | API-keyed (same key as above) | all vendor, federal-investigator, grants | LIVE · /entity-information/v3/entities verified 2026-05-11; uplift to Snowflake in flight |
| CMS PECOS Provider Enrollment | CMS | Monthly | Public free | MCO, federal-investigator | LIVE · loader seeded 2026-05-03 (Hospital, Hospice, SNF, HHA, FFS) |
| Treasury DNP (Do Not Pay) | US Treasury BFS | Real-time | Authorized only | federal-investigator, grants | BLOCKED · needs gov customer DUA |
| GLEIF LEI Registry | GLEIF | Daily | Public free | all institutional | LIVE · ~3.3M LEIs |
| FinCEN BOI Reporting (when published) | FinCEN | Real-time | Public free | all KYB | PENDING |
| FINRA BrokerCheck (individual + firm) | FINRA | Real-time | Public free JSON | capmarkets | LIVE · BrokerCheck API seeded 2026-05-03 (50 individual records across 5 surnames) |
| FINRA disciplinary database | FINRA | Real-time | Public free | capmarkets | LIVE · index seeded 2026-05-03; per-case scraper on engagement |
| CFTC enforcement database | CFTC | Real-time | HTML scrape (Drupal 10 migration killed RSS) | capmarkets, trade-finance | LIVE · 37 press-release URLs seeded 2026-05-03 |
| DOJ enforcement / qui tam relator records | DOJ | Real-time | Public free | federal-investigator, MCO | WIRED |
| OpenCorporates (entity registry) | OpenCorporates | Real-time API | Free + paid tier | eb5, all KYB | WIRED |
| RDAP domain age + WHOIS | ICANN / registrars | Real-time | Public free | all BEC | LIVE |
| OFAC SDN crypto address mirror (multi-chain) | 0xB10C mirror | Daily (mirror updates) | Public free | pre-clearance Stage 1 | LIVE · 121 addresses across ETH/BSC/BCH/XMR/LTC/ZEC/DASH |
| ScamSniffer community phishing/scam database | ScamSniffer | Real-time (community-reported) | Public free | pre-clearance Stage 1 | LIVE · 2,530 EVM addresses |
| MEW Ethereum darklist (mixer / phishing / fraud) | MyEtherWallet | Real-time (community-curated) | Public free | pre-clearance Stage 1 | LIVE · 715 curated entries |
| Etherscan label cloud (CEX / DEX / Bridge / Mixer) | brianleect mirror | Periodic scrape | Public free | pre-clearance Stage 1 | LIVE · 29,945 labeled addresses |
| DefiLlama protocols (DeFi protocol contract registry) | DefiLlama | Real-time API | Public free | pre-clearance Stage 1 | LIVE · 1,811 on-chain protocol contracts (7,429 protocols total) |
| DPRK / Lazarus Group attribution seed | Curated from public OFAC + Chainalysis + TRM + Elliptic incident reports | On incident publication | Curated seed | pre-clearance Stage 1 | LIVE · 6 publicly-attributed Lazarus wallets (Ronin, Atomic, Stake.com, CoinEx, Alphapo) |
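The SAM.gov exclusions row quotes a coverage percentage and a 72-day key lifetime with a daily expiry alerter. The arithmetic behind both can be sketched as below. This is not the actual cron job: the function names and the 14-day warning window are hypothetical, and counting the 72 days from the approval date is an assumption.

```python
from datetime import date, timedelta

KEY_VALID_DAYS = 72  # key lifetime quoted in the SAM.gov exclusions row


def coverage_pct(loaded: int, total: int) -> float:
    """Share of upstream records already loaded, as a percentage."""
    return 100.0 * loaded / total


def days_until_expiry(approved: date, today: date, valid_days: int = KEY_VALID_DAYS) -> int:
    """Days left before the key lapses (assumes validity runs from approval)."""
    return (approved + timedelta(days=valid_days) - today).days


def should_alert(approved: date, today: date, warn_within_days: int = 14) -> bool:
    """True once the key is inside the (hypothetical) warning window."""
    return days_until_expiry(approved, today) <= warn_within_days
```

With the figures from the table, `coverage_pct(110_000, 167_456)` rounds to the 66% shown in the row.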
Paid feeds for engagement-grade depth.
Tier 2 of the JIL economic model brings these in on a per-engagement basis. We do not carry the subscription cost as a fixed overhead; the customer engagement either funds the data path or chooses a public-data-only Tier 1 baseline. Every paid feed below has a public-data fallback or is optional for the verticals that consume it.
| Source | Provider | Refresh | Cost band | LOB(s) | Status |
|---|---|---|---|---|---|
| Bloomberg Terminal data | Bloomberg | Real-time | $$ | capmarkets, asset-intel | PENDING · engagement-funded |
| Refinitiv (LSEG) market reference | LSEG | Real-time | $$ | capmarkets | PENDING · engagement-funded |
| Chainalysis KYT / Reactor | Chainalysis | Real-time | $$ | wallet-intel, p2p | PENDING · customer rides their own |
| TRM Labs | TRM Labs | Real-time | $$ | wallet-intel, p2p | PENDING · customer rides their own |
| ATTOM Property + Address Intelligence | ATTOM Data | Daily | $ | MCO, pc | WIRED |
| Etherscan Pro (higher rate limit) | Etherscan | Real-time | $ | p2p, wallet-intel | WIRED · using free tier today |
| Helius RPC + DAS API | Helius | Real-time | $ | wallet-intel, p2p | WIRED |
| Plaid (banking data) | Plaid | Real-time | $ | Money Passport | PENDING |
| IRS 4506-C IVES | IRS | On-demand | $ per request | Money Passport | PENDING · IVES participant approval |
| MCG Care Guidelines (clinical criteria) | Hearst Health | Annual (versioned) | $$ | UM, MCO | PENDING · engagement-funded license |
| InterQual (clinical criteria) | Optum / Change Healthcare | Annual (versioned) | $$ | UM, MCO | PENDING · engagement-funded license |
Under BAA, GLBA, or comparable basis.
Customer-supplied records never leave the customer's perimeter. Verdict-engine ingestion runs inside the customer's tenant or against a read-only adapter on the customer's side. JIL receives only the signed verdict record and case-file artifacts, not the underlying data.
Settlement records
Trade records, SWIFT 5xx messages, FIX, ISO 20022 sese. Custodian / broker / fund-admin sources. Real-time stream when paid engagement is active.
Position files
Daily position records from each system that should agree (custodian, broker, fund-admin, CSD). Cross-system reconciliation runs against this set.
Bank wire records
Outbound wire instructions intercepted before release. Sub-2-second YES / NO / REVIEW gate.
MCO claim records
Provider claim files, encounter records, prior-authorization decisions. PHI; under BAA. Tier 2 claim integrity work.
UM determinations & appeals
Authorization, concurrent-review, and denial decisions plus appeal and IRO outcomes, ingested via X12 278 / 837 / 835, FHIR, HL7v2, and NCPDP from the plan's own auth, claims, and appeals systems. PHI; under BAA. This is the determination record that the verdict engine anchors to the applied criteria and seals as court-ready evidence.
H-1B beneficiary documents
Sponsor-supplied labor condition files, payroll attestations. Optional Tier 2 deepening.
Workers' comp claims
Carrier-supplied claim event records, medical bills, employer records. Tier 2 only.
P&C claim files
Carrier-supplied claim event records, repair estimates, photos, telematics. Tier 2 only.
Trade finance documents
Letters of credit, bills of lading, customs declarations. Bank-supplied under BAA-equivalent for cross-border ops.
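To illustrate the cross-system reconciliation mentioned under position files, here is a minimal sketch that flags instruments whose quantity differs, or is missing, across the systems that should agree. The data shape, the system names, and the instrument identifiers are hypothetical; a real adapter would run inside the customer's perimeter as described above.

```python
from decimal import Decimal
from typing import Optional

Book = dict[str, Decimal]  # instrument id -> quantity held


def reconcile(positions: dict[str, Book]) -> dict[str, dict[str, Optional[Decimal]]]:
    """Return each instrument whose quantity is not identical in every system."""
    instruments: set[str] = set()
    for book in positions.values():
        instruments.update(book)

    breaks: dict[str, dict[str, Optional[Decimal]]] = {}
    for instrument in sorted(instruments):
        # None marks a system that does not report the instrument at all.
        seen = {system: book.get(instrument) for system, book in positions.items()}
        if len(set(seen.values())) > 1:
            breaks[instrument] = seen
    return breaks


# Hypothetical daily position files from three systems.
SAMPLE_POSITIONS = {
    "custodian": {"ISIN-A": Decimal("100"), "ISIN-B": Decimal("50")},
    "broker": {"ISIN-A": Decimal("100"), "ISIN-B": Decimal("45")},
    "fund_admin": {"ISIN-A": Decimal("100")},
}
BREAKS = reconcile(SAMPLE_POSITIONS)
```

Quantities are held as `Decimal` rather than `float` so that equal positions compare equal without rounding noise.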
How fresh the verdict is, by source class.
Seconds to minutes
Etherscan, EDGAR, OFAC SDN delta, USAspending API, OpenSanctions, OpenCorporates API, GLEIF, RDAP. Latency from publication to JIL findings: seconds to minutes.
Standard cron pulls
OFAC SDN full refresh, HMT UK, EU, NPPES delta, SAM.gov exclusions, ATTOM.
Tuesday-night cron
NPPES bulk, OIG LEIE delta, sanctions consolidation.
Calendar-month rollover
SEC fails-to-deliver register (a/b half-files), OIG LEIE full, MAC jurisdiction.
Calendar-quarter rollover
DOL OFLC LCA disclosure, USCIS Data Hub processing-times, CMS POS file, PECOS, FFIEC bank financials.
Calendar-year rollover
NHTSA FARS, BLS SOII, CMS Inpatient / Outpatient / DMEPOS / Part D / Hospice / SNF, CERT detector library. Lag of 6-18 months from end of year.
Which sources each LOB consumes.
capmarkets
SEC FTD, EDGAR, FFIEC, FINRA, CFTC, GLEIF + customer settlement records. Optional Tier 2: Bloomberg / Refinitiv.
grants
USAspending.gov, SAM.gov exclusions, OFAC SDN, GLEIF, OpenCorporates + customer-supplied awardee records.
h1b
DOL OFLC LCA, USCIS, OFAC, GLEIF, OpenCorporates + sponsor-supplied wage records.
eb5
USCIS Data Hub, USCIS regional centers, SEC EDGAR, OFAC, OpenCorporates + investor source-of-funds documentation.
p2p
Etherscan, OFAC SDN crypto-address attribution, OpenSanctions + customer transaction records. Optional: Chainalysis / TRM.
trade-finance
UN Comtrade, OFAC, GLEIF + bank-supplied trade documents.
pc
NHTSA FARS (60,762 vehicle-crash rows) + carrier-supplied claim records. Optional ATTOM for premise.
wc
BLS SOII (1,848 NAICS x year x state injury rows), NPPES (medical providers) + carrier-supplied claim records.
MCO - Medicare / Medicaid
Full CMS stack (Inpatient, Outpatient, DMEPOS, Part D, POS, NPPES, OIG LEIE, MAC, CERT) + customer claim records under BAA.
Every CREB™ carries the source manifest.
Each CREB™-anchored finding embeds a reproducibility manifest that lists the exact source-file hash, ingest timestamp, code version, and signal threshold used. A regulator, auditor, or counterparty can replay the analysis bit-identically using the same federal source file plus the manifest. The data-source pages above are indexed by the same manifest fields.
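As a rough illustration of the replay check, the sketch below models the four manifest fields named above and reports which ones differ between the recorded manifest and a replay attempt. The class, the field names, and the comparison rule are assumptions for the example, not the CREB™ schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Manifest:
    source_sha256: str       # hash of the exact source file
    ingested_at: str         # ingest timestamp (ISO 8601 assumed)
    code_version: str        # version of the analysis code
    signal_threshold: float  # threshold the finding was produced with


def replay_mismatches(recorded: Manifest, observed: Manifest) -> list[str]:
    """Names of fields that would prevent a bit-identical replay."""
    # ingested_at describes the original pull, so it is not compared here.
    return [
        field.name
        for field in fields(Manifest)
        if field.name != "ingested_at"
        and getattr(recorded, field.name) != getattr(observed, field.name)
    ]
```

An empty list means the replay uses the same source bytes, code version, and threshold as the sealed finding; any other result names the field to investigate.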