01 - Platform - data sources

Every data source the JIL platform ingests, by LOB and refresh cadence.

JIL is a verification network. The integrity of every CREB™ we seal traces back to the data we ingested, when we ingested it, where the source lives, and which line of business (LOB) consumed it. This page lists every source in three classes: federal public datasets (no contract required), commercial subscriptions (used when a customer engagement requires depth a public source cannot reach), and customer-supplied records (under a BAA or GLBA basis). The goal is replay-grade transparency, not vague positioning.

54 total catalogued sources.
38 free public + sanctions feeds.
11 commercial subscription paths.
8 LOBs served on the same backbone.

Legend: LIVE = ingested in production. WIRED = code in place; pending DUA, key, or live data. PENDING = not yet implemented. PAID = commercial subscription required.

Why so many WIRED rows? Three reasons, all by design. (1) Several federal portals (USCIS, DOJ, UN Comtrade) refuse anonymous bulk pulls; the integration is coded and a synthetic backstop runs the POC, but full ingest waits on a DUA or paid endpoint. (2) Commercial subscriptions (Bloomberg, Refinitiv, Plaid, IRS IVES, ATTOM, Etherscan Pro) are engagement-funded. We do not pay speculatively. The integration is built so a customer can flip a key on Day 1 of an engagement. (3) Customer-owned tools (Chainalysis KYT, TRM Labs) federate via webhook. Customers ride their own subscriptions; we never proxy. WIRED is a credibility signal, not a gap.

2026-05-04 update: NHTSA FARS (60,762 vehicle-crash rows, FY2022 National) and BLS SOII (1,848 NAICS x year x state injury rows) flipped to LIVE; pc-poc and wc-poc are now backed by real public-data feeds.

2026-05-03 update: Pulled 8 federal/free sources to LIVE: SEC EDGAR (32,971 filings across 10 institutions), TreasuryDirect (493 active securities), FFIEC bulk-data index, FINRA BrokerCheck + disciplinary index, CFTC press-release scrape, and CMS PECOS provider enrollment + ownership (10 datasets, 2,000 row sample), for ~35,700 records seeded in total. The same update added a Pre-Clearance Stage 1 address-intelligence sweep: 6 sources merged into a single in-memory lookup (33,547 unique labeled addresses across sanctioned / DPRK / mixer / scam / DeFi / bridge / exchange categories, loaded by the preclearance service at boot). All are anonymously pullable, with no key required. See the pre-clearance architecture doc for how Stage 1 uses these.
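
The Stage 1 merge described in the 2026-05-03 update can be pictured as a dictionary keyed by address, with one set of category labels per address. The sketch below is illustrative only: `normalize`, `build_lookup`, and `labels_for` are hypothetical names, the sample addresses are placeholders, and lowercasing 0x-prefixed addresses is an assumption about how duplicates across feeds might be collapsed, not a description of the preclearance service.

```python
from typing import Dict, Iterable, Set, Tuple

Feed = Tuple[str, Iterable[str]]  # (category label, raw addresses)


def normalize(address: str) -> str:
    """Trim whitespace; lowercase 0x-prefixed hex addresses so casing variants collapse."""
    address = address.strip()
    if address[:2].lower() == "0x":
        return address.lower()
    return address  # other chains may use case-sensitive encodings; leave untouched


def build_lookup(feeds: Iterable[Feed]) -> Dict[str, Set[str]]:
    """Merge every feed into one address -> set-of-categories map."""
    lookup: Dict[str, Set[str]] = {}
    for category, addresses in feeds:
        for raw in addresses:
            lookup.setdefault(normalize(raw), set()).add(category)
    return lookup


def labels_for(lookup: Dict[str, Set[str]], address: str) -> Set[str]:
    """Return all categories attached to an address (empty set if unknown)."""
    return lookup.get(normalize(address), set())


if __name__ == "__main__":
    placeholder = "0x" + "AB" * 20  # placeholder, not a real wallet
    feeds = [
        ("sanctioned", [placeholder]),
        ("mixer", [placeholder.lower(), "0x" + "cd" * 20]),
    ]
    lookup = build_lookup(feeds)
    print(len(lookup), sorted(labels_for(lookup, placeholder)))
```

Keeping a set of labels per address, instead of a single label, means an address that appears in two feeds (for example both a sanctions mirror and a mixer list) keeps both attributions after the merge.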

02 - Public federal datasets

Public, free, no contract required.

Every row below pulls live from a federal data publisher. No subscription, no DUA, no per-record licensing. JIL ingests each source file, hashes it for replay, indexes it into Postgres, and runs the LOB-specific check pack. These are the sources behind the eight live POC pages and the CMS attestation backbone.
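
The "hash the source file for replay" step can be sketched as a chunked digest taken before any parsing happens. This is a minimal sketch under stated assumptions: SHA-256 is assumed (the page does not name the hash function), and `sha256_of_stream` and `ingest_record` are hypothetical helper names, not the platform's loader.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Union


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a source file in fixed-size chunks so bulk files never load fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def ingest_record(source_name: str, raw: bytes) -> Dict[str, Union[str, int]]:
    """Build the record stored alongside the indexed rows (field names are assumptions)."""
    return {
        "source": source_name,
        "sha256": sha256_of_stream(io.BytesIO(raw)),
        "size_bytes": len(raw),
    }


if __name__ == "__main__":
    sample = b"SETTLEMENT DATE|CUSIP|SYMBOL|QUANTITY\n"  # made-up header line
    print(ingest_record("sec-ftd", sample))
```

Hashing the raw bytes, not the parsed rows, keeps the digest independent of parser changes, so a later replay can confirm it started from the same file.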

Source | Provider | Refresh | Format | LOB(s) | Status | Live POC
Fails-to-Deliver Register | SEC FOIA | Monthly (a/b half-files) | pipe-delimited | capmarkets | LIVE | capmarkets-poc (339K rows)
USAspending.gov API | Treasury / OMB | Real-time / daily | JSON REST | grants, federal-investigator | LIVE | grants-poc (1K awards / $2.96T)
DOL OFLC LCA Disclosure | DOL ETA | Quarterly | XLSX | h1b | LIVE | h1b-poc (337K real LCAs)
UN Comtrade API | UN Statistics Division | Annual / Quarterly | JSON REST | trade-finance | WIRED | trade-finance-poc (rate-limited; synthetic backstop)
USCIS Regional Centers | USCIS | Ad-hoc | HTML / PDF | eb5, federal-investigator | WIRED | eb5-poc (anon-blocked; synthetic backstop)
USCIS Data Hub processing-times | USCIS | Quarterly | JSON REST | eb5 | WIRED | eb5-poc
NHTSA FARS (Fatality Analysis Reporting System) | NHTSA | Annual | CSV (zipped) | pc | LIVE | pc-poc (60,762 vehicle-crash rows, FY2022 National)
BLS Occupational Injuries (SOII) | Bureau of Labor Statistics | Annual | CSV / API | wc | LIVE | wc-poc (1,848 NAICS x year x state injury rows)
CMS Medicare Inpatient by Provider+Service | CMS | Annual | CSV | MCO, federal-investigator | LIVE | ava-poc (145K rows / $90.94B)
CMS Outpatient by Provider+APC | CMS | Annual | CSV | MCO | LIVE | ava-poc (117K rows)
CMS DMEPOS by Referring Provider | CMS | Annual | CSV | MCO | LIVE | ava-poc (498K rows)
CMS Part D Prescriber by Drug | CMS | Annual | CSV | MCO | LIVE | ava-poc (476K rows)
CMS Provider of Services (POS) file | CMS | Quarterly | CSV | MCO, federal-investigator | LIVE | ava-poc (44K rows)
CMS Hospice utilization | CMS | Annual | CSV | MCO | LIVE | ava-poc (5,772 rows)
NPPES (NPI Registry) | CMS | Weekly | CSV bulk | MCO, all KYC | LIVE · ~9.37M providers | ava-poc
CERT FY2024 detector library | CMS | Annual | internal seed | MCO, federal-investigator | LIVE | ava-poc
CMS Owners file (regional centers, ownership) | CMS | Quarterly | data.cms.gov API | MCO | LIVE · 5 ownership datasets seeded 2026-05-03 | UBO graph
PECOS (Provider Enrollment Chain & Ownership) | CMS | Quarterly | data.cms.gov API | MCO, federal-investigator | LIVE · 5 enrollment datasets seeded 2026-05-03 | UBO graph
MAC jurisdiction map | CMS | Quarterly | internal seed | MCO | LIVE | ava-poc
Etherscan public API (token transfers) | Etherscan | Block-level (~12s) | JSON REST | p2p, wallet-intel | LIVE | p2p-poc (1K USDC transfers)
SEC EDGAR (filings) | SEC | Real-time | JSON / XBRL | capmarkets, asset-intel | LIVE · 32,971 filings across 10 institutions seeded 2026-05-03 | capmarkets-poc
TreasuryDirect auctioned securities | US Treasury | Real-time | JSON REST | capmarkets | LIVE · 493 active Bills/Bonds/TIPS/FRN seeded 2026-05-03 | capmarkets-poc
FFIEC bulk-data download (call reports) | FFIEC | Quarterly | CSV bulk | capmarkets | LIVE · index seeded 2026-05-03; full bulk pull on customer engagement | capmarkets-poc

03 - Public sanctions, identity, exclusions

Cross-vertical compliance feeds.

These feeds serve every LOB: identity, sanctions, exclusions, and beneficial-ownership lookups. Most are free; OpenCorporates carries a free tier for low volume and a paid tier for entity resolution at scale.

Source | Provider | Refresh | Type | LOB(s) | Status
OFAC SDN List | Treasury OFAC | Daily | Public free | all KYC, p2p, trade-finance | LIVE · ~37.9K + ~720 crypto addresses
UN Consolidated Sanctions | UN Security Council | Daily | Public free | all KYC | LIVE · ~1K entries
HMT (UK) Consolidated List | HM Treasury UK | Daily | Public free | all KYC | LIVE · ~39.5K
US Consolidated Screening List (CSL) | Commerce + State + Treasury (trade.gov) | Daily | Public free | all KYC, all vendor | LIVE · ~25.6K
EU Consolidated Financial Sanctions | EU Council (FSF/FSD direct) | Daily | Public free | all KYC | LIVE · ~6K (direct + OpenSanctions federation)
OpenSanctions / Yente | OpenSanctions | Daily | Public free | all KYC | LIVE · ~74.6K
FATF High-Risk + Monitored Jurisdictions | FATF | Triannual | Public free | all KYC, p2p, trade-finance | LIVE · ~20 jurisdictions
OIG LEIE (excluded individuals) | HHS OIG | Monthly | Public free | MCO, federal-investigator | LIVE · ~83K
SAM.gov exclusions | GSA | Daily | API-keyed (approved 2026-05-11; key valid 72 days; daily expiry alerter on Hetzner cron) | grants, federal-investigator, all vendor | LIVE · 110,000 of 167,456 records in Postgres (66% coverage; daily incremental catches up); /entity-information/v4/exclusions; uplift to Snowflake in flight
SAM.gov entity registration (UEI, CAGE, registration status) | GSA | On-demand lookup | API-keyed (same key as above) | all vendor, federal-investigator, grants | LIVE · /entity-information/v3/entities verified 2026-05-11; uplift to Snowflake in flight
CMS PECOS Provider Enrollment | CMS | Monthly | Public free | MCO, federal-investigator | LIVE · loader seeded 2026-05-03 (Hospital, Hospice, SNF, HHA, FFS)
Treasury DNP (Do Not Pay) | US Treasury BFS | Real-time | Authorized only | federal-investigator, grants | BLOCKED · needs gov customer DUA
GLEIF LEI Registry | GLEIF | Daily | Public free | all institutional | LIVE · ~3.3M LEIs
FinCEN BOI Reporting (when published) | FinCEN | Real-time | Public free | all KYB | PENDING
FINRA BrokerCheck (individual + firm) | FINRA | Real-time | Public free JSON | capmarkets | LIVE · BrokerCheck API seeded 2026-05-03 (50 individual records across 5 surnames)
FINRA disciplinary database | FINRA | Real-time | Public free | capmarkets | LIVE · index seeded 2026-05-03; per-case scraper on engagement
CFTC enforcement database | CFTC | Real-time | HTML scrape (Drupal 10 migration killed RSS) | capmarkets, trade-finance | LIVE · 37 press-release URLs seeded 2026-05-03
DOJ enforcement / qui tam relator records | DOJ | Real-time | Public free | federal-investigator, MCO | WIRED
OpenCorporates (entity registry) | OpenCorporates | Real-time API | Free + paid tier | eb5, all KYB | WIRED
RDAP domain age + WHOIS | ICANN / registrars | Real-time | Public free | all BEC | LIVE
OFAC SDN crypto address mirror (multi-chain) | 0xB10C mirror | Daily (mirror updates) | Public free | pre-clearance Stage 1 | LIVE · 121 addresses across ETH/BSC/BCH/XMR/LTC/ZEC/DASH
ScamSniffer community phishing/scam database | ScamSniffer | Real-time (community-reported) | Public free | pre-clearance Stage 1 | LIVE · 2,530 EVM addresses
MEW Ethereum darklist (mixer / phishing / fraud) | MyEtherWallet | Real-time (community-curated) | Public free | pre-clearance Stage 1 | LIVE · 715 curated entries
Etherscan label cloud (CEX / DEX / Bridge / Mixer) | brianleect mirror | Periodic scrape | Public free | pre-clearance Stage 1 | LIVE · 29,945 labeled addresses
DefiLlama protocols (DeFi protocol contract registry) | DefiLlama | Real-time API | Public free | pre-clearance Stage 1 | LIVE · 1,811 on-chain protocol contracts (7,429 protocols total)
DPRK / Lazarus Group attribution seed | Curated from public OFAC + Chainalysis + TRM + Elliptic incident reports | On incident publication | Curated seed | pre-clearance Stage 1 | LIVE · 6 publicly-attributed Lazarus wallets (Ronin, Atomic, Stake.com, CoinEx, Alphapo)
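
The SAM.gov exclusions row carries two pieces of arithmetic: the coverage figure and the key expiry that the daily alerter watches. A minimal sketch, assuming the 72-day validity is counted from the 2026-05-11 approval date; the 14-day alert window and all function names are hypothetical and not taken from the platform.

```python
from datetime import date, timedelta

KEY_APPROVED = date(2026, 5, 11)   # from the SAM.gov exclusions row
KEY_VALID_DAYS = 72                # from the SAM.gov exclusions row
ALERT_WINDOW_DAYS = 14             # hypothetical: how early the alerter starts warning


def key_expiry(approved: date = KEY_APPROVED, valid_days: int = KEY_VALID_DAYS) -> date:
    """Date the API key stops working, assuming validity starts at approval."""
    return approved + timedelta(days=valid_days)


def should_alert(today: date, window_days: int = ALERT_WINDOW_DAYS) -> bool:
    """True once the key is inside the warning window (or already expired)."""
    return (key_expiry() - today).days <= window_days


def coverage_pct(loaded: int, total: int) -> float:
    """Share of the upstream record set already present locally, in percent."""
    return 100.0 * loaded / total


if __name__ == "__main__":
    print(key_expiry().isoformat())
    print(round(coverage_pct(110_000, 167_456)))  # the row reports 66% coverage
    print(should_alert(date(2026, 7, 10)))
```

Under that assumption the key would lapse on 2026-07-22, and 110,000 of 167,456 records rounds to the 66% shown in the table.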

04 - Commercial subscriptions

Paid feeds for engagement-grade depth.

Tier 2 of the JIL economic model brings these in on a per-engagement basis. We do not carry the subscription cost as a fixed overhead; the customer engagement either funds the data path or chooses a public-data-only Tier 1 baseline. Every paid feed below has a public-data fallback or is optional for the verticals that consume it.

Source | Provider | Refresh | Cost band | LOB(s) | Status
Bloomberg Terminal data | Bloomberg | Real-time | $$ | capmarkets, asset-intel | PENDING · engagement-funded
Refinitiv (LSEG) market reference | LSEG | Real-time | $$ | capmarkets | PENDING · engagement-funded
Chainalysis KYT / Reactor | Chainalysis | Real-time | $$ | wallet-intel, p2p | PENDING · customer rides their own
TRM Labs | TRM Labs | Real-time | $$ | wallet-intel, p2p | PENDING · customer rides their own
ATTOM Property + Address Intelligence | ATTOM Data | Daily | $ | MCO, pc | WIRED
Etherscan Pro (higher rate limit) | Etherscan | Real-time | $ | p2p, wallet-intel | WIRED · using free tier today
Helius RPC + DAS API | Helius | Real-time | $ | wallet-intel, p2p | WIRED
Plaid (banking data) | Plaid | Real-time | $ | Money Passport | PENDING
IRS 4506-C IVES | IRS | On-demand | $ per request | Money Passport | PENDING · IVES participant approval
MCG Care Guidelines (clinical criteria) | Hearst Health | Annual (versioned) | $$ | UM, MCO | PENDING · engagement-funded license
InterQual (clinical criteria) | Optum / Change Healthcare | Annual (versioned) | $$ | UM, MCO | PENDING · engagement-funded license

Why not subscribe to everything up-front? Each commercial feed is a fixed cost JIL would have to spread across customers regardless of whether their engagement actually exercises that path. Instead we run Tier 1 entirely on public-data feeds, surface findings, and only activate the relevant paid feeds at Tier 2 when the customer engagement specifically calls for them. This keeps the platform's gross-margin profile institutional-grade and aligned with the four-SKU pricing model (no contingency, no per-recovery percentage).

05 - Customer-supplied records

Under BAA, GLBA, or comparable basis.

Customer-supplied records never leave the customer's perimeter. Verdict-engine ingestion runs inside the customer's tenant or against a read-only adapter on the customer's side. JIL receives only the signed verdict record and case-file artifacts, not the underlying data.

capmarkets

Settlement records

Trade records, SWIFT 5xx messages, FIX, ISO 20022 sese. Custodian / broker / fund-admin sources. Real-time stream when paid engagement is active.

capmarkets

Position files

Daily position records from each system that should agree (custodian, broker, fund-admin, CSD). Cross-system reconciliation runs against this set.

all Pre-Settlement

Bank wire records

Outbound wire instructions intercepted before release. Sub-2-second YES / NO / REVIEW gate.

MCO

MCO claim records

Provider claim files, encounter records, prior-authorization decisions. PHI; under BAA. Tier 2 claim integrity work.

UM

UM determinations & appeals

Authorization, concurrent-review, and denial decisions plus appeal and IRO outcomes, ingested via X12 278 / 837 / 835, FHIR, HL7v2, and NCPDP from the plan's own auth, claims, and appeals systems. PHI; under BAA. The determination record the verdict engine anchors to the applied criteria and seals as court-ready evidence.

h1b

H-1B beneficiary documents

Sponsor-supplied labor condition files, payroll attestations. Optional Tier 2 deepening.

wc

Workers' comp claims

Carrier-supplied claim event records, medical bills, employer records. Tier 2 only.

pc

P&C claim files

Carrier-supplied claim event records, repair estimates, photos, telematics. Tier 2 only.

trade-finance

Trade finance documents

Letters of credit, bills of lading, customs declarations. Bank-supplied under BAA-equivalent for cross-border ops.
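
The cross-system reconciliation described under Position files can be sketched as a per-instrument comparison across the systems that should agree. This is an illustration under assumptions: the row shape `(system, instrument_id, quantity)`, the function name `find_breaks`, and the rule that a missing record counts as a break are all hypothetical, not the platform's check pack.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

PositionRow = Tuple[str, str, int]  # (system, instrument_id, quantity)


def find_breaks(
    rows: Iterable[PositionRow], systems: List[str]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Return instruments whose quantities differ, or are missing, across systems."""
    by_instrument: Dict[str, Dict[str, int]] = defaultdict(dict)
    for system, instrument, quantity in rows:
        by_instrument[instrument][system] = quantity

    breaks: Dict[str, Dict[str, Optional[int]]] = {}
    for instrument, per_system in by_instrument.items():
        # None marks a system that supplied no record for this instrument.
        view = {s: per_system.get(s) for s in systems}
        if len(set(view.values())) > 1:
            breaks[instrument] = view
    return breaks


if __name__ == "__main__":
    rows = [
        ("custodian", "AAA", 100), ("broker", "AAA", 100),
        ("custodian", "BBB", 50), ("broker", "BBB", 40),
    ]
    print(find_breaks(rows, ["custodian", "broker"]))
```

Returning the full per-system view for each break, not just a flag, leaves enough detail in the output to show which system disagrees.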

06 - Refresh cadence summary

How fresh the verdict is, by source class.

Real-time / block-level

Seconds to minutes

Etherscan, EDGAR, OFAC SDN delta, USAspending API, OpenSanctions, OpenCorporates API, GLEIF, RDAP. Latency from publication to JIL findings: seconds to minutes.

Daily

Standard cron pulls

OFAC SDN full refresh, HMT UK, EU, NPPES delta, SAM.gov exclusions, ATTOM.

Weekly

Tuesday-night cron

NPPES bulk, OIG LEIE delta, sanctions consolidation.

Monthly

Calendar-month rollover

SEC fails-to-deliver register (a/b half-files), OIG LEIE full, MAC jurisdiction.

Quarterly

Calendar-quarter rollover

DOL OFLC LCA disclosure, USCIS Data Hub processing-times, CMS POS file, PECOS, FFIEC bank financials.

Annual

Calendar-year rollover

NHTSA FARS, BLS SOII, CMS Inpatient / Outpatient / DMEPOS / Part D / Hospice / SNF, CERT detector library. Lag of 6-18 months from end of year.
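
One way to act on the cadence classes above is a staleness check that compares a source's last successful ingest against the tolerance for its class. The sketch below is an assumption-laden illustration: the tolerance values in `MAX_AGE` (including the 5-minute ceiling for "seconds to minutes") and the function name `is_stale` are hypothetical and not taken from the platform's scheduler.

```python
from datetime import datetime, timedelta

# Hypothetical tolerances, one per cadence class listed in this section.
MAX_AGE = {
    "real-time": timedelta(minutes=5),
    "daily": timedelta(days=1),
    "weekly": timedelta(days=7),
    "monthly": timedelta(days=31),
    "quarterly": timedelta(days=92),
    "annual": timedelta(days=366),
}


def is_stale(cadence: str, last_ingest: datetime, now: datetime) -> bool:
    """True when the last ingest is older than the tolerance for its cadence class."""
    return now - last_ingest > MAX_AGE[cadence]


if __name__ == "__main__":
    now = datetime(2026, 5, 12, 12, 0)
    last = datetime(2026, 5, 10, 12, 0)  # two days earlier
    print(is_stale("daily", last, now), is_stale("weekly", last, now))
```

Annual sources would need an additional allowance for the 6-18 month publication lag noted above, which this sketch does not model.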

07 - Per-LOB source breakdown

Which sources each LOB consumes.

LIVE

capmarkets

SEC FTD, EDGAR, FFIEC, FINRA, CFTC, GLEIF + customer settlement records. Optional Tier 2: Bloomberg / Refinitiv.

LIVE

grants

USAspending.gov, SAM.gov exclusions, OFAC SDN, GLEIF, OpenCorporates + customer-supplied awardee records.

LIVE

h1b

DOL OFLC LCA, USCIS, OFAC, GLEIF, OpenCorporates + sponsor-supplied wage records.

LIVE

eb5

USCIS Data Hub, USCIS regional centers, SEC EDGAR, OFAC, OpenCorporates + investor source-of-funds documentation.

LIVE

p2p

Etherscan, OFAC SDN crypto-address attribution, OpenSanctions + customer transaction records. Optional: Chainalysis / TRM.

LIVE

trade-finance

UN Comtrade, OFAC, GLEIF + bank-supplied trade documents.

LIVE

pc

NHTSA FARS (60,762 vehicle-crash rows) + carrier-supplied claim records. Optional ATTOM for premise.

LIVE

wc

BLS SOII (1,848 NAICS x year x state injury rows), NPPES (medical providers) + carrier-supplied claim records.

LIVE

MCO - Medicare / Medicaid

Full CMS stack (Inpatient, Outpatient, DMEPOS, Part D, POS, NPPES, OIG LEIE, MAC, CERT) + customer claim records under BAA.

08 - Replay and audit

Every CREB™ carries the source manifest.

Each CREB™-anchored finding embeds a reproducibility manifest that lists the exact source-file hash, ingest timestamp, code version, and signal threshold used. A regulator, auditor, or counterparty can replay the analysis bit-identically using the same federal source file plus the manifest. The data-source pages above are indexed by the same manifest fields.
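
The four manifest fields named above can be sketched as a small immutable record plus a pre-replay check. This is not the CREB™ manifest format (see the Sample CREB™ reference for that); the class `Manifest`, its field names, the use of SHA-256, and the function `can_replay` are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    source_sha256: str       # hash of the exact source file (SHA-256 assumed)
    ingest_timestamp: str    # when the file was ingested, e.g. ISO 8601
    code_version: str        # version of the analysis code that produced the finding
    signal_threshold: float  # threshold the signal was evaluated against


def can_replay(manifest: Manifest, source_bytes: bytes, running_code_version: str) -> bool:
    """Check that a replay would start from the same file and the same code version."""
    same_file = hashlib.sha256(source_bytes).hexdigest() == manifest.source_sha256
    same_code = running_code_version == manifest.code_version
    return same_file and same_code


if __name__ == "__main__":
    data = b"example source file contents"
    manifest = Manifest(
        source_sha256=hashlib.sha256(data).hexdigest(),
        ingest_timestamp="2026-05-03T00:00:00Z",
        code_version="v1",
        signal_threshold=0.8,
    )
    print(can_replay(manifest, data, "v1"))
    print(can_replay(manifest, data + b"!", "v1"))
```

A bit-identical replay depends on both inputs matching, so the check fails if either the file or the code version differs from what the manifest recorded.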

Cross-references. See CMS Data Source Map for the deep CMS-specific federal stack. See Attestation Checks (500+) for the per-check data-source dependency. See Sample CREB™ for the manifest format.