How TSK works

Each section is labeled Live (capability complete), Partial (capability works end-to-end with named limitations enumerated below), or Deferred (capability on roadmap, not yet implemented). Five of the six sections below are Partial — read the limitations to understand exactly which surfaces are scoped today.

A methodology document for sustainability practitioners. Each section explains what TSK does today, names the source-of-truth code path, and calls out honest limits.

What TSK produces

TSK turns a supplier's uploaded evidence into structured truth once, then renders it two ways. The extraction, confidence, factor, scope, verification, and chain-of-custody methods described below apply to both modes — they describe how a value becomes trustworthy, regardless of where it is surfaced.

Mode 1: Questionnaire help (the launch focus)

Helps a supplier answer every question in a sustainability questionnaire (CDP, EcoVadis, GRI Universal, or a buyer's custom ask). Each answer is either auto-extracted from uploaded evidence with page-level provenance, or surfaced for the supplier to confirm, edit, or self-declare — with full edit history and an attribution tag showing where the answer came from.

Mode 2: Automated framework export (partial coverage)

Renders the confirmed evidence into structured disclosure files for a defined subset of standards: CDP Climate XML (3 of 13 modules, ~23%), GRI 300-series CSV (6 of 31 disclosures, ~19%), and a PACT JSON envelope (organisation-level summary, not a product-level PCF). Honest and present, but not the launch story — the roadmap states exactly what each export does and does not cover.

Provenance discipline applies to both modes

Whether a number becomes a questionnaire answer (Mode 1) or an exported disclosure field (Mode 2), it traces back to a source document and page, or is explicitly marked as supplier-attested / self-declared. No value is invented by the AI — deterministic logic owns every number, and anything we cannot evidence is shown as needing review rather than silently asserted.

1. Extraction logic

Live

TSK processes uploaded supplier documents in up to two extraction passes. A pre-validation step first rejects documents that contain no parseable text, so unusable documents are discarded before any extraction begins. The first pass uses regex patterns to extract structured values (energy readings, invoice totals, meter references) from plain text and native-PDF text layers. When an API key is configured (via OPENAI_API_KEY or TSK_LLM_PROVIDER), a second, LLM-based pass runs in parallel with the regex pass: the provider chain tries Ollama (local, free) first, then OpenAI (cloud), and falls back to regex if both fail. The best result from any provider is selected for the evidence record.
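The fallback order described above can be sketched as follows. This is a minimal illustration, not TSK's implementation: the function names, the kWh pattern, and the result dictionary are hypothetical, and "best result" is simplified here to "first usable result in chain order".

```python
import re
from typing import Callable, List, Optional

# Hypothetical pattern for illustration; TSK's real regex set is not shown in this document.
KWH_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*kWh", re.IGNORECASE)


def regex_extract(text: str) -> Optional[dict]:
    """Regex-style extraction: pull a kWh reading out of plain text."""
    match = KWH_PATTERN.search(text)
    if match is None:
        return None
    return {"metric": "electricity_kwh", "value": float(match.group(1)), "provider": "regex"}


def unavailable_llm(text: str) -> Optional[dict]:
    """Stand-in for an LLM provider that cannot be reached."""
    raise ConnectionError("provider not reachable")


def extract_with_chain(text: str, providers: List[Callable[[str], Optional[dict]]]) -> Optional[dict]:
    """Try each provider in order and return the first usable result."""
    if not text.strip():
        # Pre-validation: a document with no parseable text is rejected up front.
        return None
    for provider in providers:
        try:
            result = provider(text)
        except Exception:
            # A provider that errors or is unreachable is skipped, not fatal.
            continue
        if result is not None:
            return result
    return None


# Order mirrors the documented chain: Ollama, then OpenAI, then regex.
chain = [unavailable_llm, unavailable_llm, regex_extract]
result = extract_with_chain("Total usage: 1234.5 kWh", chain)
```

With both stand-in LLM providers failing, the value comes from the regex provider, which matches the documented behaviour when no LLM is reachable.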

Honest limits

LLM extraction is OFF by default. Without an API key the system uses regex only. Regex covers the most common UK utility bill formats; documents with non-standard layouts or heavily scanned content may produce partial extractions, which are flagged for review rather than silently accepted.

Source trace

tools/ingestion/extraction_chooser_v1.py — native vs OCR routing decision (lines 1–40)
tools/ingestion/llm_extract_v1.py — provider chain: Ollama → OpenAI → regex fallback
config/feature_flags.py:110-114 — FF_INGEST_LLM_EXTRACT_V1 (default False; auto-enabled when an API key is detected)

2. Confidence scoring

Partial

Each extracted evidence item carries a source_status signal (unconfirmed → confirmed) that reflects how the value was obtained and whether a human reviewer has validated it. A user-confirmed flag is set when a supplier explicitly accepts a value in the review interface. Before any item reaches the supplier pack, a dimensional sanity gate checks that extracted quantities are unit-consistent — for example, that a reading expressed in kWh is not paired with a price denominated in cubic metres. Items that fail this gate are quarantined into 3_AUDIT/dimensional_quarantine_v1.json and marked for manual review rather than being silently dropped or promoted.
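A dimensional check of this kind could look roughly like the sketch below. The unit table, the item fields, and the function names are assumptions made for illustration; the real gate's rules live in the source file named in the trace and are not reproduced here.

```python
import json

# Hypothetical unit table; the real gate's unit rules are not shown in this document.
UNIT_DIMENSION = {"kWh": "energy", "MWh": "energy", "m3": "volume", "litres": "volume"}


def dimensions_agree(quantity_unit: str, price_per_unit: str) -> bool:
    """True when the quantity's unit and the unit the price is quoted per share a dimension."""
    quantity_dim = UNIT_DIMENSION.get(quantity_unit)
    price_dim = UNIT_DIMENSION.get(price_per_unit)
    return quantity_dim is not None and quantity_dim == price_dim


def run_gate(items):
    """Split items into (passed, quarantined); nothing is dropped."""
    passed, quarantined = [], []
    for item in items:
        if dimensions_agree(item["quantity_unit"], item["price_per_unit"]):
            passed.append(item)
        else:
            # Failing items are kept and marked for manual review.
            quarantined.append({**item, "needs_manual_review": True})
    return passed, quarantined


items = [
    {"id": "a", "quantity_unit": "kWh", "price_per_unit": "kWh"},
    {"id": "b", "quantity_unit": "kWh", "price_per_unit": "m3"},
]
passed, quarantined = run_gate(items)

# Serialised form of the quarantine list, as it might be written to an audit file.
quarantine_json = json.dumps(quarantined, indent=2)
```

The point of the design is that every input item ends up in exactly one of the two lists, so a failed check is visible in the audit trail rather than lost.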

Honest limits

The dual-LLM verification signal — produced when two independent AI models are run on the same extraction — is not yet surfaced in supplier-facing confidence badges. The field dual_llm_verified exists in the evidence schema but is never populated in the current output. When this wiring ships it will add a second layer of machine-validation to confidence scoring; until then, badges reflect regex and single-LLM extraction quality plus the dimensional sanity gate only.

Source trace

tools/evidence/evidence_summary_v1.py — source_status enum definition and dual_llm_verified field (near top of file)
tools/extraction/dimensional_sanity_gate_v1.py:32-150 — dimensional quarantine logic; writes to 3_AUDIT/dimensional_quarantine_v1.json

3. Emission factor sources

Partial

TSK uses the DEFRA 2025 conversion factors for all emissions calculations — electricity, natural gas, liquid fuels, water, and waste. The factor set is pinned in contracts/emission_factors/uk_defra_2025_v1.json, where the vintage year is hard-coded in the file header (line 6: "year": 2025) and the calculation method is declared as "method": "location-based". The calculate_emissions() function selects this factor set as its default at line 27 of tools/calc/emissions_calculator_v1.py.
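To illustrate what pinning a factor set means in practice, here is a minimal sketch. Only the "year" and "method" keys are named in the text above; the "factors" layout, the function name and signature, and the numeric factor are placeholders chosen for illustration and are not DEFRA figures or TSK's actual API.

```python
# Stand-in for a pinned factor file. "year" and "method" are named in the text;
# the "factors" layout and the numeric value are placeholders, NOT DEFRA figures.
PINNED_FACTOR_SET = {
    "year": 2025,
    "method": "location-based",
    "factors": {
        "electricity_kwh": {"unit": "kWh", "kg_co2e_per_unit": 0.25},
    },
}


def sketch_calculate_emissions(metric_key, quantity, unit, factor_set=PINNED_FACTOR_SET):
    """Multiply an activity quantity by its pinned factor, refusing unit mismatches."""
    factor = factor_set["factors"][metric_key]
    if unit != factor["unit"]:
        raise ValueError(f"expected {factor['unit']}, got {unit}")
    return {
        "kg_co2e": quantity * factor["kg_co2e_per_unit"],
        # Carrying the vintage and method with the result keeps it traceable.
        "factor_year": factor_set["year"],
        "method": factor_set.get("method"),
    }


result = sketch_calculate_emissions("electricity_kwh", 1000, "kWh")
```

Because the factor set is a fixed input rather than a live lookup, running the same calculation again returns the same figure, which is the reproducibility property the pinning is meant to provide.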

Honest limits

The factor library is DEFRA 2025, UK only. Non-UK geographies currently fall back to the UK factor, which is a known limitation: calculations that apply the UK factor to non-UK consumption data will misrepresent location-based emissions and should be re-run when a regional factor set ships.

We pin a specific DEFRA edition deliberately so that historical calculations are reproducible: a figure calculated today against DEFRA 2025 will come out identical a year from now. Refreshing the pinned factor set when DEFRA releases its next annual update is on the post-launch roadmap.

For purchased-electricity Scope 2, TSK reports both GHG Protocol methods: a location-based figure (DEFRA grid-average factor) as the headline total, and a parallel market-based figure (UK residual fuel mix, reflecting a contractual instrument such as a renewable tariff where the evidence shows one). The two are shown side by side and never summed. Both are UK-only, and the market-based figure is CO₂-only: it excludes CH₄/N₂O and transmission & distribution.

Source trace

contracts/emission_factors/uk_defra_2025_v1.json — year hard-coded in header (lines 1–16); "method": "location-based" declared here
tools/calc/emissions_calculator_v1.py:27 — default factor set selection

4. GHG Protocol scope rules

Partial

TSK follows GHG Protocol scope boundaries for all emissions derivation. Scope 1 covers direct combustion from owned or controlled sources: natural gas, diesel, petrol, LPG, and fuel oil are all supported with live factor lookups. Scope 2 uses a location-based approach for purchased electricity: the grid emission factor is applied to metered kWh consumption, following the "method": "location-based" declaration in contracts/emission_factors/uk_defra_2025_v1.json. For that same electricity, TSK also derives a parallel market-based Scope 2 figure, using the UK residual fuel mix and reflecting a contractual instrument such as a renewable tariff where the evidence shows one. The two figures are reported side by side under GHG Protocol dual reporting and are never summed. For Scope 3, activity keys are mapped from supplier document data using KEYWORD_METRIC_MAP in tools/ingestion/mapping_candidates_v1.py, and five of the mapped categories now derive emissions with DEFRA 2025 factors (see honest limits).
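The dual-reporting rule for Scope 2 can be sketched as a small data structure that holds both figures and exposes only the location-based one as the headline. Everything here is an assumption for illustration: the class and function names are hypothetical, the factors are placeholders rather than DEFRA or residual-mix values, and the way a contractual instrument overrides the residual mix is one plausible reading of the GHG Protocol hierarchy, not a description of TSK's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope2DualReport:
    location_based_kg: float  # headline figure (grid-average factor)
    market_based_kg: float    # parallel figure, shown alongside

    @property
    def headline_kg(self) -> float:
        # Only the location-based figure is the headline; the two are never added.
        return self.location_based_kg


def scope2_dual_report(kwh: float, grid_factor: float, residual_mix_factor: float,
                       contractual_factor: Optional[float] = None) -> Scope2DualReport:
    """Compute both Scope 2 figures for the same metered consumption."""
    location = kwh * grid_factor
    # Assumption: an evidenced contractual instrument supplies its own factor;
    # otherwise the residual mix applies.
    market_factor = residual_mix_factor if contractual_factor is None else contractual_factor
    return Scope2DualReport(location_based_kg=location, market_based_kg=kwh * market_factor)


# Placeholder factors, not real DEFRA or residual-mix values.
standard = scope2_dual_report(1000, grid_factor=0.25, residual_mix_factor=0.5)
renewable = scope2_dual_report(1000, grid_factor=0.25, residual_mix_factor=0.5,
                               contractual_factor=0.0)
```

Keeping the two figures as separate fields, with no total, makes accidental double counting harder: a consumer of the report has to choose one method explicitly.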

Honest limits

The mapping schema addresses 9 of the 15 GHG Protocol Scope 3 categories. Five of those now derive emissions with DEFRA 2025 factors: Cat 4 (upstream transport), Cat 5 (operational waste), Cat 6 (business travel), Cat 7 (employee commuting), and Cat 9 (downstream transport). Where an activity figure has to be derived (for example, flight count to passenger-km), documented estimation methods are used. The other four mapped categories, Cat 1 (purchased goods & services), Cat 2 (capital goods), Cat 3 (fuel & energy-related), and Cat 12 (end-of-life treatment), are schema-ready but not yet calculated. The remaining 6 categories, Cat 8 (upstream leased assets), Cat 10 (processing of sold products), Cat 11 (use of sold products), Cat 13 (downstream leased assets), Cat 14 (franchises), and Cat 15 (investments), are not addressed in the current schema.

Market-based Scope 2 is UK-only and CO₂-only (it uses the UK residual fuel mix and excludes CH₄/N₂O and transmission & distribution).

Scope 1 coverage is limited to the five fuels listed above; refrigerants, fugitive emissions, and process-combustion sources are not yet in scope.
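The Scope 3 coverage stated above can be written down as a lookup table and tallied, which makes the 5 + 4 + 6 = 15 breakdown easy to check. The status labels and the variable names are illustrative choices, not identifiers from the TSK codebase.

```python
from collections import Counter

# Coverage status per GHG Protocol Scope 3 category, as stated in the text.
# The status labels themselves are illustrative, not TSK identifiers.
SCOPE3_STATUS = {
    1: "schema_ready",    # purchased goods & services
    2: "schema_ready",    # capital goods
    3: "schema_ready",    # fuel & energy-related
    4: "calculated",      # upstream transport
    5: "calculated",      # operational waste
    6: "calculated",      # business travel
    7: "calculated",      # employee commuting
    8: "not_addressed",   # upstream leased assets
    9: "calculated",      # downstream transport
    10: "not_addressed",  # processing of sold products
    11: "not_addressed",  # use of sold products
    12: "schema_ready",   # end-of-life treatment
    13: "not_addressed",  # downstream leased assets
    14: "not_addressed",  # franchises
    15: "not_addressed",  # investments
}

counts = Counter(SCOPE3_STATUS.values())
mapped = counts["calculated"] + counts["schema_ready"]
```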

Source trace

tools/calc/emissions_derivation_v1.py:64-86 — METRIC_KEY_TO_FACTOR mapping table: metric keys for Scope 1 fuels (natural gas, diesel, petrol, LPG, fuel oil) and Scope 3 water/waste keys, mapped to their DEFRA factor names and expected units
contracts/emission_factors/uk_defra_2025_v1.json — Scope 2 location-based method declared at line 16 ("method": "location-based"); tools/calc/emissions_calculator_v1.py surfaces this via factor.get("method")
KEYWORD_METRIC_MAP in tools/ingestion/mapping_candidates_v1.py (lines 62–83) — Scope 3 activity key mapping schema (no emissions calculation)

5. Dual-AI verification

Partial

TSK's production profile runs two independent AI models on extractions where the dual-LLM adjudication module is engaged. Today, model disagreements are logged internally for operations review; the verification result is not yet surfaced in supplier-facing confidence badges. Surfacing the signal in badges is the next step in dual-LLM visibility. This module is implemented in tools/ingestion/dual_llm_adjudication_v1.py and is enabled in production via the FF_DUAL_LLM_ADJUDICATION feature flag.

Honest limits

The dual_llm_verified field exists in tools/evidence/evidence_summary_v1.py but is never populated in the current supplier output. When this wiring ships, Section 2 (Confidence scoring) will also be updated to reflect the added signal.

Source trace

config/feature_flags.py:171-175 — FF_DUAL_LLM_ADJUDICATION (default False; enabled via "quality" group — the flag belongs to the quality group, so it does not need to be listed by name in production.json; group membership is sufficient for ON status)
tools/ingestion/dual_llm_adjudication_v1.py — adjudication logic (~180 lines)
config/beta_profiles/production.json — includes "quality" in its groups array, which enables all flags in that group including FF_DUAL_LLM_ADJUDICATION
tools/evidence/evidence_summary_v1.py — dual_llm_verified field defined but not yet populated in supplier-facing output
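The group-based enablement described in this trace can be sketched as a small resolution function. This is an assumed model, not the contents of config/feature_flags.py: the flag-definition layout, the "flags" key for by-name listing, and the ungrouped example flag are hypothetical. Only the "groups" array and the "quality" group come from the text.

```python
# Hypothetical flag registry; only FF_DUAL_LLM_ADJUDICATION, its False default,
# and its "quality" group are taken from the text.
FLAG_DEFS = {
    "FF_DUAL_LLM_ADJUDICATION": {"default": False, "group": "quality"},
    "FF_EXAMPLE_UNGROUPED": {"default": False, "group": None},  # invented for contrast
}

# Profile shaped like the description of production.json: a "groups" array that
# includes "quality". The "flags" key is an assumption for by-name listing.
PRODUCTION_PROFILE = {"groups": ["quality"], "flags": []}
EMPTY_PROFILE = {"groups": [], "flags": []}


def flag_enabled(name: str, profile: dict, flag_defs: dict = FLAG_DEFS) -> bool:
    """A flag is ON if listed by name, or if its group is in the profile's groups."""
    definition = flag_defs[name]
    if name in profile.get("flags", []):
        return True
    if definition["group"] is not None and definition["group"] in profile.get("groups", []):
        return True
    # Otherwise fall back to the flag's own default.
    return definition["default"]


dual_llm_on = flag_enabled("FF_DUAL_LLM_ADJUDICATION", PRODUCTION_PROFILE)
ungrouped_on = flag_enabled("FF_EXAMPLE_UNGROUPED", PRODUCTION_PROFILE)
```

Under this model the flag is ON in production through group membership alone, even though its default is False and it is not listed by name.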

6. Chain of custody

Partial

Every supplier pack produced by TSK is tamper-evident. When the pipeline finalises a run, it writes a manifest to 3_AUDIT/manifest.json inside the supplier_pack.zip archive. Each file in the pack receives a SHA256 hash entry in this manifest, so any post-delivery modification to an individual evidence file is detectable by recomputing and comparing hashes. The manifest is written by tools/pack/supplier_pack_zip_v1.py.
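The recompute-and-compare check can be sketched with the Python standard library. The manifest layout used here (a JSON object mapping file name to hex digest), the function names, and the example file path are assumptions for illustration; only the manifest location and the use of SHA256 per file come from the text.

```python
import hashlib
import io
import json
import zipfile

MANIFEST_PATH = "3_AUDIT/manifest.json"


def build_pack(files: dict) -> bytes:
    """Zip the files together with a manifest of their SHA256 hashes."""
    # Assumed manifest layout: {file name: hex digest}.
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, sort_keys=True))
    return buffer.getvalue()


def verify_pack(pack_bytes: bytes) -> list:
    """Return the names of files that are missing or whose hash no longer matches."""
    mismatched = []
    with zipfile.ZipFile(io.BytesIO(pack_bytes)) as archive:
        manifest = json.loads(archive.read(MANIFEST_PATH))
        present = set(archive.namelist())
        for name, expected in manifest.items():
            if name not in present:
                mismatched.append(name)
            elif hashlib.sha256(archive.read(name)).hexdigest() != expected:
                mismatched.append(name)
    return mismatched


# Hypothetical evidence file name, used only for this example.
pack = build_pack({"evidence/bill.txt": b"1234 kWh"})
problems = verify_pack(pack)
```

An untouched pack verifies with an empty list. A hash manifest of this kind detects modification of individual files, but anyone able to rewrite the archive could also rewrite the manifest, which is why signing is treated as a separate capability.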

Honest limits

The manifest provides integrity only (a SHA256 hash per file). Cryptographic signing, which would give auditors verifiable authenticity via a private key, is on the post-launch roadmap.

A completed pack benefits from terminal-state durability (H2 Item 2.5): once a run is finalised, it survives a server restart. Mid-pipeline resume is not promised. If a run is interrupted before reaching the 5-artifact threshold, it may be left in an incomplete state and will not be automatically re-completed. The orphan-recovery mechanism is a heuristic, not a guarantee: runs with fewer than 5 artifacts are not recovered.

Source trace

tools/pack/supplier_pack_zip_v1.py — manifest written to 3_AUDIT/manifest.json; SHA256 per file computed at ~line 42+
web_service/run_manager.py:217 — orphan recovery threshold (5+ artifacts → terminal-state completion; fewer than 5 → not recovered)