How TSK works

Each section is labeled Live (capability complete), Partial (capability works end-to-end with named limitations enumerated below), or Deferred (capability on roadmap, not yet implemented). Five of the six sections below are Partial; read each section's Honest limits to see exactly what is and is not covered today.

A methodology document for sustainability practitioners. Each section explains what TSK does today, names the source-of-truth code path, and calls out honest limits.

1. Extraction logic

Live

TSK processes uploaded supplier documents in two passes. The first pass uses regex patterns to extract structured values (energy readings, invoice totals, meter references) from plain text and native-PDF text layers. A pre-validation step rejects documents that contain no parseable text before any extraction begins, so unusable documents are discarded early. When an API key is configured (via OPENAI_API_KEY or TSK_LLM_PROVIDER), a second LLM extraction pass runs in parallel: the provider chain tries Ollama (local, free) first, then OpenAI (cloud), then falls back to regex if both fail. The best result from any provider is selected for the evidence record.
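The ordering described above can be illustrated with a minimal sketch. It is not the code in tools/ingestion/llm_extract_v1.py: the pattern, the function names, and the result shape are all assumptions made for illustration, and the sketch simplifies "best result from any provider" to "first provider that returns a result".

```python
import re
from typing import Callable, Optional

# Hypothetical pattern: matches readings such as "1,234.5 kWh".
KWH_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*kWh", re.IGNORECASE)


def regex_extract(text: str) -> Optional[dict]:
    """Return the first kWh reading found in the text, or None."""
    match = KWH_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    return {"energy_kwh": value, "source": "regex"}


def extract_with_fallback(
    text: str, providers: list[Callable[[str], Optional[dict]]]
) -> Optional[dict]:
    """Try each provider in order; fall back to regex if all of them fail."""
    # Pre-validation: reject documents with no parseable text.
    if not text.strip():
        raise ValueError("document contains no parseable text")
    for provider in providers:
        try:
            result = provider(text)
        except Exception:
            # A failing provider (offline, no key) is skipped, not fatal.
            continue
        if result is not None:
            return result
    return regex_extract(text)
```

In this sketch the providers list would hold one callable per LLM backend, in the order Ollama then OpenAI; passing an empty list gives regex-only behaviour.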

Honest limits

LLM extraction is OFF by default. Without an API key the system uses regex only. Regex covers the most common UK utility bill formats; documents with non-standard layouts or mostly scanned (image-only) content may produce partial extractions, which are flagged for review rather than silently accepted.

Source trace
  • tools/ingestion/extraction_chooser_v1.py — native vs OCR routing decision (lines 1–40)
  • tools/ingestion/llm_extract_v1.py — provider chain: Ollama → OpenAI → regex fallback
  • config/feature_flags.py:110-114 FF_INGEST_LLM_EXTRACT_V1 (default False; auto-enabled when an API key is detected)

2. Confidence scoring

Partial

Each extracted evidence item carries a source_status signal (unconfirmed → confirmed) that reflects how the value was obtained and whether a human reviewer has validated it. A user-confirmed flag is set when a supplier explicitly accepts a value in the review interface. Before any item reaches the supplier pack, a dimensional sanity gate checks that extracted quantities are unit-consistent: for example, that a reading expressed in kWh is not paired with a unit price quoted per cubic metre. Items that fail this gate are quarantined into 3_AUDIT/dimensional_quarantine_v1.json and marked for manual review rather than being silently dropped or promoted.
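A minimal sketch of a dimensional gate of this kind follows. The unit table, the item field names (quantity_unit, price_per_unit), and the function names are hypothetical; the real logic lives in tools/extraction/dimensional_sanity_gate_v1.py and may differ.

```python
# Hypothetical unit-to-dimension table; the real gate may cover more units.
UNIT_DIMENSION = {
    "kWh": "energy",
    "MWh": "energy",
    "m3": "volume",
    "litre": "volume",
}


def is_dimensionally_consistent(item: dict) -> bool:
    """True when the quantity unit and the per-unit price share a dimension."""
    quantity_dim = UNIT_DIMENSION.get(item["quantity_unit"])
    price_dim = UNIT_DIMENSION.get(item["price_per_unit"])
    # Unknown units are treated as failures so they reach manual review.
    return quantity_dim is not None and quantity_dim == price_dim


def partition_items(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split items into (passed, quarantined) without dropping any."""
    passed, quarantined = [], []
    for item in items:
        target = passed if is_dimensionally_consistent(item) else quarantined
        target.append(item)
    return passed, quarantined
```

Every input item ends up in exactly one of the two lists, which mirrors the stated intent that failing items are quarantined rather than dropped or promoted.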

Honest limits

The dual-LLM verification signal — produced when two independent AI models are run on the same extraction — is not yet surfaced in supplier-facing confidence badges. The field dual_llm_verified exists in the evidence schema but is never populated in the current output. When this wiring ships it will add a second layer of machine-validation to confidence scoring; until then, badges reflect regex and single-LLM extraction quality plus the dimensional sanity gate only.

Source trace
  • tools/evidence/evidence_summary_v1.py source_status enum definition and dual_llm_verified field (near top of file)
  • tools/extraction/dimensional_sanity_gate_v1.py:32-150 — dimensional quarantine logic; writes to 3_AUDIT/dimensional_quarantine_v1.json

3. Emission factor sources

Partial

TSK uses the DEFRA 2025 conversion factors for all emissions calculations — electricity, natural gas, liquid fuels, water, and waste. The factor set is pinned in contracts/emission_factors/uk_defra_2025_v1.json, where the vintage year is hard-coded in the file header (line 6: "year": 2025) and the calculation method is declared as "method": "location-based". The calculate_emissions() function selects this factor set as its default at line 27 of tools/calc/emissions_calculator_v1.py.
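The calculation itself is activity quantity multiplied by the pinned factor. The sketch below shows that shape under stated assumptions: only the "year" and "method" keys are confirmed by the text above; the "factors" layout, the metric key, and the function name are hypothetical, and the factor value 0.5 is a placeholder, not a DEFRA figure.

```python
import json

# Illustrative stand-in for the pinned factor file. Only "year" and "method"
# are described in the document; everything under "factors" is assumed,
# and 0.5 is a placeholder value, NOT a DEFRA conversion factor.
FACTOR_SET_JSON = """
{
  "year": 2025,
  "method": "location-based",
  "factors": {
    "electricity_kwh": {"unit": "kWh", "kg_co2e_per_unit": 0.5}
  }
}
"""


def calculate_emissions_sketch(
    metric_key: str, quantity: float, unit: str, factor_set: dict
) -> dict:
    """Multiply an activity quantity by the pinned factor for its metric key."""
    factor = factor_set["factors"][metric_key]
    if unit != factor["unit"]:
        raise ValueError(f"expected {factor['unit']}, got {unit}")
    return {
        "kg_co2e": quantity * factor["kg_co2e_per_unit"],
        "factor_year": factor_set["year"],
        "method": factor_set.get("method"),
    }
```

Recording the factor year and method alongside each result is what makes a pinned edition useful: the same input against the same file yields the same output.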

Honest limits

The factor library is DEFRA 2025, UK only. Non-UK geographies fall back to the UK factor today; this is a known limitation. Calculations that apply the UK factor to non-UK consumption data will misrepresent location-based emissions and should be re-run when a regional factor set ships. We pin a specific DEFRA edition deliberately so that historical calculations are reproducible: emissions calculated today against DEFRA 2025 will produce the same result a year from now. Refreshing the pinned factor set when DEFRA releases its next annual update is on the post-launch roadmap. Only location-based Scope 2 is supported; market-based calculation is deferred.

Source trace
  • contracts/emission_factors/uk_defra_2025_v1.json — year hard-coded in header (lines 1–16); "method": "location-based" declared here
  • tools/calc/emissions_calculator_v1.py:27 — default factor set selection

4. GHG Protocol scope rules

Partial

TSK follows GHG Protocol scope boundaries for all emissions derivation. Scope 1 covers direct combustion from owned or controlled sources — natural gas, diesel, petrol, LPG, and fuel oil are all supported with live factor lookups. Scope 2 uses a location-based approach for purchased electricity: the grid emission factor is applied to metered kWh consumption, following the "method": "location-based" declaration in contracts/emission_factors/uk_defra_2025_v1.json. Scope 3 activity keys are mapped from supplier document data using KEYWORD_METRIC_MAP in tools/ingestion/mapping_candidates_v1.py so that category assignments are ready when calculation is added.
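Keyword-based mapping of document text to Scope 3 activity keys can be sketched as follows. The entries are hypothetical and do not reproduce KEYWORD_METRIC_MAP; the metric key names and the (key, category) tuple shape are assumptions. As in the text above, the sketch assigns categories only and computes no emissions.

```python
# Hypothetical entries: keyword -> (metric key, GHG Protocol Scope 3 category).
SKETCH_KEYWORD_MAP = {
    "waste": ("waste_tonnes", 5),
    "flight": ("business_travel_km", 6),
    "freight": ("upstream_transport_tkm", 4),
}


def map_candidates(line: str) -> list[tuple[str, int]]:
    """Return every (metric key, category) whose keyword appears in the line."""
    lowered = line.lower()
    return [
        mapping
        for keyword, mapping in SKETCH_KEYWORD_MAP.items()
        if keyword in lowered
    ]
```

A line that matches no keyword returns an empty list, so unmapped text stays unassigned rather than being forced into a category.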

Honest limits

Scope 3 category mapping is schema-ready; per-category emissions calculation is NOT yet implemented. The mapping schema currently addresses 9 of the 15 GHG Protocol Scope 3 categories: Cat 1 (purchased goods & services), Cat 2 (capital goods), Cat 3 (fuel & energy-related), Cat 4 (upstream transport), Cat 5 (operational waste), Cat 6 (business travel), Cat 7 (employee commuting), Cat 9 (downstream transport), and Cat 12 (end-of-life treatment). The remaining 6 categories are not addressed in the current schema: Cat 8 (upstream leased assets), Cat 10 (processing of sold products), Cat 11 (use of sold products), Cat 13 (downstream leased assets), Cat 14 (franchises), and Cat 15 (investments). Scope 2 market-based calculation is deferred. Scope 1 coverage is limited to the five fuels listed above; refrigerants, fugitive emissions, and process-combustion sources are not yet in scope.

Source trace
  • tools/calc/emissions_derivation_v1.py:64-86 METRIC_KEY_TO_FACTOR mapping table: metric keys for Scope 1 fuels (natural gas, diesel, petrol, LPG, fuel oil) and Scope 3 water/waste keys, mapped to their DEFRA factor names and expected units
  • contracts/emission_factors/uk_defra_2025_v1.json — Scope 2 location-based method declared at line 16 ("method": "location-based"); tools/calc/emissions_calculator_v1.py surfaces this via factor.get("method")
  • KEYWORD_METRIC_MAP in tools/ingestion/mapping_candidates_v1.py (lines 62–83) — Scope 3 activity key mapping schema (no emissions calculation)

5. Dual-AI verification

Partial

TSK's production profile runs two independent AI models on extractions where the dual-LLM adjudication module is engaged. Today, model disagreements are logged internally for operations review; the verification result is not yet surfaced in supplier-facing confidence badges. Surfacing the signal in badges is the next step in dual-LLM visibility. This module is implemented in tools/ingestion/dual_llm_adjudication_v1.py and is enabled in production via the FF_DUAL_LLM_ADJUDICATION feature flag.
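One way to compare two model outputs and log disagreements is sketched below. This is an assumption about the general technique, not the logic in tools/ingestion/dual_llm_adjudication_v1.py: the function name, the 1% relative tolerance, and the result shape are all hypothetical.

```python
import logging
import math

logger = logging.getLogger("dual_llm_sketch")


def adjudicate(primary: dict, secondary: dict, rel_tol: float = 0.01) -> dict:
    """Compare two extraction results field by field and log disagreements."""
    disagreements = []
    for key in sorted(set(primary) | set(secondary)):
        a, b = primary.get(key), secondary.get(key)
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            # Numeric fields agree when they are within the relative tolerance.
            agree = math.isclose(a, b, rel_tol=rel_tol)
        else:
            # Other fields (and fields missing from one side) must match exactly.
            agree = a == b
        if not agree:
            disagreements.append(key)
    if disagreements:
        # Logged for operations review only; nothing is surfaced to suppliers.
        logger.warning("model disagreement on fields: %s", disagreements)
    return {"agreed": not disagreements, "disagreements": disagreements}
```

In a design like this, the "agreed" value is the natural candidate for a field such as dual_llm_verified once the wiring to supplier-facing output exists.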

Honest limits

The dual_llm_verified field exists in tools/evidence/evidence_summary_v1.py but is never populated in the current supplier output. When this wiring ships, Section 2 (Confidence scoring) will also be updated to reflect the added signal.

Source trace
  • config/feature_flags.py:171-175 FF_DUAL_LLM_ADJUDICATION (default False; enabled via the "quality" group, so the flag does not need to be listed by name in production.json)
  • tools/ingestion/dual_llm_adjudication_v1.py — adjudication logic (~180 lines)
  • config/beta_profiles/production.json — includes "quality" in its groups array, which enables all flags in that group including FF_DUAL_LLM_ADJUDICATION
  • tools/evidence/evidence_summary_v1.py dual_llm_verified field defined but not yet populated in supplier-facing output
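The group-based enablement traced above can be sketched as a small resolver. The flag names and the "quality" group come from this document; the FLAG_DEFS structure, the optional per-profile "flags" override, and the function name are assumptions for illustration.

```python
# Hypothetical flag registry: default value plus optional group membership.
FLAG_DEFS = {
    "FF_DUAL_LLM_ADJUDICATION": {"default": False, "group": "quality"},
    "FF_INGEST_LLM_EXTRACT_V1": {"default": False, "group": None},
}


def resolve_flags(profile: dict) -> dict:
    """Resolve each flag from explicit overrides, then groups, then defaults."""
    groups = set(profile.get("groups", []))
    explicit = profile.get("flags", {})  # assumed override mechanism
    resolved = {}
    for name, spec in FLAG_DEFS.items():
        if name in explicit:
            resolved[name] = bool(explicit[name])
        elif spec["group"] in groups:
            # Group membership alone is enough to switch the flag on.
            resolved[name] = True
        else:
            resolved[name] = spec["default"]
    return resolved
```

With a profile of {"groups": ["quality"]}, the dual-LLM flag resolves to True even though the profile never names it, while ungrouped flags keep their defaults.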

6. Chain of custody

Partial

Every supplier pack produced by TSK is tamper-evident. When the pipeline finalises a run, it writes a manifest to 3_AUDIT/manifest.json inside the supplier_pack.zip archive. Each file in the pack receives a SHA256 hash entry in this manifest, so any post-delivery modification to an individual evidence file is detectable by recomputing and comparing hashes. The manifest is written by tools/pack/supplier_pack_zip_v1.py.
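Writing and checking a per-file hash manifest can be sketched with the standard library alone. The manifest path is taken from the text above; the manifest layout (a flat file-name-to-hash mapping), the function names, and the example file path are assumptions, not the format written by tools/pack/supplier_pack_zip_v1.py.

```python
import hashlib
import io
import json
import zipfile

MANIFEST_PATH = "3_AUDIT/manifest.json"


def build_pack(files: dict[str, bytes]) -> bytes:
    """Zip the files together with a manifest of their SHA-256 hashes."""
    manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
        archive.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2, sort_keys=True))
    return buffer.getvalue()


def verify_pack(pack: bytes) -> list[str]:
    """Return the names of files whose hash no longer matches the manifest."""
    with zipfile.ZipFile(io.BytesIO(pack)) as archive:
        manifest = json.loads(archive.read(MANIFEST_PATH))
        return [
            name
            for name, expected in manifest.items()
            if hashlib.sha256(archive.read(name)).hexdigest() != expected
        ]
```

A check like this detects a changed evidence file only if the manifest itself is unaltered, since an unsigned manifest inside the archive can be rewritten along with the files.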

Honest limits

The manifest provides integrity only (SHA256 hashes per file). Cryptographic signing, which would let an auditor verify authenticity against a signature made with a private key, is on the post-launch roadmap. A completed pack has terminal-state durability (H2 Item 2.5): once a run is finalised, it survives a server restart. Mid-pipeline resume is not promised. If a run is interrupted before reaching the 5-artifact threshold, it may be left in an incomplete state and will not be automatically re-completed. The orphan-recovery mechanism is a heuristic, not a guarantee.

Source trace
  • tools/pack/supplier_pack_zip_v1.py — manifest written to 3_AUDIT/manifest.json; SHA256 per file computed at ~line 42+
  • web_service/run_manager.py:217 — orphan recovery threshold (5+ artifacts → terminal-state completion; fewer than 5 → not recovered)
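The threshold rule in the last trace entry reduces to a single comparison. The sketch below uses the 5-artifact figure from this document; the function name and the returned state labels are hypothetical, and the real check in web_service/run_manager.py may consider more than a count.

```python
ARTIFACT_THRESHOLD = 5  # from the document: 5+ artifacts are recoverable


def orphan_recovery_outcome(artifact_count: int) -> str:
    """Classify an interrupted run by how many artifacts it had produced."""
    if artifact_count >= ARTIFACT_THRESHOLD:
        return "terminal_state_completed"
    # Below the threshold the run is left incomplete; nothing re-runs it.
    return "not_recovered"
```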