P2-06: Ensemble thresholds (0.65/0.35) are arbitrary magic numbers — not calibrated #16

Closed
opened 2026-06-16 13:57:01 +00:00 by Artur · 0 comments
Owner

Severity: P2 (Medium)
File: decider/ensemble.py lines 83-95

Problem

if weighted_score >= 0.65:
    decision = "auto_approve"
elif weighted_score >= 0.35:
    decision = "ask_user"
else:
    decision = "abort"

These thresholds determine whether a tool executes autonomously, is escalated to the user, or gets blocked. There is:

  • No calibration dataset
  • No historical accuracy analysis
  • No justification documented
  • No config-driven override (hardcoded)

Similarly, matrix thresholds (auto_execute: 0.75, ask_user: 0.45) in config.py are arbitrary.
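
To make the "no config-driven override" point concrete, here is a minimal sketch of the same branch logic with the thresholds passed in rather than hardcoded. The `Thresholds` class, the `load_thresholds` helper, and the `"ensemble"` config key are hypothetical names for illustration, not existing code in `decider`. The range check also assumes `weighted_score` lies in [0, 1], which the issue does not state.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults mirror the values currently hardcoded in decider/ensemble.py.
    auto_approve: float = 0.65
    ask_user: float = 0.35

    def __post_init__(self):
        # Assumption: scores are in [0, 1], so the bands must be ordered
        # 0 <= ask_user <= auto_approve <= 1 to partition that range.
        if not 0.0 <= self.ask_user <= self.auto_approve <= 1.0:
            raise ValueError(f"invalid thresholds: {self}")


def load_thresholds(text: str) -> Thresholds:
    """Parse thresholds from JSON text; the "ensemble" key is hypothetical."""
    section = json.loads(text).get("ensemble", {})
    # Missing keys fall back to the defaults; unknown keys raise TypeError.
    return Thresholds(**section)


def decide(weighted_score: float, t: Thresholds = Thresholds()) -> str:
    """Same three-way split as the original if/elif/else."""
    if weighted_score >= t.auto_approve:
        return "auto_approve"
    if weighted_score >= t.ask_user:
        return "ask_user"
    return "abort"
```

With this shape, the contents of `decider.conf.json` could be read once at startup and passed to `load_thresholds`, and a misordered pair of thresholds would fail at load time instead of silently collapsing the `ask_user` band.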

Fix

  1. Move thresholds to config (decider.conf.json)
  2. Add a calibration framework: track all decisions and their outcomes, then periodically optimize the thresholds against historical accuracy
  3. Document calibration methodology
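
As a rough illustration of step 2, the sketch below grid-searches the two thresholds against logged decisions. Everything in it is an assumption: the record format `(weighted_score, was_safe)`, the cost weights, and the function names are hypothetical, and a real calibration would also need held-out data so the thresholds are not fitted to the same log they are evaluated on.

```python
from itertools import product


def total_cost(records, auto_approve, ask_user,
               bad_approve=10.0, bad_abort=2.0, ask=1.0):
    """Sum a hypothetical cost over logged (score, was_safe) records."""
    cost = 0.0
    for score, was_safe in records:
        if score >= auto_approve:
            # Auto-approving an unsafe action is the most expensive mistake.
            cost += 0.0 if was_safe else bad_approve
        elif score >= ask_user:
            # Asking the user always costs a little, whatever the outcome.
            cost += ask
        else:
            # Aborting a safe action is a missed opportunity.
            cost += bad_abort if was_safe else 0.0
    return cost


def calibrate(records, step=0.05):
    """Grid-search (auto_approve, ask_user) with ask_user <= auto_approve."""
    n = round(1 / step)
    grid = [round(i * step, 10) for i in range(n + 1)]
    best = None
    for hi, lo in product(grid, grid):
        if lo > hi:
            continue  # keep the bands ordered
        c = total_cost(records, hi, lo)
        if best is None or c < best[0]:
            best = (c, hi, lo)
    return {"auto_approve": best[1], "ask_user": best[2], "cost": best[0]}
```

The cost weights are where the methodology in step 3 would need to be documented: they encode how much worse a wrongly auto-approved action is than an unnecessary prompt or a wrongly aborted one, and the chosen thresholds follow directly from them.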
Artur closed this issue 2026-06-16 14:24:07 +00:00
Reference
glow-all/decider#16