Path /specs/provider-health-alerting
Spec: Provider Health Alerting from Last-5 Success Rate
inventory_source local | spec_api /api/spec-registry/provider-health-alerting | registry_updated 2026-04-09T03:10:09.005287Z
potential_value 0.00 | actual_value 1.00 | value_gap 0.00
estimated_cost 0.00 | actual_cost 1.00 | cost_gap 1.00
estimated_roi 0.00 | actual_roi 1.00
Missing contributor linkage. Submit a change request with contributor attribution.
task_ids -
branches -
source_files specs/cross-task-outcome-correlation.md, specs/prompt-ab-roi-measurement.md, specs/provider-health-alerting.md, specs/provider-usage-coalescing-timeout-resilience.md, specs/runner-auto-contribution.md, specs/tool-failure-awareness.md
evidence_refs -
implementation_refs spec-registry:cross-task-outcome-correlation, spec-registry:prompt-ab-roi-measurement, spec-registry:provider-health-alerting, spec-registry:provider-usage-coalescing-timeout-resilience, spec-registry:runner-auto-contribution, spec-registry:tool-failure-awareness
lineage_ids -
public_endpoints -
summary Provider reliability can degrade quickly while still appearing "configured" and partially functional. This spec adds a deterministic health alerting contract so when a provider's last-5 execution success rate drops below 50%, the system automatically records a friction event and can optionally push a notification through existing channels, reducing silent failure loops and response latency.
process_summary Compute provider health from execution outcomes using a fixed `last_5` window and trigger only when `last_5_success_rate; Automatically write a friction event when the threshold is breached, with provider identity and evidence in event notes.; De-duplicate repeated friction writes while the provider remains degraded; create a new event only on a fresh degradatio; Support optional outbound notification through existing channels (Telegram adapter) behind configuration flags, with no ; Keep usage/readiness alert payloads aligned with the health state so API consumers can observe degraded providers withou
pseudocode_summary -
implementation_summary api/app/services/automation_usage_service.py (provider health evaluation); api/app/services/collective_health_service.py (health aggregation); api/app/routers/agent_status_routes.py (health endpoints)
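The threshold and de-duplication rules in process_summary can be illustrated with a minimal sketch. This is not the code in `automation_usage_service.py`: the class name `ProviderHealth`, the `record` method, the event dictionary fields, and the choice to wait for a full window of five outcomes before evaluating are all assumptions made for illustration. Only the fixed last-5 window, the below-50% trigger, and the rule that a new friction event is written only on a fresh degradation come from the spec summary.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional

WINDOW = 5        # fixed `last_5` window from the spec summary
THRESHOLD = 0.5   # alert when the success rate drops below 50%


@dataclass
class ProviderHealth:
    """Hypothetical per-provider tracker; names are illustrative only."""
    provider: str
    outcomes: Deque[bool] = field(default_factory=lambda: deque(maxlen=WINDOW))
    degraded: bool = False

    def record(self, success: bool) -> Optional[dict]:
        """Record one execution outcome.

        Returns a friction-event dict on a fresh degradation, otherwise None.
        """
        self.outcomes.append(success)
        # Assumption: do not evaluate until the window holds five outcomes.
        if len(self.outcomes) < WINDOW:
            return None
        rate = sum(self.outcomes) / WINDOW
        if rate < THRESHOLD:
            if self.degraded:
                # Still degraded: suppress the repeated friction write.
                return None
            self.degraded = True
            return {
                "provider": self.provider,
                "last_5_success_rate": rate,
                "notes": f"last-5 outcomes: {list(self.outcomes)}",
            }
        # Recovered: the next breach counts as a fresh degradation.
        self.degraded = False
        return None
```

Feeding outcomes one at a time, a provider with two successes in its last five executions (rate 0.4) produces one event; further failures while degraded produce none until the rate has first recovered to 0.5 or above. Optional notification (for example through the Telegram adapter mentioned in process_summary) would hang off the returned event and is left out of the sketch.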