v0.3.3 — closes 6 silent-failure findings

@bluet bluet released this 07 May 15:25
· 6 commits to master since this release

What's New in v0.3.3

Patch release: closes 6 silent-failure issues surfaced by the post-v0.3.1 code review (#36 through #41). All fixes are semver-compatible (no public API changes); the SDK is a drop-in upgrade and docker compose runs identically.

🪲 Bug Fixes — Silent Failure Elimination

#37 — Benchmark task: nested except no longer hides DB failures

When a benchmark task failed AND the secondary "mark run as failed" DB write also failed, the inner except Exception: pass silently swallowed the second failure, so runs could stay in the running state forever. Now logger.exception(...) surfaces both failures so on-call sees the full picture.
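
A minimal sketch of the pattern described above, not the project's actual task code. The names run_benchmark_task, execute, and mark_failed are hypothetical stand-ins for the real benchmark body and the DB write; the returned status strings exist only to make the outcomes visible.

```python
import logging

logger = logging.getLogger("benchmark_sketch")


def run_benchmark_task(run_id, execute, mark_failed):
    """Run a benchmark and, on failure, try to record the failed status.

    `execute` and `mark_failed` are hypothetical callables standing in for
    the benchmark body and the "mark run as failed" DB write.
    """
    try:
        execute(run_id)
        return "completed"
    except Exception:
        # First failure: the benchmark itself.
        logger.exception("benchmark run %s failed", run_id)
        try:
            mark_failed(run_id)
            return "failed"
        except Exception:
            # Second failure: the status write. This used to be `pass`,
            # which left the run looking like it was still in progress.
            logger.exception("could not mark run %s as failed", run_id)
            return "failed_unrecorded"
```

Both logger.exception calls attach the active traceback, so the log shows the original benchmark error and the follow-up DB error as separate entries.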

#40 — WebSocket broadcast no longer treats serialization errors as disconnects

except Exception: disconnected.append(ws) masked serialization bugs and runtime state issues as silent client disconnects. Now WebSocketDisconnect is caught cleanly; any other exception is logged with run_id context before the client is dropped.
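
The split between expected disconnects and unexpected errors might look roughly like the sketch below. It is dependency-free on purpose: WebSocketDisconnect here is a local stand-in for the web framework's exception, and FakeClient and the broadcast signature are hypothetical, not the project's real API.

```python
import asyncio
import json
import logging

logger = logging.getLogger("broadcast_sketch")


class WebSocketDisconnect(Exception):
    """Local stand-in for the web framework's disconnect exception."""


class FakeClient:
    """Hypothetical test double: records sent text or raises a preset error."""

    def __init__(self, error=None):
        self.error = error
        self.sent = []

    async def send_text(self, text):
        if self.error is not None:
            raise self.error
        self.sent.append(text)


async def broadcast(run_id, clients, payload):
    """Send payload to each client and return the clients that were dropped."""
    dropped = []
    for ws in list(clients):
        try:
            await ws.send_text(json.dumps(payload))
        except WebSocketDisconnect:
            # Expected: the client went away. Nothing to log.
            dropped.append(ws)
        except Exception:
            # Unexpected: log with run_id context, then drop the client.
            logger.exception("broadcast failed for run_id=%s", run_id)
            dropped.append(ws)
    for ws in dropped:
        clients.remove(ws)
    return dropped
```

With this shape, a serialization bug shows up in the logs with a traceback instead of looking like a quiet disconnect.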

#41 — Empty custom_name correctly normalized to None

HTML forms POST cleared inputs as "" by default. Saving "" to the DB silently broke the custom_name or model_id fallback used at all 8 display sites — the user saw the name disappear with no error. Now a Pydantic field_validator on ModelCreate and ModelUpdate normalizes empty/whitespace strings to None (matches the PATCH endpoint's documented intent: "including None to clear it").
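
The normalization rule can be sketched without Pydantic. In the release it lives in a field_validator on ModelCreate and ModelUpdate; the helper blank_to_none and the dataclass ModelCreateSketch below are hypothetical stand-ins, and leaving non-blank values untouched (no stripping) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


def blank_to_none(value: Optional[str]) -> Optional[str]:
    """Map empty or whitespace-only strings to None; pass other values through."""
    if value is not None and not value.strip():
        return None
    return value


@dataclass
class ModelCreateSketch:
    """Hypothetical stand-in for the request model; not the real schema."""

    model_id: str
    custom_name: Optional[str] = None

    def __post_init__(self):
        # Normalize once at the boundary so storage never sees "".
        self.custom_name = blank_to_none(self.custom_name)
```

Normalizing at the input boundary means every display site can keep its existing fallback logic unchanged.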

#39 — LiteLLM client preserves original exception types

except Exception as e: raise RuntimeError(...) from e lost the original exception type, breaking caller dispatch on ConnectionError, MemoryError, etc. Now the original exception propagates via bare raise after a logger.exception(...) log line. Defensive "exhausted retries" RuntimeErrors are kept intentionally — those genuinely indicate a retry-loop bug.
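
As an illustration of the log-then-reraise pattern, here is a small sketch. The function call_with_logging and its send parameter are hypothetical; the real client code and its retry loop are not shown.

```python
import logging

logger = logging.getLogger("client_sketch")


def call_with_logging(send, request):
    """Call `send(request)`, logging any failure without changing its type.

    `send` is a hypothetical callable standing in for the LLM request.
    """
    try:
        return send(request)
    except Exception:
        logger.exception("LLM call failed for request %r", request)
        # Bare `raise` re-raises the same exception object, so callers can
        # still write `except ConnectionError:` and have it match.
        raise
```

Wrapping in RuntimeError would have forced callers to inspect `__cause__` to recover the original type; the bare raise keeps ordinary `except` dispatch working.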

#38 — Bedrock discovery raises typed DiscoveryError instead of returning []

AWS auth failures, missing boto3, and network timeouts were indistinguishable from "account has no models" (both returned []). Now bedrock raises the new DiscoveryError class (defined in arguslm/server/discovery/base.py) with a structured cause. The caller in providers.py translates this to HTTP 500 with the human-readable message — the UI can show "AWS authentication failed: " instead of "0 models discovered."

Migration note: other discovery sources (anthropic, azure, openai, ollama, google_ai_studio) still follow the legacy return [] contract. They'll be migrated incrementally; the protocol docstring documents the new contract.
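
To make the new contract concrete, the sketch below separates "discovery failed" from "no models found". The constructor of DiscoveryError, the list_models callable, and the (status, body) return shape are all assumptions for illustration; the real class in arguslm/server/discovery/base.py and the handler in providers.py may look different.

```python
class DiscoveryError(Exception):
    """Hypothetical shape of the typed error; the real class may differ."""

    def __init__(self, message, cause=None):
        super().__init__(message)
        self.message = message
        self.cause = cause


def discover_models(list_models):
    """Return the discovered models, or raise DiscoveryError on failure.

    `list_models` is a hypothetical callable standing in for the provider call.
    """
    try:
        return list(list_models())
    except Exception as exc:
        raise DiscoveryError(f"Discovery failed: {exc}", cause=exc) from exc


def handle_discovery_request(list_models):
    """Translate the outcome into a (status_code, body) pair."""
    try:
        models = discover_models(list_models)
    except DiscoveryError as err:
        return 500, {"detail": err.message}
    return 200, {"models": models}
```

An empty list now only ever means the account has no models, so the UI can treat the two cases differently.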

#36 — Monitoring background task uses heartbeat semantics

Previously, a catastrophic task-level failure (DB lost, scheduler bug) was logged as a one-line message and otherwise invisible — same outcome as a healthy run. Per-model failures were already captured by UptimeCheck.error, so the gap was specifically whole-task crashes.

The fix uses heartbeat semantics instead of adding new schema columns:

  • last_run_at advances in a finally block on success OR failure (true heartbeat)
  • MonitoringPage renders an amber "Last run stale — check server logs" badge when last_run_at is older than 2× interval_minutes
  • Replaces fragile timezone derivation with datetime.now(UTC)
  • Zero schema migration
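
The heartbeat idea can be sketched in a few lines. The state dict, run_monitoring_cycle, and is_stale are hypothetical names; the real task persists last_run_at in the database and the staleness check runs in the frontend. The sketch uses timezone.utc, which is the same object as datetime.UTC on Python 3.11+.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("monitoring_sketch")


def run_monitoring_cycle(state, check_all_models):
    """Run one cycle and always advance the heartbeat timestamp.

    `state` is a hypothetical dict standing in for the persisted config row.
    """
    try:
        check_all_models()
    except Exception:
        logger.exception("monitoring cycle crashed")
    finally:
        # Advances on success OR failure, as long as the task ran at all.
        state["last_run_at"] = datetime.now(timezone.utc)


def is_stale(last_run_at, interval_minutes, now=None):
    """True when the last heartbeat is older than 2x the configured interval."""
    now = now or datetime.now(timezone.utc)
    return now - last_run_at > timedelta(minutes=2 * interval_minutes)
```

With a 5-minute interval, a heartbeat that is 11 minutes old counts as stale, while one that is exactly 10 minutes old does not.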

Install

# SDK (PyPI) — unchanged from v0.3.x
pip install "arguslm[server]"

# Server (Docker Hub) — multi-arch: linux/amd64 + linux/arm64
docker pull bluet/arguslm:0.3.3
docker pull bluet/arguslm-frontend:0.3.3

docker compose up -d

Full Changelog: v0.3.2...v0.3.3