fix(install): use public Docker Hub image nousresearch/hermes-agent#4
Conversation
|
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces server listing and deletion capabilities, integrates GitHub Container Registry (GHCR) authentication for pulling the Hermes agent image, and adds a detailed installation log viewer to the server detail page. Feedback on these changes highlights a potential command injection vulnerability in the GHCR login command construction due to unescaped environment variables. Additionally, performance and efficiency improvements are suggested for the new InstallLogCard component, specifically gating the log polling on the card's expanded state and memoizing the search for the last error line in the log output.
| if (!username || !token) { | ||
| return ""; | ||
| } | ||
| return `printf '%s' '${token}' | sudo docker login ghcr.io -u '${username}' --password-stdin`; |
There was a problem hiding this comment.
The GHCR_TOKEN and GHCR_USERNAME environment variables are injected directly into the shell command string using single quotes. If either variable contains a single quote, it will break the shell command syntax or potentially lead to command injection. To prevent this, escape single quotes in both variables before constructing the command.
const safeToken = token.replace(/'/g, "'\\''");\n const safeUsername = username.replace(/'/g, "'\\''");\n return "printf '%s' '" + safeToken + "' | sudo docker login ghcr.io -u '" + safeUsername + "' --password-stdin";| useEffect(() => { | ||
| if (install?.status !== "running") { | ||
| if (pollTimerRef.current) { | ||
| clearInterval(pollTimerRef.current); | ||
| pollTimerRef.current = null; | ||
| } | ||
| return; | ||
| } | ||
|
|
||
| pollTimerRef.current = setInterval(() => { | ||
| void fetchLog(); | ||
| }, POLL_INTERVAL_MS); | ||
|
|
||
| return () => { | ||
| if (pollTimerRef.current) { | ||
| clearInterval(pollTimerRef.current); | ||
| pollTimerRef.current = null; | ||
| } | ||
| }; | ||
| }, [install, fetchLog]); |
There was a problem hiding this comment.
The polling interval is currently set up to fetch the installation log every 3 seconds whenever the installation status is 'running', regardless of whether the log card is expanded (isExpanded). This results in unnecessary network requests and database queries when the log is hidden. Polling should be gated on isExpanded being true.
useEffect(() => {\n if (install?.status !== "running" || !isExpanded) {\n if (pollTimerRef.current) {\n clearInterval(pollTimerRef.current);\n pollTimerRef.current = null;\n }\n return;\n }\n\n pollTimerRef.current = setInterval(() => {\n void fetchLog();\n }, POLL_INTERVAL_MS);\n\n return () => {\n if (pollTimerRef.current) {\n clearInterval(pollTimerRef.current);\n pollTimerRef.current = null;\n }\n };\n }, [install, isExpanded, fetchLog]);
| const lastErrorLine = logData?.log | ||
| ?.split("\n") | ||
| .filter(Boolean) | ||
| .reverse() | ||
| .find((line) => line.toLowerCase().includes("error")); |
There was a problem hiding this comment.
The lastErrorLine is computed by splitting, filtering, reversing, and searching the entire log string on every single render. If the log is large, this can cause noticeable performance lag. Wrap this computation in useMemo so it only runs when logData?.log actually changes.
const lastErrorLine = useMemo(() => {\n return logData?.log\n ?.split("\\n")\n .filter(Boolean)\n .reverse()\n .find((line) => line.toLowerCase().includes("error"));\n }, [logData?.log]);
The Hermes agent image (ghcr.io/hermes-agent/hermes) is private. The install workflow and server actions (update/rollback) ran docker pull without authenticating, causing "error from registry: denied". Add a shared helper (server/ghcr.ts) that reads GHCR_USERNAME and GHCR_TOKEN env vars and conditionally prepends a docker login command to pull operations. When the env vars are unset, the helper is a no-op so local development remains unaffected.
80136ee to
5c9b567
Compare
The CI pipeline for the official NousResearch/hermes-agent repo publishes the Docker image to Docker Hub, not GitHub Container Registry. The previous reference ghcr.io/hermes-agent/hermes:latest never existed, causing `docker compose pull` to fail with "not found" after all other install steps succeeded. Switch to nousresearch/hermes-agent:latest which is publicly available and updated on every main push.
The Hermes agent image is public on Docker Hub so the GHCR login step and prependGhcrLogin wrappers were dead code. Also revert the dynamic buildInstallSteps() back to a plain static array since the only conditional branch (login-ghcr) is gone.
…onnect Invert the requireHostKeyPin flag to allowMissingHostKeyPin. The secure behavior (fail closed when no fingerprint is stored) is now the default. Only verifyServerConnection (first-connect) opts out by passing allowMissingHostKeyPin: true. This eliminates requireHostKeyPin: true from 15 call sites and removes the field from DeployComposeInput, ManagedComposeDeployInput, and SshConnectionConfig intermediate types. A future caller that forgets the flag is safe by default. Also extracts parseOptionId helper and generic keepLatestBy to eliminate copy-paste. Uses composition for DeployComposeInput (extends SshConnectionInput instead of duplicating its fields). Review fixes #1, #2, #4, #5.
* feat: add Telegram model/provider switch endpoint
Add POST /api/telegram/model-switch endpoint that allows switching the
AI model and/or provider on a deployed Hermes server via SSH.
What:
- New switchModelProvider handler in server/telegram.ts
- Accepts { model?, provider? } in request body
- Validates provider against known ApiProviderId values
- Validates model string format and provider-model compatibility
- Uses SSH to execute hermes config set commands on the remote server
- Writes audit logs for success and failure cases
Why:
- Closes #34
- Users currently need to use the web UI to change model/provider
- This enables programmatic switching (e.g., from Telegram bot commands)
How:
- Reuses existing setProviderModel/setProviderInferenceProvider from runtime.ts
- Reuses PROVIDER_ENV_CONFIGS for provider-to-hermes-provider mapping
- Follows same auth + SSH + audit log pattern as other Telegram endpoints
- 8 new tests covering auth, validation, SSH execution, and error handling
* fix: address thermo-nuclear review findings for model-switch endpoint
Fixes:
- CRITICAL: Silent no-op when provider mapping missing — now validates
hermesProvider exists and returns 400 if not
- CRITICAL: Unsafe provider as ApiProviderId cast — now uses typed
validatedProvider variable throughout
- MAJOR: Extract resolveTelegramSshContext helper to eliminate 30-line
SSH boilerplate duplication
- MAJOR: Trim model before validation (not after) to handle whitespace
* fix: biome formatting in server/telegram.ts
* fix: biome formatting in renovate.json and server/telegram.test.ts
* fix: TS errors in switchModelProvider — SshAuthMethod type and status code cast
* fix: biome formatting — import order and multi-line context.json call
* fix: add switchModelProvider to telegram mock in app.test.ts
* chore: trigger CI
* fix: address PR review feedback for switchModelProvider
* feat(telegram): rewrite compose and restart container on model/provider switch
- Extract getModelValidationError helper for cleaner validation
- Validate model against subscription backends in addition to API providers
- Rewrite managed compose file and restart hermes container after SSH deploy
so new provider env vars take effect without manual intervention
- Use cached pre-switch backend for DB persistence instead of re-fetching,
ensuring DB is never updated if remote configuration fails
- Split subscription model persistence into credential vs oauth cases
* feat(telegram): add model-access-options API and rewrite model-switch to use optionId
- Add GET /api/telegram/model-access-options that returns the latest saved
deployable option per backend (API providers, credential subscriptions,
OAuth subscriptions) with opaque optionId: api-provider:<uuid>,
credential-subscription:<uuid>, oauth-subscription:<uuid>
- Rewrite POST /api/telegram/model-switch to accept { optionId, model }
payload, resolve the saved row, validate model against the backend, deploy
via SSH compose, and activate the selected row while deactivating others
in a DB transaction
- Add shared contracts for ModelAccessOption, ModelAccessOptionsResponse,
and ModelSwitchPayload
- Add query helpers for listing and resolving saved deployable options
- Update backend tests for the new option-based switch API
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
* feat(telegram): add model-access switcher to Telegram page UI
- Add TelegramModelAccessSection component that fetches model access options,
shows active provider/model, lets users pick a saved backend and model
(fixed dropdown or custom input), and switches via the API
- Wire section into TelegramSettings, shown only when Telegram is deployed
- Empty state links users to /ai-provider to save a config first
- Update frontend tests to provide model-access-options fetch mock
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
* refactor(telegram): fix model-access queries, type safety, and extract model-switch service
- Fix ResolvedOption.provider type lie: change from ApiProviderId to string,
remove unsafe as unknown as ApiProviderId casts
- Replace N+1 per-type DB queries with batch inArray queries in
getModelAccessOptions (4+ queries → 3 total)
- Remove isDeployable pre-filter; fold decrypt into builders via
decryptAndGetLast4 helper to avoid double decryption
- Change findActiveOptionIds to return typed { providerIds, subscriptionIds }
so deactivation uses single UPDATE WHERE id IN (...) per table
- Extract switchModelProvider orchestration (SSH + compose + DB transaction)
into server/telegram/model-switch.ts; handler is now a thin wrapper
- Refactor testTelegramBot to reuse resolveTelegramSshContext helper
- Replace dynamic import("../providers/config") with static import
- Remove unused ModelSwitchPayload export from shared contract
* refactor(telegram): replace useState with useReducer and fix react-doctor warnings
Convert TelegramModelAccessSection from 6 useState calls to a single
useReducer, which batches related state updates into one React render
instead of 6. The reducer groups form state (options, selection) and
UI state (loading, switching, messages) into typed actions.
Also switches from useEffect to useMountEffect for the initial data
fetch, matching the established pattern in telegram-pairing-section
and other components. This eliminates the react-doctor/no-event-handler
false positive — isDeployed is set by the parent based on external
server state, not a user event.
React Doctor score: 79 → 100/100.
* test(telegram): update switchModelProvider tests for extracted service
- Add executeModelSwitch mock for ./telegram/model-switch module
- Update activeOptionIds format from flat array to { providerIds, subscriptionIds }
- Update success assertions to verify executeModelSwitchMock args instead
of checking SSH runtime calls directly (those are now tested in the
service layer)
- Update SSH failure test to mock executeModelSwitch rejection
* refactor(dashboard): extract polling logic into useDashboardPolling hook
DashboardStatusOverview was 313 lines — just over the react-doctor
no-giant-component threshold of 300. The component already had good
subcomponent extraction (ServerInventoryCard, StatusCard, VpsHealthCard,
etc.), but ~125 lines of polling state machine logic (7 refs, 4 useState,
5 functions, 1 useMountEffect) was mixed into the component body.
Extract all polling machinery into a useDashboardPolling custom hook
that returns { snapshot, fetchState, fetchError, pollingPaused,
refreshStatus, handleManualRetry }. The main component drops to ~190
lines and focuses purely on rendering.
No behavior change — the polling logic is identical, just relocated.
React Doctor: no-giant-component warning resolved.
* refactor(providers): replace boolean props with explicit status enum in AccessSelectionActions
AccessSelectionActions took isSaving and isTesting as separate booleans,
creating an implicit 4-state system (idle, saving, testing, both) when
only 3 are valid. The \"both\" state is impossible in practice but the
type system allowed it.
Replace with status: \"idle\" | \"saving\" | \"testing\" — a single prop
that makes the valid states explicit and eliminates the dead 4th state
at the type level. Call sites derive the status from the existing
booleans with a ternary.
React Doctor: prefer-explicit-variants warning resolved.
Also: added react-doctor GitHub Actions CI workflow (npx react-doctor install).
* test(crypto): add unit tests for AES-256-GCM secret helpers
Pins round-trip, tamper rejection, malformed payload, missing key, and
all three decryptApiServerKey branches (empty, legacy plaintext,
encrypted round-trip, malformed-with-dot). Uses real node:crypto — no
mocking — since testing the actual implementation is the point.
Plan 001 from audit at 29c3b46.
* fix(agent-skills): fail skill install when remote download fails
The curl | sudo tee pipeline masked curl non-zero exit code because
in a shell pipeline only the last command exit status propagates. A
failed download (404, DNS, network drop) would write an empty SKILL.md
and report success.
Replace with download-to-temp + test -s + mv: curl writes to a .download
temp file, test -s verifies it is non-empty, then mv atomically moves it
to the final path. Any failure in the && chain aborts the install.
Plan 002 from audit at 29c3b46.
* fix(ssh): fail closed when host-key pin missing on credential ops
Add requireHostKeyPin flag to SshConnectionInput. When set, the
hostVerifier throws host_key_missing if no expectedFingerprint is
stored, instead of silently accepting any key (fail-open).
Every post-registration withSshConnection caller now passes
requireHostKeyPin: true. The first-connect flow (verifyServerConnection)
is unchanged — it still tolerates a missing fingerprint so the pin can
be stored on initial connection.
Covers 15 call sites across 13 files. Plan 003 from audit at 29c3b46.
* test(request-guards): cover auth and ownership boundary
Pins status codes for requireAuthSession (401), requireOwnedServer
(400 missing id, 401 unauthenticated, 404 not owned), and
requireOwnedServerSsh (400 SSH failure, 404 not owned). Asserts that
getOwnedServerRecord is called with the session user.id — the IDOR
guard. Safety net for the later guard-consolidation refactor (DEBT-01).
Plan 004 from audit at 29c3b46.
* test(server-records): cover credential resolution branches
Pins resolveServerCredential branches: stored-present, stored-missing,
session-missing-id, session-expired, session-valid. Covers
normalizeAuthMethod (password, ssh-key, unsupported) and both
resolveServerSshConfigOrError branches (ok + error). Mocks crypto and
credentials — no DB or real crypto needed.
Plan 005 from audit at 29c3b46.
* chore: mark all audit plans as DONE
* docs: add audit plan files
* refactor(ssh): default to requiring host-key pin, opt out for first-connect
Invert the requireHostKeyPin flag to allowMissingHostKeyPin. The secure
behavior (fail closed when no fingerprint is stored) is now the default.
Only verifyServerConnection (first-connect) opts out by passing
allowMissingHostKeyPin: true.
This eliminates requireHostKeyPin: true from 15 call sites and removes
the field from DeployComposeInput, ManagedComposeDeployInput, and
SshConnectionConfig intermediate types. A future caller that forgets
the flag is safe by default.
Also extracts parseOptionId helper and generic keepLatestBy to eliminate
copy-paste. Uses composition for DeployComposeInput (extends
SshConnectionInput instead of duplicating its fields).
Review fixes #1, #2, #4, #5.
* chore(plans): remove completed audit plan documents
All five audit plans (001-005) were fully implemented and verified:
- 001: crypto helper tests in server/crypto.test.ts
- 002: pipefail fix for skill install downloads
- 003: host-key pinning fail-closed on credential SSH ops
- 004: request guard tests in server/request-guards.test.ts
- 005: credential resolution tests in server/server-records.test.ts
These planning artifacts served their purpose and are now cleaned up.
* feat(server): handle SSH host-key errors on provider deploy endpoints
Catch SshConnectError with host_key_missing and host_key_mismatch
codes during deploy flow and return a structured 409 response with
observed fingerprint, algorithm, and (on mismatch) expected fingerprint.
Update test mocks to use importOriginal so SshConnectError is available
for new test assertions.
* feat(server): handle SSH host-key errors on Telegram model switch
Catch SshConnectError with host_key_missing and host_key_mismatch
during switchModelProvider and return a structured 409 response.
Add test coverage for both host-key error codes and the generic
SSH error fallback (502).
* feat(providers): add host-key trust UI for provider deploy
Surface host-key errors from the deploy endpoint as a warning panel
showing observed and expected fingerprints. Add "Trust host key and
retry" flow that POSTs to /api/servers/:id/host-key/accept then
re-attempts the deploy. Wire the new state, reducer actions, and
controller method into ProviderSettings.
* feat(server): extract shared host-key error response utility
Deduplicate the repetitive SSH host-key error handling that was inlined
across deploy endpoints. The new isRecoverableHostKeyError type guard and
hostKeyErrorResponse builder produce a consistent 409 JSON payload with
structured hostKey (observed fingerprint/algorithm, expected on mismatch).
* feat(client): add shared host-key recovery types and trust panel UI
Extract HostKeyErrorPayload type and parseHostKeyErrorPayload parser to a
shared module under features/servers. The HostKeyTrustPanel component
renders the host-key trust prompt (fingerprint display, trust/retry and
cancel buttons) so it can be reused across provider deploy and Telegram
model-switch views.
* refactor(server): deduplicate host-key error handling in deploy endpoints
Replace the duplicated host_key_missing / host_key_mismatch inline blocks
in deployProviderToHermes, deployToHermesAgent, and switchModelProvider
with the shared isRecoverableHostKeyError / hostKeyErrorResponse utilities.
On the client side, swap the inline host-key error parsing in
provider-access-actions for parseHostKeyErrorPayload and replace the
inline host-key alert UI in provider-settings-aside with the reusable
HostKeyTrustPanel component.
* refactor(telegram): extract ModelAccessForm sub-component
Pull the inline option selector, model picker, and switch/refresh buttons
into a self-contained ModelAccessForm component to reduce clutter in the
TelegramModelAccessSection render method.
* refactor(contracts,providers): extract shared host-key types and acceptHostKey action
Create shared/contracts/host-key-error.ts with canonical HostKeyErrorCode and HostKeyErrorResponsePayload types, replacing duplicate definitions on both sides of the boundary.\n\nAdd acceptHostKey() to provider-access-actions.ts as a reusable action for trusting server host keys, using the existing readJsonBody helper.\n\nRestore try/catch in handleTrustAndRetryDeploy to prevent unhandled rejections on network errors.\n\nUpdate server/lib/host-key-error-response.ts, src/features/servers/host-key-recovery.ts, and src/features/providers/provider-access-actions.ts to import from the shared contract.
* refactor(server): extract resolveSwitchOption branch resolvers
Split the large multi-branch resolveSwitchOption function in server/telegram/model-access.ts into three self-contained resolver functions (resolveApiProviderOption, resolveCredentialSubscriptionOption, resolveOAuthSubscriptionOption) with a switch-based dispatcher. Each resolver handles its own DB query, validation, and ResolvedOption construction. No behavioral changes.
* feat(telegram): add host-key trust-and-retry to model switch
Extract state machine and data fetching from telegram-model-access-section.tsx into use-model-access-controller.ts with host-key recovery support.\n\nThe hook now parses 409 host-key error responses via parseHostKeyErrorPayload and exposes a HostKeyTrustPanel flow (handleTrustAndRetrySwitch) instead of showing a generic error. This brings the Telegram model switch UX to parity with the AI provider deploy flow for host-key recovery.\n\nThe section component delegates to the hook and renders HostKeyTrustPanel when a host-key error is detected during model switching.
* chore(pre-commit): move typecheck hook to push stage
Move TypeScript typecheck from commit stage to push stage so partial staging is not blocked by type errors in unstaged files. Full typecheck still runs on every git push.
* fix(server): add host key error recovery to deploy and test endpoints
Add isRecoverableHostKeyError checks before generic 502 fallbacks in three server paths that were missing host key recovery:\n\n- deployTelegramToServer: initial Telegram bot deploy to a server\n- testTelegramBot: test-bot flow over SSH\n- deployToTelegramLinkedHermes: skills/agent deploy to Telegram-linked server\n\nWhen a server has hostKeyFingerprint NULL in the DB, SSH connections throw 'host key pin required but not stored'. These paths previously returned a generic 502; they now return a 409 with a hostKeyErrorResponse payload so the client can surface the trust-and-retry panel.\n\nPattern matches the existing handlers: switchModelProvider, deployToHermesAgent, deployProviderToHermes.
* feat(telegram): add host key trust panel to deploy and test sections
Add HostKeyTrustPanel with trust-and-retry flow to the Telegram deploy section (initial bot deploy) and test section (test-bot flow).\n\nBoth sections now parse 409 responses via parseHostKeyErrorPayload and surface a HostKeyTrustPanel when a host key error is detected. On trust, the host key is accepted via POST /api/servers/:id/host-key/accept and the operation retries automatically.\n\nState management consolidated into useReducer (with useRef for stale-closure avoidance) to match the pattern in use-model-access-controller.ts.\n\nBrings the Telegram deploy and test UIs to parity with the model switch and provider deploy flows for host key recovery.
* test(telegram): add host key error recovery integration tests
Add integration tests for host key recovery in deployTelegramToServer and testTelegramBot endpoints.\n\nVerifies that both endpoints return 409 with hostKey recovery payload when SSH fails with host_key_missing, instead of the previous generic 502. Confirms no audit log is written for recoverable errors and no deploy state is persisted.
* docs(api): document host key recovery flow and new endpoints
Add Host Key Recovery section to the API reference explaining the 409 response format, error codes (host_key_missing/host_key_mismatch), recovery flow, and list of affected endpoints.\n\nDocument new POST /api/servers/:id/host-key/accept endpoint.\n\nAdd previously undocumented Telegram endpoints: POST /api/telegram/deploy, POST /api/telegram/test, POST /api/telegram/model-switch.\n\nUpdate error response tables on all 8 deploy endpoints (telegram deploy/test/model-switch, providers deploy, persona/mcp/agent-skills deploy, web-ui deploy) to include 409 for host key errors.
* docs(glossary): add host key pin and recovery terminology
Add two glossary entries to CONTEXT.md:\n\n- Host Key Pin: explains the stored SHA256 fingerprint, its role in SSH verification, and the host_key_missing/host_key_mismatch error contract\n- Host Key Recovery: explains the 409 -> trust panel -> accept -> retry flow and lists all 8 endpoints that support it
---------
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
What
Fix the Hermes agent Docker image reference in the install workflow and server actions (update/rollback). The CI pipeline at NousResearch/hermes-agent publishes to Docker Hub, not GitHub Container Registry.
Why
Install was failing with
not foundbecauseghcr.io/hermes-agent/hermes:latestnever existed -- the image is atnousresearch/hermes-agenton Docker Hub (public, ~1.1GB multi-arch). The previous GHCR auth fix added login but still pulled from the wrong registry.How
server/constants.ts-- single source of truth forhermesImageRepositoryanddefaultHermesImageserver/install.ts-- compose file now referencesnousresearch/hermes-agent:latestserver/server-actions.ts-- rollbackdocker pullandseduse the shared repo nameserver/ghcr.ts-- dead GHCR auth code; the public Docker Hub image needs no credentialsbuildInstallSteps()to a plain static arrayChecklist
Summary by CodeRabbit