Walkthrough
Configuration version defaults were updated for Akash components (v0.38.1→v1.0.0, with chart versions bumped). Two new operator version attributes were added. The JSON key used to parse node sync status was renamed (SyncInfo→sync_info). The operator upgrade flow now sets image tags from Config and inserts 20-second waits for pods to settle.
Sequence Diagram(s)
sequenceDiagram
participant Upgrade as Upgrade Service
participant Config as Config
participant Helm as Helm
participant Pods as Operator Pods
participant Verify as Verification
Upgrade->>Config: Fetch PROVIDER_HOSTNAME_OPERATOR_VERSION
Config-->>Upgrade: v0.10.0
Upgrade->>Config: Fetch PROVIDER_INVENTORY_OPERATOR_VERSION
Config-->>Upgrade: v0.10.0
Upgrade->>Helm: helm upgrade --set image.tag=v0.10.0 (hostname)
Helm-->>Upgrade: Command executed
Upgrade->>Helm: helm upgrade --set image.tag=v0.10.0 (inventory)
Helm-->>Upgrade: Command executed
rect rgb(200, 220, 255)
Note over Upgrade,Pods: Wait for pod rollout (20s)
Upgrade->>Pods: sleep 20 seconds
Pods-->>Upgrade: Ready
end
Upgrade->>Verify: Verify pod versions
Verify->>Pods: Capture pod version output
Pods-->>Verify: Version info
Verify-->>Upgrade: Verification complete
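The Helm step in the diagram above can be sketched as a small command builder. This is only an illustration: the function name `helm_upgrade_command`, the release names, and the `akash-services` namespace default are assumptions and are not taken from the PR; the chart names follow the repos mentioned in the review.

```python
import shlex


def helm_upgrade_command(release, chart, image_tag, namespace="akash-services"):
    """Build a `helm upgrade` command string that pins the image tag.

    Hypothetical helper: release and namespace values are assumptions.
    """
    return shlex.join([
        "helm", "upgrade", release, chart,
        "-n", namespace,
        "--set", f"image.tag={image_tag}",
    ])


# One command per operator, mirroring the two Helm calls in the diagram.
commands = [
    helm_upgrade_command(
        "akash-hostname-operator", "akash/akash-hostname-operator", "0.10.0"
    ),
    helm_upgrade_command(
        "akash-inventory-operator", "akash/akash-inventory-operator", "0.10.0"
    ),
]
```

Building the argument list first and joining it with `shlex.join` keeps quoting correct if a value ever contains shell-sensitive characters.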
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
application/service/upgrade_service.py (1)
172-174: Bug confirmed: the SSH connection is closed prematurely inside check_upgrade_status, then reused by the calling methods.
Both upgrade_network (line 178) and upgrade_provider (line 283) call check_upgrade_status(ssh_client), then immediately reuse the same ssh_client in subsequent run_ssh_command() calls (lines 190+ and 297+ respectively). The finally block at lines 172-174 closes the connection, causing these subsequent calls to fail.
Fix: Remove ssh_client.close() from the finally block of check_upgrade_status. Callers already have their own finally blocks (verified at line 279) that close the connection once they are done with it.
 finally:
-    ssh_client.close()
+    pass  # Caller is responsible for closing ssh_client
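The ownership rule behind this fix can be illustrated with a minimal sketch. `FakeSSHClient` is a hypothetical stand-in for the real SSH client, and the function bodies are simplified placeholders rather than the project's actual code; only the names check_upgrade_status and upgrade_network come from the review.

```python
class FakeSSHClient:
    """Hypothetical stand-in for an SSH client; records commands it runs."""

    def __init__(self):
        self.closed = False
        self.commands = []

    def run(self, command):
        if self.closed:
            raise RuntimeError("connection already closed")
        self.commands.append(command)
        return "ok"

    def close(self):
        self.closed = True


def check_upgrade_status(ssh_client):
    # Borrows the connection: runs its command but never closes the client.
    return ssh_client.run("echo status")


def upgrade_network(ssh_client):
    # Owns the connection: the only place where close() is called.
    try:
        status = check_upgrade_status(ssh_client)
        ssh_client.run("echo upgrade")  # would fail if the helper had closed it
        return status
    finally:
        ssh_client.close()
```

With this split, a helper can be called any number of times within one caller without invalidating the connection.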
🧹 Nitpick comments (4)
application/config/config.py (1)
37-38: New operator version envs: confirm tag format and centralize canonicalization.
You strip a leading "v" later, before --set image.tag. Either store canonical tags here (without "v") or add a small helper (e.g., canonical_tag(v), which strips the leading "v"). This reduces repeated lstrip calls and avoids mismatches.
If tags in your registry already include "v", confirm whether the charts expect "v"-less tags; behavior differs per image.
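A helper along these lines might look like the sketch below. The name `canonical_tag` is the reviewer's suggestion; the body is an assumption about the intended behavior (remove exactly one leading "v" and nothing else), not code from the PR.

```python
def canonical_tag(version: str) -> str:
    """Return the version with at most one leading "v" removed."""
    version = version.strip()
    # Slice off a single prefix character instead of using lstrip/replace,
    # so "v" characters elsewhere in the tag are left untouched.
    return version[1:] if version.startswith("v") else version
```

For example, `canonical_tag("v1.0.0-dev")` keeps the "v" in the "-dev" suffix, which `.replace("v", "")` would not.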
application/service/provider_service.py (2)
423-430: Make sync parsing backward-compatible and resilient.
Support both "sync_info" and legacy "SyncInfo"; guard against missing or string values to avoid KeyError/ValueError loops.
Apply:
- node_height = int(node_status["sync_info"]["latest_block_height"])
+ sync = node_status.get("sync_info") or node_status.get("SyncInfo") or {}
+ node_height = int(str(sync.get("latest_block_height", "0")))
...
- if node_status["sync_info"]["catching_up"]:
+ if bool(sync.get("catching_up", True)):
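The suggested parsing can be pulled into a standalone function for easier testing. This is a sketch under assumptions: the name `parse_sync_info` is hypothetical, and the fallback to height 0 on unparsable values is an added guard that the suggested diff does not include.

```python
def parse_sync_info(node_status: dict):
    """Return (latest_block_height, catching_up) from a node status dict.

    Accepts both the "sync_info" key and the legacy "SyncInfo" key.
    """
    sync = node_status.get("sync_info") or node_status.get("SyncInfo") or {}
    try:
        height = int(str(sync.get("latest_block_height", "0")))
    except ValueError:
        # Assumption: treat an unparsable height as 0 rather than raising.
        height = 0
    # A missing flag is treated as "still catching up", the safer default.
    catching_up = bool(sync.get("catching_up", True))
    return height, catching_up
```

Note that `bool()` treats any non-empty string as true, so this assumes catching_up arrives as a JSON boolean.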
407-411: Avoid TTY for non-interactive exec.
Drop -it to prevent TTY artifacts in JSON parsing and flaky hangs over SSH.
- "kubectl exec -it akash-node-1-0 -n akash-services -c akash-node -- akash status",
+ "kubectl exec akash-node-1-0 -n akash-services -c akash-node -- akash status",
application/service/upgrade_service.py (1)
365-366: Replace fixed sleep with rollout status checks.
Static 20s waits are brittle. Wait for deployments to finish rolling out instead.
- log.info("Waiting for 20 seconds to allow pods to be upgraded...")
- time.sleep(20) # Wait for the pods to be upgraded
+ log.info("Waiting for pods to complete rollout...")
+ run_ssh_command(ssh_client, "kubectl -n akash-services rollout status deploy/akash-provider --timeout=2m", True, task_id=task_id)
+ run_ssh_command(ssh_client, "kubectl -n akash-services get pods -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[*].image' | grep -v akash-node", True, task_id=task_id)
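If several workloads need to settle, the same idea extends to a loop. The sketch below is illustrative only: `rollout_status_commands` and `wait_for_rollouts` are hypothetical names, the operator deployment names are assumptions, and `run` stands in for whatever executes a command remotely (for example, a wrapper around run_ssh_command).

```python
def rollout_status_commands(deployments, namespace="akash-services", timeout="2m"):
    """Build one `kubectl rollout status` command per deployment."""
    return [
        f"kubectl -n {namespace} rollout status deploy/{name} --timeout={timeout}"
        for name in deployments
    ]


def wait_for_rollouts(run, deployments, namespace="akash-services", timeout="2m"):
    """Run each rollout check in order using the supplied `run` callable."""
    for command in rollout_status_commands(deployments, namespace, timeout):
        run(command)


# Usage with a recording stub in place of a real SSH runner.
executed = []
wait_for_rollouts(
    executed.append,
    ["akash-hostname-operator", "akash-inventory-operator"],  # assumed names
)
```

Because `kubectl rollout status` blocks until the rollout finishes or the timeout expires, each call returns as soon as its workload is ready instead of always waiting a fixed interval.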
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
application/config/config.py (1 hunks)
application/service/provider_service.py (1 hunks)
application/service/upgrade_service.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
application/service/upgrade_service.py (2)
application/config/config.py (1)
Config (4-57)
application/utils/ssh_utils.py (1)
run_ssh_command (138-204)
🔇 Additional comments (3)
application/service/upgrade_service.py (2)
6-6: No concerns.
Import required for waits.
329-337: Address lstrip edge case and version normalization inconsistency.
The code correctly strips the "v" prefix from config values that default to the "v0.10.0" format. However, two issues need attention:
Inconsistent version normalization: Line 607 in provider_service.py uses .replace("v", "") while lines 329 and 335 use .lstrip("v"). Neither is exact: lstrip("v") removes every leading "v" (e.g., "vv0.10.0" becomes "0.10.0"), and .replace("v", "") removes every "v" anywhere in the string, including one in a suffix such as "-dev". Standardize on a single operation that removes exactly one leading "v", such as str.removeprefix("v") (Python 3.9+).
lstrip() edge case: Although Config defaults include the "v" prefix, document or validate that versions cannot start with multiple "v" characters; otherwise, use explicit string operations.
Confirm that the akash-hostname-operator and akash-inventory-operator Helm charts (from the akash/akash-hostname-operator and akash/akash-inventory-operator repos) expect image.tag without the "v" prefix. Review the Helm chart values or chart documentation to ensure this assumption is correct.
application/config/config.py (1)
28-36: Version bumps require manual verification in your cluster; the provided script cannot run in the sandbox.
The specified Helm chart versions cannot be verified in the sandbox environment due to network restrictions. Verify that these versions exist in your target Helm repositories (akash vs akash-dev) by running the provided verification script in your actual cluster or session to prevent installation failures.
Summary by CodeRabbit
Bug Fixes
Chores