A cockroachdb service failed to start up after mupdate to c8f8332bc #7221

@leftwo

The dogfood rack was mupdated to omicron commit c8f8332.

After the mupdate, everything came online except for one cockroachdb zone on sled 17:

BRM42220017 # svcs -xZ
svc:/oxide/cockroachdb:default (CockroachDB)
  Zone: oxz_cockroachdb_3237a532-acaa-4ebe-bf11-dde794fea739
 State: maintenance since Tue Dec 10 16:33:26 2024
Reason: Restarting too quickly.
   See: http://illumos.org/msg/SMF-8000-L5
   See: /pool/ext/ae56280b-17ce-4266-8573-e1da9db6c6bb/crypt/zone/oxz_cockroachdb_3237a532-acaa-4ebe-bf11-dde794fea739/root/var/svc/log/oxide-cockroachdb:default.log
Impact: This service is not running.

The service log does not provide much information beyond showing that the processes exited and the start method was run again:

[ Dec 10 16:31:15 Enabled. ]                                                                                       
[ Dec 10 16:31:15 Rereading configuration. ]
[ Dec 10 16:31:16 Rereading configuration. ]                                                                       
[ Dec 10 16:31:28 Executing start method ("/opt/oxide/lib/svc/manifest/cockroachdb.sh"). ]
+ set -o errexit                          
+ set -o pipefail                                                                                                  
+ . /lib/svc/share/smf_include.sh                                                                                  
++ SMF_EXIT_OK=0                                                                                                                                                                                                                      
++ SMF_EXIT_NODAEMON=94        
++ SMF_EXIT_ERR_FATAL=95                           
++ SMF_EXIT_ERR_CONFIG=96        
++ SMF_EXIT_MON_DEGRADE=97     
++ SMF_EXIT_MON_OFFLINE=98 
++ SMF_EXIT_ERR_NOSMF=99                                                                                           
++ SMF_EXIT_ERR_PERM=100                        
++ svcprop -c -p config/listen_addr svc:/oxide/cockroachdb:default
+ LISTEN_ADDR='[fd00:1122:3344:109::3]:32221'                                                                      
++ svcprop -c -p config/store svc:/oxide/cockroachdb:default
+ DATASTORE=/data                                                                                                  
++ /opt/oxide/internal-dns-cli/bin/dnswait cockroach
++ head -n 5     
++ tr '\n' ,                     
note: configured to log to "/dev/stderr"
16:31:28.240Z INFO dnswait: using system configuration
+ JOIN_ADDRS=3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide
.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,
+ [[ -z 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.inte
rnal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221, ]]
+ args=('--insecure' '--listen-addr' "$LISTEN_ADDR" '--http-addr' '127.0.0.1:8080' '--store' "$DATASTORE" '--join' "$JOIN_ADDRS")
+ exec /opt/oxide/cockroachdb/bin/cockroach start --insecure --listen-addr '[fd00:1122:3344:109::3]:32221' --http-addr 127.0.0.1:8080 --store /data --join 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:322
21,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.
:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,
[ Dec 10 16:31:28 Method "start" exited with status 0. ]
*                
* WARNING: ALL SECURITY CONTROLS HAVE BEEN DISABLED!
*                                                
* This mode is intended for non-production testing only.
*                                       
* In this mode:                                       
* - Your cluster is open to any client that can access fd00:1122:3344:109::3.                                                                                                                                                         
* - Intruders with access to your machine or network can observe client-server traffic.                                                                                                                                               
* - Intruders can log in without password and read or write any data in the cluster.                                                                                                                                                  
* - Intruders can consume all your server's resources and cause unavailability.                                                                                                                                                       
*                                                                                                                                                                                                                                     
*                                                                                                                  
* INFO: To start a secure server without mandating TLS for clients,                                                                                                                                                                   
* consider --accept-sql-without-tls instead. For other options, see:                                                                                                                                                                  
*                                                                                                                  
* - https://go.crdb.dev/issue-v/53404/v22.1
* - https://www.cockroachlabs.com/docs/v22.1/secure-a-cluster.html
*
CockroachDB node starting at 2024-12-10 16:31:59.237331244 +0000 UTC (took 30.1s)
build:               OSS v22.1.22-27-g76e176e260 @ 2024/10/23 21:38:21 (go1.17.13)
webui:               http://127.0.0.1:8080
sql:                 postgresql://root@[fd00:1122:3344:109::3]:32221/defaultdb?sslmode=disable
sql (JDBC):          jdbc:postgresql://[fd00:1122:3344:109::3]:32221/defaultdb?sslmode=disable&user=root
RPC client flags:    /opt/oxide/cockroachdb/bin/cockroach <client cmd> --host=[fd00:1122:3344:109::3]:32221 --insecure
logs:                /data/logs
temp dir:            /data/cockroach-temp3047379273
external I/O path:   /data/extern
store[0]:            path=/data
storage engine:      pebble
clusterID:           2a348c29-7ccb-4d77-9afd-f1e37b9abb40 
status:              restarted pre-existing node
nodeID:              1
[ Dec 10 16:33:25 Stopping because all processes in service exited. ]
[ Dec 10 16:33:25 Executing stop method (:kill). ]
[ Dec 10 16:33:25 Executing start method ("/opt/oxide/lib/svc/manifest/cockroachdb.sh"). ]
+ set -o errexit
+ set -o pipefail
+ . /lib/svc/share/smf_include.sh
++ SMF_EXIT_OK=0
++ SMF_EXIT_NODAEMON=94
++ SMF_EXIT_ERR_FATAL=95
++ SMF_EXIT_ERR_CONFIG=96
++ SMF_EXIT_MON_DEGRADE=97
++ SMF_EXIT_MON_OFFLINE=98
++ SMF_EXIT_ERR_NOSMF=99
++ SMF_EXIT_ERR_PERM=100
++ svcprop -c -p config/listen_addr svc:/oxide/cockroachdb:default
+ LISTEN_ADDR='[fd00:1122:3344:109::3]:32221'
++ svcprop -c -p config/store svc:/oxide/cockroachdb:default
+ DATASTORE=/data
++ ++ ++ head -n 5
/opt/oxide/internal-dns-cli/bin/dnswait cockroach
tr '\n' ,
note: configured to log to "/dev/stderr"
16:33:25.957Z INFO dnswait: using system configuration
+ JOIN_ADDRS=3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide
.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,
+ [[ -z 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.inte
rnal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:32221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221, ]]
+ args=('--insecure' '--listen-addr' "$LISTEN_ADDR" '--http-addr' '127.0.0.1:8080' '--store' "$DATASTORE" '--join' "$JOIN_ADDRS")
+ [ Dec 10 16:33:26 Method "start" exited with status 0. ]
exec /opt/oxide/cockroachdb/bin/cockroach start --insecure --listen-addr '[fd00:1122:3344:109::3]:32221' --http-addr 127.0.0.1:8080 --store /data --join 3237a532-acaa-4ebe-bf11-dde794fea739.host.control-plane.oxide.internal.:32221
,4c3ef132-ec83-4b1b-9574-7c7d3035f9e9.host.control-plane.oxide.internal.:32221,8bbea076-ff60-4330-8302-383e18140ef3.host.control-plane.oxide.internal.:32221,a3628a56-6f85-43b5-be50-71d8f0e04877.host.control-plane.oxide.internal.:3
2221,e86845b5-eabd-49f5-9a10-6dfef9066209.host.control-plane.oxide.internal.:32221,

No core files were found in the expected places.
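
One detail in the trace worth noting is that JOIN_ADDRS ends with a trailing comma. That follows from the pipeline shape visible in the xtrace (`dnswait cockroach | head -n 5 | tr '\n' ,`), since `tr` also converts the final newline. The sketch below reproduces only that shape; `fake_dnswait`, `build_join_addrs`, and the host names are hypothetical stand-ins, not the contents of the real cockroachdb.sh. The first start in the log reached the node banner with the same value, so the trailing comma does not by itself look like the cause.

```shell
#!/usr/bin/env bash
set -o errexit
set -o pipefail

# Hypothetical stand-in for /opt/oxide/internal-dns-cli/bin/dnswait, which
# in the trace appears to print one host:port per line. Placeholder names.
fake_dnswait() {
    printf '%s\n' \
        'node-a.example.internal.:32221' \
        'node-b.example.internal.:32221' \
        'node-c.example.internal.:32221'
}

# Same pipeline shape as the trace: keep at most five lines, then turn
# every newline (including the last one) into a comma.
build_join_addrs() {
    fake_dnswait | head -n 5 | tr '\n' ,
}

JOIN_ADDRS="$(build_join_addrs)"

# Mirrors the [[ -z ... ]] check seen in the trace; what the real script
# does when the value is empty is an assumption here.
if [[ -z "$JOIN_ADDRS" ]]; then
    echo "no join addresses resolved" >&2
    exit 1
fi

echo "$JOIN_ADDRS"
```

Running this prints the three placeholder addresses joined by commas, with one extra comma at the end, matching the form of the value in the log.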
