This repository was archived by the owner on Feb 1, 2022. It is now read-only.
When we submit a job and assign it to a namespace, the job does not work.
We submit it like this:
"
kubectl create -f xgboost-operator/config/samples/xgboost-dist/xgboostjob_v1_iris_train.yaml -n aisys
"
and get the following error:
"
starting the train job
starting to extract system env
extract the Rabit env from cluster : xgboost-dist-iris-test-train-master-0, port: 9991, rank: 0, word_size: 3
start the master node
start listen on 0.0.0.0:9991
RabitTracker Setup Finished
Rabit rank setup with below envs
DMLC_NUM_WORKER=3
DMLC_TRACKER_URI=xgboost-dist-iris-test-train-master-0
DMLC_TRACKER_PORT=9991
DMLC_TASK_ID=0
retry connect to ip(retry time 1): [xgboost-dist-iris-test-train-master-0]
retry connect to ip(retry time 2): [xgboost-dist-iris-test-train-master-0]
retry connect to ip(retry time 3): [xgboost-dist-iris-test-train-master-0]
retry connect to ip(retry time 4): [xgboost-dist-iris-test-train-master-0]
connect to (failed): [xgboost-dist-iris-test-train-master-0]
Socket Connect Error:Connection refused
"
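The log ends in "Connection refused". At the TCP level, that usually means the tracker name resolved to some address but nothing accepted a connection on that address and port, which is a different failure from the name not resolving at all. A small probe run from inside the master pod (for example via `kubectl exec`) can tell the two cases apart. The sketch below is a hypothetical diagnostic helper: `probe_tracker` is not part of xgboost-operator or Rabit, and it only reuses the `DMLC_TRACKER_URI` and `DMLC_TRACKER_PORT` values printed in the log above.

```python
import socket
import time


def probe_tracker(host: str, port: int, retries: int = 4, delay: float = 1.0) -> str:
    """Hypothetical helper: classify why a TCP connection to the tracker fails.

    Returns "ok", "dns_error", "refused", "timeout", or "error".
    """
    # Stage 1: name resolution. Failing here means the host name
    # (e.g. a bare pod name) cannot be resolved from this pod.
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_error"

    # Stage 2: TCP connect with a fixed number of retries, loosely
    # mirroring the "retry connect" lines in the log.
    last = "error"
    for attempt in range(1, retries + 1):
        for family, socktype, proto, _, sockaddr in addrs:
            try:
                with socket.socket(family, socktype, proto) as s:
                    s.settimeout(delay)
                    s.connect(sockaddr)
                    return "ok"
            except ConnectionRefusedError:
                # Address reached, but nothing is listening on the port.
                last = "refused"
            except socket.timeout:
                last = "timeout"
            except OSError:
                last = "error"
        if attempt < retries:
            time.sleep(delay)
    return last
```

Inside the pod it could be called as `probe_tracker(os.environ["DMLC_TRACKER_URI"], int(os.environ["DMLC_TRACKER_PORT"]))`; a `"dns_error"` result would point at name resolution, while `"refused"` would match the log and point at what the name resolves to or whether the tracker is listening there.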