This repository was archived by the owner on Feb 1, 2022. It is now read-only.

cannot work in namespace #121

@daniel985

Description

When we submit a job and assign it a namespace, the job does not work.
We submit it like this:
"
kubectl create -f xgboost-operator/config/samples/xgboost-dist/xgboostjob_v1_iris_train.yaml -n aisys
"

The error message is:
"
starting the train job
starting to extract system env
extract the Rabit env from cluster : xgboost-dist-iris-test-train-master-0, port: 9991, rank: 0, word_size: 3
start the master node
start listen on 0.0.0.0:9991

RabitTracker Setup Finished
Rabit rank setup with below envs

DMLC_NUM_WORKER=3
DMLC_TRACKER_URI=xgboost-dist-iris-test-train-master-0
DMLC_TRACKER_PORT=9991
DMLC_TASK_ID=0
retry connect to ip(retry time 1): [xgboost-dist-iris-test-train-master-0]
retry connect to ip(retry time 2): [xgboost-dist-iris-test-train-master-0]
retry connect to ip(retry time 3): [xgboost-dist-iris-test-train-master-0]
retry connect to ip(retry time 4): [xgboost-dist-iris-test-train-master-0]
connect to (failed): [xgboost-dist-iris-test-train-master-0]
Socket Connect Error:Connection refused
"
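One way to narrow this down is to check, from inside a worker pod, whether the tracker address resolves and accepts connections on port 9991. The Python sketch below is not part of xgboost-operator. It assumes the tracker host is a Kubernetes Service in the job's namespace, that the cluster uses the default `cluster.local` domain, and that a hypothetical `POD_NAMESPACE` environment variable is set (for example through the downward API). `DMLC_TRACKER_URI` and `DMLC_TRACKER_PORT` are the variables printed in the log above.

```python
import os
import socket
import time


def tracker_fqdn(host: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Qualify a short Service name with its namespace.

    Assumes `host` names a Service in `namespace` and that the cluster
    domain is `cluster_domain` (default `cluster.local`, an assumption).
    Names that already contain a dot are returned unchanged.
    """
    if "." in host:
        return host
    return f"{host}.{namespace}.svc.{cluster_domain}"


def probe_tracker(host: str, port: int, retries: int = 4, delay: float = 1.0) -> bool:
    """Try a TCP connection to host:port up to `retries` times."""
    for attempt in range(1, retries + 1):
        try:
            # create_connection resolves the name and opens a TCP socket
            with socket.create_connection((host, port), timeout=delay):
                return True
        except OSError as err:  # covers DNS failures, refusals and timeouts
            print(f"connect to {host}:{port} failed (attempt {attempt}): {err}")
            if attempt < retries:
                time.sleep(delay)
    return False


def main() -> None:
    host = os.environ.get("DMLC_TRACKER_URI")
    if host is None:
        print("DMLC_TRACKER_URI is not set; run this inside a training pod")
        return
    port = int(os.environ.get("DMLC_TRACKER_PORT", "9991"))
    # POD_NAMESPACE is hypothetical; inject it yourself if you need it
    namespace = os.environ.get("POD_NAMESPACE", "default")
    for target in (host, tracker_fqdn(host, namespace)):
        status = "reachable" if probe_tracker(target, port) else "unreachable"
        print(f"{target}:{port} {status}")


if __name__ == "__main__":
    main()
```

Comparing the short name with the namespace-qualified name may help tell a name-resolution problem apart from the tracker simply not listening yet when the workers connect.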

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions