A potential refinement to the documentation #123

When I started deploying xgboost-operator on my Kubeflow cluster, I used https://github.com/kubeflow/xgboost-operator/blob/master/config/samples/xgboost-dist/utils.py#L47 as a reference to implement my own function for reading my own data. Following that function, it is natural to manually read only the part of the whole dataset that corresponds to the current rank.

However, I found that DMatrix already has internal logic that reads only part of the data when it detects distributed mode. Combined with my manual split, the data was partitioned twice, so each rank read only 1/(N*N) of the data instead of 1/N, where N is the number of ranks.
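To make the arithmetic concrete, here is a minimal sketch in plain Python that only simulates the effect. It does not call xgboost. The helper names `shard` and `fraction_read` are hypothetical, and the interleaved split (`rows[rank::world_size]`) is just an assumed stand-in for whatever partitioning the loader actually performs internally.

```python
def shard(rows, rank, world_size):
    """Keep every world_size-th row, starting at offset rank (assumed split scheme)."""
    return rows[rank::world_size]


def fraction_read(total_rows, rank, world_size, manual_split):
    """Return the fraction of the dataset that one rank ends up with."""
    rows = list(range(total_rows))
    if manual_split:
        # Manual per-rank split done in user code before building the matrix.
        rows = shard(rows, rank, world_size)
    # Stand-in for the loader's own per-rank split in distributed mode.
    rows = shard(rows, rank, world_size)
    return len(rows) / total_rows


N = 4          # number of ranks
TOTAL = 1600   # number of rows in the full dataset

double_split = [fraction_read(TOTAL, r, N, manual_split=True) for r in range(N)]
single_split = [fraction_read(TOTAL, r, N, manual_split=False) for r in range(N)]

print("manual + internal split:", double_split)
print("internal split only:   ", single_split)
```

With 4 ranks, the double split leaves each rank with 1/16 of the rows, while the internal split alone gives each rank the intended 1/4.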

I think it would be better to add a comment to that function warning users who rewrite it that they should not split the data by rank again.
