
Simplified the metrics #3182

Merged
google-oss-prow[bot] merged 13 commits into kubeflow:master from kunal-511:improve-metrics on Jul 15, 2025

@kunal-511

Contributor

Pull Request Template for Kubeflow Manifests

✏️ Summary of Changes

  1. Separated Knative from Istio.
  2. Created a "Storage & Experimental" category for SeaweedFS, Ray, and other experimental components.

✅ Contributor Checklist

  • I have tested these changes with kustomize. See Installation Prerequisites.
  • All commits are signed-off to satisfy the DCO check.
  • I have considered adding my company to the adopters page to support Kubeflow and help the community, since I expect help from the community for my issue (see 1. and 2.).

You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is #kubeflow-platform.

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Comment thread tests/metrics-server_resource_table.py Outdated
Comment on lines +301 to +308
print("- **Other**: Miscellaneous Kubeflow components including:")
print(" - PVC Viewer Controller")
print(" - Tensorboard Controller and Web App")
print(" - Multi-tenancy and RBAC components")
print(" - Network policies")
print(" - Custom resources and CRDs")
print(" - User namespace resources")
print(" - Additional Kubeflow utilities and tools")

@juliusvonkohout Jul 4, 2025

Member

We already list the components at https://github.com/kubeflow/manifests#kubeflow-components-versions. They should be covered in the same detail here, not lumped under "Other". I would even extend that table with more columns, merge the two tables, and remove https://github.com/kubeflow/manifests#resource-usage-by-components.

You can ignore NetworkPolicies and other resources without CPU, memory, or storage usage anyway.
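A minimal sketch of that filtering, assuming the script iterates over manifest documents loaded as dicts (e.g. via `yaml.safe_load_all`). The set `RESOURCE_BEARING_KINDS`, the function name, and the resource names are hypothetical, not taken from `tests/metrics-server_resource_table.py`:

```python
# Hypothetical set of kinds that can carry CPU/memory requests or storage.
# NetworkPolicies, RBAC objects, CRDs, etc. are skipped because they add
# nothing to the CPU, memory, or storage totals.
RESOURCE_BEARING_KINDS = {
    "Deployment",
    "StatefulSet",
    "DaemonSet",
    "Job",
    "PersistentVolumeClaim",
}


def resource_bearing_docs(docs):
    """Yield only the manifest documents that can contribute CPU, memory, or storage."""
    for doc in docs:
        # yaml.safe_load_all yields None for empty documents, so check the type.
        if isinstance(doc, dict) and doc.get("kind") in RESOURCE_BEARING_KINDS:
            yield doc


docs = [
    {"kind": "Deployment", "metadata": {"name": "example-deployment"}},
    {"kind": "NetworkPolicy", "metadata": {"name": "example-policy"}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "example-pvc"}},
    None,
]
print([d["metadata"]["name"] for d in resource_bearing_docs(docs)])
# → ['example-deployment', 'example-pvc']
```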

Member

Please update the table in the main README after fixing the other open issues.

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Comment thread tests/metrics-server_resource_table.py Outdated
'keywords': ['pvcviewer']
},
'Storage & Experimental': {
'keywords': ['seaweedfs', 'ray', 'minio-tenant']

Member

What do you mean by minio-tenant? I think you can just ignore the experimental folder for now entirely.

Member

SeaweedFS can also be counted toward KFP if needed.
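One way to act on both comments is a first-match keyword map in which SeaweedFS (and Metadata, which the review below also assigns to KFP) counts toward Kubeflow Pipelines, while nothing from the experimental folder is mapped. This is a sketch only; `CATEGORY_KEYWORDS` and `categorize` are hypothetical and the keyword lists are illustrative:

```python
# Hypothetical first-match keyword map. SeaweedFS is folded into Kubeflow
# Pipelines and nothing from the experimental folder (e.g. Ray) is mapped,
# so those resources fall through to "Uncategorized".
CATEGORY_KEYWORDS = {
    "Kubeflow Pipelines": ["ml-pipeline", "metadata", "seaweedfs", "minio"],
    "Katib": ["katib"],
    "PVC Viewer": ["pvcviewer"],
}


def categorize(resource_name: str) -> str:
    """Return the first category with a keyword contained in the resource name."""
    lowered = resource_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Uncategorized"


print(categorize("seaweedfs"))         # Kubeflow Pipelines
print(categorize("katib-controller"))  # Katib
print(categorize("ray-operator"))      # Uncategorized
```

Dict insertion order decides which category wins when a name matches several keyword lists, so more specific categories should be listed first.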

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Comment thread tests/metrics-server_resource_table.py Outdated
# Storage fallback values when YAML parsing is unavailable
STORAGE_FALLBACK = {
'Katib': 3,
'Metadata': 40,

Member

Metadata also belongs to KFP. Please also check the storage numbers manually; I remember the total being under 30 GB or so. Since a deployment uses either MinIO or SeaweedFS, never both, please do not count them twice.
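To avoid the double counting, the total can include only the object store that is actually deployed. A minimal sketch; `total_storage_gb` is a hypothetical helper and the per-component figures are placeholders, not measured values:

```python
def total_storage_gb(component_storage: dict, object_store: str) -> float:
    """Sum storage, counting only the deployed object store (MinIO XOR SeaweedFS)."""
    if object_store not in ("minio", "seaweedfs"):
        raise ValueError(f"unknown object store: {object_store!r}")
    # The object store that is NOT deployed is excluded from the sum.
    inactive = "seaweedfs" if object_store == "minio" else "minio"
    return sum(gb for name, gb in component_storage.items() if name != inactive)


# Placeholder figures in GB, for illustration only.
storage = {"katib": 3.0, "mysql": 5.0, "minio": 10.0, "seaweedfs": 10.0}
print(total_storage_gb(storage, "seaweedfs"))  # 18.0
```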

@kunal-511 Jul 9, 2025

Contributor Author

I checked it manually; it is 25 GB.

kunal-511 added 2 commits July 9, 2025 18:29
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Comment thread README.md Outdated
| Istio | 850m | 2464Mi | 0GB |
| Katib | 4m | 107Mi | 3GB |
| Kubeflow Core | 17m | 828Mi | 0GB |
| Training Operator | 2m | 27Mi | 0GB |

Member

As discussed please remove this table and merge the columns with the existing table at the beginning of the document.

Member

I think KServe and a few other components are missing numbers in the table. Let's also merge Istio, Knative, etc. into the same table.

kunal-511 and others added 6 commits July 15, 2025 00:04
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>

Member

Please separate out Dex and oauth2-proxy in a follow-up PR. I have already separated them in the table in the README.

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
@juliusvonkohout

Member

Ah, and the numbers for the Tensorboards Web Application are missing from the table.

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Comment thread README.md
| Profiles + KFAM | applications/profiles/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/profile-controller/config) | 7m | 129Mi | 0GB |
| PodDefaults Webhook | applications/admission-webhook/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/admission-webhook/manifests) | 1m | 14Mi | 0GB |
| Jupyter Web Application | applications/jupyter/jupyter-web-app/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/crud-web-apps/jupyter/manifests) | 4m | 231Mi | 0GB |
| Tensorboards Web Application | applications/tensorboard/tensorboards-web-app/upstream | [v1.10.0](https://github.com/kubeflow/kubeflow/tree/v1.10.0/components/crud-web-apps/tensorboards/manifests) | | | |

Member

Where are the values for Tensorboards?

Comment thread README.md
| Katib | applications/katib/upstream | [v0.18.0](https://github.com/kubeflow/katib/tree/v0.18.0/manifests/v1beta1) | 13m | 476Mi | 13GB |
| KServe | applications/kserve/kserve | [v0.15.0](https://github.com/kserve/kserve/releases/tag/v0.15.0/install/v0.15.0) | 600m | 1200Mi | 0GB |
| KServe Models Web Application | applications/kserve/models-web-app | [v0.14.0](https://github.com/kserve/models-web-app/tree/v0.14.0/config) | 6m | 259Mi | 0GB |
| Kubeflow Pipelines | applications/pipeline/upstream | [2.5.0](https://github.com/kubeflow/pipelines/tree/2.5.0/manifests/kustomize) | 970m | 3552Mi | 100GB |

Member

100 GB for Pipelines does not seem correct.

Comment thread README.md
| KServe | applications/kserve/kserve | [v0.15.0](https://github.com/kserve/kserve/releases/tag/v0.15.0/install/v0.15.0) | 600m | 1200Mi | 0GB |
| KServe Models Web Application | applications/kserve/models-web-app | [v0.14.0](https://github.com/kserve/models-web-app/tree/v0.14.0/config) | 6m | 259Mi | 0GB |
| Kubeflow Pipelines | applications/pipeline/upstream | [2.5.0](https://github.com/kubeflow/pipelines/tree/2.5.0/manifests/kustomize) | 970m | 3552Mi | 100GB |
| Kubeflow Model Registry | applications/model-registry/upstream | [v0.2.19](https://github.com/kubeflow/model-registry/tree/v0.2.19/manifests/kustomize) | 510m | 2112Mi | 20GB |

Member

Is 20 GB for Model Registry correct?

if kind == 'PersistentVolumeClaim':
storage_str = doc.get('spec', {}).get('resources', {}).get('requests', {}).get('storage', '0')
storage_gb = parse_resource_value(storage_str, 'storage')
component_storage[component] += storage_gb
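For reference, a standalone sketch of converting a PVC `storage` request to GiB. It is not the script's `parse_resource_value`; `storage_to_gib` is a hypothetical helper that assumes Kubernetes quantity notation with binary (`Ki`, `Mi`, `Gi`, `Ti`) or decimal (`k`, `M`, `G`, `T`) suffixes, and it rejects exponent forms such as `1e9` as well as `Pi`/`Ei`:

```python
import re

# Multipliers to bytes for the supported Kubernetes quantity suffixes.
_SUFFIX_BYTES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "": 1,
}
_QUANTITY = re.compile(r"^([0-9]+(?:\.[0-9]+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$")


def storage_to_gib(quantity: str) -> float:
    """Convert a PVC storage request such as '20Gi' or '512Mi' to GiB."""
    match = _QUANTITY.match(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported storage quantity: {quantity!r}")
    number, suffix = match.groups()
    return float(number) * _SUFFIX_BYTES[suffix or ""] / 2**30


print(storage_to_gib("20Gi"))   # 20.0
print(storage_to_gib("512Mi"))  # 0.5
```

Note that `20G` (decimal) is about 18.6 GiB, so mixing the two suffix families without converting skews the totals.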

Member

Please compare the storage with the live PVCs on the cluster. 100 GB for Pipelines is not right.
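A minimal sketch for that comparison, assuming `kubectl` access to the cluster: it sums `status.capacity.storage` of the bound PVCs per namespace from `kubectl get pvc --all-namespaces -o json`. The helper names are hypothetical, and `to_gib` assumes live capacities use binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`); anything else raises an error rather than being guessed:

```python
import json
import subprocess
from collections import defaultdict

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_gib(quantity: str) -> float:
    """Convert a binary-suffixed quantity like '20Gi' to GiB (other forms are rejected)."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor / 2**30
    raise ValueError(f"unsupported quantity: {quantity!r}")


def live_pvc_gib_by_namespace(pvc_list: dict) -> dict:
    """Sum status.capacity.storage per namespace from `kubectl get pvc -o json` output."""
    totals = defaultdict(float)
    for item in pvc_list.get("items", []):
        capacity = item.get("status", {}).get("capacity", {}).get("storage")
        if capacity is not None:  # unbound PVCs have no capacity yet
            totals[item["metadata"]["namespace"]] += to_gib(capacity)
    return dict(totals)


def fetch_live_pvcs() -> dict:
    """Run kubectl against the current context and return the parsed PVC list."""
    out = subprocess.run(
        ["kubectl", "get", "pvc", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Usage against a live cluster: `print(live_pvc_gib_by_namespace(fetch_live_pvcs()))`, then compare the `kubeflow` entry with the Pipelines figure in the table.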

@juliusvonkohout

Member

/lgtm
/approve

@google-oss-prow


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: juliusvonkohout

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout

juliusvonkohout commented Jul 15, 2025

Member

For the follow-up PR:

  1. Please compare the storage with the live PVCs on the cluster. 100 GB for Pipelines is not right.
  2. 100 GB for Pipelines does not seem correct.
  3. Is 20 GB for Model Registry correct?
  4. The total of 134 GB does not look correct.
  5. Where are the values for Tensorboards?
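For item 4, the Storage total can be recomputed directly from the README component table. A sketch, assuming Storage is the last column and formatted as `<number>GB` like the rows quoted above; `total_storage_from_table` is hypothetical, the sample rows are abbreviated copies of those rows (link column shortened), and rows with an empty Storage cell (e.g. Tensorboards) are skipped:

```python
def total_storage_from_table(markdown_rows: list[str]) -> float:
    """Sum the last column ('<number>GB') of README component-table rows."""
    total = 0.0
    for row in markdown_rows:
        # Drop the outer pipes, then split into trimmed cells.
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        storage = cells[-1]
        if storage.endswith("GB"):
            total += float(storage[:-2] or 0)
    return total


rows = [
    "| Katib | applications/katib/upstream | v0.18.0 | 13m | 476Mi | 13GB |",
    "| KServe | applications/kserve/kserve | v0.15.0 | 600m | 1200Mi | 0GB |",
    "| KServe Models Web Application | applications/kserve/models-web-app | v0.14.0 | 6m | 259Mi | 0GB |",
    "| Kubeflow Pipelines | applications/pipeline/upstream | 2.5.0 | 970m | 3552Mi | 100GB |",
    "| Kubeflow Model Registry | applications/model-registry/upstream | v0.2.19 | 510m | 2112Mi | 20GB |",
    "| Tensorboards Web Application | applications/tensorboard/tensorboards-web-app/upstream | v1.10.0 | | | |",
]
print(total_storage_from_table(rows))  # 133.0
```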

@google-oss-prow[bot] merged commit a4bf1ee into kubeflow:master on Jul 15, 2025
7 of 8 checks passed
@kunal-511 mentioned this pull request on Jul 17, 2025
3 tasks