Skip to content
This repository was archived by the owner on Jul 7, 2025. It is now read-only.

fix(kfserving): add istio local gateway deployment in istio-system#177

Closed
jal06 wants to merge 4 commits into
GoogleCloudPlatform:masterfrom
jal06:master
Closed

fix(kfserving): add istio local gateway deployment in istio-system#177
jal06 wants to merge 4 commits into
GoogleCloudPlatform:masterfrom
jal06:master

Conversation

@jal06

@jal06 jal06 commented Nov 30, 2020

Copy link
Copy Markdown

For solving issue 176

@k8s-ci-robot

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jal06
To complete the pull request process, please assign bobgy after the PR has been reviewed.
You can assign the PR to them by writing /assign @bobgy in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Copy link
Copy Markdown

Hi @jal06. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Bobgy Bobgy left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix!

kind: Kustomization
namespace: istio-system
resources:
- ../../../upstream/manifests/istio/cluster-local-gateway/base

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jal06, if the resource doesn't need any user modification, we can skip putting it in instance folder. You can just call kustomize build from the upstream folder directly.

@Bobgy Bobgy changed the title Add istio local gateway deployment in istio-system fix(kfserving): add istio local gateway deployment in istio-system Dec 7, 2020
@jal06

jal06 commented Dec 8, 2020

Copy link
Copy Markdown
Author

@Bobgy,

Please could you elaborate more ?
From my understanding,

  • the step hydrate-kubeflow does a copy of the manifests from instance to upstream.
  • The step apply-kubeflow does a copy of the manifests from upstream to .build before applying the manifests from .build

If I just call kustomize build in the step apply-kubeflow without hydrating cluster-local-gateway, I get the error below during the step make apply

error: the path "./.build/cluster-local-gateway" does not exist
Makefile:191: recipe for target 'apply-kubeflow' failed
make: *** [apply-kubeflow] Error 1

That's why I don't understand how to skip putting cluster-local-gateway in instance.

@Bobgy

Bobgy commented Dec 8, 2020

Copy link
Copy Markdown
Contributor

/ok-to-test

Comment thread kubeflow/Makefile Outdated
mkdir -p $(BUILD_DIR)/kubeflow-issuer
kustomize build --load_restrictor none -o $(BUILD_DIR)/kubeflow-issuer $(KF_DIR)/kubeflow-issuer
mkdir -p $(BUILD_DIR)/cluster-local-gateway
kustomize build --load_restrictor none -o $(BUILD_DIR)/cluster-local-gateway $(KF_DIR)/cluster-local-gateway

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jal06 I was suggesting changing this line to

kustomize build --load_restrictor none -o $(BUILD_DIR)/cluster-local-gateway $(MANIFESTS_DIR)/path/to/cluster-local-gateway

because there's no need to add user customizations to that package.

@jal06 jal06 Dec 8, 2020

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bobgy,

@jal06 I was suggesting changing this line to

I didn't test it, but I am not sure it could work because make apply starts by cleaning .build so cluster-local-gateway would be cleaned before being applied.

.PHONY: apply
apply: clean-build check-name check-iap apply-gcp wait-gcp create-ctxt apply-asm apply-kubeflow iap-secret

...

clean-build:
	# Delete build because we want to prune any resources which are no longer defined in the manifests
	rm -rf $(BUILD_DIR)/
	mkdir -p $(BUILD_DIR)/

Moreover, IMHO, when putting cluster-local-gateway in instance, the process of cluster-local-gateway is consistent with the process of other applications (kubeflow-issuer, metacontroller, kubeflow, ...) so it could be easier to maintain.
WDYT ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bobgy, I am sorry, I missed th commit of cluster-local-gateway/kustomization.taml. I think it is OK now

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, consistency is a good point. Let me think about it more.

I was previously suggesting directly refer to upstream instead, because everything in instance folder needs user's manual maintenance (especially when upgrading), so if something doesn't need any customization, I preferred not putting it in instance folder.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if something doesn't need any customization, I preferred not putting it in instance folder.

Agree; that makes sense
But IMHO deleting build because we want to prune any resources which are no longer defined in the manifests makes also sense and I don't see any easy solution to do it.

Let me think about it more.

OK; thanks

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jal06
I guess you did not understand how this works

kustomize build --load_restrictor none -o $(BUILD_DIR)/cluster-local-gateway $(MANIFESTS_DIR)/path/to/cluster-local-gateway

Note that the -o $(BUILD_DIR)/cluster-local-gateway part is the output, and the $(MANIFESTS_DIR)/path/to/cluster-local-gateway part is the input.

I suggested changing $(MANIFESTS_DIR)/path/to/cluster-local-gateway to ....upstream/path/to/cluster-local-gateway, the command will hydrate it everytime you apply to build folder. So that essentially means changing source of the hydration from instance to upstream, but build folder will always be hydrated each time. There won't be a problem there.

@Bobgy

Bobgy commented Dec 8, 2020

Copy link
Copy Markdown
Contributor

Let me clarify what the two steps are doing.

  • the step hydrate-kubeflow does a copy of the manifests from instance to upstream.

The step hydrate-kubeflow runs kustomize on instance/* folders (right now) and generates hydrated manifests to .build/*. (but it doesn't matter where the source of the kustomize folder lives, we can change each kustomize command)

  • The step apply-kubeflow does a copy of the manifests from upstream to .build before applying the manifests from .build

The step apply-kubeflow applies resources in .build/* to your cluster.

@Bobgy

Bobgy commented Dec 8, 2020

Copy link
Copy Markdown
Contributor

/retest

1 similar comment
@jal06

jal06 commented Dec 8, 2020

Copy link
Copy Markdown
Author

/retest

@jal06

jal06 commented Dec 9, 2020

Copy link
Copy Markdown
Author

/retest

1 similar comment
@jal06

jal06 commented Dec 9, 2020

Copy link
Copy Markdown
Author

/retest

@Bobgy

Bobgy commented Dec 11, 2020

Copy link
Copy Markdown
Contributor

presubmit test (not related to your change) is failing, I'll try to fix it next week
See #180

@jal06

jal06 commented Dec 15, 2020

Copy link
Copy Markdown
Author

/retest

@k8s-ci-robot

Copy link
Copy Markdown

@jal06: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
kubeflow-gcp-blueprints-presubmit d7b45c0 link /test kubeflow-gcp-blueprints-presubmit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jal06

jal06 commented Dec 15, 2020

Copy link
Copy Markdown
Author

@Bobgy,

Thanks for your explanations.
I think it is OK now (and I understand better how it works).

FYI, I did a test with Kubeflow 1.2.
In version 1.2, there is a new manifest in base_v3 subfolder for cluster-local-gateway (issue 1426 ).
So I also kustomized using kustomization.yaml base_v3.

Everything looks good but I get an error "error while evaluating the ingress spec: BackendConfig istio-system/iap-backendconfig is not valid: error retrieving secret kubeflow-oauth: secrets "kubeflow-oauth" not found; BackendConfig istio-system/iap-backendconfig is not valid: error retrieving secret kubeflow-oauth: secrets "kubeflow-oauth" not found"

Maybe this error is because I kustomized cluster-local-gateway from base_v3 ?
Do we need to kustomize from base or from base_v3 ?
WDYT ?

CaptureErrEnvoyIngress

@jal06

jal06 commented Dec 15, 2020

Copy link
Copy Markdown
Author

Humm, after a while (several minutes) the error above disappeared. Everything is green

@Bobgy

Bobgy commented Dec 16, 2020

Copy link
Copy Markdown
Contributor

Thanks! base_v3 is correct

@Bobgy

Bobgy commented Dec 16, 2020

Copy link
Copy Markdown
Contributor

I'll fix the test infra failure before merging this PR

@Bobgy

Bobgy commented Jan 7, 2021

Copy link
Copy Markdown
Contributor

Hi @jal06, I think the PR is not the idiomatic way to install cluster-local-gateway with istio/asm (anthos service mesh).

Because looking at https://github.com/knative/docs/blob/master/docs/install/installing-istio.md#installing-istio-without-sidecar-injection (strange that it's different from https://knative.dev/docs/install/installing-istio/#installing-istio-without-sidecar-injection),
there's a declarative configuration in IstioOperator that allows installing cluster-local-gateway:

  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
      - name: cluster-local-gateway
        enabled: true
        label:
          istio: cluster-local-gateway
          app: cluster-local-gateway
        k8s:
          service:
            type: ClusterIP
            ports:
            - port: 15020
              name: status-port
            - port: 80
              targetPort: 8080
              name: http2
            - port: 443
              targetPort: 8443
              name: https

This seems to be the correct approach when installing istio using operator. Similarly, we are installing asm using the operator per definition in https://github.com/kubeflow/manifests/blob/master/gcp/v2/asm/istio-operator.yaml.
Therefore, we should include a similar ingressGateway definition in that operator. This will ensure the added cluster-local-gateway work with the entire service mesh.

and I suspect that installing cluster-local-gateway using kubeflow/manifests installed a gateway with old istio versions not designed for ASM, that's why it fails IAP authentication in your later experimentation.

@Bobgy

Bobgy commented Jan 7, 2021

Copy link
Copy Markdown
Contributor

but a problem I realized is that, asm installation is at 1.4 version currently, and in that version istio was using a different spec for istio operator definition file. I think we should probably first figure out asm upgrade path and directly use the latest format.

@jal06

jal06 commented Jan 8, 2021

Copy link
Copy Markdown
Author

@Bobgy, thanks for your feedback.
I am happy to know why my later experimentation fails with IAP authentication.

Do you think it will be possible to upgrade asm and set the idiomatic way to install cluster-local-gateway for the next version of Kubeflow (1.3 ) ?
I am not sure I could help for that (lack of expertise and time) but I would be more than happy to test it and use it when this will be available

About asm version, I confirm I am using 1.4.10 (the recommended version in the documentation).

BTW, when trying to debug kfserving, I realised that Kiali could help a lot; but Kiali is not installed by default. Is it possible to add Kiali by default in the manifests ?

istioctl version
client version: 1.4.10-asm.18
cluster-local-gateway version: 
cluster-local-gateway version: 
galley version: 1.4.10-asm.15
galley version: 1.4.10-asm.15
ingressgateway version: 1.4.10-asm.15
ingressgateway version: 1.4.10-asm.15
nodeagent version: 
nodeagent version: 
pilot version: 1.4.10-asm.15
pilot version: 1.4.10-asm.15
sidecar-injector version: 1.4.10-asm.15
sidecar-injector version: 1.4.10-asm.15
data plane version: 1.4.10-asm.15 (20 proxies), 1.1.6 (2 proxies)

@Bobgy

Bobgy commented Jan 13, 2021

Copy link
Copy Markdown
Contributor

Do you think it will be possible to upgrade asm and set the idiomatic way to install cluster-local-gateway for the next version of Kubeflow (1.3 ) ?

Yes, at least upgrading asm will be on the roadmap, and it should be a lot easier to add cluster-local-gateway by then

@Bobgy

Bobgy commented Jan 13, 2021

Copy link
Copy Markdown
Contributor

For Kiali, it sounds like an interesting tool to use. I haven't tried myself. In general, I think we'll only provide the minimal to let people run Kubeflow, they can choose to add tools they like

@jal06

jal06 commented Jan 13, 2021

Copy link
Copy Markdown
Author

@Bobgy, about Kiali, what you propose sounds relevant with the new policy of addons integration in Istio project : https://istio.io/latest/blog/2020/addon-rework/

Thanks !!!

@Bobgy

Bobgy commented Jan 23, 2021

Copy link
Copy Markdown
Contributor

Let's close this for now and I'll follow up during next release.

@edi-bice-by

Copy link
Copy Markdown
Contributor

@Bobgy: "Let's close this for now and I'll follow up during next release."

I deployed KF 1.3 using gcp-blueprints on GCP with IAP and am having a hard time trying to get anything kfserving (both from inside and outside the cluster) working.

Is the cluster-local-gateway still required?

@Bobgy

Bobgy commented May 19, 2021

Copy link
Copy Markdown
Contributor

/cc @zijianjoy

@zijianjoy

Copy link
Copy Markdown
Contributor

Is the cluster-local-gateway still required?

Hello @edi-bice-by , there are quite some changes on KFServing and its dependencies. KNative Serving has been upgraded to 0.22.0 to support KFServing, see requirement: https://github.com/kubeflow/kfserving#prerequisites. As the link described, Since Knative v0.19.0 cluster local gateway deployment has been removed and shared with ingress gateway, if you are on Knative version later than v0.19.0 and KFServing version older than v0.6.0 you should modify localGateway to knative-local-gateway and localGatewayService to knative-local-gateway.istio-system.svc.cluster.local in the inference service config.

@edi-bice-by

edi-bice-by commented May 25, 2021

Copy link
Copy Markdown
Contributor

@zijianjoy yes, thank you, I had already observed the difference in naming and modified accordingly, though was surprised that the blueprint which sources precisely just that combination described in the documentation is not doing the correct thing in the first place.

I'm still unable to query any of the kfserving gcp_iap samples (flower or sklearn) externally. More concerning, hence the post here, is that I cannot seem to be able to resolve hostnames (and query kfserving endpoints) from within the cluster either.

@zijianjoy

Copy link
Copy Markdown
Contributor

@edi-bice-by

Although it is out-of-date, you might be able to find some useful information about querying KFServing endpoint using this sample: https://github.com/kubeflow/kfserving/tree/master/docs/samples/gcp-iap. For my attempt to make this endpoint available externally, see #209 (comment).

More specifically, I made relatively big change in this file for authorization https://github.com/kubeflow/kubeflow/blob/master/docs/gke/iap_request.py. The way I made the change is by creating desktop app client ID and client secret: https://cloud.google.com/iap/docs/authentication-howto#authenticating_from_a_desktop_app.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants