odo push recreates the pod if scaled to 0 #5315
Conversation
```diff
@@ -529,7 +529,7 @@ func (a *Adapter) createOrUpdateComponent(componentExists bool, ei envinfo.EnvSp
 	if err != nil {
 		return err
 	}

+	deployment.Spec.Replicas = pointer.Int32Ptr(1)
```
It would be nice to have this field set when the Deployment is first created, by the devfile library. Could we propose a patch to the library to add `Replicas` to the `generator.DeploymentParams` structure?
By setting it at the library level, it becomes applicable to the Che folks as well, and I'm not sure that's what they would want. Otherwise, I think it could be patched at the library level.
IMO, we could track this separately (an issue on the devfile repo, or a discussion in the Devfile cabal). WDYT?
Through the `generator.DeploymentParams` structure, it could be an optional parameter, so as not to impact other users of the library.
```go
When("the Deployment's Replica count (pods) is scaled to 0", func() {
	BeforeEach(func() {
		commonVar.CliRunner.ScalePodToZero(cmpName, "app")
```
I think you will need to wait for the pod to be terminated before continuing, or the scale command won't have time to take effect.
```diff
@@ -127,7 +126,7 @@ func (a Adapter) Push(parameters common.PushParameters) (err error) {
 	var podName string

 	// If the component already exists, retrieve the pod's name before it's potentially updated
-	if componentExists {
+	if componentExists && *a.deployment.Spec.Replicas == 1 {
```
This is very specific to this issue, and is not generic enough to cover all the cases where a pod is not present.
I read more of this Push() function, and now understand that this call to getPod() is necessary to get the name of the previous pod (the one we synced files to previously), so it can be compared later (https://github.com/redhat-developer/odo/pull/5315/files#diff-674dc676ed2f8aa2588ede2db4112fe412bfa0365f7b0f7303d5ae407582d2aaL271) with the pod we will sync the files to.
If the pod was restarted, all the files need to be synced; otherwise, only changed files need to be synced.
Instead of testing the Deployment's Replicas specifically, you could react differently if `getPod` on line 131 fails. Instead of returning an error, you could continue and make sure that `podChanged` is set to true (since it is certain we will need to sync all the files again), for example by keeping `podName` as the empty string.
Also, could you please rename the variable `componentExists` to `deploymentExists`, to make the flow easier to understand?
> I read more of this Push() function, and now understand that this call to getPod() is necessary to get the name of the previous pod (the one we synced files to previously), so it can be compared later (https://github.com/redhat-developer/odo/pull/5315/files#diff-674dc676ed2f8aa2588ede2db4112fe412bfa0365f7b0f7303d5ae407582d2aaL271) with the pod we will sync the files to.

Yes, that's what I was saying while we were discussing this yesterday. But I was struggling to explain. :)

> Instead of testing the Deployment's Replicas specifically, you could react differently if `getPod` on line 131 fails. Instead of returning an error, you could continue and make sure that `podChanged` is set to true (since it is certain we will need to sync all the files again), for example by keeping `podName` as the empty string.

But for that to happen, the user will have to wait for whatever the value of `PushTimeout` is in the odo preferences. By default, it is 4 minutes. Also, the `getPod()` function calls `WaitAndGetPodWithEvents` in `pkg/kclient/pods.go`, which mandates that the desired phase of the pod be `Running`:
Lines 25 to 27 in 724f16e:

```go
// WaitAndGetPod block and waits until pod matching selector is in in Running state
// desiredPhase cannot be PodFailed or PodUnknown
func (c *Client) WaitAndGetPodWithEvents(selector string, desiredPhase corev1.PodPhase, waitMessage string, pushTimeout time.Duration) (*corev1.Pod, error) {
```
Now, I know that we can make changes to the code so it doesn't mandate that. But I'm honestly having a hard time and would appreciate some help. Using the debugger, I can't seem to get inside the goroutine below:
Lines 46 to 95 in 724f16e:

```go
// Here we are going to start a loop watching for the pod status
podChannel := make(chan *corev1.Pod)
watchErrorChannel := make(chan error)
failedEvents := make(map[string]corev1.Event)
go func() {
loop:
	for {
		val, ok := <-w.ResultChan()
		if !ok {
			watchErrorChannel <- errors.New("watch channel was closed")
			break loop
		}
		if e, ok := val.Object.(*corev1.Pod); ok {
			klog.V(3).Infof("Status of %s pod is %s", e.Name, e.Status.Phase)
			for _, cond := range e.Status.Conditions {
				// using this just for debugging message, so ignoring error on purpose
				jsonCond, _ := json.Marshal(cond)
				klog.V(3).Infof("Pod Conditions: %s", string(jsonCond))
			}
			for _, status := range e.Status.ContainerStatuses {
				// using this just for debugging message, so ignoring error on purpose
				jsonStatus, _ := json.Marshal(status)
				klog.V(3).Infof("Container Status: %s", string(jsonStatus))
			}
			switch e.Status.Phase {
			case desiredPhase:
				klog.V(3).Infof("Pod %s is %v", e.Name, desiredPhase)
				podChannel <- e
				break loop
			case corev1.PodFailed, corev1.PodUnknown:
				watchErrorChannel <- errors.Errorf("pod %s status %s", e.Name, e.Status.Phase)
				break loop
			default:
				// we start in a phase different from the desired one, let's wait
				if spinner == nil {
					spinner = log.Spinner(waitMessage)
					// Collect all the events in a separate go routine
					quit := make(chan int)
					go c.CollectEvents(selector, failedEvents, spinner, quit)
					defer close(quit)
				}
			}
		} else {
			watchErrorChannel <- errors.New("unable to convert event object to Pod")
			break loop
		}
	}
	close(podChannel)
	close(watchErrorChannel)
}()
```
Even though the default value for "Replicas" is 1, we explicitly set it so that the field for replicas is tracked by SSA and odo is made its manager.
`podName` was being fetched long before its use. This commit fetches it just before it is needed.
@dharmit: The following tests failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: feloy. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
/cherrypick v2
@dharmit: new pull request created: #5360
* Use SSA for replicas: Even though the default value for "Replicas" is 1, we explicitly set it so that the field for replicas is tracked by SSA and odo is made its manager.
* Fetch pod name just before it is needed: `podName` was being fetched long before its use. This commit fetches it just before it is needed.
* Fix shadow issue
* Tests should wait for pod termination
* Get pod name only if it exists
* Check if only one pod exists for the component
* Fix shadow error
* Expect error in oc get pod output for oc
What type of PR is this?
/kind bug
What does this PR do / why do we need it:
Refer to the issue being fixed.
Which issue(s) this PR fixes:
Fixes #5041
PR acceptance criteria:
- Unit test
- Integration test
- Documentation
- I have read the test guidelines
How to test changes / Special notes to the reviewer:
Refer to the description of the original issue for steps to test the changes.