odo push recreates the pod if scaled to 0 #5315
Conversation
```diff
@@ -529,7 +529,7 @@ func (a *Adapter) createOrUpdateComponent(componentExists bool, ei envinfo.EnvSp
 	if err != nil {
 		return err
 	}

+	deployment.Spec.Replicas = pointer.Int32Ptr(1)
```
It would be nice to have this field set when the Deployment is first created, by the devfile library. Could we propose a patch to the library to add `Replicas` to the `generator.DeploymentParams` structure?
By setting it at the library level, it becomes applicable to the Che folks as well, and I'm not sure that's what they would want. Otherwise, I think it could be patched at the library level.
IMO, we could track this separately (an issue on the devfile repo, or a discussion in the Devfile cabal). WDYT?
Through the `generator.DeploymentParams` structure, it could be an optional parameter, so as not to impact other users of the library.
```go
When("the Deployment's Replica count (pods) is scaled to 0", func() {
	BeforeEach(func() {
		commonVar.CliRunner.ScalePodToZero(cmpName, "app")
```
I think you will need to wait for the pod to be terminated before continuing, or the scale command won't have time to take effect.
```diff
@@ -127,7 +126,7 @@ func (a Adapter) Push(parameters common.PushParameters) (err error) {
 	var podName string

 	// If the component already exists, retrieve the pod's name before it's potentially updated
-	if componentExists {
+	if componentExists && *a.deployment.Spec.Replicas == 1 {
```
This is very specific to this issue, and is not generic enough to cover all the cases where a pod is not present.
I read more of this Push() function, and now understand that this call to getPod() is necessary to get the name of the previous pod (the one we synced files to previously), so it can be compared later (https://github.com/redhat-developer/odo/pull/5315/files#diff-674dc676ed2f8aa2588ede2db4112fe412bfa0365f7b0f7303d5ae407582d2aaL271) with the pod we will sync the files to.
If the pod was restarted, all the files need to be synced; otherwise, only changed files need to be synced.
Instead of testing the Deployment's Replicas specifically, you could react differently if `getPod` on line 131 fails. Instead of returning an error, you could continue and make sure that `podChanged` is set to true (since it is certain we will need to sync all the files again), for example by keeping `podName` as the empty string.
Also, could you please rename the variable `componentExists` to `deploymentExists`, to make the flow easier to understand?
> I read more of this Push() function, and now understand that this call to getPod() is necessary to get the name of the previous pod (the one we synced files to previously), so it can be compared later (https://github.com/redhat-developer/odo/pull/5315/files#diff-674dc676ed2f8aa2588ede2db4112fe412bfa0365f7b0f7303d5ae407582d2aaL271) with the pod we will sync the files to.

Yes, that's what I was saying while we were discussing this yesterday. But I was struggling to explain. :)

> Instead of testing the Deployment's Replicas specifically, you could react differently if `getPod` on line 131 fails. Instead of returning an error, you could continue and make sure that `podChanged` is set to true (since it is certain we will need to sync all the files again), for example by keeping `podName` as the empty string.

But for that to happen, the user will have to wait for whatever the value of `PushTimeout` is in the odo preferences. By default, it is 4 minutes. Also, the `getPod()` function calls `WaitAndGetPodWithEvents` in `pkg/kclient/pods.go`, which mandates that the desired phase of the pod be `Running`:
Lines 25 to 27 in 724f16e:

```go
// WaitAndGetPod block and waits until pod matching selector is in in Running state
// desiredPhase cannot be PodFailed or PodUnknown
func (c *Client) WaitAndGetPodWithEvents(selector string, desiredPhase corev1.PodPhase, waitMessage string, pushTimeout time.Duration) (*corev1.Pod, error) {
```
Now, I know that we can make changes to the code so it doesn't mandate that. But I'm honestly having a hard time and would appreciate some help. Using the debugger, I can't seem to get inside the goroutine below:
Lines 46 to 95 in 724f16e:

```go
// Here we are going to start a loop watching for the pod status
podChannel := make(chan *corev1.Pod)
watchErrorChannel := make(chan error)
failedEvents := make(map[string]corev1.Event)
go func() {
loop:
	for {
		val, ok := <-w.ResultChan()
		if !ok {
			watchErrorChannel <- errors.New("watch channel was closed")
			break loop
		}
		if e, ok := val.Object.(*corev1.Pod); ok {
			klog.V(3).Infof("Status of %s pod is %s", e.Name, e.Status.Phase)
			for _, cond := range e.Status.Conditions {
				// using this just for debugging message, so ignoring error on purpose
				jsonCond, _ := json.Marshal(cond)
				klog.V(3).Infof("Pod Conditions: %s", string(jsonCond))
			}
			for _, status := range e.Status.ContainerStatuses {
				// using this just for debugging message, so ignoring error on purpose
				jsonStatus, _ := json.Marshal(status)
				klog.V(3).Infof("Container Status: %s", string(jsonStatus))
			}
			switch e.Status.Phase {
			case desiredPhase:
				klog.V(3).Infof("Pod %s is %v", e.Name, desiredPhase)
				podChannel <- e
				break loop
			case corev1.PodFailed, corev1.PodUnknown:
				watchErrorChannel <- errors.Errorf("pod %s status %s", e.Name, e.Status.Phase)
				break loop
			default:
				// we start in a phase different from the desired one, let's wait
				if spinner == nil {
					spinner = log.Spinner(waitMessage)
					// Collect all the events in a separate go routine
					quit := make(chan int)
					go c.CollectEvents(selector, failedEvents, spinner, quit)
					defer close(quit)
				}
			}
		} else {
			watchErrorChannel <- errors.New("unable to convert event object to Pod")
			break loop
		}
	}
	close(podChannel)
	close(watchErrorChannel)
}()
```
Even though the default value for "Replicas" is 1, we explicitly set it so that the field for replicas is tracked by SSA and odo is made its manager.
`podName` was being fetched long before its use. This commit fetches it just before it is needed.
@dharmit: The following tests failed.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: feloy. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
/cherrypick v2
@dharmit: new pull request created: #5360
* Use SSA for replicas: Even though the default value for "Replicas" is 1, we explicitly set it so that the field for replicas is tracked by SSA and odo is made its manager.
* Fetch pod name just before it is needed: `podName` was being fetched long before its use. This commit fetches it just before it is needed.
* Fix shadow issue
* Tests should wait for pod termination
* Get pod name only if it exists
* Check if only one pod exists for the component
* Fix shadow error
* Expect error in oc get pod output for oc
What type of PR is this?
/kind bug
What does this PR do / why do we need it:
Refer to the issue being fixed.
Which issue(s) this PR fixes:
Fixes #5041
PR acceptance criteria:
- Unit test
- Integration test
- Documentation
- I have read the test guidelines
How to test changes / Special notes to the reviewer:
Refer to the description of the original issue for steps to test the changes.