Skip to content

Add support for command/args fields in container components #5768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

rm3l
Copy link
Member

@rm3l rm3l commented May 27, 2022

What type of PR is this:
/kind bug

What does this PR do / why we need it:
As discussed in #5648 (comment), the goal of this PR is to support command and args fields defined in container components by leveraging the fact that odo dev is a long-running client-side command.
At this time, Supervisord requires to override the container entrypoint, which is not compliant with the Devfile specification. Per the Devfile specification, we should use any command or args defined in the container component, or whatever command or entrypoint is available in the container image.
So the approach implemented here is to no longer run containers with Supervisord.
Instead, the pod is now created without the copy-supervisord init container, and the container component is created either with the command or args defined in the Devfile , or by not specifying any command, which means Kubernetes/OpenShift will use whatever is defined in the image.
Then subsequent run or debug Devfile commands are executed right away in the running container, in a separate goroutine:

  • the command process PID is stored in a dedicated file in the container, under /opt/odo/.odo_devfile_cmd_*.pid
  • the goroutine spawned can be stopped by killing that remote process in the container, which stops the command execution streaming connection to the container
  • the command status can be determined by inspecting this PID file

At the moment, most container images used in Devfile stacks define terminating commands, which requires defining explicit commands and/or args if we want to use them with the changes here.
See devfile/registry#102 and devfile/api#679

Because devfile/registry#102 is not merged yet on the Devfile side and to avoid being blocked by this, we decided [1] to implement a temporary hack that sets the container command to tail -f /dev/null if none was explicitly defined.

[1] #5768 (comment)

Which issue(s) this PR fixes:
Fixes #5648

PR acceptance criteria:

  • Unit test

  • Integration test

  • Documentation

How to test changes / Special notes to the reviewer:
odo dev should work as before, except that:

  • Supervisord should no longer be there in the pod.
  • the container command should be used as is if defined in the Devfile
  • the container command should be overridden with tail -f /dev/null if not set in the Devfile (purpose of the temporary hack depicted above)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. Required by Prow. label May 27, 2022
@openshift-ci
Copy link

openshift-ci bot commented May 27, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@netlify
Copy link

netlify bot commented May 27, 2022

Deploy Preview for odo-docusaurus-preview canceled.

Name Link
🔨 Latest commit 70137a1
🔍 Latest deploy log https://app.netlify.com/sites/odo-docusaurus-preview/deploys/62abad9f510863000878fd12

@openshift-ci openshift-ci bot added kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue as a feature request. For PRs, that means that the PR is the implementation labels May 27, 2022
@rm3l rm3l changed the title [WIP] Add support for command/args fields in container components, via a "--no-supervisord" experimental flag to "odo dev" [WIP] Add support for command/args fields in container components, via a --no-supervisord experimental flag to "odo dev" May 27, 2022
@rm3l rm3l changed the title [WIP] Add support for command/args fields in container components, via a --no-supervisord experimental flag to "odo dev" [WIP] Add support for command/args fields in container components, via a --no-supervisord experimental flag to odo dev May 27, 2022
@odo-robot
Copy link

odo-robot bot commented May 27, 2022

Unit Tests on commit 31d1c7e finished successfully.
View logs: TXT HTML

@odo-robot
Copy link

odo-robot bot commented May 27, 2022

Kubernetes Tests on commit 31d1c7e finished successfully.
View logs: TXT HTML

@odo-robot
Copy link

odo-robot bot commented May 27, 2022

OpenShift Tests on commit 31d1c7e finished successfully.
View logs: TXT HTML

@odo-robot
Copy link

odo-robot bot commented May 27, 2022

Windows Tests (OCP) on commit 31d1c7e finished successfully.
View logs: TXT HTML

@odo-robot
Copy link

odo-robot bot commented May 27, 2022

Validate Tests on commit 31d1c7e finished successfully.
View logs: TXT HTML

@rm3l rm3l force-pushed the 5648-add-support-for-command-field-in-container-components branch from 2b3d7c3 to b46d279 Compare May 27, 2022 12:22
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. Required by Prow. label May 29, 2022
@rm3l rm3l force-pushed the 5648-add-support-for-command-field-in-container-components branch from cb0f061 to 02853b7 Compare May 30, 2022 07:43
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. Required by Prow. label May 30, 2022
@rm3l rm3l force-pushed the 5648-add-support-for-command-field-in-container-components branch 4 times, most recently from fbb4a9d to e1e9224 Compare May 31, 2022 07:46
@rm3l rm3l changed the title [WIP] Add support for command/args fields in container components, via a --no-supervisord experimental flag to odo dev Add support for command/args fields in container components, via a --no-supervisord experimental flag to odo dev May 31, 2022
@rm3l rm3l changed the title Add support for command/args fields in container components, via a --no-supervisord experimental flag to odo dev Add support for command/args fields in container components, via a --no-supervisord flag to odo dev May 31, 2022
@rm3l rm3l force-pushed the 5648-add-support-for-command-field-in-container-components branch from e1e9224 to 899ac0e Compare May 31, 2022 08:28
@rm3l rm3l marked this pull request as ready for review May 31, 2022 08:32
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. Required by Prow. label May 31, 2022
@openshift-ci openshift-ci bot requested a review from feloy May 31, 2022 08:32
rm3l added 5 commits June 16, 2022 09:20
…aiting for 1 sec

Now that the command is run via a goroutine,
there might be some situations where we were checking
the status just before the goroutine had a chance to start.
…andAndGetOutput

Rely on the same logic in ExecuteCommand
As commented out in [1], we don't store anymore the debug port value in the ENV file.

[1] redhat-developer#5768 (comment)
@rm3l
Copy link
Member Author

rm3l commented Jun 16, 2022

/hold I'm seeing some weird behavior especially with java-springboot.

with old approach I don't remember last time odo dev failed on me. But when testing this PR i got following error twice

▶ odo dev
  __
 /  \__     Developing using the my-java-springboot-app Devfile
 \__/  \    Namespace: demo
 /  \__/    odo version: v3.0.0-alpha3
 \__/

↪ Deploying to the cluster in developer mode
 ✓  Added storage m2 to my-java-springboot-app
 ✓  Searching resource in cluster es
 ✓  Creating kind ServiceBinding [58ms]
 ✓  Searching resource in cluster 
 ✓  Creating kind ServiceBinding [21ms]
 ◐  Waiting for Kubernetes resources^C

Cancelling deployment.
This is non-preemptive operation, it will wait for other tasks to finish first

 ✗  Waiting for Kubernetes resources [1m]
Cleaning resources, please wait
Failed to delete the "Deployment" resource: my-java-springboot-app-app
 ✗  failed to create the component: timeout while waiting for "my-java-springboot-app-app-747d44d4f7-hm8rq" pod to be deleted

I'm trying to debug it but so far I was not able to reproduce it with verbose logs enabled.

/hold

Did this happen right after running odo dev or later after a file change?
I remember I had a similar issue with ServiceBinding, with odo waiting for some resources, due to #5781, I think. Anyways, I have just rebased this branch onto main. I tested it again with a SpringBoot application and a binding in the Devfile.
Could you check again with this version?

@rm3l
Copy link
Member Author

rm3l commented Jun 16, 2022

/hold I'm seeing some weird behavior especially with java-springboot.
with old approach I don't remember last time odo dev failed on me. But when testing this PR i got following error twice

▶ odo dev
  __

✗ Waiting for Kubernetes resources [1m]

What is the total deployment time for other sessions that don't fail? It seems to me that 1m is pretty short. I remember getting this error when the image is too long to download

Is this timeout hardcoded in odo?

@feloy
Copy link
Contributor

feloy commented Jun 16, 2022

/hold I'm seeing some weird behavior especially with java-springboot.
with old approach I don't remember last time odo dev failed on me. But when testing this PR i got following error twice

▶ odo dev
  __

✗ Waiting for Kubernetes resources [1m]

What is the total deployment time for other sessions that don't fail? It seems to me that 1m is pretty short. I remember getting this error when the image is too long to download

Is this timeout hardcoded in odo?

I think it is the PushTimeout you can set in preferences

@rm3l rm3l force-pushed the 5648-add-support-for-command-field-in-container-components branch from 84c9ea6 to 520e3b3 Compare June 16, 2022 07:26
@rm3l
Copy link
Member Author

rm3l commented Jun 16, 2022

/hold I'm seeing some weird behavior especially with java-springboot.
with old approach I don't remember last time odo dev failed on me. But when testing this PR i got following error twice

▶ odo dev
  __

✗ Waiting for Kubernetes resources [1m]

What is the total deployment time for other sessions that don't fail? It seems to me that 1m is pretty short. I remember getting this error when the image is too long to download

Is this timeout hardcoded in odo?

I think it is the PushTimeout you can set in preferences

Okay, I see - thanks.

@kadel
Copy link
Member

kadel commented Jun 16, 2022

Did this happen right after running odo dev or later after a file change?
I remember I had a similar issue with ServiceBinding, with odo waiting for some resources, due to #5781, I think. Anyways, I have just rebased this branch onto main. I tested it again with a SpringBoot application and a binding in the Devfile.
Could you check again with this version?

It looks like my failure is not related to this PR.
Now it is consistently failing even on main, or with alpha3 release.

I'm debugging the problem, but it looks like something related to SBO
My Binding has a following status

  status:
    conditions:
    - lastTransitionTime: "2022-06-16T08:04:49Z"
      message: ""
      reason: DataCollected
      status: "True"
      type: CollectionReady
    - lastTransitionTime: "2022-06-16T08:04:49Z"
      message: Unable to retrieve ClusterWorkloadResourceMapping for type "apps/v1,
        Resource=deployments"
      reason: ApplicationNotFound
      status: "False"
      type: InjectionReady
    - lastTransitionTime: "2022-06-16T08:04:49Z"
      message: Unable to retrieve ClusterWorkloadResourceMapping for type "apps/v1,
        Resource=deployments"
      reason: ProcessingError
      status: "False"
      type: Ready

Even tho the application is defined correctly

 spec:
    application:
      group: apps
      name: myspring-app
      resource: deployments
      version: v1
▶ k get deployment myspring-app 
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
myspring-app   1/1     1            1           7s

It is still a problem.
Removing the hold as this is not related to this PR.
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. Required by Prow. label Jun 16, 2022
rm3l added 4 commits June 16, 2022 23:17
This package no longer depends on Devfile-related packages.
This boolean is tied to the given retry schedule,
so it makes sense for it to be passed with the schedule.
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 6 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@rm3l rm3l requested review from valaparthvi and feloy June 17, 2022 06:07
Copy link
Contributor

@valaparthvi valaparthvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code lgtm, thank you for patiently working on this!

// An error is returned for non-exec Devfile commands.
func devfileCommandToRemoteCmdDefinition(devfileCmd devfilev1.Command) (remotecmd.CommandDefinition, error) {
if devfileCmd.Exec == nil {
return remotecmd.CommandDefinition{}, errors.New(" only Exec commands are supported")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return remotecmd.CommandDefinition{}, errors.New(" only Exec commands are supported")
return remotecmd.CommandDefinition{}, errors.New("only Exec commands are supported")

envVars[e.Name] = e.Value
}

return remotecmd.CommandDefinition{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. Required by Prow. label Jun 17, 2022
@feloy
Copy link
Contributor

feloy commented Jun 17, 2022

/override ci/prow/unit
/override ci/prow/v4.10-integration-e2e

@openshift-ci
Copy link

openshift-ci bot commented Jun 17, 2022

@feloy: Overrode contexts on behalf of feloy: ci/prow/unit, ci/prow/v4.10-integration-e2e

In response to this:

/override ci/prow/unit
/override ci/prow/v4.10-integration-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot merged commit 74516a9 into redhat-developer:main Jun 17, 2022
@rm3l rm3l mentioned this pull request Jun 20, 2022
5 tasks
cdrage pushed a commit to cdrage/odo that referenced this pull request Aug 31, 2022
…dhat-developer#5768)

* Introduce new 'pkg/remotecmd' package

This package allows to execute commands in remote packages
and exposes an interface for managing processes associated
to given Devfile commands.

* Rely on 'pkg/libdevfile' as much as possible for Devfile command execution

This requires passing a handler at the odo side,
which in turns uses the 'pkg/remotecmd' package to
run commands in remote containers.

* Switch to running without Supervisord as PID 1 in containers

To do this, the idea is to start the container component:
1- using the command/args defined in the Devfile
2- using whatever was defined in the container image
   if there is no command/args defined in the Devfile

Then, once the container is started, we would execute the Devfile
commands directly in the container component, just like a simple
'kubectl exec' command would do.
Since this is a long-running command (and potentially never ending),
we would need to run it in the background, i.e. in a side goroutine.

Point 2) above requires implementing a temporary hack (as discussed in [1]),
without us having to wait for [2] to be merged on the Devfile side.
This temporary hack overrides the container entrypoint with "tail -f /dev/null"
if the component defines no command or args (in which case we should have
used whatever is defined in the image, per the specification).

[1] redhat-developer#5768 (comment)
[2] devfile/registry#102

* Rename K8s adapter struct 'client' field into 'kubeClient', as suggested in review

* Rename sync adapter struct 'client' fields to better distinguish between them

* Make sure messages displayed to users running 'odo dev' are the same

* Update temporary hack log message

Co-authored-by: Philippe Martin <[email protected]>

* Make sure to handle process output line by line, for performance purposes

* Handle remote process output and errors in the Devfile command handler

The implementation in kubeexec.go should remain
as generic as possible

* Keep retrying remote process status until timeout, rather than just waiting for 1 sec

Now that the command is run via a goroutine,
there might be some situations where we were checking
the status just before the goroutine had a chance to start.

* Handle remote process output and errors in the Devfile command handler

The implementation in kubeexec.go should remain
as generic as possible

* Update kubeexec StopProcessForCommand implementation such that it relies on /proc to kill the parent children processes

* Ignore missing children file in getProcessChildren

* Unit-test methods in kubexec.go

* Fix missing logs when build command does not pass when running 'odo dev'

Also add integration test case

* Fix spinner status when commands passed to exec_handler do not pass

* Make sure to check process status right after stopping it

The process just stopped might take longer to exit (it might have caught
the signal and is performing additional cleanup)

* Keep retrying remote process status until timeout, rather than just waiting for 1 sec

Now that the command is run via a goroutine,
there might be some situations where we were checking
the status just before the goroutine had a chance to start.

* Fix potential deadlock when reading output from remotecmd#ExecuteCommandAndGetOutput

Rely on the same logic in ExecuteCommand

* Add more unit tests

* Remove block that used to check debug port from env info

As commented out in [1], we don't store anymore the debug port value in the ENV file.

[1] redhat-developer#5768 (comment)

* Rename 'getCommandFromFlag' into 'getCommandByName', as suggested in review

* Make remotecmd package more generic

This package no longer depends on Devfile-related packages.

* Fix comments in libdevfile.go

* Move errorIfTimeout struct field as parameter of RetryWithSchedule

This boolean is tied to the given retry schedule,
so it makes sense for it to be passed with the schedule.

* Expose a single ExecuteCommand function that returns both stdout and stderr

Co-authored-by: Philippe Martin <[email protected]>
@dharmit dharmit mentioned this pull request Sep 9, 2022
3 tasks
@rm3l rm3l deleted the 5648-add-support-for-command-field-in-container-components branch October 27, 2022 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. Required by Prow. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. Required by Prow.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add support for command field in container components
5 participants