Closed
Description
Replace Argo with a custom DAG manager in the Cortex operator.
Motivation
Argo isn't primarily designed for our use case. For example, creating two k8s resources in one Argo job required workarounds and still has issues (e.g. readiness checks). Spinning up one pod per k8s resource is also wasteful.
Replacing Argo would also help significantly with EKS pod limits (see #219).
Notes
- The DAG is already stored implicitly in the context, so it can be recalculated at any time
- The DAG should survive an operator restart mid-deployment
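
The two notes above could be combined roughly as follows. This is only a sketch, not the operator's actual code: `Resource`, `Context`, `ReadyToCreate`, and the `done` set are hypothetical names, and it assumes the set of completed resources is observed from the cluster on each pass rather than held in operator memory.

```go
package main

import "sort"

// Resource is a hypothetical DAG node: one k8s resource to create.
type Resource struct {
	ID        string
	DependsOn []string // IDs of resources that must be ready first
}

// Context is a hypothetical stand-in for the Cortex context, from which
// the dependency edges can be re-derived at any time.
type Context struct {
	Resources map[string]Resource
}

// ReadyToCreate returns the IDs of resources that are not yet done and
// whose dependencies are all done. It keeps no state between calls, so a
// restarted operator can call it again with a freshly observed done set.
func ReadyToCreate(ctx Context, done map[string]bool) []string {
	var ready []string
	for id, res := range ctx.Resources {
		if done[id] {
			continue
		}
		depsDone := true
		for _, dep := range res.DependsOn {
			if !done[dep] {
				depsDone = false
				break
			}
		}
		if depsDone {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready) // map iteration order is random; sort for determinism
	return ready
}
```

Because nothing is cached between calls, an operator restart mid-deployment would only require rebuilding `done` (for example, from the status of existing k8s resources) and calling `ReadyToCreate` again. How readiness would actually be determined is left open here.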