Resources not allocated to Spark workloads to generate training datasets #56

Closed
@vishalbollu

Description

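Training datasets fail to be created. The generated SparkApplication requests 0 driver cores, 0 executor cores, and 0 executor instances, so it is rejected at validation.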

To Reproduce

Reproducible in many of the examples, but easiest to reproduce in Poker.

  1. Define only the app, environment, and raw columns in YAML
  2. Run cx deploy to process the raw columns
  3. Define a model that uses the raw columns
  4. Run cx deploy again

Stack Trace

cx logs dnn/training_dataset

Failed to start:

time="2019-04-18T20:12:09Z" level=info msg="Creating a docker executor"
time="2019-04-18T20:12:09Z" level=info msg="Executor (version: v2.2.1, build_date: 2018-10-11T16:27:29Z) initialized with template:\narchiveLocation: {}\ninputs: {}\nmetadata:\n  labels:\n    appName: recommendations\n    argo: \"true\"\n    workloadID: bord6fnh1lma5hyn8my3\n    workloadType: data-job\nname: bord6fnh1lma5hyn8my3\noutputs: {}\nresource:\n  action: create\n  failureCondition: status.applicationState.state in (FAILED,SUBMISSION_FAILED,UNKNOWN)\n  manifest: |-\n    {\n      \"kind\": \"SparkApplication\",\n      \"apiVersion\": \"sparkoperator.k8s.io/v1alpha1\",\n      \"metadata\": {\n        \"name\": \"bord6fnh1lma5hyn8my3\",\n        \"namespace\": \"cortex\",\n        \"creationTimestamp\": null,\n        \"labels\": {\n          \"appName\": \"recommendations\",\n          \"workloadID\": \"bord6fnh1lma5hyn8my3\",\n          \"workloadType\": \"data-job\"\n        },\n        \"ownerReferences\": [\n          {\n            \"apiVersion\": \"argoproj.io/v1alpha1\",\n            \"kind\": \"Workflow\",\n            \"name\": \"argo-recommendations-rplw6\",\n            \"uid\": \"3dca2989-6216-11e9-aaf1-02cc01957708\",\n            \"blockOwnerDeletion\": false\n          }\n        ]\n      },\n      \"spec\": {\n        \"type\": \"Python\",\n        \"mode\": \"cluster\",\n        \"image\": \"969758392368.dkr.ecr.us-west-2.amazonaws.com/cortexlabs/spark:latest\",\n        \"imagePullPolicy\": \"Always\",\n        \"mainApplicationFile\": \"local:///src/spark_job/spark_job.py\",\n        \"arguments\": [\n          \"--workload-id=bord6fnh1lma5hyn8my3 --context=s3://cortex-cluster-vishal/apps/recommendations/contexts/9063143c7366987a974e14c07bf21c40bed64e03a3aa0fed55a670c7756e317.msgpack --cache-dir=/mnt/context --raw-columns= --aggregates= --transformed-columns= --training-datasets=d6c73248656984e3d08a6165cd3b34de27253021cb94c232526f96776999d73\"\n        ],\n        \"driver\": {\n          \"cores\": 0,\n          \"memory\": \"0k\",\n       
   \"envVars\": {\n            \"CORTEX_CACHE_DIR\": \"/mnt/context\",\n            \"CORTEX_CONTEXT_S3_PATH\": \"s3://cortex-cluster-vishal/apps/recommendations/contexts/9063143c7366987a974e14c07bf21c40bed64e03a3aa0fed55a670c7756e317.msgpack\",\n            \"CORTEX_SPARK_VERBOSITY\": \"WARN\",\n            \"CORTEX_WORKLOAD_ID\": \"bord6fnh1lma5hyn8my3\"\n          },\n          \"envSecretKeyRefs\": {\n            \"AWS_ACCESS_KEY_ID\": {\n              \"name\": \"aws-credentials\",\n              \"key\": \"AWS_ACCESS_KEY_ID\"\n            },\n            \"AWS_SECRET_ACCESS_KEY\": {\n              \"name\": \"aws-credentials\",\n              \"key\": \"AWS_SECRET_ACCESS_KEY\"\n            }\n          },\n          \"labels\": {\n            \"appName\": \"recommendations\",\n            \"userFacing\": \"true\",\n            \"workloadID\": \"bord6fnh1lma5hyn8my3\",\n            \"workloadType\": \"data-job\"\n          },\n          \"podName\": \"bord6fnh1lma5hyn8my3\",\n          \"serviceAccount\": \"spark\"\n        },\n        \"executor\": {\n          \"cores\": 0,\n          \"memory\": \"0k\",\n          \"envVars\": {\n            \"CORTEX_CACHE_DIR\": \"/mnt/context\",\n            \"CORTEX_CONTEXT_S3_PATH\": \"s3://cortex-cluster-vishal/apps/recommendations/contexts/9063143c7366987a974e14c07bf21c40bed64e03a3aa0fed55a670c7756e317.msgpack\",\n            \"CORTEX_SPARK_VERBOSITY\": \"WARN\",\n            \"CORTEX_WORKLOAD_ID\": \"bord6fnh1lma5hyn8my3\"\n          },\n          \"envSecretKeyRefs\": {\n            \"AWS_ACCESS_KEY_ID\": {\n              \"name\": \"aws-credentials\",\n              \"key\": \"AWS_ACCESS_KEY_ID\"\n            },\n            \"AWS_SECRET_ACCESS_KEY\": {\n              \"name\": \"aws-credentials\",\n              \"key\": \"AWS_SECRET_ACCESS_KEY\"\n            }\n          },\n          \"labels\": {\n            \"appName\": \"recommendations\",\n            \"workloadID\": \"bord6fnh1lma5hyn8my3\",\n            
\"workloadType\": \"data-job\"\n          },\n          \"instances\": 0\n        },\n        \"deps\": {\n          \"pyFiles\": [\n            \"local:///src/spark_job/spark_util.py\",\n            \"local:///src/lib/*.py\"\n          ]\n        },\n        \"restartPolicy\": {\n          \"type\": \"Never\"\n        },\n        \"pythonVersion\": \"3\"\n      },\n      \"status\": {\n        \"lastSubmissionAttemptTime\": null,\n        \"completionTime\": null,\n        \"driverInfo\": {},\n        \"applicationState\": {\n          \"state\": \"\",\n          \"errorMessage\": \"\"\n        }\n      }\n    }\n  successCondition: status.applicationState.state in (COMPLETED)\n"
time="2019-04-18T20:12:09Z" level=info msg="Loading manifest to /tmp/manifest.yaml"
time="2019-04-18T20:12:09Z" level=info msg="kubectl create -f /tmp/manifest.yaml -o name"
time="2019-04-18T20:12:10Z" level=fatal msg="The SparkApplication \"bord6fnh1lma5hyn8my3\" is invalid: []: Invalid value: map[string]interface {}{\"apiVersion\":\"sparkoperator.k8s.io/v1alpha1\", \"kind\":\"SparkApplication\", \"metadata\":map[string]interface {}{\"name\":\"bord6fnh1lma5hyn8my3\", \"namespace\":\"cortex\", \"creationTimestamp\":\"2019-04-18T20:12:09Z\", \"labels\":map[string]interface {}{\"workloadID\":\"bord6fnh1lma5hyn8my3\", \"workloadType\":\"data-job\", \"appName\":\"recommendations\"}, \"ownerReferences\":[]interface {}{map[string]interface {}{\"apiVersion\":\"argoproj.io/v1alpha1\", \"kind\":\"Workflow\", \"name\":\"argo-recommendations-rplw6\", \"uid\":\"3dca2989-6216-11e9-aaf1-02cc01957708\", \"blockOwnerDeletion\":false}}, \"generation\":1, \"uid\":\"3f1c787f-6216-11e9-aaf1-02cc01957708\", \"selfLink\":\"\"}, \"spec\":map[string]interface {}{\"image\":\"969758392368.dkr.ecr.us-west-2.amazonaws.com/cortexlabs/spark:latest\", \"mainApplicationFile\":\"local:///src/spark_job/spark_job.py\", \"mode\":\"cluster\", \"restartPolicy\":map[string]interface {}{\"type\":\"Never\"}, \"type\":\"Python\", \"driver\":map[string]interface {}{\"serviceAccount\":\"spark\", \"cores\":0, \"envSecretKeyRefs\":map[string]interface {}{\"AWS_ACCESS_KEY_ID\":map[string]interface {}{\"key\":\"AWS_ACCESS_KEY_ID\", \"name\":\"aws-credentials\"}, \"AWS_SECRET_ACCESS_KEY\":map[string]interface {}{\"key\":\"AWS_SECRET_ACCESS_KEY\", \"name\":\"aws-credentials\"}}, \"envVars\":map[string]interface {}{\"CORTEX_CACHE_DIR\":\"/mnt/context\", \"CORTEX_CONTEXT_S3_PATH\":\"s3://cortex-cluster-vishal/apps/recommendations/contexts/9063143c7366987a974e14c07bf21c40bed64e03a3aa0fed55a670c7756e317.msgpack\", \"CORTEX_SPARK_VERBOSITY\":\"WARN\", \"CORTEX_WORKLOAD_ID\":\"bord6fnh1lma5hyn8my3\"}, \"labels\":map[string]interface {}{\"appName\":\"recommendations\", \"userFacing\":\"true\", \"workloadID\":\"bord6fnh1lma5hyn8my3\", \"workloadType\":\"data-job\"}, \"memory\":\"0k\", 
\"podName\":\"bord6fnh1lma5hyn8my3\"}, \"deps\":map[string]interface {}{\"pyFiles\":[]interface {}{\"local:///src/spark_job/spark_util.py\", \"local:///src/lib/*.py\"}}, \"executor\":map[string]interface {}{\"envVars\":map[string]interface {}{\"CORTEX_CACHE_DIR\":\"/mnt/context\", \"CORTEX_CONTEXT_S3_PATH\":\"s3://cortex-cluster-vishal/apps/recommendations/contexts/9063143c7366987a974e14c07bf21c40bed64e03a3aa0fed55a670c7756e317.msgpack\", \"CORTEX_SPARK_VERBOSITY\":\"WARN\", \"CORTEX_WORKLOAD_ID\":\"bord6fnh1lma5hyn8my3\"}, \"instances\":0, \"labels\":map[string]interface {}{\"appName\":\"recommendations\", \"workloadID\":\"bord6fnh1lma5hyn8my3\", \"workloadType\":\"data-job\"}, \"memory\":\"0k\", \"cores\":0, \"envSecretKeyRefs\":map[string]interface {}{\"AWS_ACCESS_KEY_ID\":map[string]interface {}{\"key\":\"AWS_ACCESS_KEY_ID\", \"name\":\"aws-credentials\"}, \"AWS_SECRET_ACCESS_KEY\":map[string]interface {}{\"name\":\"aws-credentials\", \"key\":\"AWS_SECRET_ACCESS_KEY\"}}}, \"imagePullPolicy\":\"Always\", \"pythonVersion\":\"3\", \"arguments\":[]interface {}{\"--workload-id=bord6fnh1lma5hyn8my3 --context=s3://cortex-cluster-vishal/apps/recommendations/contexts/9063143c7366987a974e14c07bf21c40bed64e03a3aa0fed55a670c7756e317.msgpack --cache-dir=/mnt/context --raw-columns= --aggregates= --transformed-columns= --training-datasets=d6c73248656984e3d08a6165cd3b34de27253021cb94c232526f96776999d73\"}}, \"status\":map[string]interface {}{\"lastSubmissionAttemptTime\":interface {}(nil), \"applicationState\":map[string]interface {}{\"errorMessage\":\"\", \"state\":\"\"}, \"completionTime\":interface {}(nil), \"driverInfo\":map[string]interface {}{}}}: validation failure list:\nspec.driver.cores in body should be greater than 0\nspec.executor.instances in body should be greater than or equal to 1\nspec.executor.cores in body should be greater than 
0\ngithub.com/argoproj/argo/errors.New\n\t/root/go/src/github.com/argoproj/argo/errors/errors.go:48\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).ExecResource\n\t/root/go/src/github.com/argoproj/argo/workflow/executor/resource.go:36\ngithub.com/argoproj/argo/cmd/argoexec/commands.execResource\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:38\ngithub.com/argoproj/argo/cmd/argoexec/commands.glob..func2\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/commands/resource.go:23\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:766\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/argoproj/argo/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/root/go/src/github.com/argoproj/argo/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/root/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:15\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:198\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361"
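
The three validation failures at the end of the log can be checked before the manifest is submitted. The following is a minimal sketch of such a pre-submission check, not code from this repository: `validate_spark_resources` and `rejected_spec` are hypothetical names, and the dictionary holds only the resource fields copied from the rejected manifest above.

```python
from typing import Any, Dict, List


def validate_spark_resources(spec: Dict[str, Any]) -> List[str]:
    """Return violations mirroring the three checks reported in the log.

    Hypothetical helper for illustration; it is not part of Cortex,
    Argo, or the Spark operator.
    """
    problems: List[str] = []
    driver = spec.get("driver", {})
    executor = spec.get("executor", {})

    # spec.driver.cores in body should be greater than 0
    if driver.get("cores", 0) <= 0:
        problems.append("spec.driver.cores should be greater than 0")

    # spec.executor.instances in body should be greater than or equal to 1
    if executor.get("instances", 0) < 1:
        problems.append(
            "spec.executor.instances should be greater than or equal to 1"
        )

    # spec.executor.cores in body should be greater than 0
    if executor.get("cores", 0) <= 0:
        problems.append("spec.executor.cores should be greater than 0")

    return problems


# Resource values copied from the rejected manifest in the log.
rejected_spec = {
    "driver": {"cores": 0, "memory": "0k"},
    "executor": {"cores": 0, "memory": "0k", "instances": 0},
}

for problem in validate_spark_resources(rejected_spec):
    print(problem)
```

Running a check like this when the workload spec is built would surface the zeroed resource fields before `kubectl create` is attempted, rather than as a fatal error from the Argo executor.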

Version

master

Labels

bug: Something isn't working