Open
Labels: backend-scrum (items centered around engineering activities)
Description
Done Condition (What do we need? Why do we need it? Keep this as small as possible!)
- Confirm with Chris whether the disabled DAGs will be re-enabled in the production pipeline. Confirmed 7/17 that these DAGs will eventually be re-enabled.
- Databricks environment files: are the access key and secret linked to Chris's user? Should the dev pipeline use the dev Databricks server? The test server credentials are linked to Chris; the prod server credentials came from the MODE team and are shared within the CDM team for production use. The test server stores NLP models, which should be migrated to the prod Databricks server.
- Repos are private, so the repo `pip install` entries in the YAML files don't work. Had to install manually with `pip install .` to avoid making each repo public. Does each repo need to be a pip dependency? Repos can be made public, or we can punt on this and move toward a monorepo approach.
- Changes to the environment.yml files should be propagated to each repo:
  - python=3.11
  - numpy=1.23.5
  - pyzmq=25.1.2
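Propagating the three pins by hand across repos is easy to get wrong. A minimal sketch of doing it programmatically, assuming each environment.yml has already been loaded (for example with PyYAML's `yaml.safe_load`) into a dict whose `dependencies` list holds conda spec strings such as `numpy` or `conda-forge::numpy>=1.20`, plus an optional nested `pip` mapping. `apply_pins` and `PINS` are hypothetical names; the helper only rewrites conda specs and leaves pip entries untouched.

```python
import re

# Version pins listed in this issue for every repo's environment.yml.
PINS = {"python": "3.11", "numpy": "1.23.5", "pyzmq": "25.1.2"}

# A conda spec's package name is the leading run of name characters,
# stopping at the first version operator or space (e.g. "numpy>=1.20").
_NAME_RE = re.compile(r"[A-Za-z0-9_.\-]+")


def apply_pins(dependencies, pins=PINS):
    """Return a new conda dependency list with the pinned packages rewritten.

    - Specs for pinned packages become "name=version", keeping any
      "channel::" prefix.
    - Non-string entries (e.g. the nested {"pip": [...]} mapping) are kept as-is.
    - Pinned packages that are not listed at all are appended at the end.
    """
    result = []
    seen = set()
    for dep in dependencies:
        if isinstance(dep, str):
            channel, _, spec = dep.rpartition("::")
            match = _NAME_RE.match(spec.strip())
            name = match.group(0).lower() if match else ""
            if name in pins:
                pinned = f"{name}={pins[name]}"
                result.append(f"{channel}::{pinned}" if channel else pinned)
                seen.add(name)
                continue
        result.append(dep)
    for name, version in pins.items():
        if name not in seen:
            result.append(f"{name}={version}")
    return result


# Example: an environment with an old Python pin and an unpinned numpy.
deps = ["python=3.9", "numpy", "pandas", {"pip": ["msk_cdm"]}]
print(apply_pins(deps))
# ['python=3.11', 'numpy=1.23.5', 'pandas', {'pip': ['msk_cdm']}, 'pyzmq=25.1.2']
```

The updated list could then be written back with `yaml.safe_dump` and committed to each repo; a monorepo would reduce this to editing a single file.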
DAGs that are enabled on the production Airflow server:
- cdm_etl_diagnosis: Issue in the dev branch where removed functions are still being called: `_clean_data_icdo` and `primary_dx_timeline_formatting`. Need to confirm this behavior with Chris. The output file (`table_dx_timeline_primary.tsv`) is a very different size than before (21 MiB before, 174 MiB after). Chris will address the code errors.
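The size jump on `table_dx_timeline_primary.tsv` (21 MiB to 174 MiB, roughly 8.3x) is the kind of change a simple before/after check could flag automatically. A minimal sketch, assuming the previous and new outputs are both available as local files; `assess_change`, `compare_outputs`, and the 2x threshold are hypothetical choices, not part of the existing pipeline. Row counts are line counts minus one header line, which assumes no field contains an embedded newline.

```python
import os

MIB = 1024 * 1024  # bytes per mebibyte


def _ratio(old, new):
    """new / old, treating an empty baseline specially to avoid dividing by zero."""
    if old == 0:
        return 1.0 if new == 0 else float("inf")
    return new / old


def _outside(ratio, max_ratio):
    """True if the value grew or shrank by more than a factor of max_ratio."""
    return ratio > max_ratio or ratio < 1 / max_ratio


def assess_change(old_bytes, new_bytes, old_rows, new_rows, max_ratio=2.0):
    """Summarize how much an output changed between two pipeline runs."""
    size_ratio = _ratio(old_bytes, new_bytes)
    row_ratio = _ratio(old_rows, new_rows)
    return {
        "old_mib": old_bytes / MIB,
        "new_mib": new_bytes / MIB,
        "size_ratio": size_ratio,
        "row_ratio": row_ratio,
        "suspicious": _outside(size_ratio, max_ratio) or _outside(row_ratio, max_ratio),
    }


def count_data_rows(path):
    """Count lines in a TSV file, excluding the header line."""
    with open(path, "r", encoding="utf-8") as handle:
        return max(sum(1 for _ in handle) - 1, 0)


def compare_outputs(old_path, new_path, max_ratio=2.0):
    """Compare an old and a new TSV output on disk by size and row count."""
    return assess_change(
        os.path.getsize(old_path),
        os.path.getsize(new_path),
        count_data_rows(old_path),
        count_data_rows(new_path),
        max_ratio,
    )
```

Comparing row counts alongside bytes helps separate "more rows" (e.g. an unintended join fan-out) from "wider rows" (e.g. extra columns), which may narrow down where the dev-branch change comes from.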
DAGs that are not enabled on the production Airflow server:
- cdm_etl_tumor_site_prediction: Issue with the `radiology_rpt_tumor_site_prediction` task: the condor file runs a script that has hardcoded paths. Need to look into parameterizing the condor submit files.
- cdm_etl_progression: Issue with the `radiology_rpt_progression_prediction` task: the condor file runs a script that has hardcoded paths.
- cdm_cancer_presence_radiology_inference: Issue creating the cprr conda environment. Confirm with Chris/CDM team.
  ```
  LibMambaUnsatisfiableError: Encountered problems while solving:
    - nothing provides __glibc >=2.28,<3.0.a0 needed by c-ares-1.33.1-heb4867d_0
  Could not solve for environment specs
  The following package could not be installed
  └─ c-ares ==1.33.1 heb4867d_0 is not installable because it requires
     └─ __glibc >=2.28,<3.0.0a0 *, which is missing on the system.
  ```
- cdm_prior_tx_dag: Couldn't create the conda env; environment.yml doesn't pin versions or list msk_cdm as a dependency.
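One way to parameterize the condor submit files for `radiology_rpt_tumor_site_prediction` and `radiology_rpt_progression_prediction` is to render them from a template when the task runs, instead of committing files with fixed paths. A minimal sketch, assuming the scripts are changed to accept their paths as command-line arguments; the template, the `--input`/`--output` flags, and the parameter names are all hypothetical. `executable`, `arguments`, `output`, `error`, `log`, `queue`, and the `$(Cluster)` macro are standard HTCondor submit-description syntax. Paths are assumed to contain no spaces, since HTCondor's quoting rules for `arguments` are not handled here.

```python
from pathlib import Path

# Hypothetical submit template. Only the {brace} fields are filled in by Python;
# $(Cluster) is an HTCondor macro left for condor_submit to expand.
SUBMIT_TEMPLATE = """\
executable = {python_bin}
arguments  = {script_path} --input {input_dir} --output {output_dir}
output     = {log_dir}/job.$(Cluster).out
error      = {log_dir}/job.$(Cluster).err
log        = {log_dir}/job.log
queue
"""

REQUIRED_KEYS = ("python_bin", "script_path", "input_dir", "output_dir", "log_dir")


def render_submit_file(params):
    """Fill the template, failing loudly if any path parameter is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError(f"missing submit parameters: {missing}")
    return SUBMIT_TEMPLATE.format(**{key: params[key] for key in REQUIRED_KEYS})


def write_submit_file(params, destination):
    """Render the submit description and write it to destination."""
    path = Path(destination)
    path.write_text(render_submit_file(params), encoding="utf-8")
    return path
```

The rendered file would then be passed to `condor_submit` as before. The path values could come from per-environment DAG configuration (dev vs. prod), so the same script and template serve both servers.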
Technical Description (How are we going to achieve the above)
Potential Issues
Dependencies
Technical Requirements
Outside People/Teams
Changes