CDM DAG testing follow-ups #1549

@callachennault

Description

Done Condition (What do we need? Why do we need it? Keep this as small as possible!)

  • Confirm with Chris whether the disabled DAGs will be re-enabled in the production pipeline. Confirmed 7/17 that these DAGs will eventually be re-enabled.
  • Databricks environment files: are the access key/secret linked to Chris's user? Should the dev pipeline use the dev Databricks server? The test server credentials are linked to Chris; the prod server credentials came from the MODE team and are shared within the CDM team for production use. The test server stores the NLP models, which should be migrated to the prod Databricks server.
  • Private repos: installing the repos via pip from the YAML files doesn't work. Each repo had to be installed manually with pip install . to avoid making every repo public. Does each repo need to be a pip dependency? The repos can be made public, or we can punt on this and move toward a monorepo approach.
  • Changes to the environment.yml files should be propagated to each repo:
      - python=3.11
      - numpy=1.23.5
      - pyzmq=25.1.2
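Propagating these pins could be checked mechanically across the repos. A minimal sketch, assuming each environment.yml lists conda dependencies one per line as "- name=version"; REQUIRED_PINS, find_pins and missing_or_wrong are hypothetical names, and a robust version would parse the file with a YAML library rather than matching lines.

```python
from __future__ import annotations

# Shared pins agreed in this issue, to be propagated to every repo.
REQUIRED_PINS = {
    "python": "3.11",
    "numpy": "1.23.5",
    "pyzmq": "25.1.2",
}


def find_pins(env_yml_text: str) -> dict[str, str]:
    """Collect {package: version} from '- name=version' dependency lines.

    Naive line matching, not a YAML parser: lines without '=' (for example
    '- pip:' or an unpinned '- msk_cdm') are skipped, and a pip-style
    '==' pin is normalised to the bare version.
    """
    pins: dict[str, str] = {}
    for raw in env_yml_text.splitlines():
        line = raw.strip()
        if not line.startswith("- "):
            continue
        name, sep, version = line[2:].strip().partition("=")
        if sep:
            pins[name.strip()] = version.strip().lstrip("=")
    return pins


def missing_or_wrong(env_yml_text: str) -> dict[str, str | None]:
    """Return required packages whose pin is absent (None) or different."""
    found = find_pins(env_yml_text)
    return {
        name: found.get(name)
        for name, version in REQUIRED_PINS.items()
        if found.get(name) != version
    }


example = """\
name: cdm-example
dependencies:
  - python=3.11
  - numpy=1.24.0
"""
print(missing_or_wrong(example))  # {'numpy': '1.24.0', 'pyzmq': None}
```

An empty result means the file already carries all three pins; the same check would also flag an environment.yml that pins nothing at all.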

DAGs that are enabled on the production Airflow server:

  • cdm_etl_diagnosis: the dev branch still calls removed functions (_clean_data_icdo and primary_dx_timeline_formatting). Need to confirm this behavior with Chris. The output file (table_dx_timeline_primary.tsv) is a very different size than before (21 MiB before, 174 MiB after). Chris will address the code errors.
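To narrow down the 21 MiB to 174 MiB jump in table_dx_timeline_primary.tsv, it may help to tell whether the growth comes from more rows (e.g. duplicated timeline entries) or from extra columns. A minimal standard-library sketch; the helper names, and the assumption that a copy of the previous output is kept to compare against, are hypothetical and not part of the pipeline.

```python
import csv
import os


def tsv_summary(path: str) -> dict:
    """Byte size, data-row count and header columns of a TSV file."""
    with open(path, newline="") as fh:
        # QUOTE_NONE: treat quote characters as data, as plain TSV exports do.
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        rows = sum(1 for _ in reader)
    return {"bytes": os.path.getsize(path), "rows": rows, "columns": header}


def compare_outputs(before: str, after: str) -> dict:
    """Summarise how the 'after' TSV differs from the 'before' TSV."""
    old, new = tsv_summary(before), tsv_summary(after)
    return {
        "size_ratio": new["bytes"] / old["bytes"] if old["bytes"] else None,
        "row_delta": new["rows"] - old["rows"],
        "added_columns": [c for c in new["columns"] if c not in old["columns"]],
        "dropped_columns": [c for c in old["columns"] if c not in new["columns"]],
    }
```

For example, compare_outputs on the old and new copies of table_dx_timeline_primary.tsv (paths depend on where each run wrote its output): a large row_delta with no added columns would point at duplicated rows, while added columns would point at a schema change.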

DAGs that are not enabled on the production Airflow server:

  • cdm_etl_tumor_site_prediction: problem in the radiology_rpt_tumor_site_prediction task; its Condor submit file runs a script with hardcoded paths. Need to look into parameterizing the Condor submit files.
  • cdm_etl_progression: problem in the radiology_rpt_progression_prediction task; its Condor submit file runs a script with hardcoded paths.
  • cdm_cancer_presence_radiology_inference: creating the cprr conda environment fails with the solver error below. Confirm with Chris/CDM team.
    LibMambaUnsatisfiableError: Encountered problems while solving:
      - nothing provides __glibc >=2.28,<3.0.a0 needed by c-ares-1.33.1-heb4867d_0

    Could not solve for environment specs
    The following package could not be installed
    └─ c-ares ==1.33.1 heb4867d_0 is not installable because it requires
       └─ __glibc >=2.28,<3.0.0a0 *, which is missing on the system.
  • cdm_prior_tx_dag: couldn't create the conda env; its environment.yml doesn't pin any versions or list msk_cdm as a dependency.
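One way to parameterize the Condor submit files for the tumor site and progression tasks is to generate them from a template instead of committing fixed paths. A minimal sketch under stated assumptions: the template fields, the write_submit_file helper, and the --input/--output flags are all hypothetical, and the submitted script would itself need to read its paths from those arguments rather than hardcoding them.

```python
from pathlib import Path
from string import Template

# Hypothetical template: paths that are currently hardcoded become
# $placeholders. Literal HTCondor macros such as $(Cluster) would need to be
# written as $$(Cluster) here, since "$" is string.Template's escape character.
SUBMIT_TEMPLATE = Template("""\
executable = $python_bin
arguments  = $script --input $input_path --output $output_path
log        = $log_dir/$job_name.log
output     = $log_dir/$job_name.out
error      = $log_dir/$job_name.err
queue
""")


def write_submit_file(dest: Path, **params: str) -> Path:
    """Render SUBMIT_TEMPLATE with params and write it to dest.

    Template.substitute raises KeyError for any placeholder left unfilled,
    so the file is never written with a missing path.
    """
    dest.write_text(SUBMIT_TEMPLATE.substitute(**params))
    return dest
```

The Airflow task would then render a dev or prod submit file from its own configuration before calling condor_submit, so the same script can run in both environments.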
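The cprr solver error says the c-ares build requires the __glibc virtual package at >=2.28, and conda derives __glibc from the host's glibc (conda info lists the virtual packages it detected). The machine building the environment therefore appears to run an older glibc. A minimal sketch to confirm this from Python on that host; the function names are hypothetical, and the 2.28 threshold is taken from the error message above.

```python
from __future__ import annotations

import platform


def version_meets(version: str, required: tuple[int, int] = (2, 28)) -> bool:
    """Compare a 'major.minor[.patch]' glibc version string with required."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= required


def host_glibc_ok(required: tuple[int, int] = (2, 28)) -> bool | None:
    """True/False for this host's glibc; None if glibc is not detected
    (e.g. macOS, or a musl-based Linux)."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return version_meets(version, required)


if __name__ == "__main__":
    print(platform.libc_ver(), host_glibc_ok())
```

A False result on the Airflow/Condor host would confirm that the solver error comes from the system rather than from the cprr environment.yml itself.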

Technical Description (How are we going to achieve the above)

Potential Issues

Dependencies

Technical Requirements

Outside People/Teams

Changes

Metadata

Assignees: No one assigned
Labels: backend-scrum (items centered around engineering activities)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests