Darwin Core Archive Field Value Counter

John Wieczorek edited this page Nov 8, 2016 · 8 revisions

This workflow:

  • creates a given directory as a workspace
  • downloads a Darwin Core Archive from a given URL
  • extracts the core file of the downloaded archive to a tab-separated text file
  • for each field in a given list of fields, creates a report of counts of distinct values

The files produced by this workflow are:

  • dwca.zip - the Darwin Core archive file downloaded from the given URL
  • dwca_extracted_occurrences.txt - the core file of the downloaded Darwin Core Archive, as a tab-separated text file
  • count_[field].csv - for each field in the given list of fields, a report of the distinct values of that field and the number of times each appears in the extracted core file, where [field] is the name of the field. For the report format, see https://github.com/kurator-org/kurator-validation/wiki/Field-Value-Count-Report
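
The steps and output files above can be sketched in plain Python. This is only an illustration using the standard library, not the Kurator actors that the workflow configuration actually wires together. The function names are hypothetical, the sketch assumes the archive contains a meta.xml whose core file has a single header row, and the two-column layout of the count report (value, then count) is an assumption rather than the documented report format.

```python
import csv
import io
import os
import urllib.request
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter

DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"  # XML namespace of meta.xml


def download_archive(url, workspace):
    """Create the workspace directory and download the archive as dwca.zip."""
    os.makedirs(workspace, exist_ok=True)
    zip_path = os.path.join(workspace, "dwca.zip")
    urllib.request.urlretrieve(url, zip_path)
    return zip_path


def extract_core(zip_path, out_path):
    """Write the archive's core file to out_path as tab-separated text.

    Assumption: the first row of the core file is a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
        core = meta.find(DWC_TEXT_NS + "core")
        location = core.find(
            DWC_TEXT_NS + "files/" + DWC_TEXT_NS + "location"
        ).text.strip()
        # meta.xml stores delimiters escaped, e.g. the two characters "\t".
        delimiter = (
            core.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
        )
        quotechar = core.get("fieldsEnclosedBy", '"')
        text = archive.read(location).decode(core.get("encoding", "utf-8"))

    source = io.StringIO(text, newline="")
    if quotechar:
        reader = csv.reader(source, delimiter=delimiter, quotechar=quotechar)
    else:
        reader = csv.reader(source, delimiter=delimiter, quoting=csv.QUOTE_NONE)

    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerows(reader)
    return out_path


def count_field_values(tsv_path, field):
    """Count how often each distinct value of `field` occurs in the core file."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return Counter(row[field] for row in reader)


def write_count_report(counts, field, workspace):
    """Write count_[field].csv, most frequent value first (assumed layout)."""
    report_path = os.path.join(workspace, "count_%s.csv" % field)
    with open(report_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([field, "count"])
        writer.writerows(counts.most_common())
    return report_path
```

A run over a list of fields would call download_archive with the archive URL and a workspace directory, pass the resulting dwca.zip to extract_core with dwca_extracted_occurrences.txt as the output path, and then call count_field_values and write_count_report once per field.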

References

Workflow configuration file: https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/workflows/dwca_term_values.yaml
