Darwin Core Archive Field Value Counter
John Wieczorek edited this page Nov 8, 2016 · 8 revisions
This workflow:
- creates a given directory as a workspace
- downloads a Darwin Core Archive from a given URL
- extracts the core file of the downloaded Darwin Core Archive to a tab-separated text file
- for each field in a given list of fields, creates a report of counts of distinct values
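The first three steps can be sketched in plain Python as follows. This is a minimal illustration using only the standard library, not the Kurator implementation: the function names `download_archive` and `extract_core` are hypothetical, and the sketch assumes the archive contains a `meta.xml` descriptor whose `<core>` element points to a single delimited data file.

```python
import csv
import io
import os
import urllib.request
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by Darwin Core Archive meta.xml descriptors.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def download_archive(url, workspace, filename="dwca.zip"):
    """Create the workspace directory if needed and download the archive into it.

    Hypothetical helper for illustration only.
    """
    os.makedirs(workspace, exist_ok=True)
    archive_path = os.path.join(workspace, filename)
    urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_core(archive_path, output_path):
    """Copy the core data file of the archive to a tab-separated text file.

    Assumes one core file described in meta.xml. Returns the number of
    rows written, including any header row.
    """
    with zipfile.ZipFile(archive_path) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
        core = meta.find(DWC_TEXT_NS + "core")
        location = core.find(
            DWC_TEXT_NS + "files/" + DWC_TEXT_NS + "location"
        ).text.strip()

        # meta.xml writes a tab delimiter as the two characters "\t".
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        quote = core.get("fieldsEnclosedBy", '"')
        encoding = core.get("encoding", "UTF-8")

        reader_options = {"delimiter": delimiter}
        if quote:
            reader_options["quotechar"] = quote
        else:
            reader_options["quoting"] = csv.QUOTE_NONE

        rows_written = 0
        with archive.open(location) as raw, open(
            output_path, "w", encoding="utf-8", newline=""
        ) as out:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            writer = csv.writer(out, delimiter="\t", lineterminator="\n")
            for row in csv.reader(text, **reader_options):
                writer.writerow(row)
                rows_written += 1
    return rows_written
```

A call such as `extract_core(download_archive(url, "./workspace"), "./workspace/dwca_extracted_occurrences.txt")` would then produce the first two output files, where `url` is whatever archive URL the workflow is given.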
The files produced by this workflow are:
- dwca.zip - the Darwin Core Archive file downloaded from the given URL
- dwca_extracted_occurrences.txt - the core file of the downloaded Darwin Core Archive as a TXT file
- count_[field].csv - for each field in the given list of fields, a file listing the distinct values of that field and the number of times each appears in the extracted core file, where [field] is the name of the field being reported. See https://github.com/kurator-org/kurator-validation/wiki/Field-Value-Count-Report
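The counting step that produces the count_[field].csv files can be sketched as below. This is an illustration under stated assumptions, not the Kurator code: `count_field_values` is a hypothetical name, the extracted core file is assumed to be UTF-8 tab-separated text with field names in its first row, and the two-column layout (value, count) is a guess at the report format, which is specified on the linked Field Value Count Report page.

```python
import csv
import os
from collections import Counter


def count_field_values(core_path, fields, output_dir):
    """Write one count_[field].csv per requested field.

    Hypothetical helper: assumes a tab-separated input file with a header
    row. Missing or empty values are counted under the empty string.
    Returns a dict mapping each field name to its report path.
    """
    counts = {field: Counter() for field in fields}
    with open(core_path, encoding="utf-8", newline="") as core_file:
        for row in csv.DictReader(core_file, delimiter="\t"):
            for field in fields:
                counts[field][row.get(field) or ""] += 1

    os.makedirs(output_dir, exist_ok=True)
    reports = {}
    for field, counter in counts.items():
        report_path = os.path.join(output_dir, "count_%s.csv" % field)
        with open(report_path, "w", encoding="utf-8", newline="") as out:
            writer = csv.writer(out, lineterminator="\n")
            # Assumed layout: the value, then how often it occurred.
            writer.writerow([field, "count"])
            for value, occurrences in counter.most_common():
                writer.writerow([value, occurrences])
        reports[field] = report_path
    return reports
```

For example, `count_field_values("dwca_extracted_occurrences.txt", ["country", "year"], ".")` would write count_country.csv and count_year.csv, with rows ordered from the most to the least frequent value.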
Workflow configuration file: https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/workflows/dwca_term_values.yaml