Darwin Core Archive Geography Assessor
John Wieczorek edited this page Nov 8, 2016 · 6 revisions
This workflow:
- creates a given directory as a workspace
- downloads a Darwin Core Archive from a given URL
- downloads a geography lookup file from https://github.com/kurator-org/kurator-validation/tree/master/packages/kurator_dwca/data/vocabularies
- downloads a country lookup file from https://github.com/kurator-org/kurator-validation/tree/master/packages/kurator_dwca/data/vocabularies
- extracts the core file of the Darwin Core Archive to a tab-separated text file
- creates a report of counts of distinct values of the combination of higher geography fields
- creates a report of counts of distinct values of the country field
- creates a report of recommended values for geography
- creates a report of geography combinations not found in the geography lookup file
- creates a report of country values not found in the country lookup file
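The extraction step depends on knowing which file inside the archive is the core file. As a minimal sketch, not the workflow's actual implementation, the following reads the archive's `meta.xml` descriptor to find it. The function names `core_file_location` and `build_example_archive` are hypothetical, and the example archive is built in memory so the sketch runs without a download.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core text archive descriptors (meta.xml).
DWC_TEXT_NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}


def core_file_location(archive_bytes):
    """Return the path of the core data file named in meta.xml.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
    location = meta.find("dwc:core/dwc:files/dwc:location", DWC_TEXT_NS)
    if location is None or not location.text:
        raise ValueError("meta.xml does not name a core file")
    return location.text.strip()


def build_example_archive():
    """Build a tiny in-memory archive (made-up data) to exercise the sketch."""
    meta_xml = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">'
        "<files><location>occurrence.txt</location></files>"
        "</core></archive>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("meta.xml", meta_xml)
        archive.writestr("occurrence.txt", "id\tcountry\n1\tChile\n")
    return buffer.getvalue()


example = build_example_archive()
core_name = core_file_location(example)
print(core_name)
```

A real archive may declare its own delimiter and encoding in `meta.xml`; this sketch ignores those attributes and only locates the file.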
The files produced by this workflow are:
- count_country.csv - the distinct values of the country field and the number of times each appears in the extracted core file. See https://github.com/kurator-org/kurator-validation/wiki/Field-Value-Count-Report
- count_geography.csv - the distinct combinations of values of the higher geography fields and the number of times each appears in the extracted core file. See https://github.com/kurator-org/kurator-validation/wiki/Field-Value-Count-Report
- dwca_extracted_occurrences.txt - the core file of the downloaded Darwin Core Archive as a tab-separated text file
- dwca.zip - the Darwin Core Archive file downloaded from the given URL
- lookup_country.txt - downloaded copy of the country lookup file. See https://github.com/kurator-org/kurator-validation/wiki/Vocabulary-File-Structure
- lookup_geography.txt - downloaded copy of the geography lookup file. See https://github.com/kurator-org/kurator-validation/wiki/Vocabulary-File-Structure
- new_country.csv - the country values not found in the country lookup file
- new_geography.csv - the distinct combinations of higher geography not found in the geography lookup file
- recommended_geography.csv - the recommended standardized values for the distinct combinations of higher geography. See https://github.com/kurator-org/kurator-validation/wiki/Geography-Recommendation-Report
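To illustrate how the count and "new value" reports relate to the extracted core file, here is a rough sketch under stated assumptions. It is not the code the workflow runs. The helper names `count_distinct` and `not_in_lookup`, the sample rows, and the in-memory set standing in for a lookup file are all hypothetical, and the real report and lookup file layouts are described on the wiki pages linked above.

```python
import csv
import io
from collections import Counter


def count_distinct(tsv_text, fields):
    """Count each distinct combination of the given fields in tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        # Missing or empty fields are treated as empty strings.
        key = tuple((row.get(field) or "").strip() for field in fields)
        counts[key] += 1
    return counts


def not_in_lookup(counts, known):
    """Return the counted combinations that are absent from the known set."""
    return sorted(key for key in counts if key not in known)


# Made-up sample standing in for dwca_extracted_occurrences.txt.
core = (
    "id\tcountry\tstateProvince\n"
    "1\tChile\tAysén\n"
    "2\tChile\tAysén\n"
    "3\tChili\t\n"
)

# Analogous to count_country.csv and count_geography.csv.
country_counts = count_distinct(core, ["country"])
geography_counts = count_distinct(core, ["country", "stateProvince"])

# Assumed stand-in for the keys of lookup_country.txt.
known_countries = {("Chile",)}

# Analogous to new_country.csv.
new_countries = not_in_lookup(country_counts, known_countries)
print(new_countries)
```

The same two helpers cover both the country and the geography reports because a single field is treated as a combination of length one.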
Workflow configuration file: https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/workflows/dwca_geography_assessor.yaml
Darwin Core Controlled Value lookup files: https://github.com/kurator-org/kurator-validation/tree/master/packages/kurator_dwca/data/vocabularies