Darwin Core Archive Controlled Field Assessor
John Wieczorek edited this page Nov 8, 2016 · 10 revisions
This workflow:
- creates a given directory as a workspace
- downloads a Darwin Core Archive from a given URL
- downloads vocabulary lookup files from https://github.com/kurator-org/kurator-validation/tree/master/packages/kurator_dwca/data/vocabularies
- extracts the core file of a Darwin Core Archive to a tab-separated text file
- for each field in the list of Darwin Core Controlled Value fields (see below), creates a report of the counts of distinct values
- for each field in the list of Darwin Core Controlled Value fields (see below), creates a report of recommended values for values that are not standard
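The counting step can be illustrated with a minimal sketch. This is not the workflow's own code: the function names `count_distinct_values` and `counts_to_csv`, the `count` column header, and the sample rows are all assumptions made for illustration. The actual report layout is described on the Field Value Count Report page.

```python
import csv
import io
from collections import Counter


def count_distinct_values(tsv_text, field):
    """Count how often each distinct value of `field` occurs in tab-separated text.

    Hypothetical helper, not part of kurator_dwca.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = Counter()
    for row in reader:
        value = row.get(field)
        if value is not None:  # skip rows that lack the column entirely
            counts[value] += 1
    return counts


def counts_to_csv(counts, field):
    """Render the counts as CSV text, most frequent value first.

    The "count" header is an assumption, not the documented report format.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([field, "count"])
    for value, n in counts.most_common():
        writer.writerow([value, n])
    return out.getvalue()


# Made-up sample standing in for an extracted core file.
sample = (
    "occurrenceID\tbasisOfRecord\n"
    "1\tPreservedSpecimen\n"
    "2\tspecimen\n"
    "3\tPreservedSpecimen\n"
)
counts = count_distinct_values(sample, "basisOfRecord")
print(counts_to_csv(counts, "basisOfRecord"))
```

Reading from a string keeps the sketch self-contained; for a real extracted core file, the same reader could be built from an open file handle instead.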
The files produced by this workflow are:
- count_[field].csv - for each Darwin Core Controlled Value field found in the extracted file, a file containing the distinct values and the number of times they appeared in the extracted core file. See https://github.com/kurator-org/kurator-validation/wiki/Field-Value-Count-Report
- dwca.zip - the Darwin Core archive file (see https://en.wikipedia.org/wiki/Darwin_Core_Archive) downloaded from the given URL
- dwca_extracted_occurrences.txt - the core file of the downloaded Darwin Core Archive, extracted as a tab-separated text file
- recommended_[field].csv - for each Darwin Core Controlled Value field found in the extracted file for which recommendations could be found, a file containing the recommendations to standardize values in that field. See https://github.com/kurator-org/kurator-validation/wiki/Field-Value-Recommendation-Report
- vocab_[field].txt - for each Darwin Core Controlled Value field found in the extracted file, a file containing the lookup file to use for standardizing that field. See https://github.com/kurator-org/kurator-validation/wiki/Vocabulary-File-Structure
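As a rough sketch of how the per-field file names and the recommendation step fit together, the snippet below treats a vocabulary as a plain dictionary from a verbatim value to its standard value. That dictionary shape, the helper names `output_names` and `recommend`, and the sample values are assumptions for illustration only. The real lookup file layout is the one described on the Vocabulary File Structure page.

```python
def output_names(field):
    """Build the per-field output file names listed above for one field."""
    return {
        "count": "count_{}.csv".format(field),
        "recommended": "recommended_{}.csv".format(field),
        "vocab": "vocab_{}.txt".format(field),
    }


def recommend(distinct_values, vocab):
    """Return {verbatim value: standard value} for non-standard values.

    `vocab` is a hypothetical in-memory mapping, not the real file format.
    Values with no vocabulary entry get no recommendation, and values that
    are already standard are left out.
    """
    recommendations = {}
    for value in distinct_values:
        standard = vocab.get(value)
        if standard is not None and standard != value:
            recommendations[value] = standard
    return recommendations


# Made-up vocabulary and values for the basisOfRecord field.
vocab = {
    "PreservedSpecimen": "PreservedSpecimen",
    "specimen": "PreservedSpecimen",
    "obs": "HumanObservation",
}
found = ["PreservedSpecimen", "specimen", "unknown thing"]

print(output_names("basisOfRecord"))
print(recommend(found, vocab))
```

Under these assumptions, a field whose values are all standard or all unrecognized yields an empty result, which matches the note above that a recommended_[field].csv file exists only where recommendations could be found.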
Workflow configuration file: https://github.com/kurator-org/kurator-validation/blob/master/packages/kurator_dwca/workflows/dwca_controlled_term_assessor.yaml
Darwin Core Controlled Value lookup files: https://github.com/kurator-org/kurator-validation/tree/master/packages/kurator_dwca/data/vocabularies
Darwin Core Controlled Value fields (from http://rs.tdwg.org/dwc/terms/index.htm):
- basisOfRecord (http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord)
- day (http://rs.tdwg.org/dwc/terms/index.htm#day)
- disposition (http://rs.tdwg.org/dwc/terms/index.htm#disposition)
- establishmentMeans (http://rs.tdwg.org/dwc/terms/index.htm#establishmentMeans)
- geodeticDatum (http://rs.tdwg.org/dwc/terms/index.htm#geodeticDatum)
- georeferenceVerificationStatus (http://rs.tdwg.org/dwc/terms/index.htm#georeferenceVerificationStatus)
- identificationQualifier (http://rs.tdwg.org/dwc/terms/index.htm#identificationQualifier)
- identificationVerificationStatus (http://rs.tdwg.org/dwc/terms/index.htm#identificationVerificationStatus)
- language (http://rs.tdwg.org/dwc/terms/index.htm#language)
- license (http://rs.tdwg.org/dwc/terms/index.htm#license)
- lifeStage (http://rs.tdwg.org/dwc/terms/index.htm#lifeStage)
- month (http://rs.tdwg.org/dwc/terms/index.htm#month)
- nomenclaturalCode (http://rs.tdwg.org/dwc/terms/index.htm#nomenclaturalCode)
- nomenclaturalStatus (http://rs.tdwg.org/dwc/terms/index.htm#nomenclaturalStatus)
- occurrenceStatus (http://rs.tdwg.org/dwc/terms/index.htm#occurrenceStatus)
- organismScope (http://rs.tdwg.org/dwc/terms/index.htm#organismScope)
- preparations (http://rs.tdwg.org/dwc/terms/index.htm#preparations)
- reproductiveCondition (http://rs.tdwg.org/dwc/terms/index.htm#reproductiveCondition)
- sex (http://rs.tdwg.org/dwc/terms/index.htm#sex)
- typeStatus (http://rs.tdwg.org/dwc/terms/index.htm#typeStatus)
- taxonRank (http://rs.tdwg.org/dwc/terms/index.htm#taxonRank)
- taxonomicStatus (http://rs.tdwg.org/dwc/terms/index.htm#taxonomicStatus)
- type (http://rs.tdwg.org/dwc/terms/index.htm#type)
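Because reports are only produced for controlled fields that actually occur in the extracted core file, a first step is to intersect the list above with the file's header. The following is a minimal sketch under two assumptions: the header is the first tab-separated line of the file, and column names match the term names exactly. The function name `controlled_fields_in_header` is hypothetical.

```python
# Field names copied from the list above, in the same order.
CONTROLLED_FIELDS = [
    "basisOfRecord", "day", "disposition", "establishmentMeans",
    "geodeticDatum", "georeferenceVerificationStatus",
    "identificationQualifier", "identificationVerificationStatus",
    "language", "license", "lifeStage", "month", "nomenclaturalCode",
    "nomenclaturalStatus", "occurrenceStatus", "organismScope",
    "preparations", "reproductiveCondition", "sex", "typeStatus",
    "taxonRank", "taxonomicStatus", "type",
]


def controlled_fields_in_header(header_line):
    """Return the controlled fields present in a tab-separated header line.

    Assumes exact, case-sensitive matches on column names.
    """
    columns = set(header_line.rstrip("\r\n").split("\t"))
    return [field for field in CONTROLLED_FIELDS if field in columns]


# Made-up header line for illustration.
header = "occurrenceID\tbasisOfRecord\tsex\tcountry\n"
print(controlled_fields_in_header(header))
```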