
Contribution: 671 ecoinvent→EF 3.1 biosphere links + 3 likely incorrect mappings #8

@mklarmann

Summary

We (Eaternity) have been building a unified biosphere flow mapping that triangulates across three registries (ecoinvent biosphere3, BAFU/UVEK, and EF 3.1) using 8 evidence sources. We recently integrated the D-D-S consensus flows from randonneur_data as one of those evidence sources. The integration confirmed 4,226 of our existing GLAD-based links and added 119 new unique cross-registry links.

As part of this integration, we identified:

  1. 671 ecoinvent biosphere3 → EF 3.1 links that we have but randonneur_data currently lacks
  2. 3 likely incorrect mappings in the current data that we think should be reviewed

Proposed New Links (671)

The full set of 671 links is available in randonneur format (ready for direct import) as a gist:

eaternity_biosphere_feedback.json (456 KB, CC-BY-4.0)

Methodology

These links were derived by triangulating biosphere3, BAFU/UVEK (32,525 flows), and EF 3.1 (93,815 flows) using 8 evidence sources:

| Evidence Source | Weight | Description |
|---|---|---|
| exact_name_compartment | 0.40 | Exact name match within same compartment |
| cf_fingerprint_agreement | 0.25 | Characterization factor agreement across ≥2 LCIA methods |
| glad_uuid_bridge | 0.20 | GLAD UUID-based authoritative link |
| dds_consensus_bridge | 0.18 | D-D-S consensus (this package!) |
| existing_mapping | 0.15 | Pre-existing cross-registry mapping |
| base_name_compartment_strong | 0.12 | Base name match (strong, same qualifier) |
| base_name_compartment | 0.08 | Base name match (weak) |
| paren_comma_name_compartment | 0.10 | Parenthetical/comma variant match |

The 671 proposed links all have a minimum combined confidence score of 0.30. Evidence breakdown:

  • 651 have exact_name_compartment evidence
  • 661 have existing_mapping evidence (from BAFU triangulation)
  • 530 have glad_uuid_bridge evidence (GLAD confirms but D-D-S lacks the specific compartment variant)
  • 40 have cf_fingerprint_agreement (CF values match across registries)
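To make the threshold concrete, here is a minimal sketch of how a combined score could be computed from the weights above. The combination rule is an assumption for illustration only (a plain sum over distinct evidence sources, capped at 1.0); the function names are hypothetical and not taken from our pipeline.

```python
# Weights copied from the evidence table in this issue.
EVIDENCE_WEIGHTS = {
    "exact_name_compartment": 0.40,
    "cf_fingerprint_agreement": 0.25,
    "glad_uuid_bridge": 0.20,
    "dds_consensus_bridge": 0.18,
    "existing_mapping": 0.15,
    "base_name_compartment_strong": 0.12,
    "base_name_compartment": 0.08,
    "paren_comma_name_compartment": 0.10,
}
MIN_CONFIDENCE = 0.30  # minimum combined score of the 671 proposed links


def combined_confidence(evidence):
    """ASSUMPTION: additive score over distinct sources, capped at 1.0."""
    total = sum(EVIDENCE_WEIGHTS[name] for name in set(evidence))
    return min(total, 1.0)


def passes_threshold(evidence):
    # Small tolerance so float rounding does not reject a score of exactly 0.30.
    return combined_confidence(evidence) >= MIN_CONFIDENCE - 1e-9


if __name__ == "__main__":
    for sources in (
        ["exact_name_compartment"],
        ["glad_uuid_bridge", "existing_mapping"],
        ["existing_mapping", "base_name_compartment"],
    ):
        print(sources, round(combined_confidence(sources), 2), passes_threshold(sources))
```

Under this assumed rule, a single exact name match already clears the threshold, while weaker evidence needs at least two agreeing sources.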

Each entry in the gist follows the randonneur replace format with source UUID, target UUID, names, contexts, and a comment explaining the evidence.
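As a rough illustration of what one such entry carries, the sketch below assembles a single replace record. The key names (`uuid`, `name`, `context`, `comment`), the helper `make_replace_entry`, and all UUIDs, names, and contexts are placeholders chosen for this example; the gist itself is the reference for the actual field layout.

```python
import json


def make_replace_entry(source, target, comment):
    """Build one replace record; key names are assumptions for illustration."""
    required = ("uuid", "name", "context")
    for label, flow in (("source", source), ("target", target)):
        missing = [key for key in required if key not in flow]
        if missing:
            raise ValueError(f"{label} flow is missing {missing}")
    return {"source": dict(source), "target": dict(target), "comment": comment}


# Placeholder values only; these are not real biosphere3 or EF 3.1 flows.
entry = make_replace_entry(
    source={
        "uuid": "00000000-0000-0000-0000-000000000001",
        "name": "Example flow",
        "context": ["air", "unspecified"],
    },
    target={
        "uuid": "00000000-0000-0000-0000-000000000002",
        "name": "example flow",
        "context": ["Emissions", "Emissions to air"],
    },
    comment="evidence: exact_name_compartment, glad_uuid_bridge",
)

payload = {"replace": [entry]}

if __name__ == "__main__":
    print(json.dumps(payload, indent=2))
```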

3 Likely Incorrect Mappings

We found 3 mappings in the current ecoinvent-3.x-biosphere-EF-3.1-biosphere datasets that appear to be errors. The D-D-S method is glad for the first and for the air link of the second:

1. Methyl acrylate → methacrylate (10 links across compartments)

  • bio3 name: Methyl acrylate
  • bio3 CAS: 000096-33-3
  • ef31 name: methacrylate
  • D-D-S method: glad
  • Problem: These are different chemical compounds:
    • Methyl acrylate (CAS 96-33-3) = methyl ester of acrylic acid (CH₂=CHCOOCH₃)
    • Methacrylate = methacrylic acid / methyl methacrylate (CAS 79-41-4 / 80-62-6)
    • Different molecular structure (methyl acrylate lacks the alpha-methyl group that defines methacrylate)
    • Different ecotoxicity CFs in USEtox: methyl acrylate has higher acute toxicity (LC50 freshwater fish 29 mg/L vs ~200+ mg/L for methyl methacrylate)
  • Affected compartments: air (5 sub-compartments), water (4 sub-compartments), soil (1)
  • Recommendation: Remove these 10 links. Methyl acrylate should map to "methyl acrylate" or "methyl propenoate" in EF 3.1, not "methacrylate".

2. Imazethapyr → pursuit (3 links)

  • bio3 name: Imazethapyr
  • bio3 CAS: 081335-77-5
  • ef31 name: pursuit
  • D-D-S method: glad (air), algorithmic (water, soil)
  • Problem: "Pursuit" is a trade name (BASF herbicide brand) rather than a chemical name. While Pursuit does contain imazethapyr as the active ingredient, EF 3.1 should use standard chemical nomenclature, not brand names. This creates ambiguity:
    • A formulated product ("Pursuit") contains adjuvants and inert ingredients beyond the active substance
    • Trade names can refer to different formulations over time
    • Other databases won't recognize "pursuit" as imazethapyr
  • Affected compartments: air/non-urban, soil/agricultural, water/ground-
  • Recommendation: Rename the EF 3.1 target to "imazethapyr" (the ISO common name, as used in databases such as PAN, PPDB, and PubChem).

3. HCFC-140 CAS number inconsistency (5 affected links)

  • bio3 name: Ethane, 1,1,1-trichloro-, HCFC-140
  • ef31 name: 1,1,1-trichloroethane
  • Problem: The air emission entries carry CAS 000079-00-5, which is the CAS number for 1,1,2-trichloroethane (a different isomer). The correct CAS for 1,1,1-trichloroethane (HCFC-140) is 000071-55-6.
    • The same bio3 flow's water emission entries correctly use CAS 000071-55-6
    • So within the same dataset, HCFC-140 has two different CAS numbers depending on compartment
    • These isomers have very different properties:
      • 1,1,1-trichloroethane (CAS 71-55-6): ozone-depleting substance, ODP = 0.12
      • 1,1,2-trichloroethane (CAS 79-00-5): not ozone-depleting, and with a different toxicity profile (IARC Group 3, i.e. not classifiable as to its carcinogenicity to humans)
    • Using the wrong CAS would pull incorrect characterization factors from USEtox
  • Affected compartments: 5 air sub-compartments have the wrong CAS
  • Recommendation: Correct CAS to 000071-55-6 for all air emission entries of "Ethane, 1,1,1-trichloro-, HCFC-140"
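A check like the following could catch this class of problem automatically. It is a minimal sketch: the `(name, compartment, cas)` tuple layout and the function names are assumptions for illustration, not the structure of the randonneur_data files. Note that the standard CAS check digit does not help here, because both 71-55-6 and 79-00-5 are valid numbers; only the per-flow consistency check flags the issue.

```python
from collections import defaultdict


def normalize_cas(cas):
    """Strip zero padding, e.g. '000071-55-6' -> '71-55-6'."""
    head, middle, check = cas.strip().split("-")
    return f"{int(head)}-{middle}-{check}"


def cas_checksum_ok(cas):
    """Validate the CAS check digit (weighted digit sum modulo 10)."""
    head, middle, check = normalize_cas(cas).split("-")
    digits = head + middle
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


def inconsistent_cas(entries):
    """Return flow names that carry more than one distinct CAS number."""
    seen = defaultdict(set)
    for name, _compartment, cas in entries:
        seen[name].add(normalize_cas(cas))
    return {name: sorted(values) for name, values in seen.items() if len(values) > 1}


# Illustrative rows using the names and CAS numbers quoted in this issue.
rows = [
    ("Ethane, 1,1,1-trichloro-, HCFC-140", "air", "000079-00-5"),
    ("Ethane, 1,1,1-trichloro-, HCFC-140", "water", "000071-55-6"),
    ("Methyl acrylate", "air", "000096-33-3"),
]

if __name__ == "__main__":
    print(inconsistent_cas(rows))
```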

Additionally, we noticed that "Ethane, 1,1,2-trichloro-" (without the trifluoro suffix) maps to "Hydrocarbons, chlorinated" (a generic group flow) in 4 air compartments. This is a significant loss of specificity: EF 3.1 does have "1,1,2-trichloroethane" as a specific flow, and the air/unspecified compartment already maps to it. These 4 compartments should map to the specific flow as well, rather than to the generic group.

Our Integration Results

For reference, integrating the D-D-S consensus flows into our mapping yielded:

| Metric | Value |
|---|---|
| D-D-S links loaded | 4,345 |
| Confirms our GLAD links | 4,226 (97.3%) |
| New unique links from D-D-S | 119 |
| Clusters strengthened | 993 |
| Cross-block merges | 22 |
| CF conflicts introduced | 0 |
| Regressions | 0 |
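The confirmed/new split in the table is essentially a set comparison. The sketch below shows the idea on toy data; representing a link as a `(source_uuid, target_uuid)` pair and the function name `compare_links` are simplifying assumptions, and the identifiers are placeholders.

```python
def compare_links(ours, theirs):
    """Compare two collections of (source_uuid, target_uuid) link pairs."""
    ours, theirs = set(ours), set(theirs)
    confirmed = ours & theirs          # present in both mappings
    new_from_theirs = theirs - ours    # links we gain from the other side
    missing_in_theirs = ours - theirs  # links we could contribute back
    share = len(confirmed) / len(theirs) if theirs else 0.0
    return {
        "confirmed": confirmed,
        "new_from_theirs": new_from_theirs,
        "missing_in_theirs": missing_in_theirs,
        "confirmed_share": share,
    }


# Toy identifiers, not real flow UUIDs.
ours = [("a", "x"), ("b", "y"), ("c", "z")]
theirs = [("a", "x"), ("b", "y"), ("d", "w")]
result = compare_links(ours, theirs)

# Arithmetic check on the figures reported in the table.
reported_share = round(100 * 4226 / 4345, 1)

if __name__ == "__main__":
    print(len(result["confirmed"]), len(result["new_from_theirs"]), reported_share)
```

The same comparison, run in the other direction, is what produces the list of links that randonneur_data currently lacks.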

The integration was smooth and highly valuable. The randonneur format and CC-BY-4.0 licensing made it straightforward to consume programmatically.

Context

We maintain a unified biosphere flow mapping across 84,276 flow clusters (6,580 cross-registry) as part of our open LCA infrastructure. The mapping is used to bridge BAFU/UVEK (Swiss Federal Office for the Environment) and ecoinvent flows to the EF 3.1 framework for multi-impact assessment. Our pipeline runs weekly automated updates against randonneur_data to stay current.

Happy to provide more detail on any of these findings, or to submit the 671 links as a PR in whatever format works best for the project.


🤖 Generated with Claude Code
