Running Coffea with the Dask/Futures executor throws an error: cannot pickle 'property' object #302

@oshadura

Describe the bug

Running one of the ADL examples (https://github.com/mat-adamec/coffea-benchmarks) with the Dask or Futures executor and a METProcessor(processor.ProcessorABC) fails with "TypeError: cannot pickle 'property' object".

I am almost sure the problem is related to the Python version (3.8.2, see below).

To Reproduce

import os

from coffea import hist
from coffea.analysis_objects import JaggedCandidateArray
import coffea.processor as processor

from dask.distributed import Client, LocalCluster
from dask_jobqueue import HTCondorCluster

fileset = {
    'Jets': { 'files': ['root://eospublic.cern.ch//eos/root-eos/benchmark/Run2012B_SingleMu.root'],
             'treename': 'Events'
            }
}

class METProcessor(processor.ProcessorABC):
    def __init__(self):
        self._columns = ['MET_pt']
        dataset_axis = hist.Cat("dataset", "")
        MET_axis = hist.Bin("MET", "MET [GeV]", 50, 0, 100)
        self._accumulator = processor.dict_accumulator({
            'MET': hist.Hist("Counts", dataset_axis, MET_axis),
            'cutflow': processor.defaultdict_accumulator(int)
        })

    @property
    def accumulator(self):
        return self._accumulator

    @property
    def columns(self):
        return self._columns

    def process(self, df):
        output = self.accumulator.identity()
        # The dataset name is needed for the categorical axis of the histogram.
        dataset = df['dataset']
        MET = df['MET_pt']
        output['cutflow']['all events'] += MET.size
        output['cutflow']['number of chunks'] += 1
        output['MET'].fill(dataset=dataset, MET=MET.flatten())
        return output

    def postprocess(self, accumulator):
        return accumulator

client = Client(processes=False, dashboard_address=None)

exe_args = {
        'client': client,
    }
output = processor.run_uproot_job(fileset,
                                treename = 'Events',
                                processor_instance = METProcessor(),
                                executor = processor.dask_executor,
                                executor_args = exe_args
                                )

hist.plot1d(output['MET'], overlay='dataset', fill_opts={'edgecolor': (0,0,0,0.3), 'alpha': 0.8})

for key, value in output['cutflow'].items():
    print(key, value)

Output

Traceback (most recent call last):
  File "adl1.py", line 65, in <module>
    output = processor.run_uproot_job(fileset,
  File "/usr/lib/python3.8/site-packages/coffea/processor/executor.py", line 774, in run_uproot_job
    pi_to_send = lz4f.compress(cloudpickle.dumps(processor_instance), compression_level=pi_compression)
  File "/usr/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 62, in dumps
    cp.dump(obj)
  File "/usr/lib/python3.8/site-packages/cloudpickle/cloudpickle_fast.py", line 538, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle 'property' object
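
The traceback shows the failure happening in cloudpickle.dumps(processor_instance), before any Dask task runs, so serialization can be tested without a cluster. Below is a minimal diagnostic sketch, not part of Coffea: the helper find_unpicklable and the Demo class are hypothetical names introduced for illustration, and the standard-library pickle stands in for cloudpickle so the snippet has no external dependencies. It only inspects instance attributes (vars(obj)). A failure caused by class-level state would not show up here and would need dumps() called on the whole object instead.

```python
import pickle


def find_unpicklable(obj, dumps=pickle.dumps):
    """Return {attribute name: error message} for instance attributes that fail to serialize.

    Hypothetical helper for illustration; pass dumps=cloudpickle.dumps to follow
    the same serialization path as the traceback above (assumes cloudpickle is installed).
    """
    failures = {}
    for name, value in vars(obj).items():
        try:
            dumps(value)
        except Exception as exc:  # report whatever the serializer raised
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


class Demo:
    """Stand-in object (an assumption, not the METProcessor from this report)."""

    def __init__(self):
        self.columns = ['MET_pt']                  # plain list, serializes fine
        self.broken = property(lambda self: None)  # a raw property object stored as data


report = find_unpicklable(Demo())
print(report)
```

Running the same helper on METProcessor() with dumps=cloudpickle.dumps might show whether one of the instance attributes (for example the accumulator) is what triggers the error, or whether the problem lies in how the class itself is serialized.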

Desktop:

  • OS: Manjaro
  • Python: 3.8.2
  • Coffea: 0.6.39

CC: @mat-adamec

Labels: bug