Added support for storing the state of the Event Processor along the Checkpoint #84
* Made setup 2.7 compatible
* Separated async tests
* Support 2.7 types
* Bumped version
* Added non-ascii tests
* Fix CI
* Fix Py27 pylint
* Added iot sample
* Updated sender/receiver client opening
* bumped version
* Updated tests
* Fixed test name
* Fixed test env settings
* Skip eph test
Checkpoint. Both Checkpoint and the EP state are stored as pickled objects.
…state_along_checkpoints' into storing_event_processor_state_along_checkpoints
Resolves #82
Thanks so much @konstantinmiller! Though a quick comment in the meantime - please avoid the use of pickle. The blob leases need to be interchangeable between EPH implementations in different languages, and pickle will potentially make the lease formatting Python-specific. The formatting of the leases should be compatible both with those generated by other SDKs and with those generated by previous versions of the Python SDK (i.e. where no context data is in the lease). With regard to how other SDKs will load a lease that has been created with an extra field of data - that's an interesting question which I will take a look at and update this thread with my findings. Thanks again for taking the time to add this!
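The compatibility requirement above (leases written without any context data must still load) can be illustrated with a minimal sketch. The field name `event_processor_context` and the other lease keys are assumptions made for illustration only; the actual lease schema used by the SDK may differ.

```python
import json

# Hypothetical name for the extra lease field; not confirmed by the thread.
CONTEXT_FIELD = "event_processor_context"


def read_lease(blob_text):
    """Parse a JSON lease and return (lease_dict, context_or_None).

    Using dict.get() means a lease written by an older SDK version,
    or by another language's SDK, loads without raising KeyError.
    """
    lease = json.loads(blob_text)
    return lease, lease.get(CONTEXT_FIELD)


# Illustrative lease contents; the keys are assumptions, not the real schema.
old_lease = json.dumps({"partition_id": "0", "offset": "1024", "sequence_number": 42})
new_lease = json.dumps({"partition_id": "0", "offset": "1024", "sequence_number": 42,
                        CONTEXT_FIELD: "c3RhdGU="})

_, old_context = read_lease(old_lease)   # no context stored
_, new_context = read_lease(new_lease)   # context string is returned as-is
```

A reader written this way treats the extra field as optional, so old and new leases can coexist in the same container.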
Ok, I definitely see your point, even though I'm not 100% sure that having event processors written in different languages for the same Event Hub is a common use case. One solution would be to store everything as JSON. This, however, will incur overhead for converting binary data into, say, a base64 representation, and reading from and writing to the storage will take longer because of the larger size compared to binary. Is there any binary format that is as portable across languages as JSON? As an alternative, maybe it would be acceptable to write the checkpoint as JSON to one blob block and the EP context as binary to another? In any case, please let me know what your preferred solution is so that I can modify the PR. We would like to use your EPH library in production as soon as possible.
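To put a number on the size overhead mentioned above: standard base64 emits 4 output characters for every 3 input bytes, so the encoded state is roughly one third larger than the raw binary. The sketch below checks this with a random 3 MB payload; the helper name `base64_size` is made up for illustration, and nothing here measures read or write times.

```python
import base64
import os


def base64_size(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes: 4 * ceil(n / 3)."""
    return 4 * ((n_bytes + 2) // 3)


payload = os.urandom(3 * 1000 * 1000)   # stand-in for a binary EP state
encoded = base64.b64encode(payload)

raw_size = len(payload)       # 3,000,000 bytes
encoded_size = len(encoded)   # 4,000,000 bytes, matching base64_size(raw_size)
```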
How about protobuf?
So, would it help if we kept the JSON format and included the custom object as a base64-encoded byte array?
Hi @konstantinmiller - sorry for the delay. Yeah, I think bytes might be a good representation in this case. Universal and lightweight :)
Shall I update the PR? This solution will be JSON and thus no longer Python-dependent, but it would still introduce an additional field to the lease. Will that be OK?
Thanks @konstantinmiller - yes please update the PR :)
I've updated the request. The Event Processor state is now stored as an additional field in the JSON object that is written to the blob. The data is expected to be a string, which could be, for example, a pickled object encoded in base64.
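As a rough sketch of the approach described above, the snippet below pickles a state object, base64-encodes it to a string, and stores it as one more field in a JSON lease. The field name `event_processor_context`, the other lease keys, and the helpers `encode_state` and `decode_state` are hypothetical and only illustrate the idea; they are not taken from the PR.

```python
import base64
import json
import pickle


def encode_state(state):
    """Pickle a Python object and return it as an ASCII base64 string."""
    return base64.b64encode(pickle.dumps(state)).decode("ascii")


def decode_state(text):
    """Inverse of encode_state. Only unpickle data from a trusted source."""
    return pickle.loads(base64.b64decode(text))


# Illustrative lease; the keys are assumptions, not the SDK's real schema.
lease = {"partition_id": "0", "offset": "1024", "sequence_number": 42}
lease["event_processor_context"] = encode_state({"seen": 17, "window": [1.5, 2.5]})

blob_text = json.dumps(lease)   # what would be written to the blob
restored = decode_state(json.loads(blob_text)["event_processor_context"])
```

The lease itself stays plain JSON, so other SDKs can parse it and ignore the extra field, while the content of that field remains Python-specific whenever pickle is used to produce it.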
@annatisch, could you please have a look at the PR and let us know if it can be merged or if it needs to be updated?
Thanks @konstantinmiller!
Sorry for the delay - on the whole it looks good. Just a couple of minor comments :)
Thanks for this @konstantinmiller - I am just working on a patch for issue #89 after which I will publish a release :)
…Checkpoint (#84)
* Updates for release 1.2.0 (#81)
* Made setup 2.7 compatible
* Separated async tests
* Support 2.7 types
* Bumped version
* Added non-ascii tests
* Fix CI
* Fix Py27 pylint
* Added iot sample
* Updated sender/receiver client opening
* bumped version
* Updated tests
* Fixed test name
* Fixed test env settings
* Skip eph test
* Added support for storing the state of the Event Processor along the Checkpoint. Both Checkpoint and the EP state are stored as pickled objects.
* Fixing pylint complaints.
* Switched from pickle back to JSON for lease persistence.
* Fixes bug when accessing leases that don't contain EP context. Also, minor renaming.
Added support for storing the state of the Event Processor along the Checkpoint. Both Checkpoint and the EP state are now stored as one pickled object. This is good if the EP state is a large (tens of megabytes) binary object.
Alternatively, we could implement an option to choose whether to store the data as JSON or as a pickled object. However, if the EP state is a large binary object, storing it as JSON will add processing and storage overhead. Another alternative would be to store the checkpoint in one blob block as JSON, while storing the EP state in another blob block as a pickled object. But storing both as one pickled object is the simplest solution.