api: Use exponential backoff in call_on_each_event. #586

Open: wants to merge 3 commits into base main
zulip/zulip/__init__.py: 36 changes (20 additions, 16 deletions)

@@ -601,8 +601,16 @@ def call_on_each_event(
         if narrow is None:
             narrow = []

-        def do_register() -> Tuple[str, int]:
-            while True:
+        queue_id = None
+        # NOTE: Back off exponentially to cover against potential bugs in this
+        # library causing a DoS attack against a server when getting errors
+        # (explicit values listed for clarity)
+        backoff = RandomExponentialBackoff(maximum_retries=10,
Member:

Maybe we can go with maximum_retries=50, i.e. basically arrange it so that it eventually gives up, but only if it's really clear the server is down for good and not just having temporary downtime.

I also notice there's a time.sleep(1) in do_register; probably that should be changed as well.

(Or maybe do_register should be folded into this loop?)
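A rough sketch of the "eventually give up" shape being suggested here (a minimal standalone example, assuming only the keep_going()/fail()/succeed() interface that this diff already uses; do_work is a stand-in, not part of the library):

import zulip

# Standalone sketch (not the PR code): retry an operation a bounded number of
# times with exponential backoff, then give up.  do_work is a placeholder for
# the register()/get_events() calls in this diff.
def do_work() -> bool:
    return False  # pretend the server keeps failing

backoff = zulip.RandomExponentialBackoff(maximum_retries=50, delay_cap=90)
while backoff.keep_going():
    if do_work():
        backoff.succeed()   # reset the consecutive-failure count
        break
    backoff.fail()          # sleep (longer each time) and count one more failure
else:
    # The while condition went False: 50 consecutive failures, so give up loudly.
    print("Giving up: the server appears to be down for good")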

Contributor Author:

With this PR, the behavior changes from before, which is a little concerning, though arguably useful: the handler will eventually return after some time, rather than sit in an infinite loop, so the caller can decide whether to retry?

We could put the backoff parameters (or something like them) into the method call; that could also allow the previous behavior (the infinite loop), but with backoff?
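For concreteness, a sketch of what exposing those knobs could look like (the subclass, parameter names, and defaults are hypothetical, purely to illustrate the suggestion; nothing here is part of this PR):

from typing import Any, Callable, Dict, List, Optional

from zulip import Client, RandomExponentialBackoff


class BackoffConfigurableClient(Client):
    # Hypothetical variant of call_on_each_event with caller-controlled backoff.
    def call_on_each_event(
        self,
        callback: Callable[[Dict[str, Any]], None],
        event_types: Optional[List[str]] = None,
        narrow: Optional[List[List[str]]] = None,
        maximum_retries: int = 10,   # consecutive failures tolerated before giving up
        delay_cap: float = 90,       # longest single sleep between retries, in seconds
        **kwargs: Any,
    ) -> None:
        backoff = RandomExponentialBackoff(
            maximum_retries=maximum_retries, delay_cap=delay_cap
        )
        # ... then the same register()/get_events() loop as in this diff,
        # driven by backoff.keep_going()/fail()/succeed().

Passing something like maximum_retries=None could then be defined to mean "retry forever", preserving the current behavior while still backing off.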

Contributor:

I think it is correct to potentially fail, so the caller knows something is horribly wrong. We have a bunch of clients that are forever looping and getting 401s during registration, and I don't think that their owners have any sign that anything is going wrong.

I think we should raise an exception if we hit the max retries, since existing code is likely not expecting the function to ever return. Raising an exception means it won't unexpectedly fall through to other code.

This behavior change certainly merits a documentation update as well.
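A minimal sketch of that "fail loudly" idea, reusing the loop shape from this diff (ZulipRetriesExceededError is a made-up name, not an exception the library defines today; whatever exception is chosen would need to be documented):

import zulip


class ZulipRetriesExceededError(Exception):
    """Hypothetical exception name; not something the library defines today."""


def register_with_retries(client, max_retries=50):
    # Sketch of the proposed failure mode: retry registration with backoff,
    # and raise -- rather than silently return -- once the budget is spent.
    backoff = zulip.RandomExponentialBackoff(maximum_retries=max_retries, delay_cap=90)
    while backoff.keep_going():
        res = client.register()      # as in the diff above
        if 'error' in res['result']:
            backoff.fail()           # sleep and count one more consecutive failure
            continue
        backoff.succeed()            # reset the counter
        return res['queue_id'], res['last_event_id']
    raise ZulipRetriesExceededError(
        "Gave up registering an event queue after repeated server errors")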

+                                           timeout_success_equivalent=300,
Contributor:

I don't think we want a timeout_success_equivalent on this. That's for cases where we don't have the ability to put an explicit success() call in, and we want to be able to reset our backoff after 5 minutes on the assumption that that's a "success." Here, we have places we can mark as successful. Having a timeout_success_equivalent muddies the logic -- and I think it won't kick in unless we have 5-minute HTTP requests, which should really not be treated as successes. :)
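To spell out the distinction, a small sketch (treating the behavior described in this comment as given, not as documentation of the backoff helper):

import zulip

# Case timeout_success_equivalent is meant for: nothing in the code path can
# call succeed() explicitly, so going 300s without a fail() is treated as a
# success and the counter resets.
supervisor_backoff = zulip.RandomExponentialBackoff(timeout_success_equivalent=300)

# This PR's loop: every register()/get_events() round trip tells us whether we
# succeeded, so succeed()/fail() are called explicitly and the timeout is left off.
polling_backoff = zulip.RandomExponentialBackoff(maximum_retries=10, delay_cap=90)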

+                                           delay_cap=90)
+        while backoff.keep_going():
+            # Ensure event queue exists (or continues to do so)
+            if queue_id is None:
                 if event_types is None:
                     res = self.register()
                 else:
@@ -611,18 +619,15 @@ def do_register() -> Tuple[str, int]:
                 if 'error' in res['result']:
                     if self.verbose:
                         print("Server returned error:\n%s" % res['msg'])
-                    time.sleep(1)
+                    backoff.fail()
+                    continue
-                else:
-                    return (res['queue_id'], res['last_event_id'])

-        queue_id = None
-        # Make long-polling requests with `get_events`. Once a request
-        # has received an answer, pass it to the callback and before
-        # making a new long-polling request.
-        while True:
-            if queue_id is None:
-                (queue_id, last_event_id) = do_register()
+                backoff.succeed()
+                queue_id, last_event_id = res['queue_id'], res['last_event_id']

+            # Make long-polling requests with `get_events`. Once a request
+            # has received an answer, pass it to the callback and before
+            # making a new long-polling request.
             res = self.get_events(queue_id=queue_id, last_event_id=last_event_id)
             if 'error' in res['result']:
                 if res["result"] == "http-error":
@@ -649,12 +654,11 @@ def do_register() -> Tuple[str, int]:
                     #
                     # Reset queue_id to register a new event queue.
                     queue_id = None
-                # Add a pause here to cover against potential bugs in this library
-                # causing a DoS attack against a server when getting errors.
-                # TODO: Make this back off exponentially.
-                time.sleep(1)
+                backoff.fail()
                 continue

+            backoff.succeed()
             for event in res['events']:
                 last_event_id = max(last_event_id, int(event['id']))
                 callback(event)
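For reference, a minimal sketch of how callers typically drive this loop (the ~/zuliprc path and the event handler are assumptions for illustration); it is these callers that would see the new give-up behavior discussed above:

import zulip

# Assumes a standard zuliprc with email, api key, and site configured.
client = zulip.Client(config_file="~/zuliprc")

def handle_event(event):
    # Invoked once per event delivered by the long-polling loop in this diff.
    print(event["type"], event.get("id"))

# Today this blocks essentially forever; with this PR it can eventually stop
# (or raise, per the discussion above) once the retry budget is exhausted.
client.call_on_each_event(handle_event, event_types=["message"])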