Skip to content

Conversation

@WesleyRosenblum
Copy link
Contributor

@WesleyRosenblum WesleyRosenblum commented Aug 29, 2023

Description of changes:

QUIC-RECOVERY does not explicitly include a requirement that the Probe Timeout (PTO) timer is re-armed after the loss timer expires, but the pseudocode in QUIC-RECOVERY§A.9 does include this behavior:

OnLossDetectionTimeout():
  earliest_loss_time, pn_space = GetLossTimeAndSpace()
  if (earliest_loss_time != 0):
    // Time threshold loss Detection
    lost_packets = DetectAndRemoveLostPackets(pn_space)
    assert(!lost_packets.empty())
    OnPacketsLost(lost_packets)
    SetLossDetectionTimer()    <---------- re-arms the PTO timer
    return

Arming the PTO timer in this case is necessary, as the loss timer expiring will not necessarily cause any additional transmissions. There may still be tail packets pending; the PTO timer is needed to ensure those packets are accounted for and retransmitted as necessary. An example interop failure caused by this can be seen here

In addition, QUIC-RECOVERY§6.2.1 specifies that the acknowledgement of ack-eliciting packets should trigger a PTO update:

A sender SHOULD restart its PTO timer every time an ack-eliciting packet is sent or acknowledged,

The pseudocode in QUIC-RECOVERY§A.7 does not make that distinction, however. This behavior is preferred, as detect_and_remove_lost_packets() will cancel the loss timer, and there may still be ack eliciting packets pending that require a PTO timer for recovery. I've removed the requirement that an acknowledged packet be ack-eliciting to more closely match the pseudocode.

Call-outs:

I refactored the max pto backoff a bit to make it easier to validate the consistency of the PTO timer within the recovery manager.

Testing:

New and updated unit tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@WesleyRosenblum WesleyRosenblum marked this pull request as draft August 29, 2023 22:03
@WesleyRosenblum WesleyRosenblum marked this pull request as ready for review August 30, 2023 02:25
camshaft
camshaft previously approved these changes Aug 30, 2023
//# threshold loss detection will expire earlier than the PTO timer in
//# most cases and is less likely to spuriously retransmit data.
self.pto.cancel();
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd be checking consistency in any of these return points. Maybe move the whole thing into a closure?

(|| {
  // do logic here
})();
self.check_consistency();

especially where we're cancelling the timer it's good to have a sanity check that the invariants still hold

timer_required &= self.time_of_last_ack_eliciting_packet.is_some();

if timer_required {
assert!(self.armed_timer_count() > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be easier to debug if you use assert_ne since that'll print the actual value

Suggested change
assert!(self.armed_timer_count() > 0);
assert_ne!(self.armed_timer_count(), 0);

@WesleyRosenblum WesleyRosenblum merged commit 8e00eec into main Aug 31, 2023
@WesleyRosenblum WesleyRosenblum deleted the WesleyRosenblum/ptoconsistency branch August 31, 2023 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants