|
The idea around using One of the main things to convey when a reconnection happens is that message history may have been lost, which is why we send an explicit Using an explicit new message reduces the dependencies on other extensions, which in this case I think is a good thing. |
How might this interact with a METADATA spec or anything else like that? I would expect all the client set metadata to be restored when resuming.
Probably needs to be documented in either this spec or the revised metadata spec or both, but not urgent now as neither of them are mergeable yet.
|
There's no mention of the length of time a server may allow a connection to be resumed. Is the intended use to be able to resume after 5 minutes of disconnection? 1 hour? Or twice the ping timeout? And I'm assuming that once the RESUME cap has been negotiated, the server will no longer simply quit the user after a ping timeout. Which leads me to think the cap should also advertise the length of time within which a connection can be resumed? |
|
As mentioned on other specs, scoping any errors to the command will help tidy things up and remove the need for extra numerics. Eg. |
|
Honestly I haven't given thought to the possibility of letting/having the server extend the ping timeout length when the cap's negotiated, or similar. I suppose servers could definitely do so if they want to, that'd be interesting. Maybe the cap's value could be a series of k/v's similar to sts' params. Right now, it's intended to just allow reconnection up to the point where the server disconnects the client due to ping timeout, but allowing servers to make it more clear (and extend how long the client connection stays open if they've negotiated the right cap) definitely makes sense, thanks for the yell with that! |
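To make the k/v idea concrete, here is a minimal sketch of how a client might parse a cap value made of comma-separated `key=value` tokens, in the same shape sts uses for its params. The key names `timeout` and `some-flag` are made up for illustration; nothing in this thread or the spec defines them.

```python
from typing import Dict, Optional


def parse_cap_value(value: str) -> Dict[str, Optional[str]]:
    """Split a comma-separated list of key[=value] tokens into a dict.

    Keys that appear without '=' map to None.
    """
    params: Dict[str, Optional[str]] = {}
    for token in value.split(","):
        if not token:
            continue  # tolerate empty values and stray commas
        key, sep, val = token.partition("=")
        params[key] = val if sep else None
    return params


# Hypothetical cap value; these key names are assumptions, not spec'd.
params = parse_cap_value("timeout=300,some-flag")
```

A client would still have to ignore keys it doesn't recognise, so that servers can add new ones later.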
|
So if I've understood this correctly, the only way to make use of this is to guess the server's ping timeout length, and ping the server from the client at an interval shorter than that to try to catch it before the server does. This sounds messy and ripe for a connection race condition. Having an increased timeout once the cap has been negotiated will make this more useful. |
Is this language still accurate? Successful SASL does not terminate registration on oragono, at least. It seems like the only things that can terminate registration are |
|
suggestion from e, |
|
another suggestion, explicitly tell clients how to act if they receive a WARN/FAIL code that they don't understand (if a new one is added in future how to act?) |
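One possible policy, sketched below, is to treat any FAIL scoped to RESUME as "the resume did not happen, carry on with normal registration" regardless of whether the code is recognised, and to treat WARN/NOTE as informational. This is only an assumption about what the spec could say. The parser follows the general standard-replies shape (`FAIL <command> <code> [<context>...] <description>`), ignores message tags, and the code name in the example is invented.

```python
from typing import NamedTuple, Optional, Tuple


class StandardReply(NamedTuple):
    kind: str                 # FAIL, WARN or NOTE
    command: str
    code: str
    context: Tuple[str, ...]
    description: str


def parse_standard_reply(line: str) -> Optional[StandardReply]:
    """Parse a standard reply line; return None for anything else."""
    line = line.rstrip("\r\n")
    if line.startswith(":"):
        # Drop the optional source prefix.
        _, _, line = line.partition(" ")
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    if sep:
        parts.append(trailing)
    if len(parts) < 4 or parts[0] not in ("FAIL", "WARN", "NOTE"):
        return None
    kind, command, code, *rest = parts
    return StandardReply(kind, command, code, tuple(rest[:-1]), rest[-1])


def resume_action(reply: StandardReply) -> str:
    """Decide what to do with a reply, without needing to know its code."""
    if reply.command != "RESUME":
        return "ignore"
    if reply.kind == "FAIL":
        # Known code or not, a FAIL means the resume attempt failed.
        return "register-normally"
    # WARN and NOTE: keep going, but show the description to the user.
    return "continue"


# SOME_FUTURE_CODE is a made-up code standing in for one added later.
reply = parse_standard_reply(":irc.example.com FAIL RESUME SOME_FUTURE_CODE :Cannot resume")
```

The point is that the client's behaviour depends only on the reply type and the command, so new codes don't break older clients.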
|
another suggestion: amend
to
|
|
note to self: in the examples, could probably remove the source from most of the std-replies responses and possibly other things |
|
Idea mentioned in the channel: make the server remember caps between sessions that are resumed? can probably cut down a lot on traffic needed to resume, client and server both already know what they've negotiated anyways. |
Following that line of thought: how about assuming that in servers for which we have a resume token, we can just assume the cap is supported, and send the command before the CAP LS response? So this example from the current version: Becomes If If it is supported, the first response from the server is the RESUME command, at which point the client can assume the old caps from that session are back (optionally, the server could send a cap ack as if we just had requested it?) If someone really wants to, they can send Sending it in response to the activation of the cap (as the current version) is still a valid flow, just a higher latency one. |
|
That flow may make more sense, yeah |
|
A note: one of the motivations for this change is to speed up reconnections of mobile clients that detach and wait for push notifications, but I'm not sure how viable it is for that purpose in its current form, since we'd have to keep the session information around for much longer than current implementations do (210 seconds in Oragono). This also has implications for presence and communication of presence to other clients. Transfer of ISUPPORT (avoidable), MOTD (avoidable), and NAMES (probably unavoidable) were also suggested as reasons for slow mobile performance. It would be useful to have more details on mobile handshake performance (how much of it is roundtrips from the chattiness of the current registration sequence, how much of it is bandwidth). |
Under the normal rules, C would then change nick to dan-backup-nick, which seems unideal. |
|
In a potential low-latency version of the resume handshake, the server could send:
and omit:
|
|
Would it make sense to publish the session lifetime as a cap value? |
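If the lifetime were published, a client could use it to skip resume attempts that can't succeed. A rough sketch, assuming the client has already read the lifetime in seconds from the cap value (how it is encoded is undecided) and tracks when it last heard from the server; the function name, parameters, and the 5-second margin are all illustrative:

```python
import time
from typing import Optional


def should_attempt_resume(
    last_heard_at: float,
    session_lifetime: Optional[float],
    now: Optional[float] = None,
    safety_margin: float = 5.0,
) -> bool:
    """Return True if a resume attempt still has a chance of succeeding.

    last_heard_at and now are monotonic-clock timestamps in seconds.
    session_lifetime is the advertised lifetime, or None if the server
    did not advertise one.
    """
    if session_lifetime is None:
        # Lifetime unknown: trying only costs a round trip.
        return True
    if now is None:
        now = time.monotonic()
    elapsed = now - last_heard_at
    # Leave a margin for clock skew and the time the handshake takes.
    return elapsed + safety_margin < session_lifetime


# Illustrative numbers only: 100 seconds offline, 210 second lifetime.
worth_trying = should_attempt_resume(0.0, 210.0, now=100.0)
```

Without an advertised value, the client has to guess, which is the race described earlier in the thread.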
|
I can't decide whether I like My specs tend to end up very long and wordy and precise, and a feature like this does require some specificity. But I feel like the current spec, as written, isn't as client-friendly as it could be. Server devs have seeeemed to like it though? (we have an impl, solanum has a pr, insp's been interested in implementing it). Honestly it might just be that the spec as written is laid out very 'academically', whereas it should probably start up the top with a very quick rundown of 'here are a few flows inc. C2S proto stuff' that hotlinks to the actual message/etc definitions down below. Aiming to make it easier-to-understand for client devs. With a change like that, the length may not even be so much of an issue for me. Now, onto some stuff that's I like
I don't mind Another very specific point. This spec isn't 'first and foremost' designed to replay history. That's like an added bonus (since we're already sending a timestamp to notify other clients of how much history we lost, hell why can't the server just automagically replay that lost history to us). I think it's a bit too prominent in the spec and mentions of it should be dialled back, because this spec isn't for that. Or maybe the spec is for that and I need to accept it. Server implementors, thoughts? Another point, implementors allowing I'm interested in whether client devs think 'reconnect to the same server to use resume' complicates their reconnection logic significantly, or if there are other issues that make them not want to reconnect to the same server again as a rule (e.g. working around server issues, just not wanting to complicate their connection logic like that, etc). I've mentioned this and the general thoughts seem to be that it's either not enough of an issue for me to worry about or that it's a worthwhile complication. Further, would clients find a cap value like |
|
I am opposed to ratification of The current spec tries to do too much and provides too little benefit to users, in exchange for a considerable increase in implementation complexity. In particular, the core purpose of the spec (as identified in its introduction) is essentially just to streamline or automate
On the server side, the spec is so difficult to implement that there are only two candidate implementations. One is Oragono, which doesn't have to consider the multiple-server case. The other implementation disallows resuming on a different server than the one you started on, which (a) increases the complexity of client reconnect implementations (b) is not currently exposed to clients. There are workarounds for some of these problems, e.g., we could include both the server's extended ping timeout and whether all-server resume is supported in the CAP value. But I think at this point it's better to go back to the drawing board. For example:
|
|
Summarizing some discussion from #ircv3: the spec, as written, does not make IRC viable on mobile platforms (which are increasingly hostile to background TCP connections). So it's applicable mainly to connection disruptions and software restarts on desktops and servers. Moreover, not all implementations may be able to maintain history across spontaneous disruptions (as opposed to BRB). So, the baseline supported use case is software restarts of desktop and server software. At which point the question is: is the (considerable) engineering effort involved in implementing this specification justified? |
No, that's not the reason why SASL's disallowed with resuming a connection. The reason for that is just that... if you sasl, the connection registration process finishes, so you can't do it and also do something that hijacks the connection resumption process like resume. I'd be interested in client developer opinions on this, and how the spec handles this aspect.
This is a v3 draft, you really can't use 'number of implementations' as an indicator of spec worth/complexity. I mean, this is more implementations than most of our junk gets before it gets pushed to the site in draft form.
Um. Yeah. The part of my comment which says "'fix mobile clients and irc' is a goal that's pretty often been pushed onto this spec and that people use to justify either liking (it solves this yay!) or not-liking (it doesn't solve this, boo!) this spec. [...] I also want to make very clear that the above two use cases aren't things that this spec tries to touch in any way." indicates this.
Yes. This is the point. I don't see why it couldn't be useful to mobile clients in the same way, just that they'd probably also request whatever protocol-optimization cap we end up making for mobile clients.
Yes. This is the point. I don't expect implementations to suddenly start storing history when they haven't before -- that bit's purely opportunistic if it's available, and if it's not then cool other clients will know that the reconnecting client missed messages at least. |
|
If someone is really bent on the 'this doesn't solve mobile, so it's not worth it' then what does a replacement for this spec which focuses on mobile look like? Does it aim at the same thing that this specification aims at, or is it pretty orthogonal to this? The two alternate things which I've heard which would be mobile alternatives to this spec/functionality are:
|
That's my position: doing this is necessary to support mobile, and having it would make RESUME redundant. Optimizing the handshake is just the icing on the cake.
I think we're on the same page about use cases here. I just don't see bouncer<->ircd as something that's critical to streamline or optimize, particularly given the other limitations on functionality we're accepting.
As discussed, I believe this to be inaccurate. |
|
As mentioned in the meeting, unless there's some more concrete client interest in this feature then I'm leaning very much towards killing it. As-is it probably isn't useful enough to justify the complexity.
|
|
To clarify, I closed this in response to what looked like lack of interest from the proposers of the spec, and what appeared to be confused/conflicting ideas around what the spec is actually for. Apologies if that was hasty, I can reopen this if that's not the case. From my own perspective as a client developer, I'm seeing resume as a solution to "faster and less noisy reconnect after brief/sudden loss of connectivity" or "brb for a software update restart". From that point of view, it seems like a cool concept, but not worth the length/complexity of this spec to achieve. And it's sounding like it will be hard to ratify as a result. |
|
I probably would've closed it in a few days anyway. With regards to 'lack of client interest', for me that's mostly just me informally mentioning it around a few client devs and them largely seeming meh or unenthused. Very unofficial, but yeah it affects my want to continue working on the spec. If clients like the idea/spec in its current form, awesome. If they like the idea but are unenthused about the spec as it stands, cool I could work with that. If they aren't really into the idea at all, then eh. Generally it's felt like server devs are pretty into this while client devs aren't. I think the spec right now is way too long for client devs to be interested in implementing it, and it's also not really written in a way that helps people understand how to implement it (this is my spec/tech writing nerd coming out but it's written in a very 'clinical' way that server devs may appreciate but I can't see clients liking at all). I could probably spend time working on this and trying to make it shorter/simpler, and improve the structure of the spec, and make sure there's much less confusion about what exactly this feature aims to solve. But fact is that right now I'm not enthusiastic enough to put the time into doing that. |
Would server devs be more interested in migrate? I see both resume and migrate as trying to do something very similar (allowing the other party to reconnect), just one is orchestrated from the server, and the other from the client. As a client author, I'd happily implement migrate, as I see it filling a gap in being able to keep networks up to date and secure from a maintenance point of view. For resume, it doesn't solve any of my problems as it is today, but that's not to say it doesn't solve problems for other clients. Perhaps it could be useful for bouncers and similar, which may want to perform patching at scale without disrupting their customers. |
|
as a client dev I like this extension idea |
As a server developer, no. It has an entirely different use case and honestly it seems like it's trying to solve a problem that doesn't really exist (at least not for us). |
|
@SadieCat Orchestrating security patching and keeping software up to date without disconnecting clients and disrupting their ongoing conversations is the use case I imagine most for migrate. I don't see this as a great user experience at the moment. |
|
We've solved this problem by pushing TLS out to a separate process (in our case HAProxy) and transferring fds/data to the new process. |
|
@SadieCat That doesn't account for all cases of security patching: the kernel/OS, updating TLS libraries for active connections, etc. |
|
👀 |
Occasionally, clients disconnect from IRC. What happens these days is that the client connects with a different nick, joins all their old channels again, waits for the old connection to time out (or manually kills it using services), and then changes back to their original nickname.
This feature intends to vastly simplify this form of reconnection, and reduce the amount of nick-switching, JOIN and QUIT notices, and general disruption that other clients see when this happens.
Note that this is not related to detaching and re-attaching an active connection (such as if a client is forced to change servers), but only to streamline the reconnection process for unintended connection drops.
Rendered specification