Server tries to route traffic through "hard" disconnected agents #152

@Avatat

Hello,
I have a Kubernetes 1.18 cluster running Konnectivity v0.0.12.
Most of the time everything works perfectly, but after ~20 days of konnectivity-server uptime, some DIAL_REQ packets go nowhere: the server sends them to a backend and never receives a DIAL_RSP.
For example, kubectl logs -n kube-system konnectivity-agent-h6bt9 -f only worked on the third attempt.

Server log:

I1015 11:35:55.013608       1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:35:55.013887       1 server.go:257] start serving frontend stream
I1015 11:35:55.013926       1 server.go:268] >>> Received DIAL_REQ
I1015 11:35:55.013940       1 backend_manager.go:170] pick agentID=f42e628f-f3a5-42cc-bb75-8065768b1be1 as backend
I1015 11:35:55.014079       1 server.go:290] >>> DIAL_REQ sent to backend

I1015 11:36:10.378211       1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:36:10.378345       1 server.go:257] start serving frontend stream
I1015 11:36:10.378357       1 server.go:268] >>> Received DIAL_REQ
I1015 11:36:10.378363       1 backend_manager.go:170] pick agentID=f42e628f-f3a5-42cc-bb75-8065768b1be1 as backend
I1015 11:36:10.378418       1 server.go:290] >>> DIAL_REQ sent to backend

I1015 11:36:18.243790       1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:36:18.243873       1 server.go:257] start serving frontend stream
I1015 11:36:18.244049       1 server.go:268] >>> Received DIAL_REQ
I1015 11:36:18.244062       1 backend_manager.go:170] pick agentID=15fb1f4a-55a5-454d-9257-a717e70dc5dd as backend
I1015 11:36:18.244138       1 server.go:290] >>> DIAL_REQ sent to backend
I1015 11:36:18.247179       1 server.go:522] <<< Received DIAL_RSP(rand=2352227733928774978), agentID 15fb1f4a-55a5-454d-9257-a717e70dc5dd, connID 84)
I1015 11:36:18.247268       1 server.go:144] register frontend &{grpc 0xc020108df0 <nil> 0xc02cc82960 84 15fb1f4a-55a5-454d-9257-a717e70dc5dd {13824409201659358837 1806906010189152 0x227f060} 0xc02cd49da0} for agentID 15fb1f4a-55a5-454d-9257-a717e70dc5dd, connID 84
I1015 11:36:18.247867       1 server.go:308] >>> Received 239 bytes of DATA(id=84)
I1015 11:36:18.248012       1 server.go:324] >>> DATA sent to Backend
I1015 11:36:18.256637       1 server.go:551] <<< Received 2171 bytes of DATA from agentID 15fb1f4a-55a5-454d-9257-a717e70dc5dd, connID 84
...

Agent log:

Nothing before...
I1015 11:36:18.244396       1 client.go:271] [tracing] recv packet, type: DIAL_REQ
I1015 11:36:18.244482       1 client.go:280] received DIAL_REQ
I1015 11:36:18.247643       1 client.go:271] [tracing] recv packet, type: DATA
I1015 11:36:18.247680       1 client.go:339] received DATA(id=84)
I1015 11:36:18.247775       1 client.go:413] [connID: 84] write last 239 data to remote
I1015 11:36:18.255019       1 client.go:384] received 2171 bytes from remote for connID[84]
I1015 11:36:18.257829       1 client.go:271] [tracing] recv packet, type: DATA
I1015 11:36:18.257850       1 client.go:339] received DATA(id=84)
I1015 11:36:18.257908       1 client.go:413] [connID: 84] write last 64 data to remote
I1015 11:36:18.258552       1 client.go:271] [tracing] recv packet, type: DATA
I1015 11:36:18.258661       1 client.go:339] received DATA(id=84)
I1015 11:36:18.258777       1 client.go:413] [connID: 84] write last 203 data to remote
I1015 11:36:18.260344       1 client.go:384] received 106 bytes from remote for connID[84]
I1015 11:36:18.264715       1 client.go:384] received 183 bytes from remote for connID[84]

Sometimes, when I try to view logs from the second agent, I get these warnings:
Server log:

I1015 11:51:13.078407       1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:51:13.078709       1 server.go:257] start serving frontend stream
I1015 11:51:13.078763       1 server.go:268] >>> Received DIAL_REQ
I1015 11:51:13.078790       1 backend_manager.go:170] pick agentID=d700cd66-6ae4-416f-89be-4523adda93c3 as backend
W1015 11:51:13.078907       1 server.go:288] >>> DIAL_REQ to Backend failed: rpc error: code = Unavailable desc = transport is closing
I1015 11:51:13.078966       1 server.go:290] >>> DIAL_REQ sent to backend
I1015 11:51:15.008189       1 server.go:224] proxy request from client, userAgent [grpc-go/1.26.0]
I1015 11:51:15.008413       1 server.go:257] start serving frontend stream
I1015 11:51:15.008458       1 server.go:268] >>> Received DIAL_REQ
I1015 11:51:15.008485       1 backend_manager.go:170] pick agentID=d700cd66-6ae4-416f-89be-4523adda93c3 as backend
W1015 11:51:15.008574       1 server.go:288] >>> DIAL_REQ to Backend failed: rpc error: code = Unavailable desc = transport is closing
I1015 11:51:15.008627       1 server.go:290] >>> DIAL_REQ sent to backend

The agent logs nothing.

I believe that restarting the konnectivity-server (+ kube-apiserver) pod would help, because it helped in the past, but I have not done so yet in case you need me to run more tests.

Labels: lifecycle/stale (denotes an issue or PR that has remained open with no activity and has become stale)