packet: Read BGP peer address from metadata service #1010
Ideally, such a test would be a "black-box" ingress test (i.e. sending traffic via MetalLB). However, there doesn't seem to be a trivial way to handle EIP allocation (in combination with multiple parallel CI runs), so we would have to build something for it. Additionally, such a test wouldn't have caught this specific issue, because only specific facilities were affected. I can't think of a unit test that would have caught this problem. If you can, please share your thoughts.

There is a race condition with enabling BGP.
rata left a comment:
Added two comments about failure scenarios, but if this is needed ASAP we can handle those in a follow-up PR, whichever you think is best.
assets/terraform-modules/packet/flatcar-linux/kubernetes/workers/cl/worker.yaml.tmpl
Fixed using a retry script.
We randomize the locations to get the best machine availability, so it would occur in at least some cases.
Right... Given that it is not trivial to test, we should merge this to fix the issue and perhaps raise the priority of testing it.
invidian left a comment:
Just one thought, otherwise LGTM
assets/terraform-modules/packet/flatcar-linux/kubernetes/workers/cl/worker.yaml.tmpl
Marking as "do not merge" until #1010 (comment) is resolved.
assets/terraform-modules/packet/flatcar-linux/kubernetes/workers/cl/worker.yaml.tmpl
I think it should fail, yes. I guess that would indicate a misconfigured project that has BGP disabled?

For example, yes, but it could also be other things we haven't foreseen. The current behavior is to fail, and I agree it's better to fail loudly than quietly here.
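The retry-then-fail-loudly behavior agreed on here could look roughly like the Python sketch below. It is only an illustration: the PR itself uses a shell retry script in `worker.yaml.tmpl`, which isn't reproduced here, and `wait_for_bgp_neighbors`, `BGPMetadataTimeout`, and the `bgp_neighbors` metadata key are hypothetical or assumed names.

```python
import time
from typing import Callable, List, Optional


class BGPMetadataTimeout(RuntimeError):
    """Hypothetical error: BGP info never appeared (e.g. BGP disabled on the project)."""


def wait_for_bgp_neighbors(
    fetch: Callable[[], dict],
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Poll `fetch` until it returns a non-empty `bgp_neighbors` list.

    `fetch` is any callable returning the decoded metadata document.
    The `bgp_neighbors` key is an assumption about the metadata format.
    Raises instead of continuing without a peer, i.e. it fails loudly.
    """
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            neighbors = fetch().get("bgp_neighbors") or []
            if neighbors:
                return neighbors
        except Exception as exc:  # treat network/decoding errors as transient
            last_error = exc
        if attempt < attempts - 1:  # no pointless sleep after the last try
            sleep(delay)
    detail = f" (last error: {last_error})" if last_error else ""
    raise BGPMetadataTimeout(f"no BGP neighbors after {attempts} attempts{detail}")
```

The `sleep` parameter is injectable so the loop can be exercised without real delays; in production it defaults to `time.sleep`.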
I've refactored the script to poll for BGP metadata and write the peer address to … I've tested a manual deployment with BGP both enabled and disabled, and this seems to work well.
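Writing the peer address to an environment file can be sketched in Python as follows. The file path and the variable names `BGP_PEER_ADDRESS` and `BGP_SOURCE_ADDRESS` are hypothetical; the PR's actual path and names aren't shown here. The sketch writes to a temporary file and renames it, so a reader never sees a half-written file.

```python
import os
import tempfile


def write_env_file(path: str, values: dict) -> None:
    """Atomically write KEY=value lines in the format systemd's EnvironmentFile= reads.

    Values are written unquoted, which is fine for IP addresses but not for
    values containing spaces or quotes.
    """
    directory = os.path.dirname(path) or "."
    # Temp file in the same directory so os.replace() is an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".env-")
    try:
        with os.fdopen(fd, "w") as f:
            for key, value in values.items():
                f.write(f"{key}={value}\n")
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise
```

A systemd unit could then load the file with `EnvironmentFile=` and reference `${BGP_PEER_ADDRESS}` directly in `ExecStart`, since systemd expands such variables itself and no `/bin/sh -c` wrapper is required.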
assets/terraform-modules/packet/flatcar-linux/kubernetes/workers/cl/worker.yaml.tmpl
Thanks. P.S. GitHub tells me that 😉 |
In some Packet facilities, the BGP peer address isn't the same as the gateway address allocated for a host. Rather, it is a loopback address that's reachable via the gateway.

The Packet metadata service now exposes BGP info to hosts, so we can query the metadata service for the BGP peer address. We currently use only the first peer address, since MetalLB doesn't support multiple node peers yet.

The source address is specified explicitly because, when the peer address is a loopback address, the source IP address that the kernel ends up selecting is the node's *public* address, which doesn't work. In cases where the peer address is the gateway address, there is no harm in explicitly specifying the source.

Removing the `/bin/sh -c ""` wrapper in ExecStart because we no longer need a shell: we now read the peer address from an environment file.

Removing the TODO about using Afterburn for the peer address, since we no longer use the gateway address as the peer address.

Fixes #1009.
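The lookup this commit message describes (take the first peer address, and pass the node's own address as the explicit source) can be illustrated with the Python sketch below. It is not the PR's code. The metadata URL and the field names `bgp_neighbors`, `address_family`, `peer_ips`, and `customer_ip` are assumptions about Packet's metadata format and should be checked against its documentation. `fetch_metadata` and `bgp_peer_settings` are hypothetical helpers.

```python
import json
import urllib.request

# Assumed Packet metadata endpoint; verify before relying on it.
METADATA_URL = "https://metadata.packet.net/metadata"


def fetch_metadata(url: str = METADATA_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the host's metadata document (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def bgp_peer_settings(metadata: dict, address_family: int = 4) -> dict:
    """Return the first BGP peer IP and the local source IP for one address family.

    Only the first peer is used because MetalLB doesn't support multiple node
    peers yet. The source address is returned explicitly so that the kernel
    doesn't pick the node's public address when the peer is a loopback address.
    Field names below are assumed, not confirmed by the PR.
    """
    for neighbor in metadata.get("bgp_neighbors", []):
        if neighbor.get("address_family") == address_family and neighbor.get("peer_ips"):
            return {
                "peer_address": neighbor["peer_ips"][0],
                "source_address": neighbor["customer_ip"],
            }
    # No matching neighbor: fail loudly rather than guess a peer.
    raise LookupError(f"no IPv{address_family} BGP neighbor in metadata")
```

Filtering by address family is a precaution, under the assumption that the metadata may list IPv4 and IPv6 sessions side by side.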
I know, but it's still showing the PR status in the list as "Waiting for review", which is not really correct 😄
Testing looks good. Need another LGTM. |