Feature: implement the BackendManager list #144
Conversation
Thanks for the PR @charleszheng44!
One concern I have is that, the way this is currently written, all kube-apiserver -> cluster requests must use some node IP as a destination address. This seems to be a big implicit assumption and may break certain functionality.
Limiting a request to only one routing path also seems a bit fragile. If the connection to the underlying agent is broken/overloaded/etc, is there no alternative routing for the corresponding request?
Overall the PR is good, I'm just thinking in a more generalized way to see if this can support a broader set of use cases.
@Jefftree @cheftako Thanks for your comments. I think both of you agree that the current implementation is not generic enough: once we specify a proxy strategy on the server side, the agents have to provide the required headers. I think @cheftako's suggestion can be a good solution to this problem. We can compose a BackendManager (BM) list containing multiple BMs, and run all of them with the proxy server. Each BM can have a corresponding BackendStorage (BS) that will group agents by the corresponding header. For example, we have a list of BMs: [DestHostBM, DestCIDRBM, DefaultBM].
When an agent joins the server, based on the headers it provided, it will be added into different BSs (but will always be added into the default BS). During the runtime, the proxy server will call the Backend() method of each BM in the listed order. I think this mechanism can cover most of the cases. Please let me know your thoughts. If you agree on this plan, I will wrap up the PR.
Hi @charleszheng44, how's the work coming? Very much like the direction you're heading. Please let us know if we can help.
Hi @cheftako, if you agree that the "BackendManager(BM) list containing multiple BMs" might be a good solution, I will try to wrap up this PR by the end of this week.
Sounds like a great solution to me. Were you thinking an ordered, comma-separated list of the BMs? Probably defaulting to just the current BM?
@cheftako I plan to make it a comma-separated ordered list. Yes, I will make the default just the current BM.
cmd/agent/main.go
Outdated
@@ -70,6 +70,7 @@ type GrpcProxyAgentOptions struct {
	adminServerPort int

	agentID string
	agentHost string
I generally like this but wonder if we're short-changing on the agentHost. I'm assuming this is just a single IP address and that we are running 1 agent per node. However, we frequently give nodes IP ranges so that each pod running on a node can be assigned its own individual IP address (see https://cloud.google.com/kubernetes-engine/docs/concepts/network-overview). In addition we may have both IPv4 and IPv6 addresses for both nodes and pods. I wonder if it would be better to have a map of agent properties? Then we can easily extend this for things like exact IP match, range IP match, range IPv6 match, failure zone match, etc. We could encode this as something like "host=foo,cidr=127.0.0.1/16,ipv6=:::::,failure-zone=us-central1-b". If we wanted to be really paranoid we could URL-encode it (https://en.wikipedia.org/wiki/Percent-encoding).
I like the idea of encoding all potential addresses into a single string. What about something like the following:
-
when starting an agent, users can specify all potential addresses by passing a string, "host=foo,cidr=127.0.0.1/16,ipv6=:::::,failure-zone=us-central1-b", through the command line,
-
the agent will validate that the addresses string is valid and add it to the gRPC header,
-
then, the server will deserialize the addresses from the header and add the agent to the corresponding BackendManagers.
As we can put the addresses string into the request header (we are currently storing the agentHost and agentID in the gRPC stream header), do we really need to do URL encoding? I am not sure whether the gRPC header can hold the reserved characters of percent-encoding, but I will try it out.
Question for both of you: how much of the information in host=foo,cidr=127.0.0.1/16,ipv6=:::::,failure-zone=us-central1-b etc. is directly obtainable when starting the konnectivity agent pod? Are these parameters directly obtainable through something like environment variables, or would additional logic be needed? Theoretically kube-apiserver would have this kind of information about the nodes, but adding additional logic there seems quite complicated. I'm wondering how we can support this in the simplest way.
IMO, there may be two kinds of "agent address":
-
addresses that can be obtained directly when starting the konnectivity agent, like podIP(s), hostIP, etc.,
-
addresses added by the user through a command-line option, like --agent-addresses="host=foo,cidr=127.0.0.1/16,ipv6=:::::,failure-zone=us-central1-b"; if some addresses are stored in env, we can pass them through the pod's template yaml.
And it will be the user's (or whoever starts the agent's) responsibility to ensure the provided addresses are reachable.
There is also information which is probably category 1) but is not directly tied to the agent. For instance, things like failure zone are characteristics of the node that the agent is running on. It can be looked up when the agent is started, but it's not clear to me that that is when/how we will obtain that data. It's also possible it may be looked up by the (appropriate) backend manager when the agent is registered.
cmd/agent/main.go
Outdated
@@ -101,6 +105,7 @@ func (o *GrpcProxyAgentOptions) Flags() *pflag.FlagSet {
	flags.DurationVar(&o.syncInterval, "sync-interval", o.syncInterval, "The initial interval by which the agent periodically checks if it has connections to all instances of the proxy server.")
	flags.DurationVar(&o.probeInterval, "probe-interval", o.probeInterval, "The interval by which the agent periodically checks if its connections to the proxy server are ready.")
	flags.StringVar(&o.serviceAccountTokenPath, "service-account-token-path", o.serviceAccountTokenPath, "If non-empty proxy agent uses this token to prove its identity to the proxy server.")
	flags.StringVar(&o.agentIdentifiers, "agent-identifiers", o.agentIdentifiers, "Identifiers of the agent that will be used by the server when choosing agent. e.g.,host=localhost,host=node1.mydomain.com,cidr=127.0.0.1/16,ipv4=1.2.3.4,ipv4=5.6.7.8,ipv6=:::::")
As is, we have ',' and '=' as reserved characters which cannot be present in the identifiers. If we encode the value/identifier, then it is a bit more complicated to set a value, but that restriction is removed. We would need to define the encoder.
URLEncode (https://play.golang.org/p/Zx0wMq7oacr) makes things fairly easy, though it would switch us to using '&' and '='. Your example would then be "cidr=127.0.0.1%2F16&host=localhost&host=node1.mydomain.com&ipv4=1.2.3.4&ipv4=5.6.7.8&ipv6=%3A%3A%3A%3A%3A"
I agree that it would be better if we don't need to reserve any characters. But does that mean users need to pass the argument in the format of "cidr=127.0.0.1%2F16&ipv6=%3A%3A%3A%3A%3A"? Won't that be a little user-unfriendly?
https://play.golang.org/p/3E93xZ-i11n shows that the library will pass through unencoded parameters so "cidr=127.0.0.1/16&host=localhost&host=node1.mydomain.com&ipv4=1.2.3.4&ipv4=5.6.7.8&ipv6=:::::" will also work. I would still document the "correct" form but you can mention that simple form also works. If you mention it though I would add a unit test to make sure that a later change to the library does not break our customers.
cmd/agent/main.go
Outdated
@@ -116,6 +121,7 @@ func (o *GrpcProxyAgentOptions) Print() {
	klog.V(1).Infof("SyncInterval set to %v.\n", o.syncInterval)
	klog.V(1).Infof("ProbeInterval set to %v.\n", o.probeInterval)
	klog.V(1).Infof("ServiceAccountTokenPath set to %q.\n", o.serviceAccountTokenPath)
	klog.V(1).Infof("AgentIdentifiers set to %s.\n", o.agentIdentifiers)
Especially if we decide to encode the data I would suggest we print each key=value pair separately (& decoded).
Will print out each identifier separately.
pkg/server/backend_manager.go
Outdated
func (s *DefaultBackendStorage) AddBackend(agentID string, conn agent.AgentService_ConnectServer) Backend {
func (s *DefaultBackendStorage) AddBackend(identifier string, idType pkgagent.IdentifierType, conn agent.AgentService_ConnectServer) Backend {
	if !containIdType(s.idTypes, idType) {
		klog.ErrorS(&ErrWrongIDType{idType, s.idTypes}, "fail to add backend")
This seems an alarming and misleading error to log. As I understand it, it is expected that we would call AddBackend on every IdentifierType an agent presents for each BackendStorage. This means we expect some of those identifiers not to match every BackendStorage, so these error messages are expected and do not (normally) indicate an actual error. We should either advertise the supported backends and filter this call so the mismatch never occurs, or we should fix the level (V(4)?) and adjust the message to be less alarming, e.g. "backend not using unsupported type".
Will switch to klog.V(4).Infof
	klog.V(2).InfoS("Register backend for agent", "connection", conn, "agentID", agentID)
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.backends[agentID]
	_, ok := s.backends[identifier]
At this point we have lost the id type, which means there is at least theoretically the possibility of a collision across types. Good news is we can always fix this later.
will try to fix this issue in the next PR.
/retest
cmd/server/main.go
Outdated
@@ -141,6 +150,7 @@ func (o *ProxyRunOptions) Flags() *pflag.FlagSet {
	flags.StringVar(&o.agentServiceAccount, "agent-service-account", o.agentServiceAccount, "Expected agent's service account during agent authentication (used with agent-namespace, authentication-audience, kubeconfig).")
	flags.StringVar(&o.kubeconfigPath, "kubeconfig", o.kubeconfigPath, "absolute path to the kubeconfig file (used with agent-namespace, agent-service-account, authentication-audience).")
	flags.StringVar(&o.authenticationAudience, "authentication-audience", o.authenticationAudience, "Expected agent's token authentication audience (used with agent-namespace, agent-service-account, kubeconfig).")
	flags.StringVar(&o.proxyStrategies, "proxy-strategies", o.proxyStrategies, "The list of proxy strategies used by the server to pick a backend/tunnel. e.g., \"destHost,\"")
Not blocking, but it would be nice if the tooltip listed the valid strategy values.
Will list all the valid strategy values.
Seems pretty good to me. Waiting on @Jefftree review 👍
Please don't forget to add a follow-up to test the destHost manager routing. LGTM from me :)
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: charleszheng44, cheftako. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@charleszheng44 Reiterating Walter's point, thank you so much for adding this PR! I really look forward to using it and seeing how it evolves in the future! /lgtm
This PR implements the BackendManager list mechanism, which allows the proxy server to have multiple backend managers (BMs). Each BM can have a corresponding BackendStorage (BS) that will group agents by the corresponding header.
For example, we have a list of BMs: [DestHostBM, DestCIDRBM, DefaultBM]. All of them will use the DefaultBackendStorage, but the DestHostBM (implemented in this PR) will group agents by AgentHost, i.e., map[AgentHost]conn; the DestCIDRBM (will be included in an independent PR) will group agents by AgentCIDR, i.e., map[AgentCIDR]conn; and the DefaultBM will group agents by AgentID, using map[AgentID]conn.
When an agent joins the server, it can provide a list of addresses that are reachable through it, and it will be added into different BMs (but will always be added into the DefaultBM). For example, an agent providing only AgentCIDR will not be added into the DestHostBM, but if it provides AgentHost as well as AgentCIDR, it will be added into all three BMs.
During the runtime, the proxy server will call the Backend() method in the order of the listed BMs. For example, given the list [DestHostBM, DestCIDRBM, DefaultBM], the server will first try to find a backend with the same host address; if not found, it will use a backend whose CIDR covers the request's host; if still not found, the server will just choose a random backend.
Take GRPC+mTLS as an example: to enable the BackendManager list, one can run the proxy-server with the option --proxy-strategies "destHost,default" and the proxy-agent with the option --agent-addr "host=localhost,cidr=127.0.0.1/16,ipv6=:::::,failure-zone=us-central1-b".