Akka.Discovery.Dns #3365

Open
wants to merge 39 commits into
base: dev

@anpin anpin commented Jul 3, 2025

Discovery via DNS

  • A/AAAA records resolved via default inet-resolver
  • SRV records are resolved via async-dns (name inherited from JVM Akka)

Fixes #3364

Changes

  • New IDnsProvider for the async-dns resolver to issue direct SRV queries, as the underlying .NET implementation resolves only A/AAAA
  • Hosting extensions to configure DNS discovery
  • Examples for docker-compose
  • Fix for IPv6 formatting in BootstrapCoordinator
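
Editorial sketch of what a "direct SRV query" involves on the wire: an SRV lookup is an ordinary DNS question with QTYPE 33 (RFC 2782), sent in the RFC 1035 message format. The Python below is illustrative only and is not the PR's C# code. The function names are hypothetical, and the `_management._tcp.` prefix is an assumption built from the demo's PORTNAME and SERVICENAME values.

```python
import struct

QTYPE_SRV = 33  # SRV record type, RFC 2782
QCLASS_IN = 1   # Internet class, RFC 1035


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # root label terminates the name
    return bytes(out)


def build_srv_query(query_id: int, name: str) -> bytes:
    """Build a DNS query message asking for the SRV records of `name`."""
    flags = 0x0100  # standard query with RD (recursion desired) set
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPE_SRV, QCLASS_IN)
    return header + question


# Hypothetical query name, assumed for illustration only.
packet = build_srv_query(0x1234, "_management._tcp.akkacluster.dns.oci")
```

The resulting bytes could be sent over UDP to a nameserver; parsing the answer section is omitted here.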

@anpin anpin force-pushed the feature/dns branch 4 times, most recently from 8f61cdd to 9bf29c4, July 8, 2025 19:13
@anpin
Author

anpin commented Jul 8, 2025

I managed to get the DNS cluster example working for both A and SRV records. As mentioned in previous issues, SRV required implementing a custom resolve handler.

@Aaronontheweb
Member

@anpin VERY nice! is this ready for review or are you still working on it?

@anpin
Author

anpin commented Jul 9, 2025

Thanks. Still working on it. Main concerns for now:

  • new SRV resolver doesn't have unit tests yet
  • Not sure how to properly instantiate the new DNS provider. In Pekko, DnsProvider is marked as obsolete. Akka.NET relies on DnsExt, but it doesn't respect IDnsProvider.ManagerClass and always uses the hard-coded SimpleDnsManager. I think this should be fixed in main Akka.Net repo.

@Aaronontheweb
Member

think this should be fixed in main Akka.Net repo.

100% - honestly, I'm not even sure how much Akka.IO's DNS code is even used. I just spent a bunch of time in April / May purging tons of poorly designed older code from the TCP stack in Akka.IO. Wouldn't surprise me if the DNS stack was full of rot too.

@anpin
Author

anpin commented Jul 9, 2025

Well, it seems that TcpOutgoingConnection uses the .NET DNS client these days.
https://github.com/akkadotnet/akka.net/blob/c10cfc16d25879a9db3ce5f9de1a3e1074e0401f/src/core/Akka/IO/TcpOutgoingConnection.cs#L108

However, as pointed out in the previous discussion, the .NET DNS client is not capable of resolving SRV records, unlike its JVM counterpart.

@Aaronontheweb
Member

@anpin merged your PR on the main Akka.NET project - are you blocked from working on this until we do a new release of that or are you good to go for now?

@anpin
Author

anpin commented Jul 11, 2025

no pressure on my end for a new release. thanks @Aaronontheweb

@anpin
Author

anpin commented Jul 14, 2025

Fixed the examples and added tests that resolve some publicly known records. I think it is close to being ready for review.

image

@anpin
Author

anpin commented Jul 14, 2025

Couldn't find a way to unit test TCP fallback for SRV requests, maybe akka/pekko tests have some clues
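
Editorial sketch of the two protocol facts a TCP-fallback test would exercise: a UDP response signals truncation through the TC bit in the header flags, and DNS over TCP prefixes each message with a two-byte length (both per RFC 1035). The helper names below are hypothetical and the code is illustrative Python, not taken from the PR or from the Akka/Pekko test suites. One possible approach, under those assumptions, is to feed a canned truncated response to the client and assert that it retries over TCP.

```python
import struct

TC_BIT = 0x0200  # "truncated" flag in the DNS header flags field (RFC 1035)


def is_truncated(response: bytes) -> bool:
    """Return True if a DNS response has the TC flag set."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_BIT)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its two-byte big-endian length."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too large for a two-byte length prefix")
    return struct.pack("!H", len(message)) + message


def unframe_from_tcp(buffer: bytes):
    """Split one complete message off the buffer.

    Returns (message, rest), or (None, buffer) if more bytes are needed.
    """
    if len(buffer) < 2:
        return None, buffer
    (length,) = struct.unpack("!H", buffer[:2])
    if len(buffer) < 2 + length:
        return None, buffer
    return buffer[2:2 + length], buffer[2 + length:]


# Canned headers for a test: QR+TC+RD set versus QR+RD+RA set.
truncated_reply = struct.pack("!HHHHHH", 1, 0x8300, 1, 0, 0, 0)
complete_reply = struct.pack("!HHHHHH", 1, 0x8180, 1, 0, 0, 0)
```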

@anpin
Author

anpin commented Jul 14, 2025

The above-mentioned PR akkadotnet/akka.net#7727 is not on the NuGet feed yet, so the tests in CI are still trying to use SimpleDnsManager.

@anpin
Author

anpin commented Jul 15, 2025

looked at the jvm hocon again and noticed that a few major features were overlooked, e.g. multiple nameservers

@anpin
Author

anpin commented Jul 15, 2025

Trying to implement caching now. The existing SimpleDnsCache can't be reused here, as the SimpleDnsCache.Put method and IPeriodicCacheCleanup are internal. Any idea how to move forward @Aaronontheweb?

Edit: Actually, the existing SimpleDnsCache is not a great fit for SRV records, as cached instances use the IPAddress class, which has no port. For SRV records, a cache using EndPoint would be more appropriate.
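
Editorial sketch of the idea in the edit above: a cache for SRV results needs to store host and port together and expire entries by TTL. This is illustrative Python under stated assumptions, not the PR's cache. `SrvTarget` and `SrvCache` are hypothetical names, and the injectable clock exists only to make expiry testable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SrvTarget:
    """One resolved SRV target: unlike a bare IP address, it carries a port."""
    host: str
    port: int


class SrvCache:
    """TTL-based cache mapping a queried name to its resolved targets."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[SrvTarget]]] = {}

    def put(self, name: str, targets: List[SrvTarget], ttl_seconds: float) -> None:
        # Store the absolute expiry time alongside a copy of the targets.
        self._entries[name] = (self._clock() + ttl_seconds, list(targets))

    def get(self, name: str) -> Optional[List[SrvTarget]]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        expires_at, targets = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # drop the expired entry lazily on read
            return None
        return list(targets)
```

Expired entries are evicted on read here; a periodic cleanup pass would be a separate design choice.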

@anpin anpin marked this pull request as ready for review July 17, 2025 12:35
@anpin
Author

anpin commented Jul 18, 2025

seems like some tests in pre-existing code are unreliable on linux

@Aaronontheweb
Member

seems like some tests in pre-existing code are unreliable on linux

yeah some of the K8s specs are flaky - we'll get to work on reviewing this. thanks for all of your hard work @anpin

@Arkatufus
Contributor

Question, there's a public static object in AsyncDnsClient called TcpDropped, it is being sent to the actor parent when there is a problem with the TCP connection. Where is this message being handled?

@Arkatufus
Contributor

Compared to the Scala code, this is supposed to be handled by the DnsClient to clean up its in-flight TCP request list.

Contributor

@Arkatufus Arkatufus left a comment

Found some problems with the current implementation.

@anpin
Author

anpin commented Jul 22, 2025

Switched the query ID from short to int and added a lookup to verify that the ID doesn't already exist in the collection.

Refactored the TcpDropped message and added a handler for it.

I'm on a very unreliable network today, so I got to test those TCP failure cases while correcting the code. TCP failures now reply with an error on a connection error, so discovery fails before the timeout occurs; for untruncated UDP requests the client will still fail on timeout.
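
Editorial sketch of the two behaviours described in this comment: picking a query ID that is not already in flight, and failing every pending query at once when the TCP connection drops. This is illustrative Python with hypothetical names, not the PR's actor code. The 16-bit ID range is what the DNS header allows (RFC 1035); how the PR stores the ID in an int is not modelled here.

```python
import random
from typing import Dict, List, Optional, Tuple


class InFlightQueries:
    """Tracks pending queries by ID so replies and failures can be matched."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._rng = rng or random.Random()
        self._pending: Dict[int, str] = {}

    def register(self, name: str) -> int:
        """Allocate an unused 16-bit ID for `name` and record it as pending."""
        if len(self._pending) >= 0x10000:
            raise RuntimeError("no free query IDs")
        while True:
            candidate = self._rng.randrange(0x10000)
            if candidate not in self._pending:  # the uniqueness lookup
                self._pending[candidate] = name
                return candidate

    def complete(self, query_id: int) -> Optional[str]:
        """Remove and return the name for a reply, or None if unknown."""
        return self._pending.pop(query_id, None)

    def fail_all(self, reason: str) -> List[Tuple[int, str, str]]:
        """On a dropped connection, fail every pending query immediately."""
        failed = [(qid, name, reason) for qid, name in self._pending.items()]
        self._pending.clear()
        return failed
```

Returning the failures straight away is what lets a caller report an error before any timeout fires.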

@Arkatufus please let me know if there is anything I can improve.

@anpin anpin requested a review from Arkatufus July 24, 2025 11:56
Contributor

@Arkatufus Arkatufus left a comment

Reviewed parts so far:

  • DNS UDP/TCP packet binary serializer/deserializer, everything looks good.
  • General DNS client actor flow, looks good.

Would love to have more people to look over this, but it looks good to me.

@Aaronontheweb
Member

Thanks @Arkatufus - I'll take a stab at this in the next day or two

@Aaronontheweb
Member

Gonna ask CoPilot to review it too - why not?

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces DNS-based service discovery support to Akka.NET Management, enabling cluster bootstrap through DNS A/AAAA records and SRV records. This new capability allows Akka clusters to discover nodes using DNS infrastructure without requiring specialized service discovery systems.

  • Implements a complete DNS discovery provider with async DNS client supporting both standard DNS (A/AAAA) and SRV record resolution
  • Adds IPv6 support enhancement to BootstrapCoordinator for proper URI formatting
  • Provides comprehensive examples and documentation for various DNS record types
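
Editorial sketch of the IPv6 formatting issue mentioned in the second bullet: an IPv6 literal must be wrapped in square brackets inside a URI authority (RFC 3986), otherwise its colons are ambiguous with the port separator. The Python below is illustrative only and does not reproduce the BootstrapCoordinator change; the function names and the `http` default scheme are assumptions.

```python
import ipaddress


def format_host_for_uri(host: str) -> str:
    """Wrap IPv6 literals in brackets; leave IPv4 addresses and host names alone."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return host  # not an IP literal, so treat it as a host name
    return f"[{address}]" if address.version == 6 else str(address)


def build_base_uri(host: str, port: int, scheme: str = "http") -> str:
    """Build scheme://host:port with IPv6 hosts bracketed."""
    return f"{scheme}://{format_host_for_uri(host)}:{port}"
```

For example, `build_base_uri("fd00::10", 8558)` gives `http://[fd00::10]:8558`, while an IPv4 address or host name passes through unchanged.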

Reviewed Changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 9 comments.

File | Description
src/management/Akka.Management/Cluster/Bootstrap/Internal/BootstrapCoordinator.cs | Fixes IPv6 address formatting in URI construction
src/discovery/dns/Akka.Discovery.Dns/*.cs | Complete DNS service discovery implementation including async DNS client, protocol handling, and caching
src/discovery/dns/Akka.Discovery.Dns.Tests/*.cs | Unit tests for DNS discovery functionality
src/cluster.bootstrap/examples/discovery/dns/* | Docker-compose examples demonstrating A, AAAA, and SRV record-based discovery
Comments suppressed due to low confidence (1)

Member

@Aaronontheweb Aaronontheweb left a comment

So I tried running the demo locally and got this:

[+] Running 6/6
 ✔ node1                    Built      0.0s
 ✔ Network src_akkanet      Created    0.1s
 ✔ Container src-coredns-1  Created    0.2s
 ✔ Container src-node1-1    Created    0.2s
 ✔ Container src-node3-1    Created    0.2s
 ✔ Container src-node2-1    Created    0.3s
Attaching to coredns-1, node1-1, node2-1, node3-1
coredns-1  | .:1053
coredns-1  | [INFO] plugin/reload: Running configuration SHA512 = ff9cb9f62c926ec7a3b0d3ee6a542fbe41139b95a7f80cd26dd277e714af89d68c326d83caf0302723cfd0a3ffb7faf4d55ee9220a9e6ac89018003ded3abfef
coredns-1  | CoreDNS-1.10.1
coredns-1  | linux/amd64, go1.20, 055b2c3
node1-1    | ==== AKKA DNS CLUSTER NODE STARTING ====
node1-1    | Hostname: node1.akkacluster
node1-1    | IP addresses: 172.28.0.10
node1-1    | Environment variables:
node1-1    |   CLUSTER__PORT: 4053
node1-1    |   CLUSTER__IP: 0.0.0.0
node1-1    |   MANAGEMENT__PORT: 18558
node1-1    |   ACTORSYSTEM: DnsCluster
node1-1    |   SERVICENAME: akkacluster.dns.oci
node1-1    |   PORTNAME: management
node2-1    | ==== AKKA DNS CLUSTER NODE STARTING ====
node3-1    | ==== AKKA DNS CLUSTER NODE STARTING ====
node1-1    |   DNS_PORT: 1053
node2-1    | Hostname: node2.akkacluster
node1-1 exited with code 9
node3-1    | Hostname: node3.akkacluster
node1-1    |   DNS_NAMESERVER: 172.28.0.2
node2-1    | IP addresses: 172.28.0.20
node3-1    | IP addresses: 172.28.0.30
node1-1    | ===================================
node2-1    | Environment variables:
node3-1    | Environment variables:
node2-1 exited with code 9
node1-1    | \nPerforming DNS resolution test for 'akkacluster.dns.oci'...
node2-1    |   CLUSTER__PORT: 4053
node3-1    |   CLUSTER__PORT: 4053
node1-1    | ;; communications error to 127.0.0.11#1053: connection refused
node2-1    |   CLUSTER__IP: 0.0.0.0
node3-1    |   CLUSTER__IP: 0.0.0.0
node1-1    | ;; communications error to 127.0.0.11#1053: connection refused
node3-1 exited with code 9
node2-1    |   MANAGEMENT__PORT: 28558
node3-1    |   MANAGEMENT__PORT: 38558
node1-1    | ;; communications error to 127.0.0.11#1053: connection refused
node2-1    |   ACTORSYSTEM: DnsCluster
node3-1    |   ACTORSYSTEM: DnsCluster
node1-1    |
node2-1    |   SERVICENAME: akkacluster.dns.oci
node3-1    |   SERVICENAME: akkacluster.dns.oci
node1-1    | ; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> -p 1053 akkacluster.dns.oci
node2-1    |   PORTNAME: management
node3-1    |   PORTNAME: management
node1-1    | ;; global options: +cmd
node2-1    |   DNS_PORT: 1053
node3-1    |   DNS_PORT: 1053
node1-1    | ;; no servers could be reached
node2-1    |   DNS_NAMESERVER: 172.28.0.2
node3-1    |   DNS_NAMESERVER: 172.28.0.2
node2-1    | ===================================
node3-1    | ===================================
node2-1    | \nPerforming DNS resolution test for 'akkacluster.dns.oci'...
node3-1    | \nPerforming DNS resolution test for 'akkacluster.dns.oci'...
node2-1    | ;; communications error to 127.0.0.11#1053: connection refused
node3-1    | ;; communications error to 127.0.0.11#1053: connection refused
node2-1    | ;; communications error to 127.0.0.11#1053: connection refused
node3-1    | ;; communications error to 127.0.0.11#1053: connection refused
node2-1    | ;; communications error to 127.0.0.11#1053: connection refused
node3-1    | ;; communications error to 127.0.0.11#1053: connection refused
node2-1    |
node3-1    |
node2-1    | ; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> -p 1053 akkacluster.dns.oci
node3-1    | ; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> -p 1053 akkacluster.dns.oci
node2-1    | ;; global options: +cmd
node3-1    | ;; global options: +cmd
node2-1    | ;; no servers could be reached
node3-1    | ;; no servers could be reached

It looks like all of the nodes killed themselves after a DNS failure:

image

I'm running on Windows, but the same thing happened to me when I ran this on a Linux box too. What's supposed to happen when we run the sample?

- **A/AAAA records**: Standard DNS address records that return IP addresses
- **SRV records**: Service records that provide both IP addresses and port information

## Enabling DNS Discovery Using Akka.Hosting
Member

Stupid question from yours truly: so in order for this to work, do we need a DNS server that is writeable and accessible somewhere inside the network? I'm totally unfamiliar with how to do this outside of using tools like Pulumi to modify DNS records on specific cloud providers - so if I wanted to run this on a bare metal setup, would I need to stand up something like CoreDNS or PiHole?

Member

cc @anpin

@anpin
Author

anpin commented Jul 30, 2025

All three examples are running fine on my linux machine, but I will test it in another environment. Attaching logs produced as such:

cd ~/projects/Akka.Management/src/cluster.bootstrap/examples/discovery/dns/
./build.ps1 a > a.log
./build.ps1 aaaa > aaaa.log
./build.ps1 srv > srv.log

srv.log
a.log
aaaa.log

@anpin
Author

anpin commented Jul 30, 2025

I had a rough time running docker on my windows VM and I don't have a physical windows machine I can test it on right now

Development

Successfully merging this pull request may close these issues.

Discovery via DNS
3 participants