Conversation

@gmelodie gmelodie commented Jun 5, 2025

This is an attempt to formalize the spec for AutoTLS client as announced in https://blog.libp2p.io/autotls/

Note: @kaiserd noted that AutoTLS might be using DANE, but I need some input since I couldn't find any work explicitly linking the two.

@MarcoPolo
Contributor

Thanks for writing this up! pinging @aschmahmann and @lidel for a heads up. I'll also take a look at this soon™

Member

@SgtPooki SgtPooki left a comment

just some basic errors found while skimming

@gmelodie gmelodie requested a review from SgtPooki June 5, 2025 19:50
@p-shahi p-shahi requested a review from aschmahmann June 6, 2025 22:24
kaiserd commented Jul 21, 2025

There has not been much feedback in a while.

Imo, this doc is ready to be merged as the first version of the AutoTLS spec.
It will streamline implementing AutoTLS significantly, compared to the info that is currently available.
We can improve in a follow up.

Member

@lidel lidel left a comment

@gmelodie @kaiserd Do we have a third-party implementation of this spec other than the two created by the Shipyard team?

Personally I would prefer to wait until someone dogfoods this spec (Shipyard wrote things without this PR):

As for the spec proposed in this document, some missing things:

  • guidance on sane timeouts and exponential backoff
  • IPv6

(details inline)

ps. AutoTLS is not using DANE (spec does not claim that, but the PR description mentions it).


**Note:** `varint` is a protobuf [varint](https://protobuf.dev/programming-guides/encoding/#varints) field that encodes the length of each of the `key=value` strings.
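
To illustrate the note above, here is a minimal Python sketch of length-prefixing strings with a protobuf-style varint. It assumes the varint directly precedes the UTF-8 bytes of each `key=value` string; the helper names `encode_varint` and `length_prefixed` and the sample entry are hypothetical, and the real payload layout is whatever the spec under review defines.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if n < 0:
        raise ValueError("varint lengths must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits of the remaining value
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # final byte has the high bit clear
            return bytes(out)


def length_prefixed(entries: list[str]) -> bytes:
    """Concatenate entries, each preceded by the varint of its byte length.

    Assumption: the length counts UTF-8 bytes, not characters.
    """
    out = bytearray()
    for entry in entries:
        data = entry.encode("utf-8")
        out += encode_varint(len(data)) + data
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical entry, only to show the framing.
    print(length_prefixed(["key=value"]).hex())
```

Lengths below 128 fit in a single byte, so short entries cost one byte of framing each.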

**Note:** The node SHOULD include only multiaddresses containing public IPv4 addresses in `multiaddrs`.
Member

iirc there is more to this (IPv6, filtering):

  • The node MUST include only publicly reachable addresses
  • For IPv4: exclude private ranges
  • For IPv6: exclude private ranges and NAT64 translated addresses
  • Exclude relay addresses (containing /p2p-circuit)
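
The filtering rules listed above could be sketched roughly as follows. This is an illustration, not how any existing client does it: `is_publishable` is a hypothetical helper, multiaddrs are handled as plain strings rather than through a multiaddr library, anything that does not start with `/ip4` or `/ip6` is rejected for simplicity, and "publicly reachable" is approximated with the `is_global` property from Python's standard `ipaddress` module.

```python
import ipaddress

# Well-known NAT64 prefix (RFC 6052); checked explicitly as a simplification,
# network-specific NAT64 prefixes are not covered by this sketch.
NAT64_PREFIX = ipaddress.ip_network("64:ff9b::/96")


def is_publishable(multiaddr: str) -> bool:
    """Return True if a textual multiaddr looks safe to send in `multiaddrs`."""
    parts = multiaddr.strip("/").split("/")
    if "p2p-circuit" in parts:
        return False  # relay address
    if len(parts) < 2 or parts[0] not in ("ip4", "ip6"):
        return False  # sketch only handles literal IP multiaddrs
    try:
        ip = ipaddress.ip_address(parts[1])
    except ValueError:
        return False  # not a valid IP literal
    if ip.version != (4 if parts[0] == "ip4" else 6):
        return False  # protocol name and address family disagree
    if ip.version == 6 and ip in NAT64_PREFIX:
        return False  # NAT64-translated address
    return ip.is_global  # rejects private, loopback, link-local, etc.


if __name__ == "__main__":
    for addr in ("/ip4/8.8.8.8/tcp/4001", "/ip4/192.168.1.10/tcp/4001"):
        print(addr, is_publishable(addr))
```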


**Note:** The node SHOULD NOT send more than `max_dns_retries` DNS requests.
After `max_dns_timeout`, the communication is considered failed.
What to do after `max_dns_timeout` has passed is left as an implementation decision.
Member

nit: imo the spec should clearly say that at this point the certificate request flow SHOULD be aborted and retried later.

(Do not attempt to ask ACME for cert if DNS TXT record is not confirmed to exist – we don't want to hammer Let's Encrypt with obviously failing requests)
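
A minimal sketch of that gating, with the DNS check and the ACME call passed in as callables. The names `run_certificate_flow`, `Outcome`, `txt_record_confirmed`, and `request_certificate` are hypothetical and only stand in for whatever an implementation actually uses.

```python
from enum import Enum


class Outcome(Enum):
    ISSUED = "issued"
    RETRY_LATER = "retry_later"


def run_certificate_flow(txt_record_confirmed, request_certificate) -> Outcome:
    """Contact the ACME server only after the DNS TXT record is confirmed.

    txt_record_confirmed: callable returning True once the record is visible.
    request_certificate: callable that performs the ACME certificate request.
    """
    if not txt_record_confirmed():
        # Abort here: an ACME request would fail anyway, so avoid sending it.
        return Outcome.RETRY_LATER
    request_certificate()
    return Outcome.ISSUED


if __name__ == "__main__":
    print(run_certificate_flow(lambda: False, lambda: None))
```

The caller decides when "later" is; the point is only that the failed DNS check short-circuits before any ACME traffic is generated.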


**Note:** The node SHOULD NOT send more than `max_acme_poll_retries` poll requests to the ACME server.
After `max_acme_timeout`, the communication has failed.
What to do after `max_acme_timeout` has passed is left as an implementation decision.
Member

Similar ask: spec should clearly note that there should be exponential backoff for retry attempts.
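
For reference, an exponential backoff schedule can be as small as the generator below. The function name `backoff_delays` and the default values (1 s initial delay, doubling, 60 s cap) are assumptions for illustration; the thread does not propose concrete numbers for the delay sequence.

```python
from itertools import islice


def backoff_delays(initial: float = 1.0, factor: float = 2.0, cap: float = 60.0):
    """Yield retry delays in seconds: initial, initial*factor, ... up to `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)  # stop growing once the cap is hit


if __name__ == "__main__":
    # First six delays with a 10 second cap.
    print(list(islice(backoff_delays(1.0, 2.0, 10.0), 6)))
```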


| Parameter | Description | Reasonable Default |
|--------------------------|------------------------------------------------------------------|--------------|
| `max_dns_retries` | The maximum number of DNS queries that the node SHOULD make before giving up | ??? |
Member

p2p-forge/client uses a time-based timeout (3 minutes) with exponential backoff rather than fixed retry counts
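
A rough sketch of what a time-based timeout with exponential backoff can look like, as opposed to counting retries. This is not the p2p-forge/client code: `poll_until` is a hypothetical helper, the 180 second default mirrors the 3 minutes mentioned above, and the 1 second initial delay and doubling factor are assumptions. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


def poll_until(check, timeout=180.0, initial_delay=1.0, factor=2.0,
               clock=time.monotonic, sleep=time.sleep) -> bool:
    """Call check() with growing pauses until it succeeds or time runs out.

    Returns True as soon as check() returns True, or False once `timeout`
    seconds have elapsed without success.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        if check():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False  # deadline reached: give up, caller retries later
        sleep(min(delay, remaining))  # never sleep past the deadline
        delay *= factor


if __name__ == "__main__":
    # Trivial usage: a check that succeeds immediately.
    print(poll_until(lambda: True))
```

The number of attempts falls out of the deadline and the delay schedule, so no separate retry counter is needed.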

| Parameter | Description | Reasonable Default |
|--------------------------|------------------------------------------------------------------|--------------|
| `max_dns_retries` | The maximum number of DNS queries that the node SHOULD make before giving up | ??? |
| `max_dns_timeout` | The maximum number of seconds a node SHOULD wait for DNS records to be set | ??? |
Member

iirc p2p-forge/client gives up waiting for the TXT record after 3 minutes; at that point something went wrong and the entire registration needs to be resumed

|--------------------------|------------------------------------------------------------------|--------------|
| `max_dns_retries` | The maximum number of DNS queries that the node SHOULD make before giving up | ??? |
| `max_dns_timeout` | The maximum number of seconds a node SHOULD wait for DNS records to be set | ??? |
| `max_acme_poll_retries` | The maximum number of GET requests that the node SHOULD issue to ACME server before giving up | ??? |
Member

ACME specs recommend polling until timeout rather than fixed retry counts

| `max_dns_retries` | The maximum number of DNS queries that the node SHOULD make before giving up | ??? |
| `max_dns_timeout` | The maximum number of seconds a node SHOULD wait for DNS records to be set | ??? |
| `max_acme_poll_retries` | The maximum number of GET requests that the node SHOULD issue to ACME server before giving up | ??? |
| `max_acme_timeout` | The maximum number of seconds a node SHOULD wait for an ACME resource status to change | ??? |
Member

The Let's Encrypt recommendation for challenge completion is ~10 minutes; the p2p-forge client aborts sooner, around 3 minutes I think

@gmelodie
Author

> Do we have a third-party implementation of this spec other than the two created by Shipyard team?

Yes! There's the nim-libp2p implementation

> (bare minimum) someone actually tries to implement AutoTLS client independently, by only following this document, providing feedback if spec is complete

We actually wrote the document based on the implementation that we did.

> ps. AutoTLS is not using DANE (spec does not claim that, but the PR description mentions it).

Thanks for clarifying!

@lidel can you check again please?

@gmelodie gmelodie requested a review from lidel August 20, 2025 13:03