Skip to content

Conversation

cskiraly
Copy link
Contributor

@cskiraly cskiraly commented Apr 12, 2025

Our metrics related to dial errors were off. The original error was not wrapped, so the caller function had no chance of picking it up. Therefore the most common error, which is "TooManyPeers", was not correctly counted.

The metrics were originally introduced in #27621

I was thinking of various possible solutions.

  • the one proposed here wraps both the new error and the origial error. It is not a pattern we use in other parts of the code, but works. This is maybe the smallest possible change.
  • as an alternate, I could write a proper errProtoHandshakeError with it's own wrapped error
  • finally, I'm not even sure we need errProtoHandshakeError, maybe we could just pass up the original error.

The original error was not wrapped, so the caller function had
no chance of picking it up.

Signed-off-by: Csaba Kiraly <[email protected]>
@cskiraly cskiraly requested a review from lightclient April 12, 2025 13:21
@cskiraly cskiraly marked this pull request as ready for review April 13, 2025 13:19
@cskiraly cskiraly requested review from fjl and zsfelfoldi as code owners April 13, 2025 13:20
@@ -67,23 +67,22 @@ func markDialError(err error) {
if !metrics.Enabled() {
return
}
if err2 := errors.Unwrap(err); err2 != nil {
Copy link
Contributor

@fjl fjl Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we really want here is

var reason DiscReason
if errors.As(err, &reason) {
    switch reason {
    ...
    }
}
var phe *protoHandshakeError
if errors.As(err, &phe) {
    dialProtoHandshakeError.Mark(1)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable, and also cleaner. I will verify how it works, and come back with a patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dialProtoHandshakeError would have to be in an else branch, and we would still need to handle dialEncHandshakeError separately.
I would say the original with Is and As is better. I've added a catch-all and some comments to make it even cleaner.

cskiraly and others added 5 commits April 15, 2025 02:21
We can't pass errProtoHandshakeError here, because the framework uses
`Is` for testing, and that would need `As`. We can however test how
it works for internal errors, which is the most typical case.

Signed-off-by: Csaba Kiraly <[email protected]>
Signed-off-by: Csaba Kiraly <[email protected]>
@fjl fjl added this to the 1.15.9 milestone Apr 15, 2025
@cskiraly
Copy link
Contributor Author

LGTM

@fjl fjl merged commit 6928ec5 into ethereum:master Apr 15, 2025
3 of 4 checks passed
sivaratrisrinivas pushed a commit to sivaratrisrinivas/go-ethereum that referenced this pull request Apr 21, 2025
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.

The metrics were originally introduced in
ethereum#27621

I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the origial error.
It is not a pattern we use in other parts of the code, but works. This
is maybe the smallest possible change.
- as an alternate, I could write a proper `errProtoHandshakeError` with
it's own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`, maybe we
could just pass up the original error.

---------

Signed-off-by: Csaba Kiraly <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
0g-wh pushed a commit to 0glabs/0g-geth that referenced this pull request Apr 22, 2025
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.

The metrics were originally introduced in
ethereum#27621

I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the origial error.
It is not a pattern we use in other parts of the code, but works. This
is maybe the smallest possible change.
- as an alternate, I could write a proper `errProtoHandshakeError` with
it's own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`, maybe we
could just pass up the original error.

---------

Signed-off-by: Csaba Kiraly <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
jakub-freebit pushed a commit to fblch/go-ethereum that referenced this pull request Jul 3, 2025
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.

The metrics were originally introduced in
ethereum#27621

I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the origial error.
It is not a pattern we use in other parts of the code, but works. This
is maybe the smallest possible change.
- as an alternate, I could write a proper `errProtoHandshakeError` with
it's own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`, maybe we
could just pass up the original error.

---------

Signed-off-by: Csaba Kiraly <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
jakub-freebit pushed a commit to fblch/go-ethereum that referenced this pull request Jul 23, 2025
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.

The metrics were originally introduced in
ethereum#27621

I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the origial error.
It is not a pattern we use in other parts of the code, but works. This
is maybe the smallest possible change.
- as an alternate, I could write a proper `errProtoHandshakeError` with
it's own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`, maybe we
could just pass up the original error.

---------

Signed-off-by: Csaba Kiraly <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
howjmay pushed a commit to iotaledger/go-ethereum that referenced this pull request Aug 27, 2025
Our metrics related to dial errors were off. The original error was not
wrapped, so the caller function had no chance of picking it up.
Therefore the most common error, which is "TooManyPeers", was not
correctly counted.

The metrics were originally introduced in
ethereum#27621

I was thinking of various possible solutions.
- the one proposed here wraps both the new error and the origial error.
It is not a pattern we use in other parts of the code, but works. This
is maybe the smallest possible change.
- as an alternate, I could write a proper `errProtoHandshakeError` with
it's own wrapped error
- finally, I'm not even sure we need `errProtoHandshakeError`, maybe we
could just pass up the original error.

---------

Signed-off-by: Csaba Kiraly <[email protected]>
Co-authored-by: Felix Lange <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants