add lifecycle metrics #93

Merged
merged 2 commits into from
Mar 18, 2021

Conversation

tomerd
Contributor

@tomerd tomerd commented Mar 18, 2021

motivation: startup/shutdown metrics are important for real-life services; for example, a spike in start metrics indicates a crash loop

changes:

  • add start and shutdown counters to ComponentLifecycle
  • add start and shutdown timers to report duration of startup and shutdown operations

@tomerd tomerd requested review from yim-lee and fabianfett March 18, 2021 01:31
@tomerd tomerd force-pushed the feature/mettrics branch from ea7e5ef to 0dc16e9 Compare March 18, 2021 01:32
@tomerd tomerd added this to the 1.0.0-alpha.7 milestone Mar 18, 2021
@tomerd tomerd force-pushed the feature/mettrics branch 2 times, most recently from 79855f8 to 52a0c8f Compare March 18, 2021 01:41
self.state = .starting(queue)
}

self.log("starting")
Counter(label: "\(self.label).lifecycle.start").increment()
Contributor

Is the label risky, e.g. could it contain spaces etc.?

Contributor Author

hmm good question, do you think we should normalize it?

Contributor

Was just thinking about it but I also just remembered that last time we said that the libs should sanitize, so we do e.g. in the prometheus one https://github.com/MrLotU/SwiftPrometheus/blob/f4d4adfeb1178e250a0ea427017fb4d1cc877de3/Sources/Prometheus/PrometheusMetrics.swift#L112 🤔

I guess it's fine 👍

@tomerd tomerd force-pushed the feature/mettrics branch from 52a0c8f to 28f7ef6 Compare March 18, 2021 02:07
@tomerd tomerd force-pushed the feature/mettrics branch from 28f7ef6 to 39b9cd1 Compare March 18, 2021 02:09
start { error in
Timer(label: "\(self.label).\(tasks[index].label).lifecycle.start").recordNanoseconds(DispatchTime.now().uptimeNanoseconds - startTime.uptimeNanoseconds)
Contributor

@ktoso ktoso Mar 18, 2021

Could it be worth making some struct/enum with the labels such that it is easier to look in one place to see "aha, these are all the metric labels this lib exports"?

I do this in some other projects; it makes it easier to know which metrics are exposed.
Sure, there is not all that much here, but it's a nice pattern.

So either some Labels.start(base: self.label) or Metrics.timer(base: self.label) I guess?
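A minimal sketch of the suggested pattern, assuming a hypothetical `LifecycleMetricLabels` namespace (the name and its functions are illustrative and not part of this PR). The label formats are copied from the diff lines quoted in this review:

```swift
/// Hypothetical namespace that keeps every exported metric label in one place.
/// The name and function signatures are assumptions, not part of the PR.
enum LifecycleMetricLabels {
    /// Label for the counter incremented when a lifecycle starts.
    static func start(base: String) -> String {
        return "\(base).lifecycle.start"
    }

    /// Label for the counter incremented when a lifecycle shuts down.
    static func shutdown(base: String) -> String {
        return "\(base).lifecycle.shutdown"
    }

    /// Label for the timer that reports how long a single task took to start.
    static func taskStart(base: String, task: String) -> String {
        return "\(base).\(task).lifecycle.start"
    }

    /// Label for the timer that reports how long a single task took to shut down.
    static func taskShutdown(base: String, task: String) -> String {
        return "\(base).\(task).lifecycle.shutdown"
    }
}
```

With something like this, a call site could read `Counter(label: LifecycleMetricLabels.start(base: self.label)).increment()`, and a reader can find all exported labels by looking at one type.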

Contributor

We have func recordInterval(since: DispatchTime, now: DispatchTime = .now()) for this type of nanosecond recording; consider using it.

Contributor Author

@tomerd tomerd Mar 18, 2021

recordInterval is not defined in the core library afaict, where did you see it?

Contributor

Contributor

I see but it's not in CoreMetrics hm

Contributor Author

ah yes... I prefer not to introduce a dependency on Foundation for low-level libs if we can get away with it

Contributor

Yeah, though this could live in core I guess, I'll ticketify -- it's just a Dispatch dependency, not Foundation hm
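The interval computation discussed in this thread needs only Dispatch. The sketch below uses a hypothetical free function, `elapsedNanoseconds(since:now:)`, which is an assumption for illustration and not the `recordInterval` API mentioned above. It produces the same value the diff passes to `recordNanoseconds`, with a guard so the unsigned subtraction cannot trap:

```swift
import Dispatch

/// Hypothetical helper (not part of any library): nanoseconds elapsed between
/// two `DispatchTime` values. Returns 0 when `now` is earlier than `start`,
/// because subtracting the larger UInt64 from the smaller one would trap.
func elapsedNanoseconds(since start: DispatchTime, now: DispatchTime = .now()) -> UInt64 {
    let startNanos = start.uptimeNanoseconds
    let nowNanos = now.uptimeNanoseconds
    return nowNanos >= startNanos ? nowNanos - startNanos : 0
}

// Capture the time before the start or shutdown work begins...
let startTime = DispatchTime.now()
// ...run the work here, then compute the duration in the completion callback.
let elapsed = elapsedNanoseconds(since: startTime)
```

The resulting value could then be handed to a timer the way the diff does, for example `Timer(label: ...).recordNanoseconds(elapsed)`.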

@avolokhov avolokhov left a comment

Graphite/statsd and Prometheus have different naming schemes (dot notation vs snake_case).
For my own education, is it a general direction to use dot notation?

@@ -18,9 +18,9 @@ import Darwin
import Glibc
#endif
import Backtrace
import CoreMetrics

is it intentional? AFAIR libraries shouldn't normally depend on CoreMetrics unless implementing a MetricFactory.

Contributor Author

CoreMetrics is useful when you do not want to pull in Foundation

shutdown { error in
Timer(label: "\(self.label).\(tasks[index].label).lifecycle.shutdown").recordNanoseconds(DispatchTime.now().uptimeNanoseconds - startTime.uptimeNanoseconds)
Contributor

Consider using recordInterval(since: startTime)

@tomerd
Contributor Author

tomerd commented Mar 18, 2021

Graphite/statsd and Prometheus have different naming schemes (dot notation vs snake_case).
For my own education, is it a general direction to use dot notation?

I do not believe there is a standard for this, and for this reason our Metrics API is not opinionated about it either. As you pointed out, different systems (Graphite, Prometheus) have their own conventions and constraints, so it must be the "job" of the "backend" library to sanitize the labels according to whatever it is sensitive to. For example, Prometheus is sensitive to dot notation, so it's the "job" of the Prometheus "backend" library to convert dots into underscores or whatever.
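As an illustration of backend-side sanitization, here is a minimal sketch assuming a hypothetical `sanitizeForPrometheus` function. It is not the SwiftPrometheus implementation; it only applies the Prometheus rule that metric names consist of ASCII letters, digits, underscores, and colons and must not start with a digit:

```swift
/// Hypothetical backend-side sanitizer (illustrative only, not taken from any library).
/// Replaces every character outside [a-zA-Z0-9_:] with an underscore and
/// prefixes an underscore when the result would start with a digit.
func sanitizeForPrometheus(_ label: String) -> String {
    let allowed: Set<Character> = Set(
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_:"
    )
    let mapped = label.map { (character: Character) -> Character in
        allowed.contains(character) ? character : "_"
    }
    let sanitized = String(mapped)
    // Metric names must not begin with a digit.
    if let first = sanitized.first, ("0"..."9").contains(first) {
        return "_" + sanitized
    }
    return sanitized
}
```

Because the mapping lives in the backend, the lifecycle library can keep emitting dot-notation labels, and a Graphite or statsd backend can pass them through unchanged.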

@ktoso
Contributor

ktoso commented Mar 18, 2021

Yeah agreed on the sanitization, sounds good. LGTM

@tomerd tomerd merged commit 4714c3c into swift-server:main Mar 18, 2021
4 participants