[SPARK-3377] [Metrics] Metrics can be accidentally aggregated #2250
QA tests have started for PR 2250 at commit
|
…ause the instance of SparkContext is no longer used
QA tests have finished for PR 2250 at commit
|
…turn null when correspondin entry is absent
…cture-improvement
…cture-improvement
…cture-improvement
retest this please.
retest this please.
…cture-improvement
…cture-improvement
…cture-improvement
Could you clean up the changes? It's confusing to see a bunch of debugging changes left in.
This reverts commit e4a4593.
…cture-improvement
b5c907d to ead8966
Sorry, I've just cleaned it up.
Could anyone review this when you have time?
…cture-improvement
This seems like a good idea; I can see how the current behavior is confusing, especially since I think it might be common for multiple apps to be running with the same name (e.g. two copies of […]
I'm not sure that calling […]
The Master-assigned application ID is exposed through […]
…cture-improvement
…s when using YARN cluster mode
@JoshRosen Thanks for your advice. I tried to use the application ID in the metric names and ran into some difficulties.
Problem 1. We need the application ID before SparkEnv is created.
Problem 2. It is difficult to pass the application ID to Executors via SparkConf.
So I have 2 solutions. The 2nd is #2432.
For problem 2, the launcher passes the application ID to the ExecutorBackends when it launches them. This doesn't cover Mesos: MesosSchedulerBackend doesn't return an application ID, so on Mesos System.currentTimeMillis is used instead of the application ID.
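The fallback described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the code in this PR: the function name `metrics_namespace`, its parameters, and the sample ID are hypothetical, and `time.time()` stands in for `System.currentTimeMillis`.

```python
import time

def metrics_namespace(backend_app_id=None, now_ms=None):
    """Pick the prefix used to namespace an application's metrics.

    backend_app_id: the ID returned by the scheduler backend, or None when
                    the backend does not provide one (the Mesos case above).
    now_ms:         hypothetical hook for injecting a timestamp; defaults to
                    the current wall-clock time in milliseconds.
    """
    if backend_app_id:
        # Preferred: the same ID the cluster manager uses for this application.
        return backend_app_id
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Fallback: unique, but unrelated to the ID shown in logs and web UIs.
    return str(now_ms)

print(metrics_namespace("app-20140903-0001"))
print(metrics_namespace(None, now_ms=1410000000000))
```

The second call shows the drawback of the fallback: the resulting prefix has to be correlated with the application by timestamp.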
I feel strongly that we should use the same application ID to refer to the application in every context, since creating a different id based off of System.currentTimeMillis could be very confusing for users. As a user, I'd like to be able to grep logs / metrics / web UIs for my application data using one application id; displaying some other unique but random value is confusing because I have to compare timestamps, etc. to correlate the ids.
This is tricky, though, since we have a "chicken and egg" initialization problem, as you've described. I like the approach that you've suggested in #2432, so I'm going to continue review over there. Feel free to leave this PR open, though, so that it shows up in our PR dashboard and invites discussion; it will be automatically closed if I merge your other PR.
I'm using Spark's Codahale-based MetricsSystem with JMX or Graphite, and I saw the following 2 problems.
(1) When applications with the same spark.app.name run on a cluster at the same time, some metric names collide. For instance, if 2+ applications are running on the cluster at the same time, each application emits a metric with the same name, such as "SparkPi.DAGScheduler.stage.failedStages", and Graphite cannot tell which application the metric belongs to.
(2) When 2+ executors run on the same machine, the JVM metrics of the executors are mixed together. For instance, 2+ executors running on the same node can emit a metric with the same name, "jvm.memory", and Graphite cannot tell which application the metric comes from.
I think the main issue that #1067 tried to resolve is subsumed by this PR.
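The two collisions above can be reproduced with a toy model. The sketch below is in Python and uses a dictionary as a stand-in for Graphite; `FakeBackend`, `metric_name`, the IDs `app-0001` / `app-0002`, and the `<application id>.<executor id>.<source>` layout are assumptions made for illustration, not necessarily the naming scheme this PR implements.

```python
from collections import defaultdict

def metric_name(*parts):
    """Join the non-empty name components with dots, Graphite-style."""
    return ".".join(p for p in parts if p)

class FakeBackend:
    """Toy stand-in for Graphite: keeps every reported value under its name."""
    def __init__(self):
        self.series = defaultdict(list)

    def report(self, name, value):
        self.series[name].append(value)

# Problem (1): two applications share spark.app.name = "SparkPi",
# so both write to the same series and their values are mixed.
collided = FakeBackend()
collided.report(metric_name("SparkPi", "DAGScheduler.stage.failedStages"), 0)
collided.report(metric_name("SparkPi", "DAGScheduler.stage.failedStages"), 3)

# Prefixing with a per-application ID, plus an executor ID for problem (2),
# keeps every series separate. The IDs below are made up.
namespaced = FakeBackend()
namespaced.report(metric_name("app-0001", "driver", "DAGScheduler.stage.failedStages"), 0)
namespaced.report(metric_name("app-0002", "driver", "DAGScheduler.stage.failedStages"), 3)
namespaced.report(metric_name("app-0001", "1", "jvm.memory"), 512)
namespaced.report(metric_name("app-0001", "2", "jvm.memory"), 256)

print(len(collided.series), len(namespaced.series))
```

In the first backend both applications end up in a single series; in the second, each application and executor gets its own.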
Closes #1067
Closes #2432