Skip to content

feat(metrics): Add Raft leadership metrics.#7338

Merged
karlmcguire merged 17 commits intomasterfrom
danielmai/metrics-raft
Jan 25, 2021
Merged

feat(metrics): Add Raft leadership metrics.#7338
karlmcguire merged 17 commits intomasterfrom
danielmai/metrics-raft

Conversation

@danielmai
Copy link
Copy Markdown
Contributor

@danielmai danielmai commented Jan 19, 2021

This adds Raft leadership metrics for both Alphas and Zeros.

  • dgraph_raft_has_leader
  • dgraph_raft_is_leader
  • dgraph_raft_leader_changes_total

Changes:

  • Add group label to each Raft metric. For example, this makes it easy to see from the metric dgraph_raft_is_leader{group="1",instance="alpha1:8180"} 1 that alpha1 is the leader of group 1.

This change is Reviewable

Copy link
Copy Markdown
Contributor Author

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 4 unresolved discussions (waiting on @manishrjain and @vvbalaji-dgraph)


dgraph/cmd/zero/raft.go, line 870 at r2 (raw file):

				}
				if rd.SoftState.Lead != raft.None {
					ostats.Record(context.Background(), x.RaftHasLeader.M(1))

Add group="0" tags for Zero too, so PromQL queries can find Raft metrics based on group 0 easily. e.g.,

dgraph_raft_leader_changes_seen_total{group="0"}
dgraph_raft_leader_changes_seen_total{group="1"}
dgraph_raft_leader_changes_seen_total{group="2"}
...

worker/draft.go, line 1063 at r2 (raw file):

	// lastLead is for detecting leadership changes
	lastLead := uint64(math.MaxUint64)

You can already get the current leadership information and compare that to what's received in the soft state. You can see how the leaderChanges metric is tracked in etcd raft: https://github.com/etcd-io/etcd/blob/01844fd2856016c488fd0f8974252a0070f277ae/server/etcdserver/raft.go#L178-L181


worker/draft_test.go, line 118 at r2 (raw file):

}

func TestRaftMetrics(t *testing.T) {

You can remove this test and add the metrics to TestMetrics, so that we can verify that the metrics are getting exposed properly with the correct names.


x/metrics.go, line 121 at r2 (raw file):

RaftLeaderChangesSeenTotal

Rename this to RaftLeaderChanges. And rename the metric to raft_leader_changes_total.

@karlmcguire
Copy link
Copy Markdown
Contributor

I looked around for something similar to etcd's raftReadyHandler.getLeader() function that stores the "current" or "previous" leader, but we don't have anything like that. So, I left the lastLeader variable in the function and added a note. Let me know if that's not sufficient.

Copy link
Copy Markdown
Contributor Author

@danielmai danielmai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r3.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @manishrjain and @vvbalaji-dgraph)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants