Multiple fixes to improve cluster ops#3337
Conversation
danielmai
left a comment
There was a problem hiding this comment.
Reviewed 7 of 7 files at r1.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain and @martinmr)
conn/node.go, line 380 at r1 (raw file):
// Exit after this deadline. Let BatchAndSendMessages create another goroutine, if needed. deadline := time.Now().Add(60 * time.Second) ticker := time.NewTicker(time.Second)
PR says the ticker is 500ms but this says 1 second?
martinmr
left a comment
There was a problem hiding this comment.
Reviewed 4 of 7 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @manishrjain)
edgraph/config.go, line 20 at r1 (raw file):
import ( "expvar"
why are the changes in this file part of this PR?
manishrjain
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @martinmr)
conn/node.go, line 380 at r1 (raw file):
Previously, danielmai (Daniel Mai) wrote…
PR says the ticker is 500ms but this says 1 second?
Will update the PR description.
edgraph/config.go, line 20 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
why are the changes in this file part of this PR?
No longer need to set the conf var, I just log all the options. It was broken to begin with.
manishrjain
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @martinmr)
conn/node.go, line 380 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Will update the PR description.
Done.
edgraph/config.go, line 20 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
No longer need to set the conf var, I just log all the options. It was broken to begin with.
Done.
danielmai
left a comment
There was a problem hiding this comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @martinmr)
…timeout error during conn creation on a stopped node.
… for Delta field during proposal.
Multiple fixes to improve cluster ops: 1. A delta proposal should bypass the throttle pending proposal mechanism. This is important to ensure that the latest updates from Zero are always proposed. 2. Do not retry connection to a dead server in a busy-wait loop. Instead, use a ticker (1s) to pause before a retry. On error, only log once. Return after 60s. 3. Output all the configs on start to log file. Remove a broken dgraph_conf var. Changes: * Fix the flakiness of blockade test. Was happening because of the 10s timeout error during conn creation on a stopped node. * Simplify the change by removing the Important bit. Instead just check for Delta field during proposal.
Multiple fixes to improve cluster ops: 1. A delta proposal should bypass the throttle pending proposal mechanism. This is important to ensure that the latest updates from Zero are always proposed. 2. Do not retry connection to a dead server in a busy-wait loop. Instead, use a ticker (1s) to pause before a retry. On error, only log once. Return after 60s. 3. Output all the configs on start to log file. Remove a broken dgraph_conf var. Changes: * Fix the flakiness of blockade test. Was happening because of the 10s timeout error during conn creation on a stopped node. * Simplify the change by removing the Important bit. Instead just check for Delta field during proposal.
Multiple fixes to improve cluster ops:
This change is