Skip to content
This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Conversation

alcortesm
Copy link
Contributor

Sometimes, the TestClone and TestPush tests of transport/file fail
in travis.

This is due to a race condition caused by an incorrect usage of
Cmd.Wait() while reading from the output and error pipes of the
command.

This patch fixes the problem by using Cmd.CombinedOutput() instead of
calling Cmd.Start() and Cmd.Wait() while reading from the output and
error pipes.

Details:

From the exec package documentation:

Wait will close the pipe after seeing the command exit, so most callers
need not close the pipe themselves; however, an implication is that it
is incorrect to call Wait before all reads from the pipe have completed.
For the same reason, it is incorrect to call Run when using StdoutPipe.

In our tests, the old execAndGetOutput function was creating two
gorutines to read from the stderr and stdout pipes of the command and
then call Wait on the command.

This caused a race condition: when the Wait call finished before the
gorutines have read from the pipes, they returned caused an error on
outErr.

The problem only happens sometimes on travis.

To reproduce the problem locally, just add a call to
time.Sleep(time.Second) to the gorutine before its ioutil.ReadAll call
to delay them, then Wait will always finish before them, closing the
pipes, and the gorutines will fail. The returned error detected by the
test will be:

FAIL: server_test.go:55: ServerSuite.TestClone
server_test.go:65:
    c.Assert(err, IsNil, Commentf("STDOUT:\n%s\nSTDERR:\n%s\n", stdout, stderr))
    ... value *os.PathError = &os.PathError{Op:"read", Path:"|0", Err:0x9} ("read |0: bad file descriptor")
    ... STDOUT:
    STDERR:

Sometimes, the `TestClone` and `TestPush` tests of `transport/file` fail
in travis.

This is due to a race condition caused by an incorrect usage of
`Cmd.Wait()` while reading from the output and error pipes of the
command.

This patch fixes the problem by using `Cmd.CombinedOutput()` instead of
calling `Cmd.Start()` and `Cmd.Wait()` while reading from the output and
error pipes.

Details:

From the `exec` package documentation:

```
Wait will close the pipe after seeing the command exit, so most callers
need not close the pipe themselves; however, an implication is that it
is incorrect to call Wait before all reads from the pipe have completed.
For the same reason, it is incorrect to call Run when using StdoutPipe.
```

In our tests, the old `execAndGetOutput` function was creating two
gorutines to read from the stderr and stdout pipes of the command and
then call `Wait` on the command.

This caused a race condition: when the `Wait` call finished before the
gorutines have read from the pipes, they returned caused an error on
`outErr`.

The problem only happens sometimes on travis.

To reproduce the problem locally, just add a call to
time.Sleep(time.Second) to the gorutine before its `ioutil.ReadAll` call
to delay them, then `Wait` will always finish before them, closing the
pipes, and the gorutines will fail.  The returned error detected by the
test will be:

```
FAIL: server_test.go:55: ServerSuite.TestClone
server_test.go:65:
    c.Assert(err, IsNil, Commentf("STDOUT:\n%s\nSTDERR:\n%s\n", stdout, stderr))
    ... value *os.PathError = &os.PathError{Op:"read", Path:"|0", Err:0x9} ("read |0: bad file descriptor")
    ... STDOUT:
    STDERR:
```
@codecov-io
Copy link

Codecov Report

Merging #267 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #267   +/-   ##
=======================================
  Coverage   76.54%   76.54%           
=======================================
  Files         111      111           
  Lines        7559     7559           
=======================================
  Hits         5786     5786           
  Misses       1148     1148           
  Partials      625      625

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a69e1cf...a10759a. Read the comment docs.

@alcortesm alcortesm merged commit f5a9c7e into src-d:master Feb 15, 2017
@alcortesm alcortesm deleted the fix-race-condition-on-transport-file-test branch February 15, 2017 11:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants