
Training with continue and fork mode terminated due to unhandled system error #1017

@drremo1

Hello, I recently installed wav2letter v0.2 on Ubuntu 18.04. I am now trying to continue training from the pretrained dev-clean transformer models of the sota/2019 recipe for just 1 epoch. However, training does not start; it terminates immediately with these errors:

I0401 05:59:19.868680 17296 Train.cpp:80] Parsing command line flags
I0401 05:59:19.868815 17296 Train.cpp:81] Overriding flags should be mutable when using `continue`
I0401 05:59:19.868882 17296 Train.cpp:85] Reading flags from file /mnt/d/198/train.cfg
terminate called after throwing an instance of 'std::runtime_error'
  what():  unhandled system error
*** Aborted at 1680299961 (unix time) try "date -d @1680299961" if you are using GNU date ***
PC: @     0x7f5e92f1ce87 gsignal
*** SIGABRT (@0x3e800004390) received by PID 17296 (TID 0x7f5ec06ac380) from PID 17296; stack trace: ***
    @     0x7f5ebf583980 (unknown)
    @     0x7f5e92f1ce87 gsignal
    @     0x7f5e92f1e7f1 abort
    @     0x7f5e93911957 (unknown)
    @     0x7f5e93917ae6 (unknown)
    @     0x7f5e93917b21 std::terminate()
    @     0x7f5e93917d54 __cxa_throw
    @     0x55cf42b5c6f8 fl::detail::ncclCheck()
    @     0x55cf42b5ddd7 fl::distributedInit()
    @     0x55cf42acb387 w2l::initDistributed()
    @     0x55cf4283eab2 main
    @     0x7f5e92effc87 __libc_start_main
    @     0x55cf428a7e4a _start
Aborted

The same thing happens when I try it with fork.

The error above was produced by running the following command (with --rndv_filepath left empty):

wav2letter/build/Train continue /mnt/d/198 --flagsfile /mnt/d/198/train.cfg --logtostderr=1 --minloglevel=0 --rndv_filepath=

At first I thought the flagsfile was the cause, but removing it from the command line gives the same error.
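
To make the failing call path easier to read, here is a small Python sketch that filters the stack trace above down to the flashlight (`fl::`) and wav2letter (`w2l::`) frames. The helper names `named_frames` and `project_frames` are illustrative only and are not part of either library. The result suggests the exception is thrown by the NCCL check inside distributed initialization, before any training step runs.

```python
import re

# Frames copied verbatim from the stack trace in this report.
TRACE = """\
    @     0x7f5ebf583980 (unknown)
    @     0x7f5e92f1ce87 gsignal
    @     0x7f5e92f1e7f1 abort
    @     0x7f5e93911957 (unknown)
    @     0x7f5e93917ae6 (unknown)
    @     0x7f5e93917b21 std::terminate()
    @     0x7f5e93917d54 __cxa_throw
    @     0x55cf42b5c6f8 fl::detail::ncclCheck()
    @     0x55cf42b5ddd7 fl::distributedInit()
    @     0x55cf42acb387 w2l::initDistributed()
    @     0x55cf4283eab2 main
    @     0x7f5e92effc87 __libc_start_main
    @     0x55cf428a7e4a _start
"""

# One glog-style frame line: "@", a hex address, then the symbol text.
FRAME_RE = re.compile(r"^\s*@\s+0x[0-9a-f]+\s+(\S.*)$")


def named_frames(trace):
    """Return symbol names in trace order, skipping '(unknown)' frames."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group(1).strip() != "(unknown)":
            frames.append(match.group(1).strip())
    return frames


def project_frames(trace, prefixes=("fl::", "w2l::")):
    """Keep only frames whose symbol starts with one of the given prefixes."""
    return [f for f in named_frames(trace) if f.startswith(prefixes)]


if __name__ == "__main__":
    # Innermost frame is printed first, matching the order of the trace.
    for frame in project_frames(TRACE):
        print(frame)
```

The first frame printed is the innermost one, `fl::detail::ncclCheck()`, which is called from `fl::distributedInit()` and in turn from `w2l::initDistributed()`.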
