Bug Description
When running an ED2 model, start_model_runs() sometimes fails to copy some ensembles in the run directory to a remote host (HPC). One possible reason is that rsync is currently run inside a for-loop, and the server may limit how many connections can be open at once or how often new connections can be made. It would be more efficient to rsync all the ensemble files over at once, outside of the for-loop, anyway, even if that doesn't fix this bug.
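A rough sketch of the "one rsync outside the loop" idea, in case it helps the discussion. `build_rsync_args()` is a hypothetical helper, not an existing PEcAn function, and the host name, paths, and run IDs are made-up placeholders. It only builds the argument vector for a single rsync call that lists every run directory as a source; the actual call is left commented out.

```r
# Hypothetical helper (not part of PEcAn): build the arguments for ONE rsync
# call that transfers every run directory, instead of one call per run.
build_rsync_args <- function(rundir, run_ids, host_name, host_rundir,
                             delete = TRUE) {
  # One source path per run directory; no trailing slash, so each directory
  # is recreated under the destination.
  src <- file.path(rundir, run_ids)
  dst <- paste0(host_name, ":", host_rundir, "/")
  # `if (delete) "--delete"` yields NULL when delete is FALSE, and c() drops it.
  c("-a", if (delete) "--delete", src, dst)
}

# Placeholder values, for illustration only.
args <- build_rsync_args(
  rundir      = "/tmp/run",
  run_ids     = c("ENS-00001", "ENS-00002"),
  host_name   = "user@hpc.example.org",
  host_rundir = "/scratch/run"
)

# A single connection for all runs (commented out: needs a reachable host).
# status <- system2("rsync", args)
```

With several source directories in one call, only one connection to the server is opened, which would sidestep any connection limit if that is in fact the cause.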
It's either happening here (pecan/base/workflow/R/start_model_runs.R, lines 97 to 102 in f5194f8):

      PEcAn.remote::remote.copy.to(
        host = settings$host,
        src = file.path(settings$rundir, run_id_string),
        dst = settings$host$rundir,
        delete = TRUE)
    }

Or maybe here (can't remember; pecan/base/workflow/R/start_model_runs.R, lines 132 to 141 in f5194f8):

    out <- PEcAn.remote::start_qsub(
      run = run,
      qsub_string = settings$host$qsub,
      rundir = settings$rundir,
      host = settings$host,
      host_rundir = settings$host$rundir,
      host_outdir = settings$host$outdir,
      stdout_log = "stdout.log",
      stderr_log = "stderr.log",
      job_script = "job.sh")

To Reproduce
Difficult to reproduce, sorry.
Expected behavior
All files for ensemble runs should be copied over and if they can't be, there should be an informative warning or error.
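One way the "informative error" could look, as a minimal sketch only. `report_copy_failures()` is a hypothetical helper, and it assumes an exit status is available for each run's copy (0 meaning success, as rsync reports); whether the current copy code exposes that status is not something this issue confirms.

```r
# Hypothetical helper (not part of PEcAn): turn per-run exit statuses into
# one error that names every run directory that failed to copy.
report_copy_failures <- function(run_ids, exit_status) {
  stopifnot(length(run_ids) == length(exit_status))
  failed <- run_ids[exit_status != 0]
  if (length(failed) > 0) {
    stop(
      "Failed to copy ", length(failed), " of ", length(run_ids),
      " run directories to the remote host: ",
      paste(failed, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Illustration with made-up run IDs: the second copy returned a non-zero status.
msg <- tryCatch(
  report_copy_failures(c("ENS-00001", "ENS-00002"), c(0, 23)),
  error = function(e) conditionMessage(e)
)
```

Collecting the failures and reporting them together means a single message lists every missing ensemble, rather than stopping at the first one.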