Description
Hello,
The code below is a minimal reproducible example that shows the behavior (the original code came up in an application I was writing). Sorry it is a bit convoluted, but on my machine that was necessary to reproduce the error.
On Julia 0.5 the code runs without exhausting memory; on 0.6 it runs out of memory.
```julia
addprocs(16)
@everywhere using DistributedArrays

function test_gc(dA, dlA)
    @sync for ip in procs(dA)
        @spawnat ip begin
            @time begin
                # Each line builds several large temporaries on the worker:
                # scaled local slabs, an SVD, and a slab fetched from the
                # whole darray via convert(Array, ...).
                localpart(dlA)[:,:,ip] += 2localpart(dA)[:,:,1] +
                    svd(localpart(dA)[:,:,2])[1][1] +
                    2convert(Array, dA[1:size(localpart(dA),1), 1:size(localpart(dA),2), 3])
                localpart(dlA)[:,:,ip+1] += 3localpart(dA)[:,:,1] +
                    svd(localpart(dA)[:,:,5])[1][1] +
                    4convert(Array, dA[1:size(localpart(dA),1), 1:size(localpart(dA),2), 3])
                localpart(dlA)[:,:,ip+2] += 4localpart(dA)[:,:,1] +
                    localpart(dA)[:,:,7] +
                    5convert(Array, dA[1:size(localpart(dA),1), 1:size(localpart(dA),2), 3])
            end
        end
    end
end

n1, n2, n3 = 2001, 2001, 701
for i = 1:16
    dA = drand((n1, n2, n3), workers()[1:i])
    dlA = similar(dA)
    println(i)
    @time test_gc(dA, dlA)
    d_closeall()       # release the darrays
    @everywhere gc()   # collect on every process before the next run
end
```
I monitored the memory usage using `top` (a programmatic alternative is sketched after the list), and what happens is the following:
1- The total memory in use should be the same for every run (about 60% of a 64 GB node), split across `i` procs.
2- Each function call creates temporary arrays that need to be garbage collected.
3- As the number of procs increases, the code runs faster, as one would hope.
4- For some reason, memory de-allocation and garbage collection are faster on 0.5 than on 0.6.
5- As a result, while memory is being allocated for run `i`, residual memory from runs `i-1, i-2, ...` is still being deallocated.
6- The code runs out of memory.
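In case it helps reproduce the observations above without watching `top`, here is a hypothetical sketch (Linux-only; `rss_mb` and `report_rss` are my own helpers) that reports the resident set size of every process by reading `/proc`:

```julia
# Hypothetical Linux-only helper: read VmRSS (in kB) from /proc on the
# current process and return it in MB.
@everywhere function rss_mb()
    for line in eachline("/proc/self/status")
        startswith(line, "VmRSS") && return parse(Int, split(line)[2]) ÷ 1024
    end
end

# Print the resident memory of every process, e.g. once per outer iteration.
report_rss() = for p in procs()
    println("proc $p: ", remotecall_fetch(rss_mb, p), " MB resident")
end
```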
I am not sure if this is expected behavior, or why 0.5 was more robust.
P.S.: I am on master for DistributedArrays.
Cheers!