This repository was archived by the owner on Sep 11, 2020. It is now read-only.
Clone is slow when repo contains a lot of filesΒ #793
Closed
Description
Problem
I've noticed that PlainClone
was taking a long time in my use case.
In order to try and reproduce the problem I created a repo containing 17k files
with randomly generated data in them: https://github.com/glibsm/alphabet.
Below is the code used and the resulting performance numbers.
Any help is appreciated. Haven't dug into the go-git library code just yet, figured
I'd share the info I have first.
Sample code
package main
import (
"flag"
"fmt"
"log"
"os"
"runtime/pprof"
git "gopkg.in/src-d/go-git.v4"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
}
cloneOpts := &git.CloneOptions{
URL: "https://github.com/glibsm/alphabet", // repo with 17k files
Progress: os.Stdout,
}
repo, err := git.PlainClone("./repo", false, cloneOpts)
if err != nil {
log.Panic(err)
}
head, err := repo.Head()
if err != nil {
log.Panic(err)
}
fmt.Println(head)
}
Running the sample code
Counting objects: 36579, done.
Compressing objects: 100% (1423/1423), done.
Total 36579 (delta 4), reused 36574 (delta 3), pack-reused 0
978d628f9d82456dffef0bb0579b8a617750c8db refs/heads/master
./plain --cpuprofile plain.cpu 22.18s user 14.97s system 140% cpu **26.404 total**
Command line clone for speed comparison
$ time git clone https://github.com/glibsm/alphabet
Cloning into 'alphabet'...
remote: Counting objects: 36579, done.
remote: Compressing objects: 100% (1423/1423), done.
remote: Total 36579 (delta 4), reused 36574 (delta 3), pack-reused 0
Receiving objects: 100% (36579/36579), 1.56 MiB | 5.98 MiB/s, done.
Resolving deltas: 100% (4/4), done.
Checking out files: 100% (17579/17579), done.
real 0m3.569s
user 0m0.508s
sys **0m2.826s**
As we can see, the same repo clone times are 2.8 vs 26 seconds, which is quite a big gap.
CPU Profile
Top20 flat
(pprof) top20
Showing nodes accounting for 31.26s, 96.81% of 32.29s total
Dropped 218 nodes (cum <= 0.16s)
Showing top 20 nodes out of 180
flat flat% sum% cum cum%
20.34s 62.99% 62.99% 20.34s 62.99% syscall.Syscall
5.24s 16.23% 79.22% 5.24s 16.23% runtime.kevent
1.67s 5.17% 84.39% 1.67s 5.17% runtime.mach_semaphore_wait
0.84s 2.60% 86.99% 0.84s 2.60% runtime.memmove
0.75s 2.32% 89.32% 1.91s 5.92% runtime.scanobject
0.53s 1.64% 90.96% 0.53s 1.64% runtime.mach_semaphore_timedwait
0.44s 1.36% 92.32% 0.50s 1.55% runtime.heapBitsForObject
0.44s 1.36% 93.68% 0.44s 1.36% runtime.mach_semaphore_signal
0.41s 1.27% 94.95% 0.41s 1.27% runtime.usleep
0.29s 0.9% 95.85% 0.50s 1.55% runtime.greyobject
0.11s 0.34% 96.19% 1.05s 3.25% os.(*file).close
0.09s 0.28% 96.47% 2.64s 8.18% runtime.gcDrain
0.02s 0.062% 96.53% 0.19s 0.59% runtime.mallocgc
0.02s 0.062% 96.59% 6.58s 20.38% gopkg.in/src-d/go-git%2ev4.(*Worktree).resetIndex
0.02s 0.062% 96.66% 0.32s 0.99% gopkg.in/src-d/go-git.v4/plumbing/format/index.(*Index).Remove
0.01s 0.031% 96.69% 7.11s 22.02% os.openFileNolog
0.01s 0.031% 96.72% 0.67s 2.07% runtime.gcMarkDone
0.01s 0.031% 96.75% 3.53s 10.93% runtime.systemstack
0.01s 0.031% 96.78% 8.85s 27.41% gopkg.in/src-d/go-git%2ev4.(*Worktree).checkoutChangeRegularFile
0.01s 0.031% 96.81% 0.24s 0.74% gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*teeReader).ReadByte
Top 20 cumulative
(pprof) top20 -cum
Showing nodes accounting for 20.37s, 63.08% of 32.29s total
Dropped 218 nodes (cum <= 0.16s)
Showing top 20 nodes out of 180
flat flat% sum% cum cum%
0 0% 0% 22.55s 69.84% main.main
0 0% 0% 22.55s 69.84% runtime.main
0 0% 0% 22.55s 69.84% gopkg.in/src-d/go-git%2ev4.(*Repository).clone
0 0% 0% 22.55s 69.84% gopkg.in/src-d/go-git%2ev4.PlainClone
0 0% 0% 22.55s 69.84% gopkg.in/src-d/go-git%2ev4.PlainCloneContext
0 0% 0% 21.68s 67.14% gopkg.in/src-d/go-git%2ev4.(*Worktree).Reset
20.34s 62.99% 62.99% 20.34s 62.99% syscall.Syscall
0 0% 62.99% 15.10s 46.76% gopkg.in/src-d/go-git%2ev4.(*Worktree).resetWorktree
0.01s 0.031% 63.02% 13.64s 42.24% gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).EncodedObject
0 0% 63.02% 12.48s 38.65% gopkg.in/src-d/go-git%2ev4.(*Worktree).checkoutChange
0 0% 63.02% 12.06s 37.35% gopkg.in/src-d/go-git.v4/plumbing/object.(*Tree).FindEntry
0 0% 63.02% 12.06s 37.35% gopkg.in/src-d/go-git.v4/plumbing/object.(*Tree).dir
0 0% 63.02% 9.59s 29.70% gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).getFromPackfile
0.01s 0.031% 63.05% 8.85s 27.41% gopkg.in/src-d/go-git%2ev4.(*Worktree).checkoutChangeRegularFile
0.01s 0.031% 63.08% 7.34s 22.73% gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).decodeObjectAt
0 0% 63.08% 7.23s 22.39% internal/poll.(*FD).Seek
0 0% 63.08% 7.23s 22.39% os.(*File).Seek
0 0% 63.08% 7.23s 22.39% os.(*File).seek
0 0% 63.08% 7.23s 22.39% gopkg.in/src-d/go-billy.v4/helper/chroot.(*file).Seek
Go-git sha
The version used in the sample code
- name: gopkg.in/src-d/go-git.v4
version: e9247ce9c5ce12126f646ca3ddf0066e4829bd14