Skip to content
This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Clone is slow when repo contains a lot of filesΒ #793

Closed
@glibsm

Description

@glibsm

Problem

I've noticed that PlainClone was taking a long time in my use case.

In order to try and reproduce the problem I created a repo containing 17k files
with randomly generated data in them: https://github.com/glibsm/alphabet.

Below is the code used and the resulting performance numbers.

Any help is appreciated. Haven't dug into the go-git library code just yet, figured
I'd share the info I have first.

Sample code

package main

import (
        "flag"
        "fmt"
        "log"
        "os"
        "runtime/pprof"

        git "gopkg.in/src-d/go-git.v4"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
        flag.Parse()
        if *cpuprofile != "" {
                f, err := os.Create(*cpuprofile)
                if err != nil {
                        log.Fatal(err)
                }
                pprof.StartCPUProfile(f)
                defer pprof.StopCPUProfile()
        }

        cloneOpts := &git.CloneOptions{
                URL:      "https://github.com/glibsm/alphabet", // repo with 17k files
                Progress: os.Stdout,
        }
        repo, err := git.PlainClone("./repo", false, cloneOpts)
        if err != nil {
                log.Panic(err)
        }

        head, err := repo.Head()
        if err != nil {
                log.Panic(err)
        }
        fmt.Println(head)
}

Running the sample code

Counting objects: 36579, done.
Compressing objects: 100% (1423/1423), done.
Total 36579 (delta 4), reused 36574 (delta 3), pack-reused 0
978d628f9d82456dffef0bb0579b8a617750c8db refs/heads/master
./plain --cpuprofile plain.cpu  22.18s user 14.97s system 140% cpu **26.404 total**

Command line clone for speed comparison

$ time git clone https://github.com/glibsm/alphabet
Cloning into 'alphabet'...
remote: Counting objects: 36579, done.
remote: Compressing objects: 100% (1423/1423), done.
remote: Total 36579 (delta 4), reused 36574 (delta 3), pack-reused 0
Receiving objects: 100% (36579/36579), 1.56 MiB | 5.98 MiB/s, done.
Resolving deltas: 100% (4/4), done.
Checking out files: 100% (17579/17579), done.

real    0m3.569s
user    0m0.508s
sys     **0m2.826s**

As we can see, the same repo clone times are 2.8 vs 26 seconds, which is quite a big gap.

CPU Profile

Top20 flat

(pprof) top20
Showing nodes accounting for 31.26s, 96.81% of 32.29s total
Dropped 218 nodes (cum <= 0.16s)
Showing top 20 nodes out of 180
      flat  flat%   sum%        cum   cum%
    20.34s 62.99% 62.99%     20.34s 62.99%  syscall.Syscall
     5.24s 16.23% 79.22%      5.24s 16.23%  runtime.kevent
     1.67s  5.17% 84.39%      1.67s  5.17%  runtime.mach_semaphore_wait
     0.84s  2.60% 86.99%      0.84s  2.60%  runtime.memmove
     0.75s  2.32% 89.32%      1.91s  5.92%  runtime.scanobject
     0.53s  1.64% 90.96%      0.53s  1.64%  runtime.mach_semaphore_timedwait
     0.44s  1.36% 92.32%      0.50s  1.55%  runtime.heapBitsForObject
     0.44s  1.36% 93.68%      0.44s  1.36%  runtime.mach_semaphore_signal
     0.41s  1.27% 94.95%      0.41s  1.27%  runtime.usleep
     0.29s   0.9% 95.85%      0.50s  1.55%  runtime.greyobject
     0.11s  0.34% 96.19%      1.05s  3.25%  os.(*file).close
     0.09s  0.28% 96.47%      2.64s  8.18%  runtime.gcDrain
     0.02s 0.062% 96.53%      0.19s  0.59%  runtime.mallocgc
     0.02s 0.062% 96.59%      6.58s 20.38%  gopkg.in/src-d/go-git%2ev4.(*Worktree).resetIndex
     0.02s 0.062% 96.66%      0.32s  0.99%  gopkg.in/src-d/go-git.v4/plumbing/format/index.(*Index).Remove
     0.01s 0.031% 96.69%      7.11s 22.02%  os.openFileNolog
     0.01s 0.031% 96.72%      0.67s  2.07%  runtime.gcMarkDone
     0.01s 0.031% 96.75%      3.53s 10.93%  runtime.systemstack
     0.01s 0.031% 96.78%      8.85s 27.41%  gopkg.in/src-d/go-git%2ev4.(*Worktree).checkoutChangeRegularFile
     0.01s 0.031% 96.81%      0.24s  0.74%  gopkg.in/src-d/go-git.v4/plumbing/format/packfile.(*teeReader).ReadByte

Top 20 cumulative

(pprof) top20 -cum
Showing nodes accounting for 20.37s, 63.08% of 32.29s total
Dropped 218 nodes (cum <= 0.16s)
Showing top 20 nodes out of 180
      flat  flat%   sum%        cum   cum%
         0     0%     0%     22.55s 69.84%  main.main
         0     0%     0%     22.55s 69.84%  runtime.main
         0     0%     0%     22.55s 69.84%  gopkg.in/src-d/go-git%2ev4.(*Repository).clone
         0     0%     0%     22.55s 69.84%  gopkg.in/src-d/go-git%2ev4.PlainClone
         0     0%     0%     22.55s 69.84%  gopkg.in/src-d/go-git%2ev4.PlainCloneContext
         0     0%     0%     21.68s 67.14%  gopkg.in/src-d/go-git%2ev4.(*Worktree).Reset
    20.34s 62.99% 62.99%     20.34s 62.99%  syscall.Syscall
         0     0% 62.99%     15.10s 46.76%  gopkg.in/src-d/go-git%2ev4.(*Worktree).resetWorktree
     0.01s 0.031% 63.02%     13.64s 42.24%  gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).EncodedObject
         0     0% 63.02%     12.48s 38.65%  gopkg.in/src-d/go-git%2ev4.(*Worktree).checkoutChange
         0     0% 63.02%     12.06s 37.35%  gopkg.in/src-d/go-git.v4/plumbing/object.(*Tree).FindEntry
         0     0% 63.02%     12.06s 37.35%  gopkg.in/src-d/go-git.v4/plumbing/object.(*Tree).dir
         0     0% 63.02%      9.59s 29.70%  gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).getFromPackfile
     0.01s 0.031% 63.05%      8.85s 27.41%  gopkg.in/src-d/go-git%2ev4.(*Worktree).checkoutChangeRegularFile
     0.01s 0.031% 63.08%      7.34s 22.73%  gopkg.in/src-d/go-git.v4/storage/filesystem.(*ObjectStorage).decodeObjectAt
         0     0% 63.08%      7.23s 22.39%  internal/poll.(*FD).Seek
         0     0% 63.08%      7.23s 22.39%  os.(*File).Seek
         0     0% 63.08%      7.23s 22.39%  os.(*File).seek
         0     0% 63.08%      7.23s 22.39%  gopkg.in/src-d/go-billy.v4/helper/chroot.(*file).Seek

Go-git sha

The version used in the sample code

- name: gopkg.in/src-d/go-git.v4
  version: e9247ce9c5ce12126f646ca3ddf0066e4829bd14

e9247ce

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions