-
Notifications
You must be signed in to change notification settings - Fork 127
Use git ls-files for loading paths when available. #301
Conversation
I guess this could be a fix for issue #271 |
This needs tests before it can be considered.
Do any of the alternatives handle symlinks correctly?
You figured it out in your next comment 😉
At least on Windows, it is not required to have Git installed when using Atom. |
I can't figure out what kind of test would be useful here. Ultimately, this change is great because all the regression tests pass, which proves that this change is functionally identical where it matters. Do you have any ideas as to what kind of test would be good here? As for the symlink issue, we can potentially use
I don't think that is ideal, but I also see a conflict with core.followSymlinks and core.excludeVcsIgnoredPaths. Symlinks are never followed in git, it sees them as a leaf node, so their contents are implicitly ignored. So, if you want to exclude VCS ignored paths, you would never follow a version controlled symlink. The only time you would follow a symlink in a VCS folder is if it were untracked. With that in mind, I would suggest the equivalent of these steps:
|
Disclaimer: I don't know what the code coverage for fuzzy-finder looks like. It could be that the test I'm suggesting already exists. I'd like to see a test asserting that we don't attempt to use the Git method when loading paths for a non-Git project. |
This is already tested for! |
lib/load-paths-handler.coffee
Outdated
done() | ||
repo = GitRepository.open(@rootPath, refreshOnWindowFocus: false) | ||
if repo?.relativize(path.join(@rootPath, 'test')) is 'test' | ||
args = ['ls-files', '-c', '-o', '-z'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind using the extended command args for clarity? Eg --cached
, --others
, etc.
lib/load-paths-handler.coffee
Outdated
if @ignoreVcsIgnores | ||
args.push('--exclude-standard') | ||
for ignoredName in @ignoredNames | ||
args.push("-x") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. (Also it looks like these should be single quotes to match the other arg quotes?)
Updated! (don't know if you get notified when another commit is pushed) |
👍 Sorry for the late reply; was on vacation. I've brought this PR up to the other team members for consideration. |
sorry for the late arrival to this PR. Here are the following test scenario I plan to run through:
I'll be spending time this week to look into these, please let me know if you feel like anything is missing from this list. I'll be testing first before the changes and reporting back if anything changed or is the same. One thing I see is:
@eeejay do you know if this has been fixed with a more recent commit? |
Thanks for the work you've put into this PR. Unfortunately, we won't be able to accept this as it is currently written because it causes a change in behavior with regard to symlinks. We are also looking at making indexing better by incorporating some of the work we've invested in our file watcher infrastructure. If you can update this PR to address the symlinks issue, we'll reevaluate it after the indexing changes. Thanks again for your help! |
Pull request #367 does this now. |
Description of the Change
Leverage git to index a git-controlled project quickly by using git-ls-files where appropriate
Background
Walking though a large codebase with node fs is very slow. Gecko has almost 200k files, and it takes 17 seconds on a fast machine. The project is constantly being reindexed after each editor focus. There needs to be a better way.
Solution
Allow a native binary to walk the tree. A simple
find
orgit ls-files
is 40x faster than the current nodejs task. I used dugite here to wrap the git commands instead of using child_process because it is more portable and Atom already pulls it in as a dependency elsewhere.EDIT: This is faster because the non-git solution needs to query each file if it is ignored or not. This means reading, at least,
.gitignore
+.git/info/exclude
on each traversed node.Alternate Designs
find
on POSIX systems performs just as well, I decided to go withgit-ls-files
because it is more portable, and should be available on Windows, and because it has a simple flag for excluding VCS ignored files.libgit2
based, library.git-utils
doesn't expose an API that would allow this. I tried replicating git-ls-files withnodegit
to see if it is possible, and it is 4x slower. So I think the git command line solution is the best.child_process.spawn
. This is twice as fast as this patch, because it would allow streaming the stdout.dugite
currently does not allow that and needs a fixed buffer. I went withdugite
because it seemed the most portable, even though it is pretty certain Atom will have access to git in its PATH.Benefits
A 40x speedup.
Time to index Gecko before patch: 16.5s.
After patch: 0.4s
Possible Drawbacks
Following symlink subdirs is broken in this solution.
Applicable Issues
What does this mean??