author | Jeff King <peff@peff.net> | |
Thu, 2 Feb 2012 08:24:28 +0000 (03:24 -0500) | ||
committer | Junio C Hamano <gitster@pobox.com> | |
Thu, 2 Feb 2012 18:36:08 +0000 (10:36 -0800) | ||
commit | 9dd5245c1043dd18fd7b3f44b9e51eef7e4b58d8 | |
tree | 69cb585a0fb10df11ca839fdaca7795193fbe12c | tree | snapshot |
parent | 08265798e1ff6abc1b0aaff31c1471f83bd51425 | commit | diff |
grep: pre-load userdiff drivers when threaded
The low-level grep_source code will automatically load the
userdiff driver to see whether a file is binary. However,
when we are threaded, it will load the drivers in a
non-deterministic order, handling each one as its assigned
thread happens to be scheduled.
Meanwhile, the attribute lookup code (which underlies the
userdiff driver lookup) is optimized to handle paths in
sequential order (because they tend to share the same
gitattributes files). Multi-threading the lookups destroys
the locality and makes this optimization less effective.
We can fix this by pre-loading the userdiff driver in the
main thread, before we hand off the file to a worker thread.
My best-of-five for "git grep foo" on the linux-2.6
repository went from:
real 0m0.391s
user 0m1.708s
sys 0m0.584s
to:
real 0m0.360s
user 0m1.576s
sys 0m0.572s
Not a huge speedup, but it's quite easy to do. The only
trick is that we shouldn't perform this optimization if "-a"
was used, in which case we won't bother checking whether
the files are binary at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level grep_source code will automatically load the
userdiff driver to see whether a file is binary. However,
when we are threaded, it will load the drivers in a
non-deterministic order, handling each one as its assigned
thread happens to be scheduled.
Meanwhile, the attribute lookup code (which underlies the
userdiff driver lookup) is optimized to handle paths in
sequential order (because they tend to share the same
gitattributes files). Multi-threading the lookups destroys
the locality and makes this optimization less effective.
We can fix this by pre-loading the userdiff driver in the
main thread, before we hand off the file to a worker thread.
My best-of-five for "git grep foo" on the linux-2.6
repository went from:
real 0m0.391s
user 0m1.708s
sys 0m0.584s
to:
real 0m0.360s
user 0m1.576s
sys 0m0.572s
Not a huge speedup, but it's quite easy to do. The only
trick is that we shouldn't perform this optimization if "-a"
was used, in which case we won't bother checking whether
the files are binary at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/grep.c | diff | blob | history |