author | Nicolas Pitre <nico@cam.org> | |
Wed, 8 Mar 2006 19:32:50 +0000 (14:32 -0500) | ||
committer | Junio C Hamano <junkio@cox.net> | |
Thu, 9 Mar 2006 09:35:14 +0000 (01:35 -0800) | ||
commit | c13c6bf758457a3e7293b2adf63cc47aec8d83ef | |
tree | a09355dc0b5d0f6462147490ed83711498f822bc | tree | snapshot |
parent | 3d99a7f4fab41bb057d62c87cb596069a5aba106 | commit | diff |
diff-delta: bound hash list length to avoid O(m*n) behavior
The diff-delta code can exhibit O(m*n) behavior with some patological
data set where most hash entries end up in the same hash bucket.
To prevent this, a limit is imposed to the number of entries that can
exist in the same hash bucket.
Because of the above the code is a tiny bit more expensive on average,
even if some small optimizations were added as well to atenuate the
overhead. But the problematic samples used to diagnoze the issue are now
orders of magnitude less expensive to process with only a slight loss in
compression.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The diff-delta code can exhibit O(m*n) behavior with some patological
data set where most hash entries end up in the same hash bucket.
To prevent this, a limit is imposed to the number of entries that can
exist in the same hash bucket.
Because of the above the code is a tiny bit more expensive on average,
even if some small optimizations were added as well to atenuate the
overhead. But the problematic samples used to diagnoze the issue are now
orders of magnitude less expensive to process with only a slight loss in
compression.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
diff-delta.c | diff | blob | history |