X-Git-Url: https://git.tokkee.org/?a=blobdiff_plain;f=Documentation%2Ftechnical%2Fpack-format.txt;h=a80baa4382ce72cd6d3e1f1719a17264fc36ee02;hb=bd8ff616c998da8b08bd59b47644408048b3016d;hp=0e1ffb24276027aa54c6b04fe404367f267cc07d;hpb=75c3a5ccdf114b5485e4828db1923bf4a35b19e2;p=git.git diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt index 0e1ffb242..a80baa438 100644 --- a/Documentation/technical/pack-format.txt +++ b/Documentation/technical/pack-format.txt @@ -1,9 +1,9 @@ GIT pack format =============== -= pack-*.pack file has the following format: += pack-*.pack files have the following format: - - The header appears at the beginning and consists of the following: + - A header appears at the beginning and consists of the following: 4-byte signature: The signature is: {'P', 'A', 'C', 'K'} @@ -21,11 +21,11 @@ GIT pack format which looks like this: (undeltified representation) - n-byte type and length (4-bit type, (n-1)*7+4-bit length) + n-byte type and length (3-bit type, (n-1)*7+4-bit length) compressed data (deltified representation) - n-byte type and length (4-bit type, (n-1)*7+4-bit length) + n-byte type and length (3-bit type, (n-1)*7+4-bit length) 20-byte base object name compressed delta data @@ -34,18 +34,14 @@ GIT pack format - The trailer records 20-byte SHA1 checksum of all of the above. -= pack-*.idx file has the following format: += Original (version 1) pack-*.idx files have the following format: - The header consists of 256 4-byte network byte order integers. N-th entry of this table records the number of objects in the corresponding pack, the first byte of whose - object name are smaller than N. This is called the + object name is less than or equal to N. This is called the 'first-level fan-out' table. - Observation: we would need to extend this to an array of - 8-byte integers to go beyond 4G objects per pack, but it is - not strictly necessary. - - The header is followed by sorted 24-byte entries, one entry per object in the pack. Each entry is: @@ -55,10 +51,6 @@ GIT pack format 20-byte object name. - Observation: we would definitely need to extend this to - 8-byte integer plus 20-byte object name to handle a packfile - that is larger than 4GB. - - The file is concluded with a trailer: A copy of the 20-byte SHA1 checksum at the end of @@ -68,49 +60,87 @@ GIT pack format Pack Idx file: - idx - +--------------------------------+ - | fanout[0] = 2 |-. - +--------------------------------+ | + -- +--------------------------------+ +fanout | fanout[0] = 2 (for example) |-. +table +--------------------------------+ | | fanout[1] | | +--------------------------------+ | | fanout[2] | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | - | fanout[255] | | - +--------------------------------+ | -main | offset | | -index | object name 00XXXXXXXXXXXXXXXX | | -table +--------------------------------+ | - | offset | | - | object name 00XXXXXXXXXXXXXXXX | | - +--------------------------------+ | - .-| offset |<+ - | | object name 01XXXXXXXXXXXXXXXX | - | +--------------------------------+ - | | offset | - | | object name 01XXXXXXXXXXXXXXXX | - | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - | | offset | - | | object name FFXXXXXXXXXXXXXXXX | - | +--------------------------------+ + | fanout[255] = total objects |---. + -- +--------------------------------+ | | +main | offset | | | +index | object name 00XXXXXXXXXXXXXXXX | | | +table +--------------------------------+ | | + | offset | | | + | object name 00XXXXXXXXXXXXXXXX | | | + +--------------------------------+<+ | + .-| offset | | + | | object name 01XXXXXXXXXXXXXXXX | | + | +--------------------------------+ | + | | offset | | + | | object name 01XXXXXXXXXXXXXXXX | | + | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | + | | offset | | + | | object name FFXXXXXXXXXXXXXXXX | | + --| +--------------------------------+<--+ trailer | | packfile checksum | | +--------------------------------+ | | idxfile checksum | | +--------------------------------+ - .-------. + .-------. | Pack file entry: <+ packed object header: - 1-byte type (upper 4-bit) - size0 (lower 4-bit) + 1-byte size extension bit (MSB) + type (next 3 bit) + size0 (lower 4-bit) n-byte sizeN (as long as MSB is set, each 7-bit) size0..sizeN form 4+7+7+..+7 bit integer, size0 - is the most significant part. + is the least significant part, and sizeN is the + most significant part. packed object data: If it is not DELTA, then deflated bytes (the size above is the size before compression). If it is DELTA, then 20-byte base object name SHA1 (the size above is the - size of the delta data that follows). + size of the delta data that follows). delta data, deflated. + + += Version 2 pack-*.idx files support packs larger than 4 GiB, and + have some other reorganizations. They have the format: + + - A 4-byte magic number '\377tOc' which is an unreasonable + fanout[0] value. + + - A 4-byte version number (= 2) + + - A 256-entry fan-out table just like v1. + + - A table of sorted 20-byte SHA1 object names. These are + packed together without offset values to reduce the cache + footprint of the binary search for a specific object name. + + - A table of 4-byte CRC32 values of the packed object data. + This is new in v2 so compressed data can be copied directly + from pack to pack during repacking withough undetected + data corruption. + + - A table of 4-byte offset values (in network byte order). + These are usually 31-bit pack file offsets, but large + offsets are encoded as an index into the next table with + the msbit set. + + - A table of 8-byte offset entries (empty for pack files less + than 2 GiB). Pack files are organized with heavily used + objects toward the front, so most object references should + not need to refer to this table. + + - The same trailer as a v1 pack file: + + A copy of the 20-byte SHA1 checksum at the end of + corresponding packfile. + + 20-byte SHA1-checksum of all of the above.