X-Git-Url: https://git.tokkee.org/?a=blobdiff_plain;f=Documentation%2Fgit-fast-import.txt;h=d5119678b59492db4651006391fd693a1a12601a;hb=afc05f9f13beded8caf15d8e58d06fd64e0f7808;hp=0b64d3348b1cf31830ed1e953d551eb74a396d33;hpb=302da67472e322109e6299d38dd1a2c30bde9f4c;p=git.git

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 0b64d3348..d5119678b 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -3,7 +3,7 @@ git-fast-import(1)

 NAME
 ----
-git-fast-import - Backend for fast Git data importers.
+git-fast-import - Backend for fast Git data importers


 SYNOPSIS
@@ -15,15 +15,15 @@ DESCRIPTION
 This program is usually not what the end user wants to run directly.
 Most end users want to use one of the existing frontend programs,
 which parses a specific type of foreign source and feeds the contents
-stored there to git-fast-import (gfi).
+stored there to git-fast-import.

-gfi reads a mixed command/data stream from standard input and
+fast-import reads a mixed command/data stream from standard input and
 writes one or more packfiles directly into the current repository.
 When EOF is received on standard input, fast import writes out
 updated branch and tag refs, fully updating the current repository
 with the newly imported data.

-The gfi backend itself can import into an empty repository (one that
+The fast-import backend itself can import into an empty repository (one that
 has already been initialized by gitlink:git-init[1]) or incrementally
 update an existing populated repository. Whether or not incremental
 imports are supported from a particular foreign source depends on
@@ -34,7 +34,7 @@ OPTIONS
 -------
 --date-format=<fmt>::
 	Specify the type of dates the frontend will supply to
-	gfi within `author`, `committer` and `tagger` commands.
+	fast-import within `author`, `committer` and `tagger` commands.
 	See ``Date Formats'' below for details about which formats
 	are supported, and their syntax.
@@ -62,31 +62,51 @@ OPTIONS
 	Dumps the internal marks table to <file> when complete.
 	Marks are written one per line as `:markid SHA-1`.
 	Frontends can use this file to validate imports after they
-	have been completed.
+	have been completed, or to save the marks table across
+	incremental runs. As <file> is only opened and truncated
+	at checkpoint (or completion) the same path can also be
+	safely given to \--import-marks.
+
+--import-marks=<file>::
+	Before processing any input, load the marks specified in
+	<file>. The input file must exist, must be readable, and
+	must use the same format as produced by \--export-marks.
+	Multiple options may be supplied to import more than one
+	set of marks. If a mark is defined to different values,
+	the last file wins.
+
+--export-pack-edges=<file>::
+	After creating a packfile, print a line of data to
+	<file> listing the filename of the packfile and the last
+	commit on each branch that was written to that packfile.
+	This information may be useful after importing projects
+	whose total object set exceeds the 4 GiB packfile limit,
+	as these commits can be used as edge points during calls
+	to gitlink:git-pack-objects[1].

 --quiet::
-	Disable all non-fatal output, making gfi silent when it
+	Disable all non-fatal output, making fast-import silent when it
 	is successful. This option disables the output shown by
 	\--stats.

 --stats::
-	Display some basic statistics about the objects gfi has
+	Display some basic statistics about the objects fast-import has
 	created, the packfiles they were stored into, and the
-	memory used by gfi during this run. Showing this output
+	memory used by fast-import during this run. Showing this output
 	is currently the default, but can be disabled with \--quiet.


 Performance
 -----------
-The design of gfi allows it to import large projects in a minimum
+The design of fast-import allows it to import large projects in a minimum
 amount of memory usage and processing time.
 Assuming the frontend
-is able to keep up with gfi and feed it a constant stream of data,
+is able to keep up with fast-import and feed it a constant stream of data,
 import times for projects holding 10+ years of history and containing
 100,000+ individual commits are generally completed in just 1-2
 hours on quite modest (~$2,000 USD) hardware.

 Most bottlenecks appear to be in foreign source data access (the
-source just cannot extract revisions fast enough) or disk IO (gfi
+source just cannot extract revisions fast enough) or disk IO (fast-import
 writes as fast as the disk will take the data). Imports will run
 faster if the source data is stored on a different drive than the
 destination Git repository (due to less IO contention).
@@ -94,28 +114,28 @@ destination Git repository (due to less IO contention).

 Development Cost
 ----------------
-A typical frontend for gfi tends to weigh in at approximately 200
+A typical frontend for fast-import tends to weigh in at approximately 200
 lines of Perl/Python/Ruby code. Most developers have been able to
 create working importers in just a couple of hours, even though it
-is their first exposure to gfi, and sometimes even to Git. This is
+is their first exposure to fast-import, and sometimes even to Git. This is
 an ideal situation, given that most conversion tools are throw-away
 (use once, and never look back).


 Parallel Operation
 ------------------
-Like `git-push` or `git-fetch`, imports handled by gfi are safe to
+Like `git-push` or `git-fetch`, imports handled by fast-import are safe to
 run alongside parallel `git repack -a -d` or `git gc` invocations,
 or any other Git operation (including `git prune`, as loose objects
-are never used by gfi).
+are never used by fast-import).

-gfi does not lock the branch or tag refs it is actively importing.
-After the import, during its ref update phase, gfi tests each
+fast-import does not lock the branch or tag refs it is actively importing.
+After the import, during its ref update phase, fast-import tests each
 existing branch ref to verify the update will be a fast-forward
 update (the commit stored in the ref is contained in the new
 history of the commit to be written). If the update is not a
-fast-forward update, gfi will skip updating that ref and instead
-prints a warning message. gfi will always attempt to update all
+fast-forward update, fast-import will skip updating that ref and instead
+prints a warning message. fast-import will always attempt to update all
 branch refs, and does not stop on the first failure.

 Branch updates can be forced with \--force, but its recommended that
@@ -125,37 +145,46 @@ is not necessary for an initial import into an empty repository.

 Technical Discussion
 --------------------
-gfi tracks a set of branches in memory. Any branch can be created
+fast-import tracks a set of branches in memory. Any branch can be created
 or modified at any point during the import process by sending a
 `commit` command on the input stream. This design allows a frontend
 program to process an unlimited number of branches simultaneously,
 generating commits in the order they are available from the source
 data. It also simplifies the frontend programs considerably.

-gfi does not use or alter the current working directory, or any
+fast-import does not use or alter the current working directory, or any
 file within it. (It does however update the current Git repository,
 as referenced by `GIT_DIR`.) Therefore an import frontend may use
 the working directory for its own purposes, such as extracting file
 revisions from the foreign source. This ignorance of the working
-directory also allows gfi to run very quickly, as it does not
+directory also allows fast-import to run very quickly, as it does not
 need to perform any costly file update operations when switching
 between branches.
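
Reviewer note: the fast-forward test described under Parallel Operation
can be sketched in Python. This is only an illustration under stated
assumptions, not fast-import's actual implementation. The commit graph
is modelled as a plain dict mapping a commit id to its parent ids, and
the names `is_fast_forward` and `update_refs`, as well as the wording of
the warning, are hypothetical.

```python
def is_fast_forward(parents, old_tip, new_tip):
    """Return True if old_tip is contained in the history of new_tip.

    parents is an assumed in-memory graph: {commit_id: [parent_ids]}.
    """
    seen = set()
    stack = [new_tip]
    while stack:
        commit = stack.pop()
        if commit == old_tip:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def update_refs(parents, refs, new_tips, force=False):
    """Try every ref update; skip non-fast-forwards and keep going.

    refs maps ref names to their current commit and is updated in place.
    Returns the list of warnings (hypothetical wording).
    """
    warnings = []
    for name, new_tip in new_tips.items():
        old_tip = refs.get(name)
        if old_tip is None or force or is_fast_forward(parents, old_tip, new_tip):
            refs[name] = new_tip
        else:
            # Do not stop on the first failure; record it and continue.
            warnings.append(f"warning: not updating {name} (not a fast-forward)")
    return warnings


if __name__ == "__main__":
    parents = {"c3": ["c2"], "c2": ["c1"], "c1": [], "x1": []}
    refs = {"refs/heads/master": "c2", "refs/heads/topic": "c2"}
    new_tips = {"refs/heads/master": "c3", "refs/heads/topic": "x1"}
    for line in update_refs(parents, refs, new_tips):
        print(line)
    print(refs)
```

In the example, `refs/heads/master` moves from `c2` to `c3` because `c2`
is an ancestor of `c3`, while `refs/heads/topic` is left alone because
`x1` does not contain `c2`. Passing `force=True` models \--force and
applies both updates.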
 Input Format
 ------------
 With the exception of raw file data (which Git does not interpret)
-the gfi input format is text (ASCII) based. This text based
+the fast-import input format is text (ASCII) based. This text based
 format simplifies development and debugging of frontend programs,
 especially when a higher level language such as Perl, Python or
 Ruby is being used.

-gfi is very strict about its input. Where we say SP below we mean
+fast-import is very strict about its input. Where we say SP below we mean
 *exactly* one space. Likewise LF means one (and only one) linefeed.
 Supplying additional whitespace characters will cause unexpected
 results, such as branch names or file names with leading or trailing
-spaces in their name, or early termination of gfi when it encounters
+spaces in their name, or early termination of fast-import when it encounters
 unexpected input.

+Stream Comments
+~~~~~~~~~~~~~~~
+To aid in debugging frontends fast-import ignores any line that
+begins with `#` (ASCII pound/hash) up to and including the line
+ending `LF`. A comment line may contain any sequence of bytes
+that does not contain an LF and therefore may be used to include
+any detailed debugging information that might be specific to the
+frontend and useful when inspecting a fast-import data stream.
+
 Date Formats
 ~~~~~~~~~~~~
 The following date formats are supported. A frontend should select
@@ -164,7 +193,7 @@ in the \--date-format=<fmt> command line option.

 `raw`::
 	This is the Git native format and is `