Merge branch 'ph/maint-quiltimport' into maint

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt

index 5eacab08dc2141360209534bc38a5fc8b7a05b75..bd625ababfc950318714b3271b02da938fbf609b 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -24,7 +24,7 @@ updated branch and tag refs, fully updating the current repository
  with the newly imported data.
  
  The fast-import backend itself can import into an empty repository (one that
-has already been initialized by gitlink:git-init[1]) or incrementally
+has already been initialized by linkgit:git-init[1]) or incrementally
  update an existing populated repository.  Whether or not incremental
  imports are supported from a particular foreign source depends on
  the frontend program in use.
@@ -82,7 +82,7 @@ OPTIONS
         This information may be useful after importing projects
         whose total object set exceeds the 4 GiB packfile limit,
         as these commits can be used as edge points during calls
-       to gitlink:git-pack-objects[1].
+       to linkgit:git-pack-objects[1].
  
  --quiet::
         Disable all non-fatal output, making fast-import silent when it
@@ -176,6 +176,15 @@ results, such as branch names or file names with leading or trailing
  spaces in their name, or early termination of fast-import when it encounters
  unexpected input.
  
+Stream Comments
+~~~~~~~~~~~~~~~
+To aid in debugging frontends fast-import ignores any line that
+begins with `#` (ASCII pound/hash) up to and including the line
+ending `LF`.  A comment line may contain any sequence of bytes
+that does not contain an LF and therefore may be used to include
+any detailed debugging information that might be specific to the
+frontend and useful when inspecting a fast-import data stream.
+
  Date Formats
  ~~~~~~~~~~~~
  The following date formats are supported.  A frontend should select
@@ -211,7 +220,7 @@ variation in formatting will cause fast-import to reject the value.
  +
  An example value is ``Tue Feb 6 11:22:18 2007 -0500''.  The Git
  parser is accurate, but a little on the lenient side.  It is the
-same parser used by gitlink:git-am[1] when applying patches
+same parser used by linkgit:git-am[1] when applying patches
  received from email.
  +
  Some malformed strings may be accepted as valid dates.  In some of
@@ -232,7 +241,7 @@ been well tested in the wild.
  +
  Frontends should prefer the `raw` format if the source material
  already uses UNIX-epoch format, can be coaxed to give dates in that
-format, or its format is easiliy convertible to it, as there is no
+format, or its format is easily convertible to it, as there is no
  ambiguity in parsing.
  
  `now`::
@@ -247,7 +256,7 @@ timezone.
  This particular format is supplied as its short to implement and
  may be useful to a process that wants to create a new commit
  right now, without needing to use a working directory or
-gitlink:git-update-index[1].
+linkgit:git-update-index[1].
  +
  If separate `author` and `committer` commands are used in a `commit`
  the timestamps may not match, as the system clock will be polled
@@ -289,6 +298,11 @@ and control the current import process.  More detailed discussion
         This command is optional and is not needed to perform
         an import.
  
+`progress`::
+       Causes fast-import to echo the entire line to its own
+       standard output.  This command is optional and is not needed
+       to perform an import.
+
  `commit`
  ~~~~~~~~
  Create or update a branch with a new commit, recording one logical
@@ -302,8 +316,8 @@ change to the project.
         data
         ('from' SP <committish> LF)?
         ('merge' SP <committish> LF)?
-       (filemodify | filedelete | filedeleteall)*
-       LF
+       (filemodify | filedelete | filecopy | filerename | filedeleteall)*
+       LF?
  ....
  
  where `<ref>` is the name of the branch to make the commit on.
@@ -325,13 +339,17 @@ commit message use a 0 length data.  Commit messages are free-form
  and are not interpreted by Git.  Currently they must be encoded in
  UTF-8, as fast-import does not permit other encodings to be specified.
  
-Zero or more `filemodify`, `filedelete` and `filedeleteall` commands
+Zero or more `filemodify`, `filedelete`, `filecopy`, `filerename`
+and `filedeleteall` commands
  may be included to update the contents of the branch prior to
  creating the commit.  These commands may be supplied in any order.
-However it is recommended that a `filedeleteall` command preceed
-all `filemodify` commands in the same commit, as `filedeleteall`
+However it is recommended that a `filedeleteall` command precede
+all `filemodify`, `filecopy` and `filerename` commands in the same
+commit, as `filedeleteall`
  wipes the branch clean (see below).
  
+The `LF` after the command is optional (it used to be required).
+
  `author`
  ^^^^^^^^
  An `author` command may optionally appear, if the author information
@@ -384,7 +402,7 @@ Here `<committish>` is any of the following:
  +
  The reason fast-import uses `:` to denote a mark reference is this character
  is not legal in a Git branch name.  The leading `:` makes it easy
-to distingush between the mark 42 (`:42`) and the branch 42 (`42`
+to distinguish between the mark 42 (`:42`) and the branch 42 (`42`
  or `refs/heads/42`), or an abbreviated SHA-1 which happened to
  consist only of base-10 digits.
  +
@@ -393,7 +411,7 @@ Marks must be declared (via `mark`) before they can be used.
  * A complete 40 byte or abbreviated commit SHA-1 in hex.
  
  * Any valid Git SHA-1 expression that resolves to a commit.  See
-  ``SPECIFYING REVISIONS'' in gitlink:git-rev-parse[1] for details.
+  ``SPECIFYING REVISIONS'' in linkgit:git-rev-parse[1] for details.
  
  The special case of restarting an incremental import from the
  current branch value should be written as:
@@ -469,7 +487,7 @@ start with double quote (`"`).
  If an `LF` or double quote must be encoded into `<path>` shell-style
  quoting should be used, e.g. `"path/with\n and \" in it"`.
  
-The value of `<path>` must be in canoncial form. That is it must not:
+The value of `<path>` must be in canonical form. That is it must not:
  
  * contain an empty directory component (e.g. `foo//bar` is invalid),
  * end with a directory separator (e.g. `foo/` is invalid),
@@ -481,8 +499,9 @@ It is recommended that `<path>` always be encoded using UTF-8.
  
  `filedelete`
  ^^^^^^^^^^^^
-Included in a `commit` command to remove a file from the branch.
-If the file removal makes its directory empty, the directory will
+Included in a `commit` command to remove a file or recursively
+delete an entire directory from the branch.  If the file or directory
+removal makes its parent directory empty, the parent directory will
  be automatically removed too.  This cascades up the tree until the
  first non-empty directory or the root is reached.
  
@@ -490,9 +509,60 @@ first non-empty directory or the root is reached.
         'D' SP <path> LF
  ....
  
-here `<path>` is the complete path of the file to be removed.
+here `<path>` is the complete path of the file or subdirectory to
+be removed from the branch.
  See `filemodify` above for a detailed description of `<path>`.
  
+`filecopy`
+^^^^^^^^^^^^
+Recursively copies an existing file or subdirectory to a different
+location within the branch.  The existing file or directory must
+exist.  If the destination exists it will be completely replaced
+by the content copied from the source.
+
+....
+       'C' SP <path> SP <path> LF
+....
+
+here the first `<path>` is the source location and the second
+`<path>` is the destination.  See `filemodify` above for a detailed
+description of what `<path>` may look like.  To use a source path
+that contains SP the path must be quoted.
+
+A `filecopy` command takes effect immediately.  Once the source
+location has been copied to the destination any future commands
+applied to the source location will not impact the destination of
+the copy.
+
+`filerename`
+^^^^^^^^^^^^
+Renames an existing file or subdirectory to a different location
+within the branch.  The existing file or directory must exist. If
+the destination exists it will be replaced by the source directory.
+
+....
+       'R' SP <path> SP <path> LF
+....
+
+here the first `<path>` is the source location and the second
+`<path>` is the destination.  See `filemodify` above for a detailed
+description of what `<path>` may look like.  To use a source path
+that contains SP the path must be quoted.
+
+A `filerename` command takes effect immediately.  Once the source
+location has been renamed to the destination any future commands
+applied to the source location will create new files there and not
+impact the destination of the rename.
+
+Note that a `filerename` is the same as a `filecopy` followed by a
+`filedelete` of the source location.  There is a slight performance
+advantage to using `filerename`, but the advantage is so small
+that it is never worth trying to convert a delete/add pair in
+source material into a rename for fast-import.  This `filerename`
+command is provided just to simplify frontends that already have
+rename information and don't want to bother with decomposing it into a
+`filecopy` followed by a `filedelete`.
+
  `filedeleteall`
  ^^^^^^^^^^^^^^^
  Included in a `commit` command to remove all files (and also all
@@ -579,7 +649,7 @@ recommended, as the frontend does not (easily) have access to the
  complete set of bytes which normally goes into such a signature.
  If signing is required, create lightweight tags from within fast-import with
  `reset`, then create the annotated versions of those tags offline
-with the standard gitlink:git-tag[1] process.
+with the standard linkgit:git-tag[1] process.
  
  `reset`
  ~~~~~~~
@@ -591,12 +661,14 @@ branch from an existing commit without creating a new commit.
  ....
         'reset' SP <ref> LF
         ('from' SP <committish> LF)?
-       LF
+       LF?
  ....
  
  For a detailed description of `<ref>` and `<committish>` see above
  under `commit` and `from`.
  
+The `LF` after the command is optional (it used to be required).
+
  The `reset` command can also be used to create lightweight
  (non-annotated) tags.  For example:
  
@@ -635,29 +707,40 @@ intended for production-quality conversions should always use the
  exact byte count format, as it is more robust and performs better.
  The delimited format is intended primarily for testing fast-import.
  
+Comment lines appearing within the `<raw>` part of `data` commands
+are always taken to be part of the body of the data and are therefore
+never ignored by fast-import.  This makes it safe to import any
+file/message content whose lines might start with `#`.
+
  Exact byte count format::
         The frontend must specify the number of bytes of data.
  +
  ....
         'data' SP <count> LF
-       <raw> LF
+       <raw> LF?
  ....
  +
  where `<count>` is the exact number of bytes appearing within
  `<raw>`.  The value of `<count>` is expressed as an ASCII decimal
  integer.  The `LF` on either side of `<raw>` is not
  included in `<count>` and will not be included in the imported data.
++
+The `LF` after `<raw>` is optional (it used to be required) but
+recommended.  Always including it makes debugging a fast-import
+stream easier as the next command always starts in column 0
+of the next line, even if `<raw>` did not end with an `LF`.
  
  Delimited format::
         A delimiter string is used to mark the end of the data.
         fast-import will compute the length by searching for the delimiter.
-       This format is primarly useful for testing and is not
+       This format is primarily useful for testing and is not
         recommended for real data.
  +
  ....
         'data' SP '<<' <delim> LF
         <raw> LF
         <delim> LF
+       LF?
  ....
  +
  where `<delim>` is the chosen delimiter string.  The string `<delim>`
@@ -666,6 +749,8 @@ fast-import will think the data ends earlier than it really does.  The `LF`
  immediately trailing `<raw>` is part of `<raw>`.  This is one of
  the limitations of the delimited format, it is impossible to supply
  a data chunk which does not have an LF as its last byte.
++
+The `LF` after `<delim> LF` is optional (it used to be required).
  
  `checkpoint`
  ~~~~~~~~~~~~
@@ -674,7 +759,7 @@ save out all current branch refs, tags and marks.
  
  ....
         'checkpoint' LF
-       LF
+       LF?
  ....
  
  Note that fast-import automatically switches packfiles when the current
@@ -693,6 +778,32 @@ process access to a branch.  However given that a 30 GiB Subversion
  repository can be loaded into Git through fast-import in about 3 hours,
  explicit checkpointing may not be necessary.
  
+The `LF` after the command is optional (it used to be required).
+
+`progress`
+~~~~~~~~~~
+Causes fast-import to print the entire `progress` line unmodified to
+its standard output channel (file descriptor 1) when the command is
+processed from the input stream.  The command otherwise has no impact
+on the current import, or on any of fast-import's internal state.
+
+....
+       'progress' SP <any> LF
+       LF?
+....
+
+The `<any>` part of the command may contain any sequence of bytes
+that does not contain `LF`.  The `LF` after the command is optional.
+Callers may wish to process the output through a tool such as sed to
+remove the leading part of the line, for example:
+
+====
+       frontend | git-fast-import | sed 's/^progress //'
+====
+
+Placing a `progress` command immediately after a `checkpoint` will
+inform the reader when the `checkpoint` has been completed and it
+can safely access the refs that fast-import updated.
  
  Tips and Tricks
  ---------------
@@ -752,7 +863,7 @@ is not `refs/heads/TAG_FIXUP`).
  
  When committing fixups, consider using `merge` to connect the
  commit(s) which are supplying file revisions to the fixup branch.
-Doing so will allow tools such as gitlink:git-blame[1] to track
+Doing so will allow tools such as linkgit:git-blame[1] to track
  through the real commit history and properly annotate the source
  files.
  
@@ -762,7 +873,7 @@ to remove the dummy branch.
  Import Now, Repack Later
  ~~~~~~~~~~~~~~~~~~~~~~~~
  As soon as fast-import completes the Git repository is completely valid
-and ready for use.  Typicallly this takes only a very short time,
+and ready for use.  Typically this takes only a very short time,
  even for considerably large projects (100,000+ commits).
  
  However repacking the repository is necessary to improve data
@@ -781,11 +892,20 @@ Repacking Historical Data
  ~~~~~~~~~~~~~~~~~~~~~~~~~
  If you are repacking very old imported data (e.g. older than the
  last year), consider expending some extra CPU time and supplying
-\--window=50 (or higher) when you run gitlink:git-repack[1].
+\--window=50 (or higher) when you run linkgit:git-repack[1].
  This will take longer, but will also produce a smaller packfile.
  You only need to expend the effort once, and everyone using your
  project will benefit from the smaller repository.
  
+Include Some Progress Messages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Every once in a while have your frontend emit a `progress` message
+to fast-import.  The contents of the messages are entirely free-form,
+so one suggestion would be to output the current month and year
+each time the current commit date moves into the next month.
+Your users will feel better knowing how much of the data stream
+has been processed.
+
  
  Packfile Optimization
  ---------------------
@@ -822,8 +942,8 @@ Memory Utilization
  ------------------
  There are a number of factors which affect how much memory fast-import
  requires to perform an import.  Like critical sections of core
-Git, fast-import uses its own memory allocators to ammortize any overheads
-associated with malloc.  In practice fast-import tends to ammoritize any
+Git, fast-import uses its own memory allocators to amortize any overheads
+associated with malloc.  In practice fast-import tends to amortize any
  malloc overheads to 0, due to its use of large block allocations.
  
  per object
@@ -880,7 +1000,7 @@ per active tree
  ~~~~~~~~~~~~~~~
  Trees (aka directories) use just 12 bytes of memory on top of the
  memory required for their entries (see ``per active file'' below).
-The cost of a tree is virtually 0, as its overhead ammortizes out
+The cost of a tree is virtually 0, as its overhead amortizes out
  over the individual file entries.
  
  per active file entry
@@ -907,4 +1027,4 @@ Documentation by Shawn O. Pearce <spearce@spearce.org>.
  
  GIT
  ---
-Part of the gitlink:git[7] suite
+Part of the linkgit:git[7] suite
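
Illustration (not part of the patch): the stream features documented by this diff can be exercised with a small frontend. What follows is a minimal sketch, assuming Python 3. The helper names (`data_block`, `quote_path`, `comment`, `progress`, `filecopy`, `filerename`, `build_stream`) are hypothetical and not part of Git, and the committer identity, timestamp, branch and paths are made up. It emits a comment line, one commit using the exact byte count `data` format with the recommended trailing `LF`, a `filecopy`, a `filerename`, and a `checkpoint` followed by a `progress` line.

```python
import sys


def data_block(payload: bytes) -> bytes:
    """Exact byte count format: 'data' SP <count> LF <raw> LF?"""
    # <count> covers only <raw>; the trailing LF is optional but recommended,
    # so the next command always starts at the beginning of a line.
    return b"data %d\n" % len(payload) + payload + b"\n"


def quote_path(path: str) -> bytes:
    """Quote a path only when it contains SP, LF or a double quote."""
    raw = path.encode("utf-8")
    if not any(ch in path for ch in ' "\n'):
        return raw
    escaped = (raw.replace(b"\\", b"\\\\")
                  .replace(b'"', b'\\"')
                  .replace(b"\n", b"\\n"))
    return b'"' + escaped + b'"'


def comment(text: str) -> bytes:
    # Lines starting with '#' are ignored by fast-import; text must not hold an LF.
    return b"# " + text.encode("utf-8") + b"\n"


def progress(text: str) -> bytes:
    # fast-import echoes the whole line, including the word 'progress', to stdout.
    return b"progress " + text.encode("utf-8") + b"\n"


def filecopy(src: str, dst: str) -> bytes:
    return b"C " + quote_path(src) + b" " + quote_path(dst) + b"\n"


def filerename(src: str, dst: str) -> bytes:
    return b"R " + quote_path(src) + b" " + quote_path(dst) + b"\n"


def build_stream() -> bytes:
    parts = [
        comment("emitted by a hypothetical frontend"),
        b"commit refs/heads/master\n",
        b"committer A U Thor <author@example.com> 1170778938 -0500\n",
        data_block(b"Import initial files\n"),
        b"M 100644 inline " + quote_path("docs/read me.txt") + b"\n",
        # Inside <raw>, a leading '#' is file content, never a stream comment.
        data_block(b"# this line is file content\n"),
        filecopy("docs/read me.txt", "docs/copy.txt"),
        filerename("docs/copy.txt", "docs/renamed.txt"),
        b"\n",  # optional LF terminating the commit
        b"checkpoint\n",
        progress("checkpoint done"),
    ]
    return b"".join(parts)


if __name__ == "__main__":
    sys.stdout.buffer.write(build_stream())
```

Saved under an arbitrary name such as `frontend.py`, the output could be piped into fast-import from inside an initialized repository, for example `python3 frontend.py | git fast-import`. Every path is passed through `quote_path` because the `C` and `R` commands separate source and destination with a single SP, so a source path containing SP must be quoted.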