Git Concepts

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

Internal

Object Store

Git places only four types of atomic objects in the object store: blobs, trees, commits and tags. To use disk space and network bandwidth efficiently, Git compresses and stores the objects in pack files, which are also placed in the object store. The Git object store is implemented as a content-addressable storage system: each object has an unique name produced by applying a SHA1 function to the content of the object. Git is a content tracking system - it tracks content, and not file or directory names, which are associated with file content in secondary ways. The repository objects are stored in:

.git/objects/<first-two-digits-of-the-SHA1-value>/<the-rest-of-the-SHA1-value>

Git inserts a / after the first two digits to improve filesystem efficiency. Some filesystems slow down if you put too many files in the same directory; making the first byte of the SHA1 into a directory is an easy way to create a fixed, 256-way partitioning of the namespace for all possible objects with an even distribution.


The content of repository objects can be queried with git cat-file.

SHA1, Hash Code, Object ID

Each object in the Object Repository has an unique name produced by applying a SHA1 function to the content of the object. SHA1, hash code and object ID are used interchangeably.

Blob

Each version of a file is represented as a blob (binary large object), treated as opaque. A blob contains the file's data, but none of its metadata, not even the file name. Git internal database stores every version of every file - not their differences - as files are modified and go from one version to the next. Because Git uses the hash of a file's complete content as the name for that file, it must operate on each complete copy of the file. Because Git does not maintain deltas, diffs and patches are derived data, not the fundamental data they are in CVS or Subversion.

Tree

A tree object represents a directory. It records blob identifiers, pathnames, and metadata for all files in the directory. It also contains, recursively, other sub-tree objects. Tree objects are created from the index with git write-tree. The trees are stored in the object store and can be listed with git cat-file.

Commit

A commit object encapsulates an atomic changeset of the workspace. Internally, the commit object contains 1) metadata (author of the change, the committer, commit date and time, log message) 2) points to tree object that captures, in one complete snapshot, the state of the repository at the time the commit was performed and 3) the previous (parent) commit. By default, the author and the committer are the same, there are just a few situations when they're different. The commit objects are created internally with git commit-tree and the user-level commit mechanics is encapsulated in git commit.

The initial commit (the root commit) has no parent. The rest of the commits in a repository are derived from at least one earlier commit, where the direct ancestors are called parent commits. Most commits have one parent. When we commit after a merge, that commit has more than one parent. Each commit points back to its parent(s). The commit object are stored in a graph structure, different from the structure used by the tree objects. When you make a new commit, you can give it one or more parent commits. The HEAD commit an implicit reference pointing to the most recent commit on a branch. For more about references, see references. For more about naming, see Names in Git. The primary command to show the history of commits is git log. Other commands useful for locating commits are git bisect and git blame.

DAG and Reachable Commits

The commits form logical Directed Acyclic Graphs (DAGs) and it makes sense to talk about relative positions among commits. Git has a special notation for that: relative commit names. In a Git commit graph, the set of reachable commits are those you can reach from a given commit by traversing the directed parent links. Conceptually, the set of reachable commits is the set of ancestor commits that flow into and contribute to a given starting commit.

Tag

A tag assigns an arbitrary yet presumably human readable name to a specific object, usually a commit. Each tag can point to at most one commit. The tag object contains the log message, author information and the commit it points to. The commit object points to a tree object, which encompasses the total state of the entire hierarchy of files and directories in the repository. A tag is a static name that does not change over time. You can give a tag and a branch the same name. There are two types of tags: lightweight and annotated. Only the annotated tags create objects in the repository.

Tags are created, listed and deleted with git tag. A specific tag can be checked out in a "detached head" with git checkout.

Names in Git

Reference

References available in a remote repository can be listed with git ls-remote.

Ref

Refspec

URL

Relative Commit Names

Local Repository

The repository maintained on a local filesystem that is currently interacted with is called the local or current repository.

Remote Repository

A repository maintained on a remote host, but with which files are exchanged, is called a remote repository. The references available in a remote repository can be listed with git ls-remote.

The local repository tracks a number of branches from any number of remote repositories, via remote-tracking branches.

Remote

A remote is named entity whose definition is maintained in .git/config that represents a reference to a remote repository. The remote can be seen as a short name for a long URL and other configuration information.

[remote "origin"]
       url = git@github.com:NovaOrdis/events-api.git
       fetch = +refs/heads/*:refs/remotes/origin/*

The 'url' is the URL of the remote repository. 'fetch' is a refspec that specifies how a local ref (which usually represents a branch) is mapped from the namespace of the source repository into the namespace of the local repository. The content of these branches will be transferred when git fetch is executed. Instead of specifying * that signifies all branches, individual branches can be listed on their own 'fetch' lines:

...
fetch = +refs/heads/dev:refs/remotes/origin/dev
fetch = +refs/heads/stable:refs/remotes/origin/stable
...

The remote definition maintained in .git/config can be manipulated with git config. The remote is used in assembling the full name for tracking branches, also declared in .git/config. Remotes can be listed, created, removed and manipulated with git remote.

"origin"

The "origin" is a special remote that refers to the repository the current repository was cloned from. The name "origin" is just a default value and it can be changed if so desired with the --origin option of the git clone operation.

Upstream Repository

Remote-Tracking Branch

Also known as remote-tracking branch. Manipulated with git fetch.

It is still a local branch, but "tracks" a remote branch from another repository.

Each Local Branch Has at Most One Configured Remote-Tracking Branch

Each local branch has at most one configured remote-tracking branch:

[branch "master"]
    remote = origin
    merge = refs/heads/master

A local branch can be merged into from a different, non-default remote-tracking branch, but that has to be specified explicitly in the command line.

Branch

Upstream Branch

Detached HEAD Branch

The default operation when working with branches is to check out the HEAD of the branch, by naming the branch in git checkout <branch-name>. However, it is possible to check out an arbitrary commit that is not the HEAD - in this case we end up with a detached HEAD branch:

git checkout <commit-id|tag-id>

Situations:

  • check out a commit that is not the HEAD of the branch.
  • check out a tracking branch - to explore changes recently brought into your repository from the remote repository. More about tracking branches here [Git Concepts#TrackingBranch].
  • check out the commit referenced by a tag.
  • start a [git bisect] operation.
  • use the Template:Git submodule update command.

Example:

{{{ $ git checkout d74deb597957558ed53d045bb8fdbb1adb98f64c Note: checking out 'd74deb597957558ed53d045bb8fdbb1adb98f64c'.

You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may do so (now or later) by using -b with the checkout command again. Example:

 git checkout -b new_branch_name

HEAD is now at d74deb5... Added hello2.txt on branch EXPERIMENTAL }}}

In this situation, git creates an anonymous branch called a detached HEAD.


To convert a detached HEAD into a branch that can be modified from that point forward (turning an anonymous branch into a named branch):

{{{ git checkout -b new_branch }}}

... or drop the detached HEAD branch:

{{{ git checkout <any-other-branch> }}}