Git Concepts

From NovaOrdis Knowledge Base
Revision as of 03:21, 18 December 2017 by Ovidiu (talk | contribs) (→‎Commit)
Jump to navigation Jump to search

Internal

Object Store

Git places only four types of atomic objects in the object store: blobs, trees, commits and tags. To use disk space and network bandwidth efficiently, Git compresses and stores the objects in pack files, which are also placed in the object store. The Git object store is implemented as a content-addressable storage system: each object has an unique name produced by applying a SHA1 function to the content of the object. Git is a content tracking system - it tracks content, and not file or directory names, which are associated with file content in secondary ways. The repository objects are stored in:

.git/objects/<first-two-digits-of-the-SHA1-value>/<the-rest-of-the-SHA1-value>

Git inserts a / after the first two digits to improve filesystem efficiency. Some filesystems slow down if you put too many files in the same directory; making the first byte of the SHA1 into a directory is an easy way to create a fixed, 256-way partitioning of the namespace for all possible objects with an even distribution.


The content of repository objects can be queried with git cat-file.

SHA1, Hash Code, Object ID

Each object in the Object Repository has an unique name produced by applying a SHA1 function to the content of the object. SHA1, hash code and object ID are used interchangeably.

Blob

Each version of a file is represented as a blob (binary large object), treated as opaque. A blob contains the file's data, but none of its metadata, not even the file name. Git internal database stores every version of every file - not their differences - as files are modified and go from one version to the next. Because Git uses the hash of a file's complete content as the name for that file, it must operate on each complete copy of the file. Because Git does not maintain deltas, diffs and patches are derived data, not the fundamental data they are in CVS or Subversion.

Tree

A tree object represents a directory. It records blob identifiers, pathnames, and metadata for all files in the directory. It also contains, recursively, other sub-tree objects. Tree objects are created from the index with git write-tree. The trees are stored in the object store and can be listed with git cat-file.

Commit

A commit object encapsulates an atomic changeset of the workspace. Internally, the commit object contains 1) metadata (author of the change, the committer, commit date and time, log message) 2) points to tree object that captures, in one complete snapshot, the state of the repository at the time the commit was performed and 3) the previous (parent) commit. By default, the author and the committer are the same, there are just a few situations when they're different. The commit objects are created internally with git commit-tree and the user-level commit mechanics is encapsulated in git commit.

The initial commit (the root commit) has no parent. The rest of the commits in a repository are derived from at least one earlier commit, where the direct ancestors are called parent commits. Most commits have one parent. When we commit after a merge, that commit has more than one parent. Each commit points back to its parent(s). The commit object are stored in a graph structure, different from the structure used by the tree objects. When you make a new commit, you can give it one or more parent commits. The HEAD commit an implicit reference pointing to the most recent commit on a branch. For more about references, see references. For more about naming, see Names in Git. The primary command to show the history of commits is git log. Other commands useful for locating commits are git bisect and git blame.

DAG and Reachable Commits

The commits form logical Directed Acyclic Graphs (DAGs) and it makes sense to talk about relative positions among commits. Git has a special notation for that: relative commit names. In a Git commit graph, the set of reachable commits are those you can reach from a given commit by traversing the directed parent links. Conceptually, the set of reachable commits is the set of ancestor commits that flow into and contribute to a given starting commit.

Tag

Names in Git

Reference

References available in a remote repository can be listed with git ls-remote.

Ref

Refspec

URL

Relative Commit Names

Local Repository

The repository maintained on a local filesystem that is currently interacted with is called the local or current repository.

Remote Repository

A repository maintained on a remote host, but with which files are exchanged, is called a remote repository. The references available in a remote repository can be listed with git ls-remote.

The local repository tracks a number of branches from any number of remote repositories, via remote-tracking branches.

Remote

A remote is named entity whose definition is maintained in .git/config that represents a reference to a remote repository. The remote can be seen as a short name for a long URL and other configuration information.

[remote "origin"]
       url = git@github.com:NovaOrdis/events-api.git
       fetch = +refs/heads/*:refs/remotes/origin/*

The 'url' is the URL of the remote repository. 'fetch' is a refspec that specifies how a local ref (which usually represents a branch) is mapped from the namespace of the source repository into the namespace of the local repository. The content of these branches will be transferred when git fetch is executed. Instead of specifying * that signifies all branches, individual branches can be listed on their own 'fetch' lines:

...
fetch = +refs/heads/dev:refs/remotes/origin/dev
fetch = +refs/heads/stable:refs/remotes/origin/stable
...

The remote definition maintained in .git/config can be manipulated with git config. The remote is used in assembling the full name for tracking branches, also declared in .git/config. Remotes can be listed, created, removed and manipulated with git remote.

"origin"

The "origin" is a special remote that refers to the repository the current repository was cloned from. The name "origin" is just a default value and it can be changed if so desired with the --origin option of the git clone operation.

Upstream Repository

Remote-Tracking Branch

Also known as remote-tracking branch. Manipulated with git fetch.

It is still a local branch, but "tracks" a remote branch from another repository.

Each Local Branch Has at Most One Configured Remote-Tracking Branch

Each local branch has at most one configured remote-tracking branch:

[branch "master"]
    remote = origin
    merge = refs/heads/master

A local branch can be merged into from a different, non-default remote-tracking branch, but that has to be specified explicitly in the command line.

Branch

Upstream Branch