Git Concepts
Internal
Object Store
Git places only four types of atomic objects in the object store: blobs, trees, commits and tags. To use disk space and network bandwidth efficiently, Git compresses and stores the objects in pack files, which are also placed in the object store. The Git object store is implemented as a content-addressable storage system: each object has a unique name produced by applying the SHA1 hash function to the content of the object (prefixed with a short header that gives the object's type and size). Git is a content tracking system: it tracks content, not file or directory names, which are associated with file content in secondary ways. The repository objects are stored in:
.git/objects/<first-two-digits-of-the-SHA1-value>/<the-rest-of-the-SHA1-value>
Git inserts a / after the first two digits to improve filesystem efficiency. Some filesystems slow down if you put too many files in the same directory; making the first byte of the SHA1 into a directory is an easy way to create a fixed, 256-way partitioning of the namespace for all possible objects with an even distribution.
The content of repository objects can be queried with git cat-file.
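The layout above can be explored with a few lines of Python. This is a minimal sketch, not a replacement for git cat-file: it assumes the object is stored loose (not inside a pack file) and is zlib-compressed with a "<type> <size>" header followed by a NUL byte. The function names object_path and read_loose_object are illustrative, not part of Git.

```python
import os
import tempfile
import zlib


def object_path(git_dir, object_id):
    """Return the loose-object path: the first two hex digits name the directory."""
    return os.path.join(git_dir, "objects", object_id[:2], object_id[2:])


def read_loose_object(git_dir, object_id):
    """Return (type, content) for a loose object, similar to what git cat-file reports."""
    with open(object_path(git_dir, object_id), "rb") as f:
        raw = zlib.decompress(f.read())
    # The stored form is b"<type> <size>\0<content>".
    header, _, content = raw.partition(b"\0")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), content


if __name__ == "__main__":
    # Build a throwaway object store that holds one hand-made loose blob.
    with tempfile.TemporaryDirectory() as git_dir:
        oid = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"
        path = object_path(git_dir, oid)
        os.makedirs(os.path.dirname(path))
        with open(path, "wb") as f:
            f.write(zlib.compress(b"blob 12\0hello world\n"))
        print(read_loose_object(git_dir, oid))
```

Against a real repository the call would look like read_loose_object(".git", "<object-id>"). It fails with a missing-file error for objects that exist only in pack files.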
SHA1, Hash Code, Object ID
Each object in the object store has a unique name produced by applying the SHA1 hash function to the content of the object. The terms SHA1, hash code and object ID are used interchangeably.
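As an illustration of how an object ID is derived, the sketch below computes the ID of a blob. It assumes the classic SHA1 object format, in which the bytes that get hashed are the header "blob <size>" plus a NUL byte, followed by the raw content. The helper name blob_id is hypothetical.

```python
import hashlib


def blob_id(data):
    """Return the hex object ID Git would assign to a blob with this content."""
    # The header carries the object type and the content length in bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Same content always yields the same ID, whatever the file is called.
    print(blob_id(b"hello world\n"))
```

The result can be compared with the output of git hash-object on a file that has the same content.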
Blob
Each version of a file is represented as a blob (binary large object), treated as opaque. A blob contains the file's data, but none of its metadata, not even the file name. Git's internal database stores every version of every file, not the differences between versions, as files are modified and go from one version to the next. Because Git uses the hash of a file's complete content as the name for that file, it must operate on each complete copy of the file. Because the object model records complete content rather than deltas, diffs and patches are derived data, not the fundamental data they are in CVS or Subversion.
Tree
A tree object represents a directory. It records blob identifiers, pathnames, and metadata for all files in the directory. It can also reference other tree objects, one for each subdirectory, recursively. Also see git write-tree. Trees are stored in the object store and can be listed with git cat-file.
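The following sketch shows how a tree could be serialized and named. It assumes the classic SHA1 tree format: each entry is "<mode> <name>", a NUL byte, and the 20-byte binary object ID, and the whole body is prefixed with a "tree <size>" header. The function tree_id and the tuple layout are illustrative. Sorting is simplified to plain byte order of the names, whereas Git compares directory names as if they ended in "/".

```python
import hashlib


def tree_id(entries):
    """Return the hex object ID of a tree.

    entries is a list of (mode, name, object_id_hex) tuples, for example
    ("100644", "README", "<blob id>") for a regular file or
    ("40000", "src", "<tree id>") for a subdirectory.
    """
    body = b""
    # Simplified ordering: sort entries by the raw bytes of the name.
    for mode, name, object_id in sorted(entries, key=lambda e: e[1].encode()):
        body += mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(object_id)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    # The ID of the empty blob, used here as the content of two files.
    empty_blob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
    print(tree_id([("100644", "a.txt", empty_blob), ("100644", "b.txt", empty_blob)]))
```

Because the file name lives in the tree entry and not in the blob, renaming a file changes the tree ID but leaves the blob ID untouched.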
Commit
Tag
Names in Git
Reference
References available in a remote repository can be listed with git ls-remote.
Ref
Refspec
URL
Local Repository
The repository on the local filesystem that the user is currently working with is called the local or current repository.
Remote Repository
A repository maintained on a remote host, with which the local repository exchanges content, is called a remote repository. The references available in a remote repository can be listed with git ls-remote.
The local repository tracks a number of branches from any number of remote repositories, via remote-tracking branches.
Remote
A remote is a named entity, defined in .git/config, that represents a reference to a remote repository. The remote can be seen as a short name for a long URL and other configuration information.
[remote "origin"]
    url = git@github.com:NovaOrdis/events-api.git
    fetch = +refs/heads/*:refs/remotes/origin/*
'url' is the URL of the remote repository. 'fetch' is a refspec that specifies how refs in the remote repository (which usually represent branches) are mapped into the namespace of the local repository. The leading + allows the local ref to be updated even when the update is not a fast-forward. The content of these branches is transferred when git fetch is executed. Instead of using *, which matches all branches, individual branches can be listed on their own 'fetch' lines:
...
    fetch = +refs/heads/dev:refs/remotes/origin/dev
    fetch = +refs/heads/stable:refs/remotes/origin/stable
...
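To make the mapping concrete, here is a small sketch of how a fetch refspec translates a ref name from the remote namespace into the local one. It is a simplified model that handles an optional leading + and at most one * wildcard on each side. The function map_ref is hypothetical and does not reproduce every rule Git applies to refspecs.

```python
def map_ref(refspec, remote_ref):
    """Return the local ref that remote_ref maps to, or None if the refspec does not match."""
    # The leading "+" only affects non-fast-forward updates, not the name mapping.
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, _, dst = spec.partition(":")
    if "*" not in src:
        # Without a wildcard the source must match exactly.
        return dst if remote_ref == src else None
    prefix, _, suffix = src.partition("*")
    fits = (
        remote_ref.startswith(prefix)
        and remote_ref.endswith(suffix)
        and len(remote_ref) >= len(prefix) + len(suffix)
    )
    if not fits:
        return None
    # Whatever the wildcard matched is substituted into the destination.
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return dst.replace("*", matched, 1)


if __name__ == "__main__":
    print(map_ref("+refs/heads/*:refs/remotes/origin/*", "refs/heads/dev"))
```

With the wildcard refspec every remote branch gets a counterpart under refs/remotes/origin/. With the per-branch refspecs only the listed branches are mapped, and map_ref returns None for the others.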
The remote definition maintained in .git/config can be manipulated with git config. The remote is used in assembling the full name for tracking branches, also declared in .git/config. Remotes can be listed, created, removed and manipulated with git remote.
"origin"
"origin" is the default name of the remote that refers to the repository the current repository was cloned from. A different name can be chosen with the --origin option of git clone.
Upstream Repository
Remote-Tracking Branch
A remote-tracking branch is updated with git fetch.
It is stored in the local repository, but it "tracks" a branch in a remote repository.
Each Local Branch Has at Most One Configured Remote-Tracking Branch
Each local branch has at most one configured remote-tracking branch:
[branch "master"]
    remote = origin
    merge = refs/heads/master
Changes from a different, non-default remote-tracking branch can also be merged into a local branch, but that branch has to be specified explicitly on the command line.
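The relationship between the branch configuration and the remote-tracking branch can be sketched as follows. The example assumes the remote uses the default fetch refspec (+refs/heads/*:refs/remotes/<remote>/*). The function upstream_tracking_ref and the dictionary layout are illustrative only.

```python
def upstream_tracking_ref(branch_config):
    """Return the remote-tracking ref for a branch, given its 'remote' and 'merge' settings.

    Assumes the remote's default fetch refspec, which maps refs/heads/<name>
    to refs/remotes/<remote>/<name>.
    """
    remote = branch_config["remote"]
    merge = branch_config["merge"]
    prefix = "refs/heads/"
    if not merge.startswith(prefix):
        raise ValueError("expected a ref under refs/heads/, got " + merge)
    return "refs/remotes/" + remote + "/" + merge[len(prefix):]


if __name__ == "__main__":
    master = {"remote": "origin", "merge": "refs/heads/master"}
    print(upstream_tracking_ref(master))
```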