Git Concepts

From NovaOrdis Knowledge Base
Revision as of 02:57, 18 December 2017 by Ovidiu (talk | contribs) (→‎Object Store)
Jump to navigation Jump to search

Internal

Object Store

Git places only four types of atomic objects in the object store: blobs, trees, commits and tags. To use disk space and network bandwidth efficiently, Git compresses and stores the objects in pack files, which are also placed in the object store. The Git object store is implemented as a content-addressable storage system: each object has an unique name produced by applying a SHA1 function to the content of the object. Git is a content tracking system - it tracks content, and not file or directory names, which are associated with file content in secondary ways. The repository objects are stored in:

.git/objects/<first-two-digits-of-the-SHA1-value>/<the-rest-of-the-SHA1-value>

Git inserts a / after the first two digits to improve filesystem efficiency. Some filesystems slow down if you put too many files in the same directory; making the first byte of the SHA1 into a directory is an easy way to create a fixed, 256-way partitioning of the namespace for all possible objects with an even distribution.


The content of repository objects can be queried with git cat-file.

SHA1, Hash Code, Object ID

Each object in the Object Repository has an unique name produced by applying a SHA1 function to the content of the object. SHA1, hash code and object ID are used interchangeably.

Blob

Tree

Commit

Tag

Names in Git

Reference

References available in a remote repository can be listed with git ls-remote.

Ref

Refspec

URL

Local Repository

The repository maintained on a local filesystem that is currently interacted with is called the local or current repository.

Remote Repository

A repository maintained on a remote host, but with which files are exchanged, is called a remote repository. The references available in a remote repository can be listed with git ls-remote.

The local repository tracks a number of branches from any number of remote repositories, via remote-tracking branches.

Remote

A remote is named entity whose definition is maintained in .git/config that represents a reference to a remote repository. The remote can be seen as a short name for a long URL and other configuration information.

[remote "origin"]
       url = git@github.com:NovaOrdis/events-api.git
       fetch = +refs/heads/*:refs/remotes/origin/*

The 'url' is the URL of the remote repository. 'fetch' is a refspec that specifies how a local ref (which usually represents a branch) is mapped from the namespace of the source repository into the namespace of the local repository. The content of these branches will be transferred when git fetch is executed. Instead of specifying * that signifies all branches, individual branches can be listed on their own 'fetch' lines:

...
fetch = +refs/heads/dev:refs/remotes/origin/dev
fetch = +refs/heads/stable:refs/remotes/origin/stable
...

The remote definition maintained in .git/config can be manipulated with git config. The remote is used in assembling the full name for tracking branches, also declared in .git/config. Remotes can be listed, created, removed and manipulated with git remote.

"origin"

The "origin" is a special remote that refers to the repository the current repository was cloned from. The name "origin" is just a default value and it can be changed if so desired with the --origin option of the git clone operation.

Upstream Repository

Remote-Tracking Branch

Also known as remote-tracking branch. Manipulated with git fetch.

It is still a local branch, but "tracks" a remote branch from another repository.

Each Local Branch Has at Most One Configured Remote-Tracking Branch

Each local branch has at most one configured remote-tracking branch:

[branch "master"]
    remote = origin
    merge = refs/heads/master

A local branch can be merged into from a different, non-default remote-tracking branch, but that has to be specified explicitly in the command line.

Branch

Upstream Branch