Introduction to Git

This article explains what Git is, a little about how it works internally, what a DVCS (Distributed Version Control System) is, its advantages, and more.

Flavio Silva • December 11, 2012

Introduction

If you are already familiar with Git and its distributed model, you can skip this brief introduction and move on to the Git Workflow article.

Git is popularly known as a Version Control System (VCS), but it's more than that. Git is a Software Configuration Management (SCM) system: besides versioning files, it supports the software development process with features for team integration, merging changes, tagging, branching, and more.

A brief history

Linus Torvalds started developing Git in 2005 with several goals in mind: speed, simple design, strong support for non-linear development, a fully distributed model, and the ability to handle large projects (Chacon 5). Seventy-five days after development began, Git managed the release of Linux kernel 2.6.12 ("Git Software").

Git = Distributed Version Control System (DVCS)

Unlike many VCSs, such as SVN, which are Centralized Version Control Systems (CVCSs), Git is a Distributed Version Control System (DVCS). This crucial difference sets Git apart from most VCSs, which are centralized.

Git's model implies that no central repository is required: each client holds a full repository, not just a copy of the versioned files. Typically, each client has its own local repository, and in fact all the work can be done this way (in a single-person project). This is not recommended, though, because any local hard disk failure could become a huge problem. You should have at least one more repository on another machine, typically a remote repository on a web server.

For teamwork, that remote repository becomes a shared repository, acting as a facilitator for all team members. To keep everyone up to date, each member simply sends (pushes) their changes to that repository and gets (pulls) others' changes from it. The shared repository thus behaves much like a central repository, but the model remains very different from the centralized one; for example, multiple shared repositories can coexist.
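
The push and pull flow described above can be sketched with a toy model. This is not how Git is implemented; the `Repo` class, its methods, and the repository names are all hypothetical, chosen only to show that every clone holds the full history and that more than one shared repository can coexist.

```python
# Toy model of the distributed workflow (illustrative only, not Git's internals).
class Repo:
    def __init__(self, name):
        self.name = name
        self.commits = []  # full local history, oldest first

    def commit(self, message):
        # Committing touches only the local repository.
        self.commits.append(message)

    def push(self, remote):
        # Send the commits that the remote does not have yet.
        for c in self.commits:
            if c not in remote.commits:
                remote.commits.append(c)

    def pull(self, remote):
        # Get the commits that the local repository does not have yet.
        for c in remote.commits:
            if c not in self.commits:
                self.commits.append(c)


shared = Repo("shared")   # the repository the team agrees to share
mirror = Repo("mirror")   # a second shared repository, e.g. a backup
alice = Repo("alice")
bob = Repo("bob")

alice.commit("Add README")
alice.push(shared)

bob.pull(shared)
bob.commit("Fix typo")
bob.push(shared)

alice.pull(shared)
alice.push(mirror)

print(alice.commits)  # ['Add README', 'Fix typo']
```

Nothing in the model makes `shared` special: it is just another repository that the team has agreed to push to and pull from, which is why `mirror` can exist alongside it.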

Advantages of a Distributed Model

Because each client has a full local repository, nearly every operation is done locally, which makes Git fast and convenient. In a centralized model, on the other hand, each client has only a copy of the files, so every commit and many other operations, such as consulting the change history, need to connect to the central repository, often on a remote server, which introduces connection, latency, and speed issues.

Git's Guts

To really understand how Git works and to use it effectively, it's essential to understand how it stores and versions files.

Some other VCSs (e.g. SVN) store the base files and then save the changes made to them over time (Chacon 6), so you have the original file plus a delta for each subsequent change. Git works differently: for each commit, Git stores a snapshot of all files, but it avoids duplicating unmodified files by storing only a link to the identical file it has already stored (Chacon 6). In this sense, Git works much more like a mini filesystem.
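
The snapshot idea can be illustrated with a small sketch. It assumes a simplified content-addressed store: the `objects` and `snapshots` structures and the `store` and `commit` helpers are hypothetical, and the hash is taken over the raw file content only, whereas real Git hashes a small header together with the content and also records trees and commit metadata.

```python
import hashlib

# Toy content-addressed store (illustrative assumption, not Git's real format).
objects = {}    # content hash -> file content; each distinct content is stored once
snapshots = []  # one dict per commit: file name -> content hash


def store(content: bytes) -> str:
    # Identical content always produces the same key, so it is stored only once.
    key = hashlib.sha1(content).hexdigest()
    objects.setdefault(key, content)
    return key


def commit(files: dict) -> dict:
    # A snapshot lists every file, but only as a reference into `objects`.
    snapshot = {name: store(data) for name, data in files.items()}
    snapshots.append(snapshot)
    return snapshot


first = commit({"a.txt": b"hello\n", "b.txt": b"world\n"})
second = commit({"a.txt": b"hello\n", "b.txt": b"world, again\n"})

# a.txt did not change, so both snapshots point to the same stored object.
print(first["a.txt"] == second["a.txt"])  # True
print(len(snapshots), len(objects))       # 2 3
```

Two snapshots covering two files each end up storing only three objects, because the unmodified `a.txt` is referenced again rather than copied.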

This snapshot model is the key to how Git works, and it underpins much of what sets Git apart from other VCSs.

Now, head to Git Workflow to learn about the three file states in Git (modified, staged, and committed) and the three corresponding sections of a Git project (working directory, staging area, and Git directory).

Bibliography

Chacon, Scott. Pro Git. Apress, 2009.
"Git Software." Wikipedia, n.d. Web. 19 January 2011. <http://en.wikipedia.org/wiki/Git_(software)>

Introduction to Git by Flavio Silva is licensed under a Creative Commons Attribution 4.0 International License.