Track changes in a version control system

Version control (aka source control, revision control, etc.) helps you keep track of every change you make to the files in a project, along with their whole history. It’s one of those things that might seem unnecessary and hard to understand before you try it, because there are a few new commands to learn. Once you try it, you gradually realize it is actually much easier than the alternative. Version control is an essential programming tool that you should not have to live without.

Basic benefits of version control:

You can quickly rewind to any recorded version after a mistake. This lets you be a lot more confident about making changes. When using version control, it isn’t necessary to make lots of copies of your project files ‘just in case,’ which quickly becomes a lot more trouble than using real version control.
Explicit versioning makes it a lot simpler to determine the relationships among copies and backups of your project, like determining which one is the newest, finding what changed, or reconciling divergent branches of development.
You can work with other people on the same set of files without accidentally overwriting each other’s work, or losing track of who made what changes when.

While version control often intimidates beginners, you can get most of the benefit by learning just a few basics. It’s not important to understand everything or know the fancy options. The important thing is to start making a habit of tracking all your changes with a version control tool, even for small projects.

This doc discusses basics of how to do this using a program called git (originally by Linus Torvalds). If you are new to version control, it is totally okay to just use git, even if you don’t completely love it at first. It is a tool that you can grow into.

Note

There are viable version control systems other than git. An alternative that’s also popular in the Python community is named mercurial (aka hg). I recommend git because this isn’t a choice which beginners to version control should have to research, and there are benefits to knowing basic git even if you use mercurial. Follow the mercurial tutorial if you prefer, but consider also learning the basics of git.

Basic git

git is one of the most popular, fastest and most flexible version control tools. It’s also used to share or download code on sites like GitHub or Bitbucket. Its main drawback is that it can seem complex and hard to learn. Don’t worry, it doesn’t have to be that way. Trying to learn advanced concepts and uses at the beginning makes things hard, and unfortunately many git tutorials do this. But git can also be used in a very simple way requiring much less knowledge, making it much easier to get started. From the standpoint of just getting things done, that’s much more important than knowing all the fancy uses and shortcuts. If you just stick to simple uses like what’s shown here, then you shouldn’t have a big problem getting started, and you will be better positioned to learn more advanced uses from other tutorials.

(You can also just use the official git tutorial. It’s good. I just try to boil things down a little more for beginners who want a very gentle introduction.)

In order to start using git, you’ll first have to install it. I’ll assume you are competent to figure that out by following git’s installation instructions. I also assume you know how to get to a command line and use it in a very basic way, which any serious programmer should know for other reasons anyway.

git init

When you first start using git on a project, you tell it something like this: “git, I want the current directory and everything below it to be part of a project that I will track in git.” From the command line, in a directory you want to be the top level of your project, you’d type:

git init

That creates a directory called .git under the current directory, containing all the special git stuff. But you can safely just ignore everything in there. It’s not for you, unless you get into the advanced stuff. git init creates a ‘git repository’ - a place where you can record changes by using git.
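Here is a minimal sketch of that step. It assumes git is installed, and it works in a throwaway directory from mktemp (my choice for illustration, so nothing real gets touched); in practice you would cd into your own project directory instead.

```shell
# Scratch directory so the sketch does not touch a real project (assumption).
project=$(mktemp -d)
cd "$project"

# Turn the current directory into a git repository.
git init -q

# The repository data lives in the hidden .git directory.
ls -d .git
```

Deleting the scratch directory afterwards removes the repository along with it, since everything git stores is inside .git.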

To use a git repository, you need to make sure git knows whenever you have changed files, by repeating a two-step process:

Use git add to tell it which files to start tracking, or track the latest changes to. You can repeat this for any number of files. It’s like selecting those files for what you are about to do next. This doesn’t really do much until you also do the second step with git commit.
Then use git commit to tell git to save all those changes as one ‘commit’ (like a snapshot of tracked files). Before this step, the changes aren’t permanently recorded yet.

Most daily usage of git is based on these simple operations.
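The two steps can be sketched end to end like this. The scratch directory, the file name hello.py and the name and email are all placeholders I made up; git needs some identity before it will commit, and here it is set only for this one repository.

```shell
# Throwaway repository for the sketch (assumption: git is installed).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Placeholder identity, stored only in this repository.
git config user.name "Example User"
git config user.email "user@example.com"

echo 'print("hello")' > hello.py

# Step 1: select the file for the next commit.
git add hello.py

# Step 2: record the selected changes as one commit.
git commit -q -m "add hello script"

# The history now contains that single commit.
git log --oneline
```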

git add

Here’s an example of that first step, git add. In English, this means “git, include the changes to foo.py in the next commit (and start tracking foo.py now if you haven’t been tracking it before)”:

git add foo.py

You can run git add on as many files as you want, and they will all be included in the next commit. Doing them one at a time lets you make sure you don’t include anything you don’t want to (like secret password files or gigantic temporary files). If you want to add lots of changes at once, I’ll let you research that on your own.

As long as you haven’t committed yet, you can also tell git that you want some other changes to be tracked, and they will all be included together as one snapshot-like “commit” recording the states of tracked files.

Here is a really important detail. Whenever you make further changes to a file that git is already tracking, you should run git add on that file AGAIN. The next time you commit, git records that file as it was the last time you ran git add on it, so anything you changed after that is left out. If you don’t add the file again, your newer changes to it won’t be saved in git.
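To see why the second git add matters, here is a small sketch in a scratch repository. The file contents and the identity are placeholders; git show HEAD:foo.py is used only to print what the latest commit actually holds for foo.py.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > foo.py
git add foo.py

# Change the file again AFTER git add, without adding it again.
echo "version 2" > foo.py
git commit -q -m "add foo"

# The commit holds the content from the time of git add;
# the newer content exists only on disk.
git show HEAD:foo.py
git status --short

# Running git add again picks up the newer change for the next commit.
git add foo.py
git commit -q -m "update foo"
git show HEAD:foo.py
```

The first commit ends up containing "version 1" even though the file on disk already said "version 2" when the commit was made.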

git rm

Now suppose there is a file being tracked that you want to get rid of (and also delete from your disk!). Use git rm for that. The following means: “git, delete bar.py from disk AND stop including it in the stuff you’re tracking.”

git rm bar.py

Just like with git add, the change won’t be properly recorded until you issue a git commit.

After doing this, bar.py is gone from disk. Not just that: if you make another file named bar.py, git isn’t tracking that new file until you tell it to. (Getting the old file back at this point means restoring it from an earlier commit, which isn’t covered here because it requires more advanced git commands. Anyway, the data you committed is still in there if you need it.)
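A quick sketch of that behavior, again in a scratch repository with a made-up identity and file contents:

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "x = 1" > bar.py
git add bar.py
git commit -q -m "add bar module"

# Delete bar.py from disk and stage its removal, then record that.
git rm -q bar.py
git commit -q -m "remove bar module"

# A new file with the same name starts out untracked.
echo "x = 2" > bar.py
git status --short    # shows "?? bar.py"
```

The "??" marker in the short status output means git sees the file but is not tracking it.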

git mv

If you want to rename or move a file that git tracks, the best thing is to do it with git. For example, this tells git to “rename boo.py to moo.py” (let’s assume boo.py is already tracked, and there is no file named moo.py):

git mv boo.py moo.py

You can keep using git add, git rm and git mv in any order you want, until you are ready to make a new snapshot or “commit”. Each commit groups together the add/rm/mv commands which preceded it, back to the previous commit. If I git add two files, commit, then git add another three files, and commit, I end up with two commits: the first commit adding two files, and the second adding three files.
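Here is a sketch of how commits group the commands before them, loosely following the example above and adding a rename as a third commit. The file names other than boo.py and moo.py, and the identity, are invented for illustration.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: two files.
touch a.py b.py
git add a.py b.py
git commit -q -m "add a and b"

# Second commit: three more files.
touch c.py d.py boo.py
git add c.py d.py boo.py
git commit -q -m "add c, d and boo"

# Third commit: rename with git so the move is staged, then record it.
git mv boo.py moo.py
git commit -q -m "rename boo to moo"

# One line per commit, newest first.
git log --oneline
```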

git commit

Whatever commands you’ve used to select changes to record, you always need to use git commit to actually record those changes. If you don’t commit, then the changes are not kept in the repository by git. Every time you run git commit, git requires you to specify a “commit message” describing what that commit does. This turns out to be a really good idea, since it is so helpful later to know why things were changed, and yet so easy to forget.

Here’s an example of how to use git commit. The following command means: “git, make a snapshot which includes all the changes I’ve given with earlier commands like git add and git rm. Attach to this snapshot a note describing the content of these changes”:

git add foo.py
git rm bar.py
git commit -m "add foo module and remove bar module"

Try to group together changes which are related to each other and always include a commit message meaningfully describing what those changes were meant to achieve. You should probably read about standard ways of formatting git commit messages.

git status

Usually you won’t remember what has changed that you might want to add to a commit. You might forget what you already added or marked for removal in the next commit. You might not know what files aren’t yet being tracked. This is where git status gives you a summary of where things stand:

git status

That should tell you where things stand right now in your git repository: whether there are changes to tracked files you might want to record using git add, untracked files you might want to start tracking with git add, what changes will be recorded at the next commit, etc.
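To make those categories concrete, this sketch sets up one file in each state and then asks for the status. The file names and identity are placeholders, and --short is just a compact form of the same report.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "a" > tracked.py
git add tracked.py
git commit -q -m "add tracked"

echo "b" >> tracked.py     # change to a tracked file, not yet added
echo "c" > staged.py
git add staged.py          # will be recorded at the next commit
echo "d" > untracked.py    # not tracked at all

git status                 # full summary with explanations
git status --short         # compact form, one line per file
```

In the short form, "A" marks a newly added file, "M" in the second column marks a tracked file with changes that have not been added, and "??" marks an untracked file.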

git log

Whenever you want to look at the history of the project, you can get a longer listing of historical changes like this:

git log

Using this and other commands you can easily look at any commit in the history, what changed between any two commits, and many more ways of analyzing the history of a git repository.
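As a small taste of those other commands, here is a sketch that builds a two-commit history and looks at it three ways. The file name notes.txt and the identity are made up; HEAD means the latest commit and HEAD~1 the one before it.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "start notes"

echo "two" >> notes.txt
git add notes.txt
git commit -q -m "extend notes"

git log --oneline        # one line per commit, newest first
git show --stat HEAD     # which files the latest commit changed
git diff HEAD~1 HEAD     # what changed between the two commits
```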

That covers the basic steps to make sure that git keeps track of the changes you are making to your files. As long as you make sure git tracks the history of your project, that history is saved in a well-organized way. You can figure out what to do with all this history later. The vital thing is to always track the history of your projects using a version control tool, because otherwise that data is likely to be lost.

clone, pull and push

You can just use git on your local machine if you want. But one of its advantages is how easy it makes grabbing code, keeping it up to date, sharing it and copying it around.

Sometimes you want to check out some code that others have made available. You do this by ‘cloning the repository’. This means: “git, make a repository just like the one at the remote site http://example.com/foo.git, in the local directory named foo. And assign the nickname ‘origin’ to mean http://example.com/foo.git”.

git clone http://example.com/foo.git

Similarly, if you want to clone a git repository stored locally on your computer under /somedir/blah/foo (arbitrary example):

git clone /somedir/blah/foo

After that, you might sometimes want to update a cloned repository to the latest available versions of files. Here is a simple command which will do that (but only in the simple case where you have not made any changes to your copy which conflict with the recent updates):

git pull --ff

If there were no updates, git will just say something like: Already up to date. (Older versions of git print Already up-to-date.)
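You can try cloning and pulling without any network at all. In this sketch a local repository named upstream stands in for the remote site; that name, the directory copy, the file contents and the identity are all assumptions made for illustration.

```shell
work=$(mktemp -d)
cd "$work"

# A local repository standing in for the remote one (assumption).
git init -q upstream
cd upstream
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > foo.py
git add foo.py
git commit -q -m "add foo"
cd ..

# Clone it; the copy remembers where it came from as "origin".
git clone -q upstream copy

# Someone records new work in the original repository...
cd upstream
echo "v2" > foo.py
git add foo.py
git commit -q -m "update foo"
cd ..

# ...and git pull brings the clone up to date.
cd copy
git pull -q --ff
cat foo.py
```

Because the clone has no changes of its own, the pull only has to move it forward to the newer commit.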

In case there were conflicting changes to the same files, that is a problem best resolved by someone who knows more about git. You will want to read about topics such as git fetch, git merge and perhaps git rebase. At this point you should be using a real git tutorial instead of this page.

Other times you want to send your code somewhere else, like GitHub. I won’t cover setting that up here since it depends on what service you want to use. GitHub has its own simple help to read for creating a GitHub repo where you can send your code to be shared publicly. If you want to store your code more privately or don’t like GitHub, consider creating a Bitbucket repo. (The more technically inclined might sync up repositories on multiple machines using ssh, or set up their own remote repositories to function something like GitHub or Bitbucket.)

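Once you have a repository, here is a preview of what it looks like to send your code there. This is git-ese for “send your main line of work (i.e., master) to the remote repository nicknamed ‘origin’”. Newer repositories often name the main branch main rather than master, so substitute whichever name yours uses: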

git push origin master

Understanding these words is a deeper subject. The main thing is: keep tracking your code and keep it backed up.
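If you want to see a push work before signing up anywhere, this sketch uses a local "bare" repository as a stand-in for a hosting service. That stand-in, the directory names and the identity are assumptions; a real service would give you a URL instead of a local path. The branch name is looked up rather than hard-coded, because it may be master or main depending on your git version and settings.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository playing the role of the hosting service (assumption).
git init -q --bare remote.git

# Clone it (git may warn that the repository is empty) and add some work.
git clone -q remote.git project
cd project
git config user.name "Example User"
git config user.email "user@example.com"
echo "print('hi')" > foo.py
git add foo.py
git commit -q -m "add foo"

# Find the name of the current branch, then send it to origin.
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# List what the remote repository now holds.
git ls-remote origin
```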

.gitignore

When you use git status you will often see a long list of ‘untracked files’. If you want a file to stop showing up in this list, you can add its name to a file called .gitignore. .gitignore is just a normal text file. You don’t have to do this, but you can also git add .gitignore to tell git to start tracking changes to this file. Often people don’t do this, because they think others who clone the code might not want to ignore those files.
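Here is a minimal sketch of the effect, using a scratch repository and invented file names. It lists one exact file name in .gitignore, which is the simplest case.

```shell
# Throwaway repository (assumption: git is installed).
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "data" > scratch.tmp
echo "print('hi')" > foo.py

git status --short     # lists both files as untracked

# Ignore the temporary file by listing its name in .gitignore.
echo "scratch.tmp" > .gitignore

git status --short     # scratch.tmp is no longer listed
```

Note that .gitignore itself now shows up as an untracked file, which is where the choice about running git add .gitignore comes in.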

For more information on gitignore, see the gitignore man page.

Other resources

If this was easy enough and you are hungry for more, also check out the official git tutorial. Actually, as of 2014, there are now a large number of good git tutorials and books out there, so if you want to learn more about git you should do some digging for yourself. git’s man pages are famously a bit difficult to learn from, but if you are very technically minded then you may enjoy them.

There are also several popular services for hosting your repositories online, which you may be interested in checking out.