Git Myths Debunked: What Even Some "Experts" Often Get Wrong

I am definitely, absolutely, positively not an expert on the Git source code management system. I'm just old and opinionated. This is a short list of untrue things that I've found a vast number of developers seem to believe about Git.

You need a central place like Github, Gitlab, or Bitbucket to host a Git repo

You'll hear many people talk about Git being a distributed SCM system, but it's rare they understand what that really means. Part of the confusion stems from the way that traditional SCM systems (e.g. Subversion, CVS) have been used: you have a central repository of truth, and developers push and pull changes to it. This is the model that most users will be familiar with, and it has a ton of benefits, but unlike SCM systems of yore, it's absolutely not necessary with Git.

The key difference is that when you clone a repository, you are doing exactly that: making an exact copy. Your copy is a full-on repository, every bit as valid as the one you cloned from. If you clone it to a place where multiple people have SSH access (normally as a bare repository, with git clone --bare, because by default Git refuses pushes to the branch that is currently checked out in a non-bare repository), other users can push to and pull from it directly, as if it were hosted by one of the major providers. Obviously you have to ensure the filesystem permissions are set up correctly for your team, and most of the time you'll probably want the other features the providers offer (CI, code review, etc.), but it's good to know it's possible.
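A minimal sketch of that idea, assuming only a POSIX shell and git. A local directory stands in for the shared machine, and the names original, shared.git, alice and bob are hypothetical. With a real server you would clone from an SSH URL such as me@shared-host:/srv/git/project.git (also hypothetical) rather than a local path.

```shell
# Identity for the demo commits, so the sketch runs without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# An existing repository with one commit.
git init -q original
echo 'echo hello' > original/hello.sh
git -C original add hello.sh
git -C original commit -q -m "first version"

# A bare copy: this plays the part of the directory on the shared machine.
git clone -q --bare original shared.git

# Alice clones the shared copy, commits, and pushes straight back to it.
git clone -q shared.git alice
echo 'echo goodbye' >> alice/hello.sh
git -C alice commit -q -am "say goodbye too"
git -C alice push -q origin HEAD

# Bob clones the same copy and gets Alice's work; no hosting provider involved.
git clone -q shared.git bob
git -C bob log --oneline
```

The bare clone is used for the shared copy because nobody works in it directly, so there is no checked-out branch for a push to clash with.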

Additionally, it allows you to do things like keep a backup copy of a repository on a remote machine; all you need is to be able to SSH to it. Create an empty bare repo on the remote machine (git init --bare), add it as a new remote in your local repo, and you can push and pull to your heart's content, with no extra infrastructure required. Don't want to push your unfinished experimental branch to the production Git repo? Push it to your backup instead.
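Here is a sketch of the backup-remote trick, assuming a POSIX shell and git. The host backup-host, the path repos/project.git and the remote name backup are hypothetical; a local bare repo stands in for the remote machine so the example runs anywhere.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# On the real remote machine you might run (hypothetical host and path):
#   ssh backup-host 'git init --bare repos/project.git'
# Here a local bare repo stands in for it.
git init -q --bare backup.git

# Your everyday local repo, with an unfinished experimental branch.
git init -q project
echo 'draft' > project/notes.txt
git -C project add notes.txt
git -C project commit -q -m "initial notes"
git -C project checkout -q -b experiment
echo 'half-baked idea' >> project/notes.txt
git -C project commit -q -am "try something risky"

# Register the backup as an extra remote
# (with a real machine the URL would be something like backup-host:repos/project.git).
git -C project remote add backup "$tmp/backup.git"

# Push just the experimental branch to the backup and nowhere else.
git -C project push -q backup experiment
```

Because backup is just another remote, git fetch backup and git pull backup experiment work the same way they do with any hosted remote.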

Some tools, such as Gerrit, take full advantage of the distributed aspect of Git by using multiple repositories for development and production. Changes can be pushed to the development system, worked on until they are ready, and then only the final version of the code is committed to the production repo, keeping out a lot of unnecessary junk.

Git is only useful if you have multiple developers

Even if you're writing a quick shell script for your own use, putting it into a tiny little Git repo has a lot of advantages, and it's easy: all you have to do is git init, git add and git commit. It doesn't have any disadvantages. But what does it give you? Well, right off the bat, it gives you a backup. You can commit at every stage of your development and always get the code back if you accidentally delete it or bugger it up in some way. That alone should be enough reason to do it. But there are a bunch of other advantages too. For now I'll just mention one: git bisect.
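A minimal sketch of the single-script case, assuming a POSIX shell and git; the script name tidy.sh is hypothetical. It puts one file under version control, "accidentally" deletes it, and gets it back. On Git 2.23 or later, git restore tidy.sh does the same job as the git checkout line.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

dir=$(mktemp -d)
cd "$dir"

# A one-file "project": the whole setup is three commands.
printf '#!/bin/sh\necho "tidying up"\n' > tidy.sh
git init -q
git add tidy.sh
git commit -q -m "first working version"

# Oops: the script gets deleted (or mangled beyond repair).
rm tidy.sh

# Restore the last committed version of the file.
git checkout -- tidy.sh
cat tidy.sh
```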

git bisect is a criminally underused feature of Git that is phenomenally useful for tracing bugs in your code. I think that part of the reason people don't use it is that the Git documentation can be a little...confusing. But it's dead easy. Really!

Let's say there is a bug in your code, and you want to find where it crept in. Summon the power of bisect by commanding:
git bisect start

Assuming you're at the tip (HEAD) of a branch that contains the bug, let bisect know by typing:
git bisect bad

Then check out a revision of the code that doesn't have the bug (git log will probably be useful for locating that). Once a good revision is checked out, let bisect know with:
git bisect good

And you're off!
Bisect will check out another revision for you. Check if this revision exhibits the buggy behaviour, and if it does, type:
git bisect bad

and if it doesn't, type:
git bisect good

Keep doing this, and bisect will eventually be able to tell you exactly which revision caused the bug.
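To make the steps above concrete, here is a sketch that replays them in a throwaway repo, assuming a POSIX shell and git. The history is made up: eight commits, with the "bug" (the word BUG in a hypothetical app.txt) first appearing in commit 5. The loop plays the part of the human checking each revision. git bisect reset at the end is the standard way to finish a session and return to where you started.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q

# Eight commits; the "bug" appears in commit 5 and stays from then on.
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -ge 5 ]; then echo "v$i BUG" > app.txt; else echo "v$i ok" > app.txt; fi
  git add app.txt
  git commit -q -m "commit $i"
done

git bisect start
git bisect bad > /dev/null      # HEAD (commit 8) has the bug
git checkout -q HEAD~7          # commit 1, known to be good
git bisect good > /dev/null     # bisect now checks out a revision in between

# Play the human: inspect each revision bisect offers and report good or bad.
for step in 1 2 3 4 5 6; do
  if grep -q BUG app.txt; then verdict=bad; else verdict=good; fi
  out=$(git bisect "$verdict")
  case "$out" in *"is the first bad commit"*) break ;; esac
done

first_bad=$(git rev-parse refs/bisect/bad)   # bisect's answer
git bisect reset > /dev/null 2>&1            # back to the branch we started on
git log -1 --format=%s "$first_bad"
```

Instead of checking out the good revision yourself, you can also name it directly with git bisect good followed by the revision.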

This is the tip of git bisect's iceberg; it can do way more than mentioned here, including automatically testing whether the bug exists at each step (git bisect run), so you can let it go off and hunt the bug down itself, without you having to tell it good or bad at each step.
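A sketch of the automated version, assuming a POSIX shell and git and the same made-up eight-commit history (bug introduced in commit 5). git bisect start accepts the bad revision followed by the good one, and git bisect run treats an exit status of 0 from the test command as good and 1 as bad. The one-line sh -c test is a hypothetical stand-in for your real test script.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -ge 5 ]; then echo "v$i BUG" > app.txt; else echo "v$i ok" > app.txt; fi
  git add app.txt
  git commit -q -m "commit $i"
done

# Bad revision first, then a known-good one: here HEAD and the root commit.
git bisect start HEAD HEAD~7 > /dev/null

# The test command: exit 0 means "good", exit 1 means "bad".
git bisect run sh -c 'if grep -q BUG app.txt; then exit 1; fi; exit 0' > /dev/null

first_bad=$(git rev-parse refs/bisect/bad)   # bisect's answer
git bisect reset > /dev/null 2>&1
git log -1 --format=%s "$first_bad"
```

If a revision can't be tested at all (it doesn't build, say), having the script exit with 125 tells bisect to skip it rather than count it as good or bad.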

"Forking" is a Git thing

A fork is just a copy of a repository. As was mentioned above, a clone is a copy of a repository, which is probably why there's no need for a git fork command. Github created a fork feature that lets you clone a repo and logs the fact you did it in a database. As a result, they've popularised the expression "fork" to such a degree that people talk about "forking repos" as if there were something special about it. If you need to fork a repo, just clone it. No magic involved.
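A small sketch of "forking" with nothing but git, assuming a POSIX shell; the directory names upstream and my-fork are hypothetical. The clone goes its own way, while the original is untouched, and the fork can still fetch whatever the original does next through its origin remote.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"

# Somebody else's project.
git init -q upstream
echo 'v1' > upstream/README
git -C upstream add README
git -C upstream commit -q -m "upstream: initial"

# "Forking" it: just clone.
git clone -q upstream my-fork

# The fork has the full history and can go its own way.
echo 'my tweak' >> my-fork/README
git -C my-fork commit -q -am "fork: my tweak"

# Meanwhile upstream moves on; the fork can still fetch its changes.
echo 'v2 notes' > upstream/NOTES
git -C upstream add NOTES
git -C upstream commit -q -m "upstream: add notes"
git -C my-fork fetch -q origin
```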

Deleting things actually deletes them

Actually removing commits from a Git repository is difficult. If you accidentally commit sensitive info to a Git repository, deleting the file and committing the deletion will not do the trick; it's still there in the repo's history. In fact, unless you have direct access to the metadata in the repo, you simply can't do it [please correct me if I'm out of date].
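A quick demonstration of why "delete and commit" isn't enough, assuming a POSIX shell and git. The file name secrets.env and its contents are made up for the example.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"
git init -q

# Accidentally commit a (made-up) secret...
echo 'api_key=not-a-real-key' > secrets.env
git add secrets.env
git commit -q -m "add config"

# ...then "delete" it and commit the deletion.
git rm -q secrets.env
git commit -q -m "remove secrets"

ls                                # secrets.env is gone from the working tree
git log --oneline -- secrets.env  # but both commits that touched it remain
leaked=$(git show HEAD~1:secrets.env)
echo "$leaked"                    # and the earlier commit still holds the contents
```

Anyone who clones or has already cloned the repo gets that earlier commit too, which is why the leaked value should be treated as compromised regardless of what happens to the history.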

Github, at least, provides a way to delete things properly, but it's not a normal part of Git. And you really need to read the Github docs to ensure you delete things properly; even then you have to trust that Github is actually deleting it.

From the point of view of someone who likes to audit things, this is great: you can trace things back with a level of accuracy that would impress CSI. Whilst this means the repo contains all of the broken, embarrassing things you'd rather forget about, it also means that it contains the stuff you thought you'd accidentally deleted. Unless you invoke some meta-deletion stuff, you can always get it back.

A classic example is when people delete branches or tags: all you're deleting is effectively a bookmark to a commit. The commit, and therefore the commits that formed the rest of the branch, are still there, so you can get that branch or tag back. All you need is some way to identify the commit (its commit message, for example). Then you can search through the repo to find its hash, and either recreate the branch or tag it again. git reflog and the --grep-reflog flag (used with git log -g) are what you need to Google for that. Just feel safe in the knowledge that unless you went out of your way to remove metadata, those commits will still be in the repo, at least until Git's garbage collection eventually prunes commits that nothing refers to any more, so don't leave it for months. Relax.
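A sketch of recovering a deleted branch, assuming a POSIX shell and git; the branch name experiment and its commit message are hypothetical. It relies on HEAD's reflog: deleting a branch also deletes that branch's own reflog, but if you ever had the branch checked out, HEAD's reflog still records its commits, and git log -g with --grep-reflog can search those entries.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'stable' > app.txt
git add app.txt
git commit -q -m "stable version"
main_branch=$(git symbolic-ref --short HEAD)

# Work on a branch, then delete it without merging.
git checkout -q -b experiment
echo 'experimental feature' >> app.txt
git commit -q -am "add experimental feature"
git checkout -q "$main_branch"
git branch -q -D experiment

# HEAD's reflog still remembers the commit; find it by (part of) its message.
lost=$(git log -g --grep-reflog="add experimental feature" --format=%H -1)

# Recreate the branch at that commit.
git branch experiment "$lost"
git log -1 --format=%s experiment
```

Without the -q, git branch -D prints the hash of the tip it just deleted, which is the quickest way back if it's still on your screen.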