Removing Files From GitHub
Once in a while when working with GitHub, you may accidentally commit a file (or group of files) that you didn’t intend to, and need to remove all traces of it from your GitHub history. This situation often occurs when your .gitignore isn’t correctly configured to ignore a particular file.
I recently encountered this situation with pytest and its auto-generated .pyc files, and spent part of this morning figuring out what to do. Here’s what I found.
The Answer
GitHub’s help article, Removing sensitive data from a repository, isn’t the most beginner friendly, but it is the perfect solution to the problem.
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA' \
--prune-empty --tag-name-filter cat -- --all
This command is very dense, and no explanation is given as to how it accomplishes our goal.
One way to understand what’s happening is to realize that the actual command is git filter-branch and the rest of the string is simply a list of options.
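To make that split concrete, here’s a minimal sketch that builds the same option list one piece at a time, with a comment on each piece. The comments are my own summaries of the Git documentation, and the script only prints the arguments instead of running anything, so it’s safe to try anywhere.

```shell
#!/bin/sh
# Build the argument list for `git filter-branch` one option at a time.

# --force: start even if backups from an earlier run exist under refs/original/.
set -- --force

# --index-filter CMD: run CMD against the index of every commit, without
# checking files out. The CMD here removes the file from the index only
# (--cached) and does not fail on commits where the file is absent
# (--ignore-unmatch).
set -- "$@" --index-filter \
  'git rm --cached --ignore-unmatch PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA'

# --prune-empty: drop commits that end up with no changes after filtering.
set -- "$@" --prune-empty

# --tag-name-filter cat: keep tag names unchanged, but point them at the
# rewritten commits.
set -- "$@" --tag-name-filter cat

# --: end of filter-branch options; what follows selects revisions.
# --all: rewrite every ref (all branches and tags).
set -- "$@" -- --all

# Print one argument per line instead of executing the command.
printf '%s\n' "$@"
```

Swapping the final printf for git filter-branch "$@" would run the real command with exactly these options.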
Command
filter-branch
Lets you rewrite Git revision history by rewriting the branches mentioned in the “rev-list options”, applying custom filters on each revision. Those filters can modify each tree (e.g. removing a file or running a perl rewrite on all files) or information about each commit.
Basically, filter-branch applies your filters to every commit in the entire history of your repository. This is extremely powerful, but disastrous if you target the wrong file: every affected commit is rewritten and gets a new hash, so you can’t simply undo the change with git revert.
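To see the rewrite in action, here’s a small sketch that runs the command in a throwaway repository. The file names app.py and app.pyc and the identity settings are made up for the demo, and FILTER_BRANCH_SQUELCH_WARNING is only there to skip the warning pause that newer Git versions add before filter-branch starts.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temp directory (hypothetical file names).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: the real file plus a .pyc committed by mistake.
printf 'print("hi")\n' > app.py
printf 'compiled junk\n' > app.pyc
git add app.py app.pyc
git commit -q -m "Add app (and app.pyc by mistake)"

# Second commit: an ordinary change on top.
printf 'print("bye")\n' >> app.py
git commit -q -a -m "Update app"

old_head=$(git rev-parse HEAD)

# Skip the warning pause in newer Git versions (harmless on older ones).
export FILTER_BRANCH_SQUELCH_WARNING=1

# Same command as in the article, with the placeholder path filled in.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch app.pyc' \
  --prune-empty --tag-name-filter cat -- --all

new_head=$(git rev-parse HEAD)
# Only look at branches and tags; --all would also include the backup refs.
leftover=$(git log --branches --tags --oneline -- app.pyc)
backups=$(git for-each-ref refs/original/)

echo "HEAD before: $old_head"
echo "HEAD after:  $new_head"
echo "Commits still touching app.pyc: ${leftover:-none}"
echo "Backup refs kept by filter-branch:"
echo "$backups"
```

The HEAD hash changes because every commit that contained the file is recreated. filter-branch keeps the pre-rewrite refs under refs/original/, which is the closest thing to an undo for as long as they exist.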
If you’re looking for a deeper dive into how git filter-branch works, reading the official explanation on git-scm really helps to clarify the details.