How to detect and clean up leaked secrets in your Git Repositories

Introduction

At multiple points in your software or infrastructure development lifecycle, you will need to use secrets to interact with your environment: SSH keys, connection strings, API tokens, and so on.

Your CI process should include a secret scanning tool that detects leaked secrets and prevents them from being merged into your codebase. Ideally, you will also have a background process that periodically scans all of your repositories for leaked secrets.
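A CI gate of this kind can be sketched in a few lines of Bash using Gitleaks, one of the tools covered later in this document. This is a minimal sketch, assuming Gitleaks is installed on the pipeline agent; the function name `scan_gate` and its messages are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate: scan_gate is an illustrative name, not part of Gitleaks.
scan_gate() {
  local repo_dir="$1"
  local status=0
  # --exit-code sets the status Gitleaks returns when leaks are found
  gitleaks detect --source "$repo_dir" --redact --no-banner --exit-code 1 || status=$?
  if [ "$status" -eq 0 ]; then
    echo "Secret scan passed"
  else
    echo "Secret scan failed (exit code $status): blocking merge" >&2
  fi
  return "$status"
}

# Example usage in a pipeline step:
# scan_gate "$PWD" || exit 1
```

Returning the tool's own exit status lets the pipeline decide how to react, rather than hiding the result inside the helper.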

There is quite a lot of content out there on this subject, but we couldn't find many full practical examples. This document describes the process of detecting and cleaning up secrets that have leaked into Git source control.

Note: All the scripting is in Bash and assumes you will swap the example directory paths for your own.

Test repo

Pull down the example repo for example code and dummy secrets for testing. Most of the following scripts assume a clone is available at /c/source/secretleaks; please amend the paths as appropriate if following along. You won’t be able to push to our repo, so just omit the push steps.

Detecting

There are a number of tools that can be used interactively and on pipeline agents to detect secrets. Two that we use are Gitleaks and Trufflehog, and we will look at using both interactively here. The following code examples were run in a Bash shell (Ubuntu 20.04 in WSL in my case).

Gitleaks

Helper install script. Official install notes are here:

get_latest_release() {
    curl --silent "https://api.github.com/repositories/119190187/releases/latest" |
    grep '"tag_name":' |
    sed -E 's/.*"([^"]+)".*/\1/'
}

version=$(get_latest_release)
curl -L https://github.com/zricethezav/gitleaks/releases/download/$version/gitleaks_${version:1}_linux_x64.tar.gz --output gitleaks.tar.gz
tar -xzvf gitleaks.tar.gz && rm gitleaks.tar.gz
chmod +x gitleaks
sudo mv gitleaks /usr/local/bin/gitleaks
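To make the helper's text handling easier to follow, here is a small sketch that runs the same grep/sed pair on a single sample line. The sample line and its version number are made up for illustration; only the pipeline and the `${version:1}` expansion come from the script above.

```shell
# Hypothetical API response line; the version number is made up for illustration.
sample_line='  "tag_name": "v1.2.3",'

# Same grep/sed pair as the helper: keep the last double-quoted value on the line
version=$(printf '%s\n' "$sample_line" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')

# ${version:1} drops the first character (the leading "v") for the archive name
archive="gitleaks_${version:1}_linux_x64.tar.gz"

echo "$version"
echo "$archive"
```

The tag keeps its leading "v" for the release path, while the archive file name uses the bare version number, which is why both forms appear in the download URL.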

To run Gitleaks, move to the root directory of the Git repository you would like to check and execute:

cd /c/source/secretleaks
gitleaks detect -v --redact --no-banner
# -v: verbose
# --redact: redact secrets from logs and stdout
# --no-banner: suppress banner

When a secret is detected it is reported like so:

Finding:     apikey = "REDACTED""
Secret:      REDACTED
RuleID:      generic-api-key
Entropy:     3.659387
File:        dummy_secrets/apikey.txt
Line:        1
Commit:      078be5ef841edd9a4903b92676e7a7116c697081
Author:      Stuart Anderson
Email:       stuart.anderson@blakyaks.com
Date:        2024-06-14T08:46:56Z
Fingerprint: 078be5ef841edd9a4903b92676e7a7116c697081:dummy_secrets/apikey.txt:generic-api-key:1
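The Fingerprint value can also be used to mark a finding as reviewed. Gitleaks reads fingerprints to skip from a `.gitleaksignore` file at the root of the repository. The sketch below appends a fingerprint to that file only if it is not already listed; the helper name `ignore_finding` is hypothetical. This suits dummy test secrets and confirmed false positives, and is not a substitute for the clean up described later.

```shell
# Hypothetical helper: record a reviewed finding's fingerprint in .gitleaksignore
ignore_finding() {
  local repo_dir="$1" fingerprint="$2"
  local ignore_file="$repo_dir/.gitleaksignore"
  touch "$ignore_file"
  # -x whole line, -F literal string, -q quiet: only append if not already listed
  if ! grep -qxF -- "$fingerprint" "$ignore_file"; then
    printf '%s\n' "$fingerprint" >> "$ignore_file"
  fi
}

# Example usage with the fingerprint reported above:
# ignore_finding /c/source/secretleaks "078be5ef841edd9a4903b92676e7a7116c697081:dummy_secrets/apikey.txt:generic-api-key:1"
```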

Trufflehog

Helper install script. Official install notes are here:

get_latest_release() {
  curl --silent "https://api.github.com/repos/trufflesecurity/trufflehog/releases/latest" |
  grep '"tag_name":' |
  sed -E 's/.*"([^"]+)".*/\1/'
}

version=$(get_latest_release)
curl -L https://github.com/trufflesecurity/trufflehog/releases/download/$version/trufflehog_${version:1}_linux_amd64.tar.gz --output trufflehog.tar.gz
tar -xzvf trufflehog.tar.gz && rm trufflehog.tar.gz
chmod +x trufflehog
sudo mv trufflehog /usr/local/bin/trufflehog

To run Trufflehog, move to the root directory of the Git repository you would like to check and execute:

cd /c/source/secretleaks
trufflehog filesystem . --fail --no-update --only-verified
# filesystem: The path to scan on the local filesystem
# --fail: Exit with code 183 if results are found
# --no-update: Don't check for updates
# --only-verified: Only output verified results

Trufflehog can aggressively flag things like GUIDs so you may need to use the `--only-verified` flag on some repositories.

When a secret is detected it is reported like so (redacted manually as Trufflehog doesn't have this feature):

Detector Type: AWS
Decoder Type: PLAIN
Raw result: AKIAQYLPMN5HHHFPZAM2
Is_canary: true
Message: This is an AWS canary token generated at canarytokens.org, and was not set off; learn more here: https://trufflesecurity.com/canaries
Arn: arn:aws:iam::052310077262:user/canarytokens.com@@c20nnjzlioibnaxvt392i9ope
Resource_type: Access key
Account: 052310077262
File: .git/objects/38/91b66a3876955ccd17d6e41810a4b4de4b246d
Line: 1

You will notice that Trufflehog reports fewer findings than Gitleaks. This is because the `--only-verified` flag limits the output to secrets that the tool has been able to verify, which filters out most false positives.
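When scripting around Trufflehog, the exit code from `--fail` is the simplest signal to act on. The following is a minimal sketch, assuming the exit code 183 described above; the wrapper name `scan_with_trufflehog` and the three labels it prints are illustrative.

```shell
# Hypothetical wrapper: scan_with_trufflehog is an illustrative name.
scan_with_trufflehog() {
  local target_dir="$1"
  local status=0
  trufflehog filesystem "$target_dir" --fail --no-update --only-verified || status=$?
  # Summarise the outcome after the tool's own output
  case "$status" in
    0)   echo "clean" ;;
    183) echo "findings" ;;
    *)   echo "error" ;;
  esac
  return "$status"
}

# Example usage:
# scan_with_trufflehog /c/source/secretleaks
```

Treating any other non-zero status as an error keeps a tool failure from being mistaken for a clean scan.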

Clean up

One of the big issues with leaking a secret into Git is that the secret is preserved in the commit history of the repo, so just removing it in a new commit doesn’t do much for you. You have to rewrite every commit in the history where that secret appears. This is a non-trivial and potentially destructive operation, so care needs to be exercised.
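The effect is easy to reproduce in a throwaway repository. The sketch below commits a dummy secret, "removes" it in a second commit, and then shows that the current tree is clean while the history is not. Every name and value in it is made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository; every name and value here is a dummy for the demo
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init --quiet .
commit() { git -c user.name="Demo" -c user.email="demo@example.com" commit --quiet -m "$1"; }

# Commit 1 leaks the secret, commit 2 "removes" it
echo 'apikey = "DUMMY-SECRET-VALUE"' > config.txt
git add config.txt && commit "Add config"
echo 'apikey = "***REMOVED***"' > config.txt
git add config.txt && commit "Remove secret"

# The current tree is clean...
if git grep -q "DUMMY-SECRET-VALUE" HEAD; then head_state="present"; else head_state="absent"; fi
# ...but -S still finds the commits that added and removed the string
history_hits=$(git log --all --oneline -S"DUMMY-SECRET-VALUE" | wc -l | tr -d ' ')

echo "HEAD: $head_state, commits touching the secret: $history_hits"
```

The secret is reported as absent from HEAD, yet both commits still carry it in their diffs, so anyone with a clone can recover it.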

The BFG Repo-Cleaner is a widely recommended tool for this job, so we will look at that here.

Warning: Some of the clean up tasks have scope to destroy your repo.
ALWAYS make sure you have a good backup.
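One way to take a backup that can be checked before you rely on it is `git bundle`, which packs every ref into a single file. This is a different technique from the tar archive used later in this document, shown here as a minimal sketch; the helper name `backup_repo` is hypothetical.

```shell
# Hypothetical helper: backup_repo is an illustrative name.
# bundle_path should be absolute, because git -C changes directory first.
backup_repo() {
  local repo_dir="$1" bundle_path="$2"
  # Bundle every ref (branches and tags) into a single file
  git -C "$repo_dir" bundle create "$bundle_path" --all
  # Confirm the bundle is valid before relying on it
  git -C "$repo_dir" bundle verify "$bundle_path"
}

# Example usage:
# backup_repo /c/source/secretleaks /c/temp/secretleaks.bundle
```

A bundle can be restored with `git clone`, passing the bundle file in place of a remote URL.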

Install

Helper install script (a Java 8+ runtime environment is required):

function get_latest_version() {
    curl -s https://repo1.maven.org/maven2/com/madgag/bfg/ | grep -oP '(?<=href=")[0-9]+(\.[0-9]+)*' | sort -V | tail -n 1
}

latestVersion=$(get_latest_version)
sudo curl -L https://repo1.maven.org/maven2/com/madgag/bfg/$latestVersion/bfg-$latestVersion.jar -o /usr/local/bin/bfg.jar
sudo chmod +x /usr/local/bin/bfg.jar
echo "alias bfg='java -jar /usr/local/bin/bfg.jar'" >> ~/.bashrc
source ~/.bashrc
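Before starting the clean up, it can help to confirm that everything installed so far is on the PATH. The following is a minimal sketch; the function name `require_tools` is hypothetical, and the tool list in the usage comment is an assumption based on the installs above.

```shell
# Hypothetical helper: report any required command that is not on the PATH
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (java is checked rather than bfg, since bfg is only a shell alias):
# require_tools git java gitleaks trufflehog || exit 1
```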

BFG will clean up our Git history for us, but by default it leaves the contents of the latest commit untouched. So first we should clean up the current branches and commit the changes manually. We will then deal with the history.

Take a backup of the repo before this step. 

Let's create a file that contains secrets to redact from the repository:

mkdir -p /c/temp
# Use > for the first secret so stale entries from an earlier run are not kept
echo "DUMMY2A93A28E9999936C89221856EC98B8FFC" > /c/temp/secrets.txt
echo "1tUm636uS1yOEcfP5pvfqJ/ml36mF7AkyHsEU0IU" >> /c/temp/secrets.txt

Remove from the current branch:

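DIRECTORY_TO_SEARCH="/c/source/secretleaks"
SECRETS_FILE="/c/temp/secrets.txt"
total_files=0
files_processed=0

# Build one sed script with a substitution per secret
sed_command=""
while IFS= read -r secret
do
  # Skip blank lines, which would produce an invalid empty pattern
  [ -z "$secret" ] && continue
  # Escape characters that are special in a sed regex or clash with the / delimiter
  escaped_secret=$(printf '%s\n' "$secret" | sed -e 's:[][\/.^$*]:\\&:g')
  sed_command+="s/$escaped_secret/***REMOVED***/g;"
done < "$SECRETS_FILE"

# Exclude .git so sed never rewrites Git's internal files
total_files=$(find "$DIRECTORY_TO_SEARCH" -name ".git" -prune -o -type f -print | wc -l)

# Process substitution keeps the loop in the current shell, so the counter survives
while IFS= read -r -d '' file
do
  echo "Processing file $((++files_processed)) of $total_files: $file"
  # Apply all substitutions in place
  sed -i -e "$sed_command" "$file"
done < <(find "$DIRECTORY_TO_SEARCH" -name ".git" -prune -o -type f -print0)

echo "Processing completed. $files_processed files processed."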
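After the replacement has run, it is worth confirming that none of the secrets remain in the working tree. This is a minimal sketch using GNU grep; the function name `find_remaining_secrets` is hypothetical, and it assumes the secrets file has no blank lines, since a blank pattern would match every file.

```shell
# Hypothetical helper: list files outside .git that still contain a listed secret
find_remaining_secrets() {
  local search_dir="$1" secrets_file="$2"
  # -r recurse, -l list file names only, -F literal strings, -f read patterns from file
  grep -rlF --exclude-dir=.git -f "$secrets_file" "$search_dir"
}

# Example usage: grep exits non-zero when nothing matches
# if find_remaining_secrets /c/source/secretleaks /c/temp/secrets.txt; then
#   echo "Secrets still present in the files listed above" >&2
# fi
```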

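Example of removing sensitive files by name (SSH key related files, for example). Review the list before running it, as generic names such as config may match unrelated files: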

DIRECTORY_TO_SEARCH="/c/source/secretleaks"
SSH_FILES=("id_rsa" "id_dsa" "id_ecdsa" "id_ed25519" "known_hosts" "authorized_keys" "config" "ca.pem" "key.pem")

for file_name in "${SSH_FILES[@]}"; do
  find "$DIRECTORY_TO_SEARCH" -name ".git" -prune -o -name "$file_name" -type f -print -exec rm -f {} \;
done

Commit back to source:

git add -A && git commit -m "Security cleanup" && git push

You now need to pull down a fresh, full clone of the repo to be fixed, taking a backup in case anything goes wrong:

cd /c/source/
git clone --mirror git@github.com:blakyaks/secretleaks.git
tar -czvf secretleaks.tar.gz secretleaks.git/

We can now use BFG to redact the secrets from the history:

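cd /c/source/
# One secret per line; BFG replaces each match with ***REMOVED*** by default
echo "DUMMY2A93A28E9999936C89221856EC98B8FFC" > /c/source/secrets.txt
echo "1tUm636uS1yOEcfP5pvfqJ/ml36mF7AkyHsEU0IU" >> /c/source/secrets.txt
bfg --replace-text secrets.txt secretleaks.git
cd secretleaks.git
# Expire old references and garbage collect so the old objects are actually dropped
git reflog expire --expire=now --all && git gc --prune=now --aggressive
# A mirror clone pushes every ref, overwriting the remote history
git push
rm /c/source/secrets.txt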
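To confirm the rewrite worked, the history can be searched for each secret before the secrets file is deleted. The following is a minimal sketch using `git log -S`, which looks for commits that add or remove a literal string; the function name `verify_history_clean` is hypothetical.

```shell
# Hypothetical helper: return 0 only if no listed secret appears in any commit
verify_history_clean() {
  local repo_dir="$1" secrets_file="$2"
  local secret found=0
  while IFS= read -r secret; do
    [ -z "$secret" ] && continue
    # -S finds commits that add or remove the literal string; --all covers every ref
    if [ -n "$(git -C "$repo_dir" log --all --oneline -S"$secret")" ]; then
      # Print only a short prefix so the check does not leak the secret again
      echo "Secret still present in history: ${secret:0:4}..." >&2
      found=1
    fi
  done < "$secrets_file"
  return "$found"
}

# Example usage, run before removing secrets.txt:
# verify_history_clean /c/source/secretleaks.git /c/source/secrets.txt && echo "History is clean"
```

Re-running Gitleaks or Trufflehog against the rewritten repository is a useful second check.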

I won’t show an example of removing the SSH keys, as the example documentation is good here.

Conclusion

It is trickier to remove secrets from Git than you might imagine if you haven’t tried it before, so hopefully this document provides some helpful insight into the process. It is of course better to never let the secrets in there in the first place, so good Git hygiene and CI practices should always be adhered to.

Stuart Anderson

Chief Technology Officer
