Secrets Patterns DB: Building Open-Source Regex Database for Secret Detection


Ensuring the security of your organization’s sensitive information is critical for any security team, and detecting secrets within your AppSec program is crucial to this effort. However, even if you have implemented advanced security controls, your program may still be at risk if passwords and API keys are committed to GitHub and subsequently exposed in a production environment, whether through a live web application, a compiled mobile app, or minified JavaScript code. To mitigate this risk, it is essential to properly secure and manage these secrets at every stage of the development process.

Detecting secrets is possible and can be automated. There are open-source tools for it that do an excellent job of analyzing the Git tree for potential secrets through two approaches:

Regular Expressions

A dataset of regular expression (regex) rules that match known patterns of passwords, API keys, API tokens, and cloud API keys.
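As a rough illustration of the regex approach, the sketch below scans text line by line against a tiny rule set. The two rules and the `scan_text` helper are hypothetical examples written for this post, not rules taken from any particular tool.

```python
import re

# Hypothetical rule set for illustration; real rule databases are far larger.
RULES = {
    "AWS Access Key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "Private Key Header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name, matched_text) for every rule hit."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in RULES.items():
            for match in pattern.finditer(line):
                findings.append((line_number, rule_name, match.group(0)))
    return findings


# The key below is the placeholder value used in AWS documentation.
sample = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
for finding in scan_text(sample):
    print(finding)
```

A real scanner would walk the Git history rather than a single string, but the matching step looks much the same.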

Pros

Cons

Shannon’s Entropy Checks

Shannon’s Entropy estimates the average amount of information stored in a value, which makes it a measure of how unpredictable that value is: random-looking strings score high, while repetitive or natural-language strings score low. It has a variety of use cases in Computer Science, including data compression, validating cryptography, and, here, finding passwords and secrets.
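To make the idea concrete, here is a minimal sketch that computes Shannon entropy per character, H = -Σ p·log2(p), where p is the relative frequency of each character in the string. The `looks_random` helper and its threshold of 3.5 bits are assumptions chosen for illustration; each tool picks its own cut-off.

```python
import math
from collections import Counter


def shannon_entropy(value):
    """Shannon entropy of a string, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    length = len(value)
    return -sum(
        (count / length) * math.log2(count / length)
        for count in counts.values()
    )


def looks_random(value, threshold=3.5):
    """Hypothetical check: flag strings whose entropy exceeds the threshold."""
    return shannon_entropy(value) > threshold


for candidate in ("aaaaaaaa", "password", "AKIAIOSFODNN7EXAMPLE"):
    print(candidate, round(shannon_entropy(candidate), 2), looks_random(candidate))
```

A string made of one repeated character scores 0 bits, while a string whose characters are all different scores log2 of its length, which is the maximum for that length.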

Pros

Cons

We’re doing Regex scanning wrong. Let’s fix this together

While several open-source tools utilize regular expressions to detect secrets in codebases, the number of built-in rules for these tools is limited. TruffleHog v2 offers approximately 40 rules, TruffleHog v3 offers around 790 patterns, and GitLeaks offers about 60 rules. While it’s a good start, more is needed.

This project was initially done before TruffleHog v3 was released. At that time, the largest rules database was GitLeaks, with 60 rules available. TruffleHog v3 helped a lot in collecting a large dataset, but its rules are still not in a format that other tools can ingest, since the new detector format places each detection rule in its own Golang module. This means we would have to use TruffleHog v3 itself if we wanted to use its detection rules.

I have compiled and curated a database of regular expression patterns for secrets, API tokens, keys, and passwords to improve the detection of secrets in codebases. This project I built, Secrets-Patterns-DB, contains over 1600 patterns and is open-sourced in the hope that security teams will contribute to and improve it.

To ensure the quality and effectiveness of these patterns, I have written scripts to validate them against ReDoS attacks and created CI jobs to load and validate the patterns. I have also manually cleaned up any invalid patterns.
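The kind of check a CI job could run can be sketched as follows. This is not the project's actual validation script: the entry shape (a `name` and a `regex` per pattern) is an assumption, and the nested-quantifier test is a deliberately crude heuristic for ReDoS-prone patterns that a dedicated analyzer would do far better.

```python
import re

# Crude heuristic: a group ending in + or * that is itself repeated,
# e.g. (a+)+ or (\d*)*. It misses many ReDoS shapes and is only a sketch.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*]\)[+*]")


def validate_patterns(patterns):
    """Split patterns into valid names and (name, reason) rejections."""
    valid, rejected = [], []
    for entry in patterns:
        name, regex = entry["name"], entry["regex"]
        try:
            re.compile(regex)
        except re.error as exc:
            rejected.append((name, f"does not compile: {exc}"))
            continue
        if NESTED_QUANTIFIER.search(regex):
            rejected.append((name, "nested quantifier, possible ReDoS"))
            continue
        valid.append(name)
    return valid, rejected


# Hypothetical entries standing in for a loaded rules file.
candidates = [
    {"name": "ok", "regex": r"AKIA[0-9A-Z]{16}"},
    {"name": "broken", "regex": r"([a-z"},
    {"name": "redos", "regex": r"(a+)+$"},
]
valid, rejected = validate_patterns(candidates)
print("valid:", valid)
for name, reason in rejected:
    print("rejected:", name, "-", reason)
```

Note that Python's `re` module does not accept exactly the same syntax as the regex engines used by other scanners, so a pattern that compiles here is not guaranteed to load everywhere.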

I encourage security teams to use and contribute to Secrets-Patterns-DB to enhance the security of their codebases.

The project is in Beta. There’s a lot of room for improvement on the project. I look forward to your Pull Requests and Issues on GitHub to enhance Secrets-Patterns-DB for everyone.

Unified Pattern Format for all tools

Secrets-Patterns-DB uses a unified pattern format that can be converted for the tool of your choice. If you use TruffleHog, GitLeaks, or other tools in your organization, Secrets-Patterns-DB can be exported to the format your tool supports.

For TruffleHog v2

$> ./convert-rules.py ./db/rules.yml trufflehog

For Gitleaks

$> ./convert-rules.py ./db/rules.yml gitleaks

And then, you can use the output rules with your tool.
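To give a feel for what such a conversion involves, here is a minimal sketch that turns a list of patterns into a JSON object mapping rule names to regex strings, which is the shape TruffleHog v2 accepts for custom rules. It is not the code inside `convert-rules.py`; the input entry shape (`name` and `regex` keys) and the `to_trufflehog_v2` function are assumptions made for illustration.

```python
import json


def to_trufflehog_v2(patterns):
    """Serialize patterns as a JSON object of {rule name: regex string}."""
    return json.dumps(
        {entry["name"]: entry["regex"] for entry in patterns},
        indent=2,
    )


# Hypothetical entries standing in for a loaded rules file.
patterns = [
    {"name": "AWS Access Key ID", "regex": r"(?:AKIA|ASIA)[0-9A-Z]{16}"},
    {"name": "Private Key Header", "regex": r"-----BEGIN RSA PRIVATE KEY-----"},
]
print(to_trufflehog_v2(patterns))
```

Exporters for other tools would follow the same idea, with only the output layout changing.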

Project: github/mazen160/secrets-patterns-db

License

This project is licensed under a Creative Commons license. If you’re building a tool or a product that uses Secrets-Patterns-DB, you should explicitly reference Secrets-Patterns-DB.

Mazin Ahmed
