Secret Scanning with Trufflehog

In today’s fast-paced software development environment, sensitive information may inadvertently be committed to your Git repositories and hidden within the commit history. This is a common issue that presents a significant security risk. With countless lines of code being continuously written and modified, it is easy for sensitive data such as API keys, passwords, and confidential access tokens to unintentionally be included in commits, potentially exposing your project to security breaches.

These oversights compromise not only data security but also the integrity of the entire project. To address this, Trufflehog offers a robust solution. It is capable of scanning the entire commit history to identify secrets that should not have been committed, using advanced pattern-matching techniques. By leveraging this tool, development and security teams can proactively detect and safeguard sensitive information, ensuring the security and integrity of the codebase.

Data Security Posture Management

The image below illustrates a model of Data Security Posture Management (DSPM), a comprehensive approach to maintaining and improving security measures to protect data assets. DSPM consists of four key components: Data, Security, Posture, and Management. Each element contributes to securing sensitive information, such as secrets, within an organization.

What is DSPM?

Data for Data Discovery and Classification Solutions: Understanding what data exists and where it resides is essential to preventing inadvertent exposure.

Security for Data Security Solutions: Implementing strong security controls and policies helps protect data from unauthorized access or exposure.

Posture for Solutions Showing Risk Posture for Data: Regular monitoring and analysis of security posture allow for a rapid response to vulnerabilities or misconfigurations.

Management for Solutions to Remediate Data Security Gaps: Proper governance ensures data security is aligned with organizational goals and industry standards.

Each component of DSPM is crucial, and they work in synergy. Secret scanning tools like Trufflehog operate primarily in the ‘Data’ component by discovering data (secrets). However, the insights they provide feed into the ‘Security’, ‘Posture’, and ‘Management’ components, enabling organizations to maintain a robust security posture and manage their data security more effectively. By analyzing the flow from one component to the next, organizations can create a dynamic and responsive data security strategy that not only reacts to current threats but also proactively prepares for future challenges.

Trufflehog Secrets Scanner

Trufflehog offers a versatile solution for keeping your code’s secrets under wraps, available both as an open-source tool and as an enterprise solution. The open-source version is perfect for those who prefer accessible, community-driven tools, capable of scanning common platforms like GitHub and Docker. For larger organizations, the enterprise edition expands its reach to scan across 17 different sources, from Slack and GitLab to Azure DevOps and CircleCI, ensuring comprehensive coverage.

What’s impressive is Trufflehog’s arsenal of over 750 detectors, which efficiently identify anything that should remain confidential. It also seamlessly integrates with your GitHub workflow, automating the security checks with each new commit, making it easier to maintain a secure codebase.

Perhaps one of Trufflehog’s standout features is its customizability. You’re not just stuck with the standard settings; you can craft custom detectors and even set up a validation server to cross-check against your internal systems. This ensures that any secrets detected are not just false alarms but actual concerns that need attention.

Next, we’ll dive deeper into the custom detectors and validation server, as they’re key to tailoring Trufflehog to your organization’s specific needs. It’s this level of customization that elevates Trufflehog from a good security tool to a great one, empowering teams to protect their projects proactively.

Custom Detector and Validation Server

The accompanying diagram provides a visual breakdown of how Trufflehog incorporates custom detectors and utilizes a validation server to authenticate the secrets it uncovers, ensuring they are current and active. As we walk through our case study, we’ll dissect each segment of this process to fully grasp how Trufflehog operates to secure your digital environment effectively.

Ideas For Handling Detected Secrets and Preventative Measures

Upon the detection of a secret within our systems, prompt notification plays a pivotal role. The diagram shows how we can amplify this alert through various channels: Slack notifications for real-time team awareness, email alerts for detailed follow-up, and Jira tickets to kick-start the remediation process. These proactive alerts are essential to boosting our DSPM’s overall defensive stance.

To effectively implement preventative measures and safeguard against potential code leaks, we recommend several proactive strategies. Firstly, it’s crucial to invest in developer education; this involves training on the secure handling of secrets within the codebase and instilling a quick-response mindset for identifying and addressing any leaks that do occur. Secondly, incorporating pre-commit hooks into tools like Git can serve as an automated defense mechanism, scanning and flagging sensitive data before it is even committed, thus stopping secrets at the source. Finally, setting up workflow actions adds another critical layer of automated checks that run with every new push or pull request, enhancing overall security. While these are our suggested strategies, it’s important to note that each organization may find additional or alternative measures that align better with their specific security needs and reduce the risk of sensitive data exposure.
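As an illustration of the pre-commit idea, the sketch below shows what a hypothetical Git hook written in Python might look like. The secret format (a Bearer token that starts with "v" and is 32 characters long) and the function names are assumptions made for this example. It is a simple regex check on staged lines, not a replacement for running a full scanner from the hook.

```python
import re
import subprocess
import sys

# Assumed secret format for illustration: "Bearer " plus a 32-character token starting with "v"
SECRET_PATTERN = re.compile(r"Bearer\s+v\S{31}")


def find_secrets(diff_text):
    """Return added lines from a unified diff that look like they contain a secret."""
    hits = []
    for line in diff_text.splitlines():
        # Only inspect added lines, skipping the "+++ b/file" header
        if line.startswith("+") and not line.startswith("+++"):
            if SECRET_PATTERN.search(line):
                hits.append(line[1:].strip())
    return hits


def main():
    """Scan the staged changes and return a non-zero status if a secret is found."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(diff)
    if hits:
        print("Possible secret in staged changes; commit blocked:", file=sys.stderr)
        for hit in hits:
            print("  " + hit, file=sys.stderr)
        return 1  # a non-zero exit status makes Git abort the commit
    return 0
```

To use it as a hook, the file would be saved as .git/hooks/pre-commit, made executable, given a Python shebang line, and ended with a call such as sys.exit(main()).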

Case Study

The theoretical groundwork has been laid, and now it’s time to dive into a practical case study. Let’s examine the setup at TachTech, a company with an internal portal containing sensitive employee data, accessible only to HR and management. The source code for this portal is maintained on GitHub. Concerned about potential exposure of confidential information, the TachTech security team initiates a source code review to detect any embedded secrets.

Here lies the crux of the challenge: the portal is internal, and the secret patterns unique to TachTech aren’t recognized by Trufflehog’s built-in detectors. To address this, the security team uses Trufflehog’s support for custom detectors and a validation server. This approach tailors the scan to pinpoint the specific type of secret that may be lurking in the repository. They craft a config.yaml file, configuring it to search for patterns indicative of authentication tokens, as follows:

detectors:
  - name: auth_token
    keywords:
      - Bearer
    regex:
      token: 'v[\w\W]{31}'
    verify:
      - endpoint: https://trufflehog.tachtech.net/portal

Now, let’s break down what each line in the config.yaml file signifies:

detectors - This key signifies the start of custom detector definitions.
name - Each detector has a name, which is used in the scanning logs to identify which detector found a secret.
keywords - This array lists specific words that tend to appear alongside secrets, giving Trufflehog a signpost for where to apply the regex.
regex - The regex pattern defined here is what Trufflehog will use to recognize the format of a secret.
token - This is the name given to the regex within the detector. The matched secret is attached under this key for further processing, which is how the validation server later looks it up.
verify - Points to a validation endpoint where the detected secrets are sent to confirm their activity and validity.
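To sanity-check a detector like this before running a full scan, the keyword and regex can be tried against sample text locally. The following is a minimal sketch, not Trufflehog’s actual matching engine. It assumes text is only examined when a keyword appears in it (matched case-insensitively here, which is also an assumption), and the helper name find_candidates and the sample string are hypothetical.

```python
import re

# Values copied from config.yaml above
KEYWORDS = ["Bearer"]
TOKEN_REGEX = re.compile(r"v[\w\W]{31}")


def find_candidates(text):
    """Return regex matches, but only if the text mentions a keyword."""
    # Keyword pre-filter (case-insensitive here, which is an assumption)
    lowered = text.lower()
    if not any(keyword.lower() in lowered for keyword in KEYWORDS):
        return []
    return TOKEN_REGEX.findall(text)


if __name__ == "__main__":
    # Hypothetical leaked header: 'v' followed by 31 more characters
    sample = 'headers = {"Authorization": "Bearer v' + "a1B2" * 7 + 'xyz"}'
    print(find_candidates(sample))
```

Note that [\w\W] matches any character, including whitespace and newlines, so this pattern is deliberately broad. A narrower character class may cut down on false positives if the token alphabet is known.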

Next, let’s review the source code of the validation server, which runs on a separate machine.

import json
import requests
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging

# Define an HTTP request handler class that processes verification of secrets
class SecretVerifier(BaseHTTPRequestHandler):
    # Endpoint for the authorization server
    AUTH_ENDPOINT = "https://portal.tachtech.net/api/v1/users"

    # Handle GET requests with an error since they are not supported
    def do_GET(self):
        self.respond_with_error(405, 'Method Not Allowed')

    # Handle POST requests to perform secret verification
    def do_POST(self):
        # Treat a missing Content-Length as an empty body so the client gets a 400 instead of no reply
        content_length = self.headers.get('Content-Length') or 0
        self.verify_secret(content_length)

    # Verifies the secret received in the request body
    def verify_secret(self, content_length):
        try:
            # Parse the JSON body of the request
            body = json.loads(self.rfile.read(int(content_length)))
            # Retrieve the token from the parsed JSON
            token = body.get('auth_token', {}).get('token')
            # If token is missing, respond with an error
            if not token:
                self.respond_with_error(400, 'Bad Request: Token not found')
                return

            # Check the last match against the authorization server (assumes 'token' holds a list of matches)
            if self.is_token_valid(token[-1]):
                self.respond_with_success()
                logging.info("Valid secret detected: %s", token)
            else:
                self.respond_with_error(404, 'Not Found: Secret is invalid')
        # Handle any JSON parsing errors
        except json.JSONDecodeError:
            self.respond_with_error(400, 'Bad Request: Invalid JSON')
        # Handle any other exceptions
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            self.respond_with_error(500, 'Internal Server Error')

    # Sends a request to the authorization server to check if the token is valid
    def is_token_valid(self, token):
        # The timeout keeps a slow authorization server from blocking this handler indefinitely
        response = requests.get(self.AUTH_ENDPOINT, headers={"Authorization": f"Bearer {token}"}, timeout=10)
        return response.status_code == 200

    # Sends a 200 OK response with a success message
    def respond_with_success(self):
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({'message': 'Secret is valid'}).encode())

    # Sends an error response with the specified status code and message
    def respond_with_error(self, code, message):
        self.send_response(code)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps({'error': message}).encode())

# Entry point for the server application
if __name__ == "__main__":
    server_address = ('', 8000)
    # Setup the HTTP server with our handler class
    with HTTPServer(server_address, SecretVerifier) as server:
        # Configure logging to output information to the console
        logging.basicConfig(level=logging.INFO)
        logging.info("Starting verification server...")
        try:
            # Run the server indefinitely until a keyboard interrupt
            server.serve_forever()
        except KeyboardInterrupt:
            # Handle keyboard interrupt to gracefully shutdown the server
            logging.info("Stopping server...")
            server.server_close()
            logging.info("Server stopped successfully.")

Note how, in line 27 of the code (inside verify_secret), the validation server retrieves the detected secret from the request body. To extract it, the server first uses the detector’s name, auth_token, and then the regex name, token, to pull the value from the JSON payload. This ensures the server processes the correct piece of information for validation.
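To make the payload shape concrete, the sketch below builds a request body in the form the validation server above expects and pulls the secret back out. The exact format is an assumption inferred from the server code: detector name, then regex name, then a list of matches (which would explain why the server indexes with [-1]). The helper names and the sample token are made up for illustration.

```python
import json

DETECTOR_NAME = "auth_token"  # 'name' in config.yaml
REGEX_NAME = "token"          # key under 'regex' in config.yaml


def build_payload(matches):
    """Build a JSON body shaped like the one the validation server parses (assumed format)."""
    return json.dumps({DETECTOR_NAME: {REGEX_NAME: matches}}).encode()


def extract_secret(raw_body):
    """Mirror the server's lookup: detector name, then regex name, then the last match."""
    body = json.loads(raw_body)
    matches = body.get(DETECTOR_NAME, {}).get(REGEX_NAME)
    return matches[-1] if matches else None


if __name__ == "__main__":
    fake_token = "v" + "0" * 31  # made-up value, not a real secret
    print(extract_secret(build_payload([fake_token])))
```

A body built this way can also be POSTed to the validation server on port 8000 to exercise it before pointing Trufflehog at the endpoint.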

The final piece of code crafted by the security team is set to run a daily scan using a cron job, which alerts the security operations team via a Slack channel if any secrets are detected. This ensures proactive monitoring and immediate response to any security vulnerabilities in the repository.

#!/bin/bash
# Run Trufflehog scan on the portal.git repository and send notifications
trufflehog git https://github.com/org/repo.git --config=config.yaml --json > scan_results.json

# Extract results and send to Slack
python send_results_to_slack.py \
    --scan_results scan_results.json \
    --channel '#security-alerts' \
    --message 'Daily Secret Scanning Report' \
    --file scan_results.json

This script is ready to be scheduled as a cron job to run daily, ensuring that any secrets leaked in the source code are promptly detected and reported. The use of the --config=config.yaml parameter in the Trufflehog command is crucial as it specifies the custom detector settings that the scan should utilize.
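The send_results_to_slack.py helper is not shown in this article. A minimal sketch of what such a script might contain follows. It assumes the --json output is one JSON object per line with DetectorName and Verified fields, and that alerts go to a Slack incoming webhook; the function names are hypothetical, and the command-line flag parsing from the shell script above is left out for brevity.

```python
import json
import urllib.request


def load_findings(path):
    """Read a results file, assumed to hold one JSON object per line."""
    findings = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip anything that is not a JSON record
            # Keep only records that look like findings (field name is an assumption)
            if isinstance(record, dict) and "DetectorName" in record:
                findings.append(record)
    return findings


def build_message(title, findings):
    """Summarize findings without echoing the secret values themselves."""
    verified = sum(1 for finding in findings if finding.get("Verified"))
    lines = [f"{title}: {len(findings)} finding(s), {verified} verified"]
    for finding in findings:
        status = "verified" if finding.get("Verified") else "unverified"
        lines.append(f"- {finding['DetectorName']} ({status})")
    return "\n".join(lines)


def post_to_slack(webhook_url, text):
    """Send the summary to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In use, the script would call load_findings on scan_results.json, pass the result to build_message along with the report title, and hand the text to post_to_slack together with the webhook URL, which is best read from an environment variable rather than hard-coded. Leaving the raw secret out of the message avoids copying it into yet another system.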

Upon initiating the scan, the security team successfully identified an active secret in the portal.git repository. A notification was immediately sent to the relevant Slack channel, ensuring that the team could quickly address the security concern.

Slack Notification

Conclusion

In today’s digital landscape, securing sensitive information is crucial, especially when it may be exposed in public or private code repositories. For instance, in July 2023, hackers exploited a compromised Personal Access Token to inject malicious code into hundreds of GitHub projects, disguising it as Dependabot contributions¹. This highlights the importance of tools like Trufflehog, which enhance security by scanning for secrets that could lead to breaches. Integrating such tools into your security practices enables your team to proactively identify and address vulnerabilities. These tools detect and validate secrets, ensuring relevant alerts. As cybersecurity threats evolve, secret scanning tools provide essential protection, reinforcing your data security with every scan.

Armen
GitHub: @armartirosyan
Twitter: @armartirosyan

¹ GitHub Repositories Hit by Password-Stealing Commits Disguised as Dependabot Contributions