Preventing Sensitive Data Leaks with Gitleaks in GIS Projects

In the previous article, 'Apple's Source Code Leak Incident: Key Takeaways for GIS Frontend Development', the author discussed the so-called "leak" of Apple's front-end source code and highlighted several serious security risks in current front-end development. Beyond improving developers' skills, is there a more reliable mechanism for detecting these risks? And as a team leader, how can you avoid such problems as far as possible? The author believes that, besides proper build configuration and hardening of the production environment, a team needs a mechanism that continuously checks for sensitive information committed to its repositories. Gitleaks is such a tool.

What is Gitleaks?

Gitleaks is an open-source tool that scans Git repositories (including commit history) or plain directories and files for hardcoded sensitive information such as passwords, API keys, tokens, and other credentials. It supports several scanning modes (git, dir, and stdin), as well as custom rules, ignore rules, and baseline reports. It can be installed in several ways, including Homebrew (macOS), Docker images, and building from the Go source. The project has an active community and is widely adopted, with nearly 24k stars on GitHub.

In short: if credentials, tokens, or keys could be accidentally committed to your project or left behind in its history (in front-end, back-end, DevOps, or CI/CD code alike), Gitleaks adds a significant layer of assurance.
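To make the "ignore rules" above concrete: gitleaks skips findings whose fingerprints are listed in a `.gitleaksignore` file, and it skips any line carrying a `gitleaks:allow` comment. The sketch below is a hypothetical helper (`allow_fingerprint` is our own name, not part of gitleaks) that appends a reviewed false positive to `.gitleaksignore` without duplicating it; the fingerprint is assumed to be copied from the `Fingerprint` field of a gitleaks report.

```shell
#!/bin/sh
# Hypothetical helper (not part of gitleaks): record a reviewed false positive
# in .gitleaksignore. Fingerprints come from the "Fingerprint" field of a
# gitleaks report and look like:
#   <commit>:<file>:<rule-id>:<line>   (git mode)
#   <file>:<rule-id>:<line>            (dir mode)
allow_fingerprint() {
  local fp="$1"
  local ignore_file="${2:-.gitleaksignore}"
  if [ -z "$fp" ]; then
    echo "usage: allow_fingerprint FINGERPRINT [IGNORE_FILE]" >&2
    return 2
  fi
  touch "$ignore_file"
  # -x: match whole lines; -F: fixed string, since fingerprints contain ':' '/' '.'
  if ! grep -qxF -- "$fp" "$ignore_file"; then
    printf '%s\n' "$fp" >> "$ignore_file"
  fi
}

# The other ignore mechanism is inline: gitleaks skips a line that carries a
# gitleaks:allow comment, e.g. in a demo-only map config:
#   const DEMO_TOKEN = "demo-placeholder"; // gitleaks:allow
```

Only allowlist findings that someone has actually reviewed (for example a placeholder token in sample code); a real key should be revoked and rotated instead.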

Official website: https://gitleaks.io/

GitHub repository: https://github.com/gitleaks/gitleaks

Installation and Usage

Homebrew (macOS):

```
brew install gitleaks
```

Docker:

```
docker pull ghcr.io/gitleaks/gitleaks:latest
docker run -v ${host_folder}:/path ghcr.io/gitleaks/gitleaks:latest [COMMAND] [OPTIONS] /path
```

Building from source:

```
git clone https://github.com/gitleaks/gitleaks.git
cd gitleaks
make build
```

Basic Usage

Gitleaks offers several scanning modes, mainly the following:

git mode: scans a Git repository, including the changes recorded in its commit history.
```
gitleaks git -v path_to_repo
```
dir (or directory, files) mode: scans the specified directories or files.
```
gitleaks dir -v path_to_directory_or_file
```
stdin mode: reads input from standard input and scans it.
```
cat some_file | gitleaks -v stdin
```

The author uses the first mode, git.

Implementation Suggestions (Combined with GIS Projects)

From hands-on exploration, the author has distilled the following lessons:

Introduce Gitleaks early in development: Do not wait until deployment, or until a problem has already surfaced, to start scanning. Integrate Gitleaks into CI or pre-commit checks from the early stages of the project. The author runs Gitleaks through Husky.
Define what counts as sensitive information: Based on your business (e.g., map tokens, map platform credentials, IoT device API keys, database connection strings, and third-party service keys), compile a list of what constitutes sensitive information, then write custom rules for each item in the Gitleaks configuration.
Scan existing repositories: Run a comprehensive scan, including commit history, on existing repositories, especially long-lived ones whose old commits may hide problems. If sensitive information is found, rotate the affected keys or clean the Git history.
Clarify operational procedures: Establish an emergency process for when sensitive information is detected: revoke the key, review credentials, refactor the code or configuration, clean the repository history (e.g., with git filter-repo or BFG), and keep an audit record. The author currently scans for sensitive information with a runner on a private GitLab instance; if anything is reported, merging into the main branch is blocked.
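For the first recommendation, a pre-commit hook is the earliest point at which a secret can be caught. The sketch below assumes Husky v9, where a hook is a plain shell file under `.husky/` (for example after `npx husky init`, whose generated hook this replaces), and uses the `git --pre-commit --staged` flags of recent gitleaks releases; older releases without the `git`/`dir`/`stdin` subcommands used `gitleaks protect --staged` instead, so adjust to your versions.

```shell
#!/bin/sh
# Sketch: install a Husky (v9-style) pre-commit hook that runs gitleaks on
# staged changes only. Run from the repository root.
mkdir -p .husky
cat > .husky/pre-commit <<'EOF'
# Fail with a clear message if this machine has no gitleaks binary
if ! command -v gitleaks >/dev/null 2>&1; then
  echo "pre-commit: gitleaks not found, see https://github.com/gitleaks/gitleaks" >&2
  exit 1
fi
# Scan only what is staged for this commit. --redact keeps secret values out of
# the terminal; any finding makes gitleaks exit non-zero, which aborts the commit.
gitleaks git --pre-commit --redact --staged --verbose
EOF
chmod +x .husky/pre-commit
```

Failing when gitleaks is missing is a deliberate choice here: skipping silently would let an unequipped machine bypass the check. Hooks can still be skipped with `git commit --no-verify`, so the hook complements rather than replaces the server-side check in the last recommendation.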
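For the second recommendation, project-specific rules go into a `.gitleaks.toml`, which gitleaks picks up from the root of the scan target (or via `--config`). The sketch below keeps the built-in rules through `[extend] useDefault = true` and adds two illustrative rules. The rule IDs, the regexes, and the Tianditu-style `tk=` key format (assumed here to be 32 hex characters) are our own assumptions to adapt, not official gitleaks rules.

```shell
#!/bin/sh
# Sketch: write a project-level .gitleaks.toml at the repository root.
# Rule IDs and regexes below are illustrative assumptions, not built-in rules.
cat > .gitleaks.toml <<'EOF'
title = "GIS project gitleaks config"

# Keep all of gitleaks' built-in rules and add ours on top
[extend]
useDefault = true

# Assumption: Tianditu-style map key passed as tk=<32 hex chars> in tile URLs
[[rules]]
id = "tianditu-tk"
description = "Tianditu map service key in a URL or config"
regex = '''(?i)[?&]tk=([0-9a-f]{32})'''
secretGroup = 1
keywords = ["tk="]

# Assumption: PostgreSQL/PostGIS connection URL with an inline password
[[rules]]
id = "postgis-connection-password"
description = "Password embedded in a PostgreSQL/PostGIS connection URL"
regex = '''postgres(?:ql)?://[^:\s/]+:([^@\s]+)@'''
secretGroup = 1
keywords = ["postgres"]
EOF
```

With the file at the repository root, `gitleaks git -v .` picks it up automatically; when scanning a subdirectory, pass it explicitly, e.g. `gitleaks dir -v --config .gitleaks.toml src/`. Test each new rule against a known sample before relying on it, since gitleaks uses Go (RE2) regular expressions.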
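For the third recommendation, one option is to audit the full history once and then reuse that report as a baseline, so that later scans flag only findings introduced after the audit while the old ones are being cleaned up. `audit_history` and `scan_new_only` are hypothetical helper names and the report file names are our own choice; `--report-format`, `--report-path`, and `--baseline-path` are gitleaks options, and gitleaks exits non-zero when it reports leaks (or hits an error).

```shell
#!/bin/sh
# Sketch (hypothetical helper names): audit the full history once, then use
# that report as a baseline so later scans only flag NEW findings.
# Note: this report is not redacted and stores matched secrets, so keep it
# out of the repository (for example as a protected CI artifact).

audit_history() {
  local repo="${1:-.}"
  local report="${2:-gitleaks-history.json}"
  local status=0
  # git mode walks the commit history (git log -p), not just the working tree
  gitleaks git --report-format json --report-path "$report" "$repo" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "gitleaks exited $status; review $report, rotate exposed keys, then clean history" >&2
  fi
  return "$status"
}

scan_new_only() {
  local repo="${1:-.}"
  local baseline="${2:-gitleaks-history.json}"
  local status=0
  # Findings already listed in the baseline report are not reported again
  gitleaks git --baseline-path "$baseline" --report-format json \
    --report-path gitleaks-new.json "$repo" || status=$?
  return "$status"
}
```

A baseline postpones old findings rather than resolving them: keys found by the audit should still go through the revocation and history-cleaning steps above.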
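For the last recommendation, the merge gate can be a small script run by a job in `.gitlab-ci.yml`: a non-zero exit from gitleaks fails the job, and with a merge check such as "Pipelines must succeed" enabled, the merge request cannot be merged. The script path, function name, and report name below are hypothetical. In merge request pipelines the sketch narrows the scan to the MR's commits using GitLab's predefined `CI_MERGE_REQUEST_DIFF_BASE_SHA` variable and gitleaks' `--log-opts`; this assumes the runner's clone is deep enough to contain that base commit.

```shell
#!/bin/sh
# Hypothetical CI gate (e.g. ci/gitleaks-gate.sh) for a GitLab runner job.
# A non-zero return fails the job, which in turn blocks the merge when the
# project requires pipelines to succeed.
gitleaks_gate() {
  local target="${1:-.}"
  local report="${2:-gitleaks-report.json}"
  local status=0
  # Base arguments: redact secrets in CI logs, JSON report for the job artifact
  set -- git --redact --no-banner --report-format json --report-path "$report"
  # In merge request pipelines, scan only the MR's commits
  # (assumes the clone is deep enough to contain the base commit)
  if [ -n "${CI_MERGE_REQUEST_DIFF_BASE_SHA:-}" ]; then
    set -- "$@" --log-opts="${CI_MERGE_REQUEST_DIFF_BASE_SHA}..HEAD"
  fi
  gitleaks "$@" "$target" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "gitleaks: no leaks found"
  else
    echo "gitleaks: leaks found or scan failed (exit $status), see $report" >&2
  fi
  return "$status"
}
```

A job script could then run `. ci/gitleaks-gate.sh && gitleaks_gate .` and keep `gitleaks-report.json` as an artifact for the follow-up process. Scanning only the MR's commits keeps pipelines fast on large repositories; a periodic full scan still catches anything that predates the gate.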

Summary

In the author's experience, common token leaks are detected easily, but detection is not 100% reliable: some tokens still slip through and need custom rules to catch them. The tool's strength is catching basic security issues at the earliest possible stage and preventing simple mistakes, which matters most when a team is large and its members' skill levels vary. For stronger security guarantees, however, it may not be enough on its own; security has no upper limit. Each team should decide based on its own circumstances.

MalaGIS
Sharing GIS Technologies, Resources and News.