Data Leakage Discovery

Data Leakage Discovery is the cybersecurity use case focused on proactively searching for and identifying an organization's sensitive, proprietary, or confidential data that has been unintentionally exposed or released outside its secured perimeter. This exposure typically results from misconfigurations, human error, or vulnerabilities in external-facing assets. The goal is to discover the leak before threat actors exploit it.

Key examples of leaks ThreatNG seeks to discover include:

Hardcoded credentials (passwords, API keys) in publicly accessible source code.
Misconfigured cloud storage (like an open S3 bucket) containing customer PII or internal documents.
Forgotten development or staging servers with sensitive data accessible through search engines.

How ThreatNG Helps with Data Leakage Discovery

ThreatNG, as an External Attack Surface Management and Digital Risk Protection solution, provides continuous, unauthenticated visibility into the external environment where data leaks reside.

External Discovery

ThreatNG performs purely external, unauthenticated discovery that requires no connectors. This is the foundation for Data Leakage Discovery because it maps the assets that could be leaking data, often including assets missed by internal tools ("shadow IT").

Example: It continuously scans the internet to find forgotten or unmanaged assets, such as a staging server with an open directory index or a subdomain used years ago for a promotional campaign, which could be hosting an old, exposed file.
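Spotting an open directory index like the one in this example can be approximated with a simple page heuristic. The sketch below is a minimal illustration, not ThreatNG's implementation: it assumes the page HTML has already been fetched, relies on the "Index of /" title that Apache and nginx use for auto-generated listings, and flags links ending in a hypothetical list of risky file extensions.

```python
import re

# Apache and nginx auto-generated listings use a title such as "Index of /backup".
INDEX_TITLE = re.compile(r"<title>\s*Index of /", re.IGNORECASE)

# Hypothetical list of extensions that often indicate leaked data when
# exposed in a public listing (illustrative, not exhaustive).
RISKY_EXTENSIONS = (".sql", ".bak", ".zip", ".tar.gz", ".env", ".pem", ".csv")

# Simple href extractor; a real crawler would use a proper HTML parser.
HREF = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def looks_like_directory_index(html: str) -> bool:
    """Return True if the page appears to be an auto-generated directory listing."""
    return INDEX_TITLE.search(html) is not None


def risky_links(html: str) -> list[str]:
    """Return linked file names that end with one of the risky extensions."""
    return [href for href in HREF.findall(html) if href.lower().endswith(RISKY_EXTENSIONS)]


if __name__ == "__main__":
    page = (
        "<html><head><title>Index of /backup</title></head><body>"
        '<a href="../">Parent Directory</a>'
        '<a href="db_2019.sql">db_2019.sql</a>'
        '<a href="logo.png">logo.png</a>'
        "</body></html>"
    )
    if looks_like_directory_index(page):
        print("Open directory index; risky files:", risky_links(page))
```

A heuristic like this yields candidates rather than confirmed leaks; each flagged file still needs validation before it is reported.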

External Assessment

ThreatNG's assessments highlight the overall risk profile related to data exposure:

Data Leak Susceptibility: This score is calculated from the data supplied by the investigation modules described below.
- Example: By finding multiple instances of Associated Compromised Credentials linked to employee email domains, ThreatNG flags a high risk of current or imminent data exposure stemming from internal accounts that are already compromised.
Web Application Hijack Susceptibility: This score, substantiated by Domain Intelligence, helps identify misconfigured domains that attackers could use to host fake sites that harvest credentials, which is itself a form of data leak.
- Example: It assesses a web application's exposure and may detect an unsecured login page that is easily discoverable or a domain that lacks proper security configurations, increasing the chance of credential exposure.
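One common signal behind a finding like the second example is a web application that omits standard security response headers. The following is a minimal sketch under that assumption: which headers ThreatNG actually evaluates, and how it weights them, is not stated here, so the header set and the `hijack_risk` thresholds are hypothetical.

```python
import urllib.request

# Hypothetical set of hardening headers to check; the real evaluation
# criteria and weighting are assumptions, not ThreatNG's documented logic.
EXPECTED_HEADERS = {
    "strict-transport-security",
    "content-security-policy",
    "x-frame-options",
    "x-content-type-options",
}


def fetch_headers(url: str, timeout: float = 10.0) -> dict[str, str]:
    """Fetch a URL and return its response headers (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return dict(response.headers.items())


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return expected headers absent from the response, compared case-insensitively."""
    present = {name.lower() for name in headers}
    return sorted(EXPECTED_HEADERS - present)


def hijack_risk(headers: dict[str, str]) -> str:
    """Hypothetical scoring: the more hardening headers missing, the higher the risk."""
    missing = len(missing_security_headers(headers))
    if missing >= 3:
        return "high"
    if missing >= 1:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Static example so the sketch runs offline; use fetch_headers(url) for a live check.
    sample = {"Content-Type": "text/html", "X-Frame-Options": "DENY"}
    print(missing_security_headers(sample), hijack_risk(sample))
```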

Reporting

ThreatNG delivers actionable reports to facilitate rapid remediation of leaks:

Prioritized Report: ThreatNG uses its granular findings to classify data leaks by severity. For example, a database backup file or a private encryption key exposed on an archived web page would be flagged as a Critical risk, complete with the path to the exposed file, so the team can act immediately.
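A severity classifier of the kind described above could start from the type of file exposed. The sketch below assumes a hypothetical rule table keyed on file extensions and keeps the full URL with each finding so the report carries the path to the exposed file; the rules and the `Finding` type are illustrative only, not ThreatNG's scoring logic.

```python
import posixpath
from dataclasses import dataclass
from enum import IntEnum
from urllib.parse import urlsplit


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical rules mapping exposed file types to severity (illustrative only).
RULES = [
    ((".pem", ".key", ".p12"), Severity.CRITICAL),   # private keys and key stores
    ((".sql", ".bak", ".dump"), Severity.CRITICAL),  # database backups
    ((".env", ".config", ".yml"), Severity.HIGH),    # config files that may hold secrets
    ((".csv", ".xlsx", ".pdf"), Severity.MEDIUM),    # documents that may hold PII
]


@dataclass(frozen=True)
class Finding:
    url: str  # full path to the exposed file, carried into the report


def classify(finding: Finding) -> Severity:
    """Assign a severity from the exposed file's name; unknown types default to LOW."""
    name = posixpath.basename(urlsplit(finding.url).path).lower()
    for extensions, severity in RULES:
        if name.endswith(extensions):
            return severity
    return Severity.LOW


def prioritize(findings: list[Finding]) -> list[tuple[Severity, str]]:
    """Return (severity, url) pairs, most severe first (stable for ties)."""
    ranked = [(classify(f), f.url) for f in findings]
    return sorted(ranked, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    report = prioritize([
        Finding("https://example.com/assets/logo.png"),
        Finding("https://web.archive.org/web/2019/https://example.com/backup/db.sql"),
        Finding("https://example.com/files/report.pdf"),
    ])
    for severity, url in report:
        print(f"{severity.name:<8} {url}")
```

Using an `IntEnum` lets severities sort numerically, so the most urgent items rise to the top of the report without extra mapping code.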

Continuous Monitoring

ThreatNG performs continuous monitoring of the external attack surface and digital risk. This is vital because exposed data can appear and disappear rapidly (e.g., a file being accidentally made public for a short time).

Example: ThreatNG continuously monitors the Dark Web and public code repositories. If an employee accidentally pushes a file containing API keys to a public repository at 2 AM, continuous monitoring detects the exposure promptly, enabling the security team to revoke the keys before threat actors can use them.
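Continuous monitoring largely comes down to comparing consecutive scans and alerting on what changed. The following minimal sketch assumes each finding is a dictionary with hypothetical `source`, `location`, and `type` fields; it fingerprints findings with SHA-256 and reports which ones are new and which have disappeared since the previous scan.

```python
import hashlib


def fingerprint(finding: dict[str, str]) -> str:
    """Stable ID for a finding, built from hypothetical source/location/type fields."""
    raw = "|".join((finding["source"], finding["location"], finding["type"]))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def diff_scans(
    previous: list[dict[str, str]], current: list[dict[str, str]]
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Return (new, resolved) findings between two consecutive scans, in scan order."""
    prev_by_id = {fingerprint(f): f for f in previous}
    curr_by_id = {fingerprint(f): f for f in current}
    new = [f for fid, f in curr_by_id.items() if fid not in prev_by_id]
    resolved = [f for fid, f in prev_by_id.items() if fid not in curr_by_id]
    return new, resolved


if __name__ == "__main__":
    yesterday = [{"source": "github", "location": "org/app/settings.py", "type": "aws_access_key_id"}]
    today = yesterday + [{"source": "github", "location": "org/tools/.env", "type": "api_key"}]
    new, resolved = diff_scans(yesterday, today)
    for finding in new:
        # In practice this is where an alert would be raised to the security team.
        print("ALERT new exposure:", finding["location"])
```

Tracking resolved findings as well as new ones also captures exposures that were public only briefly, which is the short-lived case the section highlights.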

Investigation Modules

The Investigation Modules are the core components used to find and validate the leaked data itself:

Sensitive Code Exposure: This module primarily addresses technical leaks.
- Example: It scans code-sharing platforms (like GitHub) and mobile apps to find exposed Access Credentials (e.g., a hardcoded AWS Access Key ID) or Security Credentials (e.g., an RSA Private Key) within code, configuration files, or other documentation.
Archived Web Pages: This module checks for data that was made public in the past and is still accessible via web archives.
- Example: It discovers an archived version of a former employee's Document File (PDF) that contains sensitive, unredacted client PII that should have been securely destroyed. It also checks for the archival of API definitions, JSON Files, and Directories that might expose infrastructure details or data.
Dark Web Presence: This module confirms when a leak has been actively exploited or is being sold.
- Example: It identifies Associated Compromised Credentials for the organization's user base being traded on a dark web forum, confirming that a data leak event (or credential stuffing source) is active.
Technology Stack: Although this module is primarily used for asset tracking, it can also pinpoint misconfigured technologies.
- Example: It identifies the version of Web Servers or Databases being used. If the version is known to have a default password or a specific public configuration that leads to data exposure, the risk is elevated.
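The credential patterns in the Sensitive Code Exposure example can be matched with regular expressions. The sketch below is a minimal illustration, not ThreatNG's detection logic: it covers only AWS access key IDs (an `AKIA` or `ASIA` prefix followed by 16 uppercase alphanumerics) and PEM private-key headers, and it reports the rule and line number rather than the secret itself so the scan results do not re-leak the value.

```python
import re
from typing import NamedTuple

# Illustrative rules only; real secret scanners use many more patterns plus
# entropy checks and validation against the issuing service.
PATTERNS = {
    # AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # PEM header that opens a private key block (RSA, EC, DSA, OpenSSH, or PKCS#8).
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"),
}


class SecretHit(NamedTuple):
    rule: str
    line_no: int  # the secret value itself is deliberately not stored


def scan_text(text: str) -> list[SecretHit]:
    """Scan file contents line by line and report which rules matched where."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append(SecretHit(rule, line_no))
    return hits


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\n-----BEGIN RSA PRIVATE KEY-----\n'
    for hit in scan_text(sample):
        print(f"{hit.rule} on line {hit.line_no}")
```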

Intelligence Repositories (DarCache)

The Intelligence Repositories provide the context and raw data needed to identify mass leaks:

Compromised Credentials (DarCache Rupture): A massive database of leaked credentials used to cross-reference against employee and customer identities, immediately identifying high-risk, leaked accounts.
Dark Web (DarCache Dark Web): A continuous feed of intelligence from closed sources and forums, often containing early mentions or samples of leaked corporate data or trade secrets being offered for sale.
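Cross-referencing a breach corpus against an organization's identities, as described for DarCache Rupture, can be sketched as a domain and identity match. This is a minimal illustration under assumed inputs: `LeakedRecord`, its fields, and the two risk tiers are hypothetical, and the records deliberately carry no passwords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeakedRecord:
    email: str
    source: str  # hypothetical field: name of the breach or dump the record came from


def normalize(email: str) -> str:
    return email.strip().lower()


def match_org_accounts(
    records: list[LeakedRecord], org_domains: set[str], employees: set[str]
) -> dict[str, list[LeakedRecord]]:
    """Sort leaked records for one organization into hypothetical risk tiers."""
    domains = {d.lower() for d in org_domains}
    known = {normalize(e) for e in employees}
    tiers: dict[str, list[LeakedRecord]] = {"current_employee": [], "other_org_account": []}
    for record in records:
        email = normalize(record.email)
        domain = email.rpartition("@")[2]
        if domain not in domains:
            continue  # not this organization's identity
        # Unknown addresses on the org domain may be former staff or shared mailboxes.
        tier = "current_employee" if email in known else "other_org_account"
        tiers[tier].append(record)
    return tiers


if __name__ == "__main__":
    leaked = [
        LeakedRecord("Alice@Example.com", "breach-a"),
        LeakedRecord("old.intern@example.com", "breach-b"),
        LeakedRecord("bob@other.org", "breach-a"),
    ]
    result = match_org_accounts(leaked, {"example.com"}, {"alice@example.com"})
    for tier, matches in result.items():
        print(tier, [r.email for r in matches])
```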

Working with Complementary Solutions

ThreatNG's external, high-fidelity intelligence is crucial for triggering automated containment and response in complementary solutions.

Data Loss Prevention (DLP) Tools: For example, ThreatNG's Sensitive Code Exposure module identifies a proprietary algorithm's source code and an employee's name in a public repository. This highly actionable intelligence is sent to the complementary DLP tool, which primarily operates inside the network. The DLP tool can use this external evidence to identify the specific internal workstation or process responsible for the exfiltration and then automatically block the user from copying, printing, or transferring further sensitive files, stopping an ongoing leak.
Cloud Security Posture Management (CSPM) Tools: ThreatNG's External Discovery and Archived Web Pages modules identify publicly accessible Subdomains hosting outdated, misconfigured cloud storage links (e.g., publicly open Azure Blob Storage links). This external exposure finding is sent to the complementary CSPM tool. The CSPM tool can immediately use this information to scan its internal configuration baselines, locate the specific misconfigured cloud resource, and then automatically remediate the configuration (e.g., change the access policy to private), instantly closing the data leak path identified externally by ThreatNG.
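To hand an external finding like the CSPM example to another tool, the exposed URL has to be mapped back to a cloud resource. The sketch below assumes the standard Azure Blob Storage endpoint format `https://<account>.blob.core.windows.net/<container>/<blob>` and does not handle custom domains; the JSON event schema and its field names are hypothetical and would need to match whatever the receiving CSPM tool actually accepts.

```python
import json
from typing import Optional
from urllib.parse import urlsplit

AZURE_BLOB_SUFFIX = ".blob.core.windows.net"


def locate_azure_blob(url: str) -> Optional[dict[str, str]]:
    """Map a public Azure Blob URL to account/container/blob; None if not one."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host.endswith(AZURE_BLOB_SUFFIX):
        return None  # custom domains and other providers are not handled here
    account = host[: -len(AZURE_BLOB_SUFFIX)]
    container, _, blob = parts.path.lstrip("/").partition("/")
    if not account or not container:
        return None
    return {"provider": "azure", "account": account, "container": container, "blob": blob}


def to_cspm_event(url: str, discovered_by: str) -> str:
    """Serialize an external finding as JSON using a hypothetical event schema."""
    resource = locate_azure_blob(url)
    if resource is None:
        raise ValueError(f"not a recognized Azure Blob URL: {url}")
    event = {
        "finding": "publicly_accessible_cloud_storage",
        "discovered_by": discovered_by,
        "url": url,
        "resource": resource,
        "recommended_action": "set container access level to private",
    }
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    print(to_cspm_event(
        "https://contosodata.blob.core.windows.net/exports/2021/customers.csv",
        discovered_by="archived_web_pages",
    ))
```

Resolving the URL to a storage account and container gives the CSPM tool a concrete resource identifier to look up in its own inventory, rather than a bare link.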