Identity Harvesting
Identity Harvesting in the context of cybersecurity is a systematic, often automated, process used by attackers to collect and compile pieces of Personally Identifiable Information (PII) and other digital identifiers about a target individual or group. The goal is to aggregate this disparate data into a comprehensive profile that can be used for malicious purposes, primarily identity theft, account takeover, or large-scale fraud.
Defining Identity Harvesting
Unlike simple phishing, which targets one credential at a time, identity harvesting is a broader, sustained data-collection effort. It focuses on gathering information that, when combined, creates a complete digital identity, often including:
Identifiers: Full names, birth dates, social security numbers, and national identification numbers.
Contact Information: Email addresses, phone numbers, and physical addresses.
Authentication Details: Usernames, hashed passwords, answers to security questions, and PINs.
Financial Details: Credit card numbers, bank account numbers, and transaction histories.
Biographic Data: Employment history, family relationships, educational background, and location data.
This process is a form of advanced reconnaissance in which the harvested identity is the key asset sought.
Methods of Identity Harvesting
Attackers use a combination of techniques, often spanning both passive and active methods, to acquire data from multiple vectors:
Passive OSINT and Data Scraping: This involves systematically extracting publicly available information from social media profiles, public records (like real estate or voter registries), forum posts, news articles, and company websites. Automated tools are used to scrape thousands of profiles for common data points such as names, dates of birth, and email addresses.
Data Breaches and Leaks: Attackers buy, trade, or access databases from previous large-scale security incidents. These leaked databases often contain combinations of usernames, passwords, and other PII that form the foundation of the harvested identities.
Phishing and Malware Campaigns: Targeted campaigns are deployed to actively trick victims into submitting PII on fraudulent websites (phishing) or to deploy keystroke loggers and spyware (malware) to steal credentials and data directly from compromised devices.
Insecure APIs and Applications: Attackers exploit poorly secured Application Programming Interfaces (APIs) on websites or mobile apps. These APIs may allow attackers to enter a known identifier (such as an email address) and receive a wealth of associated PII (such as a full name, address, and recent purchases), effectively automating the harvesting process.
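From the defender's side, the API weakness described above can be reduced by returning the same minimal response whether or not the submitted identifier is known. The following is a minimal sketch under stated assumptions, not a reference implementation: the in-memory USERS store, the lookup_account function, and the response fields are all hypothetical.

```python
from typing import Dict

# Hypothetical in-memory user store, used only for illustration.
USERS: Dict[str, Dict[str, str]] = {
    "jane.doe@example.com": {
        "full_name": "Jane Doe",
        "address": "1 Example Street",
    },
}

# One fixed response body, so callers cannot tell known from unknown emails.
GENERIC_RESPONSE = {
    "status": "ok",
    "message": "If this account exists, an email has been sent.",
}


def lookup_account(email: str) -> Dict[str, str]:
    """Return an identical, PII-free response for known and unknown emails."""
    # The lookup may still drive server-side work (for example, queuing a
    # reset email), but its result is never echoed back to the caller.
    _is_known = email.strip().lower() in USERS
    return dict(GENERIC_RESPONSE)
```

Because both branches return the same body, an attacker who submits a list of email addresses learns neither which ones are registered nor any associated PII.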
Significance in Cybersecurity
Identity harvesting is the foundation for some of the most damaging cybercrimes:
Account Takeover (ATO): The harvested data is used to bypass authentication mechanisms across multiple platforms, as people often reuse passwords or give the same answers to security questions.
Synthetic Identity Fraud: Attackers combine real PII with fabricated details to create a new, entirely synthetic identity, often used to open credit accounts and steal funds.
Spear-Phishing and Extortion: The comprehensive profile enables attackers to craft highly personalized, believable social engineering attacks that appear legitimate, increasing the success rate of financial fraud or executive-level extortion.
Insider Access: Harvested professional identity details enable an attacker to masquerade as an employee and gain initial access to corporate networks, either directly or by purchasing additional compromised credentials tied to that identity.
ThreatNG's Role in Preventing Identity Harvesting
ThreatNG is well suited to countering identity harvesting because the technique depends heavily on collecting scattered, publicly exposed digital artifacts, which is exactly what ThreatNG is designed to discover, centralize, and risk-score from an external adversary's perspective. It unifies these disparate external data sources into a complete picture of an organization's identity exposure.
External Discovery
ThreatNG's ability to perform purely external, unauthenticated discovery without connectors is fundamental to combating identity harvesting. It mimics the actions of a harvesting attacker by mapping all digital footprints that contain PII or other identifiers.
Example of ThreatNG Helping: ThreatNG's discovery process identifies NHI Email Exposure by grouping discovered emails associated with critical Non-Human Identities (NHI) such as Admin, Security, System, and Integration. An attacker harvesting identities would target these high-privilege addresses. ThreatNG finds them first, allowing the organization to shield these accounts from external view.
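The grouping idea in this example can be sketched as follows. This is only an illustration of the concept, not ThreatNG's actual logic: the role keywords come from the text above, while the group_nhi_emails function and its substring-matching rule are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Role keywords taken from the examples in the text; the matching rule
# (substring of the local part) is an assumption made for this sketch.
NHI_ROLES = ("admin", "security", "system", "integration")


def group_nhi_emails(emails: Iterable[str]) -> Dict[str, List[str]]:
    """Group addresses whose local part suggests a non-human identity."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for email in emails:
        local_part = email.split("@", 1)[0].lower()
        for role in NHI_ROLES:
            if role in local_part:
                groups[role].append(email)
                break  # assign each address to the first matching role only
    return dict(groups)
```

Addresses that match no role keyword, such as personal mailboxes, are left out of the result, so the output lists only the role-based accounts an organization may want to review first.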
External Assessment
ThreatNG’s security ratings directly quantify the risks stemming from successful identity harvesting, such as the use of stolen credentials and social engineering.
Data Leak Susceptibility Security Rating (A-F): This rating is heavily based on uncovering external digital risks related to Compromised Credentials.
Example in Detail: If an employee's login that follows the company's standard email format is found in breach data on the dark web, ThreatNG treats it as a high Data Leak Susceptibility finding. This identity artifact is a key component of a harvested profile, and its pre-compromised status significantly raises the risk of a successful credential stuffing attack against the organization.
BEC & Phishing Susceptibility Security Rating (A-F): This rating is based on findings like Email Format Guessability and Domain Name Permutations.
Example in Detail: ThreatNG finds that the company's domain name, company.com, has an unregistered permutation, such as c-ompany.com. An identity-harvesting attacker could register this domain to create a lookalike phishing site that tricks employees into surrendering their credentials, which are then harvested. ThreatNG identifies the risk of this permutation, allowing the organization to register the domain defensively and close off that phishing vector.
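To make the idea of domain name permutations concrete, here is a minimal sketch that generates a few common look-alike variants of a domain. The domain_permutations function and its three rules (hyphen insertion, character omission, adjacent swap) are assumptions chosen for illustration; they do not describe how ThreatNG computes permutations, and real tools cover many more patterns.

```python
from typing import Set


def domain_permutations(domain: str) -> Set[str]:
    """Return simple look-alike variants of a two-label domain like company.com."""
    # Assumes the first label is the name and the rest is the suffix.
    name, _, suffix = domain.lower().partition(".")
    variants: Set[str] = set()

    # Hyphen insertion between characters: company -> c-ompany
    for i in range(1, len(name)):
        variants.add(name[:i] + "-" + name[i:])

    # Character omission: company -> cmpany
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])

    # Adjacent character swap: company -> ocmpany
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Drop the original name and any empty label produced above.
    variants.discard(name)
    variants.discard("")
    return {f"{variant}.{suffix}" for variant in variants}
```

A defender could check each generated name against registration data and decide which ones to register or monitor.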
Reporting
The reporting features ensure that harvested PII and associated high-risk exposures are surfaced with the necessary context for immediate action.
Prioritized Reports: These reports classify findings such as an exposed API Key or a leaked username as high risk, ensuring security teams address the most critical identity exposures that enable harvesting and subsequent account takeover.
Inventory Reports: These reports unify all discovered external assets, including Subdomains, Mobile Apps, and Emails, providing a single source to manage all potential identity exposure points.
Continuous Monitoring
Continuous Monitoring of the external attack surface and digital risk is essential because identity harvesting is an ongoing process.
Example of ThreatNG Helping: An employee inadvertently posts a sensitive document containing a list of internal server usernames to a third-party file-sharing site. ThreatNG's continuous discovery detects this Online Sharing Exposure and identifies the exposed identifiers, so the organization can take the document down and reset the affected accounts before an attacker adds these high-value credentials to a harvested profile.
Investigation Modules
ThreatNG provides specialized modules to actively hunt down the scattered identity fragments that attackers collect.
Social Media / Username Exposure: This module performs a Passive Reconnaissance scan for usernames across more than 200 platforms and forums.
Example in Detail: An analyst uses this module to check a list of core employee email addresses and finds corresponding usernames on sites like GitHub or Stack Overflow. This confirms active digital identities and enables the organization to correlate external usernames with internal accounts, proactively enforcing MFA or flagging them for targeted monitoring.
Email Intelligence: This module provides Format Predictions and Harvested Emails.
Example in Detail: The module confirms that a target organization uses the email format firstinitial.lastname@company.com. An attacker can exploit this predictability to turn a list of employee names into a thousand likely valid email addresses, a mass-harvesting technique. ThreatNG provides this intelligence to the organization so it can understand and mitigate its Email Format Guessability risk.
Sensitive Code Exposure / Code Repository Exposure: This identifies public code repositories that contain digital risks, such as Access Credentials (e.g., AWS Access Key ID).
Example in Detail: ThreatNG discovers an exposed GitHub repository containing an API Key or a plaintext Password in the URL. These credentials belong to Non-Human Identities (NHI), which attackers harvest to gain unauthorized access to systems. ThreatNG flags the exposure of this machine identity, which is often easier to harvest than a human's.
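The kind of check behind code repository exposure findings can be sketched with two regular expressions. This is an illustrative sketch, not ThreatNG's detection logic: the scan_text function and finding labels are hypothetical, the AWS pattern is a commonly used heuristic for access key IDs (a 20-character value beginning with AKIA or ASIA), and the URL pattern only looks for the user:password@host form.

```python
import re
from typing import List, Tuple

# Heuristic for AWS access key IDs: AKIA/ASIA prefix plus 16 characters.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Heuristic for credentials embedded in a URL: scheme://user:password@host
URL_PASSWORD = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://[^/\s:@]+:[^/\s@]+@")


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, finding_type) pairs for suspected secrets."""
    findings: List[Tuple[int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line):
            findings.append((line_no, "aws_access_key_id"))
        if URL_PASSWORD.search(line):
            findings.append((line_no, "password_in_url"))
    return findings
```

Pattern matching of this kind produces false positives and misses secrets in other formats, so matches are best treated as leads for review rather than confirmed exposures.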
Intelligence Repositories (DarCache)
ThreatNG uses its intelligence repositories to provide external validation and context for harvested identities.
Compromised Credentials (DarCache Rupture): This repository is the direct source for cross-referencing harvested emails and usernames against known data breaches, immediately confirming which identities are currently compromised and available to attackers.
Dark Web (DarCache Dark Web): This repository monitors mentions of the organization and its associated individuals on underground markets where harvested PII and account access are sold.
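The cross-referencing step described for the Compromised Credentials repository amounts to intersecting two sets of identifiers. The sketch below illustrates that idea only; the compromised_identities function and both input lists are hypothetical, and no real breach data source or ThreatNG interface is implied.

```python
from typing import Iterable, List


def _normalize(emails: Iterable[str]) -> set:
    """Lower-case and trim addresses so comparisons are case-insensitive."""
    return {email.strip().lower() for email in emails if email.strip()}


def compromised_identities(
    discovered: Iterable[str], breached: Iterable[str]
) -> List[str]:
    """Return discovered addresses that also appear in known breach data."""
    return sorted(_normalize(discovered) & _normalize(breached))
```

The resulting list is the subset of externally visible identities that should be prioritized for password resets and stronger authentication.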
Complementary Solutions
ThreatNG's external, high-context intelligence on harvested identities is highly valuable for internal security tools.
Cooperation with IAM Solutions: When the Compromised Credentials repository (DarCache Rupture) identifies an employee whose identity has been harvested, this finding can be pushed to a complementary Identity and Access Management (IAM) solution. The IAM system can then be automatically triggered to terminate active sessions, enforce a mandatory password reset, and require the strongest available form of Multi-Factor Authentication (MFA) for the compromised account, preventing the attacker from using the harvested credentials.
Cooperation with Email Security Solutions: The Domain Name Permutations and Email Intelligence findings can be shared with a complementary Email Security Solution. This integration allows the email filter to immediately flag or quarantine emails originating from any of the discovered typo-squatting or look-alike domains, blocking the delivery of phishing lures that rely on harvested identities.
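The email-filter side of this integration can be sketched as a simple sender-domain check against a shared list of look-alike domains. This is a minimal illustration under stated assumptions: the should_quarantine function and the domain list are hypothetical, and a real email security product would apply many additional signals.

```python
from email.utils import parseaddr
from typing import Set


def should_quarantine(from_header: str, lookalike_domains: Set[str]) -> bool:
    """Return True if the sender's domain is on the look-alike list."""
    # parseaddr splits a header like "IT Support <help@c-ompany.com>"
    # into a display name and the bare address.
    _display_name, address = parseaddr(from_header)
    sender_domain = address.rpartition("@")[2].lower()
    return sender_domain in {domain.lower() for domain in lookalike_domains}
```

For example, with c-ompany.com on the list, a message from "IT Support <help@c-ompany.com>" would be flagged while mail from the genuine company.com domain would pass this particular check.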

