Shadow AI Discovery

Shadow AI Discovery in the context of cybersecurity is the continuous process of identifying, monitoring, and gaining visibility into the unauthorized or unsanctioned use of Artificial Intelligence (AI) and Machine Learning (ML) tools, services, and applications by employees within an organization.

This practice is a modern and more complex iteration of "Shadow IT," where employees adopt technology—such as popular generative AI chatbots (like ChatGPT or Gemini), external AI coding assistants, or third-party ML model APIs—without formal approval, security vetting, or governance from the IT or security departments.

The Components of Shadow AI

Shadow AI poses unique cybersecurity challenges because it involves the processing of sensitive data by opaque, external systems. The "shadow" typically covers:

  1. Generative AI Tools: Employees use public-facing chatbots to summarize confidential documents, draft client emails, or write proprietary code.

  2. External Model APIs: Developers use unapproved third-party APIs for tasks like natural language processing (NLP) or image recognition, sending corporate data to an external provider's server.

  3. Unvetted Cloud Services: Teams subscribe to new, specialized SaaS tools with embedded AI capabilities (e.g., for analytics or marketing) that have not been reviewed for compliance or data handling policies.

The Risks Driving the Need for Discovery

The fundamental risk of Shadow AI is the lack of visibility, which leads to:

  • Data Leakage and IP Exposure: This is the most significant threat. When an employee pastes sensitive information (customer PII, financial data, proprietary source code) into a public AI tool, that data often leaves the corporate security perimeter and may be logged or even used to train the AI provider's model, resulting in a loss of confidentiality and intellectual property.

  • Compliance Violations: The unauthorized processing of regulated data (such as HIPAA-protected health information or GDPR-covered European data) on an unvetted third-party system can result in substantial fines and severe legal consequences.

  • Security Vulnerabilities: Unauthorized tools often bypass corporate endpoint security and network monitoring, creating new, unmanaged attack vectors that adversaries can exploit, such as prompt injection vulnerabilities in unvetted chatbot interfaces.

  • Misinformation and Bias: Decisions based on unmonitored AI model outputs may be inaccurate, biased, or "hallucinated," resulting in poor business outcomes or reputational damage.

Techniques for Shadow AI Discovery

Effective Shadow AI Discovery requires a multi-faceted approach, combining network, endpoint, and financial monitoring:

  1. Network and DNS Monitoring: Analyzing internal network traffic, DNS requests, and firewall logs to identify outbound connections to known AI service domains (e.g., openai.com, gemini.googleapis.com, or common AI platform domains). This detects the flow of data to external AI providers (a minimal log-matching sketch follows this list).

  2. Cloud Access Security Broker (CASB) and Web Proxy Logs: Monitoring HTTPS traffic and application transactions to identify and log which specific SaaS applications are being accessed by employees and classify them as known AI tools.

  3. Endpoint Agents: Using endpoint security tools to monitor processes, installed browser extensions, and API keys stored on employee devices, which can reveal the presence of unauthorized AI applications or plugins.

  4. Financial Auditing: Reviewing expense reports for micro-transactions or subscriptions to popular AI services, which are often purchased by individuals or teams outside the IT procurement process.

  5. OAuth/Identity Provider Audits: Reviewing logs from identity management systems to check for third-party AI applications that employees have granted access to corporate data (e.g., permission to read their email or access their company cloud storage).

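As an illustration of the first technique, the following is a minimal Python sketch that matches DNS query logs against a watchlist of AI service domains. The watchlist entries and the CSV column names (timestamp, source_ip, query) are assumptions for the example; substitute your resolver's actual export format and a curated domain feed.

```python
# Minimal DNS-log matching sketch: flag outbound queries to known AI service
# domains. The watchlist and the CSV column names (timestamp, source_ip,
# query) are assumptions for the example, not a standard feed or format.
import csv

AI_DOMAINS = {
    "openai.com",
    "gemini.googleapis.com",
    "api.anthropic.com",
    "huggingface.co",
}

def is_ai_domain(query: str) -> bool:
    """True if the queried name equals or is a subdomain of a watchlist entry."""
    query = query.rstrip(".").lower()
    return any(query == d or query.endswith("." + d) for d in AI_DOMAINS)

def flag_ai_queries(log_path: str):
    """Yield (timestamp, source_ip, query) for each watchlist hit in the log."""
    with open(log_path, newline="") as fh:
        for row in csv.DictReader(fh):
            if is_ai_domain(row["query"]):
                yield row["timestamp"], row["source_ip"], row["query"]

if __name__ == "__main__":
    for ts, src, q in flag_ai_queries("dns_queries.csv"):
        print(f"{ts} {src} -> {q}")
```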
The goal of Shadow AI Discovery is to bring these hidden tools into the light, allowing the organization to either formally approve and secure them or block their use to mitigate risk.

ThreatNG, an all-in-one solution for external attack surface management (EASM), digital risk protection (DRP), and security ratings, is highly effective for Shadow AI Discovery. It continuously identifies and assesses unsanctioned or unmanaged external AI/ML components within an organization from an attacker's perspective.

ThreatNG’s External Discovery and Continuous Monitoring

ThreatNG performs purely external unauthenticated discovery using no connectors, making it ideal for finding the parts of "Shadow AI" that are exposed to the public internet. The platform maintains a Continuous Monitoring cycle of the external attack surface and digital risk, ensuring that any new, unsanctioned AI service or tool spun up by a developer is quickly found.

Discovery Examples

  • Cloud and SaaS Exposure: ThreatNG discovers both Sanctioned and Unsanctioned Cloud Services and SaaS implementations associated with the organization. If a team starts using an unapproved AI-powered customer service platform or a new third-party Data Analytics SaaS solution (like Amplitude or Snowflake) to process confidential data, ThreatNG identifies the external footprint of this shadow service.

  • Code Repository Exposure: A developer might accidentally commit an API key or configuration file for an external AI coding assistant to a public repository. ThreatNG's Code Repository Exposure discovers public repositories and investigates their contents for Access Credentials, Configuration Files, and Database Credentials that could be used to access an unauthorized AI platform or its data (a simplified secret-scanning sketch follows these examples).

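To make the secret-scanning idea concrete, here is a simplified Python sketch that walks a locally cloned repository and flags text matching common credential patterns. The regular expressions are rough approximations and the repository path is a placeholder; this sketches the general technique, not ThreatNG's implementation, and production scanners add far broader rule sets plus entropy analysis.

```python
# Simplified secret-scanning sketch: walk a cloned repository and flag text
# matching common credential patterns. The patterns are rough approximations;
# production scanners use much broader rule sets plus entropy analysis.
import os
import re

SECRET_PATTERNS = {
    "AWS Access Key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Stripe live secret key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    "Generic API key assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}

def scan_repo(root: str):
    """Yield (path, rule_name, matched_text) for every pattern hit under root."""
    for dirpath, _dirs, files in os.walk(root):
        if os.sep + ".git" in dirpath:
            continue  # skip git internals
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file; move on
            for rule, pattern in SECRET_PATTERNS.items():
                for match in pattern.finditer(text):
                    yield path, rule, match.group(0)

if __name__ == "__main__":
    # "./cloned-repo" is a placeholder for a locally cloned public repository.
    for path, rule, secret in scan_repo("./cloned-repo"):
        print(f"{rule}: {secret[:12]}... in {path}")
```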
External Assessment for Shadow AI Risk

ThreatNG's assessments directly quantify the risks introduced by Shadow AI by assigning a security rating (A through F).

Detailed Assessment Examples

  • Data Leak Susceptibility: A critical Shadow AI risk is data leakage to an unmanaged third-party service. This susceptibility score is derived from Cloud and SaaS Exposure and Dark Web Presence. If a team is accidentally routing confidential data to an Open Exposed Cloud Bucket (in AWS, Azure, or GCP) for model training, ThreatNG flags this as a direct data leak exposure, even if a shadow team created the bucket.

  • Cyber Risk Exposure: This score includes Code Secret Exposure, which identifies repositories and sensitive data within them. If a developer's public repository exposes a Stripe API key or Google Cloud Platform OAuth credentials linked to an external AI service account, the Cyber Risk Exposure score reflects this critical weakness.

  • BEC & Phishing Susceptibility: Shadow AI can be leveraged by attackers. This score factors in Domain Name Permutations, which identify typosquatted domains (e.g., companyai.com vs. company-ai.com). These spoofed domains can be used to host convincing phishing pages that mimic a legitimate internal AI login page, allowing an adversary to steal credentials (a naive permutation sketch follows this list).

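To illustrate the domain-permutation idea behind this score, the sketch below generates a few naive variants of a base domain and checks which ones resolve in DNS. Real permutation engines cover many more classes (homoglyphs, bit flips, alternate TLDs), and the base domain "company" is a placeholder.

```python
# Naive typosquat sketch: generate a few permutations of a base domain and
# check which resolve in DNS. Real engines cover many more permutation
# classes (homoglyphs, bit flips, alternate TLDs); "company" is a placeholder.
import socket

def permutations(base: str, tld: str = "com"):
    """Yield naive variants: AI-suffixing, hyphenation, doubled characters."""
    yield f"{base}ai.{tld}"    # e.g., companyai.com
    yield f"{base}-ai.{tld}"   # e.g., company-ai.com
    yield f"ai-{base}.{tld}"   # e.g., ai-company.com
    for i in range(len(base)):
        yield f"{base[:i]}{base[i]}{base[i:]}.{tld}"  # doubled character

def resolves(domain: str) -> bool:
    """Crude liveness check: does the name resolve to any address?"""
    try:
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        return False

if __name__ == "__main__":
    for candidate in sorted(set(permutations("company"))):
        if resolves(candidate):
            print(f"resolves (investigate): {candidate}")
```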
Investigation Modules and AI Technology Identification

ThreatNG’s Investigation Modules provide the granular intelligence to pinpoint and classify Shadow AI use.

Detailed Investigation Examples

  • DNS Intelligence for AI/ML Vendors: The DNS Intelligence module includes Vendors and Technology Identification. This is the most direct way ThreatNG identifies Shadow AI. It can detect if an organization's subdomains or external assets are actively using services from specific AI Model & Platform Providers such as Anthropic, Cohere, Hugging Face, or OpenAI. For example, if the organization's DNS records point to a server hosting a new model service, ThreatNG can identify that the service is LangChain or Pinecone (from the AI Development & MLOps category), immediately indicating unsanctioned or unmanaged AI activity (a CNAME-matching sketch follows these examples).

  • Subdomain Intelligence for Development Environments: Shadow AI often lives in unmanaged Development Environments. Subdomain Intelligence can flag a subdomain as an API, a Development Environment, or a Cloud Hosting provider, indicating a new, unvetted ML model deployment. For instance, finding a subdomain like dev-ai-test.company.com hosted on Vercel or Heroku exposes a shadow model's attack surface.

  • Search Engine Exploitation for Exposed Data: The Search Engine Attack Surface facility can uncover an organization's susceptibility to exposing Potential Sensitive Information or Susceptible Files via search engines. This helps identify configuration files or log snippets from shadow AI services that contain private information or internal network details, which an attacker could exploit to pivot into the main infrastructure.

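As a rough analogue of this kind of vendor identification, the following sketch resolves CNAME records for candidate subdomains and matches the targets against a small, illustrative map of AI platform and hosting hostnames. It requires the third-party dnspython package, and the vendor map and subdomain are assumptions for the example, not ThreatNG data.

```python
# Sketch of AI-vendor identification via DNS: resolve CNAME targets for
# candidate subdomains and match them against known AI platform hostnames.
# Requires dnspython (pip install dnspython). VENDOR_HOSTS is an
# illustrative, incomplete assumption, not a ThreatNG data feed.
import dns.exception
import dns.resolver

VENDOR_HOSTS = {
    "huggingface.co": "Hugging Face",
    "openai.com": "OpenAI",
    "pinecone.io": "Pinecone",
    "vercel.app": "Vercel (hosting)",
    "herokuapp.com": "Heroku (hosting)",
}

def identify_vendor(subdomain: str) -> str | None:
    """Return a vendor name if the subdomain's CNAME points at a known host."""
    try:
        answers = dns.resolver.resolve(subdomain, "CNAME")
    except dns.exception.DNSException:
        return None  # no CNAME, NXDOMAIN, timeout, etc.
    for rdata in answers:
        target = str(rdata.target).rstrip(".").lower()
        for host, vendor in VENDOR_HOSTS.items():
            if target == host or target.endswith("." + host):
                return vendor
    return None

if __name__ == "__main__":
    # dev-ai-test.company.com is the hypothetical subdomain from the example above.
    for sub in ("dev-ai-test.company.com",):
        vendor = identify_vendor(sub)
        if vendor:
            print(f"{sub} -> {vendor} (possible Shadow AI deployment)")
```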
Intelligence Repositories and Reporting

ThreatNG's intelligence repositories, branded as DarCache (Data Reconnaissance Cache), and its reporting mechanisms enable security teams to prioritize Shadow AI risks.

  • DarCache Vulnerability and KEV: By linking identified vulnerabilities to the KEV (Known Exploited Vulnerabilities) list, ThreatNG ensures that security teams prioritize fixing vulnerabilities in unmanaged Shadow AI infrastructure that are already being exploited in the wild.

  • DarCache Rupture (Compromised Credentials): This repository tracks Compromised Credentials that are often leaked from unmanaged AI accounts, directly contributing to the Data Leak Susceptibility score.

  • Reporting: ThreatNG provides Prioritized Reports (High, Medium, Low) and detailed Knowledgebase information, which includes Reasoning and Recommendations. This allows security teams to efficiently communicate the risk of a discovered Shadow AI asset (e.g., "High Risk: Exposed API for Cohere platform, reasoning: data leakage susceptibility is high due to lack of authentication, recommendation: block access or implement a WAF").

Complementary Solutions

ThreatNG's EASM data can work with complementary internal security solutions to build a comprehensive Shadow AI defense.

  • Security Monitoring (SIEM/XDR) Tools: ThreatNG can identify whether an organization is using an AI Model Provider. This external intelligence can be fed to a complementary SIEM or XDR solution (like Splunk or Cortex XDR) to inform internal network monitoring rules. For example, the security team can configure the SIEM to flag any significant volume of internal data being uploaded to the specific external IP address ThreatNG identified as hosting the Shadow AI service (a watchlist-building sketch follows this list).

  • Identity and Access Management (IAM) Platforms: When ThreatNG discovers a Leaked Credential (e.g., an AWS Access Key ID) associated with a Shadow AI cloud instance, this finding can be used by a complementary IAM platform (like Okta or Microsoft Entra) to automatically revoke or restrict the corresponding identity's access to all cloud resources, immediately neutralizing the threat posed by the exposed key.

  • Vulnerability & Risk Management (GRC) Tools: ThreatNG performs External GRC Assessment Mappings to frameworks like NIST CSF and GDPR. This information can be used by a complementary internal GRC platform (like Drata or Vanta) to automatically document the external compliance gap created by a Shadow AI component and trigger internal audit workflows to address the issue.

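To illustrate the enrichment pattern described above, the sketch below converts a hypothetical JSON export of external findings into a CSV watchlist that a SIEM lookup table can ingest. The field names (category, indicator, risk, vendor) are invented for the example and do not reflect ThreatNG's actual API.

```python
# Hedged sketch: turn external-discovery findings into a watchlist a SIEM can
# ingest. The input JSON shape and field names are hypothetical (not
# ThreatNG's actual API); the point is the enrichment pattern itself.
import json

def build_watchlist(findings_path: str, out_path: str) -> int:
    """Extract flagged indicators from findings and write a CSV watchlist."""
    with open(findings_path) as fh:
        findings = json.load(fh)

    rows = []
    for f in findings:
        if f.get("category") == "shadow_ai":  # hypothetical field names
            rows.append((f["indicator"], f.get("risk", "unknown"), f.get("vendor", "")))

    with open(out_path, "w") as out:
        out.write("indicator,risk,vendor\n")
        for indicator, risk, vendor in rows:
            out.write(f"{indicator},{risk},{vendor}\n")
    return len(rows)

if __name__ == "__main__":
    count = build_watchlist("findings.json", "shadow_ai_watchlist.csv")
    print(f"wrote {count} indicators; point the SIEM lookup at the CSV")
```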