AI Asset Discovery
AI Asset Discovery, in the context of cybersecurity, is the systematic process of identifying, locating, and cataloging all components that make up an organization's Artificial Intelligence (AI) footprint. The practice is fundamentally about gaining comprehensive visibility into the entire AI environment so that it can be secured.
It goes significantly beyond traditional IT asset management by focusing on the unique, often non-standard resources used in the AI lifecycle.
The primary goal is to create a complete and continuously updated inventory of AI assets that security teams and governance bodies can use for risk assessment, compliance, and policy enforcement.
A thorough AI asset discovery process identifies the following key elements (a sketch of a minimal inventory record follows the list):
AI Models and Agents: This includes the trained machine learning models themselves (e.g., Large Language Models, deep learning networks), their versions, their deployment locations (on-premises, cloud, edge), and the autonomous software agents built on top of them.
Data Assets: This involves locating the sensitive data used at every stage: the original training datasets, testing and validation data, and the real-time data used for inference and decision-making. Since this data often contains proprietary or personally identifiable information, knowing where it resides is critical.
Infrastructure and Endpoints: This covers the computing environments where models are built and run. It includes MLOps platforms, code repositories, API endpoints for model interaction, cloud storage buckets (e.g., S3 or Azure Blob Storage), and the specific servers or containers that host the models.
Prompts and Configurations: For Generative AI, the specific prompts, prompt templates, or chains used to guide the model's behavior are treated as intellectual property and potential attack vectors. Discovery identifies where these prompts are stored and which systems they touch.
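Taken together, these elements imply what a single inventory record needs to capture. The following Python sketch shows one minimal shape such a record might take; the AIAsset class and its field names are illustrative assumptions for this article, not a schema from ThreatNG or any other product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical, minimal inventory record covering the asset classes above.
# Field names are illustrative, not taken from any specific product schema.
@dataclass
class AIAsset:
    asset_id: str                # stable inventory key
    asset_type: str              # "model", "agent", "dataset", "endpoint", "prompt"
    name: str                    # e.g., "fraud-scoring-llm"
    version: str                 # model or prompt-template version
    location: str                # "aws:s3://...", "on-prem:gpu-cluster-2", "edge"
    owner: str                   # accountable team or individual
    data_sensitivity: str        # "public", "internal", "pii", "proprietary"
    externally_reachable: bool   # True if an unauthenticated party can touch it
    last_verified: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Example: an exposed inference endpoint discovered outside the sanctioned inventory.
shadow_endpoint = AIAsset(
    asset_id="ai-0042",
    asset_type="endpoint",
    name="data-api.company.com",
    version="unknown",
    location="aws:us-east-1",
    owner="unassigned",          # a missing owner is itself a governance finding
    data_sensitivity="unknown",
    externally_reachable=True,
)
```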
Failure to perform effective AI asset discovery leads to "Shadow AI," where unmanaged models or exposed APIs operate outside the security team's control, creating significant, unknown risk.
ThreatNG, as an all-in-one external attack surface management (EASM), digital risk protection (DRP), and security ratings solution, is highly effective at AI Asset Discovery by focusing on the external, unauthenticated, and often-forgotten components of the AI environment.
ThreatNG approaches AI Asset Discovery entirely from the attacker’s perspective, without needing internal access or credentials.
External Discovery and Inventory
ThreatNG's ability to perform purely external, unauthenticated discovery, with no connectors required, is the foundation for finding AI assets, particularly Shadow AI resources.
Technology Stack Identification: ThreatNG provides exhaustive, unauthenticated discovery of nearly 4,000 technologies. This is critical for AI discovery because it identifies the tools and frameworks being used, including hundreds of technologies categorized as Artificial Intelligence, as well as specific vendors in AI Model & Platform Providers and AI Development & MLOps. This process directly inventories the exposed AI assets.
Subdomain and Hosting Intelligence: ThreatNG uses Subdomain Intelligence to uncover subdomains hosted on major cloud platforms such as AWS, Microsoft Azure, and Google Cloud Platform, and to identify the technologies used on those subdomains. This helps locate the public-facing API endpoints or applications that interact with the AI models (a minimal sketch of this kind of triage follows the example below).
Example of ThreatNG Helping: ThreatNG discovers a subdomain, data-api.company.com, running a technology identified as a key AI Development & MLOps vendor. This immediately alerts the security team to the presence of an external, potentially unmanaged, AI asset.
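To make this concrete, here is a minimal Python sketch of the kind of unauthenticated triage described above: resolve a subdomain's CNAME to infer the hosting platform, then read response headers for technology hints. It is a simplified illustration, not ThreatNG's actual fingerprinting engine; the hostname and the signature list are assumptions, and it requires the third-party packages dnspython and requests.

```python
import dns.resolver
import requests

# Illustrative CNAME suffixes that commonly indicate a cloud hosting platform.
CLOUD_SUFFIXES = {
    ".amazonaws.com": "AWS",
    ".azurewebsites.net": "Microsoft Azure",
    ".cloudapp.azure.com": "Microsoft Azure",
    ".googleusercontent.com": "Google Cloud Platform",
}

def triage_subdomain(hostname: str) -> dict:
    finding = {"host": hostname, "cloud": None, "header_hints": {}}
    try:
        answers = dns.resolver.resolve(hostname, "CNAME")
        target = str(answers[0].target).rstrip(".")
        for suffix, provider in CLOUD_SUFFIXES.items():
            if target.endswith(suffix):
                finding["cloud"] = provider
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        pass  # no CNAME; the host may sit on a plain A record instead
    try:
        resp = requests.get(f"https://{hostname}", timeout=10)
        # Server and X-Powered-By headers often leak the platform in use.
        for header in ("Server", "X-Powered-By"):
            if header in resp.headers:
                finding["header_hints"][header] = resp.headers[header]
    except requests.RequestException:
        pass  # an unreachable host is still an inventory-worthy finding
    return finding

print(triage_subdomain("data-api.company.com"))  # hypothetical host from the example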
External Assessment for AI Asset Risk
ThreatNG assesses the exposure risk of the discovered AI assets, even if they are unauthenticated.
Data Leak Susceptibility: This security rating is derived directly from uncovering external digital risks such as Cloud Exposure, specifically exposed open cloud buckets. Misconfigured cloud buckets are often where AI training data or model weights reside, so ThreatNG flags this exposure as a critical data leakage risk (a sketch of the underlying anonymous check follows the example below).
Cyber Risk Exposure (Sensitive Code): This rating is based in part on Sensitive Code Discovery and Exposure (code secret exposure). If proprietary model configurations, API keys, or cloud credentials used by an AI asset are leaked to a public repository, ThreatNG finds the external exposure.
Non-Human Identity (NHI) Exposure: This rating quantifies the organization's vulnerability to threats from high-privilege machine identities, such as leaked API keys and service accounts. Since AI agents and models use such non-human identities for access, finding these exposed keys is a direct discovery of a critical AI asset vulnerability.
Example of ThreatNG Helping: ThreatNG's Data Leak Susceptibility assessment returns a poor grade after flagging an S3 bucket that is publicly accessible. The Technology Stack module identifies that the organization uses Databricks for data processing, confirming the bucket likely contains AI-related data.
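The check behind such a finding can be illustrated with a short, credential-free probe: attempt an anonymous listing of the bucket exactly as any internet user could. This is a hedged sketch using the third-party boto3 package; the bucket name is hypothetical and the logic is simplified relative to a production scanner.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config
from botocore.exceptions import ClientError

def bucket_is_publicly_listable(bucket: str) -> bool:
    # UNSIGNED disables request signing, so the call runs with the same
    # privileges as any anonymous user on the internet.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    try:
        resp = s3.list_objects_v2(Bucket=bucket, MaxKeys=5)
    except ClientError:
        return False  # AccessDenied, NoSuchBucket, etc.: not openly listable
    for obj in resp.get("Contents", []):
        print("publicly listable object:", obj["Key"])
    return True

print(bucket_is_publicly_listable("company-ml-training-data"))  # hypothetical name
```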
Investigation Modules
ThreatNG's Investigation Modules allow security teams to zoom in on specific asset types and exposures critical to AI Asset Discovery:
Cloud and SaaS Exposure: This module identifies both sanctioned and unsanctioned cloud services, as well as Open Exposed Cloud Buckets. This is the most direct way to discover misconfigured AI data storage assets. It also identifies all associated SaaS implementations (SaaSqwatch), which may include AI platforms or tools like Snowflake, Looker, or Splunk.
Sensitive Code Exposure (Code Repository Exposure): This module discovers public code repositories and specifically looks for Access Credentials (such as Google Cloud API Keys and AWS Access Key IDs) and Configuration Files. This allows the team to uncover an AI model's entire access structure if a developer accidentally commits it (a sketch of this pattern matching follows the example below).
Online Sharing Exposure: This module identifies organizational presence within online code-sharing platforms like Pastebin and GitHub Gist, where proprietary prompts or model API keys might be inadvertently posted.
Example of ThreatNG Helping: A search using the Advanced Search facility reveals an exposed Pastebin post containing an Artifactory API Token. The Technology Stack shows Artifactory is used, and the associated subdomains show connections to GitHub, confirming the token's relation to the software development lifecycle, which makes it highly likely to affect AI assets.
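The pattern matching at the core of this kind of secret discovery can be sketched in a few lines of Python. The two regexes below cover well-known key formats (AWS Access Key IDs and Google Cloud API keys); real scanners use far larger signature sets plus entropy checks, and the sample input here is fabricated (AWS's documented example key, not a live credential).

```python
import re

# Illustrative signatures for two well-known credential formats.
SECRET_PATTERNS = {
    "AWS Access Key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "Google Cloud API Key": re.compile(r"\bAIza[0-9A-Za-z\-_]{35}\b"),
}

def scan_for_secrets(text: str) -> list[tuple[str, str]]:
    hits = []
    for label, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits

# Example against a fabricated paste; the value is AWS's public example key.
sample = "config = {'aws_key': 'AKIAIOSFODNN7EXAMPLE', 'debug': True}"
for label, value in scan_for_secrets(sample):
    print(f"{label}: {value}")
```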
Reporting and Continuous Monitoring
ThreatNG provides Continuous Monitoring of the external attack surface and digital risk.
Reporting: Reports like the Security Ratings (A through F) and the Prioritized Reports (High, Medium, Low) convert technical asset discovery findings into actionable business context. A discovery of an exposed AI API combined with a leaked credential, for instance, would result in a highly prioritized finding and a poor security rating.
MITRE ATT&CK Mapping: ThreatNG automatically translates raw findings on the external attack surface into a strategic narrative by correlating them with specific MITRE ATT&CK techniques. This helps security leaders understand how the discovered AI assets could be used by an adversary for initial access or to establish persistence.
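Conceptually, this translation can be represented as a simple lookup from external finding types to ATT&CK techniques, as in the Python sketch below. The finding-type names are illustrative, and the specific technique assignments are assumptions made for this example rather than ThreatNG's published mappings; the technique IDs themselves are real ATT&CK entries.

```python
# Hypothetical mapping from external finding types to MITRE ATT&CK techniques.
FINDING_TO_ATTACK = {
    "exposed_api_endpoint":    ("T1190", "Exploit Public-Facing Application"),
    "leaked_credential":       ("T1078", "Valid Accounts"),
    "open_cloud_bucket":       ("T1530", "Data from Cloud Storage"),
    "exposed_code_repository": ("T1213", "Data from Information Repositories"),
}

def attack_context(finding_type: str) -> str:
    technique_id, name = FINDING_TO_ATTACK.get(
        finding_type, ("unmapped", "no ATT&CK mapping")
    )
    return f"{finding_type} -> {technique_id} ({name})"

for finding in ("exposed_api_endpoint", "leaked_credential"):
    print(attack_context(finding))
```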
Intelligence Repositories
ThreatNG’s Intelligence Repositories (DarCache) provide the necessary contextual data to validate and prioritize discovered AI assets.
Technology Stack & Vendor List: The DarCache enables the Domain Record Analysis feature, which includes the ability to externally identify vendors and technologies, including AI Model & Platform Providers. This intelligence is what turns a generic exposed IP into a known, critical AI asset.
Vulnerabilities (DarCache Vulnerability): This repository integrates NVD, KEV, EPSS, and Proof-of-Concept Exploits. This allows ThreatNG to assess the exploitability of the exposed infrastructure hosting the AI asset, helping prioritize remediation efforts for assets that pose an immediate, proven threat.
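Both enrichment sources are publicly queryable, so the prioritization logic can be sketched directly: pull a CVE's EPSS probability from FIRST's API and check its membership in CISA's KEV catalog. The endpoints below are the real public feeds; the triage threshold is an assumption for the example, and how ThreatNG consumes these sources internally is not shown here. Requires the requests package.

```python
import requests

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_FEED = ("https://www.cisa.gov/sites/default/files/feeds/"
            "known_exploited_vulnerabilities.json")

def prioritize(cve_id: str) -> dict:
    # EPSS: probability of exploitation in the wild within the next 30 days.
    epss = requests.get(EPSS_API, params={"cve": cve_id}, timeout=30).json()
    rows = epss.get("data", [])
    score = float(rows[0]["epss"]) if rows else 0.0

    # KEV: CISA's catalog of vulnerabilities with confirmed exploitation.
    kev = requests.get(KEV_FEED, timeout=30).json()
    in_kev = any(v["cveID"] == cve_id for v in kev["vulnerabilities"])

    # Simple triage rule (assumed): known-exploited or high-probability first.
    urgent = in_kev or score >= 0.5
    return {"cve": cve_id, "epss": score, "in_kev": in_kev, "urgent": urgent}

print(prioritize("CVE-2021-44228"))  # Log4Shell: in KEV, very high EPSS
```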
Complementary Solutions
ThreatNG's unauthenticated external discovery and asset inventory can provide foundational intelligence that enhances complementary solutions like Cloud Security Posture Management (CSPM) and Configuration Management Databases (CMDBs).
Cloud Security Posture Management (CSPM): ThreatNG's external discovery flags Open Exposed Cloud Buckets and Non-Human Identity Exposures. When ThreatNG identifies a public-facing cloud exposure, it gives the CSPM an exact, unauthenticated view of the asset, prompting the CSPM to check the internal access policies for that specific bucket. For example, ThreatNG discovers a publicly open S3 bucket, and the CSPM uses that alert to trigger an internal audit of all associated IAM roles and bucket policies.
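The CSPM-side follow-up might look like the boto3 sketch below: given the bucket name flagged externally, authenticated calls confirm which control actually failed. The bucket name is hypothetical, and this is an illustrative fragment of an audit, not any specific CSPM product's workflow; it requires boto3 and valid AWS credentials.

```python
import boto3
from botocore.exceptions import ClientError

def audit_flagged_bucket(bucket: str) -> dict:
    s3 = boto3.client("s3")
    audit = {"bucket": bucket}
    try:
        pab = s3.get_public_access_block(Bucket=bucket)
        audit["public_access_block"] = pab["PublicAccessBlockConfiguration"]
    except ClientError:
        # No configuration at all is itself a finding worth recording.
        audit["public_access_block"] = "not configured"
    try:
        status = s3.get_bucket_policy_status(Bucket=bucket)
        audit["policy_is_public"] = status["PolicyStatus"]["IsPublic"]
    except ClientError:
        audit["policy_is_public"] = "no bucket policy"
    return audit

print(audit_flagged_bucket("company-ml-training-data"))  # hypothetical name
```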
Configuration Management Database (CMDB): ThreatNG's Technology Stack discovery provides the actual, unauthenticated external inventory of the organization's assets (including Shadow AI and exposed AI vendors) that may be missing from the CMDB. When ThreatNG flags a subdomain running an unauthorized AI Development platform, this external discovery updates the CMDB to mark the asset as unmanaged, allowing the CMDB owner to begin governance and tracking for the new AI asset.

