Governing Unstructured Data for AI Readiness

Nov 26, 2025

This article examines the primary blockers preventing organizations from scaling AI initiatives and explains how, in our view, Qohash’s Data Security Posture Management platform, Qostodian, operationalizes the Gartner five-step governance framework to enable safe generative AI rollouts.

Get a complimentary copy of the Gartner Governing Unstructured Data for AI Readiness: A Strategic Roadmap report here!

The unstructured data challenge

More than half of organizational data sits in an inert state, providing no value to core AI initiatives. The advent of generative AI has transformed unstructured data governance from a longstanding challenge into an urgent strategic imperative. Without proper governance, unstructured data cannot support AI initiatives effectively—and worse, it becomes a vector for significant security and compliance risks.

This Gartner research reveals the scale of this challenge: unstructured data (documents, emails, images, audio, and video files) accounts for 70% to 90% of organizational data and poses unique governance challenges due to its volume, variety, and lack of coherent structure. Large enterprises will triple their unstructured data capacity across on-premises, edge, and public cloud locations by 2028. Yet existing governance strategies, designed primarily for structured data, are inadequate for managing unstructured content at scale.

The critical question facing CISOs, CIOs, and CDOs is not whether to govern unstructured data, but how to implement comprehensive governance quickly enough to support safe, scalable AI adoption while mitigating security risks.

Main blockers of AI adoption

“In the last 12 months, Gartner has seen a significant increase — approximately 150% — in inquiries about unstructured data management. This highlights the critical demand for GenAI-ready data, as its absence is the top reason for failed GenAI deployments.” This surge in inquiries reflects a harsh reality: organizations are struggling with fundamental barriers that prevent AI initiatives from scaling beyond pilot projects. As organizations accelerate AI and large language model deployments, three critical blockers consistently emerge:

Sensitive Data Loss Through Prompts

Employees routinely input queries containing customer information, proprietary business data, or regulated content into AI interfaces without understanding the exposure risk. Each prompt represents a potential data exfiltration vector, particularly when users copy-paste sensitive content from internal documents into external AI services.

Sensitive Data Loss Through File Uploads

File upload functionality in AI tools creates direct pathways for sensitive document exposure. Users upload contracts, financial reports, medical records, and other confidential materials for summarization or analysis, often bypassing established data loss prevention controls that were not designed to monitor AI interactions.

Unfettered AI Agent Access to Data Sources

This represents the most significant exposure risk. AI agents now maintain ongoing, unfettered access to shared data sources such as OneDrive, SharePoint, and cloud storage platforms. Employees are one click away from sharing any data they can access, and agentic systems can retrieve and process information without human oversight or granular access controls.

Traditional data governance approaches assumed human intermediaries would make access decisions. AI agents, operating autonomously and at scale, require fundamentally different security postures—ones that can identify sensitive data in place, enforce access policies in real-time, and remediate exposure risks without moving large quantities of data through centralized systems.

Qohash’s edge computing platform is uniquely positioned to address this third blocker by bringing intelligence to the data rather than moving large quantities of data. This approach of discovering and classifying data allows users to remove unnecessary data and reduce the data security attack surface.

Five Steps to Govern Unstructured Data

The Gartner five-step framework provides a systematic approach to unstructured data governance. Qostodian maps directly to each step, accelerating the path to AI-ready data while maintaining security and compliance.

Step 1: Discover and Catalog

Organizations must locate unstructured data dispersed across workstations, cloud storage, business applications, email systems, and other repositories. Manual discovery processes are not feasible at enterprise scale—automated tools are essential for large volumes of data in various formats.

Qostodian’s Capability: Data discovery at scale using predefined and custom sensitive information types. The platform provides continuous monitoring and discovery of sensitive information, ensuring new data is automatically cataloged as it is created or modified.

Outcome: Qostodian maintains current visibility into the data landscape through continuous discovery, identifying where sensitive data resides across the organization without requiring data movement to centralized systems.
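
To make the discovery step concrete, the following is a minimal sketch of pattern-based discovery over a directory tree. It is not Qostodian's implementation or API: the pattern set, the `find_sensitive_types` and `discover` names, and the choice of plain regular expressions are all assumptions made for illustration, and production detectors are considerably more robust.

```python
import re
from pathlib import Path

# Hypothetical "sensitive information types" (illustrative assumptions only).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive_types(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def discover(root):
    """Walk a directory tree and map each file path to the types found in it.

    Files are read where they live; nothing is copied to a central store.
    """
    catalog = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        hits = find_sensitive_types(text)
        if hits:
            catalog[str(path)] = hits
    return catalog
```

Calling `discover` on a folder returns only the files that contain at least one match, which is the kind of catalog the later steps build on. Rerunning it on a schedule or on file-change events would approximate continuous discovery.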

Step 2: Preprocess and Analyze

Unstructured data often suffers from quality issues—inconsistent formatting, unrecognizable characters, unwanted noise. Analysis work converts unstructured data into meaningful, structured content through technologies like sensitive data identification (PII, PHI detection), topic modeling, sentiment analysis, OCR, and speech-to-text transcription.

Qostodian’s Capability: Summarized metadata that includes file type, type of sensitive data present, and data source location. The platform analyzes content automatically to extract this critical metadata without manual review.

Outcome: Organizations can prioritize sensitive information with detailed metadata, understanding not just where data resides but what type of sensitive content it contains. This enables risk-based prioritization for governance and AI readiness.
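
As an illustration of summarized metadata and risk-based prioritization, the sketch below builds one small record per file (file type, sensitive data types, source location) and sorts records by a score. The `FileSummary` shape, the `RISK_WEIGHTS` values, and the scoring rule are hypothetical choices for this example, not something the Gartner report or Qostodian prescribes.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical weights per sensitive data type (assumption for illustration).
RISK_WEIGHTS = {"phi": 5, "pii": 3, "email": 1}


@dataclass(frozen=True)
class FileSummary:
    file_type: str          # e.g. "pdf"
    sensitive_types: tuple  # e.g. ("phi", "pii")
    source: str             # e.g. "sharepoint"

    @property
    def risk_score(self):
        # Unknown types still count, with a default weight of 1.
        return sum(RISK_WEIGHTS.get(t, 1) for t in self.sensitive_types)


def summarize(path, source, sensitive_types):
    """Build the metadata summary for one file without storing its content."""
    suffix = PurePosixPath(path).suffix.lstrip(".").lower() or "unknown"
    return FileSummary(suffix, tuple(sorted(sensitive_types)), source)


def prioritize(summaries):
    """Order summaries from highest to lowest risk score."""
    return sorted(summaries, key=lambda s: s.risk_score, reverse=True)
```

With these weights, a PDF holding both PHI and PII scores 8 and is reviewed before a text file that only contains an email address, which scores 1.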

Step 3: Tag and Classify

Metadata tagging attaches descriptive information to files, making data easier to organize, search, manage, and secure. Classification enables application of appropriate governance approaches for security and compliance. Common categories include confidential/sensitive data, internal-use-only data, public data, regulatory data, and department-specific classifications.

Qostodian’s Capability: Automated tagging and labeling based on content analysis and predefined policies. Tags and labels are immediately actionable for access control.

Outcome: Qostodian creates tags and labels that can be used in blacklists to prevent specific data sources and files from being accessed by large language models. This direct integration between classification and AI access controls is critical for addressing the three main blockers to AI adoption, particularly for keeping AI agents away from sensitive data they should not retrieve.
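
The link between classification and AI access control can be sketched as a tag-based exclusion list applied before any file reaches a language model. The tag names, the `classify` rules, and the `DENY_TAGS` set below are assumptions chosen for the example; real classification policies are defined by the organization and are not shown in the source.

```python
# Hypothetical tags that must never be exposed to an LLM (assumption).
DENY_TAGS = {"confidential", "regulatory"}

# Hypothetical sensitive data types that fall under regulation (assumption).
REGULATED_TYPES = {"phi", "pii"}


def classify(sensitive_types):
    """Derive classification tags from the sensitive data types found in a file."""
    found = set(sensitive_types)
    tags = set()
    if found:
        tags.add("confidential")
    if found & REGULATED_TYPES:
        tags.add("regulatory")
    if not tags:
        tags.add("internal-use-only")
    return tags


def allowed_for_llm(file_tags):
    """A file is retrievable only if none of its tags is on the deny list."""
    return DENY_TAGS.isdisjoint(file_tags)


def filter_for_llm(tagged_files):
    """Return the paths an AI agent may retrieve, given {path: tags}."""
    return sorted(path for path, tags in tagged_files.items() if allowed_for_llm(tags))
```

The design choice here is to deny by tag rather than by individual path, so a newly discovered file is excluded as soon as it is classified, without anyone editing the list.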

Step 4: Connect and Share

Individual pieces of unstructured data hold limited value unless analyzed and mapped to other data elements or business processes. Graph databases and knowledge graphs map connections between unstructured and structured data, representing relationships between entities through interconnected nodes and edges.

Qostodian’s Capability: Creates relationships between data, data sources, data containers, users, and file types. This relationship mapping provides comprehensive visibility into data flows and usage patterns.

Outcome: Data mapping, labeling, and enforcement of sharing rules within LLMs and other systems. Organizations gain an understanding of how data flows through their environment, which users access what data, and how AI agents interact with data sources. That visibility enables effective access controls for agentic systems with broad permissions.
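
Relationship mapping of this kind can be pictured as a small directed graph of nodes and labeled edges. The sketch below is a toy in-memory version, assuming hypothetical relation names (`can_access`, `contains`) and node identifiers; it does not reflect how Qostodian or any particular graph database stores relationships.

```python
from collections import defaultdict


class RelationshipGraph:
    """Directed graph whose edges carry a relation label."""

    def __init__(self):
        self._edges = defaultdict(set)  # subject -> {(relation, object)}

    def relate(self, subject, relation, obj):
        """Record that `subject` has `relation` to `obj`."""
        self._edges[subject].add((relation, obj))

    def subjects(self, relation, obj):
        """Return every subject that has `relation` to `obj`."""
        return sorted(s for s, edges in self._edges.items() if (relation, obj) in edges)


def principals_reaching(graph, file_node):
    """Users or agents that can access any container holding the file."""
    containers = graph.subjects("contains", file_node)
    return sorted({p for c in containers for p in graph.subjects("can_access", c)})
```

For example, after recording that a user and an AI agent can both access a SharePoint site that contains a forecast spreadsheet, `principals_reaching` lists both of them for that file, which makes an agent's broad permissions visible before an access rule is written.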

Step 5: Define, Execute and Enforce Data Policies

Clear, practical policies guide how data is handled, protected, and used throughout its life cycle. Key policy types include compliance and legal policies, classification policies, access and permission policies, data life cycle management policies, security policies, storage policies, sensitivity policies, and privacy policies.

Once policies are defined, they must be executed and enforced in real time. Policies integrated into business workflows or validation checks ensure adherence. Enforcement levels should vary with the consequences of a policy violation.

Qostodian’s Capability: Automation workflow rules that define and enforce AI governance policies in real time. The platform’s edge-based remediation capabilities allow deletion, quarantine, or access modification while keeping data in place.

Outcome: Documented governance rules feed into guardrail policies that prevent unauthorized access, enforce retention requirements, and ensure compliance across AI systems. Organizations can enforce policies without moving data, minimizing movement costs and third-party risks while maintaining security posture.
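
Rule-driven enforcement can be sketched as an ordered list of conditions, each mapped to a remediation action. The rule names, the record fields (`tags`, `age_days`, `shared_publicly`), and the seven-year retention threshold below are hypothetical; the action names mirror the deletion, quarantine, and access modification options mentioned above but are not a real Qostodian interface.

```python
from dataclasses import dataclass
from typing import Callable

RETENTION_DAYS = 7 * 365  # hypothetical retention limit (assumption)


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]
    action: str  # "delete", "quarantine", or "restrict_access"


# Rules are evaluated in order; the first match decides the action.
RULES = [
    Rule("expired-confidential",
         lambda f: "confidential" in f["tags"] and f["age_days"] > RETENTION_DAYS,
         "delete"),
    Rule("regulated-and-public",
         lambda f: "regulatory" in f["tags"] and f["shared_publicly"],
         "quarantine"),
    Rule("confidential-and-public",
         lambda f: "confidential" in f["tags"] and f["shared_publicly"],
         "restrict_access"),
]


def decide(file_record, rules=RULES):
    """Return (rule_name, action) for the first matching rule, else (None, "allow")."""
    for rule in rules:
        if rule.condition(file_record):
            return rule.name, rule.action
    return None, "allow"
```

Ordering the rules from most to least severe is one simple way to express different enforcement levels: a publicly shared regulated file is quarantined, while a publicly shared file that is only confidential has its access restricted.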

Final thoughts and recommendations

The governance of unstructured data is no longer optional for organizations pursuing AI initiatives. We believe the Gartner research confirms that inadequate governance is the primary reason GenAI deployments fail to scale. The convergence of data security, AI governance, and data governance is essential for safe AI adoption.

Gartner predicts that by 2027, 60% of data governance teams will be mandated to prioritize governance of semistructured and unstructured data to extract value and improve decision quality through GenAI use cases. By 2029, over 80% of unstructured data will be deployed on consolidated storage platforms instead of separate file and object products, up from 40% in early 2024. Organizations that establish comprehensive unstructured data governance now – leveraging platforms like Qostodian that operationalize the five-step framework – will gain competitive advantages in AI deployment speed, model accuracy, and risk mitigation.

The framework exists. The technology is available. The imperative is clear. There is no safe AI adoption without data security, and no effective data security without comprehensive governance of unstructured data. Organizations that recognize this convergence and act decisively will transform their vast repositories of unstructured data from security liabilities into competitive differentiators in an increasingly AI-powered world.

GARTNER is a registered trademark and service mark, and MAGIC QUADRANT is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.

Gartner, Governing Unstructured Data for AI Readiness: A Strategic Roadmap, Melody Chien, 13 August 2025