Sensitive Data Classification 101

Each day, your organization creates thousands, if not millions, of files and records containing corporate data. Some of this data could escape into the wild and nobody would particularly care. But if other data fell into the wrong hands, it could lead to regulatory fines, leaked corporate secrets, or even a public relations scandal that could hurt or destroy the company.


Knowing which files to protect and how carefully to protect them is an essential element of data security. This is where data classification comes in.

Data classification is the process of organizing and categorizing structured and unstructured data so that it can be managed and protected more effectively. Company flowcharts, for instance, might get a limited amount of protection, while intellectual property documents get more, and customer Social Security numbers are walled off even further.

Why You Need Data Classification

Security is, of course, one reason for data classification. But the research firm Gartner outlines four basic use cases for data classification:

  • Risk Mitigation. This includes limiting access to personally identifiable information, controlling location and access to intellectual property, reducing attack surface area for sensitive data, and integrating classification into security and policy-enforcing applications.
  • Governance/Compliance. Identifying data governed by regulations such as GDPR, HIPAA, CCPA, PCI, and SOX, as well as regulations yet to be written; applying metadata tags to protected data for additional tracking and controls; enabling quarantine, legal hold, archiving, and other regulation-required actions; and facilitating “Right to be Forgotten” requests and Data Subject Access Requests (DSARs).
  • Efficiency and Optimization. Enabling efficient access to content based on type, usage, etc., discovering and eliminating stale or redundant data, and moving heavily used data to faster devices or cloud-based infrastructure.
  • Analytics. Enabling metadata tagging to optimize business activities, and informing an organization about the location and usage of its data.
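
The "discovering and eliminating stale or redundant data" use case can be illustrated with a small sketch. The Python below is an illustration under stated assumptions, not Gartner's or any vendor's method: it treats files with identical SHA-256 content hashes as redundant, and files not modified within a hypothetical stale_after_days window (365 days by default) as stale. The names file_digest and find_redundant_and_stale are made up for this example.

```python
import hashlib
import os
import time
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_redundant_and_stale(root, stale_after_days=365, now=None):
    """Walk root; return (duplicates, stale).

    duplicates maps a content digest to the sorted paths sharing that exact
    content (groups of two or more only); stale is a sorted list of paths whose
    modification time is older than the assumed staleness cutoff.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # 86400 seconds per day
    by_digest = defaultdict(list)
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
            # Modification time is used here as a simple stand-in for "stale".
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    duplicates = {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
    return duplicates, sorted(stale)
```

Run against a file share, for example find_redundant_and_stale("/srv/share") (a placeholder path), it returns duplicate groups and a stale list for review. Flagged files still need an owner's decision before deletion, since stale data may be under legal hold or a retention requirement.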


Despite these compelling reasons, more than 52 percent of the data within a typical organization remains unclassified, according to a recent study.


Typical Classification Schemes

There are many ways that organizations can classify data sensitivity. The U.S. government, for instance, uses seven levels of classification, including Restricted Data, Top Secret, and Controlled Unclassified Information.

Each organization will want to develop a classification scheme that best meets its needs, but most corporate data classification schemes include at least four high-level sensitivity categories:

  • Restricted. The highest level of sensitivity. This is data that, if compromised, could put a firm at risk of financial, legal, regulatory, or reputational damage.
  • Confidential. Data whose exposure would pose a moderate risk to the organization or one of its employees. Unauthorized access would bring consequences greater than short-term embarrassment and could harm company operations or long-term reputation.
  • Internal. Data that is not meant for the public but has a relatively low impact if exposed. The company wouldn’t want this data leaked, and a leak might cause some short-term embarrassment or reputational damage, but it wouldn’t have regulatory or significant lasting repercussions.
  • Public. Data that anyone can see and that is not personal in nature. Exposure of this data poses little or no risk, and it doesn’t need encryption or significant protection.
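
One way to make the four categories concrete is an ordered type, so that software can compare levels. In this Python sketch, the Sensitivity names mirror the categories above, but the HANDLING controls are hypothetical placeholders rather than recommendations, and combined_level assumes a "high-water mark" rule in which a document that mixes data inherits its most sensitive level.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four high-level categories; a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical baseline controls per level; substitute your own policy.
HANDLING = {
    Sensitivity.PUBLIC: {"encrypt_at_rest": False, "access": "anyone"},
    Sensitivity.INTERNAL: {"encrypt_at_rest": False, "access": "all employees"},
    Sensitivity.CONFIDENTIAL: {"encrypt_at_rest": True, "access": "need-to-know groups"},
    Sensitivity.RESTRICTED: {"encrypt_at_rest": True, "access": "named individuals"},
}


def combined_level(*levels: Sensitivity) -> Sensitivity:
    """Assumed high-water-mark rule: mixed content takes its most sensitive level."""
    return max(levels, default=Sensitivity.PUBLIC)
```

For example, combined_level(Sensitivity.INTERNAL, Sensitivity.RESTRICTED) returns Sensitivity.RESTRICTED, and the matching HANDLING entry then calls for encryption at rest.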


While these four basic classifications have been in use for decades, privacy regulations and more advanced data management systems have led many organizations to adopt three additional sub-layers:

  • Data Processing layer (consent). Many data privacy regulations now require an organization to obtain an individual’s consent for how it uses that person’s private data.
  • Purpose layer (access). Some privacy regulations, most notably Europe’s GDPR, require organizations to specify the purpose for which specific data was collected.
  • Privacy layer (compliance). Some regulations, including California’s CCPA, place additional demands on organizations that hold an individual’s data. To ensure compliance, organizations often add a specific, more advanced set of privacy classifications.
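
The three sub-layers can be pictured as extra metadata stored alongside the sensitivity level. The sketch below assumes a hypothetical DataLabel record with one field per layer and a deliberately simplified rule, may_process, that allows a use only if it matches the purpose the data was collected for or a use the individual explicitly consented to. Real GDPR and CCPA obligations are more nuanced than this check.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Same four levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class DataLabel:
    """Hypothetical label: a sensitivity level plus one field per privacy sub-layer."""
    level: Sensitivity
    consented_uses: frozenset = frozenset()  # processing layer: uses the individual agreed to
    collection_purpose: str = ""             # purpose layer: why the data was collected
    regulations: frozenset = frozenset()     # privacy layer: e.g. frozenset({"GDPR", "CCPA"})


def may_process(label: DataLabel, use: str) -> bool:
    """Simplified rule: allow the stated collection purpose or an explicitly consented use."""
    matches_purpose = use != "" and use == label.collection_purpose
    return matches_purpose or use in label.consented_uses


def is_subject_to(label: DataLabel, regulation: str) -> bool:
    """True if the privacy layer records that this data falls under the named regulation."""
    return regulation in label.regulations
```

Given a label with collection_purpose "order_fulfillment" and consented_uses {"newsletter"}, may_process returns True for "newsletter" and False for an unlisted use such as "ad_targeting".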

How to Implement Data Classification

The process for implementing data classification at an organization varies based on intended outcomes, but setup for most classification programs requires seven key steps.

  1. Define Classification Objectives. What is the company looking for, and why? What regulations apply to the company? Which systems are in scope for the initial classification process? Is classification intended to achieve additional objectives, or just improved data security?
  2. Categorize Data Types. What kinds of data are created and exist within the business? Which data is proprietary, and which is public? Will there be regulated data?
  3. Define Classification Levels. How many levels of classification are necessary? What are examples of data and documents at each level?
  4. Establish an Automated Classification Process. How will classification be automated? How will the company decide which data to scan first? How often will scans run, and what resources will they require?
  5. Specify Classification Criteria and Review. What process will be used to review and validate the results of automated classification? Which classification patterns and labels in the automated classification solution will produce the correct classifications?
  6. Define Overall Outcomes and Classified Data Usage. What analytics processes will be applied to the classification results, and what outcomes are expected from that analysis? What risk mitigation steps and automated policies will be put in place for each classification level?
  7. Monitor and Maintain Classification. What ongoing process will classify new and modified data? How, and how often, will the classification process be reviewed? How will the company track changing business needs and regulations to keep classifications relevant?
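
Steps 4 and 5 usually come down to detection rules that map matched patterns to labels and levels. The Python below is a minimal sketch under assumptions, not how any particular product works: the regular expressions in RULES are simplified examples, the payment-card rule adds a Luhn checksum to cut false positives, and any result at or above a hypothetical review_threshold is flagged for the human review and validation described in step 5.

```python
import re
from enum import IntEnum


class Sensitivity(IntEnum):
    """Four levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical detection rules: (label, pattern, level applied when the pattern matches).
RULES = [
    ("us_ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Sensitivity.RESTRICTED),
    ("payment_card", re.compile(r"\b(?:\d[ -]?){13,19}\b"), Sensitivity.RESTRICTED),
    ("email", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), Sensitivity.CONFIDENTIAL),
    ("internal_marker", re.compile(r"\binternal only\b", re.IGNORECASE), Sensitivity.INTERNAL),
]


def luhn_ok(number: str) -> bool:
    """Luhn checksum; digit runs that fail it cannot be valid card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str, review_threshold: Sensitivity = Sensitivity.RESTRICTED):
    """Return (level, matched rule labels, needs_human_review) for one document."""
    level = Sensitivity.PUBLIC
    labels = []
    for label, pattern, rule_level in RULES:
        for match in pattern.finditer(text):
            if label == "payment_card" and not luhn_ok(match.group()):
                continue  # checksum failed: probably not a card number
            labels.append(label)
            level = max(level, rule_level)
            break  # one valid hit per rule is enough to apply its level
    return level, labels, level >= review_threshold
```

For example, classify("Customer SSN: 123-45-6789") returns (Sensitivity.RESTRICTED, ['us_ssn'], True). Simple patterns like these will miss some sensitive data and misfire on other data (any string in the NNN-NN-NNNN layout looks like an SSN), which is why the review and ongoing maintenance in steps 5 and 7 matter.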


Data classification can be a daunting project for many businesses, but it doesn’t have to be. Solutions such as Qohash’s cloud-based Qostodian data security platform help firms quickly discover and classify all corporate data assets. Beyond initial data classification, Qostodian also discovers new corporate data as it is created and can automatically classify it.

