I’d like to share how NovoServe applied data masking when building our own internal data warehouse to better protect sensitive information such as the Personally Identifiable Information (PII) of our customers.
As a bare-metal provider, NovoServe has a reasonably complex technical architecture: we operate a number of different systems for customer portals, network management, server provisioning, bookkeeping, and more. We often need to perform analysis on combined data from these multiple systems. To enable this, we built our own data warehouse to gather data from different sources into a unified environment for analysis and archival.
We use data from our warehouse for traditional business intelligence reporting (such as financial KPIs), but also for more technical purposes, such as monitoring data consistency between different systems.
With great knowledge comes great risk
Bringing together a lot of data is very useful from a business perspective, but it does introduce new risks: in the event of a breach of our data warehouse, the data sourced from many different systems would be at risk all at once.
To reduce the likelihood of a breach, we of course implemented a number of technical measures, such as tightly restricted access for team members, a strict firewall, and a dedicated hardware server for hosting the database.

However, even though we have reduced the likelihood of a breach by implementing technical measures, we like to design with contingencies in mind: if a breach does occur, how can we reduce the impact of a data leak?
Data minimisation: reasoning from necessity, instead of abundance
Where we often advocate options for "going big" in the infrastructure space, we take the opposite view when it comes to sensitive data: we would rather have less than more.
To this end, we have designed our secure data warehouse infrastructure not just for security, but also with data minimisation in mind. If a breach were to occur, the best way to limit the impact with respect to stolen data is to ensure the data is not present in the system at all!
This started with discussing how we wanted to use our data warehouse. We decided from the start that our data warehouse would be used for reporting and analysis, but not for operational processes such as invoicing. This means that although some data needs to be related to specific customers (e.g. to be able to analyse customer-specific issues), we do not need address or payment data for our customers inside our data warehouse.
Our goal was never to store as much data as possible in our warehouse, but rather to store only what was needed! This naturally led us to consider data masking.
Data masking: maximising the data not there
Data masking is a process for removing sensitive information (e.g. personally identifiable information) from a data set while preserving the usability of the data set for its intended users.
For example, when using a customer list to analyse geographical distribution by country, customer names, email addresses, and exact street addresses are not relevant and can be masked out before processing such a list. (For those of you who would like a reference to a standard: ISO 27001 Annex A / ISO 27002:2022 Control 8.11 "Data Masking" describes applicable considerations in more detail.)
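To make this concrete, here is a minimal sketch in SQL (with hypothetical table and column names, not any actual schema) of what such a masked extract could look like:

    -- Hypothetical example: extract a customer list for a geographical
    -- analysis, masking the columns the analysis does not need.
    SELECT
        country,                      -- needed to group by geography
        'REDACTED' AS customer_name,  -- masked: not relevant here
        'REDACTED' AS email           -- masked: not relevant here
    FROM customers;

An analysis such as SELECT country, COUNT(*) ... GROUP BY country then produces exactly the same result on the masked list as it would on the original.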
In the context of data protection, discussions about information security controls often focus on how to safeguard the confidentiality of data. There is a lot of attention on controls such as encryption, authentication/authorisation, and vulnerability management. However, Privacy by Design starts by understanding the minimal set of data that you really need, and minimising the data you keep based on that. Everything that is not there cannot be leaked.
Some technical bits (SQL & ETL)
As many of the (open-source) systems we use are based on SQL databases, our data warehouse is also based on SQL. At the lowest level, we simply ETL (extract-transform-load) the data from the different sources into similar schemas within our warehouse. This allows us to port SQL queries written against our warehouse back to the original systems without too much effort.
However, the original tables often contain sensitive data. For example, the customer data in a billing system contains not only the customer's company name, but also their address information (which has to appear on invoices), and possibly even payment information such as banking details.
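As an illustration of the kind of table we are talking about (the column names below are hypothetical, not our actual billing schema), picture a customer table like this:

    -- Hypothetical billing-system customer table; columns are illustrative.
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        company_name  VARCHAR(255),  -- needed for customer-specific analysis
        country       VARCHAR(2),    -- needed for e.g. geographical analysis
        street        VARCHAR(255),  -- sensitive: invoice address
        city          VARCHAR(255),  -- sensitive: invoice address
        iban          VARCHAR(34),   -- sensitive: banking details
        email         VARCHAR(255),  -- sensitive: PII
        account_type  VARCHAR(32)    -- e.g. 'regular' or 'internal-test'
    );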
To preserve the original table schemas while leaving out the sensitive information, we implemented several data-masking mechanisms in the ETL tool we use: in some cases we mask data in specific columns, and in other cases we mask out entire rows based on row-level criteria.
By masking the data as part of the extraction process from the original source, we ensure it never even enters the data warehouse system. Masked rows are simply left out, while masked columns get a placeholder value (e.g. 'REDACTED') instead of their original value.
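Sticking with the hypothetical customer table from above, the extraction query could look something like this (a sketch, not our actual ETL configuration):

    -- Column masking: sensitive columns are replaced by a placeholder,
    -- so the warehouse table keeps the exact same columns as the source.
    -- Row masking: rows matching certain criteria never leave the source.
    SELECT
        customer_id,
        company_name,
        country,
        'REDACTED' AS street,
        'REDACTED' AS city,
        'REDACTED' AS iban,
        'REDACTED' AS email,
        account_type
    FROM customers
    WHERE account_type <> 'internal-test';  -- illustrative row criterion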
Maintaining the original data structure (for example, all of the columns) still allows us to port SQL queries between production systems and the data warehouse.
A data analyst can develop an SQL query on the masked tables within our data warehouse to analyse an issue. If the issue is found but access to sensitive data is required for its resolution, the same query can be handed to an engineer with suitable permissions on a production system to obtain any additional sensitive values. Our analysts thus never have access to sensitive data, yet they can still analyse production data in the warehouse and transfer the queries they write back to individual production systems when needed.
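For example, an analyst could develop a query like the one below against the masked warehouse copy (again using the hypothetical table from above). Run in the warehouse, the email column simply reads 'REDACTED'; the identical query run by an engineer on the production system returns the real values:

    -- Runs unchanged on the masked warehouse table and on the original
    -- production table, because the schemas are identical; only the
    -- warehouse returns 'REDACTED' for the masked email column.
    SELECT customer_id, company_name, email
    FROM customers
    WHERE country = 'NL'
    ORDER BY company_name;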
Secure infrastructure that you can trust
We have been operating our data warehouse with data masking for several years now, and have never regretted our decision to keep a lot of sensitive data out of our warehouse.
Yes, our data analysts are unable to answer some very customer-specific questions based on what we have in our warehouse. And yes, that is sometimes a bit annoying for colleagues who still try to ask such questions. But to be frank: this is data masking working exactly as intended. We're not only limiting the impact of potential data leaks, but also hard-limiting access to potentially sensitive data for roles, such as data analysts, that are not supposed to work with that data anyway.

By designing our internal systems with this level of rigour, prioritising Privacy by Design and data minimisation, we demonstrate the same engineering mindset we apply to your dedicated servers. Whether you are in FinTech, AdTech, or Healthcare, you need a partner that doesn't just talk about security, but builds it into the very architecture of the company. You can be assured that the same level of protection is applied to the network and hardware hosting your mission-critical workloads.
Data masking is a powerful, yet often undervalued, approach to data protection. We hope sharing our approach inspires you to take a second look at your own data pipelines for Data Protection Day 2026.