How to Write Authorization Policies for Big Data
When it comes to securing access to services and data, we see many different use cases and, with that, the enforcement of authorization rules at different layers in the IT stack. This spans all the way from the Web/Presentation tier down to the data tier as illustrated in Figure 1.
Enforcing authorization directly at the data level is incredibly powerful as it could mean minimal or no changes to the applications that are accessing the data itself. The approach could be designed in such a way that, regardless of what application (web application, business analysis, etc.) is accessing the data, access is systematically controlled and consistently enforced. With this model, you can achieve tremendous leverage to cover many applications with a single ABAC integration at the data source.
There are a few different ways of achieving authorization directly on the data tier. At Axiomatics, we have had our Axiomatics Data Access Filter for Multiple Databases (ADAF MD) available for some time now. ADAF MD aims to solve authorization challenges for relational databases. Think Oracle, DB2, MS SQL, Teradata… We are expanding beyond traditional databases with the recent launch of SmartGuard™ for Big Data, which enables (as the name hints) authorization for Big Data. So what does that look like? What kind of authorization use cases do we see for Big Data?
Fig 1. Authorization at any layer in the application stack
Big Data differs from traditional data processing in several respects. There is usually an enormous amount of data (hence the name Big Data), and with this comes the challenge of capturing, storing, processing, querying, analyzing, sharing and visualizing the data using traditional data processing methods. The Big Data approach attempts to solve some of these challenges. However, in many cases, when we get to querying the data, it is still done in a way similar to what we are used to seeing. This stems from the fact that a lot of the data captured is still in some way relational: parts of the data relate to other parts in one way or another. In fact, there is a prevailing school of thought that if there is absolutely no relationship between data parts, there is not much value in analyzing them. These relationships also play a role in authorizing access to the data, and they are something that Attribute Based Access Control (ABAC) is really good at handling.
Let’s imagine you are building a medical platform capable of gathering data from multiple healthcare organizations, hospitals, clinics, insurance companies and government agencies. The end goal of the Big Data healthcare initiative is to provide better and more preemptive care for patients.
To achieve this goal, we need to collect medical data from many patients nationwide. That medical data will contain extremely sensitive information such as:
- Personally identifiable information (PII) such as the name, social security number, and address of the patient
- Personal health information (PHI): this includes basic information such as blood type but also more sensitive data such as disease, symptoms…information that could be used against a patient if in the wrong hands.
- Financial information: credit card information, credit score, bank account information…
- And many more dimensions
So on the one hand, we have a wealth of information; on the other, a huge privacy concern and potential liability. How can we make sure the medical community can benefit from mining and analyzing the data without breaching any patients’ rights and without leading to massive breaches – whether intentional or accidental?
This is where SmartGuard steps in. It provides the same advanced data security capabilities as ADAF MD, but applied to Big Data deployments this time:
- Dynamic data filtering
- Dynamic data masking
Policies could be implemented that:
- Deny access to the full social security number and display only the last 4 digits (an example of dynamic data masking)
- Allow access only to patient records in the user’s own state (an example of dynamic data filtering)
- Allow access to disease information only if user consent has been given (an example of dynamic data filtering)
- Allow access to blood type in case of emergency, with the access logged and reported (an example of a “break the glass” scenario)
You may also realize that existing Big Data security solutions offer no easy way to enforce any of these policies with the same level of granularity or expressiveness.
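To make the four example policies above concrete, here is a minimal Python sketch of what they mean when applied to a result set. It is not SmartGuard's policy language or API. Every name in it (`User`, `PatientRecord`, `mask_ssn`, `apply_policies`, the `emergency` flag, the `consent_given` field) is hypothetical and chosen only for illustration. It also assumes that the consent rule withholds the disease value rather than dropping the whole row, which is just one possible reading of that policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class User:
    user_id: str
    state: str
    emergency: bool = False  # hypothetical "break the glass" attribute


@dataclass(frozen=True)
class PatientRecord:
    name: str
    ssn: str
    state: str
    disease: str
    blood_type: str
    consent_given: bool  # hypothetical consent attribute on the record


def mask_ssn(ssn: str) -> str:
    """Dynamic data masking: hide every digit except the last four."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    seen = 0
    masked = []
    for ch in ssn:
        if ch.isdigit():
            seen += 1
            masked.append(ch if seen > total_digits - 4 else "*")
        else:
            masked.append(ch)  # keep separators such as "-"
    return "".join(masked)


def apply_policies(
    user: User, records: List[PatientRecord], audit_log: List[str]
) -> List[Dict[str, Optional[str]]]:
    """Return only what this user is allowed to see."""
    visible = []
    for rec in records:
        # Dynamic data filtering: only records from the user's own state.
        if rec.state != user.state:
            continue
        # Disease information only if consent has been given.
        disease = rec.disease if rec.consent_given else None
        # Break the glass: blood type only in an emergency, and logged.
        blood_type = None
        if user.emergency:
            blood_type = rec.blood_type
            audit_log.append(
                f"{user.user_id} accessed blood type of {rec.name}"
            )
        visible.append(
            {
                "name": rec.name,
                "ssn": mask_ssn(rec.ssn),
                "disease": disease,
                "blood_type": blood_type,
            }
        )
    return visible
```

The decisions depend only on attributes of the user (state, emergency status) and of the data (state, consent), which is the essence of ABAC. In a real deployment these rules would be expressed as policies and enforced at the data source rather than hand-coded in each application.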
Big Data usually deals with very large quantities of data and, in many cases, the data might be unstructured. But we still see a lot of data that is relational, especially when it comes to controlling access to the data. Authorization policies using ABAC look very similar to the access policies we see for traditional use cases and applications. SmartGuard™ for Big Data is our solution for enforcing fine-grained, attribute-based access control on Big Data systems.
If you have a question, please leave a comment or send it to firstname.lastname@example.org.