Elasticsearch is often used to store sensitive data. Moreover, a single index may store both sensitive data and non-sensitive data that should be accessible to a broader audience. Search Guard allows you to implement fine-grained access control to these fields by using field-level security and field anonymisation.
Search Guard provides two ways of controlling access to fields in a document:
- Field-level security: exclude fields from the documents returned to a user
- Field anonymisation: replace field values with an anonymised representation
Both of them are performed on the fly when executing a query. This means they do not require any data transformations or duplications at ingest time. The rules controlling field access can be changed at runtime without the need to change the underlying data.
In this post we show a demo of field anonymisation and its capabilities. We will work on a sample log dataset to filter logs and count unique IP addresses.
Field anonymisation: Hashing and regular expressions
Search Guard provides two ways of anonymising field values: hashing and regular expressions.
- Hashing: uses a (salted) hash function to replace the complete content of a field with a consistent hash, e.g. replace the "firstname" and "lastname" fields of a document with a hash
- Regular expressions: uses a regular expression to replace the parts of a field value matching the expression with a constant value, e.g. detect IP addresses in a field and replace them with "X.X.X.X"
In this first article of a two-part series we will use hashing.
We start by uploading a sample log dataset with the following curl call:
curl -k -XPOST 'https://localhost:9200/_bulk?pretty' \
-H 'Content-Type: application/x-ndjson' \
--data-binary @logs.jsonl
and create a logstash index pattern in Kibana that aggregates the daily indices.
Let us now create a logreader user in sg_internal_users.yml:
#password is: pass
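A minimal user entry might look like the sketch below; the bcrypt hash placeholder and the backend role name logs_anonymised are illustrative assumptions for this demo. The hash can be generated with the tools/hash.sh script shipped with Search Guard.

```yaml
logreader:
  # bcrypt hash of the password "pass", generated with tools/hash.sh
  hash: <bcrypt hash of "pass">
  roles:
    - logs_anonymised
```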
and a role in sg_roles.yml that masks all ip* fields within an index:
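Assuming the index pattern logstash-* and a role named sg_logs_anonymised (both names are illustrative), the role definition could look like the following sketch, with the _masked_fields_ entry doing the anonymisation:

```yaml
sg_logs_anonymised:
  cluster:
    - CLUSTER_COMPOSITE_OPS_RO
  indices:
    'logstash-*':
      '*':
        - READ
      _masked_fields_:
        - 'ip*'
```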
Note that we need to use a wildcard because the index contains two fields, ip and ip.keyword; the latter allows searching by exact values.
We then map Search Guard roles onto backend roles by adding an entry in sg_roles_mapping.yml:
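A minimal mapping, assuming the role and backend role names used in this demo, might be:

```yaml
sg_logs_anonymised:
  backendroles:
    - logs_anonymised
```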
We also add a salt for hashing in elasticsearch.yml:
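The salt is configured via the searchguard.compliance.salt setting; the value below is a placeholder that should be replaced with your own random string:

```yaml
searchguard.compliance.salt: <your random salt string>
```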
and reload the cluster configuration:
./sgadmin.sh -cd ../sgconfig/ -icl -nhnv \
-cacert ../../../config/root-ca.pem \
-cert ../../../config/kirk.pem \
-key ../../../config/kirk-key.pem
That finishes the setup.
We can log into Kibana with the logreader credentials to see what the data looks like. The sample dataset contains logs from 2013, so do not forget to change the "Time Range" in Kibana:
Anonymised records should now look like this:
The ip field has been converted into a hash on the fly for the sg_logs_anonymised role. We can also perform a quick cross-check and log in to Kibana with a user whose role does not have field anonymisation configured. For example, if you are using the Search Guard demo setup you can use the admin user. When viewing the log data with this user, the ip field is displayed in clear text.
Sometimes, after a configuration change, it may take some time before Kibana clears its caches. In this case we encourage you to clear the browser data or to test with curl:
curl -k -XGET 'https://#es-url#:9200/logstash-*/_search' \
-u logreader:pass | grep --color=auto \"ip\"
Let us now check how many unique IP addresses are in the logs:
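One way to do this, assuming the field names from our sample dataset, is a cardinality aggregation over the ip.keyword field. Note that for the logreader user the aggregation operates on the hashed values, yet still yields the correct unique count, since hashing is consistent:

```json
{
  "size": 0,
  "aggs": {
    "unique_ips": {
      "cardinality": { "field": "ip.keyword" }
    }
  }
}
```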
This question could not have been answered with field-level security alone, which shows the advantage of field anonymisation in this specific case. Fortunately, Search Guard implements both, so you can pick the one that best suits your needs, or even mix them together.
Where to go next: