Data governance: Sensitive Data and PII Detection

Context

For compliance and security reasons, it is important to monitor and minimize the sensitive data (such as PII) moving across your environment.

Solution

Gate can help you detect, encrypt, and tokenize PII and other sensitive data to minimize the risk of data leakage.

As a prerequisite, you should have Gate deployed in your infrastructure.

Step 1 - Run Gate in transparent mode.

Start by running Gate in transparent mode to confirm that everything was deployed correctly.

Step 2 - Define your detection policies

Gate offers a rich set of options to customize detection behavior. In particular, for each PII type you can:

customize the detection with regular expressions to increase accuracy
decide on the behavior: whether to replace, mask, tokenize, encrypt, or simply log the presence of sensitive data

Gate currently supports the following PII types:

email addresses (EMAIL_ADDRESS)
phone numbers (PHONE_NUMBER)
URLs (URL)
credit cards (CREDIT_CARD)
IBANs (IBAN)
US Social Security numbers (US_SSN)
UK NHS numbers (UK_NHS)
US ITINs (ITIN)
US driver's licenses (US_DRIVER_LICENSE)

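Once sensitive data is detected, you can apply one of the following anonymizer types:

replace: substitute the value with a fixed placeholder
mask: overwrite part of the value with a masking character
hash: tokenize the value by replacing it with its hash
encrypt: replace the value with its encrypted form
keep: leave the value unchanged and only monitor its presence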

As an example, the following configuration only monitors sensitive data by default, masks phone numbers, tokenizes email addresses, encrypts URLs, and replaces credit card numbers.

    - id: pii_anonymizer
      type: anonymizer
      enabled: false
      parameters:
        anonymizers: |
          DEFAULT:
            type: keep
          PHONE_NUMBER:  # (123) 456-7890 ➔ (123) 456-****
            type: mask
            masking_char: "*"
            chars_to_mask: 4
            from_end: true
          EMAIL_ADDRESS:  # john.smith@example.com ➔ 8e621e3d0368631d263d07a351fa8d34fba0d17c15fbcdec11a5f58008d022a0
            type: hash
            hash_type: sha256  # either sha256 (default), sha512 or md5
          URL:  # https://john.smith.me ➔ KshULhOqJrLmuHOsQ/ArGHQK8Wrjg0BKdypb/77PYf+64v/FqB3zufbVlnGD4sn4
            type: encrypt  # Replaces value with Base64-encoded AES-CBC encrypted value with PKCS#7 padding and a prepended IV
            key: abcdefghijklmnop  # AES key, must be 128, 192 or 256 bits
          CREDIT_CARD:  # 4111111111111111 ➔ <CREDIT_CARD>
            type: replace
            new_value: <CREDIT_CARD>  # Defaults to <TYPE>
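
To make the operator semantics concrete, the following Python sketch reproduces the transformations described in the comments of the configuration above. It only illustrates the expected output shapes and is not Gate's implementation: the function names `mask`, `hash_value`, and `replace` are hypothetical, and the sketch assumes that masking counts every character, including punctuation.

```python
import hashlib
from typing import Optional


def mask(value: str, masking_char: str = "*", chars_to_mask: int = 4,
         from_end: bool = True) -> str:
    """Overwrite chars_to_mask characters at one end of value with masking_char."""
    n = max(0, min(chars_to_mask, len(value)))
    if n == 0:
        return value
    if from_end:
        return value[:len(value) - n] + masking_char * n
    return masking_char * n + value[n:]


def hash_value(value: str, hash_type: str = "sha256") -> str:
    """Return the hex digest of value using sha256, sha512, or md5."""
    if hash_type not in ("sha256", "sha512", "md5"):
        raise ValueError("unsupported hash_type: " + hash_type)
    return hashlib.new(hash_type, value.encode("utf-8")).hexdigest()


def replace(entity_type: str, new_value: Optional[str] = None) -> str:
    """Return new_value, or the <TYPE> placeholder when no value is given."""
    return new_value if new_value is not None else "<" + entity_type + ">"


# Hypothetical usage mirroring the rules in the configuration above.
print(mask("(123) 456-7890"))                # masks the last 4 characters
print(hash_value("john.smith@example.com"))  # 64 hex characters for sha256
print(replace("CREDIT_CARD"))                # falls back to the <TYPE> placeholder
```

Hashing is one-way, so it suits values you only need to correlate, while `encrypt` is the option to choose when the original value must be recoverable with the key.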

Step 3 - Optionally configure custom types

You can specify your own types using one or more regex patterns, as shown below, to improve the detection accuracy.

    - id: pii_anonymizer
      type: anonymizer
      enabled: false
      parameters:
        analyzer_ad_hoc_recognizers: |
          - name: Zipcode regex
            supported_language: en
            supported_entity: ZIP
            patterns:
            - name: zipcode
              regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
              score: 0.01

This example specifies the following fields:

supported_language: the language this regex should be applied to
supported_entity: the entity type to apply the regex on
patterns: one or more patterns, each with a regex and a score that represents the confidence in the PII detection
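
Before deploying a custom pattern, it can help to exercise the regex on sample text. The sketch below uses Python's `re` module with the zipcode pattern from the configuration above, after YAML unescaping. The `find_zipcodes` helper and the sample sentence are made up for illustration, and Gate's own regex engine may differ in details.

```python
import re
from typing import List

# The YAML double-quoted string "(\\b\\d{5}(?:\\-\\d{4})?\\b)" unescapes to
# the pattern below: five digits, optionally followed by a dash and four digits.
ZIPCODE = re.compile(r"(\b\d{5}(?:\-\d{4})?\b)")


def find_zipcodes(text: str) -> List[str]:
    """Return every substring of text that the zipcode pattern matches."""
    return [match.group(0) for match in ZIPCODE.finditer(text)]


# Hypothetical sample text: two zipcodes and one 7-digit number.
sample = "Ship to 94105 or 94105-1234, ref 1234567"
print(find_zipcodes(sample))
```

The word boundaries keep the pattern from matching inside longer digit runs, but any standalone 5-digit number still matches, which is presumably why the example assigns the pattern a low score.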

Example

Configuring Gate

This is an example configuration for Gate in which the */api/echo endpoint is monitored by the PII detection plugin and the following rules are enforced:

By default, sensitive data is replaced
The last 4 digits of the phone number are replaced with *
Email addresses are tokenized
URLs are encrypted with a simple symmetric key

    slashid_config: &slashid_config
      slashid_org_id: {{ .env.SLASHID_ORG_ID }}
      slashid_api_key: {{ .env.SLASHID_API_KEY }}
      slashid_base_url: {{ .env.SLASHID_BASE_URL }}

    gate:
      port: 8080
      log:
        format: text
        level: trace

      default:
        target: http://backend:8000

      plugins:
        - id: pii_anonymizer
          type: anonymizer
          enabled: false
          parameters:
            anonymizers: |
              DEFAULT:
                type: replace
              PHONE_NUMBER:
                type: mask
                masking_char: "*"
                chars_to_mask: 4
                from_end: true
              EMAIL_ADDRESS:
                type: hash
              URL:
                type: encrypt
                key: abcdefghijklmnop
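
Step 2 describes the `encrypt` output as a Base64-encoded AES-CBC ciphertext with PKCS#7 padding and a prepended IV. The sketch below uses only the Python standard library to check the key length and to split the sample output from Step 2 into its IV and ciphertext parts. It does not decrypt anything, the helper names are hypothetical, and it assumes that the key is read as UTF-8 bytes and that the IV is 16 bytes long (the AES block size).

```python
import base64
from typing import Tuple

AES_BLOCK_SIZE = 16  # bytes; assumed IV length for AES-CBC


def key_bits(key: str) -> int:
    """Return the key length in bits, assuming the key is read as UTF-8 bytes."""
    return len(key.encode("utf-8")) * 8


def padded_length(plaintext_len: int) -> int:
    """Length after PKCS#7 padding, which always adds between 1 and 16 bytes."""
    return (plaintext_len // AES_BLOCK_SIZE + 1) * AES_BLOCK_SIZE


def split_payload(encoded: str) -> Tuple[bytes, bytes]:
    """Split a Base64 payload into (iv, ciphertext), assuming a prepended IV."""
    raw = base64.b64decode(encoded, validate=True)
    if len(raw) < 2 * AES_BLOCK_SIZE or len(raw) % AES_BLOCK_SIZE != 0:
        raise ValueError("payload is not an IV plus whole AES blocks")
    return raw[:AES_BLOCK_SIZE], raw[AES_BLOCK_SIZE:]


# Sample encrypted URL taken from the Step 2 configuration comments.
SAMPLE = "KshULhOqJrLmuHOsQ/ArGHQK8Wrjg0BKdypb/77PYf+64v/FqB3zufbVlnGD4sn4"

iv, ciphertext = split_payload(SAMPLE)
print(key_bits("abcdefghijklmnop"))                # must be 128, 192 or 256
print(len(iv), len(ciphertext))                    # IV and ciphertext sizes in bytes
print(padded_length(len("https://john.smith.me"))) # padded size of the sample URL
```

A hard-coded key such as `abcdefghijklmnop` is only suitable for a demo; in a real deployment the key would normally come from a secret store or an environment variable.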

Conclusion

With Gate's sensitive data detection plugin, you can enforce data governance policies and monitor potential sensitive data leakage.
