Guide: Getting Started with Data Vault
Introduction
Building truly secure data storage is still extremely hard, and increasing regulatory pressure makes mishandling users’ personally identifiable information (PII) a very costly mistake.
We built Data Vault to help developers store user data and PII securely, without having to worry about data localization, key management, key rotation, or the million other issues you face when building and using cryptographic primitives.
Data Vault is ideal for scenarios in which you need to save sensitive user data such as data collected during KYC, health records, and credit card data, but it can also be used as a simple and convenient key-value store for user attributes. For instance, we use Data Vault to keep track of a user's cart in our e-commerce demo.
Data Vault has a number of properties that are not available through a database or through products like HashiCorp Vault:
- Database-agnostic
- Customers can bring their own Hardware Security Modules (HSMs)
- Accessible directly from the frontend as well as the backend
- Handles encryption and data localization transparently
- Hardware-based root of trust to implement crypto-anchoring and avoid mass data exfiltration
- Data can be either globally replicated or localized to an individual region
- Compliance with GDPR and similar regulations by (i) keeping an audit log of data access requests and (ii) using crypto-shredding to service data deletion requests
- No noticeable latency for the end user
- Ability to perform equality searches over encrypted data
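To make the last property more concrete, here is a minimal Node.js sketch of one common way to support equality search over encrypted values: a keyed "blind index" stored next to the ciphertext. This is an illustration under our own assumptions, not a description of Data Vault's actual scheme; the key names, record layout, and helper functions are all hypothetical.

```javascript
const crypto = require("node:crypto");

// Hypothetical keys for illustration; a real system would fetch them from a key manager.
const encKey = crypto.randomBytes(32);   // encrypts the attribute value
const indexKey = crypto.randomBytes(32); // derives the searchable tag

// Encrypt with AES-256-GCM using a fresh random nonce per value.
function encrypt(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

// Deterministic keyed hash of the value: equal inputs produce equal tags.
function blindIndex(value) {
  return crypto.createHmac("sha256", indexKey).update(value, "utf8").digest("hex");
}

// Store the ciphertext together with its blind index.
function makeRecord(personId, value) {
  return { personId, index: blindIndex(value), ciphertext: encrypt(encKey, value) };
}

// Equality search compares tags only; nothing is decrypted.
function findByValue(records, value) {
  const wanted = blindIndex(value);
  return records.filter((r) => r.index === wanted).map((r) => r.personId);
}

const records = [
  makeRecord("person-1", "E65471234"),
  makeRecord("person-2", "X00000001"),
];
console.log(findByValue(records, "E65471234")); // [ 'person-1' ]
```

The trade-off of this approach is that the index reveals which records share the same value, so it suits high-entropy fields better than low-cardinality ones.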
How do you use Data Vault?
Data Vault’s interface is a standard REST API, coupled with a set of SDKs (both client-side and server-side) that together allow developers to store and retrieve users and their associated data.
Below you’ll see a few examples of how to use Data Vault, both client-side through our SDK and server-side through our APIs. We'll show a few examples with and without SlashID as the Identity Provider.
Data Vault with SlashID Access client-side
If you are using SlashID’s Access module for identity management, then Data Vault is already provided as part of Access.
For instance, this code snippet shows how to register or sign in a user and how to store and retrieve encrypted attributes using our client-side JavaScript SDK:
var sid = new slashid.SlashID();
...
const user = await sid.id(
  org_id,
  {
    type: "phone_number",
    value: "+13337777777"
  },
  {
    method: "webauthn"
  }
)
const bucket = org_id + "-organization-end_user_read_only";
await user.set(bucket, {"passport": "E65471234"});
...
const attrs = await user.get(bucket);
// => attrs === {"passport": "E65471234"}
In the example above, we are registering/logging in a user with the phone number +13337777777, and using webauthn as the authentication method. By default, if the user connected to the website from Lisbon, their data would be automatically stored in the EU, and if they connected from New York, the data would be stored in the US. You can override that choice in the id() call.
All attributes are stored in attribute buckets, which define access control policies on the attributes in the bucket. For full details on how to use attribute buckets, read the dedicated guide.
Calling set on a user creates a new attribute in our Data Vault module. When that happens, an attribute-specific encryption key is generated, which is used to encrypt passport. The key is itself encrypted with a user-specific key generated at user-creation time.
Calling get on a user allows you to retrieve one or more attributes associated with the user. The attributes are decrypted and returned in clear-text to the caller.
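The set/get flow described above can be sketched with Node's built-in crypto module. This is a simplified illustration, assuming AES-256-GCM and in-memory keys; the function names (seal, open, setAttribute, getAttribute) are ours and do not come from the SDK, and the real service may use different primitives.

```javascript
const crypto = require("node:crypto");

// Authenticated encryption helper (AES-256-GCM with a random nonce).
function seal(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

// Matching decryption helper; throws if the ciphertext or tag was tampered with.
function open(key, box) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.data), decipher.final()]);
}

// Stand-in for the user-specific key generated at user-creation time.
const userKey = crypto.randomBytes(32);

// "set": generate an attribute-specific key, encrypt the value with it,
// then encrypt (wrap) the attribute key with the user key.
function setAttribute(value) {
  const attributeKey = crypto.randomBytes(32);
  return {
    value: seal(attributeKey, Buffer.from(value, "utf8")),
    wrappedKey: seal(userKey, attributeKey),
  };
}

// "get": unwrap the attribute key, then decrypt the value to clear-text.
function getAttribute(stored) {
  const attributeKey = open(userKey, stored.wrappedKey);
  return open(attributeKey, stored.value).toString("utf8");
}

const stored = setAttribute("E65471234");
console.log(getAttribute(stored)); // E65471234
```

Only the wrapped attribute key and the ciphertext are persisted in this sketch, so reading a value always requires access to the user key.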
Note how, throughout the example, you don't have to worry about encryption or data localization, and the user data never touches your own infrastructure.
Client-side Data Vault without SlashID Access
If you are not using Access for Identity Management, you can still use Data Vault securely from your frontend.
You first need to create a person in SlashID through our APIs:
curl -L -X POST 'https://api.slashid.com/persons' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: c4681245-11d2-4fa5-835f-378b04ff7395' \
-H 'SlashID-API-Key: NnfXQCnvn/YvwHhVtaHF39/8X8kqAzks' \
--data-raw '[
  {
    "attributes": {},
    "handles": [
      {
        "type": "email_address",
        "value": "user_blogpost@user.com"
      }
    ],
    "region": "us-iowa",
    "roles": [],
    "groups": []
  }
]'
{
  "result": [
    {
      "person_id": "0abc1553-a907-7dcf-1928-3abdae5cbabc",
      "active": true,
      "roles": []
    }
  ]
}
Once you have the person ID you can mint an access token for that person through our APIs:
curl -L -X POST 'https://api.slashid.com/persons/0abc1553-a907-7dcf-1928-3abdae5cbabc/mint-token' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: c4681245-11d2-4fa5-835f-378b04ff7395' \
-H 'SlashID-API-Key: NnfXQCnvn/YvwHhVtaHF39/8X8kqAzks'
{
"result": "eyJhbGciOiJSUzI1NiIsImtpZCI6InZFcHNGZyJ9.dsaRoZW50aWNhdGVkX21ldGhvZHMiOlsiYXBpIl0sImV4cCI6MTY2NzU2NzIxMCwiZmlyc3RfdG9rZW4iOmZhbHNlLCJpYXQiOjE2Njc0ODA4MTAsImlzcyI6InNhbmRib3guc2xhc2hpZC5kZXYvYXV0aG4iLCJqdGkiOiI2MDU3Yzc1MDNhYzhkNjM2NWVkMDZkMDFkOGIwNGZlOSIsIm9pZCI6ImY5NzhhNmJkLTNlNDUtYmNkYS1jYjRlLTU3M2QwYmFkMTU1YiIsInVzZXJfaWQiOiJwaWQ6MDE5NzU1NDdlNDdkYTU5ZDc0ZDVmYTVkZjJlMWYxZGYzZmZhMWEyZWNlMTc4NWMzZTk3ZTZlZGUwNDc4ZTg5ZThkM2NhNzYxMmE6MSJ9.EMwqVfJLvVVhOy_TQ7UuX1kiXphXBwt3E38Rkdkgkld_C5TfHqi7BRgwyc6FQEsnX4QzoB9-vCjupVxpAlY_PuVaxvNI3j1NUilZuPB8GMsiUS5Ivc1k9yUBMt11qOd1GR6lU9GiUZnXzOaYiOMc5LZJs8ig7YsFJCSZ34QpgcwQAkR0gIWOBezc_q4vHel8NnOgjx4syhUz-1gLngyqgwpdWae1zmzMY9zrof7NfXnSr69waLQvToA7gQWDgC0TLimpvNNiHrE3Do7v0JgXJ7JQ63ARE7ES0uppaX-mvpo7HYp32RfDmh29EOsMkSt0NFrY8t1UhUvySh57ECkePA"
}
The resulting token can be loaded in JavaScript through the SDK as follows:
const user = new slashid.User(
"eyJhbGciOiJSUzI1NiIsImtpZCI6InZFcHNGZyJ9.dsaRoZW50aWNhdGVkX21ldGhvZHMiOlsiYXBpIl0sImV4cCI6MTY2NzU2NzIxMCwiZmlyc3RfdG9rZW4iOmZhbHNlLCJpYXQiOjE2Njc0ODA4MTAsImlzcyI6InNhbmRib3guc2xhc2hpZC5kZXYvYXV0aG4iLCJqdGkiOiI2MDU3Yzc1MDNhYzhkNjM2NWVkMDZkMDFkOGIwNGZlOSIsIm9pZCI6ImY5NzhhNmJkLTNlNDUtYmNkYS1jYjRlLTU3M2QwYmFkMTU1YiIsInVzZXJfaWQiOiJwaWQ6MDE5NzU1NDdlNDdkYTU5ZDc0ZDVmYTVkZjJlMWYxZGYzZmZhMWEyZWNlMTc4NWMzZTk3ZTZlZGUwNDc4ZTg5ZThkM2NhNzYxMmE6MSJ9.EMwqVfJLvVVhOy_TQ7UuX1kiXphXBwt3E38Rkdkgkld_C5TfHqi7BRgwyc6FQEsnX4QzoB9-vCjupVxpAlY_PuVaxvNI3j1NUilZuPB8GMsiUS5Ivc1k9yUBMt11qOd1GR6lU9GiUZnXzOaYiOMc5LZJs8ig7YsFJCSZ34QpgcwQAkR0gIWOBezc_q4vHel8NnOgjx4syhUz-1gLngyqgwpdWae1zmzMY9zrof7NfXnSr69waLQvToA7gQWDgC0TLimpvNNiHrE3Do7v0JgXJ7JQ63ARE7ES0uppaX-mvpo7HYp32RfDmh29EOsMkSt0NFrY8t1UhUvySh57ECkePA"
)
From this point onwards, you can use the User object just like you would if you were using Access for Identity Management:
await user.set(bucket, { passport: "E65471234" })
const attrs = await user.get(bucket)
// => attrs === {"passport": "E65471234"}
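Since the minted token is a JWT with an exp claim, you may want to check whether it is still valid before handing it to the SDK. The helper below is a hypothetical Node.js sketch, not part of the SlashID SDK: it only decodes the payload and does not verify the signature, so it must not be used for authorization decisions. The demo token is a dummy built for this example.

```javascript
// Decode the payload (middle segment) of a JWT without verifying its signature.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("not a JWT: expected three dot-separated segments");
  }
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// True when the token's `exp` claim (seconds since the epoch) is in the past.
function isExpired(token, nowMs = Date.now()) {
  const { exp } = decodeJwtPayload(token);
  return typeof exp === "number" && exp * 1000 <= nowMs;
}

// Build an unsigned dummy token purely for demonstration.
const b64 = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");
const demoToken = [b64({ alg: "none" }), b64({ exp: 1667567210 }), "sig"].join(".");

console.log(isExpired(demoToken, Date.UTC(2022, 10, 1))); // false
console.log(isExpired(demoToken, Date.UTC(2023, 0, 1)));  // true
```

If the token has expired, you can mint a new one from your backend with the mint-token call shown earlier.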
Accessing Data Vault from the backend
Accessing Data Vault from your backend is straightforward through a simple REST API. Here is an example to retrieve all the attributes in clear-text for a given user:
curl -L -X GET 'https://api.slashid.com/persons/0abc1553-a907-7dcf-1928-3abdae5cbabc/attributes' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: c4681245-11d2-4fa5-835f-378b04ff7395' \
-H 'SlashID-API-Key: NnfXQCnvn/YvwHhVtaHF39/8X8kqAzks'
{
  "result": {
    "person_pool-end-user-read-only": {
      "email": "testuser_32819h@slashid.dev",
      "firstName": "John",
      "lastName": "Smith",
      "passport": "E65471234"
    }
  }
}
To add a new attribute we can PUT it as follows:
curl -L -X PUT 'https://api.slashid.com/persons/0abc1553-a907-7dcf-1928-3abdae5cbabc/attributes/person_pool-end_user_read_only' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: c4681245-11d2-4fa5-835f-378b04ff7395' \
-H 'SlashID-API-Key: NnfXQCnvn/YvwHhVtaHF39/8X8kqAzks' \
--data-raw '{"creditcard_num": "324345678123403"}'
And to delete an attribute, it’s a simple DELETE:
curl -L -X DELETE 'https://api.slashid.com/persons/0abc1553-a907-7dcf-1928-3abdae5cbabc/attributes/person_pool-end_user_read_only?attributes=lastName' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: c4681245-11d2-4fa5-835f-378b04ff7395' \
-H 'SlashID-API-Key: NnfXQCnvn/YvwHhVtaHF39/8X8kqAzks'
{
  "result": {
    "person_pool-end-user-read-only": {
      "email": "testuser_32819h@slashid.dev",
      "firstName": "John",
      "passport": "E65471234"
    }
  }
}
Lastly, you can selectively GET individual attributes to minimize accessing unneeded clear-text confidential information:
curl -L -X GET 'https://api.slashid.com/persons/0abc1553-a907-7dcf-1928-3abdae5cbabc/attributes/person_pool-end_user_read_write?attributes=creditcard_num' \
-H 'Accept: application/json' \
-H 'SlashID-OrgID: c4681245-11d2-4fa5-835f-378b04ff7395' \
-H 'SlashID-API-Key: NnfXQCnvn/YvwHhVtaHF39/8X8kqAzks'
{
  "result": {
    "creditcard_num": "324345678123403"
  }
}
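If your backend is written in JavaScript, the same calls can be made with fetch (available in Node.js 18 and later). The sketch below mirrors the endpoints and headers from the curl examples above; the function names are ours, and joining several attribute names with commas is an assumption that the examples do not confirm.

```javascript
const BASE_URL = "https://api.slashid.com";

// Build the URL and headers for the attribute endpoints shown in the curl examples.
// `bucket` and `attributes` are optional; omit both to address all attributes.
function attributesRequest({ orgId, apiKey, personId, bucket, attributes = [] }) {
  let path = `/persons/${encodeURIComponent(personId)}/attributes`;
  if (bucket) path += `/${encodeURIComponent(bucket)}`;
  const url = new URL(path, BASE_URL);
  if (attributes.length > 0) {
    // Assumption: multiple names are comma-separated; only single names appear above.
    url.searchParams.set("attributes", attributes.join(","));
  }
  return {
    url: url.toString(),
    headers: {
      Accept: "application/json",
      "SlashID-OrgID": orgId,
      "SlashID-API-Key": apiKey,
    },
  };
}

// Fetch attributes in clear-text; throws on any non-2xx response.
async function getAttributes(params) {
  const { url, headers } = attributesRequest(params);
  const res = await fetch(url, { method: "GET", headers });
  if (!res.ok) throw new Error(`Data Vault request failed: ${res.status}`);
  return (await res.json()).result;
}

const example = attributesRequest({
  orgId: "your-org-id",
  apiKey: "your-api-key",
  personId: "0abc1553-a907-7dcf-1928-3abdae5cbabc",
  bucket: "person_pool-end_user_read_only",
  attributes: ["passport"],
});
console.log(example.url);
// https://api.slashid.com/persons/0abc1553-a907-7dcf-1928-3abdae5cbabc/attributes/person_pool-end_user_read_only?attributes=passport
```

Keeping request construction separate from the network call makes the URL logic easy to unit test without touching the API.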
The internals
We provide a deep-dive on our encryption scheme in this blogpost.
At a high level, we decided early on that our encryption layer would be database-agnostic, because we wanted the ability to expand datastore coverage beyond our current database, if needed.
Some data structures in our architecture are encrypted and globally replicated, while others are encrypted but localized to a specific region.
We make extensive use of envelope encryption. Our keys are encrypted following a tree-like logic as shown schematically in the picture below.
Properties of the Key Hierarchy
The key hierarchy mentioned above is generated per-region and this structure brings a number of advantages:
- A compromised key doesn't compromise all the data
- Deleting person data can be achieved through crypto-shredding by deleting the corresponding key
- Data can be replicated globally without violating data residency requirements since the keys are localized per-region
- The root of the chain of trust is stored in a Hardware Security Module (HSM) making a whole-database compromise an extremely arduous task
- We can enforce fine-grained access control on the HSM to tightly control and reduce the number of services able to decrypt subtrees
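As an illustration of the crypto-shredding property listed above, here is a minimal Node.js sketch with a two-level hierarchy. Everything in it is a simplification under our own assumptions: the root key is a random in-memory value standing in for an HSM-held key, the two Map stores are hypothetical, and the real hierarchy has more levels.

```javascript
const crypto = require("node:crypto");

// AES-256-GCM helpers used for both key wrapping and data encryption.
function seal(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function open(key, box) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.data), decipher.final()]);
}

// In-memory stand-ins (hypothetical): a real root key would never leave the HSM.
const rootKey = crypto.randomBytes(32);
const keyStore = new Map();  // personId -> person key wrapped by the root key
const dataStore = new Map(); // personId -> attribute encrypted with the person key

function createPerson(personId) {
  keyStore.set(personId, seal(rootKey, crypto.randomBytes(32)));
}

function storeAttribute(personId, value) {
  const personKey = open(rootKey, keyStore.get(personId));
  dataStore.set(personId, seal(personKey, Buffer.from(value, "utf8")));
}

function readAttribute(personId) {
  const wrapped = keyStore.get(personId);
  if (!wrapped) throw new Error("key shredded: data is unrecoverable");
  return open(open(rootKey, wrapped), dataStore.get(personId)).toString("utf8");
}

// Crypto-shredding: delete only the key; the ciphertext and its replicas can stay.
function shredPerson(personId) {
  keyStore.delete(personId);
}

createPerson("alice");
createPerson("bob");
storeAttribute("alice", "E65471234");
storeAttribute("bob", "X00000001");
shredPerson("alice");
console.log(readAttribute("bob")); // X00000001
```

After shredPerson("alice"), the ciphertext for alice is still present in dataStore but can no longer be decrypted, while bob's data is unaffected because each person has an independent key.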
Key Lifecycle
Keys are created on demand when a new organization or a new person is created. Keys are never stored unencrypted anywhere at rest and only one service within SlashID’s infrastructure has the ability to access the HSM to decrypt the master key, significantly reducing the chances of compromise.
Derived keys are stored in a database that is encrypted, geo-localized, and backed up every 24 hours.
Lastly, all keys are rotated at regular, configurable intervals.
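With envelope encryption, rotation is often implemented by re-wrapping derived keys under a new parent key, which avoids re-encrypting the data itself. The Node.js sketch below shows that general technique; it is an assumption about how rotation could work, not a description of SlashID's rotation procedure, and the versioned key store is hypothetical.

```javascript
const crypto = require("node:crypto");

// AES-256-GCM helpers used for both key wrapping and data encryption.
function seal(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, data, tag: cipher.getAuthTag() };
}

function open(key, box) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, box.iv);
  decipher.setAuthTag(box.tag);
  return Buffer.concat([decipher.update(box.data), decipher.final()]);
}

// Versioned key-encryption keys (hypothetical in-memory store).
const keks = new Map([[1, crypto.randomBytes(32)]]);
let currentVersion = 1;

// Wrap a data key under the current key-encryption key, recording its version.
function wrap(dataKey) {
  return { version: currentVersion, box: seal(keks.get(currentVersion), dataKey) };
}

// Unwrap using whichever key version the record was wrapped with.
function unwrap(wrapped) {
  return open(keks.get(wrapped.version), wrapped.box);
}

// Rotate: create a new key version, re-wrap every data key, retire the old version.
function rotate(wrappedKeys) {
  const oldVersion = currentVersion;
  currentVersion += 1;
  keks.set(currentVersion, crypto.randomBytes(32));
  const rewrapped = wrappedKeys.map((w) => wrap(unwrap(w)));
  keks.delete(oldVersion);
  return rewrapped;
}

const dataKey = crypto.randomBytes(32);
const record = seal(dataKey, Buffer.from("E65471234", "utf8")); // never re-encrypted
let wrapped = wrap(dataKey);
[wrapped] = rotate([wrapped]);
console.log(wrapped.version); // 2
console.log(open(unwrap(wrapped), record).toString("utf8")); // E65471234
```

Because only the small wrapped keys are rewritten, rotation cost in this design scales with the number of keys rather than with the volume of stored data.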
The storage engine
While Data Vault is database-agnostic and is exposed as a key-value store, we currently use Cloud SQL for PostgreSQL for attribute storage. Cloud SQL is deployed in multiple GCP regions to make data localization coverage as granular as possible.
As mentioned earlier, key material is stored in the region closest to the user to comply with data localization laws. Encrypted user data is either localized with the key or globally replicated across different regions to reduce latency, depending on access patterns.
What’s next?
We have a number of exciting upcoming features for Data Vault:
- Encryption/decryption API
- Tokenization and PCI compliance
- In-browser encryption
And many more!
In the next few weeks we’ll describe how we are approaching in-browser encryption so you can decide whether to encrypt the data in the backend or let the user encrypt data client-side in a non-custodial fashion.
Conclusion
If you are interested in Data Vault or you have built a similar system yourself, we’d love to hear from you. Please reach out to us!