Tokenization as an API: A Walkthrough with Fortanix DSM SaaS

In today’s post-pandemic era, businesses have been forced, willingly or not, to:

  1. decentralize their business establishments,
  2. allow their workforce to work remotely,
  3. comply with various rules and regulations such as GDPR, CCPA, HIPAA, PCI, SOC 2, and Schrems II; and, last but not least,
  4. boost shareholder value and do whatever it takes to prevent harm to their business reputation.

It doesn’t take a security expert to recognize that this new business model can make these organizations more vulnerable to cyberattacks such as ransomware. Such attacks can lead to the loss of sensitive Personally Identifiable Information (PII), with catastrophic consequences.

Our world is a more dangerous place today than it used to be, and hackers and cybercriminals are only getting smarter. These macro geopolitical factors are difficult to control. Making your data (your new currency) more secure, however, is 100% in your control. One decades-old, battle-tested approach to securing your data is encryption: encrypt your data at every stage and never reveal it to unauthorized parties. Fortanix services allow you to keep your data private even while you perform complex computations on it, such as training your AI/ML models. Your sensitive data is protected against both passive and active attackers, every time and at every stage. We offer several novel, state-of-the-art ways to protect your data, one of which is Data Tokenization.

Data Tokenization replaces sensitive personally identifiable information (PII), such as credit card account numbers, with a non-sensitive, random string of characters known as a ‘Token’. A token has no meaningful value if breached. It can therefore be handled and used by applications without violating privacy regulations. Tokenization helps achieve higher overall security standards. It also helps organizations meet the standards set by the Payment Card Industry Security Standards Council and compliance requirements such as HIPAA and GDPR. For more information about how this works, please refer to our previous blog or download our Tokenization solution brief.

The objective of this blog is to provide a quick-start guide and sample API constructs for consuming our Tokenization-as-a-Service solution with some of the most common PII data types.

To get started with our Tokenization-as-a-Service, simply follow these steps:

  1. Go to the DSM SaaS sign-up page and fill in all the details to request a free trial. You can also watch this short video to see the getting-started steps in action. To see all the global regions where our SaaS is available today, view the DSM SaaS Global Availability Map.
  2. You will be redirected to the DSM SaaS sign-in page for the region you selected during the free trial sign-up. Fill in the basic details and click the SIGN UP button. That’s all! You should now be able to sign in to our global SaaS service, and you will also receive a welcome email with additional details.
  3. Log in to DSM SaaS and create a trial account.
  4. Open your newly created account from the accounts listing page and create an App using the authentication method of your choice. We support API Keys, Certificates, Trusted CAs, JSON Web Tokens, AWS IAM, and more. Once you have configured the App, you are all set to keep your data safe using Fortanix Tokenization-as-a-Service.

Next, we present example API requests showing how to create tokenization objects and how to invoke tokenization and detokenization with them:

  1. Creating a tokenization object
    a. Social Security numbers
    b. Credit card numbers
    c. Email addresses
    d. Support for Chinese, Japanese, Korean characters

  2. Specifying characters to be masked
  3. Performing tokenization
  4. Performing detokenization or masked detokenization

Creating a tokenization object

To get started with tokenization in Fortanix Data Security Manager (DSM), you first need to create a tokenization object for the datatype you want to tokenize. Set up an account by following the DSM SaaS instructions above and configure your app. Then initialize an app session by calling our /sys/v1/session/auth endpoint. (More details on authentication are available here.)
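For readers who prefer code, here is a minimal Python sketch of this step using only the standard library. It builds the request but does not send it. The sketch rests on several assumptions:

  • the app authenticates with an API key passed as an HTTP Basic credential;
  • the JSON response carries an access_token field;
  • that token is then sent as a bearer token on later calls.

These details, the placeholder BASE_URL and the helper names are all assumptions, so confirm them against the DSM API reference for your region.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the regional DSM SaaS endpoint for your account.
BASE_URL = "https://dsm.example.com"


def build_auth_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /sys/v1/session/auth.

    Assumption: the app's API key is sent as an HTTP Basic credential.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/sys/v1/session/auth",
        method="POST",
        headers={"Authorization": f"Basic {api_key}"},
    )


def bearer_token_from(response_body: bytes) -> str:
    """Extract the session token (assumes an 'access_token' field in the JSON reply)."""
    return json.loads(response_body)["access_token"]


# Usage (performs a network call, so it is left commented out):
# with urllib.request.urlopen(build_auth_request(API_KEY)) as resp:
#     token = bearer_token_from(resp.read())
```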

Next, make a POST request to our /crypto/v1/keys endpoint. The basic request structure is shown below. The part we are primarily interested in is the fpe.format field, where we specify what our datatype looks like.

{ 
  "name": "your key name here", // give a name for your tokenization object 
  "key_size": 256, 
  "fpe": { 
    "format": { ... }, // where you specify your datatype format 
    "description": "..." // optional string that describes your datatype 
  }, 
  "obj_type": "AES" 
} 

Fortanix DSM provides powerful APIs for specifying almost any arbitrary datatype format. All you need to do is describe the format, and DSM will generate encrypted tokens that satisfy it. The fpe.format field is a tree-like JSON object that encodes the structure of the datatype to be processed.
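As a sketch of how this request might be assembled programmatically, the Python helpers below do two things. They wrap an fpe.format tree in the body shown above, and they prepare (without sending) a POST to /crypto/v1/keys. The bearer-token header, the placeholder BASE_URL and the helper names are assumptions for illustration, not part of a Fortanix SDK.

```python
import json
import urllib.request
from typing import Any, Optional

# Hypothetical placeholder for your regional DSM SaaS endpoint.
BASE_URL = "https://dsm.example.com"


def tokenization_key_body(name: str, fmt: dict[str, Any],
                          description: Optional[str] = None) -> dict[str, Any]:
    """Wrap an fpe.format tree in the key-creation body structure."""
    fpe: dict[str, Any] = {"format": fmt}
    if description is not None:
        fpe["description"] = description  # optional free-text description
    return {"name": name, "key_size": 256, "fpe": fpe, "obj_type": "AES"}


def build_create_key_request(token: str, body: dict[str, Any]) -> urllib.request.Request:
    """Prepare a JSON POST to /crypto/v1/keys (assumes bearer-token session auth)."""
    return urllib.request.Request(
        url=f"{BASE_URL}/crypto/v1/keys",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
```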

Let’s see what this looks like for a U.S. Social Security number (SSN):

{ 
  "name": "SSN tokenization object", 
  "key_size": 256, 
  "fpe": { 
    "format": { 
      "concat": [ 
        { 
          "min_length": 3, 
          "max_length": 3, 
          "char_set": [["0", "9"]], 
          "constraints": { 
            "num_lt": 900, 
            "num_ne": [0, 666] 
          }, 
          "mask": "all" 
        }, 
        {"literal": ["-"]}, 
        { 
          "min_length": 2, 
          "max_length": 2, 
          "char_set": [["0", "9"]], 
          "constraints": { 
            "num_ne": [0] 
          } 
        }, 
        {"literal": ["-"]}, 
        { 
          "min_length": 4, 
          "max_length": 4, 
          "char_set": [["0", "9"]], 
          "constraints": { 
            "num_ne": [0] 
          } 
        } 
      ] 
    } 
  }, 
  "obj_type": "AES" 
} 

First, notice that the fpe.format field contains a concat field consisting of several JSON subparts. In other words, we are telling DSM that an SSN consists of the following subparts, one after the other:

  1. three digits,
  2. a literal hyphen (“-”),
  3. two digits,
  4. another literal hyphen (“-”), and
  5. four digits.

We have also included constraints on each numeric subpart. For instance, real SSNs in the United States do not start with “9”, so we include a num_lt constraint to tell DSM that the first three digits must be below 900. Likewise, the num_ne constraints exclude values that are never assigned:

  • 000 and 666 for the first group,
  • 00 for the middle group, and
  • 0000 for the last group.

(In the first subpart above, ignore the mask field for now. We’ll come back to that in a bit.)
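DSM enforces these constraints itself. Purely as a local illustration, the hypothetical Python function below spells out what the SSN format accepts. It can also be handy for rejecting malformed input before sending it to DSM. It is not a Fortanix API, DSM's exact validation behavior may differ, and it ignores the mask field.

```python
import re

# Mirrors the subparts above: 3 digits, "-", 2 digits, "-", 4 digits.
# [0-9] is used instead of \d so only ASCII digits match, like char_set [["0", "9"]].
SSN_PATTERN = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def satisfies_ssn_format(value: str) -> bool:
    """Locally check a string against the SSN format's structure and constraints."""
    match = SSN_PATTERN.fullmatch(value)
    if match is None:
        return False
    area, group, serial = (int(part) for part in match.groups())
    return (
        area < 900                  # num_lt: 900
        and area not in (0, 666)    # num_ne: [0, 666]
        and group != 0              # num_ne: [0]
        and serial != 0             # num_ne: [0]
    )
```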

Sending the above request gives us a response like the following (irrelevant fields have been omitted and replaced with ellipses):

{ 
  ... 

  "fpe": { 
    "format": { 
      "concat": [ 
        { 
          "min_length": 3, 
          "max_length": 3, 
          "char_set": [["0", "9"]], 
          "constraints": { 
            "num_lt": 900, 
            "num_ne": [0, 666], 
            "applies_to": "all" 
          }, 
          "mask": "all" 
        }, 
        {"literal": ["-"]}, 
        { 
          "min_length": 2, 
          "max_length": 2, 
          "char_set": [["0", "9"]], 
          "constraints": { 
            "num_ne": [0], 
            "applies_to": "all" 
          } 
        }, 
        {"literal": ["-"]}, 
        { 
          "min_length": 4, 
          "max_length": 4, 
          "char_set": [["0", "9"]], 
          "constraints": { 
            "num_ne": [0], 
            "applies_to": "all" 
          } 
        } 
      ] 
    }, 
    "description": null, 
    "name": null 
  }, 

  ... 

  "kid": "fbfec653-53b7-42ca-9c1e-5526ef3e288b", 
  "name": "SSN tokenization object", 

  ... 
} 

Congratulations, we now have a newly created tokenization object! The response contains a lot of information, but we’re primarily interested in the kid field. Keep this value handy, since we’ll need it later for tokenization (and detokenization).
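In code, picking the kid out of the response takes only a couple of lines. The hypothetical helper below also checks that the value parses as a UUID, which is the form the kids in this post take.

```python
import json
import uuid


def kid_from_create_response(response_body: str) -> str:
    """Return the kid of a newly created tokenization object."""
    kid = json.loads(response_body)["kid"]
    uuid.UUID(kid)  # raises ValueError if the kid is not a well-formed UUID
    return kid
```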

Other datatypes

Before we use our newly created tokenization object, let’s take a brief detour and see how to create tokenization objects for other datatypes. The basic structure of the request stays the same. The main difference is the fpe.format field, which we adapt to describe each datatype. (Hence, in the examples below, we only show the fpe.format field.)

Credit card number

{ 
  "min_length": 13, 
  "max_length": 19, 
  "char_set": [["0", "9"]], 
  "constraints": { 
    "luhn_check": true 
  } 
} 

This is a rather simple schema – we are telling our tokenization engine that our credit card datatype consists of 13 to 19 digits, and must satisfy the Luhn checksum algorithm.
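The tokenization engine checks the Luhn constraint itself. Because DSM generates tokens that satisfy the provided format, tokens for this datatype should pass the check too. As a local illustration only, here is the standard Luhn algorithm in Python. A hypothetical helper alongside it mirrors the length and checksum constraints above.

```python
def luhn_valid(number: str) -> bool:
    """Return True if an ASCII digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit and folding results above 9.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_card_like(number: str) -> bool:
    """Mirror the format above: 13 to 19 digits that satisfy the Luhn checksum."""
    return 13 <= len(number) <= 19 and luhn_valid(number)
```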

Email address

The full email address specification is rather complex; instead, let’s consider a simplified approximation.

{ 
  "concat": [ 
    { 
      "char_set": [ 
        ["!", "!"], ["#", "'"], ["*", "+"], ["-", "9"], 
        ["=", "="], ["?", "?"], ["A", "Z"], ["^", "~"] 
      ], 
      "min_length": 1, 
      "max_length": 64 
    }, 
    {"literal": ["@"]}, 
    { 
      "concat": [ 
        { 
          "char_set": [ 
            ["0", "9"], ["A", "Z"], ["a", "z"], ["-", "-"] 
          ], 
          "min_length": 1, 
          "max_length": 63 
        }, 
        { 
          "multiple": { 
            "concat": [ 
              {"literal": ["."]}, 
              { 
                "char_set": [ 
                  ["0", "9"], ["A", "Z"], ["a", "z"], ["-", "-"] 
                ], 
                "min_length": 1, 
                "max_length": 63 
              } 
            ] 
          } 
        } 
      ], 
      "max_length": 255 
    } 
  ] 
} 

Without going into the details, a high-level breakdown of the above structure is as follows:

  • The first subpart inside the outermost concat field encodes the local-part of an email address (e.g., “test” in test@fortanix.com).
    • A local-part consists of at most 64 printable ASCII characters. However, this schema disallows certain ASCII characters (e.g., “@”, parentheses, etc.).
  • The second subpart inside the outermost concat field encodes a literal “@” character.
  • The last subpart encodes the domain name (e.g., “fortanix.com” in test@fortanix.com).
    • Notice that this subpart contains subparts of its own. These are needed to encode the structure of a domain name, described below.
    • A domain name is at most 255 characters long and consists of several dot-separated DNS labels (e.g., “fortanix”, “com”). Each DNS label is at most 63 characters long.
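To make the structure above concrete, the following Python sketch approximates the same simplified schema with a regular expression built from the char_set ranges. It is our own approximation, not how DSM parses formats. It also assumes that multiple means zero or more repetitions of its subpart.

```python
import re

# Character ranges copied from the char_set fields above.
LOCAL_RANGES = [("!", "!"), ("#", "'"), ("*", "+"), ("-", "9"),
                ("=", "="), ("?", "?"), ("A", "Z"), ("^", "~")]
LABEL_RANGES = [("0", "9"), ("A", "Z"), ("a", "z"), ("-", "-")]


def char_class(ranges: list[tuple[str, str]]) -> str:
    """Turn [lo, hi] pairs into an escaped regex character class."""
    return "[" + "".join(f"{re.escape(lo)}-{re.escape(hi)}" for lo, hi in ranges) + "]"


LOCAL_PART = re.compile(char_class(LOCAL_RANGES) + "{1,64}")
LABEL = char_class(LABEL_RANGES) + "{1,63}"
# Assumption: "multiple" = zero or more further ".label" groups.
DOMAIN = re.compile(LABEL + r"(?:\." + LABEL + ")*")


def matches_email_schema(address: str) -> bool:
    """Approximate the simplified format: local-part, literal "@", domain."""
    local, sep, domain = address.partition("@")
    return (
        sep == "@"
        and LOCAL_PART.fullmatch(local) is not None
        and len(domain) <= 255
        and DOMAIN.fullmatch(domain) is not None
    )
```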

Chinese, Japanese, and Korean characters

Can Fortanix Tokenization handle Unicode characters? Absolutely! Our API allows you to specify exactly which characters you want to tokenize over, and CJK characters are no exception. For instance, the following fpe.format JSON object will allow you to tokenize a 10-character string of Chinese characters (from the “CJK Unified Ideographs” Unicode block).

{ 
  "min_length": 10, 
  "max_length": 10, 
  "char_set": [["\u4E00", "\u9FFF"]] 
} 

Notice how the char_set field covers exactly the range of Unicode characters that we want to accept: from U+4E00 to U+9FFF (precisely the “CJK Unified Ideographs” block).
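As a quick local check, the hypothetical Python helper below tests whether an input fits this format: exactly ten characters, each between U+4E00 and U+9FFF. It also computes how many distinct characters each position can take.

```python
CJK_LO, CJK_HI = 0x4E00, 0x9FFF  # the char_set range above, as code points


def matches_cjk_format(text: str) -> bool:
    """Exactly 10 characters, each in the CJK Unified Ideographs block."""
    return len(text) == 10 and all(CJK_LO <= ord(ch) <= CJK_HI for ch in text)


# Number of distinct characters available at each position.
ALPHABET_SIZE = CJK_HI - CJK_LO + 1
```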

Here’s another 10-character example, this time with Japanese hiragana and katakana.

{ 
  "min_length": 10, 
  "max_length": 10, 
  "char_set": [ 
    ["\u3041", "\u3096"], ["\u309D", "\u309F"], ["\u30A0", "\u30FF"] 
  ] 
} 

Here, the character set consists of most of the characters from the “Hiragana” Unicode block, and all of the characters from the “Katakana” Unicode block. One can easily add Unicode ranges for any relevant kanji characters, like the “CJK Unified Ideographs” block above.
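The ranges above skip two kinds of code points in the Hiragana block:

  • U+3099 to U+309C, the combining and spacing voiced and semi-voiced sound marks;
  • the unassigned code points at the edges of the block.

The hypothetical Python sketch below checks input against the same ranges and counts the resulting alphabet. To accept kanji as well, you would append the CJK Unified Ideographs range to the list.

```python
# Code point ranges copied from the char_set above.
KANA_RANGES = [(0x3041, 0x3096), (0x309D, 0x309F), (0x30A0, 0x30FF)]


def in_ranges(ch: str, ranges: list[tuple[int, int]]) -> bool:
    """True if the character's code point falls inside any inclusive range."""
    return any(lo <= ord(ch) <= hi for lo, hi in ranges)


def matches_kana_format(text: str) -> bool:
    """Exactly 10 characters, each drawn from the kana ranges above."""
    return len(text) == 10 and all(in_ranges(ch, KANA_RANGES) for ch in text)


# Number of distinct characters available at each position.
KANA_ALPHABET_SIZE = sum(hi - lo + 1 for lo, hi in KANA_RANGES)
```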

And finally, here’s an example with Korean Hangul:

{ 
  "min_length": 10, 
  "max_length": 10, 
  "char_set": [["\uAC00", "\uD7A3"]] 
} 

The character set here consists of all assigned codepoints within the “Hangul Syllables” block.
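Why is U+AC00 to U+D7A3 exactly the right range? The Unicode Standard lays out the Hangul Syllables block arithmetically. Every modern syllable combines one of 19 initial consonants, one of 21 vowels and one of 28 final-consonant options (including none), giving 19 × 21 × 28 = 11,172 code points. The Python sketch below reproduces that composition formula. It is independent of DSM and only illustrates the range.

```python
HANGUL_BASE = 0xAC00
LEADS, VOWELS, TAILS = 19, 21, 28  # initial consonants, vowels, finals (incl. none)


def compose_syllable(lead: int, vowel: int, tail: int = 0) -> str:
    """Compose a precomposed Hangul syllable from jamo indices (Unicode arithmetic)."""
    if not (0 <= lead < LEADS and 0 <= vowel < VOWELS and 0 <= tail < TAILS):
        raise ValueError("jamo index out of range")
    return chr(HANGUL_BASE + (lead * VOWELS + vowel) * TAILS + tail)
```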

Masking

Data masking is a detokenization technique in which parts of the sensitive data are obfuscated in the output.

Remember the mask field in the SSN example above? It provides token masking, where certain characters in the detokenized output are replaced with asterisks. For example, if an application only needs to verify the last few digits of a user’s SSN, there is no need to return the entire SSN when detokenizing. As a security measure, we can therefore mask the extraneous SSN digits, preventing unauthorized applications from seeing more than they’re supposed to.

An app can be configured to always receive masked output by granting it the MASKDECRYPT permission. Permissions can be configured for each app through our UI. (More details on how to do this are available here.)

For reference, here is our SSN fpe.format JSON from above:

{ 
  "concat": [ 
    { 
      "min_length": 3, 
      "max_length": 3, 
      "char_set": [["0", "9"]], 
      "constraints": { 
        "num_lt": 900, 
        "num_ne": [0, 666], 
        "applies_to": "all" 
      }, 
      "mask": "all" 
    }, 
    {"literal": ["-"]}, 
    { 
      "min_length": 2, 
      "max_length": 2, 
      "char_set": [["0", "9"]], 
      "constraints": { 
        "num_ne": [0], 
        "applies_to": "all" 
      } 
    }, 
    {"literal": ["-"]}, 
    { 
      "min_length": 4, 
      "max_length": 4, 
      "char_set": [["0", "9"]], 
      "constraints": { 
        "num_ne": [0], 
        "applies_to": "all" 
      } 
    } 
  ] 
} 

Here, we are masking the entire first subpart of an SSN – e.g., something like “123-45-6789” will turn into “***-45-6789” when masked.
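DSM applies the mask on the server during masked detokenization. To illustrate the effect locally, the hypothetical Python function below blanks out whole hyphen-separated groups of an SSN. Its indices count digit groups only, whereas in the concat array the hyphens are subparts too.

```python
def mask_ssn_subparts(ssn: str, masked_groups: set[int]) -> str:
    """Replace every character of the chosen digit groups with '*'."""
    groups = ssn.split("-")
    return "-".join(
        "*" * len(group) if index in masked_groups else group
        for index, group in enumerate(groups)
    )
```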

In addition to masking entire subparts at a time, we can also mask individual characters within a subpart. For instance, to mask only the first character of an SSN, we can use the following subpart (again, this is for the first three digits of an SSN):

{ 
  "min_length": 3, 
  "max_length": 3, 
  "char_set": [["0", "9"]], 
  "constraints": { 
    "num_lt": 900, 
    "num_ne": [0, 666] 
  }, 
  "mask": [0] 
} 

The mask field here is an array of Python-like indices. (Index 0 refers to the first character, index 1 refers to the second, -1 refers to the last character, -2 refers to the second-to-last, and so on.)
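Here is a hypothetical Python sketch of how we read those index semantics: each listed index, positive or negative, selects one character of the subpart to replace with an asterisk. It is an illustration rather than DSM's implementation. The post does not say how DSM treats out-of-range indices, so the sketch simply lets Python raise IndexError.

```python
def mask_indices(subpart: str, indices: list[int], mask_char: str = "*") -> str:
    """Mask characters at Python-style indices (negative indices count from the end)."""
    chars = list(subpart)
    for index in indices:
        chars[index] = mask_char  # IndexError if out of range, as in Python indexing
    return "".join(chars)
```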

Performing tokenization

Once we have a tokenization object ready to use, let’s use it to tokenize data! To do so, we simply need to make a POST request to our /crypto/v1/encrypt endpoint, passing in the kid field of the tokenization object along with our input data. The general structure is as follows:

{ 
  "key": { 
    "kid": "kid from tokenization object creation (should be a UUID)" 
  }, 
  "alg": "AES", 
  "plain": "base64-encoded UTF-8 plaintext here", 
  "mode": "FPE" 
} 

Let’s see how this works for the SSN tokenization object I created earlier, which has kid “fbfec653-53b7-42ca-9c1e-5526ef3e288b”. I want to tokenize “123-45-6789”. To do so, I encode the string as UTF-8 and then base64 encode the result, which gives “MTIzLTQ1LTY3ODk=”. I can now construct my request body, which looks like this:

{ 
  "key": { 
    "kid": "fbfec653-53b7-42ca-9c1e-5526ef3e288b" 
  }, 
  "alg": "AES", 
  "plain": "MTIzLTQ1LTY3ODk=", 
  "mode": "FPE" 
} 
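The encoding step and request body can be reproduced in a few lines of Python. The helper name is hypothetical; the field names and values come from the request above.

```python
import base64
import json


def encrypt_body(kid: str, plaintext: str) -> dict[str, object]:
    """Build the /crypto/v1/encrypt body: UTF-8 encode, then base64 encode the input."""
    plain_b64 = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")
    return {"key": {"kid": kid}, "alg": "AES", "plain": plain_b64, "mode": "FPE"}


body = encrypt_body("fbfec653-53b7-42ca-9c1e-5526ef3e288b", "123-45-6789")
print(json.dumps(body, indent=2))
```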

Sending this request, I get back the following response:

{ 
  "kid": "fbfec653-53b7-42ca-9c1e-5526ef3e288b", 
  "cipher": "MTM0LTc0LTQ5MTM=" 
} 

Our tokenized value is present in the cipher field. All we need to do is base64 decode the string, which gives us “134-74-4913”.

Voilà, we now have a tokenized SSN!
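Decoding the response programmatically is equally short. The hypothetical helper below pulls the cipher field out of the JSON response and base64 decodes it back into a UTF-8 string.

```python
import base64
import json


def token_from_encrypt_response(response_body: str) -> str:
    """Base64-decode the 'cipher' field of an encrypt response as UTF-8."""
    cipher_b64 = json.loads(response_body)["cipher"]
    return base64.b64decode(cipher_b64).decode("utf-8")
```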

Performing detokenization

Detokenization is very similar to tokenization; this time, we send a POST request to our /crypto/v1/decrypt endpoint. The general request structure is as follows:

{ 
  "key": { 
    "kid": "kid from tokenization object creation (should be a UUID)" 
  }, 
  "alg": "AES", 
  "cipher": "base64-encoded UTF-8 token here", 
  "mode": "FPE" 
} 
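Mirroring the tokenization side, two hypothetical Python functions can handle this step: one builds the decrypt body from a token string, the other decodes the plain field of the response. The field names come from the structure above; the helper names are illustrative.

```python
import base64
import json


def decrypt_body(kid: str, token: str) -> dict[str, object]:
    """Build the /crypto/v1/decrypt body: UTF-8 encode, then base64 encode the token."""
    cipher_b64 = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return {"key": {"kid": kid}, "alg": "AES", "cipher": cipher_b64, "mode": "FPE"}


def plaintext_from_decrypt_response(response_body: str) -> str:
    """Base64-decode the 'plain' field of a decrypt response as UTF-8."""
    return base64.b64decode(json.loads(response_body)["plain"]).decode("utf-8")
```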

Continuing with our SSN example, I send the following request body:

{ 
  "key": { 
    "kid": "fbfec653-53b7-42ca-9c1e-5526ef3e288b" 
  }, 
  "alg": "AES", 
  "cipher": "MTM0LTc0LTQ5MTM=", 
  "mode": "FPE" 
} 

Because the app I’ve been using has full DECRYPT permissions, DSM outputs the entire detokenized value, unmasked. I get back the following response:

{ 
  "kid": "fbfec653-53b7-42ca-9c1e-5526ef3e288b", 
  "plain": "MTIzLTQ1LTY3ODk=" 
} 

As expected, base64 decoding “MTIzLTQ1LTY3ODk=” gives us “123-45-6789”.

Masked detokenization

As mentioned earlier, we can configure an app so that it always obtains masked values. If my app had the MASKDECRYPT permission (instead of DECRYPT), sending the decrypt request above would have given me:

{ 
  "kid": "fbfec653-53b7-42ca-9c1e-5526ef3e288b", 
  "plain": "KioqLTQ1LTY3ODk=" 
} 

Base64 decoding the plain field gives us a masked value, “***-45-6789”.

Alternatively, an app with DECRYPT permissions can explicitly ask for a masked value by including an extra masked field, set to true, in the decrypt request:

{ 
  "key": { 
    "kid": "fbfec653-53b7-42ca-9c1e-5526ef3e288b" 
  }, 
  "alg": "AES", 
  "cipher": "MTM0LTc0LTQ5MTM=", 
  "mode": "FPE", 
  "masked": true 
} 

(This will give us the same response as the one above with “KioqLTQ1LTY3ODk=”.)
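To request a masked value from code, the decrypt body gains the extra masked field. The hypothetical Python sketch below builds such a body. As a sanity check, it then verifies that a decoded masked result matches the full value everywhere except at asterisks. This check assumes the plaintext itself contains no asterisks.

```python
import base64


def masked_decrypt_body(kid: str, token: str) -> dict[str, object]:
    """Decrypt body that explicitly asks DSM for a masked result."""
    return {
        "key": {"kid": kid},
        "alg": "AES",
        "cipher": base64.b64encode(token.encode("utf-8")).decode("ascii"),
        "mode": "FPE",
        "masked": True,
    }


def is_masked_form_of(masked: str, full: str, mask_char: str = "*") -> bool:
    """True if masked equals full except where mask_char hides a character."""
    return len(masked) == len(full) and all(
        m == mask_char or m == f for m, f in zip(masked, full)
    )


masked = base64.b64decode("KioqLTQ1LTY3ODk=").decode("utf-8")
print(masked, is_masked_form_of(masked, "123-45-6789"))
```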

In the API snippets above, we demonstrated how you can quickly get started with DSM SaaS and start consuming Tokenization-as-a-Service within a few minutes by inserting our APIs into your DevSecOps builds. We only discussed a few data types in this blog. However, we support a wide variety of data types out of the box, and our state-of-the-art custom token types let you pseudonymize complex data types. Why wait? Sign up for a free trial today and secure your business-critical PII data within minutes!
