Data Transformations

Data Transformations in TensorFlow:

TensorFlow Transform (tf.Transform) is a library for preprocessing the input data of a TensorFlow model.

Using TensorFlow Transform we can:

– Normalize input values by using the mean and standard deviation.

– Convert float values into integers by assigning them to buckets based on the observed data distribution.

– Convert string values into integers by generating a vocabulary over all input values.
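
As a rough illustration of the float-to-integer conversion, the following plain-Python sketch assigns each value to a bucket whose boundaries are taken from the quantiles of the observed data. It does not use tf.Transform. The function name bucketize_by_quantiles and the sample values are assumptions made for this example, and the library's own bucketing may choose its boundaries differently.

```python
import bisect
import statistics


def bucketize_by_quantiles(values, num_buckets):
    """Map each float to an integer bucket index in [0, num_buckets)."""
    # Full pass over the data: derive num_buckets - 1 boundaries from quantiles.
    boundaries = statistics.quantiles(values, n=num_buckets)
    # Per-value step: the bucket index is the number of boundaries <= value.
    return [bisect.bisect_right(boundaries, v) for v in values]


values = [0.1, 0.4, 0.5, 0.9, 1.5, 2.0]  # hypothetical sample data
buckets = bucketize_by_quantiles(values, num_buckets=3)
print(buckets)
```

The boundaries can only be computed after seeing all the values, which is why this kind of conversion needs a pass over the whole dataset rather than a per-example function.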

TensorFlow has built-in functions for manipulating single examples or batches of examples. tf.Transform extends these capabilities to support full passes over the entire training dataset.

Import the library, together with the modules used in the example, with the code below:

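import tensorflow as tf
import tensorflow_transform as tft
# dataset_metadata and dataset_schema describe the schema of the raw data.
from tensorflow_transform.tf_metadata import dataset_metadata, dataset_schema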

Let's create a sample example. It consists of two pieces of data:

– raw_data: the input data that we want to preprocess.

– raw_data_metadata: the metadata (schema) that describes the types of the values in raw_data.

raw_data = [{'a': 1, 'b': 1, 'c': 'hello'},
            {'a': 2, 'b': 2, 'c': 'world'},
            {'a': 3, 'b': 3, 'c': 'welcome'}]

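# Schema of raw_data: 'a' and 'b' are scalar floats, 'c' is a scalar string.
# This is the older API; newer releases use tf.io.FixedLenFeature and
# schema_utils.schema_from_feature_spec instead.
raw_data_metadata = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec({
        'b': tf.FixedLenFeature([], tf.float32),
        'a': tf.FixedLenFeature([], tf.float32),
        'c': tf.FixedLenFeature([], tf.string),
    }))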

Preprocessing by using Transform:

Preprocessing is a central task in Machine Learning applications. With tf.Transform, the preprocessing is described in a single preprocessing function, which accepts and returns a dictionary of tensors (dense or sparse).

Two kinds of functions are at the heart of the preprocessing function:

1. TensorFlow ops: Any function that accepts and returns tensors is called a TensorFlow op. These add TensorFlow operations to the graph that transform raw data into transformed data, one feature vector at a time.

2. TensorFlow Transform analyzers: These also accept and return tensors, but unlike TensorFlow ops they run only once, during training, over the whole training dataset. The tensors they create are added to the graph. tf.Transform provides a fixed set of analyzers.
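
The difference between the two kinds of functions can be sketched in plain Python, without tf.Transform. An analyzer-like step makes one full pass over the dataset to compute a constant, and an op-like step then applies that constant to one example at a time. The names analyze_mean and center_example are hypothetical and exist only for this illustration.

```python
def analyze_mean(dataset, key):
    """Analyzer-like step: one full pass over the dataset, run once."""
    values = [example[key] for example in dataset]
    return sum(values) / len(values)


def center_example(example, key, mean):
    """Op-like step: transforms a single example using the fixed constant."""
    return {key + '_centered': example[key] - mean}


dataset = [{'a': 1.0}, {'a': 2.0}, {'a': 3.0}]

# Analysis phase: computed once over all training data.
mean_a = analyze_mean(dataset, 'a')

# Transform phase: applied example by example, reusing the constant.
transformed = [center_example(ex, 'a', mean_a) for ex in dataset]
print(transformed)
```

Once the mean is known, the transform step no longer needs the training data, so the same constant can be reused on new examples.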

def preprocessing_fn(inputs):
    """Preprocess input columns into transformed columns."""
    a = inputs['a']
    b = inputs['b']
    c = inputs['c']
    # The tft functions rely on full-pass statistics (mean, min/max, vocabulary).
    a_centered = a - tft.mean(a)
    b_normalized = tft.scale_to_0_1(b)
    c_integerized = tft.compute_and_apply_vocabulary(c)
    # A plain TensorFlow op, applied element-wise to the transformed columns.
    a_centered_times_b_normalized = (a_centered * b_normalized)
    return {
        'a_centered': a_centered,
        'b_normalized': b_normalized,
        'c_integerized': c_integerized,
        'a_centered_times_b_normalized': a_centered_times_b_normalized,
    }

a_centered:

a_centered is built by applying tft.mean to "a" and subtracting the mean from "a".

centered = (value in a – mean of a)

For example, with the input a = [1, 2, 3], the mean is 2. Subtracting it from a centers the values at 0, which gives [-1.0, 0.0, 1.0].

b_normalized:

b_normalized uses the method tft.scale_to_0_1.

normalized = (value – minimum value) / (maximum value – minimum value)

We want to scale the b values to the range 0 to 1. The input is [1, 2, 3], so the minimum is 1 and the maximum is 3, which gives [0.0, 0.5, 1.0].
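
The formula can be checked with a short plain-Python sketch. It does not call tf.Transform. The helper name min_max_scale is an assumption for this example, and raising an error when all values are equal is a choice made here, not a statement about how the library handles that case.

```python
def min_max_scale(values):
    """Scale numbers into [0, 1] using (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # The formula would divide by zero; this sketch simply refuses.
        raise ValueError("all values are equal, cannot scale")
    return [(v - lo) / (hi - lo) for v in values]


b = [1.0, 2.0, 3.0]
b_scaled = min_max_scale(b)
print(b_scaled)
```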

c_integerized:

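We want to map our strings to indexes in a vocabulary. The input

 ["hello", "world", "welcome"]

contains three distinct words, so the vocabulary has three entries and each string is mapped to a different index between 0 and 2. The exact assignment depends on how tf.Transform orders the vocabulary. A result of

[0, 1, 0]

would be correct only if the vocabulary held just 2 words ("hello" and "world"), that is, for an input such as ["hello", "world", "hello"].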
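
To see how the number of distinct words determines the result, here is a small plain-Python sketch of a vocabulary mapping. It is not the tf.Transform implementation. The helpers build_vocabulary and apply_vocabulary are hypothetical, and the ordering used here (most frequent first, ties in first-seen order) is an assumption, so the library may assign different indexes to tied words.

```python
from collections import Counter


def build_vocabulary(strings):
    """Return a dict mapping each distinct string to an integer index."""
    counts = Counter(strings)
    # most_common() sorts by descending count; equal counts keep first-seen order.
    return {word: index for index, (word, _) in enumerate(counts.most_common())}


def apply_vocabulary(strings, vocabulary):
    """Replace each string by its index in the vocabulary."""
    return [vocabulary[s] for s in strings]


two_words = ["hello", "world", "hello"]
three_words = ["hello", "world", "welcome"]

two_result = apply_vocabulary(two_words, build_vocabulary(two_words))
three_result = apply_vocabulary(three_words, build_vocabulary(three_words))
print(two_result)
print(three_result)
```

With two distinct words the result uses only the indexes 0 and 1, while three distinct words always produce three different indexes, whatever the ordering.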