# Deep Learning in Python 
## Session 02 - Keras Advanced Concepts

- *Course*: Big Data and Language Technologies
- *Date*: 11.04.2022

This session covers a few more advanced concepts of Deep Learning in Python with Keras. We will build upon the ideas from the last session and learn how to customize the workflow in more detail. We will also learn how to solve some of the problems that we faced during the last session.

## Setup

In [1]:
import tensorflow as tf
import numpy as np

## Loading Data

This time, we will simply use a wrapper provided by Keras to load up the IMDB dataset that we explored in the last session. For reference, see the [API docs](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/load_data).

In [2]:
INDEX_FROM=3
NUM_WORDS=1000
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=NUM_WORDS,index_from=INDEX_FROM)

Note that this already provides us with a train-test split.

This dataset is already built using word indices instead of word strings. For transforming text from and to indices using the word index, see [this example](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb/get_word_index#example).

**Exercise**: Explore the first 3 samples of X_train by converting them back to strings. What is going wrong? Why?

### Naive solution

In [3]:
word_index = tf.keras.datasets.imdb.get_word_index()
inverted_word_index = {v:k for k,v in word_index.items()}

for i in range(3):
    decoded_sequence = " ".join(inverted_word_index[ind] for ind in X_train[i])
    print("label:",["negative","positive"][y_train[i]])
    print(decoded_sequence)
    print()

label: positive
the as you with out themselves powerful and and their becomes and had and of lot from anyone to have after out atmosphere never more room and it so heart shows to years of every never going and help moments or of every and and movie except her was several of enough more with is now and film as you of and and unfortunately of you than him that with out themselves her get for was and of you movie sometimes movie that with scary but and to story wonderful that in seeing in character to of and and with heart had and they of here that with her serious to have does when from why what have and they is you that isn't one will very to as itself with other and in of seen over and for anyone of and br and to whether from than out themselves history he name half some br of and and was two most of mean for 1 any an and she he should is thought and but of script you not while history he heart to real at and but when from one bit then have two of script their with her and most that wi

### Better solution that takes into account the special tokens that we defined implicitly while loading the dataset

In [4]:
word_index = tf.keras.datasets.imdb.get_word_index()
inverted_word_index = {v+INDEX_FROM:k for k,v in word_index.items()}
inverted_word_index |= {0: "<PAD>",
                        1: "<START>",
                        2: "<UNK>",
                        3: "<UNUSED>"}

for i in range(3):
    decoded_sequence = " ".join(inverted_word_index[ind] for ind in X_train[i])
    print("label:",["negative","positive"][y_train[i]])
    print(decoded_sequence)
    print()

label: positive
<START> this film was just brilliant casting <UNK> <UNK> story direction <UNK> really <UNK> the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same <UNK> <UNK> as myself so i loved the fact there was a real <UNK> with this film the <UNK> <UNK> throughout the film were great it was just brilliant so much that i <UNK> the film as soon as it was released for <UNK> and would recommend it to everyone to watch and the <UNK> <UNK> was amazing really <UNK> at the end it was so sad and you know what they say if you <UNK> at a film it must have been good and this definitely was also <UNK> to the two little <UNK> that played the <UNK> of <UNK> and paul they were just brilliant children are often left out of the <UNK> <UNK> i think because the stars that play them all <UNK> up are such a big <UNK> for the whole film but these children are amazing and should be <UNK> for what they ha
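To see where the offset comes from, the following pure-Python sketch mimics, in simplified form, how `load_data` encodes a review with its default `start_char=1` and `oov_char=2`: word ranks are shifted by `INDEX_FROM`, a start token is prepended, and every index that is not below `NUM_WORDS` is replaced by the `<UNK>` index. The tiny vocabulary `toy_word_index` and the helpers `encode`/`decode` are hypothetical.

```python
# Hypothetical toy vocabulary, ranked by frequency like imdb.get_word_index() (ranks start at 1)
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}
START, OOV, INDEX_FROM, NUM_WORDS = 1, 2, 3, 6

def encode(words):
    # shift every rank by INDEX_FROM and prepend the start token
    ids = [START] + [toy_word_index[w] + INDEX_FROM for w in words]
    # indices that are not below NUM_WORDS become the out-of-vocabulary index
    return [i if i < NUM_WORDS else OOV for i in ids]

def decode(ids):
    # undo the shift and add the special tokens, as in the "better solution" above
    inverted = {v + INDEX_FROM: k for k, v in toy_word_index.items()}
    inverted.update({0: "<PAD>", 1: "<START>", 2: "<UNK>", 3: "<UNUSED>"})
    return " ".join(inverted[i] for i in ids)

ids = encode(["the", "movie", "was", "great"])
print(ids)          # [1, 4, 5, 2, 2]
print(decode(ids))  # <START> the movie <UNK> <UNK>
```

Without the shift, index 4 would be decoded as the word with rank 4 instead of rank 1, which is exactly what goes wrong in the naive solution.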

## `tf.data.Dataset`

Using `tf.data.Dataset`, we can represent very large datasets, which will become very important later in the semester. TensorFlow handles much of the necessary machinery internally, for example producing elements lazily instead of loading the whole dataset into memory at once.

**Exercise**: Use `tf.data.Dataset.from_generator` ([docs](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator)) to convert our ndarray-based dataset to a `tf.data.Dataset`. Provide an `output_signature` for the structure `(X, y)`; the generator then has to yield its samples in this format as well.

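Using `tf.data.Dataset.from_tensor_slices` is not straightforward here, because the reviews have different lengths (they are not padded yet) and therefore cannot be stacked into a single rectangular tensor.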

In [5]:
def gen(Xs, ys):
    for X,y in zip(Xs,ys):
        yield (X,y)

output_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),tf.TensorSpec(shape=[],dtype=tf.int32))

train_ds = tf.data.Dataset.from_generator(lambda: gen(X_train,y_train), output_signature=output_signature)
test_ds = tf.data.Dataset.from_generator(lambda: gen(X_test,y_test), output_signature=output_signature)
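For comparison, here is a sketch of the `from_tensor_slices` route, assuming it is acceptable to pad all reviews up front: `tf.ragged.constant(...).to_tensor()` pads every row with zeros (the `<PAD>` index) to the length of the longest row. On the full IMDB data this can waste a lot of memory, since every review would be padded to the length of the longest one. The toy lists below are hypothetical stand-ins for `X_train` and `y_train`.

```python
import tensorflow as tf

# Hypothetical toy data standing in for X_train / y_train
X = [[1, 14, 22], [1, 5], [1, 7, 8, 9]]
y = [1, 0, 1]

# to_tensor() pads every row with zeros up to the longest row
X_padded = tf.ragged.constant(X).to_tensor()
ds = tf.data.Dataset.from_tensor_slices((X_padded, y))

# first padded review and its label
first_X, first_y = next(ds.as_numpy_iterator())
print(X_padded.shape)  # (3, 4)
```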

Converting the data back to numpy is easy:

In [6]:
next(train_ds.as_numpy_iterator())

(array([  1,  14,  22,  16,  43, 530, 973,   2,   2,  65, 458,   2,  66,
          2,   4, 173,  36, 256,   5,  25, 100,  43, 838, 112,  50, 670,
          2,   9,  35, 480, 284,   5, 150,   4, 172, 112, 167,   2, 336,
        385,  39,   4, 172,   2,   2,  17, 546,  38,  13, 447,   4, 192,
         50,  16,   6, 147,   2,  19,  14,  22,   4,   2,   2, 469,   4,
         22,  71,  87,  12,  16,  43, 530,  38,  76,  15,  13,   2,   4,
         22,  17, 515,  17,  12,  16, 626,  18,   2,   5,  62, 386,  12,
          8, 316,   8, 106,   5,   4,   2,   2,  16, 480,  66,   2,  33,
          4, 130,  12,  16,  38, 619,   5,  25, 124,  51,  36, 135,  48,
         25,   2,  33,   6,  22,  12, 215,  28,  77,  52,   5,  14, 407,
         16,  82,   2,   8,   4, 107, 117,   2,  15, 256,   4,   2,   7,
          2,   5, 723,  36,  71,  43, 530, 476,  26, 400, 317,  46,   7,
          4,   2,   2,  13, 104,  88,   4, 381,  15, 297,  98,  32,   2,
         56,  26, 141,   6, 194,   2,  18,   4, 226

## Dataset persistence

TensorFlow makes it easy to save and load a `tf.data.Dataset`.

### Using `tf.data.experimental.save` and `load`

`tf.data.experimental.save` ([docs](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/save)) and `load` ([docs](https://tensorflow.google.cn/api_docs/python/tf/data/experimental/load)) can be used to persist a Dataset to storage. This will create multiple files (shards).

**Exercise**: Save and load our dataset to storage.

In [7]:
tf.data.experimental.save(train_ds,path="imdb_train")
tf.data.experimental.save(test_ds,path="imdb_test")

train_ds=tf.data.experimental.load(path="imdb_train")
test_ds=tf.data.experimental.load(path="imdb_test")

Let's test it again:

In [8]:
next(train_ds.as_numpy_iterator())

(array([  1,  14,  22,  16,  43, 530, 973,   2,   2,  65, 458,   2,  66,
          2,   4, 173,  36, 256,   5,  25, 100,  43, 838, 112,  50, 670,
          2,   9,  35, 480, 284,   5, 150,   4, 172, 112, 167,   2, 336,
        385,  39,   4, 172,   2,   2,  17, 546,  38,  13, 447,   4, 192,
         50,  16,   6, 147,   2,  19,  14,  22,   4,   2,   2, 469,   4,
         22,  71,  87,  12,  16,  43, 530,  38,  76,  15,  13,   2,   4,
         22,  17, 515,  17,  12,  16, 626,  18,   2,   5,  62, 386,  12,
          8, 316,   8, 106,   5,   4,   2,   2,  16, 480,  66,   2,  33,
          4, 130,  12,  16,  38, 619,   5,  25, 124,  51,  36, 135,  48,
         25,   2,  33,   6,  22,  12, 215,  28,  77,  52,   5,  14, 407,
         16,  82,   2,   8,   4, 107, 117,   2,  15, 256,   4,   2,   7,
          2,   5, 723,  36,  71,  43, 530, 476,  26, 400, 317,  46,   7,
          4,   2,   2,  13, 104,  88,   4, 381,  15, 297,  98,  32,   2,
         56,  26, 141,   6, 194,   2,  18,   4, 226

Note: The [TFRecord format](https://www.tensorflow.org/tutorials/load_data/tfrecord) is TensorFlow's traditional format for storing serialized data as a sequence of binary records, and it may save storage space.
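A minimal sketch of the TFRecord route, assuming one `tf.train.Example` per review with a variable-length `tokens` feature and a scalar `label` feature (the feature names, the file path and the toy reviews are hypothetical): the reviews are serialized with `tf.io.TFRecordWriter` and read back with `tf.data.TFRecordDataset` and `tf.io.parse_single_example`.

```python
import os
import tempfile
import tensorflow as tf

def serialize(tokens, label):
    # one tf.train.Example per review: a variable-length int64 list plus a scalar label
    example = tf.train.Example(features=tf.train.Features(feature={
        "tokens": tf.train.Feature(int64_list=tf.train.Int64List(value=tokens)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))
    return example.SerializeToString()

path = os.path.join(tempfile.mkdtemp(), "toy.tfrecord")
toy_data = [([1, 14, 22], 1), ([1, 5], 0)]  # hypothetical reviews
with tf.io.TFRecordWriter(path) as writer:
    for tokens, label in toy_data:
        writer.write(serialize(tokens, label))

feature_spec = {
    "tokens": tf.io.VarLenFeature(tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    # VarLenFeature yields a SparseTensor; convert it back to a dense 1-D tensor
    return tf.sparse.to_dense(parsed["tokens"]), parsed["label"]

restored = list(tf.data.TFRecordDataset(path).map(parse).as_numpy_iterator())
```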

## `map` and `filter`

Using `map` ([docs](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map)) and `filter` ([docs](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#filter)) on a `tf.data.Dataset` is very convenient, as the functions are applied on the fly: an element is only transformed (or tested) when it is actually requested.

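`tf.data` traces the functions passed to `map` and `filter` and executes them as a graph, so they have to be written with TensorFlow operations (e.g. `tf.shape` rather than `len`). Decorating them with `tf.function` ([docs](https://www.tensorflow.org/api_docs/python/tf/function)), as done below, is therefore not strictly necessary, but it does no harm.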
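A small sketch on a toy `Dataset.range` pipeline: `filter` keeps only the even numbers and `map` squares them. Building `pipeline` computes nothing yet; the lambdas run only when the iterator requests elements.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)
# filter keeps elements whose predicate is True, map transforms each kept element
pipeline = ds.filter(lambda x: tf.equal(x % 2, 0)).map(lambda x: x * x)
# iterating triggers the actual computation
result = [int(v) for v in pipeline.as_numpy_iterator()]
print(result)  # [0, 4, 16, 36, 64]
```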

**Exercise**: From the `train_ds`, filter out all reviews shorter than 100 tokens.

In [9]:
@tf.function
def filter_func(X,y):
    return tf.shape(X)[0]>=100

train_ds = train_ds.filter(filter_func)

### \* Bonus: `flat_map`

`map` allows us to modify Dataset samples 1-to-1. If we want to split certain samples into a varying number of samples, we can use `flat_map` ([docs](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#flat_map)).

**Exercise**: Use `flat_map` on `train_ds` to split long reviews into chunks of 100 tokens each.

In [10]:
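@tf.function
def map_func(X,y):
    size=tf.shape(X)[0]
    n_chunks=size//100  # number of complete 100-token chunks; leftover tokens are dropped
    r=tf.range(0,n_chunks*100,1)
    r=tf.reshape(r,[n_chunks,100])  # row i holds the token positions of chunk i
    Xs=tf.gather(X,r)  # shape [n_chunks, 100]
    ys=tf.repeat(y,n_chunks)  # every chunk inherits the label of its review
    return tf.data.Dataset.from_tensor_slices((Xs,ys))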

train_ds = train_ds.flat_map(map_func)
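A possible shorter variant of `map_func`, sketched with a hypothetical chunk length of 3 and hypothetical toy reviews: it slices off the leftover tokens and reshapes the rest directly instead of building an index matrix for `tf.gather`. A review shorter than one chunk yields an empty dataset, so `flat_map` simply emits nothing for it.

```python
import tensorflow as tf

CHUNK = 3  # hypothetical chunk length (100 in the exercise)

def split_into_chunks(X, y):
    n = tf.shape(X)[0] // CHUNK                    # number of complete chunks
    Xs = tf.reshape(X[: n * CHUNK], [n, CHUNK])    # drop the remainder, then cut into rows
    ys = tf.fill([n], y)                           # every chunk keeps the review's label
    return tf.data.Dataset.from_tensor_slices((Xs, ys))

toy = [([1, 2, 3, 4, 5, 6, 7], 1), ([8, 9], 0), ([10, 11, 12], 0)]  # hypothetical reviews
ds = tf.data.Dataset.from_generator(
    lambda: iter(toy),
    output_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),
                      tf.TensorSpec(shape=[], dtype=tf.int32)))

chunks = [(c.tolist(), int(l)) for c, l in ds.flat_map(split_into_chunks).as_numpy_iterator()]
print(chunks)  # [([1, 2, 3], 1), ([4, 5, 6], 1), ([10, 11, 12], 0)]
```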

In [11]:
it=train_ds.as_numpy_iterator()
for i in range(3):
    print(next(it))

(array([  1,  14,  22,  16,  43, 530, 973,   2,   2,  65, 458,   2,  66,
         2,   4, 173,  36, 256,   5,  25, 100,  43, 838, 112,  50, 670,
         2,   9,  35, 480, 284,   5, 150,   4, 172, 112, 167,   2, 336,
       385,  39,   4, 172,   2,   2,  17, 546,  38,  13, 447,   4, 192,
        50,  16,   6, 147,   2,  19,  14,  22,   4,   2,   2, 469,   4,
        22,  71,  87,  12,  16,  43, 530,  38,  76,  15,  13,   2,   4,
        22,  17, 515,  17,  12,  16, 626,  18,   2,   5,  62, 386,  12,
         8, 316,   8, 106,   5,   4,   2,   2,  16]), 1)
(array([480,  66,   2,  33,   4, 130,  12,  16,  38, 619,   5,  25, 124,
        51,  36, 135,  48,  25,   2,  33,   6,  22,  12, 215,  28,  77,
        52,   5,  14, 407,  16,  82,   2,   8,   4, 107, 117,   2,  15,
       256,   4,   2,   7,   2,   5, 723,  36,  71,  43, 530, 476,  26,
       400, 317,  46,   7,   4,   2,   2,  13, 104,  88,   4, 381,  15,
       297,  98,  32,   2,  56,  26, 141,   6, 194,   2,  18,   4, 226,
     

## Batch, shuffle, repeat

In order to make our dataset usable for training, we need to batch it (split it into batches), repeat it (so that we can train for multiple epochs) and shuffle it (to avoid seeing the samples in the same order in every epoch).

In this task, you will learn that the order of these operations indeed matters!

Let's create a dummy dataset:

In [12]:
DUMMY_DS_SIZE=30
dummy_ds=tf.data.Dataset.range(DUMMY_DS_SIZE)
DUMMY_BATCHSIZE=10
DUMMY_BUFFERSIZE=2*DUMMY_BATCHSIZE

**Exercise**: Roll the dice to determine the order in which you will implement shuffle, batch and repeat. Try to spot flaws in the results by inspecting 5 epochs.

In [13]:
print(np.random.choice(["Shuffle, repeat, batch",
                       "Repeat, shuffle, batch",
                       "Batch, shuffle, repeat"]))

Repeat, shuffle, batch


In [14]:
new_dummy_ds=dummy_ds.shuffle(DUMMY_BUFFERSIZE).repeat().batch(DUMMY_BATCHSIZE)
for epoch in range(5):
    for batch in new_dummy_ds.take(DUMMY_DS_SIZE//DUMMY_BATCHSIZE).as_numpy_iterator():
        print(batch)
    print()
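# Observation: Despite being a bit unintuitive (repeating after shuffling?!), this is indeed the correct solution:
# shuffle() reshuffles on every pass (reshuffle_each_iteration=True by default), so every epoch is a new permutation.
# Caveat: since batch() comes after repeat(), a dataset size that is not a multiple of the batch size
# would produce batches that straddle two epochs.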

[ 3  4  0 21  7 22  5 15 27 28]
[ 9 14 24  6 10 13 29  1 26 17]
[19 18 20  8 25 23 16 11  2 12]

[10 16  6 21  7 12  3 20 11 26]
[29 24  1 13  0  4  5  8  9 25]
[27 14 17 15 28 22  2 23 19 18]

[16 10  9 12 17 11 19  6  8  5]
[13  7 27 29 18 14 22 28  1  2]
[23 21 25 26 15 20  0  3 24  4]

[19 11  8 16 10  7 22 17 12 23]
[15  2 25  4  9  3 13 24 14  0]
[ 5 26  1 18 21 20 28 29  6 27]

[ 8  0 19  7 16 22  5 24 12 13]
[25  4  6 17 29 23 26 18 21  3]
[ 1  2 20 11 10 27 14 15  9 28]
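One caveat about the cell above: each pass of the outer loop calls `as_numpy_iterator()` again, so every printed "epoch" restarts the pipeline from scratch, and in this cell `repeat()` is never exercised beyond the first epoch. The following sketch instead draws five epochs' worth of batches from a single iterator and checks that each epoch still contains every sample exactly once.

```python
import numpy as np
import tensorflow as tf

SIZE, BATCH, BUFFER, EPOCHS = 30, 10, 20, 5
steps_per_epoch = SIZE // BATCH

ds = tf.data.Dataset.range(SIZE).shuffle(BUFFER).repeat().batch(BATCH)
# a single iterator over five epochs' worth of batches, so repeat() is really exercised
batches = list(ds.take(EPOCHS * steps_per_epoch).as_numpy_iterator())
epochs = [np.concatenate(batches[e * steps_per_epoch:(e + 1) * steps_per_epoch])
          for e in range(EPOCHS)]
for e, samples in enumerate(epochs):
    # each line should end with True: every sample occurs exactly once per epoch
    print(e, sorted(samples.tolist()) == list(range(SIZE)))
```

Because `batch()` comes after `repeat()`, this check relies on the dataset size being a multiple of the batch size; otherwise the last batch of an epoch would also contain samples from the next one.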



In [15]:
new_dummy_ds=dummy_ds.repeat().shuffle(DUMMY_BUFFERSIZE).batch(DUMMY_BATCHSIZE)
for epoch in range(5):
    for batch in new_dummy_ds.take(DUMMY_DS_SIZE//DUMMY_BATCHSIZE).as_numpy_iterator():
        print(batch)
    print()
# Observation: Some samples occur multiple times inside the epoch (and sometimes even inside a batch),
# because the shuffle buffer mixes samples from consecutive repetitions of the dataset

[11  2 20  3 15 16 19 13 10 12]
[ 5  9 17 29  0  8  0  4  7 28]
[ 6  6  5  1  7 26 24 27 22 12]

[ 6  9  7 19  4  1 21 16 14 11]
[17 13 20  5  2  0 29  3  1  8]
[ 9  3 11  7 10 22 14 15 23  0]

[17  7 11  9 21  2 25  6  4 20]
[13 27 14 18 19 15  5  4 24  6]
[22 29 26  0 10  5  7  2 12 14]

[16 14  0  4  5 18  7  1 25  2]
[ 8 13  9 12 17  0 23  3 11 22]
[ 6 10  8 24 11 19 29  3 27  5]

[ 0 13  7  4  2 19 12 18  1 17]
[ 6  3 25 24 29 14  5  9  8  7]
[28  3 16 26 10  4 20 11  0  5]



In [16]:
new_dummy_ds=dummy_ds.batch(DUMMY_BATCHSIZE).shuffle(DUMMY_BUFFERSIZE).repeat()
for epoch in range(5):
    for batch in new_dummy_ds.take(DUMMY_DS_SIZE//DUMMY_BATCHSIZE).as_numpy_iterator():
        print(batch)
    print()
# Observation: Only the order of the batches is shuffled, their contents never change -> batch should be called last

[20 21 22 23 24 25 26 27 28 29]
[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]

[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[0 1 2 3 4 5 6 7 8 9]

[20 21 22 23 24 25 26 27 28 29]
[10 11 12 13 14 15 16 17 18 19]
[0 1 2 3 4 5 6 7 8 9]

[10 11 12 13 14 15 16 17 18 19]
[0 1 2 3 4 5 6 7 8 9]
[20 21 22 23 24 25 26 27 28 29]

[20 21 22 23 24 25 26 27 28 29]
[10 11 12 13 14 15 16 17 18 19]
[0 1 2 3 4 5 6 7 8 9]



### Applying what we found out
**Exercise**: Shuffle, repeat and batch (using `padded_batch`) our `train_ds`.

In [17]:
BATCHSIZE=64
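BUFFERSIZE=2*BATCHSIZE  # larger buffers shuffle more thoroughly; perfect shuffling needs a buffer at least as large as the dataset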
train_ds=train_ds.shuffle(BUFFERSIZE).repeat().padded_batch(BATCHSIZE)
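To see what `padded_batch` does, here is a toy sketch (the reviews are hypothetical): without `padded_shapes`, every component is padded with zeros to the longest element within its batch, while the scalar labels are simply stacked.

```python
import tensorflow as tf

toy = [([1, 2, 3], 1), ([4], 0), ([5, 6], 1)]  # hypothetical variable-length reviews
ds = tf.data.Dataset.from_generator(
    lambda: iter(toy),
    output_signature=(tf.TensorSpec(shape=[None], dtype=tf.int32),
                      tf.TensorSpec(shape=[], dtype=tf.int32)))

# pad each sequence (with 0) to the longest sequence in the batch
batch_X, batch_y = next(ds.padded_batch(3).as_numpy_iterator())
print(batch_X.tolist())  # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
print(batch_y.tolist())  # [1, 0, 1]
```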

## Custom Layers

Keras allows you to define custom layers. This is useful for:
1. Combining multiple pre-defined layers into a single custom layer
2. Defining the layer weights explicitly
3. Modifying gradients

### "Custom" dense layer

**Exercise**: Re-implement a dense layer using a subclass of the `tf.keras.layers.Layer` class ([docs](https://keras.io/api/layers/base_layer/)).

In [18]:
class CustomDenseLayer(tf.keras.layers.Layer):

    def __init__(self, units=32):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # weight matrix of shape (input features, units), randomly initialized
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_shape[-1], self.units),
                                 dtype='float32'),
            trainable=True)
        # bias vector, initialized with zeros
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(self.units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
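Keras layers also provide `self.add_weight`, which creates and tracks the variables for you. The sketch below, with the hypothetical class name `AddWeightDense`, performs the same computation as `CustomDenseLayer` and checks on a toy batch that the output equals `inputs @ w + b`.

```python
import numpy as np
import tensorflow as tf

class AddWeightDense(tf.keras.layers.Layer):
    """Same computation as CustomDenseLayer, but the weights are created with add_weight."""

    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # build() runs on the first call, when the number of input features is known
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,), initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = AddWeightDense(4)
x = tf.ones((2, 3))
out = layer(x)                        # triggers build() with input_shape (2, 3)
print(out.shape)                      # (2, 4)
print(len(layer.trainable_weights))   # 2
```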

### \* Bonus: "Custom" dropout layer

**Exercise**: Re-implement a dropout layer using a subclass of the `Layer` class ([docs](https://keras.io/api/layers/base_layer/)).

In [73]:
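class CustomDropoutLayer(tf.keras.layers.Layer):

    def __init__(self, rate):
        super().__init__()
        self.rate = rate

    def call(self, inputs, training=None):
        # Keras passes training=True in fit() and training=False in evaluate()/predict();
        # if nothing is passed (e.g. a direct call), behave like inference and skip dropout
        if training is None:
            training = False
        # tf.shape (unlike inputs.shape) also works when the batch dimension is unknown
        random=tf.where(tf.random.uniform(tf.shape(inputs),minval=0,maxval=1,dtype=tf.float32)<self.rate,0.0,1.0)
        dropped_out=inputs*random
        dropped_out=dropped_out/(1.0 - self.rate) # inverted dropout: keeps the expected value of every unit unchanged
        return tf.where(training,dropped_out,inputs)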
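To check what the rescaling by `1 / (1 - rate)` achieves, here is a NumPy sketch of the same inverted-dropout rule (the function name `inverted_dropout` is hypothetical): on a large vector of ones, roughly a fraction `rate` of the units is zeroed, while the mean of the output stays close to 1. The expected value is preserved, not the exact sum of each individual sample.

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    """NumPy sketch of the inverted-dropout rule used by CustomDropoutLayer."""
    if not training:
        return x  # inference: identity
    keep = rng.uniform(size=x.shape) >= rate   # keep each unit with probability 1 - rate
    return x * keep / (1.0 - rate)             # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
x = np.ones(1_000_000)
y = inverted_dropout(x, 0.1, training=True, rng=rng)
print(y.mean())         # close to 1.0
print((y == 0).mean())  # close to 0.1
```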

If you want to dive deeper into defining custom layers, see [this guide](https://keras.io/guides/making_new_layers_and_models_via_subclassing/).

## Custom Loss

Sometimes we are not fully happy with the predefined losses provided by TensorFlow/Keras. See the [docs](https://keras.io/api/losses/#creating-custom-losses) for how to create custom losses based on `y_true` and `y_pred`.

**Exercise**: Define a custom loss that computes a weighted crossentropy to rebalance the classes (label `0` and `1`).

In [69]:
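def custom_loss(y_true, y_pred):
    # integer labels -> float with the shape of y_pred; clip predictions to avoid log(0)
    y_true = tf.cast(tf.reshape(y_true, tf.shape(y_pred)), y_pred.dtype)
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    # class weights from the label counts of the current batch: n / n_class (at least 1 to avoid division by zero)
    n = tf.cast(tf.size(y_true), y_pred.dtype)
    w_pos, w_neg = n / tf.maximum(tf.reduce_sum(y_true), 1.0), n / tf.maximum(tf.reduce_sum(1.0 - y_true), 1.0)
    return tf.reduce_sum(-w_pos * y_true * tf.math.log(y_pred) - w_neg * (1.0 - y_true) * tf.math.log(1.0 - y_pred))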
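A NumPy sketch of the same class weighting (the name `weighted_bce` is hypothetical), returning the contributions of the positive and negative samples separately. On a batch with three positives and one negative and constant predictions of 0.5, the weights are n/n_pos = 4/3 and n/n_neg = 4, so both classes contribute the same total, 4 * ln(2), although the negative class has only one sample.

```python
import numpy as np

def weighted_bce(y_true, y_pred, eps=1e-7):
    """NumPy sketch of the class-rebalanced crossentropy with per-batch weights n / n_class."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    n = y_true.size
    w_pos = n / max(y_true.sum(), 1.0)
    w_neg = n / max((1 - y_true).sum(), 1.0)
    pos = -(w_pos * y_true * np.log(y_pred)).sum()           # contribution of label-1 samples
    neg = -(w_neg * (1 - y_true) * np.log(1 - y_pred)).sum()  # contribution of label-0 samples
    return pos, neg

pos, neg = weighted_bce([1, 1, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(pos, neg)  # both approximately 2.7726 (= 4 * ln 2)
```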

### The `add_loss()` API

Regularization losses are not necessarily based on a comparison of `y_true` and `y_pred`. The `add_loss()` API allows layers and models to add extra loss terms, computed for example from their weights or activations. See the [docs](https://keras.io/api/losses/#the-addloss-api).
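A minimal sketch modeled on the activity-regularization example in the Keras subclassing guide (the class name and the squared-activation penalty are illustrative choices): the layer passes its inputs through unchanged and registers `rate * sum(inputs**2)` via `add_loss`. After a call, the term shows up in `layer.losses`, which is what `regularization_losses=self.losses` in a custom training step picks up.

```python
import numpy as np
import tensorflow as tf

class ActivityL2Regularization(tf.keras.layers.Layer):
    """Hypothetical layer: identity on the inputs, plus rate * sum(inputs**2) as an extra loss."""

    def __init__(self, rate=1e-2, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate

    def call(self, inputs):
        self.add_loss(self.rate * tf.reduce_sum(tf.square(inputs)))
        return inputs

layer = ActivityL2Regularization(rate=0.5)
_ = layer(tf.ones((2, 3)))
print(float(layer.losses[0]))  # 0.5 * 6 = 3.0
```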

## Custom Training Loops

By subclassing `tf.keras.Model` and overriding its `train_step` method, we can customize what happens during `fit()` at a more fine-grained level than with callbacks.

In [21]:
class CustomModel(tf.keras.Model):
    def train_step(self, data):
        # Unpack the data. Its structure depends on your model and
        # on what you pass to `fit()`.
        x, y = data

        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)  # Forward pass
            # Compute the loss value
            # (the loss function is configured in `compile()`)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)

        # Compute gradients
        trainable_vars = self.trainable_variables
        gradients = tape.gradient(loss, trainable_vars)
        # Update weights
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))
        # Update metrics (includes the metric that tracks the loss)
        self.compiled_metrics.update_state(y, y_pred)
        # Return a dict mapping metric names to current value
        return {m.name: m.result() for m in self.metrics}

Further details on customizing the behavior of `fit()` can be found in [this guide](https://keras.io/guides/customizing_what_happens_in_fit/).
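The same forward pass, gradient computation and weight update can also be written as a fully custom loop outside of `fit()`. The sketch below is a separate technique from overriding `train_step`: it fits a one-unit `Dense` model to hypothetical toy data `y = 3x + 1` with plain SGD and records the loss at every step.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy regression data: y = 3x + 1
x = np.linspace(-1, 1, 64).astype("float32").reshape(-1, 1)
y = 3 * x + 1

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

losses = []
for step in range(200):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)  # forward pass (builds the model on the first call)
        loss = loss_fn(y, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # weight update
    losses.append(float(loss))

print(losses[0], losses[-1])  # the loss should shrink towards 0
```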

## TensorBoard

TensorBoard is a browser-based tool for monitoring the training progress. To view the generated logs, use the following commands:

In [27]:
%load_ext tensorboard
%tensorboard --logdir ./logs
# alternatively: !tensorboard --logdir ./logs

Reusing TensorBoard on port 6006 (pid 12512), started 0:05:55 ago. (Use '!kill 12512' to kill it.)

We use a special callback to generate the data that TensorBoard will be visualizing:

In [23]:
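# update_freq=1 writes metrics after every batch; larger values (or "epoch") reduce the logging overhead
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="./logs",update_freq=1)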

## Assembling everything

In [None]:
inputs = tf.keras.Input(shape=(None,),batch_size=BATCHSIZE)
x = tf.keras.layers.Embedding(NUM_WORDS, 16)(inputs)  # load_data maps rarer words to <UNK>, so every index is < NUM_WORDS
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = CustomDenseLayer(16)(x)
x = tf.keras.layers.Activation("relu")(x)
x = CustomDropoutLayer(0.1)(x)
outputs = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(x)
model = CustomModel(inputs,outputs)
model.summary()  # prints the summary itself (and returns None)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
             loss=custom_loss)
model.fit(train_ds,validation_data=test_ds.batch(1),callbacks=[tensorboard_callback],epochs=10,steps_per_epoch=1000)