02-transformers.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "YKdSeUmVSXah",
    "pycharm": {
     "is_executing": false,
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Introduction\n",
    "The transformers library is an open-source, community-based repository to train, use and share models based on \n",
    "the Transformer architecture [(Vaswani & al., 2017)](https://arxiv.org/abs/1706.03762) such as Bert [(Devlin & al., 2018)](https://arxiv.org/abs/1810.04805),\n",
    "Roberta [(Liu & al., 2019)](https://arxiv.org/abs/1907.11692), GPT2 [(Radford & al., 2019)](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf),\n",
    "XLNet [(Yang & al., 2019)](https://arxiv.org/abs/1906.08237), etc. \n",
    "\n",
    "Along with the models, the library contains multiple variations of each of them for a large variety of \n",
    "downstream-tasks like **Named Entity Recognition (NER)**, **Sentiment Analysis**, \n",
    "**Language Modeling**, **Question Answering** and so on.\n",
    "\n",
    "## Before Transformer\n",
    "\n",
    "Back to 2017, most of the people using Neural Networks when working on Natural Language Processing were relying on \n",
    "sequential processing of the input through [Recurrent Neural Network (RNN)](https://en.wikipedia.org/wiki/Recurrent_neural_network).\n",
    "\n",
    "![rnn](http://colah.github.io/posts/2015-09-NN-Types-FP/img/RNN-general.png)   \n",
    "\n",
    "RNNs were performing well on large variety of tasks involving sequential dependency over the input sequence. \n",
    "However, this sequentially-dependent process had issues modeling very long range dependencies and \n",
    "was not well suited for the kind of hardware we're currently leveraging due to bad parallelization capabilities. \n",
    "\n",
    "Some extensions were provided by the academic community, such as Bidirectional RNN ([Schuster & Paliwal., 1997](https://www.researchgate.net/publication/3316656_Bidirectional_recurrent_neural_networks), [Graves & al., 2005](https://mediatum.ub.tum.de/doc/1290195/file.pdf)), \n",
    "which can be seen as a concatenation of two sequential process, one going forward, the other one going backward over the sequence input.\n",
    "\n",
    "![birnn](https://miro.medium.com/max/764/1*6QnPUSv_t9BY9Fv8_aLb-Q.png)\n",
    "\n",
    "\n",
    "And also, the Attention mechanism, which introduced a good improvement over \"raw\" RNNs by giving \n",
    "a learned, weighted-importance to each element in the sequence, allowing the model to focus on important elements.\n",
    "\n",
    "![attention_rnn](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2017/08/Example-of-Attention.png)  \n",
    "\n",
    "## Then comes the Transformer  \n",
    "\n",
    "The Transformers era originally started from the work of [(Vaswani & al., 2017)](https://arxiv.org/abs/1706.03762) who\n",
    "demonstrated its superiority over [Recurrent Neural Network (RNN)](https://en.wikipedia.org/wiki/Recurrent_neural_network)\n",
    "on translation tasks but it quickly extended to almost all the tasks RNNs were State-of-the-Art at that time.\n",
    "\n",
    "One advantage of Transformer over its RNN counterpart was its non sequential attention model. Remember, the RNNs had to\n",
    "iterate over each element of the input sequence one-by-one and carry an \"updatable-state\" between each hop. With Transformer, the model is able to look at every position in the sequence, at the same time, in one operation.\n",
    "\n",
    "For a deep-dive into the Transformer architecture, [The Annotated Transformer](https://nlp.seas.harvard.edu/2018/04/03/attention.html#encoder-and-decoder-stacks) \n",
    "will drive you along all the details of the paper.\n",
    "\n",
    "![transformer-encoder-decoder](https://nlp.seas.harvard.edu/images/the-annotated-transformer_14_0.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TFHTP6CFSXai",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Getting started with transformers\n",
    "\n",
    "For the rest of this notebook, we will use the [BERT (Devlin & al., 2018)](https://arxiv.org/abs/1810.04805) architecture, as it's the most simple and there are plenty of content about it\n",
    "over the internet, it will be easy to dig more over this architecture if you want to.\n",
    "\n",
    "The transformers library allows you to benefits from large, pretrained language models without requiring a huge and costly computational\n",
    "infrastructure. Most of the State-of-the-Art models are provided directly by their author and made available in the library \n",
    "in PyTorch and TensorFlow in a transparent and interchangeable way. \n",
    "\n",
    "If you're executing this notebook in Colab, you will need to install the transformers library. You can do so with this command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "KnT3Jn6fSXai",
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# !pip install transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "UIQGDTIDSXai",
    "outputId": "9851454a-c898-4fba-a389-9b16462a27c1",
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<torch.autograd.grad_mode.set_grad_enabled at 0x7ff0cc2a2c50>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "from transformers import AutoModel, AutoTokenizer, BertTokenizer\n",
    "\n",
    "torch.set_grad_enabled(False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "id": "1xMDTHQXSXai",
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [],
   "source": [
    "# Store the model we want to use\n",
    "MODEL_NAME = \"bert-base-cased\"\n",
    "\n",
    "# We need to create the model and tokenizer\n",
    "model = AutoModel.from_pretrained(MODEL_NAME)\n",
    "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "l6EcynhYSXai",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "With only the above two lines of code, you're ready to use a BERT pre-trained model. \n",
    "The tokenizers will allow us to map a raw textual input to a sequence of integers representing our textual input\n",
    "in a way the model can manipulate. Since we will be using a PyTorch model, we ask the tokenizer to return to us PyTorch tensors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "input_ids:\n",
      "\ttensor([[ 101, 1188, 1110, 1126, 7758, 1859,  102]])\n",
      "token_type_ids:\n",
      "\ttensor([[0, 0, 0, 0, 0, 0, 0]])\n",
      "attention_mask:\n",
      "\ttensor([[1, 1, 1, 1, 1, 1, 1]])\n"
     ]
    }
   ],
   "source": [
    "tokens_pt = tokenizer(\"This is an input example\", return_tensors=\"pt\")\n",
    "for key, value in tokens_pt.items():\n",
    "    print(\"{}:\\n\\t{}\".format(key, value))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tokenizer automatically converted our input to all the inputs expected by the model. It generated some additional tensors on top of the IDs: \n",
    "\n",
    "- token_type_ids: This tensor will map every tokens to their corresponding segment (see below).\n",
    "- attention_mask: This tensor is used to \"mask\" padded values in a batch of sequence with different lengths (see below).\n",
    "\n",
    "You can check our [glossary](https://huggingface.co/transformers/glossary.html) for more information about each of those keys. \n",
    "\n",
    "We can just feed this directly into our model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "XgkFg52fSXai",
    "outputId": "94b569d4-5415-4327-f39e-c9541b0a53e0",
    "pycharm": {
     "is_executing": false,
     "name": "#%% code\n"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Token wise output: torch.Size([1, 7, 768]), Pooled output: torch.Size([1, 768])\n"
     ]
    }
   ],
   "source": [
    "outputs = model(**tokens_pt)\n",
    "last_hidden_state = outputs.last_hidden_state\n",
    "pooler_output = outputs.pooler_output\n",
    "\n",
    "print(\"Token wise output: {}, Pooled output: {}\".format(last_hidden_state.shape, pooler_output.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lBbvwNKXSXaj",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "As you can see, BERT outputs two tensors:\n",
    " - One with the generated representation for every token in the input `(1, NB_TOKENS, REPRESENTATION_SIZE)`\n",
    " - One with an aggregated representation for the whole input `(1, REPRESENTATION_SIZE)`\n",
    " \n",
    "The first, token-based, representation can be leveraged if your task requires to keep the sequence representation and you\n",
    "want to operate at a token-level. This is particularly useful for Named Entity Recognition and Question-Answering.\n",
    "\n",
    "The second, aggregated, representation is especially useful if you need to extract the overall context of the sequence and don't\n",
    "require a fine-grained token-level. This is the case for Sentiment-Analysis of the sequence or Information Retrieval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Pl2HIcwDSXal",
    "outputId": "22e5d010-47a9-4a12-a67d-208e5016157e",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Single segment token (str): ['[CLS]', 'This', 'is', 'a', 'sample', 'input', '[SEP]']\n",
      "Single segment token (int): [101, 1188, 1110, 170, 6876, 7758, 102]\n",
      "Single segment type       : [0, 0, 0, 0, 0, 0, 0]\n",
      "\n",
      "Multi segment token (str): ['[CLS]', 'This', 'is', 'segment', 'A', '[SEP]', 'This', 'is', 'segment', 'B', '[SEP]']\n",
      "Multi segment token (int): [101, 1188, 1110, 6441, 138, 102, 1188, 1110, 6441, 139, 102]\n",
      "Multi segment type       : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\n"
     ]
    }
   ],
   "source": [
    "# Single segment input\n",
    "single_seg_input = tokenizer(\"This is a sample input\")\n",
    "\n",
    "# Multiple segment input\n",
    "multi_seg_input = tokenizer(\"This is segment A\", \"This is segment B\")\n",
    "\n",
    "print(\"Single segment token (str): {}\".format(tokenizer.convert_ids_to_tokens(single_seg_input['input_ids'])))\n",
    "print(\"Single segment token (int): {}\".format(single_seg_input['input_ids']))\n",
    "print(\"Single segment type       : {}\".format(single_seg_input['token_type_ids']))\n",
    "\n",
    "# Segments are concatened in the input to the model, with \n",
    "print()\n",
    "print(\"Multi segment token (str): {}\".format(tokenizer.convert_ids_to_tokens(multi_seg_input['input_ids'])))\n",
    "print(\"Multi segment token (int): {}\".format(multi_seg_input['input_ids']))\n",
    "print(\"Multi segment type       : {}\".format(multi_seg_input['token_type_ids']))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "1NtvWOgzSXam",
    "outputId": "e66c47d0-e106-408d-d01c-9ac194ca3ec6",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tokens (int)      : [101, 1188, 1110, 170, 6876, 102, 0, 0]\n",
      "Tokens (str)      : ['[CLS]', 'This', 'is', 'a', 'sample', '[SEP]', '[PAD]', '[PAD]']\n",
      "Tokens (attn_mask): [1, 1, 1, 1, 1, 1, 0, 0]\n",
      "\n",
      "Tokens (int)      : [101, 1188, 1110, 1330, 2039, 6876, 3087, 102]\n",
      "Tokens (str)      : ['[CLS]', 'This', 'is', 'another', 'longer', 'sample', 'text', '[SEP]']\n",
      "Tokens (attn_mask): [1, 1, 1, 1, 1, 1, 1, 1]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Padding highlight\n",
    "tokens = tokenizer(\n",
    "    [\"This is a sample\", \"This is another longer sample text\"], \n",
    "    padding=True  # First sentence will have some PADDED tokens to match second sequence length\n",
    ")\n",
    "\n",
    "for i in range(2):\n",
    "    print(\"Tokens (int)      : {}\".format(tokens['input_ids'][i]))\n",
    "    print(\"Tokens (str)      : {}\".format([tokenizer.convert_ids_to_tokens(s) for s in tokens['input_ids'][i]]))\n",
    "    print(\"Tokens (attn_mask): {}\".format(tokens['attention_mask'][i]))\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vkRYm2HESXan"
   },
   "source": [
    "## Frameworks interoperability\n",
    "\n",
    "One of the most powerfull feature of transformers is its ability to seamlessly move from PyTorch to Tensorflow\n",
    "without pain for the user.\n",
    "\n",
    "We provide some convenient methods to load TensorFlow pretrained weight insinde a PyTorch model and opposite."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "Kubwm-wJSXan",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "3b971be3639d4fedb02778fb5c6898a0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=526681800.0, style=ProgressStyle(descri…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Some layers from the model checkpoint at bert-base-cased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']\n",
      "- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
      "- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
      "All the layers of TFBertModel were initialized from the model checkpoint at bert-base-cased.\n",
      "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.\n"
     ]
    }
   ],
   "source": [
    "from transformers import TFBertModel, BertModel\n",
    "\n",
    "# Let's load a BERT model for TensorFlow and PyTorch\n",
    "model_tf = TFBertModel.from_pretrained('bert-base-cased')\n",
    "model_pt = BertModel.from_pretrained('bert-base-cased')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "lJ13tlzOSXan",
    "outputId": "1e4ac151-a8fc-4b34-946a-da0bc44ed0e6",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "last_hidden_state differences: 1.2933e-05\n",
      "pooler_output differences: 2.9691e-06\n"
     ]
    }
   ],
   "source": [
    "# transformers generates a ready to use dictionary with all the required parameters for the specific framework.\n",
    "input_tf = tokenizer(\"This is a sample input\", return_tensors=\"tf\")\n",
    "input_pt = tokenizer(\"This is a sample input\", return_tensors=\"pt\")\n",
    "\n",
    "# Let's compare the outputs\n",
    "output_tf, output_pt = model_tf(input_tf), model_pt(**input_pt)\n",
    "\n",
    "# Models outputs 2 values (The value for each tokens, the pooled representation of the input sentence)\n",
    "# Here we compare the output differences between PyTorch and TensorFlow.\n",
    "for name in [\"last_hidden_state\", \"pooler_output\"]:\n",
    "    print(\"{} differences: {:.5}\".format(name, (output_tf[name].numpy() - output_pt[name].numpy()).sum()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "CQf_fpApSXao",
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Want it lighter? Faster? Let's talk distillation! \n",
    "\n",
    "One of the main concerns when using these Transformer based models is the computational power they require. All over this notebook we are using BERT model as it can be run on common machines but that's not the case for all of the models.\n",
    "\n",
    "For example, Google released a few months ago **T5** an Encoder/Decoder architecture based on Transformer and available in `transformers` with no more than 11 billions parameters. Microsoft also recently entered the game with **Turing-NLG** using 17 billions parameters. This kind of model requires tens of gigabytes to store the weights and a tremendous compute infrastructure to run such models which makes it impracticable for the common man !\n",
    "\n",
    "![transformers-parameters](https://github.com/huggingface/notebooks/blob/master/examples/images/model_parameters.png?raw=true)\n",
    "\n",
    "With the goal of making Transformer-based NLP accessible to everyone we @huggingface developed models that take advantage of a training process called **Distillation** which allows us to drastically reduce the resources needed to run such models with almost zero drop in performances.\n",
    "\n",
    "Going over the whole Distillation process is out of the scope of this notebook, but if you want more information on the subject you may refer to [this Medium article written by my colleague Victor SANH, author of DistilBERT paper](https://medium.com/huggingface/distilbert-8cf3380435b5), you might also want to directly have a look at the paper [(Sanh & al., 2019)](https://arxiv.org/abs/1910.01108)\n",
    "\n",
    "Of course, in `transformers` we have distilled some models and made them available directly in the library ! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 185,
     "referenced_widgets": [
      "fcffccb427714665bec7d621d00d4ce3",
      "7aa02ef05fe64489ad6c969dd92d1b07",
      "817f53e3fa5c43e29ad1b4410c3df7db",
      "50a441d7a43c4a809a09505b36e83375",
      "497ba6a585a147459f1346c0661d5c94",
      "a18c5319739141af9a255bccf25f6884",
      "cf319210c8134cdba487525c49e4813b",
      "f1f0272d9bea4e9fad3be8646b45d629",
      "530b39d56f6b4e0caae3317855c4bcf4",
      "c5e735694f2c4813a1d6f0d867119f67",
      "8d53b8dc213f405d8187f3c1f005826d",
      "d492afe626804d95a5cfac0550913190",
      "a657a312068b43529afed2050bce572f",
      "fe230ff13a82400f97cf6f292e8851ba",
      "be97cf2269d748f3b1a916b5376f7736",
      "74bd90a09da74db5bcbbe86f044bd664"
     ]
    },
    "id": "wfxMOXb-SXao",
    "outputId": "fa667556-fbf2-4c86-fc7e-9e3d3ec9da88",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "fcffccb427714665bec7d621d00d4ce3",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=411.0, style=ProgressStyle(description_…"
      ]
     },
     "metadata": {
      "tags": []
     },
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "530b39d56f6b4e0caae3317855c4bcf4",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=263273408.0, style=ProgressStyle(descri…"
      ]
     },
     "metadata": {
      "tags": []
     },
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "CPU times: user 64.4 ms, sys: 0 ns, total: 64.4 ms\n",
      "Wall time: 72.9 ms\n",
      "CPU times: user 130 ms, sys: 124 µs, total: 130 ms\n",
      "Wall time: 131 ms\n"
     ]
    }
   ],
   "source": [
    "from transformers import DistilBertModel\n",
    "\n",
    "bert_distil = DistilBertModel.from_pretrained('distilbert-base-cased')\n",
    "input_pt = tokenizer(\n",
    "    'This is a sample input to demonstrate performance of distiled models especially inference time', \n",
    "    return_tensors=\"pt\"\n",
    ")\n",
    "\n",
    "\n",
    "%time _ = bert_distil(input_pt['input_ids'])\n",
    "%time _ = model_pt(input_pt['input_ids'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "7lSIc7FbSXao"
   },
   "source": [
    "## Community provided models\n",
    "\n",
    "Last but not least, earlier in this notebook we introduced Hugging Face `transformers` as a repository for the NLP community to exchange pretrained models. We wanted to highlight this features and all the possibilities it offers for the end-user.\n",
    "\n",
    "To leverage community pretrained models, just provide the organisation name and name of the model to `from_pretrained` and it will do all the magic for you ! \n",
    "\n",
    "\n",
    "We currently have more 50 models provided by the community and more are added every day, don't hesitate to give it a try !"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "cxLYnadGSXao",
    "outputId": "70ab584a-e795-490a-8c6a-06e034b3df3d",
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tokens (int)      : [102, 12272, 9355, 5746, 30881, 215, 261, 5945, 4118, 212, 2414, 153, 1942, 232, 3532, 566, 103]\n",
      "Tokens (str)      : ['[CLS]', 'Hug', '##ging', 'Fac', '##e', 'ist', 'eine', 'französische', 'Firma', 'mit', 'Sitz', 'in', 'New', '-', 'York', '.', '[SEP]']\n",
      "Tokens (attn_mask): [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]\n",
      "\n",
      "Token wise output: torch.Size([1, 17, 768]), Pooled output: torch.Size([1, 768])\n"
     ]
    }
   ],
   "source": [
    "# Let's load German BERT from the Bavarian State Library\n",
    "de_bert = BertModel.from_pretrained(\"dbmdz/bert-base-german-cased\")\n",
    "de_tokenizer = BertTokenizer.from_pretrained(\"dbmdz/bert-base-german-cased\")\n",
    "\n",
    "de_input = de_tokenizer(\n",
    "    \"Hugging Face ist eine französische Firma mit Sitz in New-York.\",\n",
    "    return_tensors=\"pt\"\n",
    ")\n",
    "print(\"Tokens (int)      : {}\".format(de_input['input_ids'].tolist()[0]))\n",
    "print(\"Tokens (str)      : {}\".format([de_tokenizer.convert_ids_to_tokens(s) for s in de_input['input_ids'].tolist()[0]]))\n",
    "print(\"Tokens (attn_mask): {}\".format(de_input['attention_mask'].tolist()[0]))\n",
    "print()\n",
    "\n",
    "outputs_de = de_bert(**de_input)\n",
    "last_hidden_state_de = outputs_de.last_hidden_state\n",
    "pooler_output_de = outputs_de.pooler_output\n",
    "\n",
    "print(\"Token wise output: {}, Pooled output: {}\".format(last_hidden_state_de.shape, pooler_output_de.shape))"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "02-transformers.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "497ba6a585a147459f1346c0661d5c94": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "ProgressStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": "initial"
     }
    },
    "50a441d7a43c4a809a09505b36e83375": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_f1f0272d9bea4e9fad3be8646b45d629",
      "placeholder": "",
      "style": "IPY_MODEL_cf319210c8134cdba487525c49e4813b",
      "value": " 411/411 [00:19&lt;00:00, 21.5B/s]"
     }
    },
    "530b39d56f6b4e0caae3317855c4bcf4": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HBoxModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_8d53b8dc213f405d8187f3c1f005826d",
       "IPY_MODEL_d492afe626804d95a5cfac0550913190"
      ],
      "layout": "IPY_MODEL_c5e735694f2c4813a1d6f0d867119f67"
     }
    },
    "74bd90a09da74db5bcbbe86f044bd664": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "7aa02ef05fe64489ad6c969dd92d1b07": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "817f53e3fa5c43e29ad1b4410c3df7db": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "FloatProgressModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "Downloading: 100%",
      "description_tooltip": null,
      "layout": "IPY_MODEL_a18c5319739141af9a255bccf25f6884",
      "max": 411,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_497ba6a585a147459f1346c0661d5c94",
      "value": 411
     }
    },
    "8d53b8dc213f405d8187f3c1f005826d": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "FloatProgressModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "FloatProgressModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "ProgressView",
      "bar_style": "success",
      "description": "Downloading: 100%",
      "description_tooltip": null,
      "layout": "IPY_MODEL_fe230ff13a82400f97cf6f292e8851ba",
      "max": 263273408,
      "min": 0,
      "orientation": "horizontal",
      "style": "IPY_MODEL_a657a312068b43529afed2050bce572f",
      "value": 263273408
     }
    },
    "a18c5319739141af9a255bccf25f6884": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "a657a312068b43529afed2050bce572f": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "ProgressStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "ProgressStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "bar_color": null,
      "description_width": "initial"
     }
    },
    "be97cf2269d748f3b1a916b5376f7736": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "c5e735694f2c4813a1d6f0d867119f67": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "cf319210c8134cdba487525c49e4813b": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "DescriptionStyleModel",
     "state": {
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "DescriptionStyleModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "StyleView",
      "description_width": ""
     }
    },
    "d492afe626804d95a5cfac0550913190": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HTMLModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HTMLModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HTMLView",
      "description": "",
      "description_tooltip": null,
      "layout": "IPY_MODEL_74bd90a09da74db5bcbbe86f044bd664",
      "placeholder": "",
      "style": "IPY_MODEL_be97cf2269d748f3b1a916b5376f7736",
      "value": " 263M/263M [00:06&lt;00:00, 43.5MB/s]"
     }
    },
    "f1f0272d9bea4e9fad3be8646b45d629": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    },
    "fcffccb427714665bec7d621d00d4ce3": {
     "model_module": "@jupyter-widgets/controls",
     "model_name": "HBoxModel",
     "state": {
      "_dom_classes": [],
      "_model_module": "@jupyter-widgets/controls",
      "_model_module_version": "1.5.0",
      "_model_name": "HBoxModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/controls",
      "_view_module_version": "1.5.0",
      "_view_name": "HBoxView",
      "box_style": "",
      "children": [
       "IPY_MODEL_817f53e3fa5c43e29ad1b4410c3df7db",
       "IPY_MODEL_50a441d7a43c4a809a09505b36e83375"
      ],
      "layout": "IPY_MODEL_7aa02ef05fe64489ad6c969dd92d1b07"
     }
    },
    "fe230ff13a82400f97cf6f292e8851ba": {
     "model_module": "@jupyter-widgets/base",
     "model_name": "LayoutModel",
     "state": {
      "_model_module": "@jupyter-widgets/base",
      "_model_module_version": "1.2.0",
      "_model_name": "LayoutModel",
      "_view_count": null,
      "_view_module": "@jupyter-widgets/base",
      "_view_module_version": "1.2.0",
      "_view_name": "LayoutView",
      "align_content": null,
      "align_items": null,
      "align_self": null,
      "border": null,
      "bottom": null,
      "display": null,
      "flex": null,
      "flex_flow": null,
      "grid_area": null,
      "grid_auto_columns": null,
      "grid_auto_flow": null,
      "grid_auto_rows": null,
      "grid_column": null,
      "grid_gap": null,
      "grid_row": null,
      "grid_template_areas": null,
      "grid_template_columns": null,
      "grid_template_rows": null,
      "height": null,
      "justify_content": null,
      "justify_items": null,
      "left": null,
      "margin": null,
      "max_height": null,
      "max_width": null,
      "min_height": null,
      "min_width": null,
      "object_fit": null,
      "object_position": null,
      "order": null,
      "overflow": null,
      "overflow_x": null,
      "overflow_y": null,
      "padding": null,
      "right": null,
      "top": null,
      "visibility": null,
      "width": null
     }
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}