Unverified Commit 839bfaed authored by Patrick von Platen, committed by GitHub

[Docs, Notebook] Include generation pipeline (#4295)

* add first text for generation

* add generation pipeline to usage

* Created using Colaboratory

* correct docstring

* finish
parent 2d184cb5
......@@ -404,48 +404,150 @@ Causal language modeling is the task of predicting the token following a sequence of tokens. In this situation, the
model only attends to the left context (tokens on the left of the mask). Such a training is particularly interesting
for generation tasks.
There is currently no pipeline to do causal language modeling/generation.
Usually, the next token is predicted by sampling from the logits of the last hidden state the model produces from the input sequence.
Here is an example using the tokenizer and model and leveraging the :func:`~transformers.top_k_top_p_filtering` method to sample the next token following an input sequence of tokens.
::
## PYTORCH CODE
from transformers import AutoModelWithLMHead, AutoTokenizer, top_k_top_p_filtering
import torch
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")
sequence = f"Hugging Face is based in DUMBO, New York City, and is"
sequence = f"Hugging Face is based in DUMBO, New York City, and "
input_ids = tokenizer.encode(sequence, return_tensors="pt")
# get logits of last hidden state
next_token_logits = model(input_ids)[0][:, -1, :]
# filter
filtered_next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
# sample
probs = F.softmax(filtered_next_token_logits, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
generated = torch.cat([input_ids, next_token], dim=-1)
resulting_string = tokenizer.decode(generated.tolist()[0])
print(resulting_string)
## TENSORFLOW CODE
from transformers import TFAutoModelWithLMHead, AutoTokenizer, tf_top_k_top_p_filtering
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = TFAutoModelWithLMHead.from_pretrained("gpt2")
sequence = f"Hugging Face is based in DUMBO, New York City, and is"
input = tokenizer.encode(sequence, return_tensors="tf")
generated = model.generate(input, max_length=50, do_sample=True)
sequence = f"Hugging Face is based in DUMBO, New York City, and "
input_ids = tokenizer.encode(sequence, return_tensors="tf")
# get logits of last hidden state
next_token_logits = model(input_ids)[0][:, -1, :]
# filter
filtered_next_token_logits = tf_top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
# sample
next_token = tf.random.categorical(filtered_next_token_logits, dtype=tf.int32, num_samples=1)
generated = tf.concat([input_ids, next_token], axis=1)
resulting_string = tokenizer.decode(generated.numpy().tolist()[0])
print(resulting_string)
This outputs a (hopefully) coherent next token following the original sequence, which in our case is the word *has*:
::
Hugging Face is based in DUMBO, New York City, and has
In the next section, we show how this functionality is leveraged in :func:`~transformers.PreTrainedModel.generate` to generate multiple tokens up to a user-defined length.
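To make this concrete, here is a minimal sketch of how repeating the single sampling step above in a loop produces several tokens. This is an illustration only, not the actual implementation of :func:`~transformers.PreTrainedModel.generate`, which additionally handles caching, special tokens and many decoding options.
::
from transformers import AutoModelWithLMHead, AutoTokenizer, top_k_top_p_filtering
import torch
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")
# start from the encoded prompt and grow the sequence one token at a time
generated = tokenizer.encode("Hugging Face is based in DUMBO, New York City, and ", return_tensors="pt")
for _ in range(20):  # sample 20 additional tokens
    next_token_logits = model(generated)[0][:, -1, :]
    filtered_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
    probs = F.softmax(filtered_logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    generated = torch.cat([generated, next_token], dim=-1)  # re-feed the full sequence
print(tokenizer.decode(generated.tolist()[0]))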
Text Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In text generation (*a.k.a.* *open-ended text generation*) the goal is to create a coherent portion of text that is a continuation of the given context. The following example shows how *GPT-2* can be used in pipelines to generate text. By default, all models apply *Top-K* sampling when used in pipelines, as configured in their respective configurations (see the `gpt-2 config <https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json>`_ for example).
::
from transformers import pipeline
text_generator = pipeline("text-generation")
print(text_generator("As far as I am concerned, I will", max_length=50))
Here, the model generates a random text with a total maximal length of *50* tokens from the context *"As far as I am concerned, I will"*.
The default arguments of ``PreTrainedModel.generate()`` can directly be overridden in the pipeline, as is shown above for the argument ``max_length``.
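Other defaults of ``PreTrainedModel.generate()`` can be overridden the same way. As a sketch, assuming the pipeline forwards these keyword arguments to ``generate()`` just as it does ``max_length``:
::
from transformers import pipeline
text_generator = pipeline("text-generation")
# greedy decoding instead of the default sampling (deterministic output)
print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))
# sampling restricted to the 10 most likely next tokens
print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=True, top_k=10))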
Here is an example of text generation using *XLNet* and its tokenizer.
::
## PYTORCH CODE
from transformers import AutoModelWithLMHead, AutoTokenizer
model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
# Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
(except for Alexei and Maria) are discovered.
The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
remainder of the story. 1883 Western Siberia,
a young Grigori Rasputin is asked by his father and a group of men to perform magic.
Rasputin has a vision and denounces one of the men as a horse thief. Although his
father initially slaps him for making such an accusation, Rasputin watches as the
man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""
prompt = "Today the weather is really nice and I am planning on "
inputs = tokenizer.encode(PADDING_TEXT + prompt, add_special_tokens=False, return_tensors="pt")
prompt_length = len(tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
outputs = model.generate(inputs, max_length=250, do_sample=True, top_p=0.95, top_k=60)
generated = prompt + tokenizer.decode(outputs[0])[prompt_length:]
print(generated)
## TENSORFLOW CODE
from transformers import TFAutoModelWithLMHead, AutoTokenizer
model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
# Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
(except for Alexei and Maria) are discovered.
The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
remainder of the story. 1883 Western Siberia,
a young Grigori Rasputin is asked by his father and a group of men to perform magic.
Rasputin has a vision and denounces one of the men as a horse thief. Although his
father initially slaps him for making such an accusation, Rasputin watches as the
man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""
prompt = "Today the weather is really nice and I am planning on "
inputs = tokenizer.encode(PADDING_TEXT + prompt, add_special_tokens=False, return_tensors="tf")
prompt_length = len(tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
outputs = model.generate(inputs, max_length=250, do_sample=True, top_p=0.95, top_k=60)
generated = prompt + tokenizer.decode(outputs[0])[prompt_length:]
print(generated)
Text generation is currently possible with *GPT-2*, *OpenAI-GPT*, *CTRL*, *XLNet*, *Transfo-XL* and *Reformer* in PyTorch, and for most models in TensorFlow as well. As can be seen in the example above, *XLNet* and *Transfo-XL* often need to be padded to work well.
GPT-2 is usually a good choice for *open-ended text generation* because it was trained on millions of webpages with a causal language modeling objective.
For more information on how to apply different decoding strategies for text generation, please also refer to our text generation blog post `here <https://huggingface.co/blog/how-to-generate>`_.
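As a short sketch of one such alternative decoding strategy, beam search can be used instead of sampling through standard ``PreTrainedModel.generate()`` arguments (``num_beams``, ``no_repeat_ngram_size`` and ``early_stopping``; the choice of GPT-2 below is purely illustrative):
::
from transformers import AutoModelWithLMHead, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")
input_ids = tokenizer.encode("Today the weather is really nice and I am planning on ", return_tensors="pt")
# beam search keeps the num_beams most likely sequences at each step;
# unlike sampling, it is deterministic
outputs = model.generate(input_ids, max_length=50, num_beams=5, no_repeat_ngram_size=2, early_stopping=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))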
Named Entity Recognition
......
......@@ -30,7 +30,8 @@
},
"colab": {
"name": "03-pipelines.ipynb",
"provenance": []
"provenance": [],
"include_colab_link": true
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
......@@ -1504,6 +1505,251 @@
"left": null
}
},
"3c86415352574190b71e1fe5a15d36f1": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"state": {
"_view_name": "HBoxView",
"_dom_classes": [],
"_model_name": "HBoxModel",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.5.0",
"box_style": "",
"layout": "IPY_MODEL_dd2c9dd935754cf2802233053554c21c",
"_model_module": "@jupyter-widgets/controls",
"children": [
"IPY_MODEL_8ae3be32d9c845e59fdb1c47884d48aa",
"IPY_MODEL_4dea0031f3554752ad5aad01fe516a60"
]
}
},
"dd2c9dd935754cf2802233053554c21c": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"8ae3be32d9c845e59fdb1c47884d48aa": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_1efb96d931a446de92f1930b973ae846",
"_dom_classes": [],
"description": "Downloading: 100%",
"_model_name": "FloatProgressModel",
"bar_style": "success",
"max": 230,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 230,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_6a4f5aab5ba949fd860b5a35bba7db9c"
}
},
"4dea0031f3554752ad5aad01fe516a60": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"state": {
"_view_name": "HTMLView",
"style": "IPY_MODEL_4b02b2e964ad49af9f7ce7023131ceb8",
"_dom_classes": [],
"description": "",
"_model_name": "HTMLModel",
"placeholder": "​",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": " 230/230 [00:00&lt;00:00, 8.69kB/s]",
"_view_count": null,
"_view_module_version": "1.5.0",
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_0ae8a68c3668401da8d8a6d5ec9cac8f"
}
},
"1efb96d931a446de92f1930b973ae846": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "initial",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"6a4f5aab5ba949fd860b5a35bba7db9c": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"4b02b2e964ad49af9f7ce7023131ceb8": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"0ae8a68c3668401da8d8a6d5ec9cac8f": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"fd44cf6ab17e4b768b2e1d5cb8ce5af9": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
......@@ -2105,6 +2351,16 @@
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/huggingface/transformers/blob/generation_pipeline_docs/notebooks/03-pipelines.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
......@@ -2170,13 +2426,29 @@
},
"id": "4maAknWNrl_N",
"colab_type": "code",
"colab": {}
"colab": {
"base_uri": "https://localhost:8080/",
"height": 102
},
"outputId": "467e3cc8-a069-47da-8029-86e4142c7dde"
},
"source": [
"!pip install -q transformers"
],
"execution_count": 0,
"outputs": []
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": [
"\u001b[K |████████████████████████████████| 645kB 4.4MB/s \n",
"\u001b[K |████████████████████████████████| 3.8MB 11.7MB/s \n",
"\u001b[K |████████████████████████████████| 890kB 51.5MB/s \n",
"\u001b[K |████████████████████████████████| 1.0MB 46.0MB/s \n",
"\u001b[?25h Building wheel for sacremoses (setup.py) ... \u001b[?25l\u001b[?25hdone\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
......@@ -2219,6 +2491,7 @@
},
"id": "AMRXHQw9rl_d",
"colab_type": "code",
"outputId": "a7a10851-b71e-4553-9afc-04066120410d",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 83,
......@@ -2232,14 +2505,13 @@
"ad84da685cf44abb90d17d9d2e023b48",
"a246f9eea2d7440cb979e728741d2e32"
]
},
"outputId": "a7a10851-b71e-4553-9afc-04066120410d"
}
},
"source": [
"nlp_sentence_classif = pipeline('sentiment-analysis')\n",
"nlp_sentence_classif('Such a nice weather outside !')"
],
"execution_count": 3,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2300,6 +2572,7 @@
},
"id": "B3BDRX_Krl_n",
"colab_type": "code",
"outputId": "a6b90b11-a272-4ecb-960d-4c682551b399",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 185,
......@@ -2313,14 +2586,13 @@
"405afa5bb8b840d8bc0850e02f593ce4",
"78c718e3d5fa4cb892217260bea6d540"
]
},
"outputId": "a6b90b11-a272-4ecb-960d-4c682551b399"
}
},
"source": [
"nlp_token_class = pipeline('ner')\n",
"nlp_token_class('Hugging Face is a French company based in New-York.')"
],
"execution_count": 4,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2384,6 +2656,7 @@
},
"id": "ND_8LzQKrl_u",
"colab_type": "code",
"outputId": "c59ae695-c465-4de6-fa6e-181d8f1a3992",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 117,
......@@ -2397,14 +2670,13 @@
"cd64e3f20b23483daa79712bde6622ea",
"67cbaa1f55d24e62ad6b022af36bca56"
]
},
"outputId": "c59ae695-c465-4de6-fa6e-181d8f1a3992"
}
},
"source": [
"nlp_qa = pipeline('question-answering')\n",
"nlp_qa(context='Hugging Face is a French company based in New-York.', question='Where is based Hugging Face ?')"
],
"execution_count": 5,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2470,6 +2742,7 @@
},
"id": "zpJQ2HXNrl_4",
"colab_type": "code",
"outputId": "3fb62e7a-25a6-4b06-ced8-51eb8aa6bf33",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 321,
......@@ -2483,14 +2756,13 @@
"a35703cc8ff44e93a8c0eb413caddc40",
"9df7014c99b343f3b178fa020ff56010"
]
},
"outputId": "3fb62e7a-25a6-4b06-ced8-51eb8aa6bf33"
}
},
"source": [
"nlp_fill = pipeline('fill-mask')\n",
"nlp_fill('Hugging Face is a French company based in ' + nlp_fill.tokenizer.mask_token)"
],
"execution_count": 6,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2560,11 +2832,11 @@
"metadata": {
"id": "8BaOgzi1u1Yc",
"colab_type": "code",
"outputId": "2168e437-cfba-4247-a38c-07f02f555c6e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 88
},
"outputId": "2168e437-cfba-4247-a38c-07f02f555c6e"
}
},
"source": [
"TEXT_TO_SUMMARIZE = \"\"\" \n",
......@@ -2590,7 +2862,7 @@
"summarizer = pipeline('summarization')\n",
"summarizer(TEXT_TO_SUMMARIZE)"
],
"execution_count": 7,
"execution_count": 0,
"outputs": [
{
"output_type": "stream",
......@@ -2631,6 +2903,7 @@
"metadata": {
"id": "8FwayP4nwV3Z",
"colab_type": "code",
"outputId": "66956816-c924-4718-fe58-cabef7d51974",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 83,
......@@ -2644,15 +2917,14 @@
"ad78042ee71a41fd989e4b4ce9d2e3c1",
"40c8d2617f3d4c84b923b140456fa5da"
]
},
"outputId": "66956816-c924-4718-fe58-cabef7d51974"
}
},
"source": [
"# English to French\n",
"translator = pipeline('translation_en_to_fr')\n",
"translator(\"HuggingFace is a French company that is based in New York City. HuggingFace's mission is to solve NLP one commit at a time\")"
],
"execution_count": 8,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2696,6 +2968,7 @@
"metadata": {
"colab_type": "code",
"id": "ra0-WfznwoIW",
"outputId": "278a3d5f-cc42-40bc-a9db-c92ec5a3a2f0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 83,
......@@ -2709,15 +2982,14 @@
"4486f8a2efc34b9aab3864eb5ad2ba48",
"d6228324f3444aa6bd1323d65ae4ff75"
]
},
"outputId": "278a3d5f-cc42-40bc-a9db-c92ec5a3a2f0"
}
},
"source": [
"# English to German\n",
"translator = pipeline('translation_en_to_de')\n",
"translator(\"The history of natural language processing (NLP) generally started in the 1950s, although work can be found from earlier periods.\")"
],
"execution_count": 9,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2756,6 +3028,89 @@
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qPUpg0M8hCtB",
"colab_type": "text"
},
"source": [
"## 7. Text Generation\n",
"\n",
"Text generation is currently supported by GPT-2, OpenAi-GPT, TransfoXL, XLNet, CTRL and Reformer."
]
},
{
"cell_type": "code",
"metadata": {
"id": "5pKfxTxohXuZ",
"colab_type": "code",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 120,
"referenced_widgets": [
"3c86415352574190b71e1fe5a15d36f1",
"dd2c9dd935754cf2802233053554c21c",
"8ae3be32d9c845e59fdb1c47884d48aa",
"4dea0031f3554752ad5aad01fe516a60",
"1efb96d931a446de92f1930b973ae846",
"6a4f5aab5ba949fd860b5a35bba7db9c",
"4b02b2e964ad49af9f7ce7023131ceb8",
"0ae8a68c3668401da8d8a6d5ec9cac8f"
]
},
"outputId": "8705f6b4-2413-4ac6-f72d-e5ecce160662"
},
"source": [
"text_generator = pipeline(\"text-generation\")\n",
"text_generator(\"Today is a beautiful day and I will\")"
],
"execution_count": 5,
"outputs": [
{
"output_type": "display_data",
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3c86415352574190b71e1fe5a15d36f1",
"version_minor": 0,
"version_major": 2
},
"text/plain": [
"HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "stream",
"text": [
"\n"
],
"name": "stdout"
},
{
"output_type": "stream",
"text": [
"Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence\n"
],
"name": "stderr"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[{'generated_text': 'Today is a beautiful day and I will celebrate my birthday!\"\\n\\nThe mother told CNN the two had planned their meal together. After dinner, she added that she and I walked down the street and stopped at a diner near her home. \"He'}]"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
......@@ -2763,7 +3118,7 @@
"colab_type": "text"
},
"source": [
"## 7. Projection - Features Extraction "
"## 8. Projection - Features Extraction "
]
},
{
......@@ -2775,6 +3130,7 @@
},
"id": "O4SjR1QQrl__",
"colab_type": "code",
"outputId": "2ce966d5-7a89-4488-d48f-626d1c2a8222",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 83,
......@@ -2788,8 +3144,7 @@
"31d97ecf78fa412c99e6659196d82828",
"c6be5d48ec3c4c799d1445607e5f1ac6"
]
},
"outputId": "2ce966d5-7a89-4488-d48f-626d1c2a8222"
}
},
"source": [
"import numpy as np\n",
......@@ -2797,7 +3152,7 @@
"output = nlp_features('Hugging Face is a French company based in Paris')\n",
"np.array(output).shape # (Samples, Tokens, Vector Size)\n"
],
"execution_count": 10,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2861,6 +3216,7 @@
},
"id": "yFlBPQHtrmAH",
"colab_type": "code",
"outputId": "03cc3207-a7e8-49fd-904a-63a7a1d0eb7a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 116,
......@@ -2872,8 +3228,7 @@
"62b10ca525cc4ac68f3a006434eb7416",
"211109537fbe4e60b89a238c89db1346"
]
},
"outputId": "03cc3207-a7e8-49fd-904a-63a7a1d0eb7a"
}
},
"source": [
"task = widgets.Dropdown(\n",
......@@ -2906,7 +3261,7 @@
"input.on_submit(forward)\n",
"display(task, input)"
],
"execution_count": 11,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......@@ -2958,6 +3313,7 @@
},
"id": "GCoKbBTYrmAN",
"colab_type": "code",
"outputId": "57c3a647-160a-4b3a-e852-e7a1daf1294a",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143,
......@@ -2969,8 +3325,7 @@
"d305ba1662e3466c93ab5cca7ebf8f33",
"879f7a3747ad455d810c7a29918648ee"
]
},
"outputId": "57c3a647-160a-4b3a-e852-e7a1daf1294a"
}
},
"source": [
"context = widgets.Textarea(\n",
......@@ -2995,7 +3350,7 @@
"query.on_submit(forward)\n",
"display(context, query)"
],
"execution_count": 12,
"execution_count": 0,
"outputs": [
{
"output_type": "display_data",
......