{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "30b1235c-2f3e-4628-9c90-30385f741550",
   "metadata": {},
   "source": [
    "## This demo app shows:\n",
    "* How to use LangChain's YoutubeLoader to retrieve the caption in a YouTube video\n",
    "* How to ask Llama 3 to summarize the content (per the Llama's input size limit) of the video in a naive way using LangChain's stuff method\n",
    "* How to bypass the limit of Llama 3's max input token size by using a more sophisticated way using LangChain's map_reduce and refine methods - see [here](https://python.langchain.com/docs/use_cases/summarization) for more info"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c866f6be",
   "metadata": {},
   "source": [
    "We start by installing the necessary packages:\n",
    "- [youtube-transcript-api](https://pypi.org/project/youtube-transcript-api/) API to get transcript/subtitles of a YouTube video\n",
    "- [langchain](https://python.langchain.com/docs/get_started/introduction) provides necessary RAG tools for this demo\n",
    "- [tiktoken](https://github.com/openai/tiktoken) BytePair Encoding tokenizer\n",
    "- [pytube](https://pytube.io/en/latest/) Utility for downloading YouTube videos\n",
    "\n",
    "**Note** This example uses OctoAI to host the Llama 3 model. If you have not set up/or used OctoAI before, we suggest you take a look at the [HelloLlamaCloud](HelloLlamaCloud.ipynb) example for information on how to set up OctoAI before continuing with this example.\n",
    "If you do not want to use OctoAI, you will need to make some changes to this notebook as you go along."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02482167",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install langchain==0.1.19 youtube-transcript-api tiktoken pytube"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "af3069b1",
   "metadata": {},
   "source": [
    "Let's first load a long (2:47:16) YouTube video (Lex Fridman with Yann Lecun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI) transcript using the YoutubeLoader."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e4b8598",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import YoutubeLoader\n",
    "\n",
    "loader = YoutubeLoader.from_youtube_url(\n",
    "    \"https://www.youtube.com/watch?v=5t1vTLU7s40\", add_video_info=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dca32ebb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the youtube video caption into Documents\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "afba128f-b7fd-4b2f-873f-9b5163455d54",
   "metadata": {},
   "outputs": [],
   "source": [
    "# check the docs length and content\n",
    "len(docs[0].page_content), docs[0].page_content[:300]"
   ]
  },
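  {
   "cell_type": "markdown",
   "id": "b7f1c2a0-1a11-4e2b-9c3d-0a1b2c3d4e50",
   "metadata": {},
   "source": [
    "Optionally, you can get a rough token count for the transcript. This is a minimal sketch: tiktoken's `cl100k_base` is an OpenAI tokenizer, not Llama 3's, so treat the number as an approximation only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7f1c2a0-1a12-4e2b-9c3d-0a1b2c3d4e51",
   "metadata": {},
   "outputs": [],
   "source": [
    "import tiktoken\n",
    "\n",
    "# rough token count; cl100k_base only approximates Llama 3's own tokenizer\n",
    "enc = tiktoken.get_encoding(\"cl100k_base\")\n",
    "len(enc.encode(docs[0].page_content))"
   ]
  },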
  {
   "cell_type": "markdown",
   "id": "4af7cc16",
   "metadata": {},
   "source": [
    "You should see 142689 returned for the doc character length, which is about 30k words or 40k tokens, beyond the 8k context length limit of Llama 3. You'll see how to summarize a text longer than the limit.\n",
    "\n",
    "**Note**: We are using OctoAI in this example to host our Llama 3 model so you will need to get a OctoAI token.\n",
    "\n",
    "To get the OctoAI token:\n",
    "\n",
    "- You will need to first sign in with OctoAI with your github account\n",
    "- Then create a free API token [here](https://octo.ai/docs/getting-started/how-to-create-an-octoai-access-token) that you can use for a while (a month or $10 in OctoAI credits, whichever one runs out first)\n",
    "\n",
    "After the free trial ends, you will need to enter billing info to continue to use Llama2 hosted on OctoAI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab3ac00e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# enter your OctoAI API token, or you can use local Llama. See README for more info\n",
    "from getpass import getpass\n",
    "import os\n",
    "\n",
    "OCTOAI_API_TOKEN = getpass()\n",
    "os.environ[\"OCTOAI_API_TOKEN\"] = OCTOAI_API_TOKEN"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b911efd",
   "metadata": {},
   "source": [
    "Next we call the Llama 3 model from OctoAI. In this example we will use the Llama 3 8b instruct model. You can find more on Llama models on the [OctoAI text generation solution page](https://octoai.cloud/text).\n",
    "\n",
    "At the time of writing this notebook the following Llama models are available on OctoAI:\n",
    "* meta-llama-3-8b-instruct\n",
    "* meta-llama-3-70b-instruct\n",
    "* codellama-7b-instruct\n",
    "* codellama-13b-instruct\n",
    "* codellama-34b-instruct\n",
    "* llama-2-13b-chat\n",
    "* llama-2-70b-chat\n",
    "* llamaguard-7b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "adf8cf3d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.llms.octoai_endpoint import OctoAIEndpoint\n",
    "\n",
    "llama3_8b = \"meta-llama-3-8b-instruct\"\n",
    "llm = OctoAIEndpoint(\n",
    "    model=llama3_8b,\n",
    "    max_tokens=500,\n",
    "    temperature=0.01\n",
    ")"
   ]
  },
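  {
   "cell_type": "markdown",
   "id": "c8e2d3b1-2b21-4f3c-8d4e-1b2c3d4e5f60",
   "metadata": {},
   "source": [
    "If you'd rather not use OctoAI, one alternative (a sketch, assuming you have [Ollama](https://ollama.com) installed locally and have pulled a Llama 3 model with `ollama pull llama3`) is to point `llm` at a local model; the rest of the notebook then works unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8e2d3b1-2b22-4f3c-8d4e-1b2c3d4e5f61",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: use a local Llama 3 via Ollama instead of OctoAI.\n",
    "# Assumes Ollama is running locally and `ollama pull llama3` has been done.\n",
    "# Uncomment to replace the OctoAI-backed llm defined above.\n",
    "\n",
    "# from langchain_community.llms import Ollama\n",
    "# llm = Ollama(model=\"llama3\", temperature=0.01)"
   ]
  },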
  {
   "cell_type": "markdown",
   "id": "8e3baa56",
   "metadata": {},
   "source": [
    "Once everything is set up, we prompt Llama 3 to summarize the first 4000 characters of the transcript for us."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51739e11",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.prompts import PromptTemplate\n",
    "from langchain.chains import LLMChain\n",
    "\n",
    "prompt_template = \"Give me a summary of the text below: {text}?\"\n",
    "prompt = PromptTemplate(\n",
    "    input_variables=[\"text\"], template=prompt_template\n",
    ")\n",
    "chain = prompt | llm\n",
    "\n",
    "# be careful of the input text length sent to LLM\n",
    "text = docs[0].page_content[:10000]\n",
    "summary = chain.invoke(text)\n",
    "\n",
    "# Note: The context length of 8k tokens in Llama 3 is roughly 6000-7000 words or 32k characters\n",
    "print(summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ad1881a",
   "metadata": {},
   "source": [
    "If you try the whole content which has over 142k characters, about 40k tokens, which exceeds the 8k limit, you'll get an empty result (OctoAI used to return an error \"BadRequestError: The token count (32704) of your prompt (32204) + your setting of `max_tokens` (500) cannot exceed this model's context length (8192).\")."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61a088b7-cba2-4603-ba7c-f6673bfaa3cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# this will generate an empty result because the input exceeds Llama 3's context length limit\n",
    "text = docs[0].page_content\n",
    "summary = llm.invoke(f\"Give me a summary of the text below: {text}.\")\n",
    "print(summary)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e112845f-de16-4c2f-8afe-6cca31f6fa38",
   "metadata": {},
   "source": [
    "To fix this, you can use LangChain's load_summarize_chain method (detail [here](https://python.langchain.com/docs/use_cases/summarization)).\n",
    "\n",
    "First you'll create splits or sub-documents of the original content, then use the LangChain's `load_summarize_chain` with the `refine` or `map_reduce type`.\n",
    "\n",
    "Because this may involve many calls to Llama 3, it'd be great to set up a quick free LangChain API key [here](https://smith.langchain.com/settings), run the following cell to set up necessary environment variables, and check the logs on [LangSmith](https://docs.smith.langchain.com/) during and after the run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55586a09-db53-4741-87d8-fdfb40d9f8cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ[\"LANGCHAIN_API_KEY\"] = \"your_langchain_api_key\"\n",
    "os.environ[\"LANGCHAIN_API_KEY\"] = \"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\"\n",
    "os.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\n",
    "os.environ[\"LANGCHAIN_PROJECT\"] = \"Video Summary with Llama 3\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bfee2d3-3afe-41d9-8968-6450cc23f493",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "# we need to split the long input text\n",
    "text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(\n",
    "    chunk_size=1000, chunk_overlap=0\n",
    ")\n",
    "split_docs = text_splitter.split_documents(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "682799a8-3846-41b1-a908-02ab5ac3ecee",
   "metadata": {},
   "outputs": [],
   "source": [
    "# check the splitted docs lengths\n",
    "len(split_docs), len(docs), len(split_docs[0].page_content), len(docs[0].page_content)"
   ]
  },
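  {
   "cell_type": "markdown",
   "id": "d9f3e4c2-3c31-4a4d-9e5f-2c3d4e5f6a70",
   "metadata": {},
   "source": [
    "Since `chunk_size=1000` above is measured in tokens (via the tiktoken encoder), not characters, you can sanity-check that each split fits comfortably within Llama 3's context. As before, `cl100k_base` is only an approximation of Llama 3's tokenizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9f3e4c2-3c32-4a4d-9e5f-2c3d4e5f6a71",
   "metadata": {},
   "outputs": [],
   "source": [
    "import tiktoken\n",
    "\n",
    "# approximate token count of the largest split; should be around 1000 or less\n",
    "enc = tiktoken.get_encoding(\"cl100k_base\")\n",
    "max(len(enc.encode(d.page_content)) for d in split_docs)"
   ]
  },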
  {
   "cell_type": "markdown",
   "id": "aecf6328",
   "metadata": {},
   "source": [
    "The `refine` type implements the following steps under the hood:\n",
    "\n",
    "1. Call Llama 3 on the first sub-document to generate a concise summary;\n",
    "2. Loop over each subsequent sub-document, pass the previous summary with the current sub-document to generate a refined new summary;\n",
    "3. Return the final summary generated on the final sub-document as the final answer - the summary of the whole content.\n",
    "\n",
    "An example prompt template for each call in step 2, which gets used under the hood by LangChain, is:\n",
    "\n",
    "```\n",
    "Your job is to produce a final summary.\n",
    "We have provided an existing summary up to a certain point:\n",
    "<previous_summary>\n",
    "Refine the existing summary (only if needed) with some more content below:\n",
    "<new_content>\n",
    "```\n",
    "\n",
    "**Note**: The following call will make 33 calls to Llama 3 and genereate the final summary in about 10 minutes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3be1236a-fe6a-4bf6-983f-0e72dde39fee",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains.summarize import load_summarize_chain\n",
    "\n",
    "chain = load_summarize_chain(llm, chain_type=\"refine\")\n",
    "print(chain.run(split_docs))"
   ]
  },
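  {
   "cell_type": "markdown",
   "id": "e0a4f5d3-4d41-4b5e-8f6a-3d4e5f6a7b80",
   "metadata": {},
   "source": [
    "If you want to tweak the wording of the refine steps, `load_summarize_chain` also accepts custom prompts. This is a minimal sketch assuming the `question_prompt` and `refine_prompt` parameters of langchain 0.1.19; it builds the chain without running it (running it would take another ~10 minutes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0a4f5d3-4d42-4b5e-8f6a-3d4e5f6a7b81",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains.summarize import load_summarize_chain\n",
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "# prompt for the first sub-document (step 1)\n",
    "question_prompt = PromptTemplate.from_template(\n",
    "    \"Write a concise summary of the following:\\n{text}\\nCONCISE SUMMARY:\"\n",
    ")\n",
    "# prompt for each refinement step (step 2)\n",
    "refine_prompt = PromptTemplate.from_template(\n",
    "    \"Your job is to produce a final summary.\\n\"\n",
    "    \"We have provided an existing summary up to a certain point:\\n{existing_answer}\\n\"\n",
    "    \"Refine the existing summary (only if needed) with some more content below:\\n{text}\"\n",
    ")\n",
    "custom_refine_chain = load_summarize_chain(\n",
    "    llm, chain_type=\"refine\",\n",
    "    question_prompt=question_prompt, refine_prompt=refine_prompt\n",
    ")\n",
    "# run it the same way: print(custom_refine_chain.run(split_docs))"
   ]
  },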
  {
   "cell_type": "markdown",
   "id": "752f2b71-5fd6-4a8a-ac09-371bce1db703",
   "metadata": {},
   "source": [
    "You can also set `chain_type` to `map_reduce` to generate the summary of the entire content using the standard map and reduce method, which works behind the scene by first mapping each split document to a sub-summary via a call to LLM, then combines all those sub-summaries into a single final summary by yet another call to LLM.\n",
    "\n",
    "**Note**: The following call takes about 3 minutes and all the calls to Llama 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8991df49-8578-46de-8b30-cb2cd11e30f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain = load_summarize_chain(llm, chain_type=\"map_reduce\")\n",
    "print(chain.run(split_docs))"
   ]
  },
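  {
   "cell_type": "markdown",
   "id": "f1b5a6e4-5e51-4c6f-9a7b-4e5f6a7b8c90",
   "metadata": {},
   "source": [
    "As with `refine`, you can supply your own prompts to the `map_reduce` chain. This sketch assumes the `map_prompt` and `combine_prompt` parameters of langchain 0.1.19 and builds the chain without running it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1b5a6e4-5e52-4c6f-9a7b-4e5f6a7b8c91",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chains.summarize import load_summarize_chain\n",
    "from langchain.prompts import PromptTemplate\n",
    "\n",
    "# map step: summarize each split document individually\n",
    "map_prompt = PromptTemplate.from_template(\n",
    "    \"Write a concise summary of the following:\\n{text}\\nCONCISE SUMMARY:\"\n",
    ")\n",
    "# reduce step: combine the sub-summaries into one final summary\n",
    "combine_prompt = PromptTemplate.from_template(\n",
    "    \"Combine these summaries into a single, coherent summary:\\n{text}\"\n",
    ")\n",
    "custom_mr_chain = load_summarize_chain(\n",
    "    llm, chain_type=\"map_reduce\",\n",
    "    map_prompt=map_prompt, combine_prompt=combine_prompt\n",
    ")\n",
    "# run it the same way: print(custom_mr_chain.run(split_docs))"
   ]
  }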
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}