{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "OBCrx5fSG4Qm" }, "source": [ "# CS-UY 4613: Project\n", "\n", "Yufei Zhen\n", "\n", "macOS: Ventura 13.3.1 (a), GPU: Apple M2 Max" ] }, { "cell_type": "markdown", "metadata": { "id": "IptBGhoVG790" }, "source": [ "## Setup\n", "\n", "* video source: [https://www.youtube.com/@pantelism](https://www.youtube.com/@pantelism)\n", "\n", "* **option 1** (repository source: [https://github.com/PacktPublishing/LLM-Engineers-Handbook](https://github.com/PacktPublishing/LLM-Engineers-Handbook))\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "8i3CcnpG_VPn", "outputId": "597a492a-6305-43a6-e94e-b74fa8a12d7b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cloning into 'LLM-Engineers-Handbook'...\n", "remote: Enumerating objects: 1970, done.\u001b[K\n", "remote: Counting objects: 100% (515/515), done.\u001b[K\n", "remote: Compressing objects: 100% (138/138), done.\u001b[K\n", "remote: Total 1970 (delta 414), reused 377 (delta 377), pack-reused 1455 (from 2)\u001b[K\n", "Receiving objects: 100% (1970/1970), 4.77 MiB | 21.22 MiB/s, done.\n", "Resolving deltas: 100% (1263/1263), done.\n" ] } ], "source": [ "# !git clone https://github.com/PacktPublishing/LLM-Engineers-Handbook.git" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !poetry env use 3.11\n", "# !poetry install --without aws\n", "# !poetry run pre-commit install" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MPS available: True\n", "CUDA available: False\n" ] } ], "source": [ "import torch\n", "print(f\"MPS available: {torch.backends.mps.is_available()}\")\n", "print(f\"CUDA available: {torch.cuda.is_available()}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "ufyNDhgOYiUh" }, "source": [ "## RAG Architecture\n", "\n", "- Integrating into [https://github.com/PacktPublishing/LLM-Engineers-Handbook/tree/main/llm_engineering/application/rag](https://github.com/PacktPublishing/LLM-Engineers-Handbook/tree/main/llm_engineering/application/rag):\n", "\n", "- Directory overview: \n", "\n", "```\n", ".\n", "├── ... 
\n", "├── clips/ # Generated video clip responses\n", "├── llm_engineering/ # Core project package\n", "│ ├── application/\n", "│ │ ├── ...\n", "│ │ ├── rag # Main RAG architecture\n", "│ │ │ ├── __init__.py\n", "│ │ │ ├── base.py\n", "│ │ │ ├── multimodel_dispatcher.py (new)\n", "│ │ │ ├── pipeline.py (new)\n", "│ │ │ ├── prompt_templates.py\n", "│ │ │ ├── query_expansion.py\n", "│ │ │ ├── reranking.py\n", "│ │ │ ├── retriever.py (modified)\n", "│ │ │ ├── self_query.py\n", "│ │ │ ├── topic_retriever.py (new)\n", "│ │ │ ├── video_ingetser.py (new)\n", "│ │ │ ├── video_processor.py (new)\n", "│ ├── domain/\n", "│ │ ├── ...\n", "│ │ ├── queries.py (modified)\n", "│ │ ├── video_chunks.py (new)\n", "├── demonstration.ipynb (YOU'RE HERE)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Video Ingestion" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "video_db = \"/Users/yufeizhen/Desktop/project/videos\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32m2025-05-04 03:25:21.777\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mllm_engineering.settings\u001b[0m:\u001b[36mload_settings\u001b[0m:\u001b[36m94\u001b[0m - \u001b[1mLoading settings from the ZenML secret store.\u001b[0m\n", "\u001b[32m2025-05-04 03:25:21.929\u001b[0m | \u001b[33m\u001b[1mWARNING \u001b[0m | \u001b[36mllm_engineering.settings\u001b[0m:\u001b[36mload_settings\u001b[0m:\u001b[36m99\u001b[0m - \u001b[33m\u001b[1mFailed to load settings from the ZenML secret store. Defaulting to loading the settings from the '.env' file.\u001b[0m\n", "\u001b[32m2025-05-04 03:25:22.015\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mllm_engineering.infrastructure.db.mongo\u001b[0m:\u001b[36m__new__\u001b[0m:\u001b[36m20\u001b[0m - \u001b[1mConnection to MongoDB with URI successful: mongodb://llm_engineering:llm_engineering@127.0.0.1:27017\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;35mPyTorch version 2.2.2 available.\u001b[0m\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\u001b[32m2025-05-04 03:25:23.410\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mllm_engineering.infrastructure.db.qdrant\u001b[0m:\u001b[36m__new__\u001b[0m:\u001b[36m29\u001b[0m - \u001b[1mConnection to Qdrant DB with URI successful: str\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;35mLoad pretrained SentenceTransformer: all-MiniLM-L6-v2\u001b[0m\n", "Initializing fallback TextEmbedder\n", "\u001b[1;35mLoad pretrained SentenceTransformer: all-MiniLM-L6-v2\u001b[0m\n", "Loading CLIP model: openai/clip-vit-base-patch32\n", "CLIP model loaded successfully\n", "Initialized embedders\n", "Loaded NLP model\n", "Loaded BERTopic\n", "Processing videos from: /Users/yufeizhen/Desktop/project/videos\n", "Already processed 8 videos\n", "Previously processed videos:\n", " - 9CGGh6ivg68\n", " - FCQ-rih6cHY\n", " - TV-DjM8242s\n", " - WXoOohWU28Y\n", " - eFgkZKhNUdM\n", " - eQ6UE968Xe4\n", " - lb_5AdUpfuA\n", " - rCVlIVKqqGE\n", "Found 8 video folders\n", "Will process 0 videos (8 skipped)\n", "Skipping TV-DjM8242s (already processed)\n", "Skipping eFgkZKhNUdM (already processed)\n", "Skipping eQ6UE968Xe4 (already processed)\n", "Skipping rCVlIVKqqGE (already processed)\n", "Skipping lb_5AdUpfuA (already processed)\n", "Skipping FCQ-rih6cHY (already processed)\n", "Skipping 9CGGh6ivg68 (already processed)\n", "Skipping WXoOohWU28Y 
(already processed)\n", "\n", "All videos processed!\n", "Total processed videos: 8\n" ] } ], "source": [ "from llm_engineering.application.rag.video_ingester import VideoIngester\n", "\n", "ingester = VideoIngester(video_root=video_db)\n", "# ingester.process_video_library(force_reprocess=True)\n", "ingester.process_video_library()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total stored vectors: 403\n" ] } ], "source": [ "from qdrant_client import QdrantClient\n", "\n", "client = QdrantClient(path=\"/Users/yufeizhen/Desktop/project/qdrant_storage\")\n", "print(\"Total stored vectors:\", client.count(\"video_chunks\").count)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Video Q&A" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Initializing VideoQAEngine\n", "Video root: /Users/yufeizhen/Desktop/project/videos\n", "Qdrant storage path: /Users/yufeizhen/Desktop/project/qdrant_storage\n", "Connected to Qdrant storage at: /Users/yufeizhen/Desktop/project/qdrant_storage\n", "Available collections: collections=[CollectionDescription(name='video_chunks')]\n", "Found video_chunks collection with 403 points\n", "Initializing fallback TextEmbedder\n", "\u001b[1;35mLoad pretrained SentenceTransformer: all-MiniLM-L6-v2\u001b[0m\n", "Loading CLIP model: openai/clip-vit-base-patch32\n", "CLIP model loaded successfully\n", "VideoQAEngine initialized successfully\n" ] } ], "source": [ "from llm_engineering.application.rag.pipeline import VideoQAEngine\n", "\n", "engine = VideoQAEngine(video_root=video_db)\n", "\n", "# Retrieve the most relevant clips for a question and return (path, relevance) pairs for display\n", "def respond(question):\n", " clips = engine.ask(question)\n", " return [(str(clip[\"path\"]), f\"Relevance: {clip['score']:.2f}\") for clip in clips]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "question = \"Using only the videos, explain the binary cross entropy loss function.\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--- Processing query: 'Using only the videos, explain the the binary cross entropy loss function.' 
---\n", "Retrieving relevant video segments...\n", "Encoding query with CLIP: 'Using only the videos, explain the the binary cros...'\n", "Cleaned text for CLIP: Using only the videos, explain the the binary cros...\n", "Query embedded successfully\n", "Sending search request to Qdrant (attempt 1/5)\n", "Creating fresh connection to Qdrant...\n", "Search successful, found 3 results\n", "Retrieval completed in 0.07 seconds\n", "Found 3 relevant video segments\n", "\n", "Processing result 1/3:\n", " Video ID: eFgkZKhNUdM\n", " Timestamps: 1270.0s - 1302.0s\n", " Score: 0.8472\n", " Found alternative video path: /Users/yufeizhen/Desktop/project/videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4\n", " Creating clip to: clips/clip_eFgkZKhNUdM_1270_0.847.mp4\n", " Clip created successfully\n", "\n", "Processing result 2/3:\n", " Video ID: eFgkZKhNUdM\n", " Timestamps: 642.0s - 647.0s\n", " Score: 0.8467\n", " Found alternative video path: /Users/yufeizhen/Desktop/project/videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4\n", " Creating clip to: clips/clip_eFgkZKhNUdM_642_0.847.mp4\n", " Clip created successfully\n", "\n", "Processing result 3/3:\n", " Video ID: eFgkZKhNUdM\n", " Timestamps: 874.0s - 882.0s\n", " Score: 0.8379\n", " Found alternative video path: /Users/yufeizhen/Desktop/project/videos/eFgkZKhNUdM/eFgkZKhNUdM.mp4\n", " Creating clip to: clips/clip_eFgkZKhNUdM_874_0.838.mp4\n", " Clip created successfully\n", "\n", "Processed 3 clips successfully\n" ] }, { "data": { "text/plain": [ "[('clips/clip_eFgkZKhNUdM_1270_0.847.mp4', 'Relevance: 0.85'),\n", " ('clips/clip_eFgkZKhNUdM_642_0.847.mp4', 'Relevance: 0.85'),\n", " ('clips/clip_eFgkZKhNUdM_874_0.838.mp4', 'Relevance: 0.84')]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "respond(question)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gradio App" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;35mHTTP Request: GET \u001b[0m\u001b[34mhttps://api.gradio.app/pkg-version\u001b[1;35m \"HTTP/1.1 200 OK\"\u001b[0m\n" ] } ], "source": [ "import gradio as gr\n", "\n", "interface = gr.Interface(\n", " fn=respond,\n", " inputs=gr.Textbox(label=\"Ask about the video content\"),\n", " outputs=gr.Gallery(label=\"Relevant Video Clips\"),\n", " examples=[\n", " [\"Using only the videos, explain how ResNets work.\"],\n", " [\"Using only the videos, explain the advantages of CNNs over fully connected networks.\"],\n", " [\"Using only the videos, explain the the binary cross entropy loss function.\"]\n", " ]\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* Running on local URL: http://127.0.0.1:7860\n", "\u001b[1;35mHTTP Request: GET \u001b[0m\u001b[34mhttp://127.0.0.1:7860/gradio_api/startup-events\u001b[1;35m \"HTTP/1.1 200 OK\"\u001b[0m\n", "\u001b[1;35mHTTP Request: HEAD \u001b[0m\u001b[34mhttp://127.0.0.1:7860/\u001b[1;35m \"HTTP/1.1 200 OK\"\u001b[0m\n", "\u001b[1;35mHTTP Request: GET \u001b[0m\u001b[34mhttps://api.gradio.app/v3/tunnel-request\u001b[1;35m \"HTTP/1.1 200 OK\"\u001b[0m\n", "* Running on public URL: https://382d4d0bacff86ee02.gradio.live\n", "\n", "This share link expires in 1 week. 
For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)\n", "\u001b[1;35mHTTP Request: HEAD \u001b[0m\u001b[34mhttps://382d4d0bacff86ee02.gradio.live\u001b[1;35m \"HTTP/1.1 200 OK\"\u001b[0m\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "