Download the tar file locally, extract it, and obtain the mp4, vtt, json, etc. There is no need to store the files in MongoDB.
Process the vtt or json files and create the short sentences for each video.
Remove any repetitions and create timestamped longer sentences.
Decide when to stop merging consecutive timestamped short sentences; see the sketch below.
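A minimal sketch of the parsing/de-duplication step, assuming the webvtt-py package (`pip install webvtt-py`); the file name is hypothetical:

```python
# Parse a .vtt file into timestamped short sentences, dropping consecutive
# duplicate cues (common in auto-generated captions).
import webvtt

def read_short_sentences(vtt_path):
    records = []
    last_text = None
    for caption in webvtt.read(vtt_path):
        text = " ".join(caption.text.split())  # collapse internal newlines/whitespace
        if not text or text == last_text:      # skip empty and repeated cues
            continue
        records.append({"start": caption.start, "end": caption.end, "text": text})
        last_text = text
    return records

short_sentences = read_short_sentences("video_0001.vtt")
```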
We can merge everything per video and use spaCy's sentence segmentation to create the longer sentences.
https://spacy.io/usage/linguistic-features#sbd (sentence segmentation) - this should be the main and first approach to try.
https://spacy.io/api/sentencizer - much simpler - it may not work as well.
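A sketch of both spaCy options, assuming the `short_sentences` records from the previous sketch and the `en_core_web_sm` model:

```python
import spacy

# Option 1: full pipeline with parser-based segmentation (first link above);
# usually the most accurate, so try this first.
nlp = spacy.load("en_core_web_sm")

# Option 2: the rule-based sentencizer (second link); much simpler and faster.
# nlp = spacy.blank("en")
# nlp.add_pipe("sentencizer")

# Merge the de-duplicated short sentences of one video into a single text.
merged_text = " ".join(r["text"] for r in short_sentences)

doc = nlp(merged_text)
long_sentences = [sent.text for sent in doc.sents]
# Each `sent` also exposes sent.start_char / sent.end_char, which can be
# mapped back to the contributing cues to recover start/end timestamps.
```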
Preserve the video that each sentence came from and its timestamps.
Use BERTopic to do topic modeling.
Start with 5 sentences from a video that you believe belong to different topics. Just test BERTopic to see if the assigned topics are indeed different.
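A minimal sanity check along these lines; the sentences are placeholders, and note that BERTopic's default UMAP/HDBSCAN settings expect many more documents, so the components are downsized here purely for this smoke test:

```python
from bertopic import BERTopic
from umap import UMAP
from hdbscan import HDBSCAN

docs = [
    "The lecture introduces gradient descent.",
    "Learning rates control the optimizer's step size.",
    "The recipe calls for two cups of flour.",
    "Bake the cake for thirty minutes at 180 degrees.",
    "Momentum helps gradient descent escape plateaus.",
]

topic_model = BERTopic(
    umap_model=UMAP(n_neighbors=2, n_components=2, random_state=42),
    hdbscan_model=HDBSCAN(min_cluster_size=2),
)
topics, probs = topic_model.fit_transform(docs)
print(topics)  # expect optimization vs. baking sentences in different topics
               # (a -1 means HDBSCAN treated the sentence as an outlier)
```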
After verifying that BERTopic works, you can select an embedding model (e.g., CLIP): https://maartengr.github.io/BERTopic/getting_started/multimodal/multimodal.html
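A sketch of plugging a specific model in, following the multimodal docs above; the model name `clip-ViT-B-32` and the reuse of `long_sentences` are assumptions. Precomputing the embeddings lets us reuse them later in Qdrant instead of embedding twice:

```python
from sentence_transformers import SentenceTransformer
from bertopic import BERTopic

embedding_model = SentenceTransformer("clip-ViT-B-32")
embeddings = embedding_model.encode(long_sentences, show_progress_bar=True)

# Pass the precomputed embeddings so BERTopic clusters exactly these vectors.
topic_model = BERTopic(embedding_model=embedding_model)
topics, probs = topic_model.fit_transform(long_sentences, embeddings)
```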
You will now have sentences assigned to a topic, and in the d-dimensional embedding space the sentences that belong to the same cluster are exactly the sentences assigned the same topic id.
Retrieve the sentence embeddings that BERTopic used and store them in Qdrant (a vector database).
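A sketch of the storage step, reusing `embeddings` and `topics` from the previous sketch; `sentence_records` is a hypothetical per-sentence structure carrying the video id and timestamps preserved earlier, and the collection name is an assumption:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")  # or QdrantClient(url="http://localhost:6333")

# One collection, cosine distance, dimensionality taken from the embeddings.
client.create_collection(
    collection_name="video_sentences",
    vectors_config=VectorParams(size=embeddings.shape[1], distance=Distance.COSINE),
)

# Store each vector with its metadata (video id, timestamps, text, topic)
# as the point's payload.
client.upsert(
    collection_name="video_sentences",
    points=[
        PointStruct(
            id=i,
            vector=emb.tolist(),
            payload={"video_id": rec["video_id"], "start": rec["start"],
                     "end": rec["end"], "text": rec["text"], "topic": topics[i]},
        )
        for i, (emb, rec) in enumerate(zip(embeddings, sentence_records))
    ],
)
```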
Embed the question as well, with the same model, so it lives in the same d-dimensional space.
Use Qdrant's cosine similarity to retrieve the k closest (kNN) sentences from the database.
Look up their timestamps (note that Qdrant allows you to store vectors with any payload metadata you need).
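A sketch covering these three steps, reusing the `embedding_model` and `client` from above; the question text and k are placeholders:

```python
question = "How does gradient descent work?"
question_vector = embedding_model.encode(question).tolist()

# kNN search under the collection's cosine metric.
hits = client.search(
    collection_name="video_sentences",
    query_vector=question_vector,
    limit=5,  # k nearest sentences
)

# Timestamps come straight back out of each point's payload.
for hit in hits:
    print(hit.score, hit.payload["video_id"], hit.payload["start"], hit.payload["end"])
```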
Use the min and max timestamps from the Qdrant response to slice the video that the sentences belong to, creating the clip. https://shotstack.io/learn/use-ffmpeg-to-trim-video/
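A sketch of the slicing step, calling ffmpeg via subprocess as in the linked guide; file names are hypothetical, and it assumes all k hits come from the same video with zero-padded HH:MM:SS.mmm timestamps (so string min/max orders correctly):

```python
import subprocess

start = min(hit.payload["start"] for hit in hits)
end = max(hit.payload["end"] for hit in hits)

subprocess.run([
    "ffmpeg", "-i", "video_0001.mp4",
    "-ss", start, "-to", end,   # trim window from the retrieved timestamps
    "-c", "copy",               # stream copy: fast, but cuts snap to keyframes
    "clip.mp4",
], check=True)
```

Re-encoding (dropping `-c copy`) gives frame-accurate cuts at the cost of speed.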