
Conversation

@rlancemartin (Contributor) commented Jun 6, 2023

This introduces the `YoutubeAudioLoader`, which downloads audio blobs from a YouTube URL and writes them to `save_dir`. The blobs are then parsed by `OpenAIWhisperParser()`, as shown in this [PR](langchain-ai#5580), but we extend the parser to split the audio so that each chunk meets the 25MB OpenAI size limit. As shown in the notebook, this enables a very simple UX:

```
# Transcribe the video to text
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
```

Tested on the full set of Karpathy lecture videos:

```
# Karpathy lecture videos
urls = ["https://youtu.be/VMj-3S1tku0",
        "https://youtu.be/PaCmpygFfXo",
        "https://youtu.be/TCH_1BHY58I",
        "https://youtu.be/P6sfmUTpUmc",
        "https://youtu.be/q8SA3rM6ckI",
        "https://youtu.be/t3YJ5hKiMQ0",
        "https://youtu.be/kCc8FmEb1nY"]

# Directory to save audio files
save_dir = "~/Downloads/YouTube"

# Transcribe the videos to text
loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())
docs = loader.load()
```
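Note that `save_dir` uses a `~` prefix; if the loader does not expand it, the standard-library `os.path.expanduser` handles this. A small sketch (generic Python, not specific to this PR's internals):

```python
import os

save_dir = "~/Downloads/YouTube"
# Expand "~" to the current user's home directory before writing files there.
resolved = os.path.expanduser(save_dir)
```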

```
# Split the audio into chunk_duration_ms chunks
for split_number, i in enumerate(range(0, len(audio), chunk_duration_ms)):
    print(f"Transcribing part {split_number}!")
```
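The chunking loop above walks the audio in fixed-duration windows; the boundary arithmetic can be sketched in isolation (a minimal illustration, `chunk_spans` is our hypothetical helper name, durations in milliseconds):

```python
def chunk_spans(total_ms: int, chunk_ms: int):
    """Yield (start, end) millisecond windows covering total_ms in chunk_ms steps."""
    for start in range(0, total_ms, chunk_ms):
        yield start, min(start + chunk_ms, total_ms)

# A 25-minute file split into 10-minute chunks yields three spans,
# the last one shorter than the rest.
spans = list(chunk_spans(25 * 60_000, 10 * 60_000))
```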
Collaborator suggested removing the debug print:

```
print(f"Transcribing part {split_number}!")
```
```
with blob.as_bytes_io() as f:
    transcript = openai.Audio.transcribe("whisper-1", f)
    yield Document(
```
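The 25MB limit mentioned in the description can be guarded with a simple size check before upload. A sketch (the constant reflects OpenAI's documented Whisper upload cap; the helper name is ours):

```python
OPENAI_AUDIO_LIMIT_BYTES = 25 * 1024 * 1024  # Whisper API upload cap (25MB)

def needs_split(file_size_bytes: int) -> bool:
    """Return True if an audio file exceeds the Whisper upload limit."""
    return file_size_bytes > OPENAI_AUDIO_LIMIT_BYTES
```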
Collaborator commented:

Should we yield a single document if the input is a single audio file and we're trying to hide the fact that there's chunking under the hood? We can collect the transcripts and concatenate them. The only problem is that it's unclear which delimiter to join on.

Contributor (author) replied:

It would be easy to do this. E.g., we can build a single blob from the combined docs:

`combined_docs = "\n".join(doc.page_content for doc in docs)`

But, as discussed, it's kind of nice to have the intermediate outputs.

(The latency is somewhat high - 15 min for 2 hr video.)
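The concatenation idea above can be sketched as follows (`Doc` is a simplified stand-in for langchain's `Document`, and the newline delimiter is one arbitrary choice, per the open question):

```python
class Doc:
    """Simplified stand-in for langchain's Document."""
    def __init__(self, page_content: str):
        self.page_content = page_content

# Join the per-chunk transcripts into one string.
docs = [Doc("part one"), Doc("part two")]
combined_docs = "\n".join(doc.page_content for doc in docs)
```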

```
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(url, download=False)
    title = info.get('title', 'video')
    print(f"Writing file: {title} to {self.save_dir}")
```
Collaborator suggested removing the print:

```
print(f"Writing file: {title} to {self.save_dir}")
```
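The `ydl_opts` passed to `yt_dlp.YoutubeDL` above are not shown in this excerpt; a plausible audio-only configuration looks like the following (option names come from yt_dlp's API, but this exact combination is our assumption, not the PR's code):

```python
save_dir = "~/Downloads/YouTube"

# Hypothetical yt_dlp options: download best audio and convert to m4a via ffmpeg.
ydl_opts = {
    "format": "bestaudio/best",   # prefer the best audio-only stream
    "noplaylist": True,           # treat each URL as a single video
    "outtmpl": save_dir + "/%(title)s.%(ext)s",  # output filename template
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "m4a"}],
}
```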

@rlancemartin rlancemartin force-pushed the rlm/simple_audio_load_and_split branch 7 times, most recently from 01a5729 to 74326d6 Compare June 6, 2023 18:39
Collaborator commented:

Replace with `raise ValueError` or `ImportError`.

Collaborator commented:

Needs to be raised as well

@rlancemartin rlancemartin force-pushed the rlm/simple_audio_load_and_split branch from 74326d6 to 4f0e4ca Compare June 6, 2023 21:59
@rlancemartin rlancemartin force-pushed the rlm/simple_audio_load_and_split branch from 4f0e4ca to e1fa1a4 Compare June 6, 2023 22:03
@rlancemartin rlancemartin merged commit 4092fd2 into langchain-ai:master Jun 6, 2023
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
This was referenced Jun 25, 2023
