iw4p
Contributor

@iw4p iw4p commented Feb 18, 2025

Hi! I had a problem with the YouTube part, especially when pasting a URL into the terminal (macOS: zsh, bash; Linux: bash).
Also, the README doesn't mention YouTube, so I added a short note there until the docs are complete.

What’s Changed:

This PR introduces several important updates to improve the reliability and functionality of the YouTube transcript fetching process and URL handling:

  1. Retry Logic for YouTube Transcript Fetching:

    • I've added a retry mechanism around the YouTube transcript fetching operation. This helps to handle intermittent failures or network issues more gracefully by retrying the operation a few times before failing.
  2. Fixed URL Decoding Issue:

    • There was an issue where YouTube URLs with escape characters (like \? and \=) were not being processed correctly, especially when pasted from the terminal. This fix ensures that URLs are properly decoded using urllib.parse.unquote(), so URLs like https://www.youtube.com/watch\?v\=videoID are handled properly.
  3. Improved Metadata and Description Extraction:

    • I’ve also improved the logic for extracting metadata and descriptions from YouTube pages. This makes the extraction process more reliable, particularly when dealing with different YouTube page layouts.
  4. Error Handling Improvements:

    • Enhanced error handling for the YouTube transcript fetching process, so the system can recover better from failures or missing data.
  5. Refactored _findKey Function:

    • The _findKey function has been refactored to simplify its code and make it more efficient by using json.items() for dictionary iteration instead of a more complex recursive method.
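
For item 5, a minimal sketch of a key search over parsed JSON that iterates each dictionary level with `.items()`. This is an illustration, not the code from this PR: the name `find_key` and its signature are assumptions. Note that the search still recurses into nested values, since the data is nested; `.items()` only simplifies the per-level loop.

```python
from typing import Any, Optional


def find_key(data: Any, key: str) -> Optional[Any]:
    """Return the first non-None value stored under `key` in nested dicts/lists."""
    if isinstance(data, list):
        # Search each list element in order and stop at the first hit.
        for element in data:
            found = find_key(element, key)
            if found is not None:
                return found
    elif isinstance(data, dict):
        for k, v in data.items():
            if k == key:
                return v
            # Not a match at this level, so look inside the value.
            found = find_key(v, key)
            if found is not None:
                return found
    # Scalars, or containers without the key, yield None.
    return None
```

Comparing against `None` explicitly (rather than relying on truthiness) means a match whose value is `0` or an empty string is still returned.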

Why This Change is Needed:

  • Reliability: The retry mechanism will improve the reliability of fetching transcripts, which can fail due to network issues or API rate limiting.
  • Correct URL Processing: With the URL decoding fix, users can now paste URLs directly from the terminal without worrying about escape sequences, ensuring URLs are parsed correctly.
  • Better Metadata Handling: The improvements to metadata and description extraction will ensure that we get more accurate data from YouTube pages.
  • Resiliency: The improved error handling will help the application deal with temporary issues without failing entirely, making the process more robust.
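
The first two points above (cleaning a pasted URL and retrying a flaky fetch) can be sketched roughly as follows. This is a hedged illustration, not the PR's implementation: `normalize_pasted_url`, `with_retries`, and the default `retries`/`delay` values are hypothetical. One detail worth knowing is that `urllib.parse.unquote()` only decodes percent-escapes such as `%3F`; it leaves backslashes untouched, so shell-style escapes like `\?` and `\=` need an explicit replacement step.

```python
import time
from typing import Callable, Optional, TypeVar
from urllib.parse import unquote

T = TypeVar("T")


def normalize_pasted_url(url: str) -> str:
    """Undo percent-encoding and shell-style backslash escapes in a pasted URL."""
    url = unquote(url)  # decodes %3F -> ?, %3D -> =, and so on
    # unquote() does not touch backslashes, so strip the shell escapes explicitly.
    return url.replace("\\?", "?").replace("\\=", "=").replace("\\&", "&")


def with_retries(operation: Callable[[], T], retries: int = 3, delay: float = 2.0) -> T:
    """Call `operation` up to `retries` times, sleeping `delay` seconds between failures."""
    last_error: Optional[Exception] = None
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except Exception as exc:  # assumption: any exception counts as retryable
            last_error = exc
            if attempt < retries:
                time.sleep(delay)
    # All attempts failed; surface the last underlying error as the cause.
    raise RuntimeError(f"failed after {retries} attempts") from last_error
```

A caller would wrap whatever function fetches the transcript, for example `with_retries(lambda: fetch(video_id))`, where `fetch` stands in for the real transcript call. Catching every exception keeps the sketch short; real code would likely retry only on network or rate-limit errors.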

@iw4p
Contributor Author

iw4p commented Feb 18, 2025

@iw4p please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

@afourney
Member

Thanks. This looks good. There appears to be a test error unrelated to this PR, which I will fix; then I'll re-run the tests and merge.

@iw4p
Contributor Author

iw4p commented Feb 28, 2025

Hi! Thank you.

@afourney afourney merged commit a394cc7 into microsoft:main Feb 28, 2025
3 checks passed