@jackfrost1411 jackfrost1411 commented Jun 13, 2023

This PR adds a "verify" option to the initialize method of the web base loader (web_base.py). It addresses the SSL verification errors encountered on certain web pages: users who hit such an error can now set the verify parameter to False.

Fixes #6079

Who can review?

@eyurtsev @hwchase17
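
To illustrate what a "verify" switch on a loader amounts to, here is a minimal sketch using only the Python standard library. SketchLoader and its attributes are hypothetical names made up for this example, not the code in web_base.py; the constructor arguments simply mirror the ones used in this PR's examples (web_path, header_template, verify).

```python
import ssl
import urllib.request
from typing import Optional


class SketchLoader:
    """Hypothetical stand-in for a web loader; not LangChain's WebBaseLoader."""

    def __init__(
        self,
        web_path: str,
        header_template: Optional[dict] = None,
        verify: bool = True,
    ):
        self.web_path = web_path
        self.headers = dict(header_template or {})
        self.verify = verify
        # The default context checks both the certificate chain and the hostname.
        self.context = ssl.create_default_context()
        if not verify:
            # check_hostname has to be switched off before verify_mode
            # can be set to CERT_NONE, otherwise ssl raises ValueError.
            self.context.check_hostname = False
            self.context.verify_mode = ssl.CERT_NONE

    def fetch(self) -> str:
        """Download the page as text (this performs a real network request)."""
        request = urllib.request.Request(self.web_path, headers=self.headers)
        with urllib.request.urlopen(request, context=self.context) as response:
            return response.read().decode("utf-8", errors="replace")
```

With verify left at its default of True, a page with an invalid certificate would still fail in fetch(); passing verify=False trades that check away, so it is best kept opt-in, as this PR does.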

Change the web base loader's initialize method to accept a "verify" option. Previously, loading some web pages failed with an SSL verification error. This lets users set verify to False if they want.
@jackfrost1411 jackfrost1411 changed the title Jackfrost1411 patch 2 update web_base.py to have verify option Jun 13, 2023

jackfrost1411 commented Jun 13, 2023

With the verify option added, you can pass verify=False to skip SSL verification while still sending custom headers, such as a browser User-Agent:

from langchain.document_loaders import WebBaseLoader

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
# verify=False turns off SSL certificate verification for this loader.
loader = WebBaseLoader(web_path="https://SO_AND_SO.com", header_template=headers, verify=False)
data = loader.load()

This solves a lot of issues that I faced in the recent past.


jackfrost1411 commented Jun 13, 2023

The older version of web_base.py gives errors:
[screenshot]

The newer version of web_base.py is working just fine:
[screenshot]

@hwchase17 hwchase17 left a comment

thanks - seems great

@hwchase17 hwchase17 added the lgtm label Jun 17, 2023
vercel bot commented Jun 17, 2023

@hwchase17 is attempting to deploy a commit to the LangChain Team on Vercel.

A member of the Team first needs to authorize it.

@hwchase17 hwchase17 merged commit 2eec687 into langchain-ai:master Jun 17, 2023
@zomchak-code zomchak-code mentioned this pull request Jun 21, 2023
This was referenced Jun 25, 2023
rlancemartin pushed a commit that referenced this pull request Jul 5, 2023
Fix for bug in SitemapLoader

`aiohttp`'s `get` does not accept a `verify` argument and currently throws an
error, so SitemapLoader does not work.

This PR fixes it by removing the `verify` param from the `get` function call.

Fixes #6107

#### Who can review?

Tag maintainers/contributors who might be interested:

@eyurtsev

---------

Co-authored-by: techcenary <[email protected]>
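
The follow-up commit above fixes the error by dropping `verify` from the `aiohttp` call. As a sketch of an alternative that keeps the flag meaningful on the async path, a requests-style `verify` value can be translated into the `ssl` argument that `aiohttp` request methods accept. The helper name `aiohttp_get_kwargs` is hypothetical and is not part of LangChain or `aiohttp`; this is not what the commit itself does.

```python
from typing import Any, Dict


def aiohttp_get_kwargs(verify: bool) -> Dict[str, Any]:
    """Translate a requests-style `verify` flag into kwargs for aiohttp.

    Hypothetical helper for illustration only.
    """
    # aiohttp verifies certificates by default, so nothing is added
    # when verify is True. Passing ssl=False asks aiohttp to skip
    # certificate validation for that request.
    return {} if verify else {"ssl": False}
```

Assuming an open `aiohttp.ClientSession` named `session`, the result would be unpacked into the call, for example `session.get(url, **aiohttp_get_kwargs(verify))`.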
Development

Successfully merging this pull request may close these issues.

Issue: Can't load a public webpage

2 participants