Commit 9352cce

Merge branch 'dev' into fix/quoted-charset-encoding
2 parents 6390c0a + b3408c6 commit 9352cce

35 files changed

Lines changed: 150 additions & 156 deletions

.github/workflows/tests.yml

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ jobs:
 - name: Install all browsers dependencies
 run: |
 python3 -m pip install --upgrade pip
-python3 -m pip install playwright==1.59.0 patchright==1.59.1
+python3 -m pip install playwright==1.60.0 patchright==1.60.1

 - name: Get Playwright version
 id: playwright-version

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ There are many ways to contribute to Scrapling. Here are some of them:
 - Report bugs and request features using the [GitHub issues](https://github.com/D4Vinci/Scrapling/issues). Please follow the issue template to help us resolve your issue quickly.
 - Blog about Scrapling. Tell the world how you’re using Scrapling. This will help newcomers with more examples and increase the Scrapling project's visibility.
 - Join the [Discord community](https://discord.gg/EMgGbDceNQ) and share your ideas on how to improve Scrapling. We’re always open to suggestions.
-- If you are not a developer, perhaps you would like to help with translating the [documentation](https://github.com/D4Vinci/Scrapling/tree/docs)?
+- If you are not a developer, perhaps you would like to help with translating the [documentation](https://github.com/D4Vinci/Scrapling/tree/dev/docs)?

 ## Making a Pull Request
 To ensure that your PR gets accepted, please make sure that your PR is based on the latest changes from the dev branch and that it satisfies the following requirements:

README.md

Lines changed: 3 additions & 12 deletions
@@ -141,16 +141,6 @@ MySpider().start()
 <a href="https://tikhub.io/?utm_source=github.com/D4Vinci/Scrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad" target="_blank">TikHub.io</a> provides 900+ stable APIs across 16+ platforms including TikTok, X, YouTube & Instagram, with 40M+ datasets. <br /> Also offers <a href="https://ai.tikhub.io/?ref=KarimShoair" target="_blank">DISCOUNTED AI models</a> - Claude, GPT, GEMINI & more up to 71% off.
 </td>
 </tr>
-<tr>
-<td width="200">
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank" title="Scalable Web Data Access for AI Applications">
-<img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/nsocks.png">
-</a>
-</td>
-<td>
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank">Nsocks</a> provides fast Residential and ISP proxies for developers and scrapers. Global IP coverage, high anonymity, smart rotation, and reliable performance for automation and data extraction. Use <a href="https://www.xcrawl.com/?keyword=2p67aivg" target="_blank">Xcrawl</a> to simplify large-scale web crawling.
-</td>
-</tr>
 <tr>
 <td width="200">
 <a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting.">
@@ -189,7 +179,7 @@ MySpider().start()
 </a>
 </td>
 <td>
-<a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> provides residential proxies from just $0.015/IP or $0.68/GB. 20M+ IPs across 90+ countries. Sticky or rotating sessions, managed from desktop or mobile app.
+<a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> provides residential proxies from just $0.018/IP or $0.68/GB. 20M+ IPs across 90+ countries. Sticky or rotating sessions, managed from desktop or mobile app.
 </td>
 </tr>
 <tr>
@@ -481,7 +471,8 @@ Scrapling requires Python 3.10 or higher:
 pip install scrapling
 ```

-This installation only includes the parser engine and its dependencies, without any fetchers or commandline dependencies.
+> [!IMPORTANT]
+> This installation only includes the parser engine and its dependencies, without any fetchers or commandline dependencies. So importing anything from `scrapling.fetchers` or `scrapling.spiders`, like in the examples above, will raise `ModuleNotFoundError` with this installation alone. If you are going to use any of the fetchers or spiders, install the fetchers' dependencies first as shown below.

 ### Optional Dependencies
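
Reviewer note on the README.md change: the new `[!IMPORTANT]` callout states that, with the parser-only install, importing from `scrapling.fetchers` or `scrapling.spiders` raises `ModuleNotFoundError`. A minimal sketch of how calling code could guard against that, using only the standard library. The helper name `optional_import` is hypothetical and is not part of Scrapling.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it (or one of its
    dependencies) is not installed.

    Hypothetical helper for illustration only; not a Scrapling API.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None


# Module name taken from the README note in this diff.
fetchers = optional_import("scrapling.fetchers")
if fetchers is None:
    # The extras spec below is the one used elsewhere in this commit.
    print('Fetchers unavailable; try: pip install "scrapling[all]"')
else:
    print("Fetchers are importable.")
```

Attempting the import, rather than only checking that the module file exists, matters here: per the callout, the failure comes from dependencies that are missing at import time.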

agent-skill/Scrapling-Skill.zip

2 Bytes
Binary file not shown.

agent-skill/Scrapling-Skill/SKILL.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 name: scrapling-official
 description: Scrape web pages using Scrapling with anti-bot bypass (like Cloudflare Turnstile), stealth headless browsing, spiders framework, adaptive scraping, and JavaScript rendering. Use when asked to scrape, crawl, or extract data from websites; web_fetch fails; the site has anti-bot protections; write Python code to scrape/crawl; or write spiders.
-version: "0.4.8"
+version: "0.4.9"
 license: Complete terms in LICENSE.txt
 metadata:
 homepage: "https://scrapling.readthedocs.io/en/latest/index.html"
@@ -40,7 +40,7 @@ Blazing fast crawls with real-time stats and streaming. Built by Web Scrapers fo

 Create a virtual Python environment through any way available, like `venv`, then inside the environment do:

-`pip install "scrapling[all]>=0.4.8"`
+`pip install "scrapling[all]>=0.4.9"`

 Then do this to download all the browsers' dependencies:

agent-skill/Scrapling-Skill/examples/README.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ All examples collect **all 100 quotes across 10 pages**.
 Make sure Scrapling is installed:

 ```bash
-pip install "scrapling[all]>=0.4.8"
+pip install "scrapling[all]>=0.4.9"
 scrapling install --force
 ```

agent-skill/Scrapling-Skill/references/mcp-server.md

Lines changed: 1 addition & 1 deletion
@@ -208,7 +208,7 @@ Docker alternative:

 ```bash
 docker pull pyd4vinci/scrapling
-docker run -i --rm scrapling mcp
+docker run -i --rm pyd4vinci/scrapling mcp
 ```

 The MCP server name when registering with a client is `ScraplingServer`. The command is the path to the `scrapling` binary and the argument is `mcp`.

docs/README_AR.md

Lines changed: 3 additions & 12 deletions
@@ -137,16 +137,6 @@ MySpider().start()
 <a href="https://tikhub.io/?utm_source=github.com/D4Vinci/Scrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad" target="_blank">TikHub.io</a> يوفر أكثر من 900 واجهة API مستقرة عبر أكثر من 16 منصة تشمل TikTok و X و YouTube و Instagram، مع أكثر من 40 مليون مجموعة بيانات. <br /> يقدم أيضاً <a href="https://ai.tikhub.io/?ref=KarimShoair" target="_blank">نماذج ذكاء اصطناعي بأسعار مخفضة</a> - Claude و GPT و GEMINI والمزيد بخصم يصل إلى 71%.
 </td>
 </tr>
-<tr>
-<td width="200">
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank" title="Scalable Web Data Access for AI Applications">
-<img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/nsocks.png">
-</a>
-</td>
-<td>
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank">Nsocks</a> يوفر بروكسيات سكنية و ISP سريعة للمطورين والسكرابرز. تغطية IP عالمية، إخفاء هوية عالي، تدوير ذكي، وأداء موثوق للأتمتة واستخراج البيانات. استخدم <a href="https://www.xcrawl.com/?keyword=2p67aivg" target="_blank">Xcrawl</a> لتبسيط زحف الويب على نطاق واسع.
-</td>
-</tr>
 <tr>
 <td width="200">
 <a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting.">
@@ -185,7 +175,7 @@ MySpider().start()
 </a>
 </td>
 <td>
-يوفر <a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> بروكسيات سكنية بدءًا من 0.015 دولار فقط لكل IP أو 0.68 دولار لكل جيجابايت. أكثر من 20 مليون عنوان IP في أكثر من 90 دولة. جلسات ثابتة أو متناوبة، تتم إدارتها من تطبيق سطح المكتب أو الجوال.
+يوفر <a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> بروكسيات سكنية بدءًا من 0.018 دولار فقط لكل IP أو 0.68 دولار لكل جيجابايت. أكثر من 20 مليون عنوان IP في أكثر من 90 دولة. جلسات ثابتة أو متناوبة، تتم إدارتها من تطبيق سطح المكتب أو الجوال.
 </td>
 </tr>
 <tr>
@@ -477,7 +467,8 @@ Scrapling ليس قوياً فحسب - بل هو أيضاً سريع بشكل م
 pip install scrapling
 ```

-يتضمن هذا التثبيت فقط محرك المحلل وتبعياته، بدون أي جوالب أو تبعيات سطر الأوامر.
+> [!IMPORTANT]
+> يتضمن هذا التثبيت فقط محرك المحلل وتبعياته، بدون أي جوالب أو تبعيات سطر الأوامر. لذلك، فإن استيراد أي شيء من `scrapling.fetchers` أو `scrapling.spiders`، كما في الأمثلة أعلاه، سيؤدي إلى خطأ `ModuleNotFoundError` مع هذا التثبيت وحده. إذا كنت ستستخدم أيًا من الجوالب أو العناكب، فقم أولًا بتثبيت تبعيات الجوالب كما هو موضح أدناه.

 ### التبعيات الاختيارية

docs/README_CN.md

Lines changed: 3 additions & 12 deletions
@@ -137,16 +137,6 @@ MySpider().start()
 <a href="https://tikhub.io/?utm_source=github.com/D4Vinci/Scrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad" target="_blank">TikHub.io</a> 提供覆盖 16+ 平台(包括 TikTok、X、YouTube 和 Instagram)的 900+ 稳定 API,拥有 4000 万+ 数据集。<br /> 还提供<a href="https://ai.tikhub.io/?ref=KarimShoair" target="_blank">优惠 AI 模型</a> - Claude、GPT、GEMINI 等,最高优惠 71%。
 </td>
 </tr>
-<tr>
-<td width="200">
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank" title="Scalable Web Data Access for AI Applications">
-<img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/nsocks.png">
-</a>
-</td>
-<td>
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank">Nsocks</a> 提供面向开发者和爬虫的快速住宅和 ISP 代理。全球 IP 覆盖、高匿名性、智能轮换,以及可靠的自动化和数据提取性能。使用 <a href="https://www.xcrawl.com/?keyword=2p67aivg" target="_blank">Xcrawl</a> 简化大规模网页爬取。
-</td>
-</tr>
 <tr>
 <td width="200">
 <a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting.">
@@ -185,7 +175,7 @@ MySpider().start()
 </a>
 </td>
 <td>
-<a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> 提供住宅代理,价格低至每个 IP 仅 $0.015 或每 GB $0.68。覆盖 90+ 国家/地区的 2000 万+ IP。支持固定或轮换会话,可通过桌面或移动应用进行管理。
+<a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> 提供住宅代理,价格低至每个 IP 仅 $0.018 或每 GB $0.68。覆盖 90+ 国家/地区的 2000 万+ IP。支持固定或轮换会话,可通过桌面或移动应用进行管理。
 </td>
 </tr>
 <tr>
@@ -477,7 +467,8 @@ Scrapling 需要 Python 3.10 或更高版本:
 pip install scrapling
 ```

-此安装仅包括解析器引擎及其依赖项,没有任何 Fetcher 或命令行依赖项。
+> [!IMPORTANT]
+> 此安装仅包括解析器引擎及其依赖项,没有任何 Fetcher 或命令行依赖项。 因此,仅使用此安装时,像上面的示例那样从 `scrapling.fetchers``scrapling.spiders` 导入任何内容都会引发 `ModuleNotFoundError`。如果要使用任何 Fetcher 或 Spider,请先按照下面的说明安装 Fetcher 的依赖项。

 ### 可选依赖项

docs/README_DE.md

Lines changed: 3 additions & 12 deletions
@@ -137,16 +137,6 @@ MySpider().start()
 <a href="https://tikhub.io/?utm_source=github.com/D4Vinci/Scrapling&utm_medium=marketing_social&utm_campaign=retargeting&utm_content=carousel_ad" target="_blank">TikHub.io</a> bietet über 900 stabile APIs auf mehr als 16 Plattformen, darunter TikTok, X, YouTube und Instagram, mit über 40 Mio. Datensätzen. <br /> Bietet außerdem <a href="https://ai.tikhub.io/?ref=KarimShoair" target="_blank">vergünstigte KI-Modelle</a> - Claude, GPT, GEMINI und mehr mit bis zu 71% Rabatt.
 </td>
 </tr>
-<tr>
-<td width="200">
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank" title="Scalable Web Data Access for AI Applications">
-<img src="https://raw.githubusercontent.com/D4Vinci/Scrapling/main/images/nsocks.png">
-</a>
-</td>
-<td>
-<a href="https://www.nsocks.com/?keyword=2p67aivg" target="_blank">Nsocks</a> bietet schnelle Residential- und ISP-Proxies für Entwickler und Scraper. Globale IP-Abdeckung, hohe Anonymität, intelligente Rotation und zuverlässige Leistung für Automatisierung und Datenextraktion. Verwenden Sie <a href="https://www.xcrawl.com/?keyword=2p67aivg" target="_blank">Xcrawl</a>, um großflächiges Web-Crawling zu vereinfachen.
-</td>
-</tr>
 <tr>
 <td width="200">
 <a href="https://petrosky.io/d4vinci" target="_blank" title="PetroSky delivers cutting-edge VPS hosting.">
@@ -185,7 +175,7 @@ MySpider().start()
 </a>
 </td>
 <td>
-<a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> bietet Residential-Proxys ab nur 0,015 $/IP oder 0,68 $/GB. Über 20 Mio. IPs in mehr als 90 Ländern. Sticky oder rotierende Sessions, verwaltet über die Desktop- oder Mobile-App.
+<a href="https://9proxy.com/pricing?tab=traffic&utm_source=Github&utm_campaign=D4vinci" target="_blank">9Proxy</a> bietet Residential-Proxys ab nur 0,018 $/IP oder 0,68 $/GB. Über 20 Mio. IPs in mehr als 90 Ländern. Sticky oder rotierende Sessions, verwaltet über die Desktop- oder Mobile-App.
 </td>
 </tr>
 <tr>
@@ -477,7 +467,8 @@ Scrapling erfordert Python 3.10 oder höher:
 pip install scrapling
 ```

-Diese Installation enthält nur die Parser-Engine und ihre Abhängigkeiten, ohne Fetcher oder Kommandozeilenabhängigkeiten.
+> [!IMPORTANT]
+> Diese Installation enthält nur die Parser-Engine und ihre Abhängigkeiten, ohne Fetcher oder Kommandozeilenabhängigkeiten. Daher führt der Import von allem aus `scrapling.fetchers` oder `scrapling.spiders`, wie in den Beispielen oben, mit dieser Installation allein zu einem `ModuleNotFoundError`. Wenn Sie einen der Fetcher oder Spider verwenden möchten, installieren Sie zuerst die Fetcher-Abhängigkeiten wie unten gezeigt.

 ### Optionale Abhängigkeiten
