- Add a `--version` flag to the main CLI group using click's `version_option`
- Prints `Scrapling, version <version>` and exits, sourcing the version from `scrapling.__version__`
- Add a CLI test asserting the flag's output

Closes #299
PR #283 fixed the bare `scrapling` image name in docs/ai/mcp-server.md; the agent-skill MCP reference and CLI extract docs still used the unqualified name, which Docker resolves against the official library namespace and fails with pull access denied.
`ResponseFactory.__extract_browser_encoding` matched the charset with `charset=([\w-]+)`, which stops at a quote character. RFC 7231 permits the charset value to be a quoted-string (e.g. `content-type: text/html; charset="ISO-8859-1"`), so for any quoted charset the regex failed to match and the function silently fell back to the `utf-8` default. A page served as quoted ISO-8859-1 / windows-1252 / Shift_JIS would then be decoded as UTF-8, producing mojibake. Allow an optional surrounding quote in the pattern (`charset=["']?([\w-]+)`) so the value is captured without the quote. Unquoted headers are unaffected. The existing `content_type_map` fixture in tests/fetchers/test_utils.py was unused; add focused tests covering unquoted, quoted, and missing charsets.
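The difference between the two patterns can be checked in isolation. The sketch below is not the library's code: `extract_encoding` is a hypothetical helper, and only the two regular expressions are taken from the description above.

```python
import re

# Old pattern: the capture group cannot begin at a quote, so a quoted value never matches.
OLD_CHARSET_RE = re.compile(r"charset=([\w-]+)")
# Pattern from the fix: an optional leading quote is consumed before the capture group.
NEW_CHARSET_RE = re.compile(r"""charset=["']?([\w-]+)""")


def extract_encoding(content_type, pattern, default="utf-8"):
    """Hypothetical helper: return the charset named in a Content-Type value.

    Falls back to `default` when the header is missing or no charset matches.
    """
    if content_type:
        match = pattern.search(content_type)
        if match:
            return match.group(1)
    return default
```

With the old pattern, a header such as `text/html; charset="ISO-8859-1"` falls through to the default; with the new one the captured value is the charset name without the quote.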
When `css()`/`xpath()` are called with both `adaptive=True` and `auto_save=True`, the relocation branch guarded the re-save with `if elements is not None`. However `relocate()` returns an empty list (never `None`) when no candidate clears the `percentage` threshold, so the guard always passed and `self.save(elements[0], ...)` raised `IndexError: list index out of range`. This crashes exactly when adaptive resilience is needed most: the page structure changed enough that nothing matches above the threshold. Fix: use a truthiness check (`if elements and auto_save`) so the re-save is skipped when relocation yields nothing. The successful relocation path (which re-saves the relocated element) is unchanged. Added a regression test that fails before the fix (IndexError) and passes after. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
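A reduced illustration of why the guard matters, assuming only what the description states: relocation returns an empty list rather than `None` when nothing clears the threshold. The function names and the `saved` list are hypothetical stand-ins, not the library's internals.

```python
def relocate(candidates, percentage):
    """Stand-in for relocation: keep candidates scoring at or above the threshold.

    Returns an empty list (never None) when no candidate qualifies.
    """
    return [name for name, score in candidates if score >= percentage]


def resave_old(elements, auto_save, saved):
    # Old guard: an empty list is "not None", so elements[0] is still evaluated.
    if elements is not None and auto_save:
        saved.append(elements[0])


def resave_fixed(elements, auto_save, saved):
    # Fixed guard: an empty list is falsy, so the re-save is skipped.
    if elements and auto_save:
        saved.append(elements[0])
```

Calling `resave_old([], True, saved)` raises `IndexError`, while `resave_fixed([], True, saved)` leaves `saved` untouched and still re-saves the first element when relocation succeeds.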
…ndows

`CheckpointManager.save()` and `ResponseCacheManager.put()` write to a temp file and then move it into place with `Path.rename()`. On Windows, `os.rename` cannot overwrite an existing destination and raises `FileExistsError` (WinError 183), so every write after the first one fails: checkpoint saving raises and breaks resume, while the development response cache swallows the error and keeps returning the stale entry. `Path.replace()` (`os.replace`) overwrites the destination atomically on every platform and behaves identically to `rename()` on POSIX, so this is a no-op on Linux and macOS and only fixes the broken overwrite on Windows. Add a regression test for the cache overwrite path; the checkpoint overwrite is already covered by `test_multiple_saves_overwrite`.
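The write-then-move pattern described above can be sketched with the standard library alone. `write_atomically` and the `.tmp` naming are assumptions made for illustration; the actual managers may name and place their temp files differently.

```python
from pathlib import Path


def write_atomically(destination: Path, data: bytes) -> None:
    """Hypothetical helper: write to a sibling temp file, then move it into place."""
    tmp = destination.with_name(destination.name + ".tmp")
    tmp.write_bytes(data)
    # Path.replace() overwrites an existing destination on every platform.
    # Path.rename() would raise FileExistsError here on Windows for any
    # write after the first one.
    tmp.replace(destination)
```

Calling the helper twice on the same path leaves the second payload in place and no leftover temp file, which is the overwrite case the regression test targets.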
…enominator

Candidates with fewer attributes than the original got inflated scores because the denominator counted candidate attributes only, while the extra-attributes penalty direction worked as intended. Using `max()` on both counts fixes the inflation while keeping the penalty. Closes #322
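A worked example of the inflation, under a deliberately simplified assumption: the score is the share of exactly matching attributes. The library's real similarity scoring is more involved, and both function names below are hypothetical.

```python
def attribute_score_old(original: dict, candidate: dict) -> float:
    """Simplified old behaviour: divide by the candidate's attribute count only."""
    if not candidate:
        return 0.0
    matches = sum(1 for key, value in candidate.items() if original.get(key) == value)
    return matches / len(candidate)


def attribute_score_fixed(original: dict, candidate: dict) -> float:
    """Simplified fixed behaviour: divide by the larger of the two attribute counts."""
    denominator = max(len(original), len(candidate))
    if denominator == 0:
        return 0.0
    matches = sum(1 for key, value in candidate.items() if original.get(key) == value)
    return matches / denominator
```

For an original with three attributes and a candidate that carries only one of them, the old form scores 1/1 while the fixed form scores 1/3. A candidate with one extra attribute scores 3/4 in both forms, so the existing penalty is preserved.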
The quickstart examples import from `scrapling.fetchers`, which raises `ModuleNotFoundError` on a bare install since those dependencies live in the fetchers extra. Make the consequence explicit in the installation section across the docs and all README translations. Closes #343
The per-request proxy resolution never fell back to the session default, so FetcherSession(proxy=...) was silently ignored, and requests went direct. Same fix in the sync and async paths, with regression tests asserting on the proxy that reaches curl_cffi. Closes #295
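The fallback itself is small. A minimal sketch, assuming the proxy is passed around as an optional string; `resolve_proxy` is a hypothetical name and not the function used in the sync or async session code.

```python
from typing import Optional


def resolve_proxy(request_proxy: Optional[str], session_proxy: Optional[str]) -> Optional[str]:
    """Hypothetical helper: prefer the per-request proxy, else the session default."""
    # Without this fallback, a request that passes no proxy of its own would
    # resolve to None and go out directly, ignoring the session-level setting.
    return request_proxy if request_proxy is not None else session_proxy
```

A regression test in this style asserts on the value that finally reaches the HTTP client rather than on the session attribute, since the attribute was set correctly even while the bug was present.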
A maintenance update packed with community-reported fixes 🛠️
Note
Follow us on X for daily tips and tricks
🚀 New Stuff and quality of life changes
- … `scrapling install --force` after updating to refresh them.
- … `--version` flag to the CLI by @ETM-Code in #303 (Solves #299)

🐛 Bug Fixes
- … `proxy` argument being silently ignored in HTTP sessions, which could leak your real IP (Solves #295). Note that mixing a session-level `proxy` with a per-request `proxies` argument (or vice versa) now raises an error instead of one being silently dropped.
- … `init_script` with `user_data_dir` (Solves #294).
- … `Content-Type` header by @Bortlesboat in #323.
- … `IndexError` in adaptive element relocation when `auto_save` is enabled by @Mubashirrrr in #340.
- … `find_similar` for elements with mismatched attribute counts (Solves #322).

Docs
🙏 Special thanks to the community for all the continuous testing and feedback
Big shoutout to our Platinum Sponsors