    Repositories list

    • python-crfsuite

      Public
      A Python binding for CRFsuite
      Python
      Updated Dec 23, 2025
    • Python
      Updated Dec 16, 2025
    • web-poet

      Public
      Web scraping Page Objects core library
      Python
      Updated Dec 15, 2025
    • andi

      Public
      Library for annotation-based dependency injection
      Python
      Updated Dec 9, 2025
    • marathon-apps-collectd-plugin

      Public
      marathon-apps-collectd-plugin
      Python
      Updated Nov 27, 2025
    • shub

      Public
      Scrapinghub Command Line Client
      Python
      Updated Nov 6, 2025
    • scrapinghub-entrypoint-scrapy

      Public
      Scrapy entrypoint for Scrapinghub job runner
      Python
      Updated Nov 3, 2025
    • hcf-backend

      Public
      Crawl Frontier HCF backend
      Python
      Updated Oct 31, 2025
    • dateparser

      Public
      Python parser for human-readable dates
      Python
      Updated Oct 28, 2025
    • docker-images

      Public
      Dockerfile
      Updated Oct 20, 2025
    • Page Object pattern for Scrapy
      Python
      Updated Oct 17, 2025
    • Extract price amount and currency symbol from a raw text string
      Python
      Updated Oct 6, 2025
    • A client interface for Scrapinghub's API
      Python
      Updated Oct 3, 2025
    • extruct

      Public
      Extract embedded metadata from HTML markup
      Python
      Updated Oct 1, 2025
    • Article extraction benchmark: dataset and evaluation scripts
      Python
      Updated Sep 23, 2025
    • scrapyrt

      Public
      HTTP API for Scrapy spiders
      Python
      Updated Sep 22, 2025
    • Formasaurus tells you the type of an HTML form and its fields using machine learning
      HTML
      Updated Sep 4, 2025
    • scikit-learn inspired API for CRFsuite
      Python
      Updated Sep 4, 2025
    • Software stack with latest Scrapy and updated deps
      Dockerfile
      Updated Aug 2, 2025
    • A more flexible and feature-rich Frontera scheduler for Scrapy
      Python
      Updated Jun 6, 2025
    • frontera

      Public
      A scalable frontier for web crawlers
      Python
      Updated Jun 6, 2025
    • spidermon

      Public
      Scrapy extension for monitoring spider execution
      Python
      Updated Apr 11, 2025
    • Python Social Auth - Application - Django
      Python
      Updated Nov 18, 2024
    • Parse numbers written in natural language
      Python
      Updated Oct 23, 2024
    • streamparse lets you run Python code against real-time streams of data. Integrates with Apache Storm.
      Python
      Updated Sep 20, 2024
    • splash

      Public
      Lightweight, scriptable browser as a service with an HTTP API
      Python
      Updated Aug 2, 2024
    • A Postgres-backed ContentsManager implementation for IPython
      Python
      Updated Jul 18, 2024
    • shublang

      Public
      Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
      Python
      Updated Jul 9, 2024
    • An opinionated fork of the Drone CI system
      Go
      Updated Jul 7, 2024
    • varanus

      Public
      A command line spider monitoring tool
      Python
      Updated Jul 6, 2024