  • Bug fixes

    • Fix bug causing anchor links to have ‘#’ converted to ‘%23’

  • Minor enhancements

    • Switch from the robots gem (which users reported problems with) to the new robotex gem

  • Bug fixes

    • Fix incorrect default file extension for KyotoCabinet

  • Major enhancements

    • Added support for SQLite3 and Kyoto Cabinet storage

  • Minor enhancements

    • Added Page#base to support the HTML base element

    • Use bundler for development dependencies

  • Bug fixes

    • Encode characters in URLs

    • Fix specs to run under rake

    • Fix handling of redirect_to in storage adapters

  • Bug fixes

    • Fix a bug preventing SSL connections from working

  • Major enhancements

    • Added support for HTTP Basic Auth with URLs containing a username and password

    • Added support for anonymous HTTP proxies

  • Minor enhancements

    • Added read_timeout option to set the HTTP request timeout in seconds

  • Bug fixes

    • Don’t raise a fatal error if a page request times out

    • Fix double encoding of links containing %20

  • Major enhancements

    • Added page storage engines for MongoDB and Redis

  • Minor enhancements

    • Use XPath instead of CSS selectors for link parsing (faster) (Marc Seeger)

    • Added skip_query_strings option to skip links with query strings (Joost Baaij)

  • Bug fixes

    • Only treat status codes 300..307 as redirects (Marc Seeger)

    • Canonicalize redirect links (Marc Seeger)

  • Major enhancements

    • Cookies can be accepted and sent with each HTTP request.

  • Bug fixes

    • Fixed issue that allowed following redirects off the original domain

  • Minor enhancements

    • Added an attr_accessor to Page for the HTTP response body

  • Bug fixes

    • Fixed incorrect method calls in CLI scripts

  • Major enhancements

    • Option for persistent storage of pages during crawl with TokyoCabinet or PStore

  • Minor enhancements

    • Options can be set via methods on the Core object in the crawl block

  • Minor enhancements

    • Options are now applied per-crawl, rather than module-wide.

  • Bug fixes

    • Fixed a bug that caused a deadlock if an exception occurred while crawling the last page in the queue.

  • Minor enhancements

    • When the :verbose option is set to true, exception backtraces are printed to aid debugging.

  • Major enhancements

    • Added HTTPS support.

    • Added CLI program ‘anemone’, a frontend for several tasks.

  • Minor enhancements

    • Record the response time of each HTTP request in Page.

    • Use persistent HTTP connections.