@hackage / http-conduit-downloader

HTTP downloader tailored for web-crawler needs.

Latest: 1.1.5

About

Metadata

  • Last updated , by VladimirShabanov
  • License BSD-3-Clause
  • Categories Web Development
  • Maintained by: Vladimir Shabanov <dev@vshabanov.com>

  • Lottery factor: 0

Readme

HTTP/HTTPS downloader built on top of http-client and used in the https://bazqux.com crawler.

Previously it was based on http-conduit (hence the name), but since all the necessary parts are now in http-client, http-conduit is no longer used.

  • Handles all possible http-client exceptions and returns human-readable error messages.

  • Handles some web server bugs (returning deflate data instead of gzip, invalid gzip encoding).

  • Uses OpenSSL instead of the tls package (tls doesn't handle all sites and is slower than OpenSSL).

  • Ignores invalid SSL certificates.

  • Receives data in 32k chunks internally to reduce memory fragmentation on many parallel downloads.

  • Download timeout.

  • Total download size limit.

  • Returns HTTP headers for subsequent redownloads and handles 'Not modified' results.

  • Can be used with an external DNS resolver (e.g. concurrent-dns-cache).
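
The download and redownload workflow described in the list above can be sketched as follows. This is a minimal illustration, not an authoritative reference: it assumes the Network.HTTP.Conduit.Downloader module exports withDownloader, download and the DownloadResult constructors DROK, DRRedirect, DRError and DRNotModified, and that withOpenSSL comes from the HsOpenSSL package. Check the names against the version you have installed. The URL is a placeholder and the describe helper is hypothetical.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import Network.HTTP.Conduit.Downloader
    ( DownloadResult (..), download, withDownloader )
import OpenSSL (withOpenSSL)

-- | Hypothetical helper: turn a download result into a one-line summary.
describe :: DownloadResult -> String
describe (DROK body _)    = "ok, " ++ show (B.length body) ++ " bytes"
describe (DRRedirect url) = "redirect to " ++ url
describe (DRError msg)    = "error: " ++ msg
describe DRNotModified    = "not modified"

main :: IO ()
main = withOpenSSL $ withDownloader $ \d -> do
    -- Placeholder URL; replace with the page you want to fetch.
    let url = "https://example.com/"
    -- First request: no pre-resolved host address, no redownload options.
    first <- download d url Nothing []
    putStrLn (describe first)
    case first of
        -- On success the result carries options for the next request
        -- (assumed to be derived from the response headers).
        DROK _ redownloadOpts -> do
            second <- download d url Nothing redownloadOpts
            -- A server that honours conditional requests may answer
            -- with "not modified" here.
            putStrLn (describe second)
        _ -> return ()
```

In a crawler the redownload options would typically be stored alongside the URL and passed back on the next visit, and the Nothing argument would be replaced by an address obtained from an external DNS resolver.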