@hackage / fast-tagsoup

Fast parsing and extracting information from (possibly malformed) HTML/XML documents

Latest: 1.0.14

About

Metadata

  • Last updated by VladimirShabanov
  • License BSD-3-Clause
  • Categories XML
  • Maintained by: Vladimir Shabanov <vshabanoff@gmail.com>

  • Lottery factor: 0

Readme

Fast TagSoup parser. Speeds of 20-200 MB/sec have been observed.

Works only with strict bytestrings.

This library is intended to be used in conjunction with the original tagsoup package:

import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast
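
To illustrate the import pattern above, here is a minimal sketch that pulls link targets out of a document. It assumes that `parseTags` from `Text.HTML.TagSoup.Fast` takes a strict `ByteString` and returns `[Tag ByteString]`; the helper `extractLinks` and the sample input are hypothetical and not part of either package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
-- Tag type and combinators come from tagsoup; the parser comes from fast-tagsoup.
import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast (parseTags)

-- | Hypothetical helper: collect the href attribute of every <a> opening tag.
-- Assumes parseTags :: B.ByteString -> [Tag B.ByteString].
extractLinks :: B.ByteString -> [B.ByteString]
extractLinks html =
  [ href
  | TagOpen "a" attrs <- parseTags html      -- keep only <a ...> opening tags
  , Just href <- [lookup "href" attrs]       -- skip anchors without an href
  ]

main :: IO ()
main = do
  -- Made-up sample input, deliberately left unclosed to mimic tag soup.
  let html = "<p>See <a href=\"https://example.com/\">this</a> and <a href=\"/about\">that</a>"
  mapM_ B.putStrLn (extractLinks html)
```

Hiding `parseTags` and `renderTags` from `Text.HTML.TagSoup` avoids a name clash, so the rest of the tagsoup API (the `Tag` constructors, `sections`, `innerText`, and so on) can be used on the output unchanged.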

Besides speed, fast-tagsoup correctly handles HTML <script> and <style> tags, converts tag names to lower case, and can decode non-UTF-8 XML for you.
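
A quick way to check the script handling and lower-casing on your own input is to print the parsed tags. The sketch below again assumes `parseTags :: ByteString -> [Tag ByteString]`; the sample string is made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
import Text.HTML.TagSoup.Fast (parseTags)

-- Made-up input: upper-case tag names and a script body that contains
-- characters which look like markup ('<' and a closing tag in a string).
sample :: B.ByteString
sample = "<DIV><SCRIPT>if (a < b) { run(\"</p>\"); }</SCRIPT></DIV>"

main :: IO ()
main =
  -- Print one parsed tag per line so the result can be inspected by eye.
  mapM_ print (parseTags sample)
```

Going by the description above, the tag names should print in lower case and the script body should come back as text rather than as further tags; run it against your installed version to confirm.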

This parser is used in production in the BazQux Reader feed and comment crawler.