@hackage / tdigest

On-line accumulation of rank-based statistics

Latest0.3.1

About

Metadata

  • Last updated , by phadej
  • License BSD-3-Clause
  • Categories Mathematics
  • Maintained by: Oleg Grenrus <oleg.grenrus@iki.fi>

  • Lottery factor: 1

Links

Installation

Tested Compilers

  1. 9.10.1
  2. 9.8.2
  3. 9.6.6
  4. 9.4.8
  5. 9.2.8
  6. 9.0.2
  7. 8.10.7
  8. 8.8.4
  9. 8.6.5

Readme

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl

Synopsis

λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; incorrect approximation of median, mostly to measure prng speed
  • sorting using vector-algorithms: 33s; using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
cp output.svg example.svg
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg

Example