full-text-search (Hackage package)

In-memory full text search engine

Latest version: 0.2.2.3

About

Metadata

  • Last updated , by BenGamari
  • License BSD-3-Clause
  • Categories Natural Language Processing, Text Processing
  • Maintained by: Duncan Coutts <duncan@well-typed.com>, Adam Gundry <adam@well-typed.com>

  • Lottery factor: 3

Tested Compilers (GHC)

  1. 9.12.1
  2. 9.10.1
  3. 9.8.2
  4. 9.6.6
  5. 9.4.8
  6. 9.2.8

Package Flags

Use the -f option with cabal commands to enable flags, e.g. cabal build -fbuild-search-demo.

    build-search-demo (off by default)

    Build a little program illustrating the use of the library

Readme

An in-memory full text search engine library. It lets you run full-text queries on a collection of your documents.

Features:

  • Keyword queries and auto-complete/auto-suggest queries.

  • Can search over any type of "document". (You explain how to extract search terms from them.)

  • Supports documents with multiple fields (e.g. title, body).

  • Supports documents with non-term features (e.g. quality score, page rank).

  • Uses the state-of-the-art BM25F ranking function.

  • Adjustable ranking parameters (including field weights and non-term feature scores).

  • In-memory but quite compact. It does not keep a copy of your original documents.

  • Quick incremental index updates, making it possible to keep your text search in sync with your data.
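To make the BM25F bullet more concrete, here is a minimal sketch of the common BM25F formulation. It is not this library's implementation or API. Assumptions: per-field term counts are length-normalised with a per-field b, weighted, and summed into one pseudo-frequency. That pseudo-frequency is then saturated with k1 and multiplied by an IDF, using the non-negative log(1 + ...) variant. One non-term feature (a quality score) is added through the saturating function q / (k + q). All names (Field, Doc, Params, Stats, mkStats, idf, pseudoTf, bm25f) are hypothetical, and the code uses only base and containers.

```haskell
module Main where

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical field type: each document has a title and a body.
data Field = Title | Body
  deriving (Eq, Ord, Show)

-- A document reduced to its extracted terms per field, plus one
-- hypothetical non-term feature (e.g. a quality score in [0, 1]).
data Doc = Doc
  { docFields  :: Map Field [String]
  , docQuality :: Double
  }

-- Hypothetical ranking parameters (names are illustrative only).
data Params = Params
  { k1            :: Double           -- term-frequency saturation
  , fieldWeight   :: Field -> Double  -- per-field weight w_f
  , fieldB        :: Field -> Double  -- per-field length normalisation b_f
  , featureWeight :: Double           -- weight of the non-term feature
  , featureK      :: Double           -- saturation constant for the feature
  }

-- Corpus statistics that BM25F needs.
data Stats = Stats
  { numDocs     :: Int
  , docFreq     :: Map String Int    -- documents containing the term (any field)
  , avgFieldLen :: Map Field Double  -- average field length over all documents
  }

mkStats :: [Doc] -> Stats
mkStats docs = Stats n df avgLen
  where
    n = length docs
    df = Map.fromListWith (+)
      [ (t, 1) | d <- docs
               , t <- Set.toList (Set.fromList (concat (Map.elems (docFields d)))) ]
    avgLen = Map.map (/ fromIntegral n) $ Map.fromListWith (+)
      [ (f, fromIntegral (length ts)) | d <- docs, (f, ts) <- Map.toList (docFields d) ]

-- IDF in the non-negative log (1 + ...) form.
idf :: Stats -> String -> Double
idf st t = log (1 + (nDocs - nt + 0.5) / (nt + 0.5))
  where
    nDocs = fromIntegral (numDocs st)
    nt    = fromIntegral (Map.findWithDefault 0 t (docFreq st))

-- Weighted, length-normalised term frequency summed over fields.
pseudoTf :: Params -> Stats -> Doc -> String -> Double
pseudoTf p st d t = sum
  [ fieldWeight p f * tf / norm
  | (f, ts) <- Map.toList (docFields d)
  , let tf = fromIntegral (length (filter (== t) ts))
  , tf > 0  -- skip fields without the term (also avoids 0/0 for empty fields)
  , let len  = fromIntegral (length ts)
        avg  = Map.findWithDefault len f (avgFieldLen st)
        b    = fieldB p f
        norm = (1 - b) + b * len / avg
  ]

-- BM25F: saturate the pseudo-frequency with k1, weight it by IDF, then add
-- the non-term feature through the saturating function q / (k + q).
bm25f :: Params -> Stats -> [String] -> Doc -> Double
bm25f p st query d = termScore + featureScore
  where
    termScore = sum
      [ idf st t * w / (k1 p + w) | t <- query, let w = pseudoTf p st d t, w > 0 ]
    q = docQuality d
    featureScore = featureWeight p * q / (featureK p + q)

exampleDocs :: [Doc]
exampleDocs =
  [ Doc (Map.fromList [ (Title, ["text", "search"])
                      , (Body, ["in", "memory", "text", "search", "engine"]) ]) 0.9
  , Doc (Map.fromList [ (Title, ["parser"])
                      , (Body, ["text", "parser", "library"]) ]) 0.5
  ]

exampleStats :: Stats
exampleStats = mkStats exampleDocs

-- Title matches count twice as much as body matches in this example.
exampleParams :: Params
exampleParams = Params
  { k1 = 1.2
  , fieldWeight = \f -> if f == Title then 2 else 1
  , fieldB = const 0.75
  , featureWeight = 0.5
  , featureK = 1
  }

main :: IO ()
main = mapM_ (print . bm25f exampleParams exampleStats ["search"]) exampleDocs
```

The values k1 = 1.2 and b = 0.75 are common BM25 starting points and are not taken from this library. Field weights and the feature function correspond to the kinds of "adjustable ranking parameters" the feature list mentions.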

It is independent of the document type, so you write the document-specific parts yourself: extracting search terms and applying any stop-word filtering, case normalisation or stemming. This is quite easy using libraries such as tokenize and snowball.
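The document-specific part is essentially a function from text to a list of normalised terms. The following is a hypothetical sketch that uses only base and containers instead of tokenize and snowball. It splits on non-alphanumeric characters, lower-cases, drops a tiny stop-word list, and applies a deliberately crude plural-stripping rule in place of a real stemmer. The names stopWords, tokens, crudeStem and extractTerms are illustrative. The same function would normally be applied to query text as well, so that queries and documents are normalised the same way.

```haskell
module Main where

import Data.Char (isAlphaNum, toLower)
import Data.List (isSuffixOf)
import qualified Data.Set as Set

-- Hypothetical, deliberately tiny stop-word list.
stopWords :: Set.Set String
stopWords = Set.fromList ["a", "an", "and", "in", "of", "the", "to"]

-- Split on non-alphanumeric characters (a crude stand-in for a tokenizer).
tokens :: String -> [String]
tokens s = case dropWhile (not . isAlphaNum) s of
  "" -> []
  s' -> let (w, rest) = span isAlphaNum s' in w : tokens rest

-- Very crude stemming: strip a plural "s" from words longer than three
-- letters, but not a double "ss" ("class" stays "class"). A real system
-- would use a proper stemmer (e.g. the snowball package) instead.
crudeStem :: String -> String
crudeStem w
  | length w > 3, "s" `isSuffixOf` w, not ("ss" `isSuffixOf` w) = init w
  | otherwise = w

-- Case-normalise, drop stop words, then stem.
extractTerms :: String -> [String]
extractTerms = map crudeStem . filter (`Set.notMember` stopWords) . map (map toLower) . tokens

main :: IO ()
main = print (extractTerms "The Haskell packages, in-memory search engines!")
```

Running main prints ["haskell","package","memory","search","engine"]. Note that "in-memory" is split into two terms and both "The" and "in" are dropped as stop words.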

The source package includes a demo to illustrate how to use the library. The demo is a simplified version of how the library is used in the hackage-server where it provides the backend for the package search feature.
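A package-search backend like the one described must keep its index in step with the data as packages are added or updated. It also has to answer auto-complete queries. The following is a hypothetical sketch in plain containers, not this library's API or internal representation. It shows an inverted index that stores only keys and terms, never the original documents, and supports incremental insert, replace and delete, keyword AND queries, and prefix completion. The names Index, indexInsert, indexDelete, queryAll and completions are illustrative, and the package names are just sample keys.

```haskell
module Main where

import Data.List (isPrefixOf)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical inverted index. It stores keys and terms only, never the
-- original documents.
data Index key = Index
  { postings :: Map String (Set key)  -- term -> keys of documents containing it
  , docTerms :: Map key (Set String)  -- key -> its terms, so updates are cheap
  }

emptyIndex :: Index key
emptyIndex = Index Map.empty Map.empty

-- Remove a document's postings; a no-op if the key is not indexed.
indexDelete :: Ord key => key -> Index key -> Index key
indexDelete k idx = case Map.lookup k (docTerms idx) of
  Nothing -> idx
  Just ts -> Index
    { postings = foldr (Map.update dropKey) (postings idx) (Set.toList ts)
    , docTerms = Map.delete k (docTerms idx)
    }
  where
    -- Drop the key from one posting set, removing the term if it becomes empty.
    dropKey ks = let ks' = Set.delete k ks in if Set.null ks' then Nothing else Just ks'

-- Insert or replace a document: drop its old postings, then add the new ones.
indexInsert :: Ord key => key -> [String] -> Index key -> Index key
indexInsert k terms idx0 = Index
  { postings = foldr (\t -> Map.insertWith Set.union t (Set.singleton k)) (postings idx) (Set.toList ts)
  , docTerms = Map.insert k ts (docTerms idx)
  }
  where
    idx = indexDelete k idx0
    ts  = Set.fromList terms

-- Keyword query: keys of documents that contain every query term.
queryAll :: Ord key => Index key -> [String] -> Set key
queryAll _ [] = Set.empty
queryAll idx ts = foldr1 Set.intersection [ Map.findWithDefault Set.empty t (postings idx) | t <- ts ]

-- Auto-complete: indexed terms starting with a prefix. Such terms form a
-- contiguous run in the sorted term map, beginning at the first key >= prefix.
completions :: String -> Index key -> [String]
completions pre idx =
  takeWhile (pre `isPrefixOf`) (Map.keys (Map.dropWhileAntitone (< pre) (postings idx)))

exampleIndex :: Index String
exampleIndex =
  indexInsert "text" ["text", "unicode", "string"]
    . indexInsert "parsec" ["parser", "combinator"]
    . indexInsert "megaparsec" ["parser", "combinator", "monadic"]
    $ emptyIndex

main :: IO ()
main = do
  print (Set.toList (queryAll exampleIndex ["parser", "combinator"]))
  print (completions "pa" exampleIndex)
  -- Re-indexing "parsec" with new terms replaces its old postings.
  let updated = indexInsert "parsec" ["parser", "library"] exampleIndex
  print (Set.toList (queryAll updated ["combinator"]))
```

Keeping a key-to-terms map makes replacing or deleting a document proportional to that document's own terms rather than to the whole index. This is one way to get cheap incremental updates, at the cost of some extra memory.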