NaturalLanguageAlphabets-0.0.2.0 (Hackage)

Alphabet and word representations

Provides different encodings for characters and words in natural language processing. A single character is often encoded as a Unicode text string, because many alphabets contain multi-symbol characters such as "ch".
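The following Haskell sketch illustrates why a character is stored as a text string rather than a single Char. The type MultiSym, the function segment and the greedy longest-match rule are hypothetical and not part of this package's API; the sketch only assumes the text library.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.List (sortOn)
import Data.Ord (Down (..))
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical type: one "character" of an alphabet, kept as Text
-- because it may span several Unicode code points (e.g. "ch").
newtype MultiSym = MultiSym Text
  deriving (Eq, Ord, Show)

-- Split a word greedily into multi-symbol characters, preferring the
-- longest inventory entry that matches at the current position.
-- Anything not covered by the inventory falls back to one code point.
segment :: [Text] -> Text -> [MultiSym]
segment inventory = go
  where
    longestFirst = sortOn (Down . T.length) inventory
    go t
      | T.null t = []
      | otherwise =
          case [p | p <- longestFirst, not (T.null p), p `T.isPrefixOf` t] of
            (p : _) -> MultiSym p : go (T.drop (T.length p) t)
            []      -> MultiSym (T.take 1 t) : go (T.drop 1 t)

main :: IO ()
main = print (segment ["ch", "sch"] "schach")
```

Running it prints [MultiSym "sch",MultiSym "a",MultiSym "ch"]: the inventory, not the Unicode code points, decides what counts as one character.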

IMMC symbols are encoded internally as 0-based integers, which allows them to be stored in unboxed containers.
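A minimal sketch of the interning idea, assuming nothing about the package's actual IMMC implementation: each distinct symbol receives the next free 0-based integer, so a word becomes a plain array of Ints that fits in an unboxed container. The names Intern, intern, encodeWord and decodeWord are hypothetical; only the containers, array and text libraries are used.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Array.Unboxed (UArray, bounds, listArray, (!))
import Data.List (foldl')
import qualified Data.Map.Strict as M
import Data.Text (Text)

-- Hypothetical interning table: forward map for encoding,
-- backward map for decoding.
data Intern = Intern
  { toId   :: M.Map Text Int
  , fromId :: M.Map Int Text
  }

emptyIntern :: Intern
emptyIntern = Intern M.empty M.empty

-- Return the id of a symbol; an unseen symbol gets the next free id,
-- which equals the number of symbols interned so far (0, 1, 2, ...).
intern :: Text -> Intern -> (Int, Intern)
intern s tbl@(Intern fwd bwd) =
  case M.lookup s (toId tbl) of
    Just i  -> (i, tbl)
    Nothing ->
      let i = M.size fwd
      in (i, Intern (M.insert s i fwd) (M.insert i s bwd))

-- Intern every symbol of a word and store the ids in an unboxed array.
encodeWord :: [Text] -> Intern -> (UArray Int Int, Intern)
encodeWord syms tbl0 =
  let step (acc, t) s = let (i, t') = intern s t in (i : acc, t')
      (revIds, tbl1) = foldl' step ([], tbl0) syms
      ids = reverse revIds
  in (listArray (0, length ids - 1) ids, tbl1)

-- Map the integer ids back to their symbols.
decodeWord :: UArray Int Int -> Intern -> [Text]
decodeWord arr tbl =
  let (lo, hi) = bounds arr
  in [fromId tbl M.! (arr ! k) | k <- [lo .. hi]]

main :: IO ()
main = do
  let (arr, tbl) = encodeWord ["ch", "a", "ch"] emptyIntern
  print arr
  print (decodeWord arr tbl)
```

Here "ch" becomes 0 and "a" becomes 1. Because the ids are dense and start at 0, they can also index directly into further unboxed arrays, such as per-symbol score tables.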

A very simple unigram-based scoring scheme, together with a DSL for writing such schemes, is also provided.
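As a rough illustration of what "unigram-based" could mean, the sketch below scores each aligned pair of characters independently, using an explicit pair table with match/mismatch defaults and a fixed gap score. The Scoring record, its fields and all numeric values are hypothetical; the package's own scheme and its DSL may differ, and the DSL is not shown here.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map.Strict as M
import Data.Text (Text)

-- Hypothetical unigram scoring scheme: the score of a column depends
-- only on the two characters in it, never on their neighbours.
data Scoring = Scoring
  { pairScores :: M.Map (Text, Text) Double -- explicit scores for chosen pairs
  , matchScore :: Double                    -- default when both characters are equal
  , mismatch   :: Double                    -- default for any other pair
  , gapScore   :: Double                    -- any column containing a gap
  }

-- Look up a pair in either order, falling back to match/mismatch defaults.
score :: Scoring -> Text -> Text -> Double
score sc a b =
  case M.lookup (a, b) (pairScores sc) of
    Just s  -> s
    Nothing -> case M.lookup (b, a) (pairScores sc) of
      Just s  -> s
      Nothing -> if a == b then matchScore sc else mismatch sc

-- Sum the column scores of a given alignment; Nothing marks a gap.
scoreAlignment :: Scoring -> [(Maybe Text, Maybe Text)] -> Double
scoreAlignment sc = sum . map col
  where
    col (Just a, Just b) = score sc a b
    col _                = gapScore sc

example :: Scoring
example = Scoring
  { pairScores = M.fromList [(("e", "a"), 1.0)]
  , matchScore = 2.0
  , mismatch   = -1.0
  , gapScore   = -2.0
  }

main :: IO ()
main = print (scoreAlignment example
  [(Just "ch", Just "ch"), (Just "a", Just "e"), (Just "t", Nothing)])
```

With these values the total is 2.0 + 1.0 - 2.0 = 1.0; the symmetric lookup lets a single table entry cover both ("e", "a") and ("a", "e").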

https://github.com/choener/NaturalLanguageAlphabets/blob/master/README.md