Hackage: NaturalLanguageAlphabets 0.0.1.0

Alphabet and word representations

Provides different encodings for characters and words used in natural language processing. A character is often encoded as a Unicode text string, because we deal with characters that consist of more than one symbol.
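To illustrate why a character may need to be a text string rather than a single code point, here is a minimal sketch that splits a word into characters by greedy longest match against a small alphabet. The alphabet and the `segment` function are hypothetical and not part of this package's API; the sketch assumes only the `text` package that ships with GHC.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical alphabet in which one character ("ch") spans two code points.
alphabet :: [Text]
alphabet = ["ch", "c", "h", "a", "o", "s"]

-- Greedy longest-match split of a word into alphabet characters.
-- Falls back to a single code point when no alphabet entry matches.
segment :: [Text] -> Text -> [Text]
segment alpha = go
  where
    maxLen = maximum (1 : map T.length alpha)
    go w
      | T.null w = []
      | otherwise =
          let candidates =
                [ p | n <- [maxLen, maxLen - 1 .. 1]
                    , let p = T.take n w
                    , T.length p == n          -- prefix must be long enough
                    , p `elem` alpha ]
              c = case candidates of
                    (p:_) -> p                 -- longest matching character
                    []    -> T.take 1 w        -- unknown: one code point
          in c : go (T.drop (T.length c) w)
```

For example, `segment alphabet "chaos"` yields `["ch","a","o","s"]`, so "ch" is treated as one character even though it is two code points.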

Internally, IMMC symbols are encoded as 0-based integers, which allows them to be stored in unboxed containers.
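The following sketch shows one way to intern symbols as dense 0-based integers and why that helps: once every symbol has an `Int` id, per-symbol data fits in an unboxed array. `Alphabet`, `intern`, `encode` and `countSymbols` are hypothetical names, not this package's API; the sketch uses only the `containers`, `text` and `array` packages that ship with GHC.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Array.Unboxed (UArray, accumArray, (!))
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M
import Data.Text (Text)

-- Hypothetical symbol table: maps each distinct symbol to a 0-based Int
-- and keeps the reverse direction for decoding.
data Alphabet = Alphabet
  { symToId :: M.Map Text Int
  , idToSym :: M.Map Int Text
  } deriving Show

emptyAlphabet :: Alphabet
emptyAlphabet = Alphabet M.empty M.empty

-- Return the id of a symbol, assigning the next free id if it is new.
intern :: Alphabet -> Text -> (Alphabet, Int)
intern a@(Alphabet s2i i2s) s = case M.lookup s s2i of
  Just i  -> (a, i)
  Nothing ->
    let i = M.size s2i                     -- ids stay dense: 0, 1, 2, ...
    in (Alphabet (M.insert s i s2i) (M.insert i s i2s), i)

-- Encode symbols left to right, growing the alphabet as needed.
encode :: Alphabet -> [Text] -> (Alphabet, [Int])
encode = mapAccumL intern

-- Because ids are dense and 0-based, per-symbol counts fit in an unboxed array.
countSymbols :: Int -> [Int] -> UArray Int Int
countSymbols n ids = accumArray (+) 0 (0, n - 1) [ (i, 1) | i <- ids ]
```

Encoding `["ch","a","o","s","a"]` from an empty alphabet gives ids `[0,1,2,3,1]`, and the count array then holds 2 at index 1.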

A very simple unigram-based scoring scheme is also provided.
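The scoring scheme is not spelled out here, so the following is only a sketch of what a simple unigram score can look like, assuming add-one (Laplace) smoothing and a score equal to the sum of per-symbol log-probabilities over 0-based symbol ids. `Unigram`, `train` and `score` are hypothetical names, and the package's actual scheme may differ.

```haskell
import Data.Array.Unboxed (UArray, accumArray, amap, (!))

-- Hypothetical unigram model over symbol ids 0 .. n-1,
-- stored as unboxed log-probabilities.
newtype Unigram = Unigram (UArray Int Double)

-- Estimate with add-one smoothing so unseen symbols keep non-zero mass:
-- P(i) = (count i + 1) / (total + n).
train :: Int -> [Int] -> Unigram
train n ids = Unigram (amap toLogP counts)
  where
    counts :: UArray Int Int
    counts = accumArray (+) 0 (0, n - 1) [ (i, 1) | i <- ids ]
    total = fromIntegral (length ids) :: Double
    toLogP c = log ((fromIntegral c + 1) / (total + fromIntegral n))

-- Score a word (a list of symbol ids) by summing its symbols' log-probabilities.
score :: Unigram -> [Int] -> Double
score (Unigram lp) = sum . map (lp !)
```

Training on `[0,0,1]` with 3 symbols gives probabilities 3/6, 2/6 and 1/6, so the unseen symbol 2 still receives a finite score.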