@hackage / Sonnex

Sonnex is an alternative to Soundex for french language

Latest0.1.0.3

About

Metadata

  • Last updated , by zigazou
  • License GPL-3.0-only
  • Categories Natural Language Processing, Text Processing
  • Maintained by: zigazou@free.fr

  • Lottery factor: 0

Links

Installation

Readme

Sonnex

Sonnex is an alternative to Soundex for french language

The string must contain only one word. The Sonnex code contains the following characters:

  • 1 ← un, ein, in, ain
  • 2 ← en, an
  • 3 ← on
  • a ← a, à, â
  • b ← b, bb
  • C ← ch
  • d ← d, dd
  • e ← e, eu
  • E ← ê, é, è, ai, ei
  • f ← f, ff, ph
  • g ← gu
  • i ← î, i, ille
  • j ← j, ge
  • k ← k, c, qu, ck
  • l ← l, ll
  • m ← m, mm
  • n ← n, nn
  • o ← o, ô
  • p ← p, pp
  • r ← r, rr
  • s ← s, ss
  • t ← t, tt
  • u ← u, ù, û
  • v ← v, w
  • z ← z, s
  • U ← ou

Examples

Here are a few examples of sonnex results:

  • balade | ballade → balad
  • basilic | basilique → bazilik
  • boulot | bouleau → bUlo
  • cane | canne → kan
  • censé | sensé → s2sé
  • compte | comte | conte → k3t
  • cygne | signe → sin
  • date | datte → dat
  • dessin | dessein → dés1
  • différend | différent → difér2
  • cric | crique → krik
  • champ | chant → C2

Testing

The test directory contains two files:

  • test-sonnex-homonymes.txt: a list of french homonyms
  • test-sonnex-streets.txt: a list of streets from Rouen (Normandy)

You can try these files against Sonnex by using the following command:

cat test-sonnex-homonymes.txt | runhaskell test-sonnex.hs