louter 0.1.1.0

Multi-protocol LLM router and client library

Louter

Multi-protocol LLM proxy and Haskell client library. Connect to any LLM API (OpenAI, Anthropic, Gemini) using any SDK with automatic protocol translation.

Features

  • Protocol Translation: OpenAI ↔ Anthropic ↔ Gemini automatic conversion
  • Dual Usage: Haskell library or standalone proxy server
  • Streaming: Full SSE support with smart buffering
  • Function Calling: Works across all protocols (JSON and XML formats)
  • Vision: Multimodal image support
  • Flexible Auth: Optional authentication for local vs cloud backends

Quick Start

As a Proxy Server

# Install
git clone https://github.com/junjihashimoto/louter.git
cd louter
cabal build all

# Configure
cat > config.yaml <<EOF
backends:
  llama-server:
    type: openai
    url: http://localhost:11211
    requires_auth: false
    model_mapping:
      gpt-4: qwen/qwen2.5-vl-7b
EOF

# Run
cabal run louter-server -- --config config.yaml --port 9000

Now send OpenAI/Anthropic/Gemini requests to localhost:9000.

Test it:

curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello!"}]}'

As a Haskell Library

Add to your project:

# package.yaml
dependencies:
  - louter
  - text
  - aeson

Basic usage:

import Louter.Client
import Louter.Client.OpenAI (llamaServerClient)

main = do
  client <- llamaServerClient "http://localhost:11211"
  response <- chatCompletion client $
    defaultChatRequest "gpt-4" [Message RoleUser "Hello!"]
  print response

Streaming:

import Louter.Client
import Louter.Client.OpenAI (llamaServerClient)
import Louter.Types.Streaming
import System.IO (hFlush, stdout)

main = do
  client <- llamaServerClient "http://localhost:11211"
  let request = (defaultChatRequest "gpt-4"
        [Message RoleUser "Write a haiku"]) { reqStream = True }

  streamChatWithCallback client request $ \event -> case event of
    StreamContent txt -> putStr txt >> hFlush stdout
    StreamFinish reason -> putStrLn $ "\n[Done: " <> reason <> "]"
    StreamError err -> putStrLn $ "[Error: " <> err <> "]"
    _ -> pure ()

Function calling:

import Data.Aeson (object, (.=))
import Data.Text (Text)

weatherTool = Tool
  { toolName = "get_weather"
  , toolDescription = Just "Get current weather"
  , toolParameters = object
      [ "type" .= ("object" :: Text)
      , "properties" .= object
          [ "location" .= object
              [ "type" .= ("string" :: Text) ]
          ]
      , "required" .= (["location"] :: [Text])
      ]
  }

request = (defaultChatRequest "gpt-4"
    [Message RoleUser "Weather in Tokyo?"])
    { reqTools = [weatherTool]
    , reqToolChoice = ToolChoiceAuto
    }
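
Continuing the snippet above, a minimal sketch of streaming this request and reacting when a complete tool call arrives (runWeatherRequest is an illustrative name; it reuses client and the imports from the streaming example, and assumes nothing about ToolCall's fields):

-- Tool calls are delivered only once fully buffered, so each
-- StreamToolCall event carries a complete call.
runWeatherRequest client =
  streamChatWithCallback client request { reqStream = True } $ \event ->
    case event of
      StreamToolCall _ -> putStrLn "[model requested get_weather]"
      StreamFinish _   -> putStrLn "[done]"
      _                -> pure ()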

Use Cases

Frontend        Backend              Use Case
--------------  -------------------  ---------------------------------
OpenAI SDK      Gemini API           Use OpenAI SDK with Gemini models
Anthropic SDK   Local llama-server   Use Claude Code with local models
Gemini SDK      OpenAI API           Use Gemini SDK with GPT models
Any SDK         Any Backend          Protocol-agnostic development
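
For example, an Anthropic-format request can be sent to the same proxy started in the Quick Start (this assumes the proxy exposes the standard Anthropic messages path, /v1/messages, and reuses the gpt-4 mapping from that config; adjust the path and model to your deployment):

curl http://localhost:9000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "max_tokens": 256, "messages": [{"role": "user", "content": "Hello!"}]}'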

Configuration

Local model (no auth):

backends:
  local:
    type: openai
    url: http://localhost:11211
    requires_auth: false
    model_mapping:
      gpt-4: qwen/qwen2.5-vl-7b

Cloud API (with auth):

backends:
  openai:
    type: openai
    url: https://api.openai.com
    requires_auth: true
    api_key: "${OPENAI_API_KEY}"
    model_mapping:
      gpt-4: gpt-4-turbo-preview

Multi-backend:

backends:
  local:
    type: openai
    url: http://localhost:11211
    requires_auth: false
    model_mapping:
      gpt-3.5-turbo: qwen/qwen2.5-7b

  openai:
    type: openai
    url: https://api.openai.com
    requires_auth: true
    api_key: "${OPENAI_API_KEY}"
    model_mapping:
      gpt-4: gpt-4-turbo-preview

See examples/ for more configurations.

API Types

Client Creation

-- Local llama-server (no auth)
import Louter.Client.OpenAI (llamaServerClient)
client <- llamaServerClient "http://localhost:11211"

-- Cloud APIs (with auth)
import Louter.Client.OpenAI (openAIClient)
import Louter.Client.Anthropic (anthropicClient)
import Louter.Client.Gemini (geminiClient)

client <- openAIClient "sk-..."
client <- anthropicClient "sk-ant-..."
client <- geminiClient "your-api-key"

Request Types

-- ChatRequest
data ChatRequest = ChatRequest
  { reqModel :: Text
  , reqMessages :: [Message]
  , reqTools :: [Tool]
  , reqTemperature :: Maybe Float
  , reqMaxTokens :: Maybe Int
  , reqStream :: Bool
  }

-- Message
data Message = Message
  { msgRole :: MessageRole  -- RoleSystem | RoleUser | RoleAssistant
  , msgContent :: Text
  }

-- Tool
data Tool = Tool
  { toolName :: Text
  , toolDescription :: Maybe Text
  , toolParameters :: Value  -- JSON schema
  }

Response Types

-- Non-streaming
chatCompletion :: Client -> ChatRequest -> IO (Either Text ChatResponse)

data ChatResponse = ChatResponse
  { respId :: Text
  , respChoices :: [Choice]
  , respUsage :: Maybe Usage
  }

-- Streaming
streamChatWithCallback :: Client -> ChatRequest -> (StreamEvent -> IO ()) -> IO ()

data StreamEvent
  = StreamContent Text           -- Response text
  | StreamReasoning Text         -- Thinking tokens
  | StreamToolCall ToolCall      -- Complete tool call (buffered)
  | StreamFinish FinishReason
  | StreamError Text
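
Putting the non-streaming pieces together, a minimal sketch that uses only the fields shown above (the Choice fields themselves are not assumed):

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Louter.Client
import Louter.Client.OpenAI (llamaServerClient)

main :: IO ()
main = do
  client <- llamaServerClient "http://localhost:11211"
  result <- chatCompletion client $
    defaultChatRequest "gpt-4" [Message RoleUser "Hello!"]
  case result of
    Left err   -> putStrLn $ "Request failed: " <> T.unpack err
    Right resp -> putStrLn $
      "Response " <> T.unpack (respId resp)
        <> " with " <> show (length (respChoices resp)) <> " choice(s)"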

Docker

# Build
docker build -t louter .

# Run with config
docker run -p 9000:9000 -v $(pwd)/config.yaml:/app/config.yaml louter

# Or use docker-compose
docker-compose up

Testing

# Python SDK integration tests (43+ tests)
python tests/run_all_tests.py

# Haskell unit tests
cabal test all

Architecture

Client Request (Any Format)
    ↓
Protocol Converter
    ↓
Core IR (OpenAI-based)
    ↓
Backend Adapter
    ↓
LLM Backend (Any Format)

Key Components:

  • SSE Parser: Incremental streaming with attoparsec (sketched below)
  • Smart Buffering: Tool calls buffered until complete JSON
  • Type Safety: Strict Haskell types throughout
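
As a rough illustration of the first point, this is the shape of an attoparsec rule for one "data: ..." line of an SSE stream (a hypothetical sketch, not the library's actual parser):

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, endOfLine, string, takeTill)
import Data.Text (Text)

-- Extract the payload of a single "data: ..." SSE line; a real parser
-- must also handle event/id fields, blank-line event boundaries, and
-- partial input arriving across chunks.
sseDataLine :: Parser Text
sseDataLine = string "data: " *> takeTill (== '\n') <* endOfLine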

Streaming Strategy:

  • Content/Reasoning: Stream immediately (real-time output)
  • Tool Calls: Buffer until complete (valid JSON required)
  • State Machine: Track tool call assembly by index (sketched below)
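
A hypothetical illustration of the buffering/state-machine idea (not the library's actual implementation): argument fragments are accumulated per tool-call index and emitted only once the accumulated text parses as JSON.

import qualified Data.Aeson as A
import qualified Data.Map.Strict as M
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

type Buffers = M.Map Int T.Text

-- Append a fragment for the tool call at the given index; if the buffer
-- now holds complete JSON, return the parsed value and clear that slot.
feed :: Int -> T.Text -> Buffers -> (Maybe A.Value, Buffers)
feed ix fragment buffers =
  let acc = M.findWithDefault T.empty ix buffers <> fragment
  in case A.decodeStrict (TE.encodeUtf8 acc) of
       Just v  -> (Just v, M.delete ix buffers)      -- complete: emit and reset
       Nothing -> (Nothing, M.insert ix acc buffers) -- partial: keep buffering

In the converter this state would be threaded through the SSE event loop, with a completed value surfacing as a StreamToolCall event.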

Proxy Examples

Use OpenAI SDK with Local Models

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="gpt-4",  # Routed to qwen/qwen2.5-vl-7b
    messages=[{"role": "user", "content": "Hello!"}]
)

Use Claude Code with Gemini

# config.yaml
backends:
  gemini:
    type: gemini
    url: https://generativelanguage.googleapis.com
    requires_auth: true
    api_key: "${GEMINI_API_KEY}"
    model_mapping:
      claude-3-5-sonnet-20241022: gemini-2.0-flash

# Start proxy on Anthropic-compatible port
cabal run louter-server -- --config config.yaml --port 8000

# Configure Claude Code:
# API Endpoint: http://localhost:8000
# Model: claude-3-5-sonnet-20241022

Monitoring

Health check:

curl http://localhost:9000/health

JSON-line logging:

cabal run louter-server -- --config config.yaml --port 9000 2>&1 | jq .

Troubleshooting

Connection refused:

# Check backend is running
curl http://localhost:11211/v1/models

Invalid API key:

# Verify environment variable
echo $OPENAI_API_KEY

Model not found:

  • Check model_mapping in config
  • Frontend model (what the client requests, e.g. gpt-4) → Backend model (what the proxy sends to the API, e.g. qwen/qwen2.5-vl-7b)

Examples

See examples/ for configuration examples and use cases.

License

MIT License - see LICENSE file.