Mullama
Versatile local LLM inference engine with multi-language bindings for Python, Node.js, Go, Rust, PHP, and C/C++. Supports daemon server and embedded modes.
Mullama is a local LLM inference engine designed for developers who need native language bindings rather than HTTP API calls. Unlike tools that expose models exclusively through a REST interface, Mullama provides direct bindings for Python, Node.js, Go, Rust, PHP, and C/C++, allowing you to embed inference directly into your application without network overhead or serialization costs.
Key Features
Native multi-language bindings. Mullama offers first-class support for six programming languages: Python, Node.js, Go, Rust, PHP, and C/C++. Each binding provides idiomatic APIs that feel natural in its respective ecosystem: Python developers get generators and context managers, Rust developers get ownership-safe interfaces, and Node.js developers get async/await patterns. This eliminates the boilerplate of HTTP client code and JSON parsing.
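As an illustration only, the sketch below shows what a generator-and-context-manager style Python binding could look like. The `load` function, the model object, and every name here are assumptions, not Mullama's actual API; a tiny stub stands in for the engine so the idiom itself is runnable.

```python
# Hypothetical sketch of an idiomatic Python binding. Every name here is
# an assumption for illustration; a stub stands in for the real engine.

from contextlib import contextmanager

class _StubModel:
    """Stand-in for a loaded model; streams canned tokens."""
    def generate(self, prompt, max_tokens=32):
        # A real binding would yield tokens as the engine produces them.
        for token in ["Local", " inference", " works"]:
            yield token

    def close(self):
        pass  # A real binding would free the model's memory here.

@contextmanager
def load(path):
    """Context manager so the model is always unloaded, even on error."""
    model = _StubModel()
    try:
        yield model
    finally:
        model.close()

# Streaming consumes tokens lazily; joining them yields the full reply.
with load("llama-3-8b.Q4_K_M.gguf") as model:
    text = "".join(model.generate("Hello"))
    print(text)  # -> Local inference works
```

The context-manager shape matters for a native binding: model weights can occupy gigabytes of memory, so deterministic cleanup on scope exit is preferable to waiting for garbage collection.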
Dual operating modes. Run Mullama as a background daemon server for shared access across multiple applications, or embed it directly into your process for single-application use. The daemon mode handles concurrent requests with automatic queuing, while embedded mode minimizes latency by keeping the model loaded in your application’s memory space.
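The structural difference between the two modes can be sketched with stand-ins. Nothing below is Mullama's real API; it only illustrates the trade-off the paragraph describes: embedded mode calls the engine directly in-process, while a daemon serializes concurrent requests through a queue so the engine is never invoked from two callers at once.

```python
# Sketch of the two modes' shapes, with a stand-in for the engine.

import queue
import threading

def infer(prompt):
    """Stand-in for the actual inference call."""
    return f"echo: {prompt}"

# Embedded mode: a direct in-process call, no queuing, lowest latency.
print(infer("hi"))  # -> echo: hi

# Daemon mode: a single worker drains a shared queue, so concurrent
# clients are automatically serialized rather than contending for the engine.
requests = queue.Queue()
results = {}

def worker():
    while True:
        req_id, prompt = requests.get()
        if req_id is None:  # sentinel: shut the worker down
            break
        results[req_id] = infer(prompt)
        requests.task_done()

t = threading.Thread(target=worker)
t.start()
for i, p in enumerate(["a", "b", "c"]):
    requests.put((i, p))
requests.join()          # wait until all queued requests are answered
requests.put((None, None))
t.join()
print(results)
```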
GGUF model compatibility. Mullama reads standard GGUF model files, giving you access to the entire ecosystem of quantized models available on Hugging Face and other repositories. Any model that works with llama.cpp works with Mullama.
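GGUF files open with a small fixed header: the 4-byte magic "GGUF", a uint32 format version, then uint64 tensor and metadata key-value counts, all little-endian. Checking that header is a cheap way to validate a model file before committing to a full load. The sketch below synthesizes a header in memory so it runs without a real model file; the counts used are arbitrary example values.

```python
# Parse the fixed GGUF header: magic, version, tensor count, metadata count.

import struct

def parse_gguf_header(data: bytes):
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Synthetic header: version 3, 291 tensors, 24 metadata key-value pairs.
fake = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(parse_gguf_header(fake))
```

With a real file, the same check works on the first 24 bytes returned by `open(path, "rb").read(24)`.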
Cross-platform support. Mullama compiles and runs on Windows, macOS, and Linux with automatic GPU detection for NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (Metal) hardware. CPU-only fallback ensures it works on any machine.
Lightweight footprint. The core engine compiles to a single shared library with minimal dependencies. Language bindings link against it through FFI rather than bundling the entire engine, keeping package sizes small.
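The bindings-over-FFI pattern follows the shape below. Mullama's actual library name and exported symbols are not documented here, so the C runtime's `strlen` stands in; the mechanics are the same in any binding: load the shared library, declare the foreign function's signature, then call it like a native function.

```python
# FFI pattern via ctypes, using libc's strlen as a stand-in for the
# engine's exported symbols (which are not specified in this document).

import ctypes
import ctypes.util

# Load the shared library. A real binding would locate the engine's
# library here instead of libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature so ctypes marshals arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"mullama"))  # -> 7
```

Because the binding only declares signatures and forwards calls, the per-language package stays tiny: the heavy lifting lives in the one shared engine build.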
When to Use Mullama
Choose Mullama when you want to call LLM inference from your application code without running a separate server process or making HTTP requests. It is particularly valuable for polyglot teams working across multiple languages, for applications where inference latency matters, and for embedded systems or edge deployments where running a full HTTP server is impractical.
Ecosystem Role
Mullama fills the gap between low-level llama.cpp C bindings and high-level HTTP-based tools like Ollama. If you need programmatic access from languages beyond Python and want tighter integration than a REST API provides, Mullama is the right tool.