MLC LLM
Machine Learning Compilation framework for deploying LLMs on mobile devices, browsers, and edge hardware. Native iOS, Android, and WebGPU support.
MLC LLM is a universal deployment framework that brings large language models to mobile phones, web browsers, and edge devices through machine learning compilation. It uses Apache TVM to compile models into optimized native code for each target platform, enabling on-device inference on iOS, Android, WebGPU-capable browsers, and embedded systems. For developers building AI-powered mobile or web applications that need on-device LLM inference without cloud dependencies, MLC LLM is among the most versatile cross-platform compilation options available.
Key Features
True cross-platform deployment. A single model can be compiled and deployed to iOS (Metal), Android (OpenCL/Vulkan), web browsers (WebGPU), Windows, macOS, and Linux from the same source. This write-once-deploy-everywhere approach eliminates the need for platform-specific inference engines.
Machine learning compilation. MLC LLM uses Apache TVM’s compilation stack to transform models from high-level frameworks into optimized code for each target device. The compiler automatically tunes performance for specific hardware, generating optimized kernels for GPU, CPU, and accelerator backends.
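In practice, the compilation flow is driven by the `mlc_llm` CLI in three steps: convert and quantize the weights, generate the runtime config, and compile a model library for the target device. A minimal sketch of those invocations, built as argv lists rather than executed (subcommand names follow the MLC LLM documentation; the model paths, quantization code, and flags are illustrative assumptions):

```python
# Sketch of the three-step MLC LLM compile pipeline as argv lists.
# Subcommand names follow the MLC LLM docs; the checkpoint path, output
# directory, conversation template, and target device are illustrative.
model_dir = "./dist/models/Llama-3-8B-Instruct"       # hypothetical HF checkpoint
out_dir = "./dist/Llama-3-8B-Instruct-q4f16_1-MLC"    # hypothetical output dir

steps = [
    # 1. Quantize and convert the weights into MLC's on-disk format.
    ["mlc_llm", "convert_weight", model_dir,
     "--quantization", "q4f16_1", "-o", out_dir],
    # 2. Emit the chat/runtime config alongside the converted weights.
    ["mlc_llm", "gen_config", model_dir,
     "--quantization", "q4f16_1", "--conv-template", "llama-3", "-o", out_dir],
    # 3. Compile the model library for a target device (iphone, android, webgpu, ...).
    ["mlc_llm", "compile", f"{out_dir}/mlc-chat-config.json",
     "--device", "iphone", "-o", f"{out_dir}/model-iphone.tar"],
]

for argv in steps:
    print(" ".join(argv))  # in practice: subprocess.run(argv, check=True)
```

Repeating only step 3 with a different `--device` retargets the same converted weights to another platform, which is what makes the write-once-deploy-everywhere workflow cheap.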
Mobile-native SDKs. Native SDKs for iOS (Swift) and Android (Kotlin/Java) let developers integrate LLM inference directly into mobile applications. The SDKs handle model loading, tokenization, and generation with platform-appropriate APIs.
WebGPU browser inference. MLC LLM can run models directly in web browsers using WebGPU, requiring no server-side infrastructure. Users visit a web page and run inference on their own GPU through the browser, enabling truly serverless AI applications.
Quantization and optimization. Support for 3-bit and 4-bit quantization makes models small enough to fit on mobile devices with limited memory. The compilation process applies platform-specific optimizations including operator fusion and memory planning.
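A back-of-the-envelope calculation shows why low-bit quantization is the enabler here (illustrative numbers for a hypothetical 7B-parameter model, counting weights only and ignoring activations and KV cache):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only memory footprint in decimal GB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# A hypothetical 7B-parameter model at different precisions.
fp16 = weight_memory_gb(7, 16)  # 14.0 GB -- far beyond phone RAM budgets
q4 = weight_memory_gb(7, 4)     # 3.5 GB  -- fits on recent flagship phones
q3 = weight_memory_gb(7, 3)     # 2.625 GB

print(f"fp16: {fp16:.1f} GB, 4-bit: {q4:.1f} GB, 3-bit: {q3:.3f} GB")
```

The 4x reduction from fp16 to 4-bit is the difference between a model that cannot load on a phone and one that runs with memory to spare for the KV cache.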
OpenAI-compatible serving. For server deployments, MLC LLM includes an OpenAI-compatible REST API server with continuous batching and structured generation support.
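Because the server speaks the OpenAI wire format, any standard OpenAI client can talk to it. A minimal stdlib-only sketch of building a chat-completion request (the model ID and server address are assumptions; use whatever `mlc_llm serve` reports on startup):

```python
import json

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "Llama-3-8B-Instruct-q4f16_1-MLC",  # hypothetical compiled model ID
    "messages": [
        {"role": "user", "content": "What is machine learning compilation?"}
    ],
    "stream": False,
}
body = json.dumps(payload).encode("utf-8")

# To actually send it against a running `mlc_llm serve` instance:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://127.0.0.1:8000/v1/chat/completions",  # assumed default address
#       data=body, headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Swapping in the official `openai` Python package works the same way: point its `base_url` at the local server and keep the rest of the client code unchanged.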
When to Use MLC LLM
Choose MLC LLM when you need to deploy LLMs on mobile devices, in web browsers, or on edge hardware. It is a natural fit for mobile app developers adding offline AI features, web developers who want client-side inference, and IoT/edge scenarios where cloud connectivity is unreliable or prohibited.
Ecosystem Role
MLC LLM fills the mobile and edge deployment gap that desktop-focused tools like Ollama and llama.cpp leave open. While llama.cpp offers some mobile support, MLC LLM’s compilation approach can produce binaries tuned more tightly to a specific device. For server-side inference, vLLM or Ollama are simpler choices. MLC LLM’s strength is pushing LLMs onto platforms where traditional inference engines struggle.