A local compatibility server for the Gemini and OpenAI APIs. Run one container, point your SDK at http://localhost:8090, and get both protocol shapes on the same port for tests and development.
Testing code that calls Gemini or OpenAI is painful: real API calls are slow, cost money, and need network access. localaik gives you a single Docker container that speaks both protocols backed by a local model — no API key, no internet, deterministic enough for CI.
```
┌────────────────────────────────────────────────────────┐
│ localaik container                                     │
│                                                        │
│  ┌──────────────────────────┐    ┌──────────────────┐  │
│  │ localaik proxy (:8090)   │    │ llama.cpp (:8080)│  │
│  │                          │    │                  │  │
│  │  /v1beta/* (Gemini) ─────┼──▶ │  Gemma 3 model   │  │
│  │  /v1/* (OpenAI) ─────────┼──▶ │                  │  │
│  │                          │    └──────────────────┘  │
│  │                          │                          │
│  │                          │    ┌──────────────────┐  │
│  │  PDF uploads ────────────┼──▶ │  pdftoppm        │  │
│  │                          │    │  PDF ─▶ images   │  │
│  └──────────────────────────┘    └──────────────────┘  │
└────────────────────────────────────────────────────────┘
```
SDK requests hit the localaik proxy, which translates the Gemini or OpenAI wire format to upstream chat completions and forwards the request to the local llama.cpp server running a Gemma 3 model.
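As a sketch of what that translation looks like, the following maps a Gemini `generateContent` body onto an OpenAI `chat.completions` body. This is an illustration only, not localaik's actual code; the field names follow the public Gemini and OpenAI wire formats.

```python
def gemini_to_chat_completions(model: str, gemini_body: dict) -> dict:
    """Illustrative mapping from a Gemini generateContent body to an
    OpenAI chat.completions body (text parts only)."""
    messages = []
    # systemInstruction becomes a "system" message.
    sys = gemini_body.get("systemInstruction")
    if sys:
        text = " ".join(p.get("text", "") for p in sys.get("parts", []))
        messages.append({"role": "system", "content": text})
    # Each Gemini content entry becomes one chat message; the Gemini role
    # "model" corresponds to the OpenAI role "assistant".
    for content in gemini_body.get("contents", []):
        role = "assistant" if content.get("role") == "model" else "user"
        text = " ".join(p.get("text", "") for p in content.get("parts", []))
        messages.append({"role": role, "content": text})
    return {"model": model, "messages": messages}


req = {
    "systemInstruction": {"parts": [{"text": "Be terse."}]},
    "contents": [{"role": "user", "parts": [{"text": "Hi"}]}],
}
print(gemini_to_chat_completions("gemma3", req))
```

The real proxy also has to carry over generation parameters, tool definitions, and media parts, but the role and message mapping above is the core of the translation.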
```bash
docker run -d -p 8090:8090 gokhalh/localaik
```
Or with Docker Compose:
```yaml
services:
  localaik:
    image: gokhalh/localaik
    ports:
      - "8090:8090"
```
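The model can take a few seconds to load after the container starts, so scripts often want to wait for readiness before sending requests. A minimal sketch using only the Python standard library and the container's `/health` route:

```python
import time
import urllib.error
import urllib.request


def wait_ready(base_url: str, timeout_s: float = 60.0) -> bool:
    """Poll GET {base_url}/health until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(1)
    return False


if wait_ready("http://localhost:8090", timeout_s=5):
    print("localaik is ready")
```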
localaik is a plain HTTP server, so any language or SDK that can set a base URL will work.
More runnable samples (curl, Go, Python, JavaScript, Java) live under `examples/`.
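To see the Gemini wire format without any SDK, a request can be built with the Python standard library alone. This is a hedged sketch: the model name `gemma3` is a placeholder, and the dummy `x-goog-api-key` mirrors the throwaway keys used in the SDK examples below.

```python
import json
import urllib.error
import urllib.request

BASE = "http://localhost:8090"
MODEL = "gemma3"  # placeholder: use a model name your container accepts

body = {"contents": [{"role": "user", "parts": [{"text": "Say hello."}]}]}
req = urllib.request.Request(
    f"{BASE}/v1beta/models/{MODEL}:generateContent",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "x-goog-api-key": "test"},
)
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        out = json.load(resp)
    # Gemini responses nest the reply under candidates -> content -> parts.
    print(out["candidates"][0]["content"]["parts"][0]["text"])
except urllib.error.URLError:
    print("localaik is not running on", BASE)
```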
Go:
```go
client, err := genai.NewClient(ctx, &genai.ClientConfig{
	APIKey:      "test",
	HTTPOptions: genai.HTTPOptions{BaseURL: "http://localhost:8090"},
})
```
Python:
```python
from google import genai

client = genai.Client(
    api_key="test",
    http_options=genai.types.HttpOptions(api_version="v1beta", base_url="http://localhost:8090"),
)
```
Or set the environment variable for any language:
```bash
export GOOGLE_GEMINI_BASE_URL=http://localhost:8090
```
Python:
```python
from openai import OpenAI

client = OpenAI(api_key="test", base_url="http://localhost:8090/v1")
```
Go:
```go
client := openai.NewClient(
	option.WithAPIKey("test"),
	option.WithBaseURL("http://localhost:8090/v1"),
)
```
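The OpenAI endpoint can likewise be exercised without an SDK. A standard-library sketch, with the same caveats as above (placeholder model name, dummy key):

```python
import json
import urllib.error
import urllib.request

body = {
    "model": "gemma3",  # placeholder: use a model name your container accepts
    "messages": [{"role": "user", "content": "Say hello."}],
}
req = urllib.request.Request(
    "http://localhost:8090/v1/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "Authorization": "Bearer test"},
)
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        out = json.load(resp)
    # OpenAI responses nest the reply under choices -> message -> content.
    print(out["choices"][0]["message"]["content"])
except urllib.error.URLError:
    print("localaik is not running on localhost:8090")
```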
| Tag | Model | Image size |
|---|---|---|
| `latest`, `gemma3-4b` | Gemma 3 4B Q4_K_M | ~3 GB |
| `gemma3-12b` | Gemma 3 12B Q4_K_M | ~7 GB |
Version-pinned tags follow the pattern `v0.1.1-gemma3-4b`, `v0.1.1-gemma3-12b`.
Pass environment variables to tune the underlying model server:
```bash
docker run -d -p 8090:8090 \
  -e LK_THREADS=8 \
  -e LK_CTX_SIZE=4096 \
  -e LK_FLASH_ATTN=1 \
  -e LK_CONT_BATCHING=1 \
  -e LK_PARALLEL=2 \
  gokhalh/localaik
```
Or with Docker Compose:
```yaml
services:
  localaik:
    image: gokhalh/localaik
    ports:
      - "8090:8090"
    environment:
      LK_THREADS: 8
      LK_CTX_SIZE: 4096
      LK_FLASH_ATTN: 1
      LK_CONT_BATCHING: 1
      LK_PARALLEL: 2
```
| Variable | Default | Description |
|---|---|---|
| `LK_CTX_SIZE` | 8192 | Context window in tokens |
| `LK_THREADS` | auto | CPU threads for inference |
| `LK_THREADS_BATCH` | same as threads | CPU threads for prompt processing |
| `LK_BATCH_SIZE` | 2048 | Prompt processing batch size |
| `LK_UBATCH_SIZE` | 512 | Micro-batch size |
| `LK_GPU_LAYERS` | 0 | Layers offloaded to GPU (99 = all) |
| `LK_PARALLEL` | 1 | Max concurrent request slots |
| `LK_FLASH_ATTN` | 0 (off) | Flash attention (1 to enable) |
| `LK_CONT_BATCHING` | 0 (off) | Continuous batching (1 to enable) |
| `LK_MLOCK` | 0 (off) | Lock model in RAM (1 to enable) |
| Route | Used by | Notes |
|---|---|---|
| `POST /v1beta/models/{model}:generateContent` | Gemini `GenerateContent` | Translated to upstream chat completions |
| `POST /v1beta/models/{model}:streamGenerateContent` | Gemini `GenerateContentStream` | Gemini-style SSE (typically `?alt=sse`) |
| `POST /v1/chat/completions` | OpenAI chat completions | Forwarded to upstream |
| `GET /health` | Health checks | Custom route |
All other API routes return 404.
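The dispatch rules can be sketched as a small method-and-path match. This is an illustration of the route table, not localaik's actual router:

```python
def route(method: str, path: str) -> str:
    """Toy dispatcher mirroring localaik's route table.

    `path` is the URL path only; query strings such as ?alt=sse are
    stripped before matching.
    """
    if method == "GET" and path == "/health":
        return "health"
    if method == "POST" and path.startswith("/v1beta/models/"):
        # Check the streaming suffix first so it is not shadowed.
        if path.endswith(":streamGenerateContent"):
            return "gemini-stream"
        if path.endswith(":generateContent"):
            return "gemini-generate"
    if method == "POST" and path == "/v1/chat/completions":
        return "openai-chat"
    return "404"


print(route("POST", "/v1beta/models/gemma3:generateContent"))
```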
Automated contract tests validate against:
- `google.golang.org/genai` v1.51.0
- `github.com/openai/openai-go/v3` v3.30.0

Other SDK versions and languages may work if they emit the same HTTP shapes.
Run localaik as a GitHub Actions service container so your tests hit a real local model instead of mocks:
```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      localaik:
        image: gokhalh/localaik
        ports:
          - 8090:8090
        options: >-
          --health-cmd "curl -f http://localhost:8090/health"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 30
    steps:
      - uses: actions/checkout@v4
      - run: go test ./...
        env:
          GOOGLE_GEMINI_BASE_URL: http://localhost:8090
          OPENAI_BASE_URL: http://localhost:8090/v1
```
Supported features:
- Text and inline image input (`inlineData`), and PDF input (auto-converted to page images)
- `fileData` for image URLs and local/`data:`-URI PDF/text files
- `systemInstruction`
- `generationConfig`: temperature, topP, topK, candidateCount, maxOutputTokens, stopSequences, responseLogprobs, logprobs, presencePenalty, frequencyPenalty, seed
- `responseMimeType`, `responseSchema`, `responseJsonSchema`
- `tools`, function calling config via `toolConfig`
- `functionCall` and `functionResponse` parts

Partial support:

- `top_k`, `n`, `logprobs`, and tool choice behavior depends on the upstream runtime
- `executableCode`, `codeExecutionResult`, `toolCall`, `toolResponse` parts preserved as text context

Not supported:

- Gemini API surface outside `GenerateContent` / `GenerateContentStream`

Supported: text chat completions, structured output, vision inputs, tool-related fields (all passed through to upstream).

Not supported: Responses API, Assistants, Embeddings, Images, Audio, Files, Vector stores.
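As an illustration of the structured-output fields, a Gemini-side request body combining `responseMimeType` and `responseSchema` might look like the following. This is a hedged sketch following the public Gemini REST shapes (the uppercase schema types follow the Gemini `Schema` type enum); the exact output you get back depends on the upstream runtime.

```python
import json

# Sketch of a Gemini generateContent body exercising structured output.
body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Name a city and its population."}]}
    ],
    "generationConfig": {
        "temperature": 0.2,
        "maxOutputTokens": 256,
        "responseMimeType": "application/json",
        "responseSchema": {
            "type": "OBJECT",
            "properties": {
                "city": {"type": "STRING"},
                "population": {"type": "INTEGER"},
            },
            "required": ["city", "population"],
        },
    },
}
print(json.dumps(body, indent=2))
```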
Tip: Run `make docker-up` to build and start the localaik container, which includes a local llama.cpp server with a bundled model. This is the easiest way to get a working upstream for development.
```bash
# Run the proxy locally (requires a running llama.cpp server)
go run ./cmd/localaik --port 8090 --upstream http://127.0.0.1:8080/v1
```
```bash
# Common commands
make help              # Show all targets
make lint              # Format check + go vet
make test-unit         # Unit tests
make test-integration  # Integration tests (requires docker-up)
make test              # All of the above
make docker-up         # Build and start container
make docker-down       # Stop container
```
```bash
# Default (Gemma 3 4B)
docker build -t gokhalh/localaik .

# Custom model
docker build \
  --build-arg MODEL_URL=... \
  --build-arg MODEL_SHA256=... \
  --build-arg MMPROJ_URL=... \
  --build-arg MMPROJ_SHA256=... \
  -t gokhalh/localaik:custom .
```