docker compose unification, small changes

Signed-off-by: mudler <mudler@localai.io>
mudler
2025-04-12 18:17:43 +02:00
parent 4858f85ade
commit 0ac5c13c4d
6 changed files with 161 additions and 175 deletions


@@ -45,14 +45,100 @@ LocalAGI ensures your data stays exactly where you want it—on your hardware. N
 git clone https://github.com/mudler/LocalAGI
 cd LocalAGI
-# CPU setup
-docker compose up -f docker-compose.yml
-# GPU setup
-docker compose up -f docker-compose.gpu.yml
+# CPU setup (default)
+docker compose up
+# NVIDIA GPU setup
+docker compose --profile nvidia up
+# Intel GPU setup (for Intel Arc and integrated GPUs)
+docker compose --profile intel up
+# Start with a specific model (see available models at models.localai.io, or localai.io to use any model from Hugging Face)
+MODEL_NAME=gemma-3-12b-it docker compose up
+# NVIDIA GPU setup with custom multimodal and image models
+MODEL_NAME=gemma-3-12b-it \
+MULTIMODAL_MODEL=minicpm-v-2_6 \
+IMAGE_MODEL=flux.1-dev \
+docker compose --profile nvidia up
 ```
-Access your agents at `http://localhost:8080`
+Now you can access and manage your agents at [http://localhost:8080](http://localhost:8080)
+## 🖥️ Hardware Configurations
+LocalAGI supports multiple hardware configurations through Docker Compose profiles:
+### CPU (Default)
+- No special configuration needed
+- Runs on any system with Docker
+- Best for testing and development
+- Supports text models only
+### NVIDIA GPU
+- Requires an NVIDIA GPU and drivers
+- Uses CUDA for acceleration
+- Best for high-performance inference
+- Supports text, multimodal, and image generation models
+- Run with: `docker compose --profile nvidia up`
+- Default models:
+  - Text: `openthinker-7b`
+  - Multimodal: `minicpm-v-2_6`
+  - Image: `flux.1-dev`
+- Environment variables:
+  - `MODEL_NAME`: Text model to use
+  - `MULTIMODAL_MODEL`: Multimodal model to use
+  - `IMAGE_MODEL`: Image generation model to use
+  - `LOCALAI_SINGLE_ACTIVE_BACKEND`: Set to `true` to enable single active backend mode
+### Intel GPU
+- Supports Intel Arc and integrated GPUs
+- Uses SYCL for acceleration
+- Best for Intel-based systems
+- Supports text, multimodal, and image generation models
+- Run with: `docker compose --profile intel up`
+- Default models:
+  - Text: `openthinker-7b`
+  - Multimodal: `minicpm-v-2_6`
+  - Image: `sd-1.5-ggml`
+- Environment variables:
+  - `MODEL_NAME`: Text model to use
+  - `MULTIMODAL_MODEL`: Multimodal model to use
+  - `IMAGE_MODEL`: Image generation model to use
+  - `LOCALAI_SINGLE_ACTIVE_BACKEND`: Set to `true` to enable single active backend mode
+## Customize models
+You can customize the models used by LocalAGI by setting environment variables when running docker compose. For example:
+```bash
+# CPU with custom model
+MODEL_NAME=gemma-3-12b-it docker compose up
+# NVIDIA GPU with custom models
+MODEL_NAME=gemma-3-12b-it \
+MULTIMODAL_MODEL=minicpm-v-2_6 \
+IMAGE_MODEL=flux.1-dev \
+docker compose --profile nvidia up
+# Intel GPU with custom models
+MODEL_NAME=gemma-3-12b-it \
+MULTIMODAL_MODEL=minicpm-v-2_6 \
+IMAGE_MODEL=sd-1.5-ggml \
+docker compose --profile intel up
+```
+If no models are specified, the defaults are used:
+- Text model: `openthinker-7b`
+- Multimodal model: `minicpm-v-2_6`
+- Image model: `flux.1-dev` (NVIDIA) or `sd-1.5-ggml` (Intel)
+Good (relatively small) models that have been tested are:
+- `qwen_qwq-32b` (best at coordinating agents)
+- `gemma-3-12b-it`
+- `gemma-3-27b-it`
 ## 🏆 Why Choose LocalAGI?
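The `MODEL_NAME`, `MULTIMODAL_MODEL`, and `IMAGE_MODEL` switches above work through Compose variable interpolation: in the unified compose file later in this diff, each model entry is written as `${VAR:-default}`, which expands to the environment value when set and to the default otherwise. A minimal sketch of the pattern:

```yaml
# Sketch of the interpolation pattern behind the MODEL_NAME override.
# ${VAR:-default} expands to $VAR when it is set, otherwise to the default.
services:
  localai:
    command:
      - ${MODEL_NAME:-openthinker-7b}  # e.g. MODEL_NAME=gemma-3-12b-it docker compose up
```

Compose also reads variable values from a `.env` file in the project directory, so overrides can be persisted there instead of being retyped on every run.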
@@ -98,6 +184,8 @@ Explore detailed documentation including:
 ### Environment Configuration
+LocalAGI can be configured through environment variables. Note that these variables need to be set on the localagi container in the docker-compose file to take effect.
 | Variable | What It Does |
 |----------|--------------|
 | `LOCALAGI_MODEL` | Your go-to model |
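To make the note above concrete: a variable is picked up only when it appears in the `environment` block of the `localagi` service. A minimal sketch of the relevant excerpt (values are illustrative examples):

```yaml
# Sketch: LocalAGI variables take effect only when set on the localagi
# service in docker-compose.yml; the values here are illustrative.
services:
  localagi:
    environment:
      - LOCALAGI_MODEL=gemma-3-12b-it
      - LOCALAGI_TIMEOUT=5m
```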


@@ -41,7 +41,7 @@ func (a *PlanAction) Plannable() bool {
 func (a *PlanAction) Definition() types.ActionDefinition {
 	return types.ActionDefinition{
 		Name: PlanActionName,
-		Description: "Use this tool for solving complex tasks that involves calling more tools in sequence.",
+		Description: "Use it for situations that involve performing multiple actions in sequence.",
 		Properties: map[string]jsonschema.Definition{
 			"subtasks": {
 				Type: jsonschema.Array,


@@ -115,7 +115,7 @@ Available Tools:
 const reSelfEvalTemplate = pickSelfTemplate
 const pickActionTemplate = hudTemplate + `
-Your only task is to analyze the situation and determine a goal and the best tool to use, or just a final response if we have fullfilled the goal.
+Your only task is to analyze the conversation and determine a goal and the best tool to use, or just a final response if we have fulfilled the goal.
 Guidelines:
 1. Review the current state, what was done already and context


@@ -1,75 +0,0 @@
services:
  localai:
    # See https://localai.io/basics/container/#standard-container-images for
    # a list of available container images (or build your own with the provided Dockerfile)
    # Available images with CUDA, ROCm, SYCL, Vulkan
    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
    image: localai/localai:master-sycl-f32-ffmpeg-core
    command:
      # - rombo-org_rombo-llm-v3.0-qwen-32b # minimum suggested model
      - openthinker-7b # (smaller)
      - granite-embedding-107m-multilingual
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 60s
      timeout: 10m
      retries: 120
    ports:
      - 8081:8080
    environment:
      - DEBUG=true
      #- LOCALAI_API_KEY=sk-1234567890
    volumes:
      - ./volumes/models:/build/models:cached
      - ./volumes/images:/tmp/generated/images
    devices:
      # On a system with integrated GPU and an Arc 770, this is the Arc 770
      - /dev/dri/card1
      - /dev/dri/renderD129
  localrecall:
    image: quay.io/mudler/localrecall:main
    ports:
      - 8080
    environment:
      - COLLECTION_DB_PATH=/db
      - EMBEDDING_MODEL=granite-embedding-107m-multilingual
      - FILE_ASSETS=/assets
      - OPENAI_API_KEY=sk-1234567890
      - OPENAI_BASE_URL=http://localai:8080
    volumes:
      - ./volumes/localrag/db:/db
      - ./volumes/localrag/assets/:/assets
  localrecall-healthcheck:
    depends_on:
      localrecall:
        condition: service_started
    image: busybox
    command: ["sh", "-c", "until wget -q -O - http://localrecall:8080 > /dev/null 2>&1; do echo 'Waiting for localrecall...'; sleep 1; done; echo 'localrecall is up!'"]
  localagi:
    depends_on:
      localai:
        condition: service_healthy
      localrecall-healthcheck:
        condition: service_completed_successfully
    build:
      context: .
      dockerfile: Dockerfile.webui
    ports:
      - 8080:3000
    image: quay.io/mudler/localagi:master
    environment:
      - LOCALAGI_MODEL=openthinker-7b
      - LOCALAGI_LLM_API_URL=http://localai:8080
      #- LOCALAGI_LLM_API_KEY=sk-1234567890
      - LOCALAGI_LOCALRAG_URL=http://localrecall:8080
      - LOCALAGI_STATE_DIR=/pool
      - LOCALAGI_TIMEOUT=5m
      - LOCALAGI_ENABLE_CONVERSATIONS_LOGGING=false
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - ./volumes/localagi/:/pool


@@ -1,85 +0,0 @@
services:
  localai:
    # See https://localai.io/basics/container/#standard-container-images for
    # a list of available container images (or build your own with the provided Dockerfile)
    # Available images with CUDA, ROCm, SYCL, Vulkan
    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
    image: localai/localai:master-gpu-nvidia-cuda-12
    command:
      - mlabonne_gemma-3-27b-it-abliterated
      - qwen_qwq-32b
      # Other good alternative options:
      # - rombo-org_rombo-llm-v3.0-qwen-32b # minimum suggested model
      # - arcee-agent
      - granite-embedding-107m-multilingual
      - flux.1-dev
      - minicpm-v-2_6
    environment:
      # Enable if you have a single GPU which don't fit all the models
      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
      - DEBUG=true
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 10s
      timeout: 20m
      retries: 20
    ports:
      - 8081:8080
    volumes:
      - ./volumes/models:/build/models:cached
      - ./volumes/images:/tmp/generated/images
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  localrecall:
    image: quay.io/mudler/localrecall:main
    ports:
      - 8080
    environment:
      - COLLECTION_DB_PATH=/db
      - EMBEDDING_MODEL=granite-embedding-107m-multilingual
      - FILE_ASSETS=/assets
      - OPENAI_API_KEY=sk-1234567890
      - OPENAI_BASE_URL=http://localai:8080
    volumes:
      - ./volumes/localrag/db:/db
      - ./volumes/localrag/assets/:/assets
  localrecall-healthcheck:
    depends_on:
      localrecall:
        condition: service_started
    image: busybox
    command: ["sh", "-c", "until wget -q -O - http://localrecall:8080 > /dev/null 2>&1; do echo 'Waiting for localrecall...'; sleep 1; done; echo 'localrecall is up!'"]
  localagi:
    depends_on:
      localai:
        condition: service_healthy
      localrecall-healthcheck:
        condition: service_completed_successfully
    build:
      context: .
      dockerfile: Dockerfile.webui
    ports:
      - 8080:3000
    image: quay.io/mudler/localagi:master
    environment:
      - LOCALAGI_MODEL=qwen_qwq-32b
      - LOCALAGI_LLM_API_URL=http://localai:8080
      #- LOCALAGI_LLM_API_KEY=sk-1234567890
      - LOCALAGI_LOCALRAG_URL=http://localrecall:8080
      - LOCALAGI_STATE_DIR=/pool
      - LOCALAGI_TIMEOUT=5m
      - LOCALAGI_ENABLE_CONVERSATIONS_LOGGING=false
      - LOCALAGI_MULTIMODAL_MODEL=minicpm-v-2_6
      - LOCALAGI_IMAGE_MODEL=flux.1-dev
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - ./volumes/localagi/:/pool


@@ -24,14 +24,44 @@ services:
       - ./volumes/models:/build/models:cached
       - ./volumes/images:/tmp/generated/images
-    # decomment the following piece if running with Nvidia GPUs
-    # deploy:
-    #   resources:
-    #     reservations:
-    #       devices:
-    #         - driver: nvidia
-    #           count: 1
-    #           capabilities: [gpu]
+  localai-nvidia:
+    profiles: ["nvidia"]
+    extends:
+      service: localai
+    environment:
+      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
+      - DEBUG=true
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+    command:
+      - ${MODEL_NAME:-openthinker-7b}
+      - ${MULTIMODAL_MODEL:-minicpm-v-2_6}
+      - ${IMAGE_MODEL:-flux.1-dev}
+      - granite-embedding-107m-multilingual
+  localai-intel:
+    profiles: ["intel"]
+    environment:
+      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
+      - DEBUG=true
+    extends:
+      service: localai
+    image: localai/localai:master-sycl-f32-ffmpeg-core
+    devices:
+      # On a system with integrated GPU and an Arc 770, this is the Arc 770
+      - /dev/dri/card1
+      - /dev/dri/renderD129
+    command:
+      - ${MODEL_NAME:-openthinker-7b}
+      - ${MULTIMODAL_MODEL:-minicpm-v-2_6}
+      - ${IMAGE_MODEL:-sd-1.5-ggml}
+      - granite-embedding-107m-multilingual
   localrecall:
     image: quay.io/mudler/localrecall:main
     ports:
@@ -77,3 +107,31 @@ services:
       - "host.docker.internal:host-gateway"
     volumes:
       - ./volumes/localagi/:/pool
+  localagi-nvidia:
+    profiles: ["nvidia"]
+    extends:
+      service: localagi
+    environment:
+      - LOCALAGI_MODEL=${MODEL_NAME:-openthinker-7b}
+      - LOCALAGI_MULTIMODAL_MODEL=${MULTIMODAL_MODEL:-minicpm-v-2_6}
+      - LOCALAGI_IMAGE_MODEL=${IMAGE_MODEL:-flux.1-dev}
+      - LOCALAGI_LLM_API_URL=http://localai:8080
+      - LOCALAGI_LOCALRAG_URL=http://localrecall:8080
+      - LOCALAGI_STATE_DIR=/pool
+      - LOCALAGI_TIMEOUT=5m
+      - LOCALAGI_ENABLE_CONVERSATIONS_LOGGING=false
+  localagi-intel:
+    profiles: ["intel"]
+    extends:
+      service: localagi
+    environment:
+      - LOCALAGI_MODEL=${MODEL_NAME:-openthinker-7b}
+      - LOCALAGI_MULTIMODAL_MODEL=${MULTIMODAL_MODEL:-minicpm-v-2_6}
+      - LOCALAGI_IMAGE_MODEL=${IMAGE_MODEL:-sd-1.5-ggml}
+      - LOCALAGI_LLM_API_URL=http://localai:8080
+      - LOCALAGI_LOCALRAG_URL=http://localrecall:8080
+      - LOCALAGI_STATE_DIR=/pool
+      - LOCALAGI_TIMEOUT=5m
+      - LOCALAGI_ENABLE_CONVERSATIONS_LOGGING=false
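
The unified file relies on two Compose features: `extends` copies a base service definition into the extending service (mappings such as `environment` are merged, with the extending side taking precedence), and `profiles` keeps a service inactive unless its profile is requested. A minimal self-contained sketch of the pattern, with hypothetical service names:

```yaml
# Hypothetical minimal example of the extends + profiles pattern used above;
# service and image names are placeholders, not taken from this repository.
services:
  base:
    image: alpine:3
    environment:
      - COMMON_FLAG=1
  gpu-variant:
    profiles: ["gpu"]   # started only by: docker compose --profile gpu up
    extends:
      service: base     # inherits image and environment from "base"
    environment:
      - EXTRA_FLAG=1    # merged with COMMON_FLAG from the base service
```

`docker compose --profile gpu config` prints the fully merged configuration, which is a quick way to verify what a given profile actually launches.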