Compare commits
2 Commits
plan-test ... reasoning_

| Author | SHA1 | Date |
|--------|------|------|
|        | e4271b4d2f |  |
|        | 9dad2b0ba4 |  |
.github/workflows/tests.yml (vendored, 2 changes)
````diff
@@ -3,7 +3,7 @@ name: Run Go Tests
 on:
   push:
     branches:
-      - 'main'
+      - '**'
   pull_request:
     branches:
       - '**'
````
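In the trigger above, `'**'` is a glob that matches every branch, so the `-`/`+` pair widens the push trigger from `main` only to pushes on any branch, mirroring the `pull_request` trigger below it.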
Makefile (2 changes)
````diff
@@ -3,7 +3,7 @@ IMAGE_NAME?=webui
 ROOT_DIR:=$(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
 
 prepare-tests:
-	docker compose up -d --build
+	docker compose up -d
 
 cleanup-tests:
 	docker compose down
````
README.md (129 changes)
````diff
@@ -1,5 +1,5 @@
 <p align="center">
-  <img src="./webui/react-ui/public/logo_1.png" alt="LocalAGI Logo" width="220"/>
+  <img src="https://github.com/user-attachments/assets/6958ffb3-31cf-441e-b99d-ce34ec6fc88f" alt="LocalAGI Logo" width="220"/>
 </p>
 
 <h3 align="center"><em>Your AI. Your Hardware. Your Rules.</em></h3>
````
````diff
@@ -45,129 +45,14 @@ LocalAGI ensures your data stays exactly where you want it—on your hardware. N
 git clone https://github.com/mudler/LocalAGI
 cd LocalAGI
 
-# CPU setup (default)
-docker compose up
+# CPU setup
+docker compose up -f docker-compose.yml
 
-# NVIDIA GPU setup
-docker compose -f docker-compose.nvidia.yaml up
-
-# Intel GPU setup (for Intel Arc and integrated GPUs)
-docker compose -f docker-compose.intel.yaml up
-
-# Start with a specific model (see available models in models.localai.io, or localai.io to use any model in huggingface)
-MODEL_NAME=gemma-3-12b-it docker compose up
-
-# NVIDIA GPU setup with custom multimodal and image models
-MODEL_NAME=gemma-3-12b-it \
-MULTIMODAL_MODEL=minicpm-v-2_6 \
-IMAGE_MODEL=flux.1-dev \
-docker compose -f docker-compose.nvidia.yaml up
+# GPU setup
+docker compose up -f docker-compose.gpu.yml
 ```
 
 Now you can access and manage your agents at [http://localhost:8080](http://localhost:8080)
 
-## 📚🆕 Local Stack Family
-
-🆕 LocalAI is now part of a comprehensive suite of AI tools designed to work together:
-
-<table>
-  <tr>
-    <td width="50%" valign="top">
-      <a href="https://github.com/mudler/LocalAI">
-        <img src="https://raw.githubusercontent.com/mudler/LocalAI/refs/heads/rebranding/core/http/static/logo_horizontal.png" width="300" alt="LocalAI Logo">
-      </a>
-    </td>
-    <td width="50%" valign="top">
-      <h3><a href="https://github.com/mudler/LocalRecall">LocalAI</a></h3>
-      <p>LocalAI is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI API specifications for local AI inferencing. Does not require GPU.</p>
-    </td>
-  </tr>
-  <tr>
-    <td width="50%" valign="top">
-      <a href="https://github.com/mudler/LocalRecall">
-        <img src="https://raw.githubusercontent.com/mudler/LocalRecall/refs/heads/main/static/localrecall_horizontal.png" width="300" alt="LocalRecall Logo">
-      </a>
-    </td>
-    <td width="50%" valign="top">
-      <h3><a href="https://github.com/mudler/LocalRecall">LocalRecall</a></h3>
-      <p>A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.</p>
-    </td>
-  </tr>
-</table>
-
-## 🖥️ Hardware Configurations
-
-LocalAGI supports multiple hardware configurations through Docker Compose profiles:
-
-### CPU (Default)
-- No special configuration needed
-- Runs on any system with Docker
-- Best for testing and development
-- Supports text models only
-
-### NVIDIA GPU
-- Requires NVIDIA GPU and drivers
-- Uses CUDA for acceleration
-- Best for high-performance inference
-- Supports text, multimodal, and image generation models
-- Run with: `docker compose -f docker-compose.nvidia.yaml up`
-- Default models:
-  - Text: `arcee-agent`
-  - Multimodal: `minicpm-v-2_6`
-  - Image: `flux.1-dev`
-- Environment variables:
-  - `MODEL_NAME`: Text model to use
-  - `MULTIMODAL_MODEL`: Multimodal model to use
-  - `IMAGE_MODEL`: Image generation model to use
-  - `LOCALAI_SINGLE_ACTIVE_BACKEND`: Set to `true` to enable single active backend mode
-
-### Intel GPU
-- Supports Intel Arc and integrated GPUs
-- Uses SYCL for acceleration
-- Best for Intel-based systems
-- Supports text, multimodal, and image generation models
-- Run with: `docker compose -f docker-compose.intel.yaml up`
-- Default models:
-  - Text: `arcee-agent`
-  - Multimodal: `minicpm-v-2_6`
-  - Image: `sd-1.5-ggml`
-- Environment variables:
-  - `MODEL_NAME`: Text model to use
-  - `MULTIMODAL_MODEL`: Multimodal model to use
-  - `IMAGE_MODEL`: Image generation model to use
-  - `LOCALAI_SINGLE_ACTIVE_BACKEND`: Set to `true` to enable single active backend mode
-
-## Customize models
-
-You can customize the models used by LocalAGI by setting environment variables when running docker-compose. For example:
-
-```bash
-# CPU with custom model
-MODEL_NAME=gemma-3-12b-it docker compose up
-
-# NVIDIA GPU with custom models
-MODEL_NAME=gemma-3-12b-it \
-MULTIMODAL_MODEL=minicpm-v-2_6 \
-IMAGE_MODEL=flux.1-dev \
-docker compose -f docker-compose.nvidia.yaml up
-
-# Intel GPU with custom models
-MODEL_NAME=gemma-3-12b-it \
-MULTIMODAL_MODEL=minicpm-v-2_6 \
-IMAGE_MODEL=sd-1.5-ggml \
-docker compose -f docker-compose.intel.yaml up
-```
-
-If no models are specified, it will use the defaults:
-- Text model: `arcee-agent`
-- Multimodal model: `minicpm-v-2_6`
-- Image model: `flux.1-dev` (NVIDIA) or `sd-1.5-ggml` (Intel)
-
-Good (relatively small) models that have been tested are:
-
-- `qwen_qwq-32b` (best in co-ordinating agents)
-- `gemma-3-12b-it`
-- `gemma-3-27b-it`
+Access your agents at `http://localhost:8080`
 
 ## 🏆 Why Choose LocalAGI?
````
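A small aside on the added lines: `docker compose` expects the `-f` file flag before the subcommand, so `docker compose up -f docker-compose.yml` and `docker compose up -f docker-compose.gpu.yml` are unlikely to run as written; the removed lines use the working order, e.g. `docker compose -f docker-compose.nvidia.yaml up`.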
````diff
@@ -213,8 +98,6 @@ Explore detailed documentation including:
 
 ### Environment Configuration
 
-LocalAGI supports environment configurations. Note that these environment variables needs to be specified in the localagi container in the docker-compose file to have effect.
-
 | Variable | What It Does |
 |----------|--------------|
 | `LOCALAGI_MODEL` | Your go-to model |
````
````diff
@@ -10,11 +10,12 @@ import (
 // NewGoal creates a new intention action
 // The inention action is special as it tries to identify
 // a tool to use and a reasoning over to use it
-func NewGoal() *GoalAction {
-	return &GoalAction{}
+func NewGoal(s ...string) *GoalAction {
+	return &GoalAction{tools: s}
 }
 
 type GoalAction struct {
+	tools []string
 }
 type GoalResponse struct {
 	Goal string `json:"goal"`
````
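For orientation: the `pickAction` hunk later in this diff unmarshals this tool's output into `action.GoalResponse` and branches on an `Achieved` flag that is not visible in the hunk above. A sketch of the shape the new code seems to rely on (the `Achieved` field and its JSON tag are inferred, not shown in this diff):

```go
package action

// GoalResponse as implied by this diff: "goal" is shown in the hunk above;
// "achieved" is inferred from the pickAction hunk below, which branches on
// goalResponse.Achieved. A sketch, not the repository's exact definition.
type GoalResponse struct {
	Goal     string `json:"goal"`
	Achieved bool   `json:"achieved"`
}
```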
````diff
@@ -41,7 +41,7 @@ func (a *PlanAction) Plannable() bool {
 func (a *PlanAction) Definition() types.ActionDefinition {
 	return types.ActionDefinition{
 		Name:        PlanActionName,
-		Description: "Use it for situations that involves doing more actions in sequence.",
+		Description: "Use this tool for solving complex tasks that involves calling more tools in sequence.",
 		Properties: map[string]jsonschema.Definition{
 			"subtasks": {
 				Type: jsonschema.Array,
````
````diff
@@ -24,27 +24,15 @@ type decisionResult struct {
 func (a *Agent) decision(
 	ctx context.Context,
 	conversation []openai.ChatCompletionMessage,
-	tools []openai.Tool, toolchoice string, maxRetries int) (*decisionResult, error) {
-
-	var choice *openai.ToolChoice
-
-	if toolchoice != "" {
-		choice = &openai.ToolChoice{
-			Type:     openai.ToolTypeFunction,
-			Function: openai.ToolFunction{Name: toolchoice},
-		}
-	}
+	tools []openai.Tool, toolchoice any, maxRetries int) (*decisionResult, error) {
 
 	var lastErr error
 	for attempts := 0; attempts < maxRetries; attempts++ {
 		decision := openai.ChatCompletionRequest{
-			Model:    a.options.LLMAPI.Model,
-			Messages: conversation,
-			Tools:    tools,
-		}
-
-		if choice != nil {
-			decision.ToolChoice = *choice
+			Model:      a.options.LLMAPI.Model,
+			Messages:   conversation,
+			Tools:      tools,
+			ToolChoice: toolchoice,
 		}
 
 		resp, err := a.client.CreateChatCompletion(ctx, decision)
@@ -54,9 +42,6 @@ func (a *Agent) decision(
 			continue
 		}
 
-		jsonResp, _ := json.Marshal(resp)
-		xlog.Debug("Decision response", "response", string(jsonResp))
-
 		if len(resp.Choices) != 1 {
 			lastErr = fmt.Errorf("no choices: %d", len(resp.Choices))
 			xlog.Warn("Attempt to make a decision failed", "attempt", attempts+1, "error", lastErr)
````
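The `toolchoice any` signature lines up with go-openai, where `ChatCompletionRequest.ToolChoice` is itself declared as `any` and accepts either a string mode (such as `"none"` or `"auto"`) or an `openai.ToolChoice` value naming one function. A minimal sketch of the two call shapes this enables (`decision` is the helper changed above; the wrapper function and the `"plan"` tool name are illustrative assumptions, and the import assumes the sashabaranov/go-openai client used throughout this diff):

```go
package agent

import (
	"context"

	"github.com/sashabaranov/go-openai"
)

// Sketch only: the two call shapes the new `toolchoice any` parameter allows.
func (a *Agent) exampleDecisionCalls(ctx context.Context,
	conv []openai.ChatCompletionMessage, tools []openai.Tool) error {

	// Pass nil to leave tool selection to the model; with `omitempty`,
	// a nil ToolChoice is simply dropped from the request JSON.
	if _, err := a.decision(ctx, conv, tools, nil, 3); err != nil {
		return err
	}

	// Force one specific function, as the generateParameters hunk below
	// now does inline.
	forced := openai.ToolChoice{
		Type:     openai.ToolTypeFunction,
		Function: openai.ToolFunction{Name: "plan"}, // hypothetical tool name
	}
	_, err := a.decision(ctx, conv, tools, forced, 3)
	return err
}
```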
````diff
@@ -204,7 +189,10 @@ func (a *Agent) generateParameters(ctx context.Context, pickTemplate string, act
 			result, attemptErr = a.decision(ctx,
 				cc,
 				a.availableActions().ToTools(),
-				act.Definition().Name.String(),
+				openai.ToolChoice{
+					Type:     openai.ToolTypeFunction,
+					Function: openai.ToolFunction{Name: act.Definition().Name.String()},
+				},
 				maxAttempts,
 			)
 			if attemptErr == nil && result.actionParams != nil {
````
````diff
@@ -265,7 +253,6 @@ func (a *Agent) handlePlanning(ctx context.Context, job *types.Job, chosenAction
 
 	params, err := a.generateParameters(ctx, pickTemplate, subTaskAction, conv, subTaskReasoning, maxRetries)
 	if err != nil {
 		xlog.Error("error generating action's parameters", "error", err)
 		return conv, fmt.Errorf("error generating action's parameters: %w", err)
-
 	}
````
````diff
@@ -295,7 +282,6 @@ func (a *Agent) handlePlanning(ctx context.Context, job *types.Job, chosenAction
 
 	result, err := a.runAction(ctx, subTaskAction, actionParams)
 	if err != nil {
 		xlog.Error("error running action", "error", err)
 		return conv, fmt.Errorf("error running action: %w", err)
 	}
-
````
````diff
@@ -381,9 +367,7 @@ func (a *Agent) prepareHUD() (promptHUD *PromptHUD) {
 func (a *Agent) pickAction(ctx context.Context, templ string, messages []openai.ChatCompletionMessage, maxRetries int) (types.Action, types.ActionParams, string, error) {
 	c := messages
 
-	xlog.Debug("[pickAction] picking action starts", "messages", messages)
-
-	// Identify the goal of this conversation
+	xlog.Debug("[pickAction] picking action", "messages", messages)
 
 	if !a.options.forceReasoning {
 		xlog.Debug("not forcing reasoning")
````
````diff
@@ -392,7 +376,7 @@ func (a *Agent) pickAction(ctx context.Context, templ string, messages []openai.
 		thought, err := a.decision(ctx,
 			messages,
 			a.availableActions().ToTools(),
-			"",
+			nil,
 			maxRetries)
 		if err != nil {
 			return nil, nil, "", err
````
````diff
@@ -431,83 +415,120 @@ func (a *Agent) pickAction(ctx context.Context, templ string, messages []openai.
 		}, c...)
 	}
 
-	thought, err := a.decision(ctx,
-		c,
-		types.Actions{action.NewReasoning()}.ToTools(),
-		action.NewReasoning().Definition().Name.String(), maxRetries)
-	if err != nil {
-		return nil, nil, "", err
-	}
-	originalReasoning := ""
-	response := &action.ReasoningResponse{}
-	if thought.actionParams != nil {
-		if err := thought.actionParams.Unmarshal(response); err != nil {
-			return nil, nil, "", err
-		}
-		originalReasoning = response.Reasoning
-	}
-	if thought.message != "" {
-		originalReasoning = thought.message
-	}
-
 	xlog.Debug("[pickAction] picking action", "messages", c)
+	// thought, err := a.askLLM(ctx,
+	// 	c,
 
-	actionsID := []string{"reply"}
+	actionsID := []string{}
 	for _, m := range a.availableActions() {
 		actionsID = append(actionsID, m.Definition().Name.String())
 	}
 
 	xlog.Debug("[pickAction] actionsID", "actionsID", actionsID)
+	// thoughtPromptStringBuilder := strings.Builder{}
+	// thoughtPromptStringBuilder.WriteString("You have to pick an action based on the conversation and the prompt. Describe the full reasoning process for your choice. Here is a list of actions: ")
+	// for _, m := range a.availableActions() {
+	// 	thoughtPromptStringBuilder.WriteString(
+	// 		m.Definition().Name.String() + ": " + m.Definition().Description + "\n",
+	// 	)
+	// }
 
-	intentionsTools := action.NewIntention(actionsID...)
-	// TODO: FORCE to select ana ction here
-	// NOTE: we do not give the full conversation here to pick the action
-	// to avoid hallucinations
+	// thoughtPromptStringBuilder.WriteString("To not use any action, respond with 'none'")
+
+	//thoughtPromptStringBuilder.WriteString("\n\nConversation: " + Messages(c).RemoveIf(func(msg openai.ChatCompletionMessage) bool {
+	//	return msg.Role == "system"
+	//}).String())
+
+	//thoughtPrompt := thoughtPromptStringBuilder.String()
+
+	//thoughtConv := []openai.ChatCompletionMessage{}
+
+	thought, err := a.askLLM(ctx,
+		c,
+		maxRetries,
+	)
+	if err != nil {
+		return nil, nil, "", err
+	}
+	originalReasoning := thought.Content
+
+	// From the thought, get the action call
+	// Get all the available actions IDs
+
+	// by grammar, let's decide if we have achieved the goal
+	// 1. analyze response and check if goal is achieved
 
 	// Extract an action
 	params, err := a.decision(ctx,
-		append(c, openai.ChatCompletionMessage{
-			Role:    "system",
-			Content: "Pick the relevant action given the following reasoning: " + originalReasoning,
-		}),
-		types.Actions{intentionsTools}.ToTools(),
-		intentionsTools.Definition().Name.String(), maxRetries)
+		[]openai.ChatCompletionMessage{
+			{
+				Role:    "system",
+				Content: "Extract an action to perform from the following reasoning: ",
+			},
+			{
+				Role:    "user",
+				Content: originalReasoning,
+			}},
+		types.Actions{action.NewGoal()}.ToTools(),
+		action.NewGoal().Definition().Name, maxRetries)
 	if err != nil {
 		return nil, nil, "", fmt.Errorf("failed to get the action tool parameters: %v", err)
 	}
 
 	if params.actionParams == nil {
 		xlog.Debug("[pickAction] no action params found")
 		return nil, nil, params.message, nil
 	}
 
-	actionChoice := action.IntentResponse{}
-	err = params.actionParams.Unmarshal(&actionChoice)
+	goalResponse := action.GoalResponse{}
+	err = params.actionParams.Unmarshal(&goalResponse)
 	if err != nil {
 		return nil, nil, "", err
 	}
 
-	if actionChoice.Tool == "" || actionChoice.Tool == "reply" {
-		xlog.Debug("[pickAction] no action found, replying")
+	if goalResponse.Achieved {
+		xlog.Debug("[pickAction] goal achieved", "goal", goalResponse.Goal)
 		return nil, nil, "", nil
 	}
 
-	chosenAction := a.availableActions().Find(actionChoice.Tool)
+	// if the goal is not achieved, pick an action
+	xlog.Debug("[pickAction] goal not achieved", "goal", goalResponse.Goal)
 
-	xlog.Debug("[pickAction] chosenAction", "chosenAction", chosenAction, "actionName", actionChoice.Tool)
 	xlog.Debug("[pickAction] thought", "conv", c, "originalReasoning", originalReasoning)
 
+	// // Let's double check if the action is correct by asking the LLM to judge it
+	// TODO: FORCE to select ana ction here
+	// NOTE: we do not give the full conversation here to pick the action
+	// to avoid hallucinations
+	params, err = a.decision(ctx,
+		[]openai.ChatCompletionMessage{
+			{
+				Role:    "system",
+				Content: "Extract an action to perform from the following reasoning: ",
+			},
+			{
+				Role:    "user",
+				Content: originalReasoning,
+			}},
+		a.availableActions().ToTools(),
+		nil, maxRetries)
+	if err != nil {
+		return nil, nil, "", fmt.Errorf("failed to get the action tool parameters: %v", err)
+	}
+
+	// if chosenAction!= nil {
+	// 	promptString:= "Given the following goal and thoughts, is the action correct? \n\n"
+	// 	promptString+= fmt.Sprintf("Goal: %s\n", goalResponse.Goal)
+	// 	promptString+= fmt.Sprintf("Thoughts: %s\n", originalReasoning)
+	// 	promptString+= fmt.Sprintf("Action: %s\n", chosenAction.Definition().Name.String())
+	// 	promptString+= fmt.Sprintf("Action description: %s\n", chosenAction.Definition().Description)
+	// 	promptString+= fmt.Sprintf("Action parameters: %s\n", params.actionParams)
+	chosenAction := a.availableActions().Find(params.actioName)
+
+	// xlog.Debug("[pickAction] params", "params", params)
+
+	// if params.actionParams == nil {
+	// 	return nil, nil, params.message, nil
+	// }
+
+	// xlog.Debug("[pickAction] actionChoice", "actionChoice", params.actionParams, "message", params.message)
+
+	// actionChoice := action.IntentResponse{}
+
+	// err = params.actionParams.Unmarshal(&actionChoice)
+	// if err != nil {
+	// 	return nil, nil, "", err
+	// }
+
+	// if actionChoice.Tool == "" || actionChoice.Tool == "none" {
+	// 	return nil, nil, "", nil
+	// }
+
+	// // Find the action
+	// chosenAction := a.availableActions().Find(actionChoice.Tool)
+	// if chosenAction == nil {
+	// 	return nil, nil, "", fmt.Errorf("no action found for intent:" + actionChoice.Tool)
+	// }
+
 	return chosenAction, nil, originalReasoning, nil
````
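Stripped of the commented-out experiments, the new forced-reasoning path reduces to three steps: free-form reasoning via `askLLM`, a forced call to the goal tool to test whether the goal is already achieved, and an unforced tool call over all available actions to pick the next one. A condensed sketch (error returns shortened, logging elided; `buildExtractPrompt` is an illustrative helper standing in for the inline message slices above, and all other identifiers come from the hunk):

```go
// buildExtractPrompt mirrors the inline message slices in the hunk above.
func buildExtractPrompt(reasoning string) []openai.ChatCompletionMessage {
	return []openai.ChatCompletionMessage{
		{Role: "system", Content: "Extract an action to perform from the following reasoning: "},
		{Role: "user", Content: reasoning},
	}
}

// Condensed sketch of the new pickAction flow shown in this hunk.
func (a *Agent) pickActionSketch(ctx context.Context, c []openai.ChatCompletionMessage, maxRetries int) (types.Action, string, error) {
	// 1. Free-form reasoning: askLLM returns a plain chat message.
	thought, err := a.askLLM(ctx, c, maxRetries)
	if err != nil {
		return nil, "", err
	}
	reasoning := thought.Content

	// 2. Forced goal check: decision() is pinned to the goal tool, and the
	//    unmarshalled response says whether anything is left to do.
	params, err := a.decision(ctx, buildExtractPrompt(reasoning),
		types.Actions{action.NewGoal()}.ToTools(),
		action.NewGoal().Definition().Name, maxRetries)
	if err != nil {
		return nil, "", err
	}
	goal := action.GoalResponse{}
	if err := params.actionParams.Unmarshal(&goal); err != nil {
		return nil, "", err
	}
	if goal.Achieved {
		return nil, "", nil // goal reached, nothing to pick
	}

	// 3. Unforced tool call: a second decision() over all tools, with the
	//    chosen action looked up by the name the model called.
	//    (The field really is spelled "actioName" in the diff above.)
	params, err = a.decision(ctx, buildExtractPrompt(reasoning),
		a.availableActions().ToTools(), nil, maxRetries)
	if err != nil {
		return nil, "", err
	}
	return a.availableActions().Find(params.actioName), reasoning, nil
}
```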
````diff
@@ -249,7 +249,7 @@ func (a *Agent) runAction(ctx context.Context, chosenAction types.Action, params
 		}
 	}
 
-	xlog.Info("[runAction] Running action", "action", chosenAction.Definition().Name, "agent", a.Character.Name, "params", params.String())
+	xlog.Info("Running action", "action", chosenAction.Definition().Name, "agent", a.Character.Name)
 
 	if chosenAction.Definition().Name.Is(action.StateActionName) {
 		// We need to store the result in the state
@@ -270,8 +270,6 @@ func (a *Agent) runAction(ctx context.Context, chosenAction types.Action, params
 		}
 	}
 
-	xlog.Debug("[runAction] Action result", "action", chosenAction.Definition().Name, "params", params.String(), "result", result.Result)
-
 	return result, nil
 }
 
@@ -605,13 +603,7 @@ func (a *Agent) consumeJob(job *types.Job, role string) {
 		var err error
 		conv, err = a.handlePlanning(job.GetContext(), job, chosenAction, actionParams, reasoning, pickTemplate, conv)
 		if err != nil {
 			xlog.Error("error handling planning", "error", err)
-			//job.Result.Conversation = conv
-			//job.Result.SetResponse(msg.Content)
-			a.reply(job, role, append(conv, openai.ChatCompletionMessage{
-				Role:    "assistant",
-				Content: fmt.Sprintf("Error handling planning: %v", err),
-			}), actionParams, chosenAction, reasoning)
 			job.Result.Finish(fmt.Errorf("error running action: %w", err))
 			return
 		}
@@ -697,6 +689,26 @@ func (a *Agent) consumeJob(job *types.Job, role string) {
 			job.SetNextAction(&followingAction, &followingParams, reasoning)
 			a.consumeJob(job, role)
 			return
+		} else if followingAction == nil {
+			xlog.Info("Not following another action", "agent", a.Character.Name)
+
+			if !a.options.forceReasoning {
+				xlog.Info("Finish conversation with reasoning", "reasoning", reasoning, "agent", a.Character.Name)
+
+				msg := openai.ChatCompletionMessage{
+					Role:    "assistant",
+					Content: reasoning,
+				}
+
+				conv = append(conv, msg)
+				job.Result.SetResponse(msg.Content)
+				job.Result.Conversation = conv
+				job.Result.AddFinalizer(func(conv []openai.ChatCompletionMessage) {
+					a.saveCurrentConversation(conv)
+				})
+				job.Result.Finish(nil)
+				return
+			}
 		}
 
 	a.reply(job, role, conv, actionParams, chosenAction, reasoning)
````
````diff
@@ -126,8 +126,6 @@ var _ = Describe("Agent test", func() {
 		agent, err := New(
 			WithLLMAPIURL(apiURL),
 			WithModel(testModel),
-			EnableForceReasoning,
 			WithTimeout("10m"),
-			WithLoopDetectionSteps(3),
 			// WithRandomIdentity(),
 			WithActions(&TestAction{response: map[string]string{
@@ -176,7 +174,7 @@ var _ = Describe("Agent test", func() {
 		agent, err := New(
 			WithLLMAPIURL(apiURL),
 			WithModel(testModel),
-			WithTimeout("10m"),
+
 			// WithRandomIdentity(),
 			WithActions(&TestAction{response: map[string]string{
 				"boston": testActionResult,
@@ -201,7 +199,6 @@ var _ = Describe("Agent test", func() {
 		agent, err := New(
 			WithLLMAPIURL(apiURL),
 			WithModel(testModel),
-			WithTimeout("10m"),
 			EnableHUD,
 			// EnableStandaloneJob,
 			// WithRandomIdentity(),
@@ -238,7 +235,7 @@ var _ = Describe("Agent test", func() {
 		defer agent.Stop()
 
 		result := agent.Ask(
-			types.WithText("Thoroughly plan a trip to San Francisco from Venice, Italy; check flight times, visa requirements and whether electrical items are allowed in cabin luggage."),
+			types.WithText("plan a trip to San Francisco from Venice, Italy"),
 		)
 		Expect(len(result.State)).To(BeNumerically(">", 1))
 
@@ -260,7 +257,6 @@ var _ = Describe("Agent test", func() {
 			WithLLMAPIURL(apiURL),
 			WithModel(testModel),
-			WithLLMAPIKey(apiKeyURL),
 			WithTimeout("10m"),
 			WithNewConversationSubscriber(func(m openai.ChatCompletionMessage) {
 				mu.Lock()
 				message = m
````
````diff
@@ -115,7 +115,7 @@ Available Tools:
 const reSelfEvalTemplate = pickSelfTemplate
 
 const pickActionTemplate = hudTemplate + `
-Your only task is to analyze the conversation and determine a goal and the best tool to use, or just a final response if we have fullfilled the goal.
+Your only task is to analyze the situation and determine a goal and the best tool to use, or just a final response if we have fullfilled the goal.
 
 Guidelines:
 1. Review the current state, what was done already and context
````
docker-compose.gpu.intel.yaml (new file, 75 lines)
````diff
@@ -0,0 +1,75 @@
+services:
+  localai:
+    # See https://localai.io/basics/container/#standard-container-images for
+    # a list of available container images (or build your own with the provided Dockerfile)
+    # Available images with CUDA, ROCm, SYCL, Vulkan
+    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
+    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
+    image: localai/localai:master-sycl-f32-ffmpeg-core
+    command:
+      # - rombo-org_rombo-llm-v3.0-qwen-32b # minimum suggested model
+      - arcee-agent # (smaller)
+      - granite-embedding-107m-multilingual
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
+      interval: 60s
+      timeout: 10m
+      retries: 120
+    ports:
+      - 8081:8080
+    environment:
+      - DEBUG=true
+      #- LOCALAI_API_KEY=sk-1234567890
+    volumes:
+      - ./volumes/models:/build/models:cached
+      - ./volumes/images:/tmp/generated/images
+    devices:
+      # On a system with integrated GPU and an Arc 770, this is the Arc 770
+      - /dev/dri/card1
+      - /dev/dri/renderD129
+
+  localrecall:
+    image: quay.io/mudler/localrecall:main
+    ports:
+      - 8080
+    environment:
+      - COLLECTION_DB_PATH=/db
+      - EMBEDDING_MODEL=granite-embedding-107m-multilingual
+      - FILE_ASSETS=/assets
+      - OPENAI_API_KEY=sk-1234567890
+      - OPENAI_BASE_URL=http://localai:8080
+    volumes:
+      - ./volumes/localrag/db:/db
+      - ./volumes/localrag/assets/:/assets
+
+  localrecall-healthcheck:
+    depends_on:
+      localrecall:
+        condition: service_started
+    image: busybox
+    command: ["sh", "-c", "until wget -q -O - http://localrecall:8080 > /dev/null 2>&1; do echo 'Waiting for localrecall...'; sleep 1; done; echo 'localrecall is up!'"]
+
+  localagi:
+    depends_on:
+      localai:
+        condition: service_healthy
+      localrecall-healthcheck:
+        condition: service_completed_successfully
+    build:
+      context: .
+      dockerfile: Dockerfile.webui
+    ports:
+      - 8080:3000
+    image: quay.io/mudler/localagi:master
+    environment:
+      - LOCALAGI_MODEL=arcee-agent
+      - LOCALAGI_LLM_API_URL=http://localai:8080
+      #- LOCALAGI_LLM_API_KEY=sk-1234567890
+      - LOCALAGI_LOCALRAG_URL=http://localrecall:8080
+      - LOCALAGI_STATE_DIR=/pool
+      - LOCALAGI_TIMEOUT=5m
+      - LOCALAGI_ENABLE_CONVERSATIONS_LOGGING=false
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    volumes:
+      - ./volumes/localagi/:/pool
````
docker-compose.gpu.yaml (new file, 85 lines)
````diff
@@ -0,0 +1,85 @@
+services:
+  localai:
+    # See https://localai.io/basics/container/#standard-container-images for
+    # a list of available container images (or build your own with the provided Dockerfile)
+    # Available images with CUDA, ROCm, SYCL, Vulkan
+    # Image list (quay.io): https://quay.io/repository/go-skynet/local-ai?tab=tags
+    # Image list (dockerhub): https://hub.docker.com/r/localai/localai
+    image: localai/localai:master-gpu-nvidia-cuda-12
+    command:
+      - mlabonne_gemma-3-27b-it-abliterated
+      - qwen_qwq-32b
+      # Other good alternative options:
+      # - rombo-org_rombo-llm-v3.0-qwen-32b # minimum suggested model
+      # - arcee-agent
+      - granite-embedding-107m-multilingual
+      - flux.1-dev
+      - minicpm-v-2_6
+    environment:
+      # Enable if you have a single GPU which don't fit all the models
+      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
+      - DEBUG=true
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
+      interval: 10s
+      timeout: 20m
+      retries: 20
+    ports:
+      - 8081:8080
+    volumes:
+      - ./volumes/models:/build/models:cached
+      - ./volumes/images:/tmp/generated/images
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 1
+              capabilities: [gpu]
+  localrecall:
+    image: quay.io/mudler/localrecall:main
+    ports:
+      - 8080
+    environment:
+      - COLLECTION_DB_PATH=/db
+      - EMBEDDING_MODEL=granite-embedding-107m-multilingual
+      - FILE_ASSETS=/assets
+      - OPENAI_API_KEY=sk-1234567890
+      - OPENAI_BASE_URL=http://localai:8080
+    volumes:
+      - ./volumes/localrag/db:/db
+      - ./volumes/localrag/assets/:/assets
+
+  localrecall-healthcheck:
+    depends_on:
+      localrecall:
+        condition: service_started
+    image: busybox
+    command: ["sh", "-c", "until wget -q -O - http://localrecall:8080 > /dev/null 2>&1; do echo 'Waiting for localrecall...'; sleep 1; done; echo 'localrecall is up!'"]
+
+  localagi:
+    depends_on:
+      localai:
+        condition: service_healthy
+      localrecall-healthcheck:
+        condition: service_completed_successfully
+    build:
+      context: .
+      dockerfile: Dockerfile.webui
+    ports:
+      - 8080:3000
+    image: quay.io/mudler/localagi:master
+    environment:
+      - LOCALAGI_MODEL=qwen_qwq-32b
+      - LOCALAGI_LLM_API_URL=http://localai:8080
+      #- LOCALAGI_LLM_API_KEY=sk-1234567890
+      - LOCALAGI_LOCALRAG_URL=http://localrecall:8080
+      - LOCALAGI_STATE_DIR=/pool
+      - LOCALAGI_TIMEOUT=5m
+      - LOCALAGI_ENABLE_CONVERSATIONS_LOGGING=false
+      - LOCALAGI_MULTIMODAL_MODEL=minicpm-v-2_6
+      - LOCALAGI_IMAGE_MODEL=flux.1-dev
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    volumes:
+      - ./volumes/localagi/:/pool
````
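Assuming the file names above, these new configurations would be launched with `docker compose -f docker-compose.gpu.yaml up` (NVIDIA) or `docker compose -f docker-compose.gpu.intel.yaml up` (Intel SYCL); note the `.yaml` spellings here differ from the `docker-compose.gpu.yml` name the updated README references.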
````diff
@@ -1,33 +0,0 @@
-services:
-  localai:
-    extends:
-      file: docker-compose.yaml
-      service: localai
-    environment:
-      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
-      - DEBUG=true
-    image: localai/localai:master-sycl-f32-ffmpeg-core
-    devices:
-      # On a system with integrated GPU and an Arc 770, this is the Arc 770
-      - /dev/dri/card1
-      - /dev/dri/renderD129
-    command:
-      - ${MODEL_NAME:-arcee-agent}
-      - ${MULTIMODAL_MODEL:-minicpm-v-2_6}
-      - ${IMAGE_MODEL:-sd-1.5-ggml}
-      - granite-embedding-107m-multilingual
-
-  localrecall:
-    extends:
-      file: docker-compose.yaml
-      service: localrecall
-
-  localrecall-healthcheck:
-    extends:
-      file: docker-compose.yaml
-      service: localrecall-healthcheck
-
-  localagi:
-    extends:
-      file: docker-compose.yaml
-      service: localagi
````
````diff
@@ -1,31 +0,0 @@
-services:
-  localai:
-    extends:
-      file: docker-compose.yaml
-      service: localai
-    environment:
-      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
-      - DEBUG=true
-    image: localai/localai:master-sycl-f32-ffmpeg-core
-    deploy:
-      resources:
-        reservations:
-          devices:
-            - driver: nvidia
-              count: 1
-              capabilities: [gpu]
-
-  localrecall:
-    extends:
-      file: docker-compose.yaml
-      service: localrecall
-
-  localrecall-healthcheck:
-    extends:
-      file: docker-compose.yaml
-      service: localrecall-healthcheck
-
-  localagi:
-    extends:
-      file: docker-compose.yaml
-      service: localagi
````
````diff
@@ -7,9 +7,7 @@ services:
     # Image list (dockerhub): https://hub.docker.com/r/localai/localai
     image: localai/localai:master-ffmpeg-core
     command:
-      - ${MODEL_NAME:-arcee-agent}
-      - ${MULTIMODAL_MODEL:-minicpm-v-2_6}
-      - ${IMAGE_MODEL:-flux.1-dev}
+      - arcee-agent # (smaller)
      - granite-embedding-107m-multilingual
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
@@ -25,6 +23,14 @@ services:
       - ./volumes/models:/build/models:cached
       - ./volumes/images:/tmp/generated/images
 
+    # decomment the following piece if running with Nvidia GPUs
+    # deploy:
+    #   resources:
+    #     reservations:
+    #       devices:
+    #         - driver: nvidia
+    #           count: 1
+    #           capabilities: [gpu]
   localrecall:
     image: quay.io/mudler/localrecall:main
     ports:
@@ -59,9 +65,7 @@ services:
       - 8080:3000
     #image: quay.io/mudler/localagi:master
     environment:
-      - LOCALAGI_MODEL=${MODEL_NAME:-arcee-agent}
-      - LOCALAGI_MULTIMODAL_MODEL=${MULTIMODAL_MODEL:-minicpm-v-2_6}
-      - LOCALAGI_IMAGE_MODEL=${IMAGE_MODEL:-sd-1.5-ggml}
+      - LOCALAGI_MODEL=arcee-agent
       - LOCALAGI_LLM_API_URL=http://localai:8080
       #- LOCALAGI_LLM_API_KEY=sk-1234567890
       - LOCALAGI_LOCALRAG_URL=http://localrecall:8080
````