Tutorial: Implementing Retrieval-Augmented Generation (RAG) with Open WebUI, n8n, and InfluxDB

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by combining them with external data sources to provide contextually relevant and accurate responses. This tutorial guides you through setting up a fully local RAG system using user-friendly UI tools: Open WebUI for the chat interface and RAG pipeline, n8n for workflow automation, and InfluxDB for storing and managing metadata or logs. By the end, you'll have a functional RAG system that retrieves information from your documents and integrates it with a locally hosted LLM.


Prerequisites

Before starting, ensure you have:

  • Docker installed to run Open WebUI, n8n, and InfluxDB.
  • A locally hosted LLM (e.g., Llama 3.1 8B) running via Ollama.
  • A set of documents (e.g., PDFs, Markdown, or text files) to use as your knowledge base.
  • Basic familiarity with Docker and web interfaces.

Step 1: Set Up Open WebUI for the RAG Interface

Open WebUI is an open-source, self-hosted platform that provides a user-friendly interface for interacting with LLMs and implementing RAG.

  1. Install Open WebUI with Docker (the docker run command appears at the end of this step):
    • Access Open WebUI at http://localhost:8080 in your browser.
    • Create an admin account when prompted.
  2. Configure the LLM:
    • In Open WebUI, go to Settings > Connections and set the Ollama API endpoint to http://host.docker.internal:11434 (the /v1 suffix is only used for OpenAI-compatible connections, not the Ollama connection). Save the configuration.
  3. Upload Documents for RAG:
    • Navigate to Workspace > Documents in Open WebUI.
    • Upload your documents (e.g., PDFs or Markdown files) to the knowledge base.
    • Click Scan to process the documents and store their embeddings in the default vector database (Chroma, included with Open WebUI).
  4. Configure RAG Settings:
    • Go to Admin Panel > Settings > Documents.
    • Increase the context length to 8192+ tokens so there is room for retrieved chunks; Ollama defaults to 2048, which is often too small for RAG.
    • Optionally, change the embedding model for better retrieval accuracy (Open WebUI ships with sentence-transformers/all-MiniLM-L6-v2 by default).
    • Save the settings.
  5. Test RAG:
    • Start a new chat in Open WebUI and select your LLM model (e.g., Llama 3.1 8B).
    • Type # in the message box to attach an uploaded document or collection, then ask your question (e.g., attach your document and ask What is the main topic of this document?).
    • Verify that a document icon appears above the chat input, indicating the document is attached for retrieval. For a scripted check, see the sketch after this list.
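
For the scripted check, you can call Open WebUI's OpenAI-compatible chat endpoint directly. This is a minimal sketch, assuming an API key generated in Open WebUI under Settings > Account and that the /api/chat/completions route matches your version (verify against the API docs if it 404s):

import requests

# Placeholders: generate an API key in Open WebUI under Settings > Account.
OPENWEBUI_URL = "http://localhost:8080/api/chat/completions"
API_KEY = "YOUR_OPEN_WEBUI_API_KEY"

payload = {
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "What is the main topic of my uploaded document?"}],
}
resp = requests.post(OPENWEBUI_URL, json=payload,
                     headers={"Authorization": f"Bearer {API_KEY}"}, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])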

Pull and run the Open WebUI Docker image:

docker run -d -p 8080:8080 --add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data --name open-webui \
--restart always ghcr.io/open-webui/open-webui:main

Ensure Ollama is running locally (default port: 11434). Start it with a named volume so pulled models persist:

docker run -d -p 11434:11434 -v ollama:/root/.ollama --name ollama ollama/ollama

Pull a model (e.g., Llama 3.1 8B):

docker exec ollama ollama pull llama3.1:8b

Customize the RAG template for better response formatting. Use the following example:

**Generate Response to User Query**
**Step 1: Parse Context Information**
Extract and utilize relevant knowledge from the provided context within <context></context> tags.
**Step 2: Analyze User Query**
Carefully read and comprehend the user's query, pinpointing key concepts and intent.
**Step 3: Determine Response**
Provide a concise and accurate response in the same language as the query.
**Step 4: Handle Uncertainty**
If the answer is unclear, state that you don't know.
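
Before moving on, it's worth confirming the model is actually available. A quick check against Ollama's /api/tags endpoint:

import requests

# List the models the local Ollama instance knows about; expect llama3.1:8b here.
tags = requests.get("http://localhost:11434/api/tags", timeout=5).json()
print([m["name"] for m in tags.get("models", [])])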

Step 2: Automate Workflows with n8n

n8n is a low-code workflow automation platform that can enhance your RAG system by automating data ingestion, processing, or response handling.

  1. Install n8n with Docker (the docker run command appears at the end of this step):
    • Access n8n at http://localhost:5678 and set up a user account.
  2. Create a Workflow for Document Processing:
    • In the n8n dashboard, create a new workflow.
    • Add a Webhook node as the starting point:
      • Set the HTTP method to POST.
      • Copy the generated webhook URL for use in Open WebUI.
      • Set Response Mode to "Using 'Respond to Webhook' node".
    • Add an HTTP Request node to fetch documents from a source (e.g., a local file server or API).
      • Example: Use GET to retrieve Markdown files from a local server.
    • Add an Open WebUI node (if available) or use another HTTP Request node to send processed documents to Open WebUI’s API for ingestion:
      • Endpoint: http://host.docker.internal:8080/api/documents (inside the n8n container, localhost resolves to n8n itself; the exact route can differ between Open WebUI versions, so verify it against your instance’s API docs).
      • Method: POST.
      • Body: { "content": "{{$node['Code'].json.text}}" }.
  3. Connect n8n to Open WebUI:
    • In Open WebUI, go to Admin Panel > Functions and create a new Pipe function using the template shown below.
    • Replace YOUR_N8N_WEBHOOK_URL and YOUR_AUTH_KEY with your n8n webhook URL and a secure key.
    • Save the Pipe function as "n8n Assistant".
  4. Test the Workflow:
    • Trigger the n8n workflow by sending a test message from Open WebUI using the Pipe function.
    • Verify that documents are processed and sent to Open WebUI for RAG retrieval.

Use the following template to send messages to the n8n webhook and receive responses:

import requests

class Pipe:
    def __init__(self):
        # Valves hold user-editable settings; replace both placeholders before saving.
        self.valves = {
            "n8n_webhook_url": "YOUR_N8N_WEBHOOK_URL",
            "auth_key": "YOUR_AUTH_KEY"
        }

    async def pipe(self, body, user, *args, **kwargs):
        # Forward the chat body to the n8n webhook. The auth key must match
        # whatever check your workflow applies to the Authorization header.
        headers = {"Authorization": f"Bearer {self.valves['auth_key']}"}
        response = requests.post(self.valves["n8n_webhook_url"], json=body, headers=headers)
        # The Respond to Webhook node is expected to return {"response": "..."}.
        return response.json().get("response", "No response from n8n")
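
You can exercise the webhook without Open WebUI by posting a chat-shaped body from a short script. A minimal sketch; the URL, key, and payload shape are placeholders that must match your workflow:

import requests

# Placeholders: use the webhook URL copied from your n8n Webhook node.
WEBHOOK_URL = "http://localhost:5678/webhook/YOUR_PATH"
AUTH_KEY = "YOUR_AUTH_KEY"

body = {"messages": [{"role": "user", "content": "Ping from a test script"}]}
resp = requests.post(WEBHOOK_URL, json=body,
                     headers={"Authorization": f"Bearer {AUTH_KEY}"}, timeout=30)
print(resp.status_code, resp.json().get("response", "<no 'response' key>"))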

Add a Code node to preprocess documents (e.g., clean text or split into chunks):

// Code node, "Run Once for All Items" mode: emit one cleaned item per document.
return items.map(item => ({
  json: {
    text: item.json.content.replace(/[\r\n]+/g, ' ').trim() // Collapse newlines, trim whitespace
  }
}));

Run the n8n Docker container (the --add-host flag lets workflows reach host services such as Open WebUI via host.docker.internal):

docker run -d --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n \
--add-host=host.docker.internal:host-gateway \
--restart always n8nio/n8n

Step 3: Store Metadata with InfluxDB

InfluxDB can store metadata or logs from your RAG system, such as query timestamps, retrieved document IDs, or response times, for monitoring and analysis.

  1. Install InfluxDB with Docker (the docker run command appears at the end of this step):
    • Access InfluxDB at http://localhost:8086 and complete the initial setup (create an organization, bucket, and API token).
  2. Configure n8n to Log to InfluxDB:
    • In your n8n workflow, add a node that writes to InfluxDB after the Open WebUI node. n8n has no built-in InfluxDB node, so install a community node or use an HTTP Request node that POSTs line protocol to InfluxDB's /api/v2/write endpoint.
    • Configure the node:
      • URL: http://host.docker.internal:8086 (localhost inside the n8n container points at n8n itself).
      • Token: Your InfluxDB API token (sent as an Authorization: Token ... header).
      • Organization: Your InfluxDB organization name.
      • Bucket: Your InfluxDB bucket name (e.g., rag_logs).
      • Measurement: rag_events.
      • Tags: user_id={{ $json.user.id }}.
      • Fields: query={{ $json.body.query }}, response_time={{ Date.now() - $json.body.timestamp }}.
    • Save and activate the workflow. (A scripted way to write the same point is sketched after this list.)
  3. Query InfluxDB for Insights:
    • In the InfluxDB UI, go to Data Explorer.
    • Select the rag_logs bucket and query for metrics like average response time or query frequency.
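
To verify the bucket is writable (or to log directly without n8n), here is a minimal write sketch using the official influxdb-client package; YOUR_TOKEN and YOUR_ORG are the values from the initial setup:

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholders from the initial InfluxDB setup.
client = InfluxDBClient(url="http://localhost:8086", token="YOUR_TOKEN", org="YOUR_ORG")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("rag_events")
    .tag("user_id", "user-123")
    .field("query", "What is the main topic of my uploaded document?")
    .field("response_time", 842)  # e.g., milliseconds
)
write_api.write(bucket="rag_logs", record=point)
client.close()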

Example query (Flux) to average response_time over the last hour:

from(bucket: "rag_logs")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "rag_events" and r._field == "response_time")
  |> mean()
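
The same aggregation can be run from Python with the client's query API (token and org placeholders as in the write sketch above):

from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="YOUR_TOKEN", org="YOUR_ORG")
flux = '''
from(bucket: "rag_logs")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "rag_events" and r._field == "response_time")
  |> mean()
'''
for table in client.query_api().query(flux):
    for record in table.records:
        print("mean response_time:", record.get_value())
client.close()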

Run the InfluxDB Docker container (pinned to the 2.x image, which matches the token/org/bucket setup flow used here):

docker run -d -p 8086:8086 --name influxdb \
-v influxdb_data:/var/lib/influxdb2 influxdb:2
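
Before wiring InfluxDB into the workflow, you can confirm it is up via its /health endpoint:

import requests

# InfluxDB 2.x exposes a /health endpoint; expect status "pass".
print(requests.get("http://localhost:8086/health", timeout=5).json()["status"])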

Step 4: Test the RAG System

  1. Run a Query:
    • In Open WebUI, start a new chat, type # to attach your uploaded document, and ask a question (e.g., What is the main topic of my uploaded document?).
    • Verify that the system retrieves relevant document chunks and the LLM generates a concise response.
  2. Check Logs in InfluxDB:
    • Open the InfluxDB UI and confirm that query metadata (e.g., query text, response time) is logged.
    • Use the Data Explorer to visualize trends or identify bottlenecks.
  3. Automate Document Updates:
    • Schedule the n8n workflow to periodically fetch and process new documents (e.g., using a Schedule Trigger node set to run daily).
    • Ensure Open WebUI picks up new documents via the Scan button or its API; a polling sketch that feeds new files to the n8n webhook appears after this list.
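
The polling sketch mentioned above watches a local folder and posts new Markdown files to the n8n webhook. The payload shape ({"content": ...}) and auth header are assumptions that must match your workflow (note that n8n's Webhook node may nest the payload under body, in which case the Code node should read item.json.body.content):

import time
from pathlib import Path

import requests

WEBHOOK_URL = "http://localhost:5678/webhook/YOUR_PATH"  # placeholder
AUTH_KEY = "YOUR_AUTH_KEY"                               # placeholder
DOCS_DIR = Path("./docs")

seen = set()
while True:
    for path in DOCS_DIR.glob("*.md"):
        if path.name not in seen:
            # Send each new file's text through the same ingestion workflow.
            requests.post(
                WEBHOOK_URL,
                json={"content": path.read_text(encoding="utf-8")},
                headers={"Authorization": f"Bearer {AUTH_KEY}"},
                timeout=30,
            )
            seen.add(path.name)
    time.sleep(60)  # Poll once a minute; adjust as needed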

Step 5: Optimize and Troubleshoot

  • Increase Context Length: If responses lack context, raise the model’s context window (num_ctx) to 8192+ tokens, either in Open WebUI’s advanced model parameters or via an Ollama Modelfile; Ollama defaults to 2048.
  • Improve Embeddings: Use a higher-quality embedding model (e.g., instructor-xl) for better retrieval accuracy.
  • Monitor Performance: Use InfluxDB to track response times and optimize the workflow if delays occur.
  • Debug Retrieval Issues: If the LLM hallucinates or misses information, check the document ingestion process in Admin Panel > Settings > Documents and ensure the correct content extraction engine is used.

Conclusion

You’ve now set up a local RAG system using Open WebUI for the chat interface and RAG pipeline, n8n for workflow automation, and InfluxDB for logging metadata. This setup allows you to leverage your own documents to enhance LLM responses while maintaining data privacy. Experiment with different documents, embedding models, and n8n workflows to tailor the system to your needs.

For further exploration, check out:

  • Open WebUI RAG Documentation
  • n8n Documentation
  • InfluxDB Documentation

Happy building!
