How to Build a Local AI Knowledge Assistant for Your Company Docs

Introduction

Teams often struggle to find the right information buried across Confluence pages, Notion docs, wikis, or GitHub READMEs. Searching manually wastes time and disrupts focus.
An internal AI knowledge assistant solves this by indexing your organization’s content and allowing anyone to query it conversationally, without sending private data outside your environment.

In this guide, you’ll build a local retrieval-augmented generation (RAG) system that answers questions using your company’s documentation, all hosted within your secure infrastructure.

What You’ll Build

A private AI assistant that:

  • Reads and indexes internal documents (Markdown, PDF, Notion, Confluence)
  • Converts them into searchable vector embeddings
  • Answers questions by retrieving relevant passages and generating context-aware responses
  • Runs locally (no external API dependency if you choose a local LLM)

Step 1: Set Up Your Environment

Prerequisites

  • Python 3.9+
  • Docker (optional, for local LLM or Qdrant)
  • Basic understanding of APIs and environment variables

Install Required Packages

pip install langchain llama-index openai chromadb qdrant-client fastapi uvicorn sentence-transformers

If you prefer a local LLM (for example, using Ollama):

curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral

Step 2: Ingest and Clean Your Documents

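Create a Python script, ingest_docs.py, that collects company documentation and prepares it for embedding. The imports in this guide use the classic langchain package layout; recent LangChain releases move loaders, embeddings, vector stores, and LLM wrappers into langchain_community.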

from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every Markdown file under ./docs (this glob does not match .txt files)
loader = DirectoryLoader('./docs', glob='**/*.md', loader_cls=TextLoader)
documents = loader.load()

# Split long documents into chunks for embeddings
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

print(f"Loaded {len(documents)} docs, split into {len(chunks)} chunks.")
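To make chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-size sliding-window splitter. It is a simplification: RecursiveCharacterTextSplitter also tries to break on paragraph and sentence separators, which this hypothetical sliding_chunks helper does not do.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters.

    Hypothetical helper for illustration; not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 2500
parts = sliding_chunks(sample)
print([len(p) for p in parts])  # prints [1000, 1000, 900]
```

Each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.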

You can extend this using other loaders:

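  • NotionDirectoryLoader (exported pages) or NotionDBLoader (Notion API) from langchain.document_loaders
  • ConfluenceLoader for corporate wiki integration
  • PyPDFLoader for text-based PDFs (scanned reports need OCR before they can be indexed)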
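Whichever loader you use, the raw text usually benefits from a cleaning pass before it is split. The sketch below is one possible approach, assuming Markdown sources with optional YAML front matter; clean_markdown is a hypothetical helper and its rules are illustrative, not exhaustive.

```python
import re


def clean_markdown(text: str) -> str:
    """Normalize a Markdown document before chunking (illustrative rules only)."""
    # Drop a leading YAML front-matter block delimited by '---' lines.
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    # Remove HTML comments, which often hold editor notes rather than content.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)
    # Strip trailing spaces and tabs on every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


raw = "---\ntitle: VPN\n---\n# VPN Setup  \n\n\n\nUse WireGuard.<!-- draft -->\n"
print(repr(clean_markdown(raw)))  # prints '# VPN Setup\n\nUse WireGuard.'
```

In ingest_docs.py, you could apply it to each loaded document's page_content before calling the splitter.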

Step 3: Generate Embeddings and Store Them

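Use either SentenceTransformers locally or OpenAI embeddings. The local option keeps document text inside your environment; hosted embeddings may improve retrieval quality on some corpora, but every chunk is sent to the provider.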

from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Qdrant

embedding_fn = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Connect to local Qdrant instance
qdrant = Qdrant.from_documents(
    documents=chunks,
    embedding=embedding_fn,
    url="http://localhost:6333",
    collection_name="company_knowledge"
)

print("Embeddings created and stored successfully.")

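To run Qdrant locally (start it before running the script above, and mount a volume so the index survives container restarts):

docker run -p 6333:6333 -v "$(pwd)/vectorstore:/qdrant/storage" qdrant/qdrant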
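If it helps to see what the vector store does at query time, the following is a minimal brute-force sketch of similarity search. The three-dimensional vectors are made up for illustration (all-MiniLM-L6-v2 produces 384-dimensional vectors), and the top_k helper is hypothetical; Qdrant ranks results in the same spirit but uses an approximate index instead of scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings": hand-written vectors, not real model output.
stored = [
    ("Deploy guide", [0.9, 0.1, 0.0]),
    ("VPN setup", [0.0, 0.2, 0.9]),
    ("Holiday policy", [0.1, 0.9, 0.1]),
]
query = [0.8, 0.2, 0.0]
print([text for _, text in top_k(query, stored, k=2)])
# prints ['Deploy guide', 'Holiday policy']
```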

Step 4: Query the Assistant Locally

Now you’ll connect the retrieval layer to an LLM using LangChain’s RetrievalQA.

from langchain.chains import RetrievalQA
from langchain.llms import Ollama

llm = Ollama(model="mistral")  # local LLM via Ollama
retriever = qdrant.as_retriever(search_kwargs={"k": 3})

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff"
)

question = "How do we deploy our production environment?"
answer = qa.run(question)

print("Answer:", answer)

If you prefer to use OpenAI’s API instead:

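from langchain.chat_models import ChatOpenAI

# gpt-4-turbo is a chat model, so use the chat wrapper; it reads OPENAI_API_KEY from the environment
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)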
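The chain_type="stuff" setting in the RetrievalQA example means the retrieved passages are simply concatenated ("stuffed") into a single prompt together with the question. A rough sketch of that idea follows; build_stuff_prompt is a hypothetical helper, and the prompt wording is an assumption rather than LangChain's actual template.

```python
def build_stuff_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n\n".join(passages)  # every passage goes into the same prompt
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "How do we deploy our production environment?",
    ["Deployments run from the main branch.", "Production uses a blue/green rollout."],
)
print(prompt)
```

Because everything lands in one prompt, k and chunk_size together must fit inside the model's context window, which is why the retriever above asks for only three chunks.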

Step 5: Serve It Through a Chat Interface

Use FastAPI for a lightweight REST service or Streamlit for a chat UI.

FastAPI Example

from fastapi import FastAPI
from pydantic import BaseModel

# `qa` is the RetrievalQA chain from Step 4; build or import it in app.py before defining the routes
app = FastAPI()

class QueryRequest(BaseModel):
    question: str

@app.post("/ask")
def ask(request: QueryRequest):
    answer = qa.run(request.question)
    return {"answer": answer}

Run with:

uvicorn app:app --host 0.0.0.0 --port 8000

Now you can query it via:

curl -X POST "http://localhost:8000/ask" -H "Content-Type: application/json" -d '{"question":"What is our VPN setup?"}'
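The same request can be sent from Python using only the standard library, which may be handy for scripts or tests. This is a small sketch that assumes the service is reachable at http://localhost:8000 as in the curl example; build_ask_request and ask are hypothetical helper names.

```python
import json
import urllib.request


def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Create the POST request for the /ask endpoint without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> str:
    """Send the question and return the "answer" field of the JSON response."""
    request = build_ask_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["answer"]


# Example (requires the FastAPI service to be running):
#   print(ask("What is our VPN setup?"))
```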

Step 6: Secure and Extend

  • Authentication: Add token-based access control in FastAPI.
  • Scheduling: Re-index docs nightly using a cron job.
  • Versioning: Store embedding metadata (document path, hash, version).
  • UI: Build a Streamlit or React chat front-end using /ask API.
  • Caching: Implement Redis or SQLite caching for repeated queries.
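For the scheduling and versioning items, one common approach is to fingerprint each file and re-embed only what changed since the last run. The sketch below assumes Markdown files under a docs directory and that you persist the returned hashes yourself (for example, alongside the embedding metadata); file_fingerprint and find_changed_files are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_changed_files(docs_dir: Path, known_hashes: dict[str, str]) -> dict[str, str]:
    """Return {relative_path: sha256} for Markdown files that are new or modified."""
    changed = {}
    for path in sorted(docs_dir.rglob("*.md")):
        rel = path.relative_to(docs_dir).as_posix()
        digest = file_fingerprint(path)
        if known_hashes.get(rel) != digest:
            changed[rel] = digest
    return changed


# Demo in a temporary directory: first run, unchanged run, then an edited file.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "deploy.md").write_text("v1", encoding="utf-8")
    first_run = find_changed_files(docs, {})
    second_run = find_changed_files(docs, first_run)
    (docs / "deploy.md").write_text("v2", encoding="utf-8")
    third_run = find_changed_files(docs, first_run)

print(list(first_run), list(second_run), list(third_run))
# prints ['deploy.md'] [] ['deploy.md']
```

A nightly cron job could call find_changed_files and pass only the changed paths to the ingestion step, instead of re-embedding the whole corpus.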

Step 7: Keep It Private and Compliant

To ensure data privacy:

  • Use local embedding models and local LLMs when possible.
  • If using OpenAI or Anthropic APIs, redact sensitive content before sending.
  • Log all queries and responses for transparency.
  • Ensure compliance with internal data policies and GDPR requirements.
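As an illustration of the redaction step, here is a minimal regex-based sketch that masks email addresses and key/value style secrets before text leaves your environment. The patterns and the redact helper are assumptions for demonstration only; a real deployment would need rules tuned to your own data and should not rely on these two patterns alone.

```python
import re

# Illustrative patterns only; they will miss many kinds of sensitive data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password)\b(\s*[:=]\s*)(\S+)")


def redact(text: str) -> str:
    """Mask email addresses and values that follow api_key/token/password labels."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    # Keep the label and separator, replace only the secret value.
    text = SECRET_RE.sub(r"\1\2[REDACTED_SECRET]", text)
    return text


print(redact("Contact ops@example.com, api_key=abc123"))
# prints Contact [REDACTED_EMAIL], api_key=[REDACTED_SECRET]
```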

Example Folder Structure

ai-knowledge-assistant/
│
├── docs/                      # Company documentation
├── ingest_docs.py
├── app.py                     # FastAPI service
├── requirements.txt
├── vectorstore/               # Local Qdrant data
└── config.env                 # API keys, paths, etc.

Troubleshooting

Problem | Cause | Fix
Embeddings take too long | Large docs or remote embedding model | Use local SentenceTransformer
Wrong or outdated answers | Old embeddings | Re-run ingest_docs.py regularly
Incomplete responses | Context window too small | Use larger model or chunk overlap 300+
Docker memory issues | Qdrant indexing large corpus | Increase Docker memory limit

Conclusion

By following this tutorial, you’ve built a private AI knowledge assistant that understands your company’s internal documents, runs locally, and can be extended to any department.

This setup cuts the time spent on repetitive searches and can grow with your needs: you can connect more data sources, plug in a front-end chat UI, or fine-tune models on internal phrasing and acronyms.

Ivan Dabić

Co-founder and CEO of BlueGrid.io, with a background in cloud infrastructure, distributed systems, monitoring, and security operations. He works closely with engineering teams to build and operate reliable systems while documenting both technical and organizational aspects of modern engineering work.

Ivan is a metalhead and a big fan of the cyberpunk movie genre. If you are his Secret Santa, go with a Star Wars Lego box!
