Building a PostgreSQL Semantic Search Brain for AI Agents

AI agents operating in business contexts need access to organizational knowledge: policies, procedures, technical documentation, historical decisions. Embedding this knowledge and enabling semantic retrieval allows agents to provide contextually relevant responses.

This document describes the implementation of a local PostgreSQL-based semantic search system using the pgvector extension.

Architecture Overview

The system consists of:

  1. PostgreSQL 17 with pgvector extension — stores embeddings and metadata
  2. Ingestion pipeline — chunks documents and generates embeddings
  3. Retrieval interface — semantic search against the knowledge base
  4. Maintenance procedures — keeping knowledge current

Why PostgreSQL?

Several dedicated vector databases exist (Pinecone, Weaviate, Milvus). PostgreSQL with pgvector was chosen for:

  • Existing infrastructure — Already running PostgreSQL for operational data
  • Transactional integrity — Embeddings and metadata update atomically
  • Query flexibility — Combine vector similarity with SQL filters
  • Cost — No additional service fees
  • Control — Data remains local, no external dependencies

Database Setup

Extension Installation

```sql
-- Enable the vector extension
CREATE EXTENSION IF NOT EXISTS vector;
```

Schema Definition

```sql
CREATE TABLE knowledge_chunks (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category TEXT NOT NULL,
    source_file TEXT,
    chunk_index INTEGER,
    embedding vector(1536),
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Vector similarity index
CREATE INDEX idx_knowledge_embedding
ON knowledge_chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Category filtering
CREATE INDEX idx_knowledge_category ON knowledge_chunks(category);

-- Source tracking
CREATE INDEX idx_knowledge_source ON knowledge_chunks(source_file);
```

Index Configuration

The IVFFlat index requires tuning based on data volume:

```
lists = sqrt(num_vectors)
```

For ~10,000 vectors: lists = 100
For ~100,000 vectors: lists = 316

Higher list counts improve recall at the cost of index size and build time.
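
The heuristic can be scripted for re-index runs as the table grows. A minimal sketch; `rebuild_ivfflat_index` is a hypothetical helper, not part of the schema above:

```python
import math

def rebuild_ivfflat_index(conn) -> None:
    """Recreate the IVFFlat index with lists ~ sqrt(row count)."""
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM knowledge_chunks")
        lists = max(1, round(math.sqrt(cur.fetchone()[0])))
        cur.execute("DROP INDEX IF EXISTS idx_knowledge_embedding")
        # lists is an int we just computed, so f-string interpolation is safe here
        cur.execute(
            f"""CREATE INDEX idx_knowledge_embedding
                ON knowledge_chunks
                USING ivfflat (embedding vector_cosine_ops)
                WITH (lists = {lists})"""
        )
    conn.commit()
```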

Ingestion Pipeline

Document Chunking

Large documents must be split into chunks that fit embedding model context limits while preserving semantic coherence:

```python
def chunk_document(
    text: str,
    chunk_size: int = 1000,
    overlap: int = 200
) -> list[str]:
    """Split document into overlapping chunks."""
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        chunks.append(chunk)
        start = end - overlap
    return chunks
```

Overlap ensures context isn't lost at chunk boundaries.
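
A quick sanity check of the boundary behavior (illustrative numbers only):

```python
text = "x" * 2200
chunks = chunk_document(text, chunk_size=1000, overlap=200)
print([len(c) for c in chunks])  # [1000, 1000, 600]
# Consecutive chunks share 200 characters, so a sentence cut at one
# chunk boundary still appears intact at the start of the next chunk.
```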

Embedding Generation

```python
from openai import OpenAI

client = OpenAI()

def generate_embedding(text: str) -> list[float]:
    """Generate embedding using OpenAI's model."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding
```

Model selection considerations:

  • text-embedding-3-small — 1536 dimensions, cost-effective
  • text-embedding-3-large — 3072 dimensions, higher quality

The smaller model provides sufficient quality for operational knowledge retrieval.

Complete Ingestion

```python
import json
import os

import psycopg2
from psycopg2.extras import execute_values

def ingest_document(
    filepath: str,
    category: str,
    conn: psycopg2.extensions.connection
) -> int:
    """Ingest a document into the knowledge base."""
    with open(filepath, 'r') as f:
        content = f.read()

    chunks = chunk_document(content)

    records = []
    for i, chunk in enumerate(chunks):
        embedding = generate_embedding(chunk)
        records.append((
            chunk,
            category,
            filepath,
            i,
            embedding,
            json.dumps({"filename": os.path.basename(filepath)})
        ))

    with conn.cursor() as cur:
        execute_values(
            cur,
            """
            INSERT INTO knowledge_chunks
                (content, category, source_file, chunk_index, embedding, metadata)
            VALUES %s
            """,
            records,
            template="(%s, %s, %s, %s, %s::vector, %s::jsonb)"
        )
    conn.commit()
    return len(chunks)
```

Retrieval Interface

Basic Semantic Search

```python
def search(
    query: str,
    conn: psycopg2.extensions.connection,
    top_k: int = 5,
    category: str | None = None
) -> list[dict]:
    """Search knowledge base semantically."""
    query_embedding = generate_embedding(query)

    sql = """
        SELECT content, category, source_file,
               1 - (embedding <=> %s::vector) AS similarity
        FROM knowledge_chunks
        WHERE 1=1
    """
    params = [query_embedding]

    if category:
        sql += " AND category = %s"
        params.append(category)

    sql += """
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """
    params.extend([query_embedding, top_k])

    with conn.cursor() as cur:
        cur.execute(sql, params)
        results = []
        for row in cur.fetchall():
            results.append({
                "content": row[0],
                "category": row[1],
                "source": row[2],
                "similarity": float(row[3])
            })
    return results
```
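
A hypothetical call, assuming an open connection:

```python
conn = psycopg2.connect("dbname=knowledge")  # assumed DSN

for hit in search("overtime pay rules", conn, top_k=3, category="legal"):
    print(f"{hit['similarity']:.3f}  {hit['source']}")
```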

Filtered Search

Combining semantic similarity with metadata filters:

```python
def search_with_filters(
    query: str,
    filters: dict,
    top_k: int = 5
) -> list[dict]:
    """Search with additional filtering criteria."""
    query_embedding = generate_embedding(query)

    sql = """
        SELECT content, category, source_file,
               1 - (embedding <=> %s::vector) AS similarity
        FROM knowledge_chunks
        WHERE 1=1
    """
    params = [query_embedding]

    if filters.get('category'):
        sql += " AND category = %s"
        params.append(filters['category'])

    if filters.get('source_pattern'):
        sql += " AND source_file LIKE %s"
        params.append(f"%{filters['source_pattern']}%")

    if filters.get('min_similarity'):
        sql += " AND 1 - (embedding <=> %s::vector) >= %s"
        params.extend([query_embedding, filters['min_similarity']])

    sql += " ORDER BY embedding <=> %s::vector LIMIT %s"
    params.extend([query_embedding, top_k])

    # Execute and return results...
```

Knowledge Categories

Current knowledge base organization:

| Category | Chunks | Content |
|----------|--------|---------|
| process | ~1,800 | Standard operating procedures |
| legal | ~900 | Philippine labor law, contracts |
| technical | ~1,500 | API documentation, code patterns |
| business | ~750 | Pricing, client management |
| product | ~1,100 | ShoreAgents features, roadmap |
| team | ~400 | Staff profiles, organization |
| accounting | ~550 | Xero, BIR compliance, invoicing |

Total: ~7,000 chunks

Performance Optimization

Query Probes

IVFFlat indexes trade recall for speed. Increasing probes improves recall:

```sql
-- Default: 1 probe (fast, may miss relevant results)
SET ivfflat.probes = 10;  -- Better recall, slower
```

For interactive queries: 3-5 probes
For batch processing: 10+ probes
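
Probes can also be scoped to a single transaction from application code, so a high-recall batch query doesn't slow down later interactive ones. A sketch using `set_config` (the helper name is mine):

```python
def search_with_probes(query: str, conn, probes: int = 10, top_k: int = 5):
    """Run one search with a per-transaction probe count.

    set_config(..., true) is transaction-local, so the session
    default (1 probe) is restored after commit.
    """
    query_embedding = generate_embedding(query)
    with conn.cursor() as cur:
        cur.execute("SELECT set_config('ivfflat.probes', %s, true)",
                    (str(probes),))
        cur.execute(
            """SELECT content, 1 - (embedding <=> %s::vector) AS similarity
               FROM knowledge_chunks
               ORDER BY embedding <=> %s::vector
               LIMIT %s""",
            (query_embedding, query_embedding, top_k)
        )
        rows = cur.fetchall()
    conn.commit()
    return rows
```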

Batch Embedding Generation

Reduce API calls by batching:

```python
def generate_embeddings_batch(texts: list[str]) -> list[list[float]]:
    """Generate embeddings for multiple texts in one call."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts  # Up to 2048 texts per request
    )
    return [d.embedding for d in response.data]
```
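
Plugged into the ingestion pipeline, one request can cover a whole document's chunks. A sketch, with the 2048-input cap and retries left to the caller:

```python
chunks = chunk_document(content)
embeddings = generate_embeddings_batch(chunks)  # one API call instead of len(chunks)

records = [
    (chunk, category, filepath, i, emb, '{}')
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings))
]
# ...then insert with execute_values as in ingest_document
```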

Index Refresh

After adding new data, refresh the index and planner statistics. IVFFlat has no partial rebuild; REINDEX recreates the whole index:

```sql
-- Full rebuild (slow but thorough)
REINDEX INDEX idx_knowledge_embedding;

-- Vacuum to reclaim space and update statistics
VACUUM ANALYZE knowledge_chunks;
```

Maintenance Procedures

Removing Stale Content

Documents get outdated. Regular cleanup:

```sql
-- Identify old sources
SELECT source_file,
       COUNT(*) AS chunks,
       MAX(updated_at) AS last_updated
FROM knowledge_chunks
GROUP BY source_file
HAVING MAX(updated_at) < NOW() - INTERVAL '90 days'
ORDER BY last_updated;

-- Remove a specific source
DELETE FROM knowledge_chunks
WHERE source_file = 'path/to/outdated/document.md';
```

Re-embedding on Model Updates

When embedding models are updated:

```python
def reembed_all(conn: psycopg2.extensions.connection) -> None:
    """Re-generate embeddings for all content."""
    with conn.cursor() as cur:
        cur.execute("SELECT id, content FROM knowledge_chunks")
        for row in cur.fetchall():
            new_embedding = generate_embedding(row[1])
            cur.execute(
                """UPDATE knowledge_chunks
                   SET embedding = %s::vector, updated_at = NOW()
                   WHERE id = %s""",
                (new_embedding, row[0])
            )
    conn.commit()

    # Rebuild index after bulk update
    with conn.cursor() as cur:
        cur.execute("REINDEX INDEX idx_knowledge_embedding")
    conn.commit()

    # VACUUM can't run inside a transaction block, so switch to autocommit
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("VACUUM ANALYZE knowledge_chunks")
    conn.autocommit = False
```

Integration with Agent Workflow

Session Initialization

```python
async def initialize_agent_context(query_context: str, conn) -> str:
    """Retrieve relevant knowledge for agent session."""
    results = search(
        query_context,
        conn,
        top_k=5,
        category=None  # Search all categories
    )
    context = "\n\n".join([r['content'] for r in results])
    return context
```

Query-Time Augmentation

```python
async def answer_with_knowledge(question: str, conn) -> str:
    """Generate answer augmented with knowledge base."""
    relevant_knowledge = search(question, conn, top_k=5)

    context = "\n\n---\n\n".join([
        f"[Source: {r['source']}]\n{r['content']}"
        for r in relevant_knowledge
    ])

    prompt = f"""Based on the following knowledge:

{context}

Answer this question: {question}"""

    response = await generate_response(prompt)
    return response
```

Monitoring

Knowledge Base Statistics

```sql
-- Overall statistics
SELECT COUNT(*) AS total_chunks,
       COUNT(DISTINCT source_file) AS source_files,
       pg_size_pretty(pg_table_size('knowledge_chunks')) AS table_size,
       pg_size_pretty(pg_indexes_size('knowledge_chunks')) AS index_size
FROM knowledge_chunks;

-- By category
SELECT category,
       COUNT(*) AS chunks,
       MIN(created_at) AS oldest,
       MAX(updated_at) AS newest
FROM knowledge_chunks
GROUP BY category
ORDER BY chunks DESC;
```

Query Performance

```sql
-- Enable timing in psql
\timing on

-- Example search with timing
SELECT content,
       1 - (embedding <=> '[...]'::vector) AS similarity
FROM knowledge_chunks
ORDER BY embedding <=> '[...]'::vector
LIMIT 5;

-- Time: 8.234 ms
```

Target: <20ms for typical queries with current data volume.
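
To check that target from application code rather than psql, a rough harness can time only the database side. It embeds the query once, outside the loop, since the embedding API call alone costs far more than 20ms (sketch; helper name is mine):

```python
import time

def time_db_query(query: str, conn, runs: int = 20) -> float:
    """Average DB-side search latency in milliseconds."""
    emb = generate_embedding(query)  # embed once, outside the timed loop
    sql = """SELECT content FROM knowledge_chunks
             ORDER BY embedding <=> %s::vector LIMIT 5"""
    with conn.cursor() as cur:
        cur.execute(sql, (emb,))  # warm-up run
        cur.fetchall()
        start = time.perf_counter()
        for _ in range(runs):
            cur.execute(sql, (emb,))
            cur.fetchall()
    return (time.perf_counter() - start) / runs * 1000
```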

FAQ

Why local PostgreSQL instead of a cloud vector database?

Latency, cost, and control. Local queries complete in <10ms; cloud round-trips add 50-100ms. No per-query costs. Data remains on infrastructure we control.

How does embedding quality affect retrieval?

Significantly. Poor embeddings return irrelevant results regardless of search algorithm quality. The text-embedding-3-small model provides good quality for operational content; domain-specific fine-tuning could improve results for specialized terminology.

What's the storage overhead for embeddings?

Each 1536-dimension embedding requires approximately 6KB (1536 float32 values × 4 bytes each, plus a small header). For 7,000 chunks: ~42MB for embeddings alone. With content and metadata: ~100MB total. Well within local storage constraints.

How often is the knowledge base updated?

Weekly batch ingestion for documentation. Immediate ingestion for critical policy changes. Stale content review monthly.

Can this scale to millions of chunks?

PostgreSQL with pgvector handles millions of vectors with appropriate tuning (an HNSW index instead of IVFFlat, table partitioning). At that scale, a dedicated vector database or a distributed solution becomes more attractive.
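
For reference, the switch to HNSW is a one-time DDL change. A sketch assuming pgvector ≥ 0.5.0, using the documented default parameters (tune for your data):

```python
# Replace the IVFFlat index with HNSW (better recall/latency at scale,
# slower to build and larger on disk).
with conn.cursor() as cur:
    cur.execute("DROP INDEX IF EXISTS idx_knowledge_embedding")
    cur.execute(
        """CREATE INDEX idx_knowledge_embedding
           ON knowledge_chunks
           USING hnsw (embedding vector_cosine_ops)
           WITH (m = 16, ef_construction = 64)"""
    )
conn.commit()
```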

Knowledge management is infrastructure. Build it systematically, maintain it regularly, and it becomes an asset that compounds in value.

postgresql · pgvector · semantic-search · embeddings · ai-memory · vector-database