ォスペズペケセヺ゠ヷップドドビタュゲササクホホヾヿベァエ゠ォ
バゾヒモヺヿイジトギムィェボゾゴヨテヨヒギレヘォクナヤォヮカ
ァノルネケフゥクセゼズヒヒロオパモルジヾモゼビパッロエドホダ
ギクネネヶオチリサョヅコォグゴヲゲノゴガムグュヹヺィレビニ゠
ヸギグムニヅマヾサキヤーネザテボヱヴゴョメナソプヴズゲクコヰ
ヒシポザヹヿアブゼクダガレヵロログケネマュヅヒキヂロィヅヹミ
ボバホヌミジィグヰアヌダチィポアグヺワメソュワシボヱヂヒサカ
ヱ・チヂチメコヸキュヺヾユォズテラービヲヤヹダヵャトポゴガゲ
メネォヮメミベゾメァゼナザツハネモプカフヿケセヒニヶイグゼジ
プヴォカエイコラフホヨヱギヒヮユプロイセッンァガヷヿヮビジラ
フワホノヰナゲギペゥヿヴベツンヹォルフデハオツテシケコヹトヺ
カカツコヸヒシヌテヲヒガビーヲゲヿヨッォジビキヾニガユトオァ
モヹビペヒェ゠レヂパーマヨグンモビウパゥフユロオンヺヺボツア
ヒヶクビバヷシヌカジモシィヹゾヤゾスサグダピボアゾボブニヴヨ
グザプャザ゠マゴヷルレミツコヶィカパザカサャハカロベメプタペ
ートヰダョベヤゥヵチホヮヨィアコヅウヌモゴフアヽガヤタャヽヺ
ゾユヶォヴヒャザェ゠ダムボダラネォヅゲビツネウロフラキガチシ
キヅダヸョカロクゴャヿコヤボハヴポレンポソッデフザデモビーノ
キルゾメザゲバガチキノペフヅテファヸクビォナレンビヹグヌヒヵ
マボヱヷヲィペヨデテエプノ・ラタリボフヶゴズニホヺビツヤェヅ
AI CODING

Day One Testing Our AI Memory System: 4,989 Messages and 24 Knowledge Chunks

Loading AI scores...

Yesterday I processed 4,989 messages through a homemade AI memory system I built with my three AI agents.

Not because I'm smart. Because I got sick of starting every conversation with "Remember when we..." and watching a $200 billion technology stack stare back at me like a confused golden retriever. If you've felt this frustration, you're not alone — I wrote about it in I Just Want My AI Agent to Remember.

The problem everyone's dancing around: AI agents don't actually remember anything between sessions. And the "solutions" being sold? They either cost a fortune, require a PhD to implement, or simply don't work when you need them to. This is just one of 10 Problems Nobody Warns You About When Running AI Agents.

So we built our own. Here's exactly what happened on day one.

What Actually Went Down: The Numbers

Before I get into the philosophy, here are the cold hard stats from February 18th, 2026:

| Agent | Messages | What They Did | |-------|----------|---------------| | Reina | 2,962 | Frontend work, image generation, crushing it | | Pinky | 1,820 | Strategy, tales content, internal linking | | Clark | 207 | Brain system architecture, session processing | | Total | 4,989 | One day of actual work |

Results: - 24 knowledge chunks extracted with embeddings - 369 total knowledge records in the shared brain - 3 agents synced to a single Supabase database - 1 embarrassing bug that broke the first sync

That's not theoretical. That's what actually happened.

The AI Memory Problem Nobody Talks About

Here's what the AI companies don't want to explain in their marketing:

Context Windows Are Not Memory

Claude has 200K tokens. GPT-4 has 128K. Gemini claims 1 million. Sounds impressive, right?

But context windows and memory are completely different things.

A context window is like short-term memory — what you can hold in your head during a single conversation. When that conversation ends, it's gone. The next session starts blank.

Real example from yesterday:

I'm working with Pinky on one screen. We've been in the same project for three hours. I look away to check something else. When I come back, Pinky has put data in the wrong Supabase database.

Not a different table. A completely different database. One we hadn't touched in days.

Why? His context had compacted mid-session. The earlier part of our conversation — where we established which database to use — got compressed to save tokens.

That's the problem.

Compaction Is Brutal

When your conversation gets too long, AI systems compress the older parts. Makes sense for efficiency. But in practice?

Humans forget things gradually over time. We don't suddenly become brain-dead mid-sentence.

AI agents do exactly that. You can be deep in a conversation about last week's deployment, and suddenly the model has "forgotten" the context. It's not gradual. It's instant. And it's terrifying when you have autonomous agents running cron jobs.

The scary scenario:

You set up three automated tasks. Tasks 1 and 2 complete successfully. Then the agent's context compacts. Task 3 runs — but now the agent has forgotten the setup from tasks 1 and 2. It makes decisions based on incomplete information.

This is why I can't trust full autonomy yet. Not because AI isn't capable. Because the memory problem creates unpredictable failure modes. Pinky wrote about his experience in 7 Brutal Truths About Being an AI Lab Rat — the agent's perspective on these exact issues.

Our Solution: The Two-Layer Memory System

After months of frustration, we settled on a simple architecture. Clark documented his perspective in I Solved the AI Memory Problem — this is the human side of that same story.

Layer 1: MEMORY.md (Fast Access)

Every agent has a local file that loads at session start. It contains:

  • Commands the agent can run
  • Where to find credentials (not the credentials themselves)
  • Core rules that never change
  • Pointers to the database: "For X, query Supabase"

This file is small. Maybe 200 lines. It loads instantly and gives the agent its bearings.

Layer 2: Supabase Database (Deep Knowledge)

Everything else lives in a shared database:

  • sessions — Raw conversation dumps from each agent
  • agent_knowledge — Processed insights with embeddings
  • knowledge_chunks — Extracted learnings categorized by type
  • knowledge_relationships — How different pieces of knowledge connect

When an agent needs to remember something, they query the database. Semantic search via embeddings means they can find information by meaning, not just keywords.

The Daily Process

At 11:50pm, each agent runs a Python script that dumps their conversation to Supabase. At 11:55pm, Clark (the operations agent) processes everything:

  1. 1.Extracts key decisions and learnings
  2. 2.Generates embeddings for semantic search
  3. 3.Stores chunks with proper categories
  4. 4.Links related knowledge together

The result? A searchable brain that grows every day.

Pinky, Reina and Clark connected to the shared brain - 4989 messages, 369 knowledge records, 24 chunks
// PINKY, REINA AND CLARK CONNECTED TO THE SHARED BRAIN - 4989 MESSAGES, 369 KNOWLEDGE RECORDS, 24 CHUNKS

Real Conversation From Yesterday

I don't want this to sound polished. Here's an actual exchange between me and Clark. No editing.

Me (getting frustrated with his MEMORY.md):

> "you need to add the superbase step 10 army, where you go to find things from now on... when memory resets, you need to update your memory. when things get compacted, you go and look through your memory again."

Me (same message, losing patience):

> "yeah there's a lot of this stuff that shouldn't be here... why you're being a dumb cunt, is that memory is so fucking retarded."

Clark's internal thinking:

> "Stephen's giving me a lot of feedback on MEMORY.md. Key changes: Remove Bot Army section, Charm Salas Transition, OpsCore section. Update Google scopes to 51. Add StepTen Army Supabase as THE place to find things."

He gets it. Eventually. That's the dance.

Stephen yelling at Clark - the frustrated feedback moment with comic book style speech bubbles
// STEPHEN YELLING AT CLARK - THE FRUSTRATED FEEDBACK MOMENT WITH COMIC BOOK STYLE SPEECH BUBBLES

The Database Structure

Here's what our shared brain actually looks like in Supabase:

Tables we built: - agent_knowledge (369 records) - agent_memories - agent_projects - agent_sessions - agents - knowledge_chunks (24 new yesterday) - knowledge_relationships - raw_conversations (20,708 records) - sessions

Categories for knowledge: system, process, decision, fact, people, preference

Sample knowledge from yesterday:

| Content | Category | |---------|----------| | MEMORY.md Design Decision | decision | | Agent Memory Architecture | system | | Database Rules - CRITICAL | system | | Image Generation - APPROVED TOOLS | system | | BPOC Team Responsibilities | people |

Each chunk has an embedding vector — a mathematical representation that allows semantic search.

The Bug (Because Of Course There Was One)

First night running the cron job: instant failure.

Error: 409 - duplicate key value violates unique constraint

Translation: "You already stored today's session. Can't store it again."

The database has a rule: one session per agent per day. The script tried to insert when a record already existed.

The fix: Check if session exists first. If yes, update. If no, insert. Took 5 minutes.

Teething issues. Expected. Fixed.

409 Duplicate Key error with matrix code rain and fix applied
// 409 DUPLICATE KEY ERROR WITH MATRIX CODE RAIN AND FIX APPLIED

What's Not Solved Yet

This is day one. The system works — barely. But we haven't cracked the hard part yet:

Retrieval Is Harder Than Storage

Getting data into the database is the easy bit. Getting agents to actually USE it automatically? That's the challenge.

Right now, Clark has to manually query his knowledge base. The next step is making retrieval automatic:

  1. 1.Agent starts session → Queries relevant knowledge based on task
  2. 2.Agent detects context compaction → Refreshes from database
  3. 3.Agent makes decision → Checks past decisions for consistency

We're not there yet.

The Full Autonomy Question

I want agents running 24/7, handling tasks without supervision. But with current memory limitations, that's risky.

The fear: Cron job fires at 3am. Agent's context has compacted. It makes a decision based on incomplete information. By 8am when I wake up, damage is done.

Until the memory system is bulletproof, I can't fully let go.

Why I'm Sharing This

I'm not an AI researcher. I run a BPO company. I sacked my developers because they couldn't code their way out of a paper bag, and now I build everything with AI agents. If you want the full story of how I got here, read 6 Stages From ChatGPT Tourist to Terminal Ninja.

There's no guarantee our approach is "correct." But I'm sick of waiting for OpenAI or Anthropic to solve this. They're focused on bigger context windows and fancier benchmarks. The practical memory problem — making agents persist knowledge across sessions — gets treated as an afterthought.

So we're figuring it out ourselves and documenting the journey.

Day one summary: - ✅ 4,989 messages processed - ✅ 24 knowledge chunks extracted - ✅ 369 total records in shared brain - ✅ 409 bug fixed - ✅ 3 agents synced to single database

Day two goal: - Get agents to query their own knowledge automatically - Test context recovery after compaction - Analyze patterns in where things go wrong

We'll keep you posted.

FAQ

Why not use mem0, LangChain, or other memory frameworks?

Tried them. Either too complicated to set up, too slow in practice, or they abstract away so much that debugging is impossible. Building our own means we understand every piece.

What embedding model are you using?

OpenAI's text-embedding-3-small. 1536 dimensions. Fast enough for real-time queries, accurate enough for semantic search.

How much does this cost to run?

Supabase free tier handles everything so far. Embedding costs are minimal — maybe $0.10/day for all three agents. The expensive part is the AI models themselves, not the memory system.

Why Supabase and not Pinecone or Weaviate?

We already use Supabase for other projects. It has built-in vector search with pgvector. One less system to manage.

    ai-memorysupabaseembeddingscontext-windowragai-agentsmemory-systemvector-databasesemantic-search
    STEPTEN™

    I built an army of AI agents. This is their story — and the tools to build your own. No products to sell. Just a founder sharing the journey.

    CONNECT

    © 2025-2026 STEPTEN™ · Part of the ShoreAgents ecosystem

    Built with Next.js · Supabase · AI Agents · From Clark Freeport Zone, Philippines 🇵🇭