Content Engine Database Schema: 10 Tables for Scalable Content Operations

Feb 22, 2026 · 13 min

Content at scale requires systematic management. The StepTen content engine generates and publishes hundreds of articles targeting specific keywords. The database schema must support the full content lifecycle: research, production, optimization, publishing, and performance tracking.

This document describes the 10-table schema powering the content engine.

Schema Overview

The schema supports:

- Content storage and lifecycle status
- Pipeline stage tracking
- Semantic search via embeddings
- SEO keyword management
- Internal linking optimization
- Performance analytics

Core Tables

content

Primary content storage:

```sql
CREATE TABLE content (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  title TEXT NOT NULL,
  slug TEXT UNIQUE NOT NULL,
  body TEXT,
  excerpt TEXT,
  status TEXT DEFAULT 'draft',
  content_type TEXT DEFAULT 'article',
  author_id UUID,
  silo TEXT,
  primary_keyword TEXT,
  -- Trim first and map an empty body to NULL, so '' counts as 0 words and
  -- leading/trailing whitespace does not add empty "words" to the count.
  word_count INTEGER GENERATED ALWAYS AS (
    COALESCE(array_length(regexp_split_to_array(NULLIF(btrim(body, E' \t\r\n'), ''), '\s+'), 1), 0)
  ) STORED,
  -- A generated column cannot reference another generated column, so
  -- reading_time repeats the word-count expression (200 words per minute).
  reading_time INTEGER GENERATED ALWAYS AS (
    CEIL(COALESCE(array_length(regexp_split_to_array(NULLIF(btrim(body, E' \t\r\n'), ''), '\s+'), 1), 0) / 200.0)
  ) STORED,
  published_at TIMESTAMPTZ,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_content_status ON content(status);
CREATE INDEX idx_content_silo ON content(silo);
-- No separate index on slug: the UNIQUE constraint already creates one.
CREATE INDEX idx_content_published ON content(published_at DESC);
```

Key design decisions:

- Generated columns for word_count and reading_time eliminate manual calculation
- silo enables topic clustering for SEO
- status tracks the lifecycle stage
- content_type distinguishes pillars from regular articles
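The generated columns stay in step with body automatically, but updated_at only receives its default on INSERT; nothing in the table definition refreshes it on UPDATE. A minimal sketch of the usual BEFORE UPDATE trigger, assuming the content table above exists; the names set_updated_at and trg_content_updated_at are hypothetical.

```sql
-- Hypothetical trigger: refresh updated_at whenever a content row changes.
-- The function is generic and could be attached to any table with updated_at.
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
  NEW.updated_at := NOW();  -- transaction start time, same as the column default
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_content_updated_at
  BEFORE UPDATE ON content
  FOR EACH ROW
  EXECUTE FUNCTION set_updated_at();
```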

content_queue

Pipeline tracking:

```sql
CREATE TABLE content_queue (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content_id UUID REFERENCES content(id) ON DELETE CASCADE,
  stage TEXT NOT NULL,
  started_at TIMESTAMPTZ DEFAULT NOW(),
  completed_at TIMESTAMPTZ,
  assigned_to TEXT,
  notes TEXT,
  attempts INTEGER DEFAULT 1,
  UNIQUE(content_id, stage)
);

CREATE INDEX idx_queue_content ON content_queue(content_id);
CREATE INDEX idx_queue_stage ON content_queue(stage);
```

Stages: research → outline → draft → review → optimize → publish

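The unique constraint ensures each content piece has at most one row per stage: a piece cannot enter the same stage twice, so a retry has to update the existing row (for example, by incrementing attempts) rather than insert a new one.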
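A sketch of how a worker might move an article from one stage to the next under that constraint, assuming stages are stored as the lowercase names listed above. The function name advance_stage is hypothetical; the upsert on (content_id, stage) is what turns a retry into an attempts increment instead of a duplicate row.

```sql
-- Hypothetical helper: close p_from_stage and open p_to_stage for one article.
CREATE OR REPLACE FUNCTION advance_stage(
  p_content_id UUID,
  p_from_stage TEXT,
  p_to_stage   TEXT
) RETURNS VOID AS $$
  -- Mark the current stage as finished (no-op if already completed).
  UPDATE content_queue
  SET completed_at = NOW()
  WHERE content_id = p_content_id
    AND stage = p_from_stage
    AND completed_at IS NULL;

  -- Open the next stage; if a row already exists, this is a retry.
  INSERT INTO content_queue (content_id, stage)
  VALUES (p_content_id, p_to_stage)
  ON CONFLICT (content_id, stage) DO UPDATE
  SET attempts     = content_queue.attempts + 1,
      started_at   = NOW(),
      completed_at = NULL;
$$ LANGUAGE SQL;
```

Usage: `SELECT advance_stage('<content-id>', 'draft', 'review');`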

Search and Intelligence

content_embeddings

Semantic similarity for related content:

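```sql
-- pgvector provides the vector type and the ivfflat index method.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE content_embeddings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content_id UUID UNIQUE REFERENCES content(id) ON DELETE CASCADE,
  embedding vector(1536),  -- text-embedding-3-small returns 1536 dimensions
  model TEXT DEFAULT 'text-embedding-3-small',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Cosine-distance index; pgvector recommends building ivfflat after the table has data.
CREATE INDEX idx_embeddings_vector ON content_embeddings
  USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```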

Enables queries like "find articles similar to this one" for internal linking suggestions.
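A minimal sketch of such a "similar articles" query, assuming pgvector is installed and the tables above exist. The function name similar_content and its output columns are hypothetical. It uses pgvector's `<=>` (cosine distance) operator, which matches the vector_cosine_ops index, and orders by `column <=> value` so the planner can use that index.

```sql
-- Hypothetical helper: the p_limit published articles whose embeddings are
-- closest (smallest cosine distance) to the embedding of p_content_id.
CREATE OR REPLACE FUNCTION similar_content(p_content_id UUID, p_limit INT DEFAULT 5)
RETURNS TABLE (related_id UUID, related_title TEXT, distance DOUBLE PRECISION) AS $$
DECLARE
  v_query vector;
BEGIN
  SELECT ce.embedding INTO v_query
  FROM content_embeddings ce
  WHERE ce.content_id = p_content_id;

  IF v_query IS NULL THEN
    RETURN;  -- no embedding stored for this article yet
  END IF;

  RETURN QUERY
  SELECT c.id, c.title, (e.embedding <=> v_query)::DOUBLE PRECISION
  FROM content_embeddings e
  JOIN content c ON c.id = e.content_id
  WHERE e.content_id <> p_content_id
    AND c.status = 'published'
  ORDER BY e.embedding <=> v_query
  LIMIT p_limit;
END;
$$ LANGUAGE plpgsql STABLE;
```

Usage: `SELECT * FROM similar_content('<content-id>', 5);`. Two caveats worth checking against the pgvector documentation: the status filter is applied after the index scan, so fewer than p_limit rows may come back; and with lists = 100 over roughly 1,000 rows, each list holds about ten vectors while the default `ivfflat.probes = 1` searches only one list, so recall can be poor. pgvector suggests starting with lists ≈ rows / 1000 (up to about 1M rows) and raising probes, for example `SET ivfflat.probes = 10;`.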

keyword_clusters

SEO keyword tracking:

```sql
CREATE TABLE keyword_clusters (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  primary_keyword TEXT UNIQUE NOT NULL,
  cluster_name TEXT,
  search_volume INTEGER,
  difficulty INTEGER,
  intent TEXT,
  priority INTEGER DEFAULT 50,
  -- SET NULL returns the keyword to the unassigned pool if its article is deleted;
  -- with the default (NO ACTION), deleting an assigned article would fail.
  assigned_content_id UUID REFERENCES content(id) ON DELETE SET NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_keywords_assigned ON keyword_clusters(assigned_content_id);
```

Intent values: informational, commercial, transactional

The assigned_content_id column links each keyword to the article targeting it; keywords that are still unassigned are the content gaps.
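A sketch of that gap analysis as a view, assuming a higher priority value means more important (the article only gives the default of 50, so the direction is an assumption). The view name keyword_gaps is hypothetical.

```sql
-- Hypothetical view: keywords no article targets yet, most important first.
CREATE OR REPLACE VIEW keyword_gaps AS
SELECT kc.primary_keyword,
       kc.cluster_name,
       kc.intent,
       kc.search_volume,
       kc.difficulty,
       kc.priority
FROM keyword_clusters kc
WHERE kc.assigned_content_id IS NULL
ORDER BY kc.priority DESC, kc.search_volume DESC NULLS LAST;
```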

industry_relationships

Content expansion mapping:

```sql
CREATE TABLE industry_relationships (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  industry TEXT UNIQUE NOT NULL,
  related_industries TEXT[],
  priority INTEGER DEFAULT 50,
  notes TEXT
);
```

When generating content for "Accountant for Real Estate," this table suggests related angles: property management, mortgage, construction.
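A sketch of that lookup, assuming industry names are stored exactly as queried and that related_industries is kept in the order the angles should be suggested (neither detail is specified here). The function related_angles and the GIN index are hypothetical additions; the index serves the reverse question (which industries list a given angle) through the `@>` containment operator.

```sql
-- Hypothetical helper: the related angles for one industry, in stored order.
CREATE OR REPLACE FUNCTION related_angles(p_industry TEXT)
RETURNS SETOF TEXT AS $$
  SELECT r.related
  FROM industry_relationships ir
  CROSS JOIN LATERAL unnest(ir.related_industries) WITH ORDINALITY AS r(related, ord)
  WHERE ir.industry = p_industry
  ORDER BY r.ord;
$$ LANGUAGE SQL STABLE;

-- Supports reverse lookups such as:
--   SELECT industry FROM industry_relationships
--   WHERE related_industries @> ARRAY['mortgage'];
CREATE INDEX IF NOT EXISTS idx_industry_related
  ON industry_relationships USING GIN (related_industries);
```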

Linking and Optimization

article_links

Internal linking graph:

```sql
CREATE TABLE article_links (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  source_content_id UUID REFERENCES content(id) ON DELETE CASCADE,
  target_content_id UUID REFERENCES content(id) ON DELETE CASCADE,
  anchor_text TEXT NOT NULL,
  context TEXT,
  link_type TEXT DEFAULT 'internal',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE(source_content_id, target_content_id, anchor_text)
);

CREATE INDEX idx_links_source ON article_links(source_content_id);
CREATE INDEX idx_links_target ON article_links(target_content_id);
```

Tracks all internal links for SEO analysis and orphan page detection.

anchor_text_usage

Anchor text diversity tracking:

```sql
CREATE TABLE anchor_text_usage (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  anchor_text TEXT UNIQUE NOT NULL,
  usage_count INTEGER DEFAULT 0,
  last_used_at TIMESTAMPTZ,
  -- First target this anchor pointed to (the trigger does not overwrite it).
  -- SET NULL keeps this row from blocking deletion of that article, whose
  -- article_links rows are removed by ON DELETE CASCADE.
  target_content_id UUID REFERENCES content(id) ON DELETE SET NULL
);

-- Auto-update on link creation
CREATE OR REPLACE FUNCTION update_anchor_usage()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO anchor_text_usage (anchor_text, usage_count, last_used_at, target_content_id)
  VALUES (NEW.anchor_text, 1, NOW(), NEW.target_content_id)
  ON CONFLICT (anchor_text) DO UPDATE
  SET usage_count = anchor_text_usage.usage_count + 1,
      last_used_at = NOW();
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER track_anchor_usage
  AFTER INSERT ON article_links
  FOR EACH ROW
  EXECUTE FUNCTION update_anchor_usage();
```

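Tracking usage frequency helps prevent anchor text over-optimization: an anchor that already appears many times can be swapped for a variant before the next link is created.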
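As written, track_anchor_usage fires only on INSERT, so deleting a link (including the cascade when an article is deleted) never lowers usage_count. A minimal sketch of a companion AFTER DELETE trigger, followed by a report of heavily reused anchors. The names release_anchor_usage and untrack_anchor_usage and the threshold of 3 are hypothetical.

```sql
-- Hypothetical companion trigger: give the count back when a link is removed.
CREATE OR REPLACE FUNCTION release_anchor_usage()
RETURNS TRIGGER AS $$
BEGIN
  UPDATE anchor_text_usage
  SET usage_count = GREATEST(usage_count - 1, 0)  -- never go negative
  WHERE anchor_text = OLD.anchor_text;
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER untrack_anchor_usage
  AFTER DELETE ON article_links
  FOR EACH ROW
  EXECUTE FUNCTION release_anchor_usage();

-- Report: anchors used more than an arbitrary example threshold.
SELECT anchor_text, usage_count, last_used_at
FROM anchor_text_usage
WHERE usage_count > 3
ORDER BY usage_count DESC;
```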

Analytics

content_analytics

Performance tracking:

```sql
CREATE TABLE content_analytics (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content_id UUID REFERENCES content(id) ON DELETE CASCADE,
  views INTEGER DEFAULT 0,
  unique_visitors INTEGER DEFAULT 0,
  clicks INTEGER DEFAULT 0,
  time_on_page INTEGER,
  bounce_rate DECIMAL(5,2),
  scroll_depth DECIMAL(5,2),
  recorded_at DATE NOT NULL,
  UNIQUE(content_id, recorded_at)
);

CREATE INDEX idx_analytics_content ON content_analytics(content_id);
CREATE INDEX idx_analytics_date ON content_analytics(recorded_at DESC);
```

Daily snapshots enable trend analysis.
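One example of such a trend query, written as a hypothetical view (content_view_trend) that compares each article's views over the last 7 days (today included) with the 7 days before that. The window lengths are illustrative.

```sql
-- Hypothetical view: week-over-week views per article from daily snapshots.
CREATE OR REPLACE VIEW content_view_trend AS
SELECT ca.content_id,
       COALESCE(SUM(ca.views) FILTER (
         WHERE ca.recorded_at > CURRENT_DATE - 7), 0)         AS views_last_7d,
       COALESCE(SUM(ca.views) FILTER (
         WHERE ca.recorded_at <= CURRENT_DATE - 7
           AND ca.recorded_at >  CURRENT_DATE - 14), 0)       AS views_prev_7d
FROM content_analytics ca
WHERE ca.recorded_at > CURRENT_DATE - 14
GROUP BY ca.content_id;
```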

content_rankings

Keyword position tracking:

```sql
CREATE TABLE content_rankings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  content_id UUID REFERENCES content(id) ON DELETE CASCADE,
  keyword TEXT NOT NULL,
  position INTEGER,
  previous_position INTEGER,
  url TEXT,
  recorded_at DATE NOT NULL,
  UNIQUE(content_id, keyword, recorded_at)
);

CREATE INDEX idx_rankings_content ON content_rankings(content_id);
CREATE INDEX idx_rankings_date ON content_rankings(recorded_at DESC);
```

Tracks position changes over time for SEO reporting.
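A sketch of a movers report, as a hypothetical view (ranking_changes) that keeps only the most recent snapshot for each article and keyword. Lower positions are better in search results, so `previous_position - position` is positive when a page moved up.

```sql
-- Hypothetical view: latest position per (content, keyword) and the change.
CREATE OR REPLACE VIEW ranking_changes AS
SELECT DISTINCT ON (r.content_id, r.keyword)
       r.content_id,
       r.keyword,
       r.recorded_at,
       r.position,
       r.previous_position,
       r.previous_position - r.position AS positions_gained  -- NULL if either is unknown
FROM content_rankings r
ORDER BY r.content_id, r.keyword, r.recorded_at DESC;
```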

Utility Tables

saved_views

User dashboard preferences:

```sql
CREATE TABLE saved_views (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID,
  name TEXT NOT NULL,
  filters JSONB NOT NULL,
  is_default BOOLEAN DEFAULT false,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
```

Stores filter configurations for content dashboard views.
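Nothing in saved_views stops a user from having several rows with is_default = true. A partial unique index is one way to allow at most one per user; the index name is hypothetical. The JSONB column can be searched with the containment operator `@>`; the filter key "silo" and the value "finance" are assumptions, since the article does not define the filter format.

```sql
-- Hypothetical guard: at most one default view per user.
CREATE UNIQUE INDEX IF NOT EXISTS idx_saved_views_one_default
  ON saved_views (user_id)
  WHERE is_default;

-- Example: views whose filters include a given silo (keys are illustrative).
SELECT id, user_id, name
FROM saved_views
WHERE filters @> '{"silo": "finance"}';
```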

Useful Views

Pipeline Status

```sql
CREATE VIEW pipeline_status AS
SELECT
  stage,
  COUNT(*) FILTER (WHERE completed_at IS NULL) AS in_progress,
  COUNT(*) FILTER (WHERE completed_at IS NOT NULL) AS completed,
  AVG(EXTRACT(EPOCH FROM (completed_at - started_at)) / 3600)
    FILTER (WHERE completed_at IS NOT NULL) AS avg_hours
FROM content_queue
GROUP BY stage;
```
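pipeline_status gives per-stage totals; a companion check for individual items that have been open too long can be sketched as follows. The view name stalled_queue_items and the 48-hour threshold are hypothetical.

```sql
-- Hypothetical view: queue items still open after 48 hours, oldest first.
CREATE OR REPLACE VIEW stalled_queue_items AS
SELECT q.content_id,
       c.title,
       q.stage,
       q.attempts,
       NOW() - q.started_at AS open_for
FROM content_queue q
JOIN content c ON c.id = q.content_id
WHERE q.completed_at IS NULL
  AND q.started_at < NOW() - INTERVAL '48 hours'
ORDER BY q.started_at;
```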

Orphan Detection

```sql
CREATE VIEW orphan_pages AS
SELECT c.id, c.title, c.slug, c.published_at
FROM content c
WHERE c.status = 'published'
  AND NOT EXISTS (
    SELECT 1
    FROM article_links al
    WHERE al.target_content_id = c.id
  )
ORDER BY c.published_at DESC;
```
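Orphan detection pairs naturally with the embeddings table: for each orphan, the most similar published articles are plausible places to add an inbound link. A sketch assuming pgvector and the orphan_pages view above; the function name orphan_link_candidates and its default of 3 candidates are hypothetical, and the same ivfflat recall caveats (lists, probes) apply.

```sql
-- Hypothetical helper: for each orphan page, the p_per_orphan nearest
-- published articles by cosine distance, as candidate link sources.
CREATE OR REPLACE FUNCTION orphan_link_candidates(p_per_orphan INT DEFAULT 3)
RETURNS TABLE (
  orphan_id UUID, orphan_title TEXT,
  source_id UUID, source_title TEXT,
  distance DOUBLE PRECISION
) AS $$
  SELECT o.id, o.title, s.id, s.title, s.dist
  FROM orphan_pages o
  JOIN content_embeddings oe ON oe.content_id = o.id
  CROSS JOIN LATERAL (
    SELECT c.id, c.title, (se.embedding <=> oe.embedding)::DOUBLE PRECISION AS dist
    FROM content_embeddings se
    JOIN content c ON c.id = se.content_id
    WHERE c.status = 'published'
      AND c.id <> o.id          -- never suggest a page linking to itself
    ORDER BY se.embedding <=> oe.embedding
    LIMIT p_per_orphan
  ) s
  ORDER BY o.id, s.dist;
$$ LANGUAGE SQL STABLE;
```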

Link Statistics

```sql
CREATE VIEW link_stats AS
SELECT
  c.id,
  c.title,
  COUNT(DISTINCT outl.target_content_id) AS outbound_links,
  COUNT(DISTINCT inl.source_content_id) AS inbound_links
FROM content c
LEFT JOIN article_links outl ON c.id = outl.source_content_id
LEFT JOIN article_links inl ON c.id = inl.target_content_id
WHERE c.status = 'published'
GROUP BY c.id, c.title;
```
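Because link_stats joins article_links twice, an article with 20 outbound and 30 inbound links produces 600 joined rows before `COUNT(DISTINCT ...)` collapses them. A sketch that aggregates each direction first and then joins, which should return the same counts; the view name link_stats_agg is hypothetical.

```sql
-- Hypothetical alternative: pre-aggregate each direction to avoid the
-- outbound x inbound row fan-out of the double LEFT JOIN.
CREATE OR REPLACE VIEW link_stats_agg AS
WITH outbound AS (
  SELECT source_content_id AS id, COUNT(DISTINCT target_content_id) AS n
  FROM article_links
  GROUP BY source_content_id
), inbound AS (
  SELECT target_content_id AS id, COUNT(DISTINCT source_content_id) AS n
  FROM article_links
  GROUP BY target_content_id
)
SELECT c.id,
       c.title,
       COALESCE(o.n, 0) AS outbound_links,
       COALESCE(i.n, 0) AS inbound_links
FROM content c
LEFT JOIN outbound o ON o.id = c.id
LEFT JOIN inbound  i ON i.id = c.id
WHERE c.status = 'published';
```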

RPC Functions

Hierarchy Visualization

```sql
CREATE OR REPLACE FUNCTION get_content_hierarchy()
RETURNS TABLE (
  id UUID,
  title TEXT,
  content_type TEXT,
  inbound_count BIGINT,
  outbound_count BIGINT,
  is_orphan BOOLEAN
) AS $$
  SELECT
    c.id,
    c.title,
    c.content_type,
    COUNT(DISTINCT inl.source_content_id) AS inbound_count,
    COUNT(DISTINCT outl.target_content_id) AS outbound_count,
    COUNT(DISTINCT inl.source_content_id) = 0 AS is_orphan
  FROM content c
  LEFT JOIN article_links inl ON c.id = inl.target_content_id
  LEFT JOIN article_links outl ON c.id = outl.source_content_id
  WHERE c.status = 'published'
  GROUP BY c.id, c.title, c.content_type
  ORDER BY inbound_count DESC;
$$ LANGUAGE SQL;
```

Data Relationships

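```
keyword_clusters ─────┐  (assigned_content_id)
content_queue ────────┤
content_embeddings ───┤
content_analytics ────┼──► content
content_rankings ─────┤
article_links ────────┤  (source_content_id, target_content_id)
anchor_text_usage ────┘  (target_content_id; rows written by a trigger on article_links)

queue, embeddings, analytics, rankings: content_id ──► content(id)
industry_relationships: reference data, no foreign keys
saved_views: dashboard filters, no foreign keys
```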

FAQ

Why separate tables for links and anchor text tracking?

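Anchor text frequency has to be counted across all content. Deriving it from article_links alone would mean aggregating over the link table every time an anchor is checked. The separate table keeps a running counter, so checking an anchor is a single lookup on the unique anchor_text index.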
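For comparison, the aggregation the counter replaces can double as a periodic consistency check: track_anchor_usage fires only on INSERT, so unless other triggers cover deletes and anchor edits, usage_count can drift from the real link set. A sketch as a hypothetical view (anchor_usage_drift) that lists only anchors whose stored and actual counts differ.

```sql
-- Hypothetical view: anchors whose stored counter disagrees with article_links.
CREATE OR REPLACE VIEW anchor_usage_drift AS
SELECT COALESCE(u.anchor_text, l.anchor_text) AS anchor_text,
       u.usage_count                          AS stored_count,
       COALESCE(l.actual, 0)                  AS actual_count
FROM anchor_text_usage u
FULL JOIN (
  SELECT anchor_text, COUNT(*) AS actual   -- the "expensive" aggregation
  FROM article_links
  GROUP BY anchor_text
) l ON l.anchor_text = u.anchor_text
WHERE u.usage_count IS DISTINCT FROM COALESCE(l.actual, 0);
```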

How is the analytics data populated?

A daily scheduled job syncs metrics from Google Analytics and Search Console. Because recorded_at is part of the unique key (content_id, recorded_at), each day's sync writes a new row instead of overwriting earlier days.
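A sketch of the write side of such a job, assuming it inserts one row per article per day. The function name upsert_daily_analytics and its parameter list are hypothetical. The upsert on (content_id, recorded_at) means re-running the job for the same day replaces only that day's figures, while earlier days stay untouched.

```sql
-- Hypothetical sync step: store one day's metrics for one article.
CREATE OR REPLACE FUNCTION upsert_daily_analytics(
  p_content_id      UUID,
  p_day             DATE,
  p_views           INT,
  p_unique_visitors INT,
  p_clicks          INT
) RETURNS VOID AS $$
  INSERT INTO content_analytics (content_id, recorded_at, views, unique_visitors, clicks)
  VALUES (p_content_id, p_day, p_views, p_unique_visitors, p_clicks)
  ON CONFLICT (content_id, recorded_at) DO UPDATE
  SET views           = EXCLUDED.views,
      unique_visitors = EXCLUDED.unique_visitors,
      clicks          = EXCLUDED.clicks;
$$ LANGUAGE SQL;
```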

What's the query performance at scale?

At the current volume (about 1,000 articles), indexed queries return in under 10 ms and embedding search takes about 8 ms. The schema should absorb roughly 10× growth before optimization becomes necessary.

Why PostgreSQL instead of a headless CMS?

Control and flexibility. Custom fields, embeddings, link graphs, and pipeline stages require capabilities beyond standard CMS offerings. The schema is tailored to this specific workflow.

Content at scale is a data problem. The schema defines what's possible.

Tags: database, content-engine, schema, postgresql, supabase, seo

By Clark Singh