Codebase Audit Methodology: A Systematic Approach to Understanding Unknown Code

Feb 22, 2026 · 15 min · 🎯 STEPTEN SCORE: 82/100

Codebase audits serve multiple purposes: understanding existing systems, identifying technical debt, assessing security posture, and informing architectural decisions. A systematic approach ensures comprehensive coverage and reproducible results.

This document describes the methodology used to audit the ShoreAgents codebase, a Next.js application that had grown organically over several years.

Audit Phases

Phase 1: High-Level Metrics

Before examining code, establish baseline metrics:

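```bash
# File count (includes .git/ and node_modules/ if present)
find . -type f | wc -l

# Size by directory
du -sh */ | sort -hr | head -20

# Git history (HEAD keeps shortlog from reading stdin in non-interactive shells)
git log --oneline | wc -l
git shortlog -sn HEAD | head -10
```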

ShoreAgents Results:
- 2,392 files
- 3.7 GB total size
- 2.2 GB in the dist folder (Electron builds committed to the repo)
- 1,847 commits
- One primary contributor, with occasional commits from others

Initial observation: 3.7 GB is abnormal for a Next.js application. The committed Electron builds in dist account for 2.2 GB of that, roughly 60% of the repository.
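A small bash helper can compare `du` totals to put a number on how much of the repository the build output occupies. This is a sketch under assumptions. The name `dir_share_percent` is hypothetical. Also, `du -sk` measures allocated disk blocks rather than apparent file size, so its results will differ slightly from the GB figures above.

```bash
# Hypothetical helper: percentage of ROOT's disk usage that lives in ROOT/SUB.
# du -sk reports kilobytes actually allocated, so tiny files round up to a block.
dir_share_percent() {
  local root="$1" sub="$2" total_kb sub_kb
  total_kb=$(du -sk "$root" | cut -f1)
  sub_kb=$(du -sk "$root/$sub" | cut -f1)
  echo $(( sub_kb * 100 / total_kb ))   # integer percentage, rounded down
}

# Usage from the repository root:
#   dir_share_percent . dist
```

With the figures reported above (2.2 GB of 3.7 GB), the helper would print roughly 59.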

Phase 2: Dependency Analysis

```bash
# Dependency count
jq '.dependencies | length' package.json
jq '.devDependencies | length' package.json

# Unused dependencies
npx depcheck

# Security vulnerabilities
npm audit
```

Findings:
- 89 direct dependencies
- 34 dev dependencies
- 17 unused dependencies (including Prisma remnants)
- 12 vulnerabilities (4 moderate, 6 high, 2 critical)

Critical vulnerabilities in production require immediate attention.
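depcheck's list of unused dependencies is worth cross-checking before anything is removed. The sketch below runs a second, cruder check. Its name, `unreferenced_packages`, is hypothetical. It reads package names on stdin and prints those never quoted as a module specifier in source files. It is a heuristic only: packages used solely from config files, CLI scripts, or package.json `scripts` will be listed even though they are needed.

```bash
# Hypothetical helper: print each package name (read from stdin) that never
# appears as a quoted module specifier such as 'name', "name", 'name/...' or
# "name/..." in .ts/.tsx/.js/.jsx files under DIR. A heuristic, not a parser.
unreferenced_packages() {
  local dir="${1:-.}" name
  while IFS= read -r name; do
    [ -n "$name" ] || continue                # skip blank lines
    if ! grep -rqF \
         --include='*.ts' --include='*.tsx' --include='*.js' --include='*.jsx' \
         --exclude-dir=node_modules --exclude-dir=.git \
         -e "'$name'" -e "\"$name\"" -e "'$name/" -e "\"$name/" \
         "$dir"; then
      printf '%s\n' "$name"                   # no quoted reference found
    fi
  done
}

# Usage: list runtime dependencies from package.json and check each one.
#   jq -r '.dependencies | keys[]' package.json | unreferenced_packages .
```

A package that both depcheck and this check report is a strong removal candidate. A package that only one of them reports deserves a manual look.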

Phase 3: Structural Analysis

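```bash
# Root directory contents (-A omits . and .., which ls -la would count)
ls -A | wc -l

# Markdown files in the root directory
find . -maxdepth 1 -name "*.md" | wc -l

# Duplicate files ("name 2.ext" copies)
find . -name "* 2.*" -type f -not -path "*/node_modules/*" | wc -l

# Largest source files (-print0/-0 keeps names containing spaces intact)
find . \( -name "*.tsx" -o -name "*.ts" \) -not -path "*/node_modules/*" -print0 \
  | xargs -0 du -h | sort -hr | head -10
```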

Findings:
- 92 markdown files in the root directory
- 35 duplicate files with the " 2" naming pattern
- Largest component: 134 KB (onboarding-form.tsx)

The duplicate files indicate a pattern of copying rather than proper version control branching.
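Before the duplicates are deleted, it helps to know which copies are byte-identical to their originals and which have drifted. Below is a minimal bash sketch. It assumes each copy follows the `name 2.ext` pattern and sits in the same directory as `name.ext`. The function name `classify_duplicates` is hypothetical.

```bash
# Hypothetical helper: for every "name 2.ext" file under ROOT, report whether
# "name.ext" exists beside it and whether the two files are byte-identical.
classify_duplicates() {
  local root="${1:-.}" copy dir base orig
  find "$root" -type f -name '* 2.*' -not -path '*/node_modules/*' -print0 |
    while IFS= read -r -d '' copy; do
      dir=$(dirname "$copy")
      base=$(basename "$copy")
      orig="$dir/${base/ 2./.}"          # "foo 2.tsx" -> "foo.tsx"
      if [ ! -e "$orig" ]; then
        printf 'NO-ORIGINAL\t%s\n' "$copy"
      elif cmp -s "$orig" "$copy"; then
        printf 'IDENTICAL\t%s\n' "$copy"   # safe deletion candidate
      else
        printf 'DIVERGED\t%s\n' "$copy"    # compare by hand before deleting
      fi
    done
}
```

Handle each category differently:
- IDENTICAL copies can go immediately.
- DIVERGED copies may hold edits that never reached the original.
- NO-ORIGINAL files may be the only surviving version.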

Phase 4: Code Quality Assessment

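```bash
# Linting
npx eslint . --ext .ts,.tsx 2>&1 | tail -5

# TypeScript strictness (jq fails if tsconfig.json contains comments)
jq '.compilerOptions.strict' tsconfig.json
# Count diagnostics, not output lines (one error can span several lines)
npx tsc --noEmit --strict --pretty false 2>&1 | grep -c "error TS"

# Test files (excluding those shipped inside node_modules)
find . \( -name "*.test.*" -o -name "*.spec.*" \) -not -path "*/node_modules/*" | wc -l
```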

Findings:
- 2,847 linting issues (1,203 errors, 1,644 warnings)
- Strict mode disabled
- 4,892 errors when strict mode is enabled
- Zero test files

The absence of automated tests, combined with disabled strict type checking, indicates significant technical debt.
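A raw count of 4,892 strict-mode errors is hard to plan against. Grouping the compiler output by error code shows which kinds of errors dominate, and therefore which fixes pay off first. Below is a minimal sketch that reads `tsc` output on stdin. The function name `ts_error_histogram` is hypothetical.

```bash
# Hypothetical helper: tally tsc diagnostics by error code, most frequent first.
# Matches the "error TS<digits>" token, which appears in both the plain
# ("file(1,5): error TS7006: ...") and pretty ("file:1:5 - error TS7006: ...") formats.
ts_error_histogram() {
  grep -oE 'error TS[0-9]+' |
    awk '{ count[$2]++ } END { for (code in count) print count[code], code }' |
    sort -k1,1nr -k2,2
}

# Usage:
#   npx tsc --noEmit --strict --pretty false 2>&1 | ts_error_histogram | head -20
```

The counts in the first column should sum to the total error count.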

Phase 5: Database Assessment

```sql
-- Table count (base tables only; information_schema.tables also lists views)
SELECT COUNT(*)
FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE';

-- Column counts by table (widest first)
SELECT table_name, COUNT(*) AS column_count
FROM information_schema.columns
WHERE table_schema = 'public'
GROUP BY table_name
ORDER BY column_count DESC
LIMIT 10;

-- RLS status
SELECT tablename, rowsecurity
FROM pg_tables
WHERE schemaname = 'public';
```

Findings:
- 40 tables
- staff_onboarding: 66 columns
- RLS disabled on all 40 tables

66 columns in a single table indicates normalization issues. Disabled RLS on all tables represents a security gap.
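Once RLS is enabled, a CI check can keep a newly created table from slipping through without it. Below is a minimal bash sketch. It assumes the RLS query above is run through `psql -At`, which prints unaligned `tablename|rowsecurity` rows with booleans as `t`/`f`. The function name `rls_gate` and the `DATABASE_URL` variable are hypothetical.

```bash
# Hypothetical CI gate: read "tablename|rowsecurity" rows on stdin, print every
# table whose rowsecurity is not "t", and return non-zero if any were found.
rls_gate() {
  local table rls missing=0
  while IFS='|' read -r table rls; do
    [ -n "$table" ] || continue
    if [ "$rls" != "t" ]; then
      printf 'RLS disabled: %s\n' "$table"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]   # function's exit status: 0 only if every table has RLS
}

# Usage (DATABASE_URL is a placeholder for your connection string):
#   psql "$DATABASE_URL" -At -c "SELECT tablename, rowsecurity
#     FROM pg_tables WHERE schemaname = 'public'" | rls_gate
```

Enabling RLS on a table without any policies makes PostgreSQL apply a default-deny rule for ordinary roles. Policies therefore need to ship in the same change as the `ENABLE ROW LEVEL SECURITY` statements.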

Phase 6: Security Review

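```bash
# Credential patterns ("sk-" key prefixes; JWT headers begin with eyJhbG)
grep -rn "sk-" --include="*.ts" --include="*.tsx" --exclude-dir=node_modules .
grep -rn "eyJhbG" --include="*.ts" --include="*.tsx" --exclude-dir=node_modules .

# Environment files tracked by git
git ls-files | grep -E "\.env"

# Hardcoded secrets
grep -rn "password\|secret\|api_key" --include="*.ts" --exclude-dir=node_modules .
```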

Findings:
- 3 API keys left in comments marked "for testing"
- .env.development committed with staging credentials
- 1 hardcoded password in a migration script

Credential exposure in version control requires immediate remediation.
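A bare `grep "sk-"` also matches ordinary identifiers such as `task-list` or `desk-id`. Tighter patterns cut that noise. Below is a sketch with a hypothetical function name, `scan_secrets`. Its patterns are assumptions rather than a complete secret taxonomy:
- `sk-` must start a token and be followed by at least 20 key characters.
- A JWT must have three dot-separated base64url segments, the first two starting with `eyJ`.

```bash
# Hypothetical helper: stricter credential patterns. "sk-" must start a token and
# be followed by 20+ key characters; a JWT needs three base64url segments, the
# first two starting with "eyJ", as base64url-encoded JSON objects typically do.
scan_secrets() {
  local root="${1:-.}"
  grep -rnE \
    --include='*.ts' --include='*.tsx' --include='*.js' \
    --exclude-dir=node_modules --exclude-dir=.git \
    -e '(^|[^A-Za-z0-9])sk-[A-Za-z0-9_-]{20,}' \
    -e 'eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' \
    "$root"
}

# Usage:
#   scan_secrets .
```

This scan only covers the working tree. Credentials that were committed and later deleted still live in git history.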

Phase 7: Architecture Evaluation

```bash
# Dead code detection: report pages whose route name appears in no other file
find . -path "./pages/*" -name "*.tsx" | while IFS= read -r f; do
  page=${f#./pages/}     # ./pages/foo/index.tsx -> foo/index.tsx
  page=${page%.tsx}      # -> foo/index
  page=${page%/index}    # -> foo
  grep -rlF --include="*.ts" --include="*.tsx" --exclude-dir=node_modules "$page" . \
    | grep -qvxF "$f" || echo "ORPHAN: $f"
done

# Circular dependencies
npx madge --circular --extensions ts,tsx ./src
```

Findings:
- 14 orphaned pages with no navigation links
- 7 circular dependency chains
- lib/bpoc-api.ts imported by 44 files (high coupling)

High coupling makes changes risky; circular dependencies indicate architectural issues.
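A fan-in count like the "imported by 44 files" figure can be approximated with grep. The sketch below counts source files whose `from` specifier ends in `/<module>`, such as `"../lib/bpoc-api"` or `'@/lib/bpoc-api'`. The function name `count_importers` is hypothetical. It assumes the module name contains no regex metacharacters, and it does not count side-effect-only imports or `require()` calls.

```bash
# Hypothetical helper: number of .ts/.tsx files under ROOT with an
# import/export "from" specifier ending in "/MODULE".
count_importers() {
  local module="$1" root="${2:-.}"
  grep -rlE \
    --include='*.ts' --include='*.tsx' \
    --exclude-dir=node_modules --exclude-dir=.git \
    "from[[:space:]]+['\"][^'\"]*/${module}['\"]" \
    "$root" | wc -l
}

# Usage:
#   count_importers bpoc-api .
```

Running it on the same modules over time shows whether decoupling work is actually reducing fan-in.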

Report Structure

Executive Summary

One-page summary for non-technical stakeholders:
- Overall assessment
- Critical issues requiring immediate action
- Resource estimate for remediation
- Recommendation (refactor vs. rebuild)

Critical Issues

Security and stability concerns requiring immediate attention:
1. Enable RLS on all database tables
2. Rotate committed credentials and remove them from git history
3. Patch critical npm vulnerabilities
4. Remove hardcoded passwords

High Priority

Issues impacting development velocity:
1. Delete duplicate files
2. Break up oversized components
3. Remove unused dependencies
4. Establish test coverage baseline

Medium Priority

Technical debt to address systematically:
1. Consolidate documentation
2. Enable TypeScript strict mode
3. Address linting errors
4. Document architecture decisions
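Strict mode with thousands of errors is rarely enabled in one step. One common approach is a ratchet: track the strict-mode error count in CI, fail when it rises, and lower the stored baseline whenever it falls. Below is a minimal sketch. The function name `ratchet` and the baseline file path are hypothetical.

```bash
# Hypothetical ratchet: compare CURRENT with the number stored in BASELINE_FILE.
# Fails on regression, tightens the baseline on improvement.
ratchet() {
  local current="$1" baseline_file="$2" baseline
  if [ ! -f "$baseline_file" ]; then
    echo "$current" > "$baseline_file"
    echo "baseline initialised at $current"
    return 0
  fi
  baseline=$(cat "$baseline_file")
  if [ "$current" -gt "$baseline" ]; then
    echo "regression: $current errors (baseline $baseline)" >&2
    return 1
  fi
  if [ "$current" -lt "$baseline" ]; then
    echo "$current" > "$baseline_file"
    echo "baseline lowered: $baseline -> $current"
  fi
  return 0
}

# Usage in CI (.strict-baseline is a placeholder path committed to the repo):
#   count=$(npx tsc --noEmit --strict --pretty false 2>&1 | grep -c "error TS")
#   ratchet "$count" .strict-baseline
```

When the count reaches zero, `"strict": true` can be set in tsconfig.json and the ratchet retired.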

Recommendations

Based on these findings, the recommendation was to rebuild rather than refactor. Justification:
- Zero test coverage makes refactoring risky
- 4,892 strict-mode type errors indicate fundamental type safety issues
- Normalizing the schema requires a data migration regardless
- The security issues require significant rework

Audit Checklist

For future audits:

Phase 1: Metrics
- [ ] File count and size distribution
- [ ] Git history and contributors
- [ ] Directory structure overview

Phase 2: Dependencies
- [ ] Direct dependency count
- [ ] Unused dependency detection
- [ ] Security vulnerability scan

Phase 3: Structure
- [ ] Root directory organization
- [ ] Duplicate file detection
- [ ] Large file identification

Phase 4: Code Quality
- [ ] Linting error count
- [ ] Type safety analysis
- [ ] Test coverage assessment

Phase 5: Database
- [ ] Schema complexity
- [ ] Security configuration
- [ ] Data integrity

Phase 6: Security
- [ ] Credential exposure
- [ ] Environment file handling
- [ ] Access control review

Phase 7: Architecture
- [ ] Dead code identification
- [ ] Dependency analysis
- [ ] Coupling assessment

Time Investment

For a codebase of this size (~2,400 files):

| Phase | Time |
|-------|------|
| High-level metrics | 1-2 hours |
| Dependency analysis | 2-3 hours |
| Structural analysis | 3-4 hours |
| Code quality | 4-6 hours |
| Database assessment | 3-4 hours |
| Security review | 4-6 hours |
| Architecture evaluation | 4-6 hours |
| Report writing | 4-6 hours |

Total: 25-37 hours for a comprehensive audit.

A "quick look" covering only critical security issues takes 4-6 hours.

FAQ

When should you recommend rebuild vs. refactor?

When the cost of incremental fixes exceeds the cost of rebuilding. In this case, 4,892 strict-mode type errors, zero tests, security gaps, and schema issues together made a rebuild the more economical option.

How do you handle pushback on findings?

Findings are quantitative where possible. "66 columns in one table" is measurable. "Zero RLS on tables containing SSN data" is verifiable. Data-driven findings minimize subjective disagreement.

What tools are essential?

Basic: find, grep, wc, du. Code quality: eslint, tsc. Dependencies: depcheck, npm audit. Architecture: madge. Database: native SQL tools.

How detailed should database analysis be?

Depends on scope. For security: RLS status and sensitive data location. For architecture: schema design, relationship patterns, query performance.
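For a security-scoped review, locating sensitive data can start from column names. The sketch below assumes `information_schema.columns` is queried through `psql -At`, which prints `table|column` rows. The function name `flag_sensitive_columns`, the `DATABASE_URL` variable, and the keyword list are illustrative assumptions to adapt to the domain. A column name is only a hint, and sampling the data confirms what a column actually holds.

```bash
# Hypothetical helper: keep "table|column" rows whose column name contains a
# keyword that suggests sensitive data. The keyword list is illustrative only.
flag_sensitive_columns() {
  grep -iE '[|][^|]*(ssn|social_security|tax_id|passport|dob|date_of_birth|password|salary|bank)[^|]*$'
}

# Usage (DATABASE_URL is a placeholder for your connection string):
#   psql "$DATABASE_URL" -At -c "SELECT table_name, column_name
#     FROM information_schema.columns WHERE table_schema = 'public'" |
#     flag_sensitive_columns
```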

What if the codebase is too large for manual review?

Use sampling: analyze critical paths thoroughly and sample the rest. Use automated tools for breadth and manual review for depth on high-risk areas.
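The sampling strategy above can be sketched as a file picker. It always includes files whose path matches a critical-path pattern, then adds a random sample of the remaining source files. The function name `sample_for_review` and the example pattern are hypothetical. The sketch relies on `shuf` from GNU coreutils, which macOS does not ship by default.

```bash
# Hypothetical helper: print every .ts/.tsx file whose path matches CRITICAL_RE
# (an extended regex), then N randomly chosen files from the rest.
sample_for_review() {
  local root="$1" critical_re="$2" n="$3" all
  all=$(find "$root" -type f \( -name '*.ts' -o -name '*.tsx' \) \
          -not -path '*/node_modules/*')
  printf '%s\n' "$all" | grep -E -- "$critical_re"                # review all of these
  printf '%s\n' "$all" | grep -vE -- "$critical_re" | shuf -n "$n" # plus a random sample
}

# Usage (the pattern is an example; adapt it to the project's high-risk areas):
#   sample_for_review . '/(auth|payments|api)/' 25
```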

Audits are investments. Thorough audits inform good decisions. Superficial audits provide false confidence.

code-audit · codebase · technical-debt · security · methodology

← ALL TALES MORE FROM CLARK SINGH →