Codebase audits serve multiple purposes: understanding existing systems, identifying technical debt, assessing security posture, and informing architectural decisions. A systematic approach ensures comprehensive coverage and reproducible results.
This document describes the methodology used to audit the ShoreAgents codebase, a Next.js application that had grown organically over several years.
Audit Phases
Phase 1: High-Level Metrics
Before examining code, establish baseline metrics:
```bash
# File count
find . -type f | wc -l

# Size by directory
du -sh */ | sort -hr | head -20

# File type distribution
find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -nr | head -20

# Git history
git log --oneline | wc -l
git shortlog -sn | head -10
```
ShoreAgents Results:
- 2,392 files
- 3.7 GB total size
- 2.2 GB in dist folder (Electron builds committed to repo)
- 1,847 commits
- Primary contributor with occasional others
Initial observation: 3.7 GB is abnormally large for a Next.js application. The committed Electron builds in the dist folder account for 2.2 GB of it, roughly 60% of the total.
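To put a number on how much of the repository the committed builds occupy, the share can be computed directly from the sizes reported above. This is a minimal sketch; `share_pct` is a hypothetical helper name, not part of the original audit tooling.

```bash
# share_pct PART TOTAL: print PART as a percentage of TOTAL, one decimal place.
# Hypothetical helper for illustration only.
share_pct() {
  awk -v part="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * part / total
  }'
}

# Sizes in GB from the Phase 1 results: dist folder vs. whole repository.
share_pct 2.2 3.7   # prints 59.5
```

The same function works with kilobyte counts from `du -sk`, which avoids the rounding in human-readable sizes.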
Phase 2: Dependency Analysis
```bash
# Dependency count
jq '.dependencies | length' package.json
jq '.devDependencies | length' package.json

# Unused dependencies
npx depcheck

# Security vulnerabilities
npm audit
```
Findings:
- 89 direct dependencies
- 34 dev dependencies
- 17 unused dependencies (including Prisma remnants)
- 12 vulnerabilities (4 moderate, 6 high, 2 critical)
Critical vulnerabilities in production require immediate attention.
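A severity breakdown like the one above can be tallied from machine-readable output rather than read off the terminal summary. The sketch below assumes the npm 7+ JSON format, where each entry under `vulnerabilities` carries a `severity` field; `tally_severities` is a hypothetical helper name.

```bash
# tally_severities: read one severity label per line on stdin and print
# "label count" pairs, highest severity first. Hypothetical helper.
tally_severities() {
  awk '
    { count[$1]++ }
    END {
      n = split("critical high moderate low info", order, " ")
      for (i = 1; i <= n; i++)
        if (order[i] in count) print order[i], count[order[i]]
    }'
}

# Assumed usage (requires jq and the npm 7+ JSON layout):
#   npm audit --json | jq -r '.vulnerabilities[].severity' | tally_severities
```

For a simple pass/fail gate, `npm audit --audit-level=critical` exits non-zero only when a vulnerability at that level is present.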
Phase 3: Structural Analysis
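```bash
# Root directory contents
ls -la | wc -l

# Duplicate files (copies named like "file 2.tsx")
find . -name "* 2.*" -type f | wc -l

# Large source files (-print0 / -0 keeps names containing spaces intact)
find . \( -name "*.tsx" -o -name "*.ts" \) -print0 | xargs -0 du -h | sort -hr | head -10
```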
Findings:
- 92 markdown files in root directory
- 35 duplicate files with " 2" naming pattern
- Largest component: 134 KB (onboarding-form.tsx)
The duplicate files indicate a pattern of copying rather than proper version control branching.
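Before deleting the " 2" copies, it helps to know which ones are byte-identical to their originals and which have diverged. The following is a sketch under the assumption that the " 2" suffix sits directly before the extension; `original_for` and `classify_duplicate` are hypothetical helper names.

```bash
# original_for PATH: map "name 2.ext" back to "name.ext".
# Assumes the " 2" copy suffix appears right before the first extension dot
# in the file name.
original_for() {
  printf '%s\n' "$1" | sed -E 's/ 2(\.[^/]*)$/\1/'
}

# classify_duplicate PATH: report how the copy relates to its original.
classify_duplicate() {
  orig=$(original_for "$1")
  if [ "$orig" = "$1" ]; then
    echo "UNRECOGNIZED $1"      # name does not follow the " 2" pattern
  elif [ ! -f "$orig" ]; then
    echo "NO-ORIGINAL $1"       # the copy may be the only surviving version
  elif cmp -s "$1" "$orig"; then
    echo "IDENTICAL $1"         # byte-for-byte the same as the original
  else
    echo "DIFFERS $1"           # needs a manual diff before removal
  fi
}

# Assumed usage from the repository root:
#   find . -name "* 2.*" -type f |
#     while IFS= read -r f; do classify_duplicate "$f"; done
```

Only the `IDENTICAL` group is a candidate for removal without review; the other groups may hold changes that exist nowhere else.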
Phase 4: Code Quality Assessment
```bash
# Linting
npx eslint . --ext .ts,.tsx 2>&1 | tail -5

# TypeScript strictness
jq '.compilerOptions.strict' tsconfig.json
npx tsc --noEmit --strict 2>&1 | wc -l

# Test coverage
find . -name "*.test.*" -o -name "*.spec.*" | wc -l
```
Findings:
- 2,847 linting issues (1,203 errors, 1,644 warnings)
- Strict mode disabled
- 4,892 errors when strict mode enabled
- Zero test files
No automated tests and disabled type safety indicate significant technical debt.
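A raw error count says little about how hard the errors are to fix. One way to get more signal is to group the compiler output by error code, as in the sketch below. It assumes the non-pretty `tsc` output format (`file(line,col): error TSxxxx: message`); `tally_ts_errors` is a hypothetical helper name.

```bash
# tally_ts_errors: read non-pretty tsc output on stdin and print
# "count code" pairs for each TS error code, most frequent first.
tally_ts_errors() {
  grep -oE 'error TS[0-9]+' |
    awk '{ print $2 }' |
    sort | uniq -c |
    sort -k1,1nr -k2,2 |
    awk '{ print $1, $2 }'
}

# Assumed usage:
#   npx tsc --noEmit --strict --pretty false 2>&1 | tally_ts_errors | head -10
```

If a handful of codes account for most of the total, the fix may be more mechanical than the headline number suggests; a long tail of distinct codes points the other way.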
Phase 5: Database Assessment
```sql
-- Table count
SELECT COUNT(*) FROM information_schema.tables
WHERE table_schema = 'public';

-- Column counts by table
SELECT table_name, COUNT(*) AS columns
FROM information_schema.columns
WHERE table_schema = 'public'
GROUP BY table_name
ORDER BY columns DESC
LIMIT 10;

-- RLS status
SELECT tablename, rowsecurity
FROM pg_tables
WHERE schemaname = 'public';
```
Findings:
- 40 tables
- staff_onboarding: 66 columns
- All 40 tables: RLS disabled
66 columns in a single table indicates normalization issues. Disabled RLS on all tables represents a security gap.
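With 40 tables to fix, the statements that turn RLS on can be generated from the same catalog query rather than typed by hand. The sketch below only prints SQL and does not execute anything; `gen_enable_rls` is a hypothetical helper name and `$DATABASE_URL` is a placeholder.

```bash
# gen_enable_rls: read table names (one per line) on stdin and print an
# ALTER TABLE statement for each. Hypothetical helper; it only prints SQL.
gen_enable_rls() {
  while IFS= read -r table; do
    [ -n "$table" ] || continue
    # Double any embedded double quotes so the quoted identifier stays valid.
    quoted=$(printf '%s' "$table" | sed 's/"/""/g')
    printf 'ALTER TABLE public."%s" ENABLE ROW LEVEL SECURITY;\n' "$quoted"
  done
}

# Assumed usage with psql (connection string is a placeholder):
#   psql "$DATABASE_URL" -At -c "SELECT tablename FROM pg_tables
#     WHERE schemaname = 'public' AND NOT rowsecurity" | gen_enable_rls
```

Enabling RLS on a table that has no policies denies all rows to roles other than the owner and those that bypass RLS, so policies need to be written and tested before the generated statements are applied.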
Phase 6: Security Review
```bash
# Credential patterns
grep -r "sk-" --include="*.ts" --include="*.tsx" .
grep -r "eyJhbG" --include="*.ts" --include="*.tsx" .

# Environment files in git
git ls-files | grep -E "\.env"

# Hardcoded secrets
grep -rn "password\|secret\|api_key" --include="*.ts" .
```
Findings:
- 3 API keys in comments "for testing"
- .env.development committed with staging credentials
- 1 hardcoded password in migration script
Credential exposure in version control requires immediate remediation.
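A bare search for `sk-` also matches ordinary strings such as `task-runner`, so the first pass tends to be noisy. The sketch below narrows the patterns to strings that look like real keys or three-part JWTs. The length thresholds are assumptions chosen to reduce false positives, and `scan_secrets` is a hypothetical helper name.

```bash
# scan_secrets DIR: print file:line:match for strings under DIR that look like
# "sk-" style API keys or JWTs in TypeScript sources.
# The minimum lengths (20 and 10) are assumptions, not a standard.
scan_secrets() {
  grep -rnoE --include="*.ts" --include="*.tsx" --exclude-dir=node_modules \
    -e 'sk-[A-Za-z0-9_-]{20,}' \
    -e 'eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}' \
    "$1"
}

# Assumed usage from the repository root:
#   scan_secrets .
```

This only covers the working tree. Credentials that were committed and later removed still exist in history, so any key found this way should be treated as compromised and rotated.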
Phase 7: Architecture Evaluation
```bash
# Dead code detection: flag pages whose route is never referenced from another file
find . -path "./pages/*" -name "*.tsx" | while IFS= read -r f; do
  page=$(echo "$f" | sed -e 's|^\./pages/||' -e 's|/index\.tsx$||' -e 's|\.tsx$||')
  grep -rF "$page" --include="*.ts" --include="*.tsx" . | grep -vF "$f" > /dev/null || echo "ORPHAN: $f"
done

# Circular dependencies
npx madge --circular --extensions ts,tsx ./src
```
Findings:
- 14 orphaned pages with no navigation links
- 7 circular dependency chains
- lib/bpoc-api.ts imported by 44 files (high coupling)
High coupling makes changes risky; circular dependencies indicate architectural issues.
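A fan-in figure such as "imported by 44 files" can be approximated with a text search over import statements. The following is a rough sketch rather than a module-resolution-aware count: it matches import paths by suffix, and `count_importers` is a hypothetical helper name.

```bash
# count_importers MODULE DIR: count TypeScript files under DIR that contain an
# import or require whose path ends in MODULE (for example "lib/bpoc-api").
# MODULE is interpolated into a regular expression, so regex metacharacters
# in it would need escaping.
count_importers() {
  grep -rlE --include="*.ts" --include="*.tsx" --exclude-dir=node_modules \
    "(from|require\()[[:space:]]*['\"][^'\"]*$1['\"]" "$2" | wc -l | tr -d ' '
}

# Assumed usage from the repository root:
#   count_importers "lib/bpoc-api" .
```

Because it matches on text, this misses re-exports and dynamic imports and can overcount on modules that share a path suffix; it is best read as a quick cross-check on what the dependency graph tooling reports.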
Report Structure
Executive Summary
One-page summary for non-technical stakeholders:
- Overall assessment
- Critical issues requiring immediate action
- Resource estimate for remediation
- Recommendation (refactor vs. rebuild)
Critical Issues
Security and stability concerns requiring immediate attention:
1. Enable RLS on all database tables
2. Remove committed credentials from git history
3. Patch critical npm vulnerabilities
4. Remove hardcoded passwords
High Priority
Issues impacting development velocity:
1. Delete duplicate files
2. Break up oversized components
3. Remove unused dependencies
4. Establish test coverage baseline
Medium Priority
Technical debt to address systematically:
1. Consolidate documentation
2. Enable TypeScript strict mode
3. Address linting errors
4. Document architecture decisions
Recommendations
Based on the findings, the recommendation was to rebuild rather than refactor. Justification:
- Zero test coverage makes refactoring risky
- 4,892 type errors indicate fundamental type safety issues
- Normalizing the schema requires a data migration regardless
- Security issues require significant rework
Audit Checklist
For future audits:
Phase 1: Metrics
- [ ] File count and size distribution
- [ ] Git history and contributors
- [ ] Directory structure overview

Phase 2: Dependencies
- [ ] Direct dependency count
- [ ] Unused dependency detection
- [ ] Security vulnerability scan

Phase 3: Structure
- [ ] Root directory organization
- [ ] Duplicate file detection
- [ ] Large file identification

Phase 4: Code Quality
- [ ] Linting error count
- [ ] Type safety analysis
- [ ] Test coverage assessment

Phase 5: Database
- [ ] Schema complexity
- [ ] Security configuration
- [ ] Data integrity

Phase 6: Security
- [ ] Credential exposure
- [ ] Environment file handling
- [ ] Access control review

Phase 7: Architecture
- [ ] Dead code identification
- [ ] Dependency analysis
- [ ] Coupling assessment
Time Investment
For a codebase of this size (~2,400 files):
| Phase | Time |
|-------|------|
| High-level metrics | 1-2 hours |
| Dependency analysis | 2-3 hours |
| Structural analysis | 3-4 hours |
| Code quality | 4-6 hours |
| Database assessment | 3-4 hours |
| Security review | 4-6 hours |
| Architecture evaluation | 4-6 hours |
| Report writing | 4-6 hours |
Total: 25-37 hours for comprehensive audit.
A "quick look" covering only critical security issues: 4-6 hours.
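The 25-37 hour total is the sum of the low and high ends of the per-phase estimates. When the estimates are adjusted for a different codebase, the total can be recomputed with a small helper like the sketch below; `sum_ranges` is a hypothetical name.

```bash
# sum_ranges: read "low-high" hour ranges (one per line) on stdin and print
# the summed range. Hypothetical helper for checking an estimate table.
sum_ranges() {
  awk -F'-' 'NF == 2 { lo += $1; hi += $2 } END { printf "%d-%d\n", lo, hi }'
}

# The eight per-phase estimates from the table above, in order.
printf '%s\n' 1-2 2-3 3-4 4-6 3-4 4-6 4-6 4-6 | sum_ranges   # prints 25-37
```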
FAQ
When should you recommend rebuild vs. refactor?
When the cost of incremental fixes exceeds the cost of rebuilding. In this case: 4,892 type errors + zero tests + security gaps + schema issues = rebuild is more economical.
How do you handle pushback on findings?
Findings are quantitative where possible. "66 columns in one table" is measurable. "Zero RLS on tables containing SSN data" is verifiable. Data-driven findings minimize subjective disagreement.
What tools are essential?
Basic: find, grep, wc, du. Code quality: eslint, tsc. Dependencies: depcheck, npm audit. Architecture: madge. Database: native SQL tools.
How detailed should database analysis be?
Depends on scope. For security: RLS status and sensitive data location. For architecture: schema design, relationship patterns, query performance.
What if the codebase is too large for manual review?
Sampling. Analyze critical paths thoroughly, sample others. Automated tools for breadth, manual review for depth on high-risk areas.
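One simple way to sample the lower-risk remainder is to take every Nth file from a sorted list, which is repeatable between audits. Systematic sampling is an assumption here, since the answer above does not prescribe a method; `every_nth` is a hypothetical helper name and the path in the usage comment is a placeholder.

```bash
# every_nth N: read a list on stdin and print every Nth line, starting with
# the first. Prints nothing if N is not a positive number.
every_nth() {
  awk -v n="$1" 'n > 0 && (NR - 1) % n == 0'
}

# Assumed usage: review a high-risk path in full, then sample the rest.
#   git ls-files 'src/lib/auth/*'                      # hypothetical critical path
#   git ls-files '*.ts' '*.tsx' | sort | every_nth 20  # roughly a 5% sample
```

Sorting before sampling keeps the selection stable, so a follow-up audit can revisit the same files and compare.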
Audits are investments. Thorough audits inform good decisions. Superficial audits provide false confidence.
