By Caber Team
We all know the story: you’re building out a project, tweaking code on your organically grown website. One day you prompt an AI:
“I need a function to do X.”
It generates something good. Then later:
“I need my function to do X + Y.”
Instead of improving the original function and updating all its references, the AI spins up a new function. Now you’ve got duplicated logic—some calls point to the old function, some to the new. Chaos creeps in.
In my own codebase I saw this pattern over and over. For example:
```javascript
// Original helper
function processOrder(order) {
  validate(order);
  submit(order);
}

// Later request to add tracking
function processOrderWithTracking(order) {
  validate(order);
  submit(order);
  track(order);
}
```
Half the code called `processOrder`, half called `processOrderWithTracking`. Downstream references in `invoice.js` still used the old function, while `shipment.js` used the new one.
The AI had no visibility into the function graph to know it should merge logic and update references.
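With that visibility, the right move is a single merged helper instead of a duplicate. A hypothetical sketch of the refactor (the `withTracking` option name and the stub helpers are my own, not from the original codebase):

```javascript
// Stand-ins for the real helpers, just so the sketch runs:
const calls = [];
function validate(order) { calls.push("validate"); }
function submit(order) { calls.push("submit"); }
function track(order) { calls.push("track"); }

// Merged helper: one function, with tracking behind an opt-in flag,
// so every caller shares the same validate/submit path.
function processOrder(order, { withTracking = false } = {}) {
  validate(order);
  submit(order);
  if (withTracking) track(order);
}

// invoice.js keeps its existing call site unchanged:
processOrder({ id: 1 });

// shipment.js opts in instead of calling a duplicate function:
processOrder({ id: 2 }, { withTracking: true });
```

Because the old signature still works, existing call sites don't break, and there's only one place to change when order processing evolves again.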
AI models are getting better fast. Switching between them is becoming less painful, and their raw capabilities feel similar. So how do you know which one to use?
You don’t—because the model isn’t the whole story.
The real question is:
👉 What data are you feeding it?
When you provide structure, even a little, answer quality skyrockets. For example, supplying a database schema instead of just dumping tables into a prompt improved one system's answer quality from 72% to 87%, a 15-percentage-point jump for minutes of extra work.
Example: Try #1 — Use AI to find unused and disconnected files in my codebase
My Prompt:
Identify all files that are disconnected from the marketing pages, authorization, deployment and dashboard. I'm looking for unused files that may be referenced in other files that themselves are disconnected from the core graph.
AI Response:
[...]
This cleanup could remove approximately **20-25% of unused code** from your components directory and simplify the codebase significantly.
Time: 3 minutes 12 seconds
Dumping unstructured data into a context window often fails to deliver the right results. You need to give your AI not just data, but data with context:
Think of it like giving your AI a map instead of a random list of street names.
Graph-based RAG (retrieval-augmented generation) is a great start, but the real trick is mapping out relationships.
Here’s a snippet from my codebase that demonstrates the value of mapping out dependencies between files:
```
index.js -> utils/logger.js
index.js -> services/api.js
services/api.js -> utils/logger.js
services/api.js -> models/user.js
```
When you represent your code this way, the AI can see that both `index.js` and `services/api.js` rely on the same logger, so a change in `logger.js` affects them both. Instead of duplicating logic, the AI could reason: modify `logger.js` once and update its usage everywhere.
Example: Try #2 — Was I about to delete needed code?
First, deterministically build a dependency graph:
```shell
caber% time npm run analyze:json

> caber-systems-portal@0.1.0 analyze:json
> npx madge --json src/ > dependency-graph.json

npm run analyze:json  1.41s user 0.23s system 79% cpu 2.070 total
```
My Prompt:
Given the attached dependency graph, identify all files that are disconnected from the marketing pages, authorization, deployment and dashboard. I'm looking for unused files that may be referenced in other files that themselves are disconnected from the core graph.
AI Response:
[...]
This analysis shows your codebase is relatively clean with only ~8.5% disconnected files, mostly old pages and test files.
Time: 2 minutes 3 seconds + 2.070 seconds = 2 minutes 5 seconds ⬅️
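The same reachability question can also be answered without any AI: madge's JSON output is just an adjacency map, so a breadth-first walk from your entry points yields the disconnected set. A sketch (the graph and entry points here are illustrative, not my real codebase):

```javascript
// madge-style dependency graph: { file: [its dependencies...] }
const graph = {
  "pages/index.js": ["components/nav.js"],
  "components/nav.js": [],
  "pages/old-landing.js": ["components/hero-v1.js"], // nothing imports this
  "components/hero-v1.js": [],
};

// Return every file unreachable from the given entry points.
function disconnectedFiles(graph, entryPoints) {
  const seen = new Set();
  const queue = [...entryPoints];
  while (queue.length > 0) {
    const file = queue.shift();
    if (seen.has(file)) continue;
    seen.add(file);
    for (const dep of graph[file] || []) queue.push(dep);
  }
  return Object.keys(graph).filter((f) => !seen.has(f));
}

console.log(disconnectedFiles(graph, ["pages/index.js"]));
// returns ["pages/old-landing.js", "components/hero-v1.js"]
```

The deterministic pass gives you a ground-truth candidate list; the AI then adds value by explaining *why* each file is orphaned and whether it's safe to delete.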
Context comes from data elements and the patterns in their creation and use. Some patterns are standardized (a function calling another). Some are complex (like how weather patterns can affect call center volume). When you surface these patterns you’re giving AI clues about your data that it often won't uncover on its own.
In the examples above, providing structured context (the dependency graph) saved about 35% of the time while producing a more accurate result (~8.5% disconnected files versus the original estimate of 20-25% unused code).
Interconnectedness in data is something we take for granted but rarely exploit. By unraveling those relationships and feeding them to the AI, you elevate it from a code generator or summarizer into a trustworthy partner that understands your data.
It’s not the model—it’s the data you give it.