Adaptive Retrieval Architectures for Agentic Software Engineering

An applied research capstone examining the limits of knowledge graphs, adaptive feedback, and precomputed context in agentic software engineering workflows.
Date: Spring 2026
Figure: Venn diagram of the main ideas from the project

The "REBOOT" capstone project, undertaken by students in the Computer Science and Engineering program at Ohio State University, explores how AI coding agents can retrieve more relevant context from large and evolving codebases. Sponsored by 99P Labs and Honda Research Institute USA, the project addresses a growing challenge in AI-assisted software development: traditional retrieval systems often rely on static context pipelines that can become stale or misaligned as code changes. The team set out to investigate whether adaptive retrieval, knowledge-graph-based code representations, and feedback-driven ranking could improve how agents understand and navigate complex repositories.

In the first phase, the team focused on research, system design, and technical setup. They explored open-source tools for agentic workflows and context management, including OpenCode, Graphiti, Neo4j, and tree-sitter. This phase established the foundation for REBOOT, short for Retrieval Enhancement Based On Observed Trends. The team designed an architecture that could parse codebases, extract meaningful code units such as functions and classes, and store that information in a graph structure. By using Graphiti on top of Neo4j, the system aimed to represent code context through relationships, dependencies, and semantic connections that could later be queried by an AI agent.
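As a rough illustration of the extraction step, the sketch below turns a source file into function and class units plus parent-to-child relationships that could then be loaded into a graph. REBOOT used tree-sitter for parsing; to keep the example dependency-free, it swaps in Python's standard-library `ast` module, which only understands Python source. The `CodeUnit` type, the `extract_units` function, and the `CONTAINS` relation are hypothetical names chosen for illustration, not part of Graphiti's or Neo4j's data model.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass

# Node types treated as sub-file code units in this sketch.
DEF_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


@dataclass(frozen=True)
class CodeUnit:
    """A sub-file unit extracted from source: a class, function, or method (hypothetical type)."""
    name: str        # qualified name, e.g. "Parser.parse"
    kind: str        # "class" or "function"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive


def extract_units(source: str, module: str) -> tuple[list[CodeUnit], list[tuple[str, str, str]]]:
    """Parse Python source into code units and (parent, "CONTAINS", child) edges."""
    units: list[CodeUnit] = []
    edges: list[tuple[str, str, str]] = []

    def visit(node: ast.AST, parent: str | None) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEF_NODES):
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                # Top-level definitions keep their bare name; nested ones are qualified.
                name = child.name if parent is None else f"{parent}.{child.name}"
                units.append(CodeUnit(name, kind, child.lineno, child.end_lineno))
                edges.append((parent or module, "CONTAINS", name))
                visit(child, name)
            else:
                # Keep descending so definitions inside if/try/with blocks are found too.
                visit(child, parent)

    visit(ast.parse(source), None)
    return units, edges


SAMPLE = '''
class Parser:
    def parse(self, text):
        return text.split()

def main():
    return Parser().parse("a b")
'''

units, edges = extract_units(SAMPLE, module="sample")
# units: Parser (class, lines 2-4), Parser.parse (function, 3-4), main (function, 6-7)
```

A tree-sitter version would walk a concrete syntax tree instead, which lets the same idea cover many languages; richer edges such as calls or imports would need additional analysis beyond this containment-only sketch.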

The second phase involved building the core middleware and ingestion pipeline. The team developed REBOOT as an MCP-compatible service, allowing coding agents such as Claude Code, Cursor, Windsurf, or OpenCode to connect through standardized tool calls. The system included endpoints for repository ingestion, code search, retrieval explanation, and feedback submission. Tree-sitter was used to parse source files into sub-file units, while the middleware handled query classification, search configuration, confidence-ranked retrieval, and result explanation. This phase transformed the project from a research concept into a working prototype that could ingest repositories, search graph-backed code context, and expose retrieval decisions in a transparent way.
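The write-up does not specify how REBOOT classified queries or chose search configurations, so the following is only a hypothetical sketch of the idea: a few regular expressions decide whether a query names a file, names a code symbol, or describes a concept, and each class maps to a search configuration. The patterns, the `SearchConfig` fields, and all threshold values are illustrative assumptions, not REBOOT's settings.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchConfig:
    mode: str              # "exact" favors lexical matching, "semantic" favors embeddings
    top_k: int             # number of results to return
    min_confidence: float  # drop results whose confidence falls below this


# Hypothetical heuristics for spotting code-like tokens in a natural-language query.
FILE_PATH = re.compile(r"[\w./-]+\.(?:py|js|ts|java|go|rs|c|cpp|h)\b")
IDENTIFIER_PATTERNS = (
    re.compile(r"\b[a-z0-9]+(?:_[a-z0-9]+)+\b"),  # snake_case, e.g. parse_config
    re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b"),  # camelCase, e.g. getUser
    re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b"),     # PascalCase, e.g. TokenStream
    re.compile(r"\b\w+\(\)"),                      # call syntax, e.g. main()
)

# Illustrative configurations (assumed values).
CONFIGS = {
    "file": SearchConfig(mode="exact", top_k=3, min_confidence=0.5),
    "symbol": SearchConfig(mode="exact", top_k=5, min_confidence=0.4),
    "conceptual": SearchConfig(mode="semantic", top_k=10, min_confidence=0.2),
}


def classify_query(query: str) -> str:
    """Label a query as 'file', 'symbol', or 'conceptual'."""
    if FILE_PATH.search(query):
        return "file"
    if any(pattern.search(query) for pattern in IDENTIFIER_PATTERNS):
        return "symbol"
    return "conceptual"


def configure_search(query: str) -> tuple[str, SearchConfig]:
    """Return the query class together with the search configuration for it."""
    kind = classify_query(query)
    return kind, CONFIGS[kind]
```

For example, `configure_search("where is parse_config defined?")` would route to exact matching, while `configure_search("How does authentication work?")` would route to semantic search. One motivation for this kind of routing is that exact symbols and filenames often matter more for code than broad conceptual similarity.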

In the third phase, the team added adaptive feedback and evaluation capabilities. REBOOT introduced a confidence mechanism that could reinforce useful retrieval results, reduce confidence for less useful results, and decay confidence over time. Positive and negative feedback signals were stored in SQLite and used to influence future rankings without retraining the underlying language model. The team also built an evaluation harness using SWE-bench-derived issues, generated retrieval queries, and LLM-as-judge scoring. This allowed the team to measure retrieval quality using metrics such as Precision@K, Mean Reciprocal Rank, and judge-based relevance scores. The evaluation framework became one of the project’s most valuable outputs because it gave the team a way to test the system against a simpler baseline.
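The confidence mechanism can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: each feedback signal moves a result's confidence a fixed fraction toward 1.0 (useful) or 0.0 (not useful), stored confidence decays exponentially back toward a neutral prior of 0.5, and reranking multiplies a result's base relevance by a confidence factor. The table schema, learning rate, half-life, and the `ConfidenceStore` class are hypothetical; the source only states that feedback was stored in SQLite and used to reinforce, reduce, and decay confidence.

```python
from __future__ import annotations

import sqlite3
import time

PRIOR = 0.5                  # neutral confidence for results without feedback (assumption)
LEARNING_RATE = 0.2          # fraction of the gap to the target closed per signal (assumption)
HALF_LIFE_S = 7 * 24 * 3600  # feedback influence halves every week (assumption)


class ConfidenceStore:
    """Per-result confidence scores persisted in SQLite (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS confidence ("
            "result_id TEXT PRIMARY KEY, value REAL NOT NULL, updated_at REAL NOT NULL)"
        )

    def get(self, result_id: str, now: float | None = None) -> float:
        """Stored confidence, decayed exponentially toward PRIOR since its last update."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT value, updated_at FROM confidence WHERE result_id = ?", (result_id,)
        ).fetchone()
        if row is None:
            return PRIOR
        value, updated_at = row
        weight = 0.5 ** ((now - updated_at) / HALF_LIFE_S)
        return PRIOR + (value - PRIOR) * weight

    def feedback(self, result_id: str, useful: bool, now: float | None = None) -> float:
        """Reinforce (useful=True) or reduce (useful=False) confidence and persist it."""
        now = time.time() if now is None else now
        current = self.get(result_id, now)
        target = 1.0 if useful else 0.0
        updated = current + LEARNING_RATE * (target - current)
        self.db.execute(
            "INSERT OR REPLACE INTO confidence (result_id, value, updated_at) VALUES (?, ?, ?)",
            (result_id, updated, now),
        )
        self.db.commit()
        return updated

    def rerank(self, results: list[tuple[str, float]], now: float | None = None) -> list[tuple[str, float]]:
        """Sort (result_id, base_score) pairs by base_score * (0.5 + confidence)."""
        return sorted(results, key=lambda r: r[1] * (0.5 + self.get(r[0], now)), reverse=True)


store = ConfidenceStore()
store.feedback("auth.py::login", useful=True)    # confidence 0.5 -> 0.6
store.feedback("auth.py::logout", useful=False)  # confidence 0.5 -> 0.4
ranked = store.rerank([("auth.py::logout", 0.92), ("auth.py::login", 0.90)])
# login now outranks logout: 0.90 * 1.1 > 0.92 * 0.9
```

In this sketch a result with no feedback gets a factor of exactly 1.0, so its original score is untouched; only results with recent feedback move, and only the stored scores change, never the underlying model.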
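For reference, the two ranking metrics named above have standard definitions: Precision@K is the fraction of the top K results that are relevant, and the reciprocal rank of a query is 1 divided by the position of the first relevant result (0 if none appears), averaged over queries to give Mean Reciprocal Rank. The sketch below implements those textbook definitions. The write-up does not say how REBOOT handled edge cases such as fewer than K returned results, so dividing by K here is an assumption, and the file names are made up.

```python
from statistics import mean
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant (divides by k even if fewer are returned)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for item in ranked[:k] if item in relevant) / k


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if no relevant item was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked_results, relevant_set) pairs."""
    return mean(reciprocal_rank(ranked, relevant) for ranked, relevant in runs)


# Illustrative runs: (results in ranked order, ground-truth relevant files)
runs = [
    (["cli.py", "config.py", "utils.py"], {"config.py"}),  # first hit at rank 2 -> 0.5
    (["parser.py", "lexer.py"], {"parser.py"}),            # first hit at rank 1 -> 1.0
    (["README.md"], {"server.py"}),                        # no hit -> 0.0
]
mrr = mean_reciprocal_rank(runs)  # (0.5 + 1.0 + 0.0) / 3 = 0.5
p_at_2 = precision_at_k(["cli.py", "config.py", "utils.py"], {"config.py", "utils.py"}, k=2)  # 1 of 2 -> 0.5
```

The same functions can score both the graph-backed system and a simpler baseline on identical queries, which is what makes a head-to-head comparison possible; the LLM-as-judge scores would be computed separately.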

The final results showed that the knowledge graph approach did not outperform a baseline exploration agent that relied on common developer tools: listing files, searching with grep, and reading source files. This negative result was itself significant. The project revealed that applying existing knowledge graph systems directly to raw code can introduce noise, especially when the graph captures low-value relationships or surfaces conceptual links instead of the concrete code structures needed for software engineering tasks. The team also found that semantic search is not always well suited to code retrieval, where exact symbols, filenames, and implementation details often matter more than broad conceptual similarity. In addition, the need to ingest repositories before query time created cost and speed challenges for larger codebases.
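To make the comparison concrete, the baseline's three tools can be approximated with the Python standard library. The sketch below is hypothetical: the function names, signatures, and the `.py` filter are illustrative, and the write-up does not describe how its baseline agent actually invoked these tools. It does illustrate the point above: a grep-style search matches exact symbols and filenames directly and needs no ingestion step before the first query.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path


def list_files(root: str, suffix: str = ".py") -> list[str]:
    """Return paths (relative to root, sorted, '/'-separated) of files ending in suffix."""
    base = Path(root)
    return sorted(p.relative_to(base).as_posix() for p in base.rglob(f"*{suffix}") if p.is_file())


def grep(root: str, pattern: str, suffix: str = ".py") -> list[tuple[str, int, str]]:
    """Return (relative_path, line_number, stripped_line) for each line matching a regex."""
    regex = re.compile(pattern)
    hits = []
    for rel in list_files(root, suffix):
        text = (Path(root) / rel).read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((rel, lineno, line.strip()))
    return hits


def read_file(root: str, rel: str, start: int = 1, end: int | None = None) -> str:
    """Return lines start..end (1-based, inclusive) of a file; end=None reads to the end."""
    lines = (Path(root) / rel).read_text(encoding="utf-8", errors="replace").splitlines()
    return "\n".join(lines[start - 1 : end])


# Usage on a throwaway repository
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "auth.py").write_text("def login(user):\n    return check(user)\n", encoding="utf-8")
    files = list_files(repo)
    hits = grep(repo, r"\bdef login\b")
    body = read_file(repo, "auth.py", start=2, end=2)
```

An agent loop built on tools like these typically alternates between listing, grepping for a symbol from the issue text, and reading the surrounding lines, refining its search as it goes.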

Throughout the project, the team used a modern technical stack including Python, FastAPI, Graphiti, Neo4j, tree-sitter, SQLite, Docker, GitHub, and MCP-compatible agent tooling. They also delivered a working MVP, documentation, graph visualization, an evaluation harness, and a final technical write-up.

While the project did not prove that knowledge graphs are the best solution for code retrieval in this setting, it produced a clear and valuable research outcome. REBOOT demonstrated that agentic code retrieval systems must be evaluated against simple baselines early, that graph structure must be designed around the specific relationships that matter for code understanding, and that retrieval quality depends as much on practical search behavior as on the sophistication of the architecture. The project offers a strong foundation for future research into adaptive context systems, multi-repository knowledge bases, richer feedback signals, and more rigorous evaluation methods for AI-assisted software development.
