Automating Code Reviews with Context: A Retrieval-Augmented Method

Nguyen Thi Lan

Authors

Nguyen Thi Lan, Hanoi School of Artificial Intelligence, Vietnam

Keywords:

Automating Code Reviews, Context, Retrieval-Augmented

Abstract

Modern software development depends heavily on code review to maintain software quality, detect defects early, and ensure that coding standards are followed consistently across large, evolving projects. Although manual review remains one of the most reliable quality assurance practices, it is time-consuming, cognitively demanding, and difficult to scale in environments where teams continuously integrate and deploy new code. This study proposes an automated code review framework for Python 3.13 projects that combines Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) to generate more accurate, context-sensitive, and actionable review comments. The framework first constructs a dataset from GitHub pull requests collected through the GitHub REST API (version 2022-11-28). Review comments are then organized into semantic categories using a semi-supervised Support Vector Machine (SVM) classifier. During inference, the system retrieves the most relevant historical comments from a vector database and provides them as contextual guidance to multiple open-weight language models, including DeepSeek-Coder-33B, Qwen2.5-Coder-32B, Codestral-22B, CodeLlama-13B, Mistral-Instruct-7B, and Phi-3-Mini. The framework is evaluated with a validation methodology that combines conventional text-generation metrics (BLEU-4, ROUGE-L, and cosine similarity) with semantic evaluation using an LLM-as-a-Judge approach and expert human assessment. Experimental findings show that retrieval augmentation significantly improves review quality for larger models. DeepSeek-Coder, for instance, achieved a 17.9% improvement in alignment score at a retrieval depth of k = 3. Smaller models such as Phi-3-Mini, however, experienced context collapse, in which excessive contextual information reduced the quality of the generated feedback.
To address this limitation, a hybrid expert-routing mechanism was developed to dynamically assign review tasks to the most suitable model according to semantic category. The proposed system achieved an overall improvement of 13.2% over zero-shot baselines while also reducing hallucinations and generating review comments that closely resemble expert human feedback.
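
The retrieval and routing steps summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model and vector database, the category labels and the `HISTORY` comments are invented, and the `ROUTES` table is hypothetical because the abstract does not state which model handles which category.

```python
import math
from collections import Counter

# Hypothetical historical review comments (invented for illustration).
HISTORY = [
    {"category": "style", "text": "rename variable to follow snake_case naming"},
    {"category": "bug", "text": "possible None dereference before attribute access"},
    {"category": "bug", "text": "off by one error in loop range bound"},
    {"category": "docs", "text": "add docstring describing return value"},
]

# Hypothetical routing table: the abstract does not say which model
# serves which semantic category.
ROUTES = {"bug": "DeepSeek-Coder-33B", "style": "Phi-3-Mini"}
DEFAULT_MODEL = "Qwen2.5-Coder-32B"


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=3):
    """Return the k historical comments most similar to the query."""
    q = embed(query)
    ranked = sorted(
        HISTORY, key=lambda c: cosine(q, embed(c["text"])), reverse=True
    )
    return ranked[:k]


def route(category):
    """Pick a model name for a semantic category, with a fallback."""
    return ROUTES.get(category, DEFAULT_MODEL)


def build_prompt(diff, category, k=3):
    """Pair the routed model with a prompt holding k retrieved comments."""
    context = "\n".join(f"- {c['text']}" for c in retrieve(diff, k))
    prompt = f"Past review comments:\n{context}\n\nReview this change:\n{diff}"
    return route(category), prompt
```

In this sketch the category is passed in directly; in the described framework it would come from the SVM classifier. Keeping `k` as a parameter reflects the abstract's observation that retrieval depth matters: a depth that helps a larger model may overload a smaller one.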
