Beyond Vector Retrieval: Evaluating Graph-Enhanced RAG performance in a System Architecture Environment
2025 (English)
Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Background. Retrieval-Augmented Generation (RAG) systems have shown promise in enhancing Large Language Models (LLMs) by providing them with up-to-date and relevant information. This capability is particularly important in business domains that use confidential data the models have not been trained on. In complex domains such as software architecture, simple document databases fail to capture the complex relationships between entities. This thesis therefore proposes using a graph database to represent these connections and improve the overall retrieval process.

Objectives. The objectives of this thesis are to build and evaluate two RAG approaches: DocumentRAG, which uses only keyword and vector similarity searches to find relevant knowledge in a document database, and HybridRAG, which combines these searches with a graph database of nodes and edges describing the overall system architecture. Specifically, this thesis compares the two implementations to determine which performs better overall and on specific prompt categories, and which of the metrics used are practical and reliable.

Methods. The two pipelines, DocumentRAG and HybridRAG, were developed together with an automated testing pipeline. Both used the same LLMs (GPT-4o and o1) through the OpenAI API to ensure a fair comparison. Subject Matter Experts (SMEs) created a total of 49 test prompts, each rated in categories such as abstraction level, amount of reasoning needed, number of data points needed, and overall question complexity. The two methods were compared on five metrics: Relevance, Accuracy, Completeness, Context Recall, and Faithfulness.

Results. HybridRAG slightly outperformed DocumentRAG overall on all metrics, although the differences were not statistically significant (p ≥ 0.05). The largest improvements were in Faithfulness and Relevance, where HybridRAG's mean scores were higher by 0.078 and 0.079, respectively. HybridRAG outperformed its counterpart most clearly when the prompt was vaguely phrased, required deep technical knowledge, or needed an especially large number of data points. Context Recall remained low, at about 0.13, for both implementations and varied little across question types. Faithfulness, on the other hand, was high overall: 0.917 for HybridRAG and 0.840 for DocumentRAG.

Conclusions. Although HybridRAG's advantage was not statistically significant in any category, it outperformed or at least matched DocumentRAG in all measurements. The difference in performance was especially clear for vague or complex questions. Accuracy, Completeness, and Relevance were found to be the most practical metrics for real-world use because of their high trustworthiness and their consistency with overall system performance.
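The difference between the two retrieval approaches described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the toy documents, the architecture edges, the character-trigram "embedding" standing in for a real embedding model, the equal 0.5 weighting of keyword and vector scores, and the functions `document_rag` and `hybrid_rag` are all hypothetical. DocumentRAG is modeled as a blended keyword and vector ranking over documents; HybridRAG is modeled as that same ranking plus one-hop graph facts about any architecture component named in the query.

```python
# Hypothetical sketch of DocumentRAG vs HybridRAG retrieval; not the thesis implementation.
import math
from collections import Counter

# Hypothetical toy corpus standing in for the architecture document database.
DOCS = {
    "d1": "payment service calls the ledger service over grpc",
    "d2": "ledger service stores transactions in postgres",
    "d3": "frontend renders the checkout page",
}

# Hypothetical architecture graph: each edge reads "source depends on target".
EDGES = [
    ("frontend", "payment service"),
    ("payment service", "ledger service"),
    ("ledger service", "postgres"),
]


def embed(text: str) -> Counter:
    """Stand-in for a dense embedding model: character trigram counts."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_score(query: str, doc: str) -> float:
    """Fraction of distinct query terms that also appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def document_rag(query: str, k: int = 1, alpha: float = 0.5) -> list[str]:
    """DocumentRAG-style retrieval: blend keyword and vector scores, keep the top-k ids."""
    q_vec = embed(query)
    scored = {
        doc_id: alpha * keyword_score(query, text) + (1 - alpha) * cosine(q_vec, embed(text))
        for doc_id, text in DOCS.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


def hybrid_rag(query: str, k: int = 1) -> list[str]:
    """HybridRAG-style retrieval: DocumentRAG results plus every graph edge
    touching an architecture component that is named in the query."""
    q = query.lower()
    nodes = {node for edge in EDGES for node in edge}
    mentioned = {node for node in nodes if node in q}
    facts = sorted(
        f"{src} -> {dst}" for src, dst in EDGES if src in mentioned or dst in mentioned
    )
    return document_rag(query, k) + facts


if __name__ == "__main__":
    query = "what does the payment service depend on"
    print(document_rag(query))
    print(hybrid_rag(query))
```

For the query "what does the payment service depend on", `document_rag` returns only the best-matching document, while `hybrid_rag` additionally returns the dependency edges that touch the payment service. This is the kind of relational context that a document-only index does not expose directly.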
Place, publisher, year, edition, pages
2025.
Keywords [en]
Retrieval-Augmented Generation, Graph Database, Large Language Models, System Architecture Documentation, Knowledge Retrieval
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:bth-27950
OAI: oai:DiVA.org:bth-27950
DiVA, id: diva2:1962572
External cooperation
Ericsson
Subject / course
Degree Project in Master of Science in Engineering 30,0 hp
Educational program
DVAMI Master of Science in Engineering: AI and Machine Learning 300 hp
Presentation
2025-05-22, J1630, Karlskrona, 08:00 (English)
Supervisors
Examiners
2025-06-11, 2025-05-31, 2025-09-30. Bibliographically approved