LLM-BASED EXTERNAL CONTROL: EMPIRICAL EVALUATION OF PRUME AI
Keywords: Public Audit, Artificial Intelligence, Language Models, RAG, Provenance (PROV), Explainability, Compliance

Abstract
This article presents and evaluates PRUMe AI, an audit platform assisted by Language Models, anchored in RAG and PROV provenance trails, and applied to typical external control documents. Combining Design Science Research with a case study on a real sample from TCE-AM (150 documents covering bids, contracts/addenda, and reports/opinions; 55% native PDFs, 45% scanned), PRUMe AI performs screening, extraction, compliance checks, and explainable reporting with structured outputs and provenance records. The results indicate material gains: average screening time fell from 21.4 to 7.9 min/doc (-63%) and total analysis time from 39.2 to 17.8 min/doc (-55%); coverage per cycle rose from 25% (manual process) to 82%. On a subset annotated by experts (n=20), we obtained F1=0.86 (contract fields) and F1=0.82 (clauses), with precision@k=0.91 in the prioritization of “points of attention.” In RAG-anchored checks, 94% of findings included textual citations; average reliability was 0.88, and inter-rater agreement reached κ=0.78. PROV trails covered 96% of decisions, and re-runs reproduced 92% of results. We discuss limitations (OCR/layout quality, missing metadata, ambiguous wording, and curation of the normative corpus) and propose an evolution agenda (document pipeline optimization, knowledge governance for RAG, and training). We conclude that PRUMe AI offers a replicable path to greater efficiency, coverage, and standardization, with transparency and auditability, in external control.
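The evaluation metrics cited in the abstract (F1 for field/clause extraction, precision@k for ranked "points of attention") can be sketched as follows. This is a minimal illustration of the standard metric definitions; the function names and example numbers are hypothetical and not taken from the PRUMe AI codebase.

```python
# Illustrative sketch of the abstract's evaluation metrics.
# Names and data are hypothetical, not from PRUMe AI.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_at_k(ranked_relevance: list[bool], k: int) -> float:
    """Fraction of the top-k ranked findings that are actually relevant."""
    top = ranked_relevance[:k]
    return sum(top) / len(top) if top else 0.0

# Hypothetical counts chosen only to mirror the reported figures:
print(round(f1_score(tp=86, fp=14, fn=14), 2))    # -> 0.86
print(precision_at_k([True] * 9 + [False], k=10)) # -> 0.9
```

Note that precision@k depends on the ranking order, which is why it is the natural metric for the prioritization step rather than plain precision.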
License
Copyright (c) 2025 Alessandro de Souza Bezerra, Luciane Cavalcante Lopes

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.