TY - GEN
T1 - A New Hybrid Search Approach to Optimize the Retrieval of Information from the Website at the Universidad Politécnica Salesiana
AU - Salgado-Guerrero, Juan P.
AU - Quisi-Peralta, Diego F.
AU - Lopez-Nores, Martin
AU - Paguay-Palaguachi, Luis D.
AU - Murillo-Valarezo, Jordan F.
AU - Cajamarca-Morquecho, Gabriela
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - This paper presents a novel hybrid search approach to improve information retrieval from the Salesian Polytechnic University website, addressing the challenge of efficiently managing and accessing a growing volume of information. Leveraging virtual assistant technology, the study combines vector similarity and keyword-based techniques to optimize data retrieval. The methodology follows a structured process: information gathering, architecture design, search execution, and analysis of the results. The system architecture consists of three key layers: the intelligent layer, which uses the OpenAI API for query processing; the data layer, which uses the Qdrant database for storage; and the logic layer, which is responsible for query execution. Two search methods are applied: vector similarity search, which retrieves data based on contextual relevance, and keyword search with BM25, which ranks documents by keyword relevance. Testing and analysis confirm that the hybrid search method significantly improves the efficiency and accuracy of information retrieval. The results show a significant improvement in the request measures obtained, where the four highest percentages were selected to obtain the context from which the answer is derived. The highest similarity values were 5.56, followed by 3.84, demonstrating the effectiveness of this method in various knowledge areas of the university website. In conclusion, the hybrid search approach presented in this paper offers a promising solution for efficiently retrieving information from the Salesian Polytechnic University website, improving accessibility and, ultimately, user satisfaction.
KW - BM25
KW - Education
KW - Hybrid search
KW - Innovation
KW - Natural Language Processing
KW - Vector Model
UR - https://www.scopus.com/pages/publications/85187793009
U2 - 10.1007/978-3-031-54235-0_23
DO - 10.1007/978-3-031-54235-0_23
M3 - Conference contribution
AN - SCOPUS:85187793009
SN - 9783031542343
T3 - Lecture Notes in Networks and Systems
SP - 247
EP - 257
BT - Information Technology and Systems - ICITS 2024
A2 - Rocha, Alvaro
A2 - Diez, Jorge Hochstetter
A2 - Ferras, Carlos
A2 - Rebolledo, Mauricio Dieguez
PB - Springer Science and Business Media Deutschland GmbH
T2 - International Conference on Information Technology and Systems, ICITS 2024
Y2 - 24 January 2024 through 26 January 2024
ER -