Bibliometric analysis of the scientific production on Data Lake
Abstract
This paper presents a bibliometric analysis of the scientific production on data lakes. Data lakes are infrastructures for storing and managing large volumes of data from various sources, with the aim of facilitating their access, analysis, and sharing. The objective of this article is to provide a quantitative view of the scientific production on the subject between 2018 and 2022 in order to understand the current state of research, identify trends and emerging research areas, evaluate its impact, and promote collaboration among researchers. The methodology consisted of a systematic literature review based on a descriptive retrospective analysis, using the Scopus database as the source of information, which yielded 73 key articles. The results show the interest in data lakes through several indicators: the number of articles published per year and the leading authors, keywords, and journals in terms of scientific production. Taken together, these indicators reveal the importance of this topic, and the research preferences surrounding it, across various fields.
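The descriptive indicators named in the abstract (articles per year, leading authors, keywords, and journals) are essentially frequency counts over the retrieved records. The following is a minimal sketch of such a tally in Python, assuming each record has already been parsed into a dictionary. The field names ("year", "authors", "keywords", "journal"), the function names, and the sample records are all hypothetical placeholders for illustration; they are not the Scopus export schema and not data from the study. Only the 2018 to 2022 window comes from the abstract.

```python
from collections import Counter

# Hypothetical record layout (an assumption, not the Scopus export schema):
# {"year": int, "authors": [str], "keywords": [str], "journal": str}


def filter_by_period(records, start=2018, end=2022):
    """Keep only records published within [start, end], inclusive."""
    return [r for r in records if start <= r["year"] <= end]


def publications_per_year(records):
    """Number of records per publication year, in chronological order."""
    counts = Counter(r["year"] for r in records)
    return dict(sorted(counts.items()))


def top_items(records, field, n=10, normalize=False):
    """Most frequent values of a list-valued field such as authors or keywords.

    With normalize=True, values are stripped and case-folded so that
    'Data Lake' and 'data lake' are counted as the same keyword.
    """
    counts = Counter()
    for r in records:
        for item in r[field]:
            counts[item.strip().casefold() if normalize else item.strip()] += 1
    # Ties keep the order in which values were first encountered.
    return counts.most_common(n)


def top_journals(records, n=10):
    """Journals ranked by number of records."""
    return Counter(r["journal"] for r in records).most_common(n)


# Hypothetical sample records, for illustration only (not the study's data).
sample = [
    {"year": 2019, "authors": ["Author A", "Author B"],
     "keywords": ["Data Lake", "Big Data"], "journal": "Journal X"},
    {"year": 2021, "authors": ["Author A"],
     "keywords": ["data lake", "Metadata"], "journal": "Journal Y"},
    {"year": 2021, "authors": ["Author C"],
     "keywords": ["Data Lake"], "journal": "Journal X"},
    {"year": 2023, "authors": ["Author D"],
     "keywords": ["Data Lake"], "journal": "Journal Z"},
]

articles = filter_by_period(sample)  # drops the 2023 record
print(publications_per_year(articles))                         # {2019: 1, 2021: 2}
print(top_items(articles, "authors", n=2))                     # [('Author A', 2), ('Author B', 1)]
print(top_items(articles, "keywords", n=1, normalize=True))    # [('data lake', 3)]
print(top_journals(articles, n=2))                             # [('Journal X', 2), ('Journal Y', 1)]
```

Keyword normalization is optional in this sketch: without it, spelling and capitalization variants of the same term would be counted separately, which can distort a keyword ranking.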
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
COPYRIGHT NOTICE
Authors who publish in INNOVA Research Journal retain copyright and grant the journal the right of first publication of the work under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Published works may be copied, used, disseminated, transmitted, and publicly displayed provided that: a) the authorship and original source of publication (journal, publisher, URL, and DOI of the work) are cited; b) they are not used for commercial purposes; and c) the existence and terms of this license are mentioned.