Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels

Valles Coral, Miguel Angel; Salazar Ramírez, Luis; Injante Ore, Richard Enrique; Hernandez Torres, Edwin Augusto; Juárez Díaz, Juan; Navarro Cabrera, Jorge Raul; Pinedo Tuanama, Lloy Pool; Vidaurre Rojas, Pierre

Please use this identifier to cite or link to this item: http://hdl.handle.net/11458/4922

Full metadata record

DC Field	Value	Language
dc.contributor.author	Valles Coral, Miguel Angel	es_PE
dc.contributor.author	Salazar Ramírez, Luis	es_PE
dc.contributor.author	Injante Ore, Richard Enrique	es_PE
dc.contributor.author	Hernandez Torres, Edwin Augusto	es_PE
dc.contributor.author	Juárez Díaz, Juan	es_PE
dc.contributor.author	Navarro Cabrera, Jorge Raul	es_PE
dc.contributor.author	Pinedo Tuanama, Lloy Pool	es_PE
dc.contributor.author	Vidaurre Rojas, Pierre	es_PE
dc.date.accessioned	2023-04-17T14:38:28Z	-
dc.date.available	2023-04-17T14:38:28Z	-
dc.date.issued	2022-11	-
dc.identifier.uri	http://hdl.handle.net/11458/4922	-
dc.description.abstract	Compliance with the basic conditions of quality in higher education requires strategies to reduce student dropout, and Information and Communication Technologies (ICT) in education have helped to guide, reinforce, and consolidate professional academic training. We propose an academic and emotional tracking model that uses data mining and machine learning to group university students according to their level of dropout risk. We worked with 670 students from a Peruvian public university, administered five valid and reliable psychological assessment questionnaires to them through a chatbot-based system, and then grouped them using three unsupervised learning algorithms: DBSCAN, K-Means, and HDBSCAN (DBSCAN and HDBSCAN are density-based; K-Means is centroid-based). The results showed that HDBSCAN was the most robust option, achieving better values than the other algorithms on two of the three internal validity indices evaluated (Silhouette index: 0.6823; Davies–Bouldin index: 0.6563; Calinski–Harabasz index: 369.6459). The internal indices identified five as the best number of clusters. For external validation against the answers of mental health professionals, the model achieved high scores: F-measure 90.9%, purity 94.5%, V-measure 86.9%, and ARI 86.5%. These results indicate that the proposed model is robust and can categorize university students into five levels according to their risk of dropping out.	es_PE
dc.format	application/pdf	es_PE
dc.language.iso	eng	es_PE
dc.rights	info:eu-repo/semantics/openAccess	es_PE
dc.rights.uri	CC BY	es_PE
dc.subject	Clustering	es_PE
dc.subject	DBSCAN	es_PE
dc.subject	HDBSCAN	es_PE
dc.subject	K-means	es_PE
dc.subject	Data mining	es_PE
dc.title	Density-Based Unsupervised Learning Algorithm to Categorize College Students into Dropout Risk Levels	es_PE
dc.title.alternative	Density-based unsupervised learning algorithm for categorizing university students into dropout risk levels	es_PE
dc.type	info:eu-repo/semantics/article	es_PE
dc.identifier.doi	10.3390/data7110165	es_PE
dc.type.version	info:eu-repo/semantics/publishedVersion	es_PE
dc.publisher.country	CH	es_PE
dc.subject.ocde	https://purl.org/pe-repo/ocde/ford#1.02.00	es_PE
item.fulltext	With Fulltext	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.grantfulltext	open	-
item.cerifentitytype	Publications	-
item.openairetype	info:eu-repo/semantics/article	-
Appears in Collections:	Scopus
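The abstract compares the algorithms with three internal validity indices: a higher Silhouette value (range -1 to 1) and a higher Calinski–Harabasz value indicate better-separated clusters, while a lower Davies–Bouldin value is better. The following is a minimal, dependency-free Python sketch of the textbook definitions of these indices, assuming Euclidean distance and at least two clusters. The function names are hypothetical and this is not the authors' code; the record does not say which implementation they used. scikit-learn exposes the same indices as silhouette_score, davies_bouldin_score and calinski_harabasz_score in sklearn.metrics.

```python
import math
from collections import defaultdict


def _dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def _group(X, labels):
    """Map each cluster label to the list of points carrying it."""
    clusters = defaultdict(list)
    for x, lab in zip(X, labels):
        clusters[lab].append(x)
    return dict(clusters)


def silhouette(X, labels):
    """Mean silhouette coefficient in [-1, 1]; higher is better (needs >= 2 clusters)."""
    clusters = _group(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes a distance of 0 to the sum)
        a = sum(_dist(x, y) for y in own) / (len(own) - 1)
        # b: lowest mean distance to the members of any other cluster
        b = min(sum(_dist(x, y) for y in pts) / len(pts)
                for other, pts in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(X, labels):
    """Davies-Bouldin index; lower is better (needs distinct centroids)."""
    clusters = _group(X, labels)
    cents = {k: _centroid(pts) for k, pts in clusters.items()}
    # s_k: mean distance from the members of cluster k to its centroid
    spread = {k: sum(_dist(p, cents[k]) for p in pts) / len(pts)
              for k, pts in clusters.items()}
    # for each cluster, the worst (largest) similarity ratio to another cluster
    worst = [max((spread[i] + spread[j]) / _dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index; higher is better (needs 2 <= k < n)."""
    clusters = _group(X, labels)
    n, k = len(X), len(clusters)
    overall = _centroid(X)
    between = within = 0.0
    for pts in clusters.values():
        c = _centroid(pts)
        # between-cluster dispersion: size-weighted squared centroid offset
        between += len(pts) * _dist(c, overall) ** 2
        # within-cluster dispersion: squared distances to own centroid
        within += sum(_dist(p, c) ** 2 for p in pts)
    return (between / (k - 1)) / (within / (n - k))
```

For example, with X = [(0, 0), (0, 1), (10, 0), (10, 1)] and labels [0, 0, 1, 1] (two tight, well-separated pairs), the sketch gives a Silhouette of about 0.90, a Davies–Bouldin index of 0.1 and a Calinski–Harabasz index of 200. The silhouette computation is quadratic in the number of points, which is manageable at the scale of the 670 students in the study.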
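For external validation, the abstract compares the five-cluster grouping with the answers of mental health professionals using F-measure, purity, V-measure and the adjusted Rand index (ARI). The sketch below computes all four from two label lists in plain Python. Purity, V-measure and ARI follow their standard definitions. "F-measure" has several variants in the clustering literature; this sketch assumes the class-matched one (each reference class is matched to the cluster with the highest F1, weighted by class size), which may differ from the variant the authors used. The function name external_indices is hypothetical. In scikit-learn, v_measure_score and adjusted_rand_score in sklearn.metrics cover two of the four.

```python
import math
from collections import Counter


def _comb2(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) / 2


def _entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given by counts."""
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)


def external_indices(truth, pred):
    """Return purity, class-matched F-measure, V-measure and ARI."""
    n = len(truth)
    joint = Counter(zip(truth, pred))   # n_ij: items in class i and cluster j
    a = Counter(truth)                  # class sizes
    b = Counter(pred)                   # cluster sizes

    # Purity: each cluster is credited with its majority class.
    purity = sum(max(joint[(i, j)] for i in a) for j in b) / n

    # F-measure (assumed variant): best F1 per class, weighted by class size.
    f_measure = 0.0
    for i, a_i in a.items():
        best = 0.0
        for j, b_j in b.items():
            n_ij = joint[(i, j)]
            if n_ij:
                p, r = n_ij / b_j, n_ij / a_i
                best = max(best, 2 * p * r / (p + r))
        f_measure += a_i / n * best

    # V-measure: harmonic mean of homogeneity and completeness.
    h_c, h_k = _entropy(a.values(), n), _entropy(b.values(), n)
    h_c_given_k = -sum(c / n * math.log(c / b[j]) for (i, j), c in joint.items())
    h_k_given_c = -sum(c / n * math.log(c / a[i]) for (i, j), c in joint.items())
    hom = 1.0 if h_c == 0 else 1 - h_c_given_k / h_c
    com = 1.0 if h_k == 0 else 1 - h_k_given_c / h_k
    v_measure = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)

    # Adjusted Rand index from pair counts (Hubert-Arabie form).
    sum_ij = sum(_comb2(c) for c in joint.values())
    sum_a = sum(_comb2(c) for c in a.values())
    sum_b = sum(_comb2(c) for c in b.values())
    expected = sum_a * sum_b / _comb2(n)
    max_index = (sum_a + sum_b) / 2
    ari = 1.0 if max_index == expected else (sum_ij - expected) / (max_index - expected)

    return {"purity": purity, "f_measure": f_measure,
            "v_measure": v_measure, "ari": ari}
```

For instance, external_indices([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]) yields a purity of 5/6 and an ARI of 12/37 (about 0.32), whereas a relabeled perfect match such as [5, 5, 3, 3, 9, 9] against [0, 0, 1, 1, 2, 2] scores 1.0 on all four indices. The percentages in the abstract correspond to these fractions multiplied by 100.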

Files in This Item:

File	Description	Size	Format
Artículo científico.pdf		2.68 MB	Adobe PDF
