applied sciences

Improvement of an Online Education Model with the Integration of Machine Learning and Data Analysis in an LMS

William Villegas-Ch 1,*, Milton Román-Cañizares 1 and Xavier Palacios-Pacheco 2

1 Escuela de Ingeniería en Tecnologías de la Información, FICA, Universidad de Las Américas, Quito 170125, Ecuador; milton.roman@udla.edu.ec
2 Departamento de Sistemas, Universidad Internacional del Ecuador, Quito 170411, Ecuador; xpalacio@uide.edu.ec
* Correspondence: william.villegas@udla.edu.ec; Tel.: +593-098-136-4068
Received: 28 June 2020; Accepted: 22 July 2020; Published: 4 August 2020

Abstract: The events of 2020 have shown that society is still fragile and exposed to events that rapidly change the paradigms that govern it. The Coronavirus disease 2019 pandemic has demonstrated this: this global emergency has changed the way people interact, communicate, study, and work. In short, the way society carries out all its activities has changed. This includes education, which has relied on information and communication technologies to reach students. One example is the use of learning management systems, which have become ideal environments for managing resources and developing activities. This work proposes integrating technologies such as artificial intelligence and data analysis with learning management systems in order to improve learning. This objective is framed within a new normality that demands robust educational models, in which certain activities are carried out online, supported by technologies that give students virtual assistants to guide their learning.

Keywords: analysis of data; artificial intelligence; machine learning; online education

1. Introduction
Society is currently affected by a health emergency that has changed the way it lives [1]. Coronavirus disease 2019 (COVID-19) has revealed the fragility of every area, including health, education, and industry. No part of society has been unaffected; however, it is the duty of universities and their research departments to address these weaknesses and to create robust models based on what has been learned from this emergency [2]. To do so, it is necessary to take into account the tools that have helped combat this disease and that have served as a channel to keep available and functional the areas needed for the development and subsistence of society.
These tools are information and communication technologies (ICT), which have allowed most activities to be carried out remotely and securely [3]. What happened has changed our view of the way we live. The new normality that the world is experiencing brings new challenges that all sectors must overcome through the use of ICT and new technologies. Higher education is one of the sectors of society that has integrated these technologies into its activities for several years [4]. This integration has allowed education to continue despite severe drawbacks. However, it is necessary to identify the problems that have arisen and to adopt adaptive education models that integrate new and better technologies, allowing students to continue their learning in any situation [5]. To achieve this objective, it is necessary to revisit certain concepts and tools that have been neglected.

Appl. Sci. 2020, 10, 5371; doi:10.3390/app10155371 www.mdpi.com/journal/applsci
One example is learning management systems (LMSs), which in recent years have lost prominence because certain institutions regard them as simple repositories. This view changed drastically under the current circumstances, in which the LMS is the medium that allows students to maintain interaction with their institutions [6]. Under these circumstances, the face-to-face educational model was forced to move to a virtual or online model [7]. Although students may appear to absorb this change easily, a deeper analysis shows that it makes a big difference [8]. A change in study modality has drastic effects on student performance, such as loss of interest and lack of adaptability, factors that directly affect learning. In a traditional education model, the teacher is expected to be the main actor in learning, since the teacher is regarded as the owner of knowledge [9]. Problems begin when this changes and students become the main actors in their own learning. To solve them, it is important to create an ideal environment in which students find the resources they need for their learning. It is also necessary to include systems that continuously monitor their performance [10], as well as systems in charge of assigning activities aligned with the characteristics of each student. In this work, the integration of artificial intelligence (AI), data analysis, and an LMS is proposed to improve an online education model and thereby improve student learning. The proposal builds on the online education model of a university in Ecuador that participates in this research. As a tool, the model uses an LMS, where students find sections with resources and activities for their training [11].
An AI system based on machine learning is integrated into this model; it is responsible for interacting with students and managing their performance. The objective is to provide students with confidence and support in their academic activities [12]. The model knows all of the student's data, both from its interaction with the student and from the results of the data analysis. The task of data analysis is to identify patterns in students that allow them to be classified and thereby to create an adaptive model aligned with the needs of each student. By integrating these technologies with the LMS, learning is prioritized: upon entering the LMS, students meet a virtual assistant that knows their academic agenda and all the details of their performance [13]. Through the academic agenda, the assistant knows which activities are to be carried out and sends reminders to complete them. The assistant also manages grades; it therefore constantly monitors student performance and can interact with students to work on their learning. After the analysis, the system adjusts the activities presented as evaluation mechanisms and aligns them with the characteristics and needs of each student [14]. The system generates an alert that is discussed and verified with the teacher in order to improve the activity, change it, or even modify the course methodology. The goal is to improve learning in an online educational model through the integration of technologies and interaction with students. The model is based on two essential components. The first is the university infrastructure. This component is the starting point, as it offers a functional architecture for data processing and management. The deployed architecture allows the process to be aligned and adjusted to meet the integration requirements of technologies such as big data and AI.
The second component is the integration of new technologies: data analysis and AI must converge, and the data must be constantly evaluated. This evaluation allows the AI system to learn from the errors in each instance generated in the analysis and to make increasingly accurate decisions. For the sample, two online courses related to ICT were considered. These subjects were chosen because their students have an adequate level of ICT skills; therefore, handling the new applications will not be foreign to them and, on the contrary, is expected to spark their interest and involvement in the use of the tools. The rest of this work is organized as follows: Section 2 describes works related to the research topic; Section 3 reviews the concepts used; Section 4 describes the proposed method; Section 5 presents and discusses the results of the investigation; and, finally, Section 6 presents the conclusions.
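The agenda and grade-monitoring behavior attributed to the virtual assistant in this introduction can be illustrated with a minimal rule-based sketch. It is not the authors' machine-learning system: the `Activity` and `StudentAgenda` types, the seven-day reminder horizon, and the 7.0 alert threshold are hypothetical choices made only to show how reminders and teacher alerts could be derived from an academic agenda.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Activity:
    """One scheduled course activity (hypothetical structure)."""
    name: str
    due: date
    submitted: bool = False
    grade: float | None = None  # Hypothetical 0-10 grading scale.


@dataclass
class StudentAgenda:
    """The academic agenda the assistant keeps for one student (hypothetical)."""
    student_id: str
    activities: list[Activity] = field(default_factory=list)


def pending_reminders(agenda: StudentAgenda, today: date, horizon_days: int = 7) -> list[str]:
    """Return reminders for unsubmitted activities due within the horizon, earliest first."""
    limit = today + timedelta(days=horizon_days)
    return [
        f"Reminder for {agenda.student_id}: '{a.name}' is due on {a.due.isoformat()}"
        for a in sorted(agenda.activities, key=lambda act: act.due)
        if not a.submitted and today <= a.due <= limit
    ]


def teacher_alert(agenda: StudentAgenda, threshold: float = 7.0) -> str | None:
    """Return an alert for the teacher to verify if the mean grade is below the threshold."""
    grades = [a.grade for a in agenda.activities if a.grade is not None]
    if not grades:
        return None  # Nothing graded yet, so there is nothing to evaluate.
    mean = sum(grades) / len(grades)
    if mean < threshold:
        return f"Review {agenda.student_id}: average {mean:.2f} below {threshold}"
    return None


agenda = StudentAgenda("s01", [
    Activity("Week 3 task", date(2020, 7, 10)),
    Activity("Week 3 forum", date(2020, 7, 20)),
    Activity("Week 2 evaluation", date(2020, 7, 3), submitted=True, grade=6.0),
    Activity("Week 2 task", date(2020, 7, 3), submitted=True, grade=6.5),
])
today = date(2020, 7, 6)
print(pending_reminders(agenda, today))  # ["Reminder for s01: 'Week 3 task' is due on 2020-07-10"]
print(teacher_alert(agenda))             # Review s01: average 6.25 below 7.0
```

In the model described here, the alert is only a trigger: the decision to improve, change, or replace an activity is discussed and verified with the teacher rather than applied automatically.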
2. Related Works
Several related papers have been reviewed, highlighting the use of AI or educational data analysis tools in LMSs [15]. However, these works do not pursue the main objective of improving learning by making this integration an assistant for those involved. Certain works propose using data mining algorithms for knowledge discovery in educational databases [16]. This functionality aims to identify students' deficiencies in a given course. The resulting knowledge is passed to the areas or people in charge of learning quality, who take the necessary corrective measures. Other works use more complex models that integrate business intelligence (BI) architectures. A BI architecture makes it possible to include several data sources, giving greater granularity to the analysis [17]. This granularity makes it possible to identify the variables that lead students to drop out, which is one of the problems with the greatest impact on virtual or online educational models. Works on the use of AI in LMSs mainly seek to help teachers generate better learning models and methodologies for these environments. There is important information on specialized AI techniques for user interaction that learn from each interaction [18]. These models are robust and contribute significantly to the development of this work [19]. Based on the review carried out, the proposed work differs from existing ones in that it integrates two technologies, AI and data analysis, in a single environment. By centralizing all academic management in a single system, it is possible to create a virtual assistant that first manages the information of each student and is responsible for automatic, personalized monitoring.
The assistant, in addition to learning from user interaction, has access to all the information resulting from the data analysis [20]. The analysis is not limited to the data found in the LMS, since integrating various sources is key to identifying the needs and expectations of each student. The technology with this capacity is big data, and the amount and type of data integrated into the analysis make decision-making adaptable [21]. This integration allows the AI to make quick and effective decisions about student performance.

3. Preliminary Concepts
3.1. Analysis of Data
Data analysis examines a set of data in order to draw conclusions about the information, either to make decisions or to expand knowledge on a specific topic. It subjects the data to various operations in order to obtain precise conclusions that help achieve the proposed objectives. Data analysis is used in various industries to help companies and organizations make better business decisions, and in the sciences to verify or refute existing models or theories [22]. Data analysis differs from data mining in its scope, purpose, and focus. Data miners sift through vast data sets using sophisticated software to identify undiscovered patterns and establish hidden relationships, whereas data analysis focuses on inference, the process of drawing a conclusion based only on what the researcher knows [23]. The areas that generally use data analysis are:
• Marketing: Data analysis has been used primarily to predict consumer behavior, including classifying it.
• Human resources: Data analysis is also very useful within companies to maintain a good work environment and to identify potential employees.
• Academics: Data analysis is also present in education, where it serves to select new students and to measure student performance.

3.2. Artificial Intelligence
AI is the simulation of human intelligence by machines.
In general, it is the discipline that tries to create systems capable of learning and reasoning like a human being [24]. These systems learn from experience, can solve problems under certain conditions, contrast information, and carry out logical tasks. Typically, an AI system can analyze high volumes of data, identify patterns and trends, and thereby formulate predictions automatically, quickly, and accurately. AI makes everyday experiences smarter [20] by integrating predictive analytics and other AI techniques into applications used daily, for example:
• Siri works as a personal assistant by using natural language processing.
• Facebook and Google Photos suggest tagging and grouping of photos based on image recognition.
• Amazon offers product recommendations based on shopping basket models.
• Waze provides optimized traffic information and real-time navigation.
Artificial intelligence has many fields, and its operation is based on the application of various techniques. Some of the most widely used are described below.
• Machine learning is a type of artificial intelligence that gives computers the ability to learn. It is based on data analysis, through which new patterns are identified that allow the system to modify its behavior [7]. That is, it analyzes and processes information, discovers patterns, and acts accordingly.
• Knowledge engineering uses the techniques necessary to create expert systems. It is a computational area used to store important information and use it for strategic purposes [25]. The deeper the layers of information, the better the strategies applied.
• Fuzzy logic is one of the most popular mathematical theories today. It is based on assessments that are neither totally true nor totally false but can occupy any intermediate position between absolute truth and total falsehood [26].
• Artificial neural networks are a technique whose behavior is inspired by the functioning of human neural networks.
As in the human being, they are composed of independent units (neurons) that are interconnected with each other [13]. Each artificial neuron receives a certain number of inputs, to each of which it assigns a certain "weight". Depending on the inputs and their weights, it receives a certain "nervous impulse", which translates into an output value [21].
• Rule-based systems work by applying different rules to a given situation and comparing the results obtained. This task can be carried out by different methods. On the one hand, they can start from initial evidence or a situation and find its possible solution [27]. On the other hand, they can start from hypotheses with possible solutions and work backward to find the premise or evidence.
• Expert systems are computer systems that function as a human expert in a specific subject. Their operation is based on learning, memorizing, and communicating information [28]. Normally, the information has been provided by human experts, and the system processes it according to standards in order to apply its knowledge to particular situations. In turn, an expert system can learn and improve with future additions.
• Artificial vision is the combination of hardware and software that allows devices to process and recognize images captured in the real world based on numerical or symbolic concepts [4].

3.3. Online Education Model
The development of ICT has opened up countless possibilities for educational projects in which all people have the opportunity to access quality education regardless of when or where they are. Indeed, the access alternatives now available to people have eliminated time and distance as obstacles to teaching and learning [6]. Online education is a distance-study modality developed in a digital environment known as a virtual classroom, which is accessed through an Internet connection and uses technological tools for the teaching-learning process.
It has the advantage of being an asynchronous study model, in which only certain hours and days of the week are set for interaction with the teacher. Online education arises from the busy pace of life in which society currently lives [29]. Whether because of work, family, or geographical location, online education achieves a common educational objective without the limitations of space or time. Some characteristics of online education models are:
• Interactive: the model allows students to interact with the content, their teachers, and fellow students.
• Accessible: it works anywhere with Internet access, regardless of place or time.
• Synchronous and asynchronous: students can participate in tasks or activities at the same time as others or at their own pace.
• Online resources: students can access resources at any time they need them, without having them physically.

4. Method
For the development of this work, it is necessary to specify the environment in which the different systems will be implemented. Determining the current conditions precisely makes it possible to identify the ideal way of integrating the technologies. Second, the data analysis model required by the university is determined according to the variables and questions to be answered. Last but not least, the AI system that works in conjunction with the LMS, the data analysis, and the students is adjusted to improve learning in an online education model.

4.1. Identification of the Environment
A university from Ecuador that offers two study modalities participated in this work. The first modality is face-to-face, which has the characteristics of a traditional modality: learning depends on the experience and methodology applied by the teacher [30]. Students become spectators of their own learning and must comply with previously established schedules. In addition, the teacher becomes the entity that determines what students should learn and how they should do it.
For this reason, the teacher has greater influence in identifying the performance of each student [19]. This identification is therefore biased by the teacher's criteria, which is undesirable in an ideal learning model. The second model that the university offers is online education. This model has been developed and improved over 10 years, and it evolved from an earlier virtual education model. This process has allowed the integration of information technology (IT), and the modality has a technological architecture that is the basis of this work. The methodology of this online education model has been designed for people who, because of their schedules and obligations, cannot attend a face-to-face modality. The model uses an LMS as its platform, in which schools and courses have been created for each degree. In the virtual course, students must complete three compulsory activities: tasks, evaluations, and participation in forums. For these activities, students have a module containing all the resources that the teacher and the course designer have considered relevant. The course is designed in weekly modules; therefore, all activities must be completed and made available on the platform every seven days. The management of the modality has defined a specific schedule for 60-min asynchronous tutorials, in which each tutor clarifies the doubts that arose during the week on the topics covered. The problem with this educational model is that, although it is intended to adapt to students' needs both in time and in learning [31], this does not really happen: dropout rates are very high and academic effectiveness is low. In addition, the learning indicators do not reach the desired levels [15]. These problems generally point to students lacking adequate study methodologies and the discipline to keep to their own schedules.
These factors are understandable, since most students come from a face-to-face educational model [32]. Adapting to a system with no daily control by teachers, in which responsibility for learning falls on the student, becomes the main cause of dropout and of the other problems already mentioned.

The technological infrastructure that the university uses for this modality is an advantage for the development of this work. It has been designed to support a high volume of transactions and services. Issues such as information security and protection are covered by the IT department. This ensures that this work can focus on integrating AI, data analysis, and the LMS without compromising data. Figure 1 presents the IT architecture of the university. This architecture is made up of layers, to which an additional layer in charge of data analysis has been added [33].

Figure 1. Technological architecture of a university for the management of an online education modality [33].
The data entry layer is responsible for obtaining data from all systems and devices available for student use [29]. This layer even considers the data that students generate on social networks about specific topics related to their condition as students. The data can be structured or unstructured, and the data analysis layer adds value to them.

The cloud computing and storage layer makes it possible to manage data according to their purpose. This is important in this work, since various activities are carried out through mobile applications [34]. Various data from these activities are stored or processed directly in public or private clouds [35].

The knowledge layer is responsible for data analysis. To do so, it uses a big data architecture. This layer becomes the engine of the online education model [36]. It processes all the data found in the different sources and analyzes them through various data mining algorithms.
The resulting information then passes to the AI system, which generates knowledge about the results obtained and interacts with the students and the areas in charge of learning [23].

The service layer integrates the systems and layers already mentioned into the LMS and presents them to members of the online education modality [37]. The information can also be presented in other systems related to the educational model.

4.2. Analysis of Data
Data analysis is of vital importance in this work because of the large amount of data expected to be processed and the variety of data types to be integrated into the analysis. The technology that meets these requirements is big data, whose purpose here is to analyze data coming from different repositories. The data generated by students in the activities they carry out, as well as their interactions with the LMS, are stored in the LMS's own database in a structured way. However, if only these data are considered, the analysis lacks granularity. Moreover,
the results would be limited to the scores obtained in each activity, which do not necessarily reflect what each student has actually learned. Therefore, more information must be integrated into the analysis architecture: universities generally store students' socioeconomic information and, in some cases, relevant information on their academic performance in the institutions where they completed their basic education. This information makes it possible to discover trends among students and in the way they learn [38]. All of the aforementioned is structured data; however, this work also aims to obtain information about students from all available sources, such as social networks.

The big data framework used in this work is based on Hadoop. This framework allows large volumes of data to be processed regardless of their type [39]. This feature and Hadoop's reliability allow the analysis of as many variables as possible, ensuring granular, high-quality results. Because Hadoop is open source, it allows academic data to be stored, processed, and analyzed at no additional cost to the institution [40]. The Hadoop components that make it the ideal architecture for this work are:
• The Hadoop Distributed File System (HDFS), which distributes data files across different machines instead of saving them on a single one.
• MapReduce, a framework that isolates the programmer from the tasks of parallel programming. It allows programs written in the most common programming languages to run on a Hadoop cluster [41].
• YARN, a framework for task scheduling and cluster resource management.

4.2.1. Hadoop Operation
MapReduce sends the computation to the site where the data to be processed reside, which are collected in a cluster.
When a MapReduce process is launched, the tasks are distributed among the different servers in the cluster, and Hadoop manages the sending and receiving of data between nodes. Computation runs on the nodes that hold the data locally, to minimize network traffic. Once all the data have been processed, the user receives the result from the cluster. MapReduce contains two phases, although the second is subdivided into two others:
• Map.
• Reduce: shuffle data and reduce.

4.2.2. Phases in Hadoop MapReduce
In Hadoop MapReduce, the input data are divided into separate chunks that the mappers process in parallel. The map outputs are sorted and then serve as input to the reducers. Generally, job inputs and outputs are stored in a file system, and the compute nodes are also the storage nodes [42]. Application logic often cannot be expressed as a single MapReduce run, so several jobs are chained, with the output of one serving as input for the mappers of the next. Data locality allows the tasks of each fragment to be executed on the node where it is stored, reducing data access time and data movement between nodes in the cluster [40]. The framework is also responsible for managing resources and for scheduling, restarting, and monitoring tasks through the Hadoop YARN manager, which has a single resource manager and a node manager on each node of the cluster [33].
• The map phase runs in subtasks called mappers. These components generate key-value pairs by filtering, grouping, ordering, or transforming the original data. Intermediate key-value pairs are not stored in HDFS.
• The shuffle and sort phase is not always necessary. It is the intermediate step between map and reduce that collects the data and sorts them conveniently for processing. In this phase, the values emitted for the same key by the different mappers are grouped together.
• The reduce phase aggregates, by key, the values produced by all the mappers in the system or by the shuffle phase. Finally, each reducer writes its output file independently, generally to HDFS.
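The map, shuffle/sort, and reduce phases described above, including the chaining of one job's output into the next job's mappers, can be illustrated with a single-process Python simulation. This is not Hadoop's API (a real job would use the Java MapReduce API or Hadoop Streaming), and the record format, the per-student average, and the 7.0 risk threshold are hypothetical choices made only to show how key-value pairs flow between phases.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator

# Hypothetical input: one record per graded activity, "student_id,activity,score".
RECORDS = [
    "s01,task,8.0",
    "s02,forum,6.5",
    "s01,evaluation,7.0",
    "s02,task,5.5",
    "s01,forum,9.0",
]


def shuffle_sort(pairs: Iterable[tuple[Any, Any]]) -> list[tuple[Any, list[Any]]]:
    """Shuffle/sort phase: group the values emitted for each key and order by key."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items(), key=lambda item: item[0])


def run_job(
    inputs: Iterable[Any],
    mapper: Callable[[Any], Iterable[tuple[Any, Any]]],
    reducer: Callable[[Any, list[Any]], Any],
) -> list[Any]:
    """Run map -> shuffle/sort -> reduce sequentially (Hadoop would distribute each phase)."""
    mapped = (pair for item in inputs for pair in mapper(item))
    return [reducer(key, values) for key, values in shuffle_sort(mapped)]


# Job 1: the mapper emits (student_id, score); the reducer averages per student.
def score_mapper(record: str) -> Iterator[tuple[str, float]]:
    student_id, _activity, score = record.split(",")
    yield student_id, float(score)


def mean_reducer(student_id: str, scores: list[float]) -> tuple[str, float]:
    return student_id, sum(scores) / len(scores)


# Job 2 (chained): job 1's output is the input for job 2's mappers.
RISK_THRESHOLD = 7.0  # Hypothetical cut-off on a 0-10 scale.


def risk_mapper(pair: tuple[str, float]) -> Iterator[tuple[str, str]]:
    student_id, mean = pair
    yield ("at_risk" if mean < RISK_THRESHOLD else "on_track"), student_id


def group_reducer(label: str, students: list[str]) -> tuple[str, list[str]]:
    return label, sorted(students)


averages = run_job(RECORDS, score_mapper, mean_reducer)
risk_groups = run_job(averages, risk_mapper, group_reducer)
print(averages)     # [('s01', 8.0), ('s02', 6.0)]
print(risk_groups)  # [('at_risk', ['s02']), ('on_track', ['s01'])]
```

Sorting by key in `shuffle_sort` mirrors the fact that each reducer receives its keys in sorted order; in a real cluster the grouped keys would also be partitioned across several reducers running on different nodes.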
8. Appl. Sci. 2020, 10, 5371 8 of 18 Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 18 InInFigure Figure2,2,the theclassification classificationofofthe theclusters clusterswhere wherethe theprocessing processingofofeach eachnode nodeisisassigned assignedisis observed,and andthetheMapReduce MapReduceframework frameworkhas hasaamaster/slave master/slavearchitecture. architecture.ItIthas hasaamaster masterserver serveroror JobTracker and several slave servers or TaskTrackers, one for each node in the cluster. The JobTracker and several slave servers or TaskTrackers, one for each node in the cluster. The JobTracker JobTracker thepoint pointofofinteraction interactionbetween betweenusers usersand andthetheMapReduce MapReduceframework. framework.Users Userssubmit submitMapReduce MapReduce jobstotothe JobTracker, the JobTracker, which which puts them puts in a in them pending job queue a pending and runs job queue andthem runsinthem the order of order in the arrival.of The JobTracker manages the assignment of tasks and delegates the tasks to the arrival. The JobTracker manages the assignment of tasks and delegates the tasks to the TaskTrackers. TaskTrackers. TaskTrackersexecute executetasks tasksunder underthethecommand commandofofthe theJobTracker JobTrackerandandalso alsohandle handlethe themovement movementofof data between the map phase and reduce data between the map phase and reduce [43]. [43]. Figure 2. Architecture of a MapReduce master/slave model [13]. Figure 2. Architecture of a MapReduce master/slave model [13]. MapReduce components align with the type of analysis required in developing a system that MapReduce components align with the type of analysis required in developing a system that integrates AI, data analysis, and LMS. One of the conditions that establishes the use of Hadoop is integrates AI, data analysis, and LMS. 
One of the conditions that justifies the use of Hadoop is that real-time analysis is not necessary, but the architecture must guarantee the handling of a large volume of data, as well as its diversity [44]. Another factor considered in choosing this architecture is the existing knowledge of it within the IT area of the university that participates in the study.
4.3. Artificial Intelligence
AI includes several tools that an online education model can exploit to give systems special characteristics, such as virtual assistants that interact directly with students. The AI takes the data previously processed by big data and looks for patterns in it. In this way, the system can autonomously classify the results and recommend different actions to the students and tutors of the modality [45]. The AI tools that can perform this type of work include:
• Expert systems are systems highly trained in a specific intellectual activity, based on the knowledge of experts in the field.
A classic example is chess-playing systems.
• Chatbots are systems that make use of natural language processing and improve with each experience, allowing coherent two-way communication with humans, either oral or written.
• Virtual assistants are the closest thing to a movie AI that we can interact with today. They recognize our voice, adapt to the way we ask for things, and are able to recommend entertainment according to our tastes. One of the strengths of these technologies is that they have an immense number of users who constantly feed them data and help reinforce their learning algorithms.
• Machine learning programs are computer programs that try to learn from previous experience and examples. They have a specific, predetermined purpose, generally modeling, prediction, finding patterns in the data, or controlling a system.
Having described the main types of AI and how each is presented to the user, it is necessary to identify the need that the investigation must cover. First, there is a need for an autonomous system that can generate knowledge from the data already obtained in the analysis [24]. Second, the AI must be able to interact with the user. These characteristics make expert systems or chatbots ideal for managing students. However, the tool must also be able to learn from data it was never explicitly programmed for and, based on this learning, recommend certain activities to students and teachers. For this reason, a machine learning model is used.
To implement a machine learning model, there are two main strategies:
• Supervised learning: This methodology requires a prior training phase with datasets in which hundreds of labels are introduced.
If a machine is required to distinguish between dogs and cats in a photo, the program has to be shown thousands of images in which it is clear which animal is a cat and which is a dog. After this training phase, the program is able to identify each of the animals in different circumstances. This method is called classification. Another type of supervised learning is regression, which consists of following a continuous value. It is similar to the machine being able to follow logical values: given the numerical series 2, 4, 6, the machine is able to continue it as 8, 10, 12. This is especially used for prediction.
• Unsupervised learning: In this procedure, no training phase is required, and the machine must understand and find patterns in the information directly by itself. An example is grouping students into homogeneous groups.
If the information from thousands of students, including unstructured data, is provided to the system, the computer system would be able to recognize the characteristics of the students and segment them into profiles with similar criteria. This problem is called clustering or data agglomeration. Related techniques can also reduce the total number of variables to a maximum of 2 or 3 with minimal loss of information, so that the data can be visualized, which makes them easier to understand.
Phases for the Implementation of Machine Learning
Before thinking about the technological solution, it is necessary to define the business objective that the machine learning tool is meant to achieve. The goals can be as diverse as improving conversions, reducing churn, or increasing user satisfaction [46]. The important thing is to be clear about which element to optimize, in order to focus resources on it and to avoid implementing a solution that exceeds the original goal [12]. Figure 3 shows the different phases of the machine learning process and how they interact with each other.
Figure 3. Phases for the implementation of a machine learning model.
1. Understanding the problem: It is important to understand the problem to be solved. Normally, this takes a long time, especially if the problem comes from a sector in which knowledge is poor.
In this phase, it is necessary to create collaborative environments with people who have in-depth knowledge of the problem.
2. Understanding the data: It is common to perform an exploratory analysis of the data to become familiar with it. Exploratory analysis uses descriptive statistics, correlations, and graphs to better understand the story the data are telling. It also helps estimate whether the available data are sufficient and relevant to build a model.
3. Defining an evaluation criterion: The criterion is usually an error measure. Typically, the root-mean-square error is used for regression problems and the cross-entropy for classification problems. For the common case of two-class classification problems, other measures, such as precision and recall, are also used.
4. Evaluating the current solution: The problem to be solved with machine learning is probably already being solved in another way. The motivation to use machine learning is then to get better results, or to get similar results automatically, replacing tedious manual work. Measuring the performance of the current solution allows it to be compared with the performance of the machine learning model, which shows whether using the model is worthwhile. If there is no current solution, a simple solution that is very easy to implement can be defined. For example, a machine learning prediction of a student's grade in a course can be compared with a simple solution: the average of the student's grades during an academic period.
Only in this way, once the machine learning model is implemented, is it possible to determine whether it is good enough, whether it needs to be improved, or whether it is not worth implementing. If, in the end, the current solution or a simple solution performs similarly to the machine learning solution, it is probably better to use the simple solution.
5. Preparing the data: Although this process is carried out by the big data section, certain factors need to be detailed within the machine learning phases. Data preparation is one of the most effort-intensive phases of machine learning. The main challenges are the following. Incomplete data: the ideal data for the machine learning process are often not available. For example, to predict which students are more likely to enter an online educational model, the available data come from an online survey, and many people will not have filled in all the fields. However, incomplete data are better than no data at all, and several actions can be used to prepare them: deleting the incomplete records, imputing a reasonable value, imputing a value with a machine learning model, or doing nothing and using a machine learning technique that handles incomplete data. Combining data from various sources: some data may come from a database, others from a spreadsheet, from files, etc., and they must be combined so that the machine learning algorithms can consider all the information. Calculating relevant features: machine learning algorithms work much better with relevant features than with raw data [47]. By analogy, it is much easier for people to understand a temperature in degrees Celsius than to know how far the mercury has expanded in a traditional thermometer.
6. Building the model: Once the data are ready, building a machine learning model requires surprisingly little effort, because several machine learning libraries are already available.
Many of them are free and open source. During this phase, the machine learning technique to be used is chosen. The machine learning algorithm then automatically learns to produce the right results from the prepared historical data.
7. Error analysis: This phase is important to understand what needs to be done to improve the machine learning results. In particular, the options are to use a more complex model, use a simpler model, include more data and/or more features, or develop a better understanding of the problem. In the error analysis phase, it is important to ensure that the model is capable of generalization, that is, of producing good results when it is applied to new data. In general, it is not difficult to achieve acceptable results with this process. However, excellent results require iterating over the previous phases several times. With each iteration, the understanding of the problem and the data grows. This allows the design of new and better features and reduces the generalization error. A greater understanding also makes it possible to choose, on firmer criteria, the machine learning technique that best suits the problem.
8. Integrating the model into a system: Once the model has been adjusted based on the error analysis, the machine learning model is integrated into the LMS. Integrating a machine learning model into a system requires relatively greater effort. The data preparation phases must be repeatable automatically, which requires that the machine learning model communicate with other parts of the system and that the results of the model be used in the system. Furthermore, errors must be monitored automatically: the model warns if its errors grow over time, so that it can be rebuilt with new data, either manually or automatically. Interfaces for the data must be built so that the model can obtain data automatically and the system can use its predictions automatically.
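Phases 3 to 7 can be illustrated with a small Python sketch. It rests on assumptions not stated in the text: hypothetical records hold a student's grades in three earlier evaluations (with gaps) and the final grade to predict. The sketch imputes missing values with training-set column means (one of the phase 5 options). It uses the student's average as the simple baseline suggested in phase 4. It fits a one-variable least-squares regression, the same kind of supervised regression that continues 2, 4, 6 as 8, 10, 12. Finally, it compares both predictors on held-out records using the RMSE of phase 3, as a crude generalization check. This is an illustration only, not the model used in the study.

```python
import math
from statistics import mean
from typing import Optional

Row = list[Optional[float]]

# Hypothetical data: grades in three earlier partial evaluations (None = missing)
# and the final course grade to be predicted. Values are illustrative only.
TRAIN: list[tuple[Row, float]] = [
    ([7.0, 8.0, None], 8.0),
    ([5.0, None, 6.0], 6.0),
    ([9.0, 9.0, 8.0], 9.0),
    ([4.0, 5.0, 5.0], 5.0),
]
TEST: list[tuple[Row, float]] = [
    ([6.0, 7.0, None], 7.0),
    ([8.0, None, 9.0], 9.0),
]


def column_means(rows: list[Row]) -> list[float]:
    """Mean of the observed (non-missing) values in each column."""
    return [mean(r[j] for r in rows if r[j] is not None) for j in range(len(rows[0]))]


def impute(rows: list[Row], means: list[float]) -> list[list[float]]:
    """Phase 5: replace each missing value with the training-set column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (supervised regression)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Phase 3: root-mean-square error, the usual criterion for regression."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def evaluate(train, test) -> tuple[float, float]:
    """Return (baseline RMSE, model RMSE) on held-out records (phases 4 and 7)."""
    train_rows, train_y = [r for r, _ in train], [y for _, y in train]
    test_rows, test_y = [r for r, _ in test], [y for _, y in test]
    means = column_means(train_rows)  # learned on training data only
    train_avg = [mean(r) for r in impute(train_rows, means)]
    test_avg = [mean(r) for r in impute(test_rows, means)]
    # Phase 4 baseline: predict the student's average of earlier grades.
    baseline_pred = test_avg
    # Phase 6 model: a one-feature linear regression on that same average.
    slope, intercept = fit_line(train_avg, train_y)
    model_pred = [slope * x + intercept for x in test_avg]
    return rmse(test_y, baseline_pred), rmse(test_y, model_pred)


if __name__ == "__main__":
    base, model = evaluate(TRAIN, TEST)
    print(f"baseline RMSE={base:.3f}  model RMSE={model:.3f}")
```

Following phase 4, if the model's RMSE is not clearly lower than the baseline's, the simple solution is preferable. With only four training records, the numbers here say nothing about real performance. They only show where each phase fits.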
4.4. Integration of Big Data, Machine Learning, and LMS
The systems and the new technology are integrated through a model such as the one shown in Figure 4, in which the LMS holds a large volume of data on all activities and interactions with the student. The interaction is not direct; however, the LMS database commonly contains information on how long each student remains active on the platform. The usual schedule at which each student connects can also be obtained [19]. These data are complemented by the data stored in the databases of administrative and other academic systems. This information allows an analysis that covers a greater number of variables, which the big data architecture is in charge of processing [48].
Figure 4. Big data integration model: machine learning and LMS.
In its first phase, the big data architecture is responsible for extracting data from all sources, both structured and unstructured [49]. Once it obtains all the data, it processes them so that the AI can derive knowledge from them through machine learning. Machine learning is responsible for recognizing patterns in the analysis and uses them to classify individuals. The patterns are presented as characteristics of each group. The objective is that, knowing the needs of each group, the system can propose strategies or techniques that improve the way activities are presented [50]. Furthermore, it improves learning by recommending learning activities to students based on their needs.
Once the activities have been recommended, machine learning enters a state of analysis of the results. For this, the system analyzes the grades that students obtain in the recommended activities. If the results show that the student improved their performance, the process ends and returns to the initial state. If the system detects that the results do not exceed the average mark defined as the basis for the university's policies, it feeds this data back into the analysis phase and begins the process again until satisfactory results are obtained.
5. Discussion and Results
The new normality that humanity lives in forces institutions to seek new models that adapt to the needs of people. This paper takes this into account and seeks to improve an online education model. The integration of technologies becomes the starting point to improve education and monitor student performance. It should be noted that the current reality has made online, virtual, or hybrid education models the expected response for continuing higher education.
This work is applied on the architecture and infrastructure of the university that participated in the study. This is considered an advantage: since most of the infrastructure is already deployed, efforts can be concentrated on the design of the machine learning model. If any layer of the architecture needs to be modified, it is simply updated without generating higher technical, human, or economic costs.
The integration of these technologies improves the monitoring of student performance, which generally depends on the criteria of the teacher or those in charge of learning. With this model, monitoring does not depend on human actors. The systems carry out a continuous analysis of each student, and the machine learning model even detects the cases with the highest risk of low academic performance. This feature allows the generation of an early warning, which is currently issued only once the academic monitoring department knows a certain number of grades. Early detection by the comprehensive model allows projections to be generated from the student's history.
For example, the system recognizes students who had problems in introduction to calculus as possible cases with problems in Calculus I and in other subjects whose prerequisite is introduction to calculus. This analysis can be rather superficial; however, the system can even identify a possible case of course repetition by analyzing the topics that make up a subject.
For the recommendation of activities, machine learning has knowledge of the student's performance in each activity, so the decision is based on the activities in which the student obtains the best results. For example, cases have been detected where certain types of activities, such as rapid evaluations with true/false items, do not align with the needs of a certain group of students. The model identifies these groups and recommends other types of activities to the course designer. For this, active learning is taken as the foundation. In this type of learning, a wide variety of activities have been developed, which machine learning proposes to students according to their needs.
To evaluate the proposed model, several exercises were carried out involving the two parallel sections of an administration degree program. Each section has 24 students, and the follow-up period was 16 weeks, the usual length of one academic period. Each level consists of five subjects, which include general, complementary, and professionalizing subjects. The sample of students belongs to the fourth level. This group was chosen mainly because of information from the academic monitoring department, which found that the highest dropout rate is recorded in the first two years of study. In addition, students at this level have taken all the computer science subjects, which allows them to adapt more easily to a model based on the integration of technologies.
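The grouping and recommendation behavior described above can be sketched in Python, together with the feedback loop of Section 4.4. The section does not say which algorithm the model uses. This sketch therefore makes several assumptions. Each student is summarized by mean grades in three activity types, and students are grouped with plain k-means (an unsupervised technique). Activity types on which a group scores below a threshold are flagged for the course designer, and each student's best-performing activity type is selected. Finally, a done/re-analyze rule is applied after a recommended activity. The grades, the 6.0 threshold (standing in for the unspecified institutional average mark), and all names are hypothetical.

```python
import math
from statistics import mean

ACTIVITIES = ("forum", "task", "quiz")
PASSING_MARK = 6.0  # assumption: stands in for the institutional "average mark"

# Hypothetical per-student mean grades for (forum, task, quiz) on a 1-10 scale.
GRADES = {
    "s1": (9.0, 7.0, 2.0),
    "s2": (8.0, 8.0, 8.0),
    "s3": (9.0, 6.0, 3.0),
    "s4": (7.0, 8.0, 9.0),
    "s5": (8.0, 7.0, 2.0),
    "s6": (8.0, 9.0, 8.0),
}


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm), seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids


def weak_activities(centroid, threshold=PASSING_MARK):
    """Activity types whose group-average grade is below the threshold."""
    return [name for name, value in zip(ACTIVITIES, centroid) if value < threshold]


def best_activity(grades):
    """Activity type in which a student obtains the best results (first on ties)."""
    return max(zip(ACTIVITIES, grades), key=lambda pair: pair[1])[0]


def feedback_step(new_grade, threshold=PASSING_MARK):
    """After a recommended activity: stop if the grade reaches the threshold,
    otherwise feed the result back into the analysis phase."""
    return "done" if new_grade >= threshold else "reanalyze"


if __name__ == "__main__":
    ids = list(GRADES)
    labels, centroids = kmeans([GRADES[s] for s in ids], k=2)
    for c, centroid in enumerate(centroids):
        members = [s for s, lab in zip(ids, labels) if lab == c]
        print(c, members, "flag for course designer:", weak_activities(centroid))
    print({s: best_activity(g) for s, g in GRADES.items()})
```

Seeding with the first k points keeps the example deterministic. A library implementation, such as scikit-learn's `KMeans` with k-means++ initialization, would be more robust on real data.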
For each course or subject, the online education model of the participating university follows a standardized structure of 16 weeks. These weeks are divided into two partial periods, each consisting of seven weeks plus one partial evaluation. Within the LMS (Moodle, in the case of this university), each course is created, registered, and divided into modules that correspond to the weeks of the period. Each course includes a main module with detailed information on the type of study, the subject, and the assigned tutor. There, the student also finds the syllabus and the study guide, which specify the topics to be reviewed and the activities to be completed. Within each week, the module is
divided into sections that contain the resources, the activities, and the information for scheduling the asynchronous meeting with the tutor. In the resources section, each tutor is in charge of uploading all the material for the topic of the week. These resources must be aligned with the learning outcomes of the subject. The tutor usually uploads his or her own material, such as a presentation, the resolution of an exercise, or a reading, and must also include supporting material, such as videos, readings, and scientific articles. In the activities section, the student finds everything to be done during each week. One activity is an opinion forum, where the student comments critically and objectively on a topic raised by the tutor. Another is a task that meets the requirements set forth in Bloom's taxonomy. The objective of this theory is that, after completing a learning process, the student acquires new skills and knowledge. For this reason, the taxonomy consists of a series of levels designed to ensure meaningful, lifelong learning. The levels of Bloom's taxonomy are know, understand, apply, analyze, evaluate, and create. In addition, the student must complete a questionnaire-type evaluation, whose purpose is to encourage students to read the resources. The last section contains the information about the asynchronous meeting with the tutor. The meeting allows students to put their queries directly to the tutor or to receive feedback on the activities or topics discussed. Each meeting lasts 60 min. In this model, these meetings are not mandatory, and the student can review the recording as many times as they deem necessary. Once the scenario in which the model is integrated has been defined, the variables that explain dropout are established.
The set of variables consists of:
• the university entrance grade, i.e., the general average of the student's secondary studies;
• the number of subjects passed;
• the number of enrollments in the defined periods;
• the subjects taken (between 1 and 20, coded according to the average number of subjects taken);
• sex;
• age, with students between 19 and 30 years old.
The problem addressed is the detection of the causes of university dropout; previous works have defined dropout as a student's failure in a consecutive period.
In the first exercise, the big data architecture requires access to all the logs of activities carried out by teachers and students, which are usually stored in MySQL. All the data obtained from the different sources went through a processing and transformation phase to produce clean data, which Hadoop then analyzes in search of the patterns that students follow. Figure 5 presents the patterns of the first exercise, with the results of the activities carried out by the students during the established period. The x-axis shows the activities: H1 the forums, H2 the tasks, and H3 the questionnaire-type evaluations. The y-axis shows the grade obtained. The grades are assigned with rubrics designed to guarantee learning and range from 1 to 10; on this axis, 6 is marked as the acceptable grade that meets the minimum learning criteria. In the forums, the learning level is high in most cases, and the low grades are mostly due to students not registering their participation or making contributions that were not objective. In the task, based on Bloom's taxonomy, intermediate grades are obtained, indicating that only part of the students adequately meet the requirements of the activity. The activity whose grades are closest to 1 is the questionnaire-type evaluation.
These evaluations consist of 10 questions to be completed in 20 min, so the student has an average of two minutes per question. In this activity, the grades are extremely low and do not contribute to learning.
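The kind of per-activity summary behind Figure 5 can be sketched in Python. For each activity code (H1 forums, H2 tasks, H3 questionnaire-type evaluations), the sketch computes the mean grade and the share of grades below the acceptable mark of 6 on the 1 to 10 scale. It also computes the average time per question (20 min / 10 questions = 2 min). The grade lists are invented for illustration and are not the study's data. Only the scale, the threshold, and the timing come from the text.

```python
from statistics import mean

ACCEPTABLE = 6.0  # grade marked as acceptable on the 1-10 rubric scale
NAMES = {"H1": "forums", "H2": "tasks", "H3": "questionnaire-type evaluations"}

# Hypothetical grades, NOT the study's data.
GRADES = {
    "H1": [9.0, 8.0, 10.0, 5.0],
    "H2": [7.0, 6.0, 5.0, 8.0],
    "H3": [2.0, 1.0, 3.0, 2.0],
}


def summarize(grades_by_activity, acceptable=ACCEPTABLE):
    """Mean grade and share of grades below the acceptable mark, per activity."""
    summary = {}
    for code, grades in grades_by_activity.items():
        below = sum(1 for g in grades if g < acceptable)
        summary[code] = (mean(grades), below / len(grades))
    return summary


def minutes_per_question(total_minutes, n_questions):
    """Average time available per question in a timed questionnaire."""
    return total_minutes / n_questions


if __name__ == "__main__":
    summary = summarize(GRADES)
    for code, (avg, share_below) in summary.items():
        print(f"{code} ({NAMES[code]}): mean={avg:.2f}, below {ACCEPTABLE:g}: {share_below:.0%}")
    weakest = min(summary, key=lambda code: summary[code][0])
    print("weakest activity:", NAMES[weakest])
    print("minutes per question:", minutes_per_question(20, 10))
```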
Figure 5. Data analysis of the activities developed in an online education model with the use of big data. H1: Forums, H2: Homework, H3: Evaluations.
The AI takes the result obtained by big data and uses it to feed machine learning, which learns from these data for decision-making. The AI model added to the analysis the LMS data on the time students dedicated to reading the teacher's resources, and the data from a student survey about the time they had to answer each question. The data from this analysis were processed with the naive Bayes data mining algorithm, with the results presented in Table 1.
Table 1. Stratified cross-validation.
Correctly Classified Instances       48      94.1176%
Incorrectly Classified Instances      3       5.8824%
Kappa statistic                       0.9113
Mean absolute error                   0.0447
Root mean squared error               0.1722
Relative absolute error              10.0365%
Root relative squared error          36.4196%
Total Number of Instances            51
The algorithm analyzed 51 instances to identify why the evaluation scores are below expectations. Of the 51 instances, 48 (94.1176%) were correctly classified. This accuracy was considered sufficient to accept the outcome of the analysis. The results are presented in Table 2.
Table 2. Confusion matrix.
   a    b    c   ← classified as
  15    0    0   | a = T. Dedication
   0   18    1   | b = T. Question
   0    2   15   | c = Difficulty
The results show that the time available to answer each question (2 min) hinders performance in the evaluation. These results were compared with the number of evaluations that the LMS closed because the time limit had expired. For this class, 18 instances were classified correctly and one was misclassified as an evaluation difficulty. For the time students dedicated to reading the teacher's resources, 15 true