Online Education Model

Applied Sciences
Improvement of an Online Education Model with the
Integration of Machine Learning and Data Analysis in
an LMS
William Villegas-Ch 1, * , Milton Román-Cañizares 1 and Xavier Palacios-Pacheco 2
1 Escuela de Ingeniería en Tecnologías de la Información, FICA, Universidad de Las Américas, Quito 170125,
Ecuador; milton.roman@udla.edu.ec
2 Departamento de Sistemas, Universidad Internacional del Ecuador, Quito 170411, Ecuador;
xpalacio@uide.edu.ec
* Correspondence: william.villegas@udla.edu.ec; Tel.: +593-098-136-4068

Received: 28 June 2020; Accepted: 22 July 2020; Published: 4 August 2020
Abstract: The events that took place in the year 2020 have shown us that society is still fragile and
that it is exposed to events that rapidly change the paradigms that govern it. This has been shown
by a pandemic like Coronavirus disease 2019; this global emergency has changed the way people
interact, communicate, study, or work. In short, the way in which society carries out all activities
has changed. This includes education, which has bet on the use of information and communication
technologies to reach students. An example of the aforementioned is the use of learning management
systems, which have become ideal environments for resource management and the development of
activities. This work proposes the integration of technologies, such as artificial intelligence and data
analysis, with learning management systems in order to improve learning. This objective is outlined
in a new normality that seeks robust educational models, where certain activities are carried out in an
online mode, surrounded by technologies that allow students to have virtual assistants to guide them
in their learning.
Keywords: analysis of data; artificial intelligence; machine learning; online education
1. Introduction
Currently, society is affected by a health emergency that has changed the way it lives [1].
The Coronavirus disease 2019 (COVID-19) has revealed the fragility of all areas, be they health,
education, industrial, etc. There is no part of society that has not been affected; however, it is the
duty of universities and their research departments to work on all these weaknesses and create robust
models based on what has been learned from this emergency [2]. For this, it is necessary to take into
account the tools that have allowed us to combat this disease and that have served as a channel to
keep certain areas available and functional that are necessary for the development and subsistence
of society. These tools are information and communication technologies (ICT), which have allowed
most activities to be carried out remotely and securely [3]. It should be noted that what happened has
changed our vision of the way we live.
The new normality that the world is experiencing brings with it new challenges that must be
overcome and that all sectors must assume with the use of ICT and new technologies. Higher education
is one of the sectors of society that for several years has integrated these technologies into its activities [4].
This integration has allowed education to continue despite the severe drawbacks. However, it is
necessary to identify the problems that have arisen and adopt adaptive education models that
integrate new and better technologies that allow students to continue their learning in any situation [5].
To achieve this objective, it is necessary to return to certain concepts and tools that have been neglected.
Appl. Sci. 2020, 10, 5371; doi:10.3390/app10155371 www.mdpi.com/journal/applsci
One example is learning management systems (LMSs), which, in recent years, have lost prominence
because certain institutions regard them as simple repositories. This view took a drastic
turn due to current circumstances, in which the LMS is the medium that allows students to maintain
interaction with their institutions [6].
It is necessary to specify that the face-to-face educational model under the current circumstances
was forced to move to a virtual or online educational model [7]. This factor, although it seems to be
easily absorbed by the students, makes a big difference in a deeper analysis [8]. A change in the study
modality presents drastic results in the performance of the students, such as the loss of interest and
lack of adaptability, factors that directly affect learning. In a traditional education model, the teacher
is expected to be the main actor in learning, since the teacher is regarded as the owner of knowledge [9].
The problems begin the moment this changes and the student becomes the main actor in their own learning.
To solve them, it is important to create an ideal environment for the student, where they find the
necessary resources for their learning. In addition, it is necessary to include systems that continuously
monitor their performance [10], as well as the inclusion of systems that are in charge of assigning
activities aligned with the characteristics of each student.
In this work, the integration of artificial intelligence (AI), data analysis, and LMS is proposed
to improve an online education model and thereby improve student learning. To do so, it is based
on an online education model from a university in Ecuador that participates in this research. As a
tool, the model uses an LMS, where students find sections with resources and activities that serve for
their training [11]. An AI system, such as machine learning, is integrated into this model, which is
responsible for interacting and managing student performance. The objective is to transmit security
and support to the student in their academic activities [12]. The model knows all the student’s data,
both from the interaction with the student, and from the data obtained from the analysis. The task of
data analysis is to identify patterns in students that allow their classification and thereby create an
adaptive model that is aligned with the needs of each student. By integrating these technologies with
the LMS, learning is prioritized; therefore, the student, upon entering the LMS, will meet a virtual
assistant who knows their academic agenda and all the details of their performance [13]. Through
the academic agenda, the assistant knows what activities are to be carried out and sends notifications
for compliance. The management of grades is another parameter that the assistant has; therefore,
it constantly monitors student performance and has the ability to interact with the student to work on
their learning.
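As a rough illustration of the assistant behavior described above, the following Python sketch reads a student's academic agenda and grades and produces reminders. The `Activity` fields, the three-day reminder window, and the passing grade of 6.0 are hypothetical values chosen for the example; they are not taken from the model proposed in this work.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Activity:
    """One agenda entry (hypothetical structure, for illustration only)."""
    name: str
    due: date
    grade: Optional[float] = None  # None means not yet submitted or graded


def build_notifications(agenda: List[Activity], today: date,
                        window_days: int = 3,
                        passing_grade: float = 6.0) -> List[str]:
    """Return reminders for pending activities and low-graded ones.

    window_days and passing_grade are assumed values, not part of the paper.
    """
    messages = []
    for activity in agenda:
        if activity.grade is None:
            days_left = (activity.due - today).days
            if days_left < 0:
                messages.append(f"Overdue: {activity.name}")
            elif days_left <= window_days:
                messages.append(f"Due in {days_left} day(s): {activity.name}")
        elif activity.grade < passing_grade:
            messages.append(f"Review suggested: {activity.name}")
    return messages


if __name__ == "__main__":
    today = date(2020, 6, 1)
    agenda = [
        Activity("Forum week 3", today + timedelta(days=2)),
        Activity("Task week 2", today - timedelta(days=1)),
        Activity("Evaluation week 1", today - timedelta(days=7), grade=4.5),
    ]
    for message in build_notifications(agenda, today):
        print(message)
```

In a real deployment the agenda and grades would come from the LMS database rather than from an in-memory list.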
The system after the analysis adjusts the activities that are presented as an evaluation mechanism
and aligns them with the characteristics and needs of the student [14]. The system generates an alert
that is discussed and verified with the teacher in order to improve the activity, change it, or even modify
the course methodologies to achieve the objective of improving learning in an online educational model
through the integration of technologies and interaction with students. The model is based on two
essential components for its development: First, the university infrastructure. This component is the
starting point as it offers a functional architecture for data processing and management. The deployed
architecture allows the process to be aligned and adjusted to meet the integration requirements of
technologies, such as big data and AI. The second component is the integration of new technologies;
both data analysis and AI must converge in harmony and the data must be constantly evaluated.
The evaluation allows the AI system to learn from the errors of each instance that is generated in the
analysis and make increasingly successful decisions. For the sample, two courses of the online modality
were considered, both of which are related to ICT. These subjects were chosen because their students
already have an adequate level of proficiency with ICT. Therefore, students will not be indifferent to
the new applications; on the contrary, the tools are expected to generate interest and involvement
in their use. The work is divided as follows: In Section 2, a description of the works related to the
research topic is made; Section 3 reviews the concepts used; Section 4 describes the proposed method;
Section 5 shows the results of the investigation and discusses the results obtained; and finally, Section 6
presents the conclusions.
2. Related Works
Several related papers have been reviewed, highlighting the use of AI or educational data analysis
tools in LMS [15]. However, these works do not propose the main objective of improving learning,
making this integration an assistant for those involved. Certain works propose the use of data
mining algorithms for knowledge discovery in educational databases [16]. This functionality aims
to identify the deficiencies of the students in a given course. This knowledge is transferred to the
areas or people in charge of the learning quality who are the ones who take the necessary corrective
measures. Other works use more complex models that integrate business intelligence (BI) architectures.
With the use of a BI, it is proposed to include several data sources to give greater granularity to the
analysis [17]. The granularity in the analysis allows us to identify the variables that lead students
to academic desertion, which is one of the problems with the greatest impact on virtual or online
educational models.
The works related to the use of AI in the LMS mainly seek to help the teacher to generate
better models and learning methodologies applied in these environments. There is important
information about the use of specialized AI techniques in user interaction and that they learn from
each interaction [18]. These models are robust and contribute significantly to the development of
this work [19]. Based on the review carried out, it can be highlighted that the proposed work differs
from those existing in the integration of two technologies, such as AI and data analysis, in a single
environment. By centralizing all academic management in a single system, a virtual assistant can be
created that, at first, manages the information of each student and is responsible for automatic and
personalized monitoring. The assistant, in addition to learning from the user interaction, has all the
information resulting from the data analysis [20]. The analysis is not limited to the data found in the
LMS, as the integration of various sources becomes a key point to identify the needs and expectations
of each student. The technology with this capacity is big data, and the amount and type of data that is
integrated into the analysis provides adaptability to decision-making [21]. This integration allows AI
to make quick and effective decisions about student performance.
3. Preliminary Concepts
3.1. Analysis of Data
Data analysis is responsible for examining a set of data in order to draw conclusions about the
information in order to make decisions, or to expand knowledge on a specific topic. Data analysis
subjects the data to various operations in order to obtain precise conclusions that help achieve the
proposed objectives. Data analysis is used in various industries to enable companies and organizations
to make better business decisions, and it is also used in the sciences to verify or refute existing models or
theories [22]. It differs from data mining in its scope, its purpose, and its focus.
Data miners sort through vast data sets using sophisticated software to identify undiscovered
patterns and establish hidden relationships. Data analysis focuses on inference, the process of drawing a
conclusion based only on what the researcher knows [23]. The areas that generally use data analysis are:
• Marketing: Data analysis has been used primarily to predict and classify consumer behavior.
• Human resources: Data analysis is also very useful within companies to maintain a good work
environment and to identify potential employees.
• Academics: Data analysis is also present in education; it serves to select new students and to
measure student performance.
3.2. Artificial Intelligence
AI is the simulation of human intelligence by machines. In general, it is the discipline that tries to
create systems capable of learning and reasoning as a human being [24]. These systems learn from
experience, have the ability to solve problems under certain conditions, contrast information, and carry
out logical tasks. Typically, an AI system is capable of analyzing high-volume data, identifying patterns
and trends, and therefore formulating predictions automatically, quickly, and accurately. AI makes
everyday experiences smarter [20]. How? By integrating predictive analytics and other AI techniques
into applications that are used on a daily basis, for example:
• Siri works as a personal assistant as it uses natural language processing.
• Facebook and Google Photos suggest tagging and grouping of photos based on image recognition.
• Amazon offers product recommendations based on shopping basket models.
• Waze provides optimized traffic information and real-time navigation.
Artificial intelligence has many fields and its operation is based on the application of various
techniques. Some of the most widely used are described below.
• Machine learning is a type of artificial intelligence that gives computers the ability to learn. It is
based on data analysis, through which new patterns are identified that allow the system to modify
its behavior [7]. That is, it analyzes and processes information, discovers patterns,
and acts accordingly.
• Knowledge engineering is based on the use of the necessary techniques to create expert systems.
It is a computational area that is used to store important information and uses it for strategic
purposes [25]. The deeper the layers of information, the better the strategies applied.
• Fuzzy logic is a mathematical theory that is currently attracting considerable attention. It is based
on the use of propositions that are not totally true or false but can occupy any intermediate position
between absolute truth and total falsehood [26].
• Artificial neural networks are a technique whose behavior is inspired by the functioning of human
neural networks. As in the human being, they consist of independent units that are interconnected
with each other [13]. Each artificial neuron receives a certain number of inputs, to each of which it
gives a certain “weight”. Depending on the inputs and their weights, the neuron receives a certain
“nervous impulse”, which translates into an output value [21].
• Rule-based systems work by applying different rules for a given situation and comparing the
results obtained. This task can be carried out by different methods. On the one hand, they can
start from initial evidence or a situation and find their possible solution [27]. On the other hand,
they can start from hypotheses with possible solutions and carry out the inverse journey to find
the premise or evidence.
• Expert systems are computer systems that function as a human expert in a specific subject.
Their operation is based on learning, memorizing, and communicating information [28]. Normally,
the information has been provided by human experts, and the system performs the processes
based on standards to use its knowledge in particular situations. In turn, this expert system can
learn and improve with future additions.
• Artificial vision is the combination of hardware and software that allows devices to process and
recognize images captured in the real world based on numerical or symbolic concepts [4].
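The behavior of a single artificial neuron, as described in the list above, can be sketched in a few lines of Python. The text only states that inputs are weighted and translated into an output value; the bias term and the sigmoid activation used here are common choices that are assumed for the example.

```python
import math
from typing import Sequence


def neuron_output(inputs: Sequence[float], weights: Sequence[float],
                  bias: float = 0.0) -> float:
    """Weighted sum of the inputs passed through a sigmoid activation.

    The bias and the sigmoid are assumptions made for this illustration.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # The "nervous impulse" is modeled as the weighted sum of the inputs.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid squashes the impulse into an output between 0 and 1.
    return 1.0 / (1.0 + math.exp(-weighted_sum))
```

A network is obtained by feeding the outputs of several such neurons into the inputs of others; training then consists of adjusting the weights.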
3.3. Online Education Model
The development of ICT has opened up countless possibilities to carry out educational projects
in which all people have the opportunity to access quality education regardless of when or where
they are. Indeed, the access alternatives that have been put in the hands of people have eliminated
time and distance as an obstacle to teaching and learning [6].
Online education is a modality of distance studies developed in a digital environment known as a
virtual classroom, which is accessed through an Internet connection and uses technological tools for
the teaching-learning process. It has the advantage of being an asynchronous study model, in which
hours and days of the week are established for interaction with the teacher. Online education arises
from the busy pace of life in which society currently lives [29]. Whether for work, family, or the
geographical position of some people, online education achieves a common educational objective,
without the limitations of space or time.
Some characteristics of online education models are:
• Interactive model allows the student to interact with the content, their teachers, and fellow students.
• Accessible, no matter the place or time, and works anywhere with Internet access.
• Synchronous and asynchronous, allowing the student to participate in tasks or activities at the
same time as others.
• Online resources allow access to resources without the need to have them physically at any time
that is necessary.
4. Method
For the development of the work, it is necessary to specify the environment where the
implementation of the different systems will be carried out. By determining exactly the current
conditions, it is possible to determine the ideal way of integrating technologies. In a second instance,
the data analysis model that is required in the university center is determined, according to the variables
and questions that are to be answered. Last but not least, the AI system that works in conjunction with
the LMS, data analysis, and students is adjusted to improve learning in an online education model.
4.1. Identification of the Environment
In this work, a university from Ecuador participated, and this university offers two study
modalities. The first modality is face-to-face, which meets the characteristics of a traditional modality,
and learning depends on the experience and methodology applied by the teacher [30]. The student
becomes a spectator of their own learning and must comply with previously established schedules.
In addition, the teacher becomes the entity that determines what students should learn and how they
should do it. For this reason, the teacher has a greater influence in identifying the performance of each
student [19]. Therefore, this identification is biased to the teacher’s criteria, a factor that is not expected
in an ideal learning model.
The second model that the university offers is online education. This model has been worked
on and improved over 10 years. In addition, its evolution dates back to a virtual education model.
This process has allowed the integration of information technology (IT) and has a technological
architecture that becomes the basis of this work. The methodology used by this online education model
has been designed for people who, due to their schedules and obligations, cannot access a face-to-face
modality. This model takes as a platform an LMS, where schools and courses have been created
depending on each degree. In the virtual course, the student must comply with three compulsory
activities, which are the development of tasks, evaluations, and participation in forums. For the
development of these activities, the student has a module where they will find all the resources that the
teacher and designer of the course have considered relevant.
The course is designed in weekly modules; therefore, all activities must be completed and made
available on the platform every seven days. The management of the modality has defined a specific
schedule for 60-min asynchronous tutorials, where each tutor simply clarifies the doubts that have
arisen in the week on the topics covered. The problem with this educational model is that although
its intention is to adapt to the needs of students both in time and in learning [31], this does not really
happen, and there are very high dropout rates and low academic effectiveness. In addition, the learning
indicators fall short of the desired levels [15]. These problems generally point to students lacking adequate
study methodologies, as well as the discipline to keep to their own schedules. These factors
are understandable, since the highest percentage of students come from a face-to-face educational
model [32]. The adaptability to this system where there is no daily control by the teachers and
where the learning falls on the student becomes the main cause of abandonment and other problems
already mentioned.
The technological infrastructure that the university uses for this modality becomes an advantage
for the development of this work. It has been designed to support a high volume of transactions and
services. Issues, such as information security and care, are covered by the IT department. This ensures
that this work focuses on integrating AI, data analysis, and LMS without compromising data. Figure 1
presents the IT architecture that includes the university. This architecture is made up of layers,
where an additional one has been included that is in charge of data analysis [33].
Figure 1. Technological architecture of a university for the management of an online education
modality [33].
The data entry layer is responsible for obtaining data from all systems and devices that are
available for student use [29]. In this layer, the data that students generate in social networks on
specific topics about their condition as students is even considered. The data can be structured or
unstructured, and the data analysis layer adds a value to the data.
The cloud computing and storage layer provides the opportunity to manage data according to
its purpose. This is important in this work, since various activities are carried out through the use of
mobile applications [34]. Various data from these activities are stored or processed directly in public or
private clouds [35].
The knowledge layer is responsible for data analysis. To do so, it uses a big data architecture.
This layer becomes the engine of the online education model [36]. This layer processes all the data
found in the different sources and analyzes it through various data mining algorithms. The information
passes to the AI system, which generates knowledge about the results obtained and interacts with the
students and the areas in charge of learning [23].
The service layer is the integration of the systems and layers already mentioned in the LMS and
presented to members of the online education modality [37]. The way the information is presented can
also be presented in different systems related to the educational model.
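To make the flow between the four layers more concrete, the following Python sketch chains them as plain functions. It is only a schematic reading of the architecture: the record shape `(student_id, source, score)`, the function names, and the use of a mean score as the "analysis" are assumptions introduced for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (student_id, source, score): a hypothetical record shape for the example.
Record = Tuple[str, str, float]


def data_entry_layer(*sources: Iterable[Record]) -> List[Record]:
    """Collect records from every available system into one list."""
    return [record for source in sources for record in source]


def storage_layer(records: List[Record]) -> Dict[str, List[float]]:
    """Group stored scores by student (stands in for cloud storage)."""
    store: Dict[str, List[float]] = defaultdict(list)
    for student_id, _source, score in records:
        store[student_id].append(score)
    return dict(store)


def knowledge_layer(store: Dict[str, List[float]]) -> Dict[str, float]:
    """Analyze the stored data; here, simply the mean score per student."""
    return {student_id: mean(scores) for student_id, scores in store.items()}


def service_layer(knowledge: Dict[str, float]) -> List[str]:
    """Format the results for presentation in the LMS."""
    return [f"{student_id}: average {avg:.1f}"
            for student_id, avg in sorted(knowledge.items())]


if __name__ == "__main__":
    lms = [("s1", "lms", 8.0), ("s2", "lms", 5.0)]
    mobile = [("s1", "mobile", 6.0)]
    collected = data_entry_layer(lms, mobile)
    for line in service_layer(knowledge_layer(storage_layer(collected))):
        print(line)
```

Keeping each layer behind its own function mirrors the layered design: the analysis step can be replaced without touching data collection or presentation.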
4.2. Analysis of Data
Data analysis is of vital importance in this work due to the large amount of data that is expected to
be processed, in addition to the type of data that it intends to integrate into the analysis. The technology
that meets the characteristics of this work is big data. The objective that this fulfills is to analyze the
data that comes from different repositories. The data generated by the students from the activities
they carry out, as well as the interaction with the LMS is stored in its own database in a structured
way. However, if only these data are considered, granularity is not obtained in the analysis. Moreover,
the results will be segmented to the corresponding scores for each activity. This does not mean that
real data is being obtained on the learning of each student. Therefore, it is necessary to integrate more
information to the analysis architecture, as universities generally store the socioeconomic information
of students and in some cases include relevant information on the academic performance of basic
training institutions. This information allows the discovery of possible trends in the students and the
way in which they learn [38]. All the aforementioned refers to structured data; however, this work
aims to obtain information from students through all available sources, such as social networks.
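The integration of additional sources described above amounts, in its simplest form, to joining records by student. The sketch below merges LMS scores with socioeconomic attributes; the field names (`student_id`, `mean_score`, `employment`) and the dictionary-based inputs are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List


def integrate_sources(lms_scores: Dict[str, List[float]],
                      socioeconomic: Dict[str, Dict[str, str]]) -> List[dict]:
    """Merge per-student LMS scores with socioeconomic attributes.

    Students missing from the socioeconomic source are kept, with no
    extra attributes, so that no LMS record is silently dropped.
    """
    merged = []
    for student_id, scores in sorted(lms_scores.items()):
        row = {
            "student_id": student_id,
            "mean_score": sum(scores) / len(scores) if scores else None,
        }
        # Add whatever attributes the second source holds for this student.
        row.update(socioeconomic.get(student_id, {}))
        merged.append(row)
    return merged
```

Each merged row then carries more variables than the activity scores alone, which is what gives the analysis its granularity.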
The big data framework used for this work is based on Hadoop. This framework allows the
processing of large volumes of data regardless of its type [39]. This feature and the reliability of
Hadoop allow the analysis of as many variables as possible, guaranteeing granular and quality results.
Hadoop, being an open-source system, allows storing, processing, and analyzing of academic data
at no additional cost to the institution [40]. Three Hadoop components make it the ideal architecture
for this work. The Hadoop Distributed File System (HDFS) allows a data file to be distributed across
different machines rather than saved on a single one. MapReduce is a framework that isolates the
programmer from the tasks of parallel programming; it allows a program that has been written in the
most common programming languages to be run in a Hadoop cluster [41]. YARN is a framework for
task planning and cluster resource management.
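The idea of distributing one file across several machines can be illustrated with a short Python sketch that cuts data into fixed-size blocks and assigns each block to more than one node. This is a deliberately simplified model: the round-robin placement, the tiny block size, and the node names are assumptions for the example and do not reproduce the placement policy that HDFS itself applies.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into consecutive fixed-size blocks."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str],
                 replication: int = 2) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified placement for illustration; not HDFS's actual policy.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"student activity log", block_size=8)
    print(place_blocks(len(blocks), ["node1", "node2", "node3"]))
```

Because every block lives on more than one node, the loss of a single machine does not make the file unreadable.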
4.2.1. Hadoop Operation
MapReduce sends the computation to the site where the data to be processed resides,
which is a node within the cluster. When a MapReduce process is launched, the tasks are distributed
among the different servers in the cluster and Hadoop manages the sending and receiving of data
between nodes. Computation happens on the nodes that hold the data locally in order to minimize
network traffic. Once all the data has been processed, the user receives the result from the cluster.
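The locality principle can be sketched as a small scheduling function: each task is given to a node that already stores the block it needs, so no input data has to cross the network. The least-loaded tie-breaking rule and the dictionary describing where blocks are stored are assumptions made for the example, not Hadoop's actual scheduler.

```python
from typing import Dict, List


def assign_tasks_locally(block_locations: Dict[int, List[str]]) -> Dict[int, str]:
    """Assign each block's task to the least-loaded node that stores it."""
    load: Dict[str, int] = {}
    assignment: Dict[int, str] = {}
    for block in sorted(block_locations):
        candidates = block_locations[block]
        # Ties are broken by the order in which replicas are listed.
        chosen = min(candidates, key=lambda node: load.get(node, 0))
        assignment[block] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment
```

For instance, with block 0 stored on `n1` and `n2`, block 1 on `n1` and `n3`, and block 2 on `n1` and `n2`, the three tasks are spread over the three nodes while each still reads its block locally.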
MapReduce contains two phases, although the second is subdivided into two others:
• Map.
• Reduce: shuffle data and reduce.
4.2.2. Phases in Hadoop MapReduce
In Hadoop MapReduce, the input data is divided into separate chunks that are processed by
the mappers in parallel. The map outputs are sorted and then become the input for the reducers.
Generally, the inputs and outputs of jobs are stored in a file system, and the storage and compute
nodes are the same [42]. Because the tasks for each fragment can then be executed on the node where
that fragment is stored, data access time and data movement between nodes in the cluster are
reduced [40]. It is common that the application logic cannot be decomposed into a single MapReduce
run, so several phases are chained, treating the results of one as input for the mappers of the next phase.
The framework is also responsible for managing resources, planning, restarting, and monitoring
tasks with the Hadoop YARN manager, which has a single resource manager and a node manager on
each node of the cluster [33].
• The map phase runs on subtasks called mappers. These components are responsible for generating
key-value pairs by filtering, grouping, ordering, or transforming the original data. Intermediate
data pairs are not stored in HDFS.
• The shuffle and sort phase may not be necessary in every job. It is the intermediate step between
map and reduce that collects the data and sorts them conveniently for processing. In this phase,
the repeated occurrences of a key across the mappers are brought together.
• The reduce phase aggregates, by key, the values produced by all the mappers in the
system or delivered as key-value pairs by the shuffle phase. Finally, each reducer generates
its output file independently, generally written in HDFS.
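The three phases listed above can be imitated in plain Python to show how the data changes shape at each step. The sketch below computes a mean score per student from activity records; it runs sequentially on one machine, whereas Hadoop would run the mappers and reducers in parallel across the cluster. The record layout and the choice of a mean as the aggregation are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (student_id, activity, score): a hypothetical record shape for the example.
Record = Tuple[str, str, float]


def mapper(chunk: Iterable[Record]) -> Iterator[Tuple[str, float]]:
    """Map: emit one (student_id, score) pair per input record."""
    for student_id, _activity, score in chunk:
        yield student_id, score


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle and sort: group values by key, with keys in sorted order."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: groups[key] for key in sorted(groups)}


def reducer(key: str, values: List[float]) -> Tuple[str, float]:
    """Reduce: aggregate all values of one key into a mean score."""
    return key, sum(values) / len(values)


def run_job(chunks: List[List[Record]]) -> Dict[str, float]:
    """Run map on every chunk, then shuffle, then reduce per key."""
    intermediate = [pair for chunk in chunks for pair in mapper(chunk)]
    return dict(reducer(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    chunks = [
        [("s1", "task", 8.0), ("s2", "task", 6.0)],
        [("s1", "forum", 10.0)],
    ]
    print(run_job(chunks))
```

Each inner list plays the role of one input chunk handled by one mapper; the grouping step is what allows a reducer to see every value for a given student regardless of which chunk produced it.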
Figure 2 shows how the processing is assigned to the nodes of the cluster. The MapReduce framework
has a master/slave architecture: a master server, or JobTracker, and several slave servers, or
TaskTrackers, one for each node in the cluster. The JobTracker is the point of interaction between
users and the MapReduce framework. Users submit MapReduce jobs to the JobTracker, which puts them in
a pending job queue and runs them in order of arrival. The JobTracker manages the assignment of
tasks and delegates them to the TaskTrackers. TaskTrackers execute tasks under the command of the
JobTracker and also handle the movement of data between the map and reduce phases [43].
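The queueing behavior described above can be illustrated with a toy model. This is only a sketch of the first-in, first-out idea: the class below is hypothetical, it is not the Hadoop API, and the round-robin delegation of tasks is an assumption made for the example.

```python
from collections import deque

class JobTracker:
    """Toy master: queues jobs in arrival order and delegates their tasks."""

    def __init__(self, task_trackers):
        self.task_trackers = list(task_trackers)  # one worker per cluster node
        self.pending = deque()                    # pending job queue (FIFO)

    def submit(self, job_name, tasks):
        """Users submit jobs; they wait in the queue in order of arrival."""
        self.pending.append((job_name, list(tasks)))

    def run_next(self):
        """Take the oldest pending job and assign its tasks to the workers."""
        job_name, tasks = self.pending.popleft()
        assignment = {tracker: [] for tracker in self.task_trackers}
        for i, task in enumerate(tasks):
            # Round-robin delegation is an assumption of this sketch.
            tracker = self.task_trackers[i % len(self.task_trackers)]
            assignment[tracker].append(task)
        return job_name, assignment

tracker = JobTracker(["node-1", "node-2"])
tracker.submit("job-A", ["map-0", "map-1", "reduce-0"])
tracker.submit("job-B", ["map-0"])
first_job, first_assignment = tracker.run_next()
```

The job submitted first is the one that runs first, and the second job stays in the queue until `run_next` is called again.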
Figure 2. Architecture of a MapReduce master/slave model [13].
MapReduce components align with the type of analysis required in developing a system that
integrates AI, data analysis, and LMS. One of the conditions that establishes the use of Hadoop is
that a real-time analysis is not necessary, but the architecture must guarantee the handling of a large
volume of data, as well as its diversity [44]. Another factor that has been considered for the use of
this architecture is the knowledge that exists on the part of the IT area of the university that
participates in the study.
4.3. Artificial Intelligence
AI includes several tools that can be exploited by an online education model to endow systems
with special characteristics that allow the creation of virtual assistants that interact directly with
students. The AI aims to take the data that has been previously processed by big data and look for
patterns in them. In this way, the system can autonomously classify the results and recommend
different actions to students and tutors of the modality [45]. Among the AI tools that can perform this
type of work are:
• Expert systems are systems highly trained in a specific intellectual activity, based on the
knowledge of experts in the field. A classic example is that of systems that play chess.
• Chatbots are systems that make an interesting use of natural language processing and improve
with each experience, allowing coherent two-way communication with humans, either oral or
written.
• Virtual assistants are the closest thing to a movie AI that we can interact with today. They recognize
our voice, adapt to the way we ask for things, and are able to recommend entertainment
according to our tastes. One of the strengths of these technologies is that they have an immense
number of users who feed them constantly and help reinforce their learning algorithms.
• Machine learning systems are computer programs that try to learn from previous experience and
examples, and have a specific and predetermined purpose that is generally modeling,
predicting, understanding patterns in the data, or controlling some system.
Given the main types of AI and how each is presented to the user, it is necessary to identify the
need to be covered in the investigation. First, there is a need for an autonomous system that can
generate knowledge from the data already obtained from the analysis [24]. Second, the AI must be
able to interact with the user. These characteristics point to expert systems or chatbots as ideal
systems for managing students. However, the tool must also be able to learn from data for which it
was never programmed and, based on this learning, recommend certain activities to students and
teachers. For this reason, a machine learning model is used.
To implement a machine learning model, there are two main strategies:
• Supervised learning: This methodology requires a prior training phase on datasets in which
hundreds of labels are introduced. If a machine is required to distinguish between dogs and cats
in a photo, the program must be shown thousands of images in which it is clear which is a cat and
which is a dog. After this training phase, the program is able to identify each of the animals in
different circumstances. This method is called classification. Another type of supervised learning
is regression, which predicts a continuous value. For example, given the numerical series 2, 4, 6,
the machine is able to continue it as 8, 10, 12. Regression is used especially for prediction.
• Unsupervised learning: This procedure requires no training phase, and the machine must be able
to find patterns directly in the information itself. An example is grouping students into
homogeneous groups. If information from thousands of students, including unstructured data, is
given to the system, it is able to recognize the characteristics of the students and segment them
into profiles with similar criteria. This problem is called clustering or data agglomeration.
Unsupervised techniques are also useful to reduce the total number of variables to a maximum of
2 or 3 with minimal loss of information, so that the data can be visualized and more easily understood.
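Both strategies can be illustrated with a minimal, dependency-free sketch. The first function fits a straight line by least squares to the series 2, 4, 6 and continues it, as in the regression example; the second groups values with a one-dimensional k-means, a common clustering technique chosen here only for illustration. The grades, the starting centers, and the function names are assumptions, not data or methods from the study.

```python
def fit_line(xs, ys):
    """Supervised (regression): least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def kmeans_1d(values, centers, iterations=10):
    """Unsupervised (clustering): group values around the nearest center."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Regression: learn from positions 1..3 of the series, predict positions 4..6.
slope, intercept = fit_line([1, 2, 3], [2, 4, 6])
predictions = [slope * x + intercept for x in (4, 5, 6)]

# Clustering: hypothetical average grades, no labels supplied.
grades = [2.0, 3.0, 2.5, 8.0, 9.0, 8.5]
centers, groups = kmeans_1d(grades, centers=[0.0, 10.0])
```

The regression needs the known series as training examples, while the clustering receives only the raw grades and finds the low and high groups on its own.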
Phases for the Implementation of Machine Learning
Before thinking about the technological solution, it is necessary to address the business objective
that is sought to be solved with a machine learning tool. The goals can be as diverse as improving
conversions, reducing churn, or increasing user satisfaction [46]. The important thing is to be clear
about which element to optimize to focus resources on it and not to implement a solution that exceeds
the original goal [12].
Figure 3 shows the different phases of the machine learning process and how they interact with
each other.
Figure 3. Phases for the implementation of a machine learning model.
1. Understand the problem: It is important to understand the problem that has to be solved.
Normally, this takes a long time, especially if the problem comes from a sector with which the team
is unfamiliar. In this phase, it is necessary to create collaborative environments with people who
know a lot about the problem.
2. Understand the data: It is common to do an exploratory analysis of the data to become familiar
with it. Descriptive statistics, correlations, and graphs are produced in exploratory analysis to
better understand the story the data is telling. Furthermore, it helps to estimate whether the
available data is sufficient and relevant to build a model.
3. Define an evaluation criterion: This is usually an error measure. Typically, the root-mean-square
error is used for regression problems and the cross entropy is used for classification problems.
For two-class classification problems, which are common, other measures, such as precision
and recall, are used.
4. Evaluation of the current solution: The problem to be solved with machine learning is probably
already being solved in another way. The motivation to use machine learning is usually to get
better results. Another common motivation is to get similar results automatically,
replacing tedious manual work. Measuring the performance of the current solution allows it to
be compared to the performance of the machine learning model, which identifies the feasibility of
using the model. If there is no current solution, a simple solution that is very easy to implement
can be defined. For example, predicting a student’s grade in a course with machine learning can be
compared to a simple solution (the average of their grades during an academic period). Only in
this way, when the machine learning model is implemented, is it possible to determine if it is good
enough, if it needs to be improved, or if it is not worth implementing. If in the end the current
solution or a simple solution performs similarly to the machine learning solution, it is probably
better to use the simple solution.
5. Prepare the data: Although this process is carried out by the big data section, it is necessary to
detail certain factors in the machine learning phases. Data preparation is one of the phases of
machine learning that involves the most effort. The main challenges are the following. Incomplete
data: it is normal that the ideal data for the machine learning process is not available. For
example, to predict which students are more likely to enter an online educational model, the data
available comes from an online survey, and many people will not have filled in all the fields.
However, incomplete data is better than having no data at all, and there are several ways to handle
it, such as deleting it, imputing it with a reasonable value, imputing it with a machine learning
model, or doing nothing and using a machine learning technique that handles incomplete data.
Combining data from various sources: some data may come from a database, others from a
spreadsheet, from files, etc. It is necessary to combine the data so that the machine learning
algorithms can consider all the information. Calculating the relevant features: machine learning
algorithms work much better with relevant features than with raw data [47]. As an example, it is
much easier for people to know the temperature in degrees Celsius than to know how much the
mercury has expanded in a traditional thermometer.
6. Building the model: Once the data is ready, building a machine learning model requires
surprisingly little effort. This is because there are already several machine learning
libraries available, many of them free and open source. During this phase, the type of
machine learning technique to use is chosen. The machine learning algorithm will automatically
learn to get the right results with the historical data that has been prepared.
7. Error analysis: This phase is important to understand what needs to be done to improve machine
learning results. In particular, the options are to use a more complex model, use a simpler
model, include more data and/or more features, develop a better understanding of the problem, etc.
In the error analysis phase, it is important to ensure that the model is capable of generalization.
Generalization is the ability of machine learning models to produce good results on new data.
In general, it is not difficult to achieve acceptable results using this process. However, to get
excellent results, the previous phases have to be iterated several times. With each iteration, the
understanding of the problem and the data grows. This allows the design of new and better relevant
features and reduces the generalization error. A greater understanding also makes it possible to
choose, on better grounds, the machine learning technique that best suits the problem.
8. Model integrated into a system: Once the model has been adjusted based on the error analysis,
it is integrated into the LMS. Integrating a machine learning model into a system requires
relatively greater effort. It is necessary to be able to automatically repeat the data
preparation phases, which requires that the machine learning model communicates with other
parts of the system and that the results of the model are used in the system. Furthermore, errors
must be automatically monitored. The model warns if its errors grow over time, so that the
machine learning model can be rebuilt with new data, either manually or automatically. Interfaces
for the data must be built so that the model can obtain data automatically and the system can use
its predictions automatically.
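Phases 3 to 5 above can be sketched together in a short example: missing grades are imputed with a reasonable value (the mean of the observed ones), the simple solution predicts each student's average grade, and the root-mean-square error measures how good that prediction is. All grades, student identifiers, and function names below are hypothetical and serve only to illustrate the comparison.

```python
import math

def impute_mean(values):
    """Phase 5: replace missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def baseline_predict(past_grades):
    """Phase 4: simple solution, predict the average of the student's grades."""
    return sum(past_grades) / len(past_grades)

def rmse(predicted, actual):
    """Phase 3: root-mean-square error between predictions and real values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def beats_baseline(model_rmse, baseline_rmse, min_gain=0.0):
    """A model is worth keeping only if it improves on the simple solution."""
    return baseline_rmse - model_rmse > min_gain

# Hypothetical grade histories (None = missing) and final grades.
histories = {"s1": [6.0, None, 8.0], "s2": [9.0, 9.0, None], "s3": [4.0, 5.0, 6.0]}
finals = {"s1": 7.0, "s2": 8.0, "s3": 6.0}

students = sorted(histories)
baseline_preds = [baseline_predict(impute_mean(histories[s])) for s in students]
baseline_error = rmse(baseline_preds, [finals[s] for s in students])
```

The error of any candidate machine learning model would be computed with the same `rmse` function on the same students and passed to `beats_baseline`; if the gain is negligible, the simple solution is preferable.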
4.4. Integration of Big Data, Machine Learning, and LMS
For the integration of systems and new technology, a model, such as that shown in Figure 4,
Once the activities have been recommended, machine learning enters a state of analysis of the
is used, where the LMS has a large volume of data on all activities and interaction with the student.
results. For this, the system analyzes the grades that students obtain in the recommended activities.
The interaction is not direct; however, it is common for there to be information in the LMS database on
If the results show that the student improved their performance, the process ends and returns to the
how long each student remains active on the platform. Other information that can be obtained is the
initial state. If the system detects that the results do not exceed the average mark, defined as the basis
usual schedule in which each student connects [19]. To these data are added those that are stored in
for the university’s policies, the system feeds back and integrates this data into the analysis phase,
databases of administrative and other academic systems. This information allows an analysis that
where the system begins the process again until satisfactory results are obtained.
covers a greater number of variables that the big data architecture is in charge of processing [48].
Figure 4. Big data integration model—Machine learning and LMS.
The architecture of big data in its first phase is responsible for extracting data from all sources,
and this data is structured and unstructured [49]. Once it obtains all the data, it processes it in such a
way that it is useful for obtaining the knowledge that the AI is in charge of through machine learning.
Machine learning is responsible for recognizing the patterns of analysis and with them performs the
classification of individuals. The patterns are presented as characteristics of each group, where the
objective is that, by knowing the needs of each group, the system has the ability to propose strategies
or techniques that improve the way activities are presented [50]. Furthermore, it improves learning by
recommending learning activities to students based on their needs.
Once the activities have been recommended, machine learning enters a state of analysis of the
results. For this, the system analyzes the grades that students obtain in the recommended activities.
If the results show that the student improved their performance, the process ends and returns to the
initial state. If the system detects that the results do not exceed the average mark, defined as the basis
for the university’s policies, the system feeds back and integrates this data into the analysis phase,
where the system begins the process again until satisfactory results are obtained.
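The feedback cycle described in this section can be sketched as a loop. This is a simplified illustration, not the implementation used in the study: the threshold value, the stub functions `recommend` and `grade_for`, and the grades are all assumptions.

```python
THRESHOLD = 6.0  # assumed acceptable mark on a 1 to 10 scale

def feedback_loop(recommend, grade_for, student, threshold=THRESHOLD, max_rounds=5):
    """Recommend activities until a grade exceeds the threshold."""
    history = []
    for _ in range(max_rounds):
        activity = recommend(student, history)   # decision based on past results
        grade = grade_for(student, activity)     # result of the recommended activity
        history.append((activity, grade))        # fed back into the analysis phase
        if grade > threshold:
            break                                # satisfactory: back to the initial state
    return history

# Hypothetical stubs standing in for the machine learning model and the LMS.
ACTIVITIES = ["quiz", "forum", "task"]
GRADES = {"quiz": 3.0, "forum": 8.0, "task": 7.0}

def recommend(student, history):
    """Propose the first activity type that has not been tried yet."""
    tried = {activity for activity, _ in history}
    for activity in ACTIVITIES:
        if activity not in tried:
            return activity
    return ACTIVITIES[-1]

def grade_for(student, activity):
    """Look up the grade the student obtains in the activity."""
    return GRADES[activity]

history = feedback_loop(recommend, grade_for, "s1")
```

In this run the first recommendation does not exceed the threshold, so the result is fed back and a second activity is proposed, after which the loop ends.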
5. Discussion and Results
The new normality that humanity lives in forces institutions to seek new models that adapt to
the needs of people. This paper takes this consideration into account and seeks to improve an online
education model. The integration of technologies becomes the starting point to improve education
and monitor student performance. It should be noted that the current reality has allowed online,
virtual, or hybrid education models to become the expected response to continue with higher learning.
This work is applied on the architecture and infrastructure of the university that participated in the
study. This is considered an advantage, since, having the majority of the infrastructure deployed,
it allows the concentration of efforts on the design of the machine learning model. If there is a need to
modify any layer of the architecture, it is simply updated without the need to generate higher technical,
human, or economic costs.
With the integration of these technologies, the monitoring of student performance is improved,
which generally depends on the criteria of the teacher or those in charge of learning. With this
model, the monitoring does not rely on human actors: the systems are in charge of carrying out a
continuous analysis of each student, and the machine learning model will even detect the cases that
have the highest risk of low academic performance. This feature allows the generation of an early
warning, which is currently issued only when the academic monitoring department knows a certain
number of grades. Early detection of the comprehensive model allows the generation of projections
based on the student’s history. For example, in students who had problems in the subject of introduction
to calculus, the system recognizes them as possible cases with problems in calculus I and subjects whose
prerequisite is introduction to calculus. This analysis can be very superficial; however, the system can
even determine a possible case of repetition by analyzing the topics that make up a subject.
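The projection based on prerequisites can be sketched as a lookup over a prerequisite table. The table below is hypothetical, apart from the introduction to calculus and calculus I pair mentioned in the text, and the function name is an assumption.

```python
# Hypothetical prerequisite table: subject -> subjects required beforehand.
PREREQUISITES = {
    "Calculus I": ["Introduction to Calculus"],
    "Physics I": ["Introduction to Calculus"],
    "Statistics": ["Algebra"],
}

def subjects_at_risk(struggled_in, prerequisites):
    """Return the subjects that list any struggled subject as a prerequisite."""
    struggled = set(struggled_in)
    return sorted(
        subject
        for subject, required in prerequisites.items()
        if struggled.intersection(required)
    )

flagged = subjects_at_risk(["Introduction to Calculus"], PREREQUISITES)
```

A student flagged in this way would be a candidate for an early warning before any grade in the later subject is recorded.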
For the recommendation of activities, machine learning has knowledge of the student’s
performance in each activity. Therefore, the decision is made based on the best results that the
student obtains in each activity. For example, cases have been detected where one type of activity,
rapid evaluations by means of true and false items, does not align with the needs of a certain group
of students. The model identifies these groups and recommends other types of activities to the
course designer. For this, the development of active learning is taken as the core principle. In this
type of learning, a wide variety of activities have been developed, which machine learning proposes
to the student according to their needs.
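One simple way to express this decision rule is to rank activity types by the average grade a group obtains in each. The sketch below assumes this ranking criterion and uses made-up grades; the study does not specify the exact rule.

```python
from statistics import mean

def rank_activity_types(grades_by_type):
    """Order activity types from best to worst average grade."""
    averages = {
        kind: mean(grades)
        for kind, grades in grades_by_type.items()
        if grades  # skip activity types with no recorded grades
    }
    return sorted(averages, key=averages.get, reverse=True)

# Hypothetical grades of one group of students, per activity type.
group_grades = {
    "forum": [8.0, 9.0, 7.0],
    "task": [6.0, 7.0, 6.5],
    "true_false_quiz": [2.0, 3.0, 1.0],
}
ranking = rank_activity_types(group_grades)
```

The activity type at the end of the ranking is the one the course designer would be advised to replace for this group.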
In order to evaluate the proposed model, several exercises were carried out involving the two
parallels of an administrative degree program. Each parallel is made up of 24 students,
and the follow-up period was 16 weeks, the usual length of one academic period. Each level is made
up of five subjects, among which students must take general, complementary, and professionalizing
subjects. The sample of students belongs to the fourth level. The main reason why this group was
chosen is information obtained from the academic monitoring department, which found that
the first two years of study are where the highest dropout rate is recorded. In addition, students at this
level have taken all computer science subjects, allowing them to adapt more easily to a model based
on the integration of technologies. The online education model of the university participating in the
study complies for each course or subject with an already standardized model consisting of 16 weeks.
These are divided into two partials each of seven weeks plus one partial evaluation. Within the LMS,
specifically Moodle in the case of the university, each of the courses is created and registered and these
have been divided into modules that respond to each week of the period. The courses consist of a
main module that provides detailed information on the type of study, the matter, and the assigned
tutor. In the same way, the student will find the syllabus and the study guide that allows him to know
exactly the topics to be reviewed and the activities to be completed. Within each week, the module is
divided into sections that contain the resources, activities, and corresponding information to assign an
asynchronous meeting with the tutor.
In the resources section, each tutor is in charge of uploading all the material corresponding to the
topic of the week. These resources must be aligned according to the learning results of the subject.
The tutor usually uploads his own material, such as a presentation, the resolution of an exercise,
or a reading. In addition, it must include supporting material, such as videos, readings, scientific
articles, etc.
In the activities section, the student finds everything to do during each week. An activity is an
opinion forum, where the student comments critically and objectively on a topic raised by the tutor.
Another activity that the student must complete is a task that meets the requirements set forth in
Bloom’s taxonomy. The objective of this theory is that after completing a learning process, the student
acquires new skills and knowledge. For this reason, it consists of a series of levels built with the
purpose of ensuring meaningful learning that lasts throughout life. The levels of Bloom’s taxonomy
are know, understand, apply, analyze, evaluate, and create. In addition, the student must complete a
questionnaire-type evaluation, the purpose of which is to encourage students to read the resources.
The last section maintains the information corresponding to the asynchronous meeting with
the tutor. The objective of the meeting is that students can make all the queries directly to the tutor or
can receive feedback on the activities or topics discussed. Each meeting lasts 60 min. In this model,
these meetings are not mandatory and the student can review the recording as many times as they
deem necessary.
Once the scenario where the model is integrated has been defined, the variables that explain the
dropout are established. The set of variables is the university degree corresponding to the numerical
value of the general average of a student’s secondary studies, the number of subjects passed, the number
of enrollments in the defined periods, the subjects taken (between 1 and 20, coded according to the
average number of subjects taken), the sex, and the age of the students between 19 and 30 years old.
The problem addressed refers to the detection of the causes of university dropout; previous works
have considered desertion to constitute the failure of a student in a consecutive period. In the first
exercise, big data requires access to all logs of activities carried out by teachers and students that are
usually stored in MySQL. All the data obtained from the different sources went through a processing
and transformation phase in order to obtain clean data that are analyzed by Hadoop in search of the
patterns that the students follow.
Figure 5 presents the patterns of the first exercise, showing the results of the activities
carried out by the students during the established period. The “x” axis shows the activities,
where H1 is the forums, H2 the tasks, and H3 the questionnaire-type evaluations. The
“y” axis shows the grade obtained. The grades are based on rubrics that guarantee learning
and range from 1 to 10. On this axis, six is marked
as an acceptable grade that meets the minimum learning criteria. In the forums, it is observed that
the learning level is high in most cases, and the low grades are mostly due to the fact that the student
did not register their participation or that the contributions were not objective. In the task, based on
Bloom’s taxonomy, mean values are obtained that represent that a part of the students adequately
meets the requirements of the activity. The group closest to 1 is the questionnaire-type evaluations.
These evaluations consist of 10 questions that are scheduled to be completed in 20 min, where the
student must answer each question in an average of two minutes. In this activity, the values are
extremely low and do not contribute to learning.
Figure 5. Data analysis of the activities developed in an online education model with the use of big
data. H1: Forums, H2: Homework, H3: Evaluations.
The result obtained by big data is taken by the AI to feed machine learning and learn about this
data for decision-making. The AI model integrated the analysis, the data from the LMS in relation to
the time of dedication of the students to the reading of the teacher’s resource, and the data from a
survey carried out on the students, where the time they had to answer each question was discussed.
The data from this analysis was subjected to the naive Bayes data mining algorithm with the results
presented in Table 1.
Table 1. Stratified cross-validation.
Measure                             Value     Percentage
Correctly Classified Instances      48        94.1176%
Incorrectly Classified Instances    3         5.8824%
Kappa statistic                     0.9113
Mean absolute error                 0.0447
Root mean squared error             0.1722
Relative absolute error                       10.0365%
Root relative squared error                   36.4196%
Total Number of Instances           51
The algorithm analyzed 51 instances to identify the reason why the scores in
the evaluations fall below the expected performance. Of the 51 instances, 48 (94.1176%) were
classified correctly. This level of accuracy was considered sufficient to accept the outcome of the
analysis. The results are presented in Table 2.
Table 2. Confusion matrix.
 a    b    c    ← Classified as
15    0    0    | a = T. Dedication
 0   18    1    | b = T. Question
 0    2   15    | c = Difficulty
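As a consistency check, the accuracy and the kappa statistic reported in Table 1 can be recomputed from the confusion matrix in Table 2. The sketch below uses the standard definition of Cohen's kappa; the function names are chosen for the example.

```python
# Confusion matrix from Table 2 (rows: actual class, columns: predicted class).
CONFUSION = [
    [15, 0, 0],   # a = T. Dedication
    [0, 18, 1],   # b = T. Question
    [0, 2, 15],   # c = Difficulty
]

def accuracy(matrix):
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)

acc = accuracy(CONFUSION)      # 48 of 51 instances
kappa = cohen_kappa(CONFUSION)
```

Rounded to four decimals, `acc` corresponds to the 94.1176% of correctly classified instances and `kappa` to the 0.9113 listed in Table 1.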
The results indicate that the time available to answer each question (2 min)
harms the development of the evaluation. These results were compared with the number of
evaluations that the LMS closed because the evaluation time had run out. For this effect,
18 instances were classified correctly and one erroneously, which the analysis classified as an evaluation
difficulty. In the time of the dedication of the students to the reading of the teacher’s resources, 15 true