Online Tutoring and Student Outcomes during the COVID-19 Pandemic

Contributed by:
Jonathan James
Apart but Connected: Online Tutoring and Student Outcomes during the COVID-19 Pandemic ∗
Michela Carlana †, Eliana La Ferrara ‡
This version: February 2021
Abstract
In response to the COVID-19 outbreak, the governments of most countries ordered the closure of schools, potentially exacerbating existing learning gaps. This paper evaluates the effectiveness of an intervention implemented in Italian middle schools that provides free individual tutoring online to disadvantaged students during lockdown. Tutors are university students who volunteer for at least 3 hours per week. They were randomly assigned to middle school students, from a list of potential beneficiaries compiled by school principals. Using original survey data collected from students, parents, teachers and tutors, we find that the program substantially increased students’ academic performance (by 0.26 SD on average) and that it significantly improved their socio-emotional skills, aspirations, and psychological well-being. Effects are stronger for children from lower socioeconomic status and, in the case of psychological well-being, for immigrant children.
Keywords: tutoring, COVID-19, education, achievement, aspirations, socio-emotional skills, well-being.
∗ We thank seminar participants at several universities and webinars for helpful comments. Micol Morellini, Vrinda Kapoor, Angelica Bozzi, Marco Cappelluti, Isabela Duarte, Agnese Gatti, Gaia Gaudenzi, Federica Mezza, Chiara Soriolo, Amy Tan and Monia Tommasella provided excellent research assistance. We are grateful to the schools that took part in the intervention for their collaboration, and to the team of pedagogical experts led by Giulia Pastori and Andrea Mangiatordi and including Anna Maria Carletti, Paola Catalani, Silvia Negri, Doris Valente, Stefania Zacco and Monica Zanon. We also thank our team of tutor supervisors (Angela Caloia, Leila Pirbay, Ilaria Ricchi, and Giulia Zaratti). La Ferrara acknowledges financial support from the Invernizzi Foundation. Carlana acknowledges RAship support from the Malcolm Wiener Center for Social Policy at Harvard Kennedy School. AEA RCT Registry ID: AEARCTR-0002148. The project has obtained IRB approval from Bocconi University and Harvard University.
† Harvard Kennedy School, CEPR, and IZA (e-mail: michela [email protected]).
‡ Department of Economics, IGIER and LEAP, Bocconi University, and CEPR (e-mail:
1 Introduction
In response to the COVID-19 outbreak, schools have closed in over 190 countries (UNESCO, 2020). School closures have created massive learning losses for children (Grewenig et al., 2020; Psacharopoulos et al., 2020), estimated at up to 0.3 standard deviations in achievement test scores (Maldonado and De Witte, 2020) and 0.9 years of schooling for seven months of shuttered school buildings (Azevedo et al., 2020). The pandemic has also had adverse psychological and social effects on children and adolescents, leading to higher rates of depression and weaker development of socio-emotional skills (Orgilés et al., 2020; Golberstein et al., 2020). The combination of these effects risks having long-term consequences for the human capital of the cohorts affected by school closures.
While many countries have tried to mitigate learning losses by switching to remote instruction and using asynchronous or synchronous platforms, the implementation of these tools has varied substantially. Even within the same country, schools in wealthier areas have shown a higher prevalence of synchronous learning and greater online participation of students (Malkus, 2020; Chetty et al., 2020). High-income students also have access to better homeschooling inputs, including technology, help from parents (Agostinelli et al., 2020), and online learning resources (Engzell et al., 2020; Bacher-Hicks et al., 2020; Doyle, 2020), which exacerbates educational inequalities.
This paper reports the results of a novel policy experiment launched in Italy, the first country severely affected by the COVID-19 pandemic after China. In 2020, Italian schools were closed from the beginning of March until the summer, i.e., for more than one third of the entire school year. In response, our research team designed and implemented an innovative online tutoring program: TOP (“Tutoring Online Program”). The program targeted middle school students (grades 6 to 8) from disadvantaged backgrounds in terms of socioeconomic status, linguistic barriers, or learning difficulties, who were identified by school principals among those lagging behind during distance learning. The program was offered to middle schools from all over Italy on a voluntary basis, and it was completely free of charge.
TOP has two defining features. First, tutoring is entirely online. While tutoring has shown promising results when done in person by teachers and paraprofessionals (Nickow et al., 2020), this mode of delivery was impossible during lockdown. All interaction in our program occurs through personal computers, tablets or smartphones. Second, the tutors in TOP are not trained professionals but volunteer university students, trained and supported by pedagogical experts. While teachers and professionals are certainly qualified, the skills required for tutoring differ from those required for classroom interaction (Cook et al., 2015). The choice of volunteer tutors has advantages in terms of budget (mobilizing resources to hire professionals may not allow for a rapid response and large-scale implementation), and possibly also in terms of the quality of interpersonal interaction, as TOP leverages the intrinsic motivation of university students who choose to volunteer.
Four weeks after the Italian government announced the school shutdown, we emailed the principals of all Italian middle schools to introduce the program and ask for a list of potential beneficiary students. At the same time, we emailed all students enrolled in three large universities in Milan, the second largest city in the country, offering them the possibility to volunteer for a minimum of three hours per week until the end of the school year. The response was extraordinary. Two weeks later, online tutoring activities started.
We received a total of 1,059 ‘valid’ applications from 76 different middle schools from all over Italy.1 For each student, the school indicated which subjects they needed help with, among math, Italian and English; 81 percent of the students needed help in more than one subject. We randomly assigned a tutor to 530 of the 1,059 applicants, stratifying on ten ‘blocks’ based on the timing of the valid application. Due to budgetary and administrative constraints, 530 was the maximum number of tutors to whom we could offer training and pedagogical support. Indeed, to equip tutors with a basic set of pedagogical skills and to help with potential problems in the relationship with children, we worked with a team of education experts to design an online self-training module for tutors and to hold regular group meetings and on-demand one-to-one sessions with expert educators (see Section 2.2.3).
We collected baseline data from students, parents and tutors before the start of the tutoring (first half of April), and follow-up data from students, parents, tutors and teachers at the end of the school year (June). Thanks to the over-subscription and random allocation of tutors to students, we can estimate the causal impact of the program on four sets of outcomes: academic performance, aspirations, socio-emotional skills, and psychological well-being.
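Because tutors were allocated at random, a comparison of mean outcomes between treated and control students identifies the causal effect of the program, and effects are expressed in standard deviations (SD). The sketch below illustrates what an effect "in SD" means under one common convention, which is an assumption here rather than a detail stated in the text: the treated-minus-control difference in means divided by the control group's standard deviation. It deliberately omits the block fixed effects and baseline controls a full regression would include, and the scores are made up.

```python
from statistics import mean, stdev

def standardized_effect(treated, control):
    """Treated-minus-control difference in means, in control-group SD units.

    Simplified sketch: ignores block fixed effects and baseline controls.
    """
    return (mean(treated) - mean(control)) / stdev(control)

# Hypothetical endline test scores (illustrative, not study data)
control_scores = [50, 55, 60, 65, 70]
treated_scores = [52, 57, 62, 67, 72]
effect = standardized_effect(treated_scores, control_scores)
print(round(effect, 3))  # → 0.253
```

A 2-point raw gap thus corresponds to roughly a quarter of a standard deviation when control scores have an SD of about 7.9 points.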
We find sizeable and significant improvements for students who were assigned an online tutor compared to those who were not. Time devoted to homework and attendance at regular online classes increased, as reported by students as well as by teachers. Performance in a standardized test that we administered at endline, covering math, Italian and English, improved by 0.26 standard deviations (SD). The effects are particularly strong for math, the subject on which the majority of tutoring sessions focused. Teachers’ assessments of learning also improved for treated students compared to control ones, by 0.18 SD. These are remarkable effects given that the median length of the online tutoring was around 5 weeks.
1 An application was considered ‘valid’ when the parent had given informed consent and the child had given assent, and when both parent and child had completed their own (online) baseline questionnaire.
Our second category of outcomes relates to educational aspirations, in particular the type of high school track the student plans to enroll in, and the likelihood of and perceived ability to attend university. We polled students, parents and teachers about these. The resulting index of educational aspirations shows a 0.15 SD increase for students in the program compared to the control group.
We also measured students’ perseverance, grit and locus of control and we find that
TOP increased the value of a composite index capturing these dimensions by 0.14 SD.
The effect is driven by increases in treated students’ perception that they can control
what happens in their lives (locus of control).
Our fourth set of outcomes includes measures of psychological well-being. At the end of the tutoring, treated students were happier and less depressed, as reported by themselves and by their parents. The effect corresponds to a 0.17 SD improvement in a composite psychological well-being index.
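The text does not spell out how the composite indices are built. A common approach in program evaluation, assumed here, standardizes each component against the control group, averages the resulting z-scores, and re-standardizes that average so the control group has mean 0 and SD 1; the treatment effect on the index is then directly in SD units. Components must first be oriented so that higher values mean better outcomes (for example, depression enters with a minus sign). The helper names and the survey answers are hypothetical.

```python
from statistics import mean, stdev

def standardize(values, is_control):
    """Rescale values so that the control group has mean 0 and SD 1."""
    ctrl = [v for v, c in zip(values, is_control) if c]
    mu, sd = mean(ctrl), stdev(ctrl)
    return [(v - mu) / sd for v in values]

def composite_index(components, is_control):
    """Equally weighted average of control-standardized components, re-standardized.

    components: dict of component name -> per-student values, oriented so that
    higher values mean better outcomes (e.g., reverse-code depression).
    """
    z = [standardize(vals, is_control) for vals in components.values()]
    # Average each student's z-scores across components
    raw = [mean(col[i] for col in z) for i in range(len(is_control))]
    return standardize(raw, is_control)

# Hypothetical survey answers for six students (not study data)
is_control = [True, True, True, False, False, False]
components = {
    "happiness": [3, 4, 5, 4, 5, 6],
    "minus_depression": [-4, -3, -2, -3, -2, -1],  # reverse-coded
}
index = composite_index(components, is_control)
print(index)  # → [-1.0, 0.0, 1.0, 0.0, 1.0, 2.0]
```

Here the treated students' mean index is 1, i.e. one control-group SD above the control mean.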
We examine treatment effect heterogeneity along several dimensions. The first is the intensity of treatment. While most students received 3 hours of online tutoring per week, a random subset of the students who needed help in more than one subject (143 of the 427 treated students in this category) were assigned a tutor who had offered 6 hours per week. We find that performance gains double with the hours of tutoring, whereas the other outcomes do not.
Another dimension of heterogeneity is access to technology. We know from the tutor endline survey that around 20 percent of the students connected using a smartphone, as opposed to a PC or tablet. While one may be concerned that this would diminish the effectiveness of tutoring, and that such a decrease would disproportionately affect children from lower socioeconomic status, we find that it did not. On the other hand, technical problems (e.g., problems with the internet connection during the tutoring) seem to qualitatively decrease the impact of the program, although the effect is imprecisely estimated. These aspects should be taken into account when applying our online tutoring model to lower-income or more remote settings.
In terms of demographics and socioeconomic background, we do not detect significant differences in impact between boys and girls, or between immigrants and natives, except for the effect on psychological well-being, which is entirely driven by immigrants. Improvements in learning outcomes are instead larger for students whose parents have less than a college education, have a blue-collar job and do not work from home. Interestingly, tutor characteristics such as gender, GPA, degree program and pro-social attitudes do not systematically affect the effectiveness of the tutoring.
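Subgroup differences like these are usually estimated by interacting the treatment indicator with a baseline characteristic. As a minimal, unadjusted sketch (no controls or block fixed effects, unlike the paper's regressions), the difference between treatment effects computed separately within two subgroups equals the interaction coefficient in a saturated regression of the outcome on treatment, the subgroup dummy, and their product. The function name and the scores are hypothetical.

```python
from statistics import mean

def effect_by_subgroup(y, treated, group):
    """Treated-minus-control difference in mean outcomes within each subgroup."""
    effects = {}
    for g in sorted(set(group)):
        t = [yi for yi, ti, gi in zip(y, treated, group) if gi == g and ti]
        c = [yi for yi, ti, gi in zip(y, treated, group) if gi == g and not ti]
        effects[g] = mean(t) - mean(c)
    return effects

# Hypothetical test scores for eight students (not study data)
scores = [60, 62, 61, 63, 50, 52, 54, 56]
treated = [False, False, True, True, False, False, True, True]
parent_ed = ["college"] * 4 + ["no_college"] * 4

effects = effect_by_subgroup(scores, treated, parent_ed)
# Equals the treatment x subgroup interaction in a saturated regression without controls
gap = effects["no_college"] - effects["college"]
print(effects, gap)
```

In this toy example the effect is 1 point for children of college-educated parents and 4 points for the others, so the interaction term is 3.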
Finally, we can estimate how the experience of being a TOP tutor during the pandemic
affected the tutors themselves. We can do so because we randomly selected the university
students to whom we offered the job from the pool of those who applied to be volunteers.2
Four months after the end of the program, we find that volunteers who were included in
the TOP program have significantly higher empathy than those who were not. The effect
corresponds to a 0.27 SD increase. We instead do not find significant effects on tutors’
beliefs regarding the relative role of luck versus hard work in determining success in life.
Our paper contributes to several strands of literature. A robust body of empirical work shows that in-person tutoring is highly effective for improving academic outcomes. Recent meta-analyses find that the impacts are sizeable (a pooled effect size of 0.37 SD in Nickow et al. (2020)) and robust across a wide array of contextual factors (Fryer Jr, 2017). The importance of small-group or individual tutoring has been underlined for students who struggle (Ander et al., 2016) and for teaching at the right level (Banerjee et al., 2015). Also, the tutor-student relationship is often close to a mentorship connection that may affect the development of cognitive as well as social skills, such as prosociality (Kosse et al., 2020). On the other hand, tutoring is much costlier than classroom instruction, and it may not be easy to arrange individual, in-person tutoring in the presence of geographical constraints. Tutoring may also carry the stigma of being identified as a student in need and pulled out of regular classes (Coie and Krehbiel, 1984; Richmond, 2015). We contribute to this literature by providing evidence on large-scale online tutoring, based on volunteer tutors supported and trained by pedagogical experts. Our model makes it possible to substantially reduce the cost of tutoring, one of the biggest barriers to large-scale implementation, and to efficiently reach students located in disadvantaged areas through virtual learning. Finally, online tutoring is less visible to peers than in-person tutoring, which may reduce the sense of stigma possibly attached to this
Our results are of course directly relevant to the debate on effective strategies to mitigate the effect of COVID-19 on education. The existing evidence suggests that the students who lag behind the most during the pandemic are from low-income families with limited access to technology, and that they receive less support from parents and lower-quality remote learning from schools (Bacher-Hicks et al., 2020; Chetty et al., 2020). Different forms of remote instruction have been adopted around the world and, although the evidence on interventions increasing access to computers and internet is mixed (Escueta et al., 2017; Malamud and Pop-Eleches, 2011), the impact of digital technology may differ during school closures compared to normal school years.
2 As mentioned above, we could not accept all tutor applicants because we were constrained in the number of hours of support for tutors that we could pay for. The randomization was conditional on the characteristics that we use to allocate tutors to students, notably subject of tutoring, number of hours per week, and previous tutoring experience and training.
To the best of our knowledge, very few policy experiments have attempted to use remote
tools to improve learning outcomes during the pandemic. Angrist et al. (2020) evaluate
two low-tech interventions in Botswana that use SMS text messages and direct phone
calls to support parents in the education of their children. The combined intervention
resulted in a 0.12 SD improvement in student outcomes and led parents to update their
beliefs about their children’s learning level. Hardt et al. (2020) evaluate a remote peer
mentoring intervention at a German university during the pandemic, where peers met
online to discuss self-organization. They find positive effects on motivation and exam
registration, though not on earned credits. Our work contributes to this literature by evaluating the effects of an innovative and low-cost online tutoring program targeting teenage students who had been adversely affected by school closures, and by showing impacts on learning outcomes as well as on soft skills and psychological well-being.
Finally, recent work on organizations highlights the power of intrinsic motivation and
social recognition for improving public service delivery (e.g., Ashraf et al., 2014; Gauri
et al., 2019). In particular, Levitt et al. (2016) underline that such behavioral aspects
can be leveraged to improve educational performance. While we cannot speak directly to this question, as we did not vary the recruitment method or the incentives provided to tutors, the fact that our tutors self-selected into volunteering for the TOP program, and their intrinsic motivation, may have contributed to the effectiveness of our intervention. Furthermore, we do provide evidence that volunteering as a tutor increased empathy compared to university students who applied but were not assigned to a student.
2 Intervention and Study Design
2.1 Institutional Background
Italy was the first country after China to be hit hard by the COVID-19 pandemic, with around 80,000 deaths as of January 2021, one third of which were concentrated in the region of Lombardy. All school buildings closed on March 5th, 2020. School reopening was then repeatedly postponed until September 2020 and, even during fall 2020, many schools had to offer remote instruction, depending on regional
The key components for effective remote school learning are the availability of infrastructure and of trained teachers with technological skills. Regarding infrastructure, on March 26th the Italian Ministry of Education allocated 70 million euro to buy tablets that students could temporarily borrow and 10 million euro to improve the internet connections and online platforms of schools.3 This intervention facilitated access to devices and internet for disadvantaged students. However, due to bureaucratic delays, not all students in need were offered a device.
As of March 2020, teachers’ digital competences were still somewhat limited, with fewer than half of teachers using any digital tool in their daily lectures (Agcom, 2019).4 When schools closed due to COVID-19, the response from teachers was extremely heterogeneous: for the first few weeks, many students received only instructional packets with homework. Training courses to improve teachers’ technological knowledge were organized starting in spring 2020 by the regional offices of the Ministry of Education, by private foundations, and by web platforms such as Google Classroom and WeSchool. Based on data we collected on 427 teachers in our 76 sample schools, by June more than 96 percent of the teachers were providing synchronous online classes. Most teacher-student interaction was synchronous, with the entire class of around 22 students. Around 85 percent of teachers provided some asynchronous videos, usually no more than one hour per week. Almost all teachers assigned some homework every week.
2.2 The Tutoring Online Program (TOP)
2.2.1 Timeline
Two weeks after the school closure in Italy, we started the process to design and implement a new program, the “Tutoring Online Program” (henceforth, TOP), as an attempt to provide an immediate response to the emergency. We identified a team of pedagogical experts who could help us develop the curriculum for tutor training and support, and we contacted the rectors of three large universities in Milan, asking for permission to advertise our program among their students. We obtained IRB clearance and, between March 30 and April 3, we sent out email invitations to university students and to the principals of all Italian middle schools. Tutoring activities started on April 14 and lasted until the beginning of June. Appendix Figure A.1 shows the timeline of the project, including the two rounds of data collection. The implementation was entirely supported by the research team with the help of student volunteers and research assistants. In what follows we describe the recruitment process and the key features of the program.
3 The same decree also allocated 5 million euro for the digital training of teachers (Ministerial Decree n. 187, 26 March 2020).
4 The digital transition in Italian schools was promoted by Italian Law 107/2015 (the so-called “La Buona Scuola”). The first step toward digitalization was a tool called the ‘electronic class register’, created to ease communication between teachers and parents. The register includes grades, absences, and other messages. By the end of the 2015-16 school year, more than 90 percent of middle schools were using the electronic class register (Agcom, 2019).
2.2.2 Recruiting schools and students
We sent a recruitment email to all Italian middle schools (grades 6 to 8), using publicly
available email addresses. We informed school principals about the support we could
provide with TOP, presenting it as “a free online individual tutoring service to students
currently struggling” during the school closure. We explained that tutoring would be done
by volunteer university students and that it would be for 3 to 6 hours per week. In order to
participate in the program, each school principal had to complete a brief baseline survey
expressing their interest in the project. Within a few days, more than 100 schools completed this first step.5
Second, school principals, possibly with the help of teachers, had to complete an application form with a list of students including up to three pupils for each class. We asked them to select the students who “may need TOP the most in terms of their learning level and family environment”. For each child, the form had to indicate the preferred subjects of the tutoring (one or more among math, Italian, and English) and the contact details of the relevant teachers. The school was in charge of contacting parents and asking for their authorization to share with us the name and surname of the child and the contact information (email and phone) of one of the parents. We asked schools to make sure the selected students had an internet connection and a computer or tablet.6 We clarified that we could not guarantee tutoring to all applicants and that, if the number of requests exceeded the number of tutors that we could mobilize and support, we would randomly assign tutors to students, in order to give every applicant the same chance.
We received in total 1,594 names of students from 78 schools: 57 percent of these students were identified as needing help in all three subjects, 25 percent in two subjects, and 18 percent in one.7 The research team contacted all parents to collect informed consent for the project and baseline surveys from parents and students. We sent the survey using email and text messages.8
5 We also received some support from a few regional offices of the Italian Ministry of Education, which helped spread the information on the project. However, the enrollment of schools was almost complete when we received this additional support.
6 As clarified in Section 2.1, the Ministry provided each Italian school with resources to buy devices for students in need. Despite that, 36 parents of students selected by the schools revealed that they had no internet connection or device, and they were excluded from our experiment. On top of that, around 20 percent of students used only the phone for the tutoring, as we discuss below.
7 According to the teachers, almost 90 percent of students needed support in math, 78 percent in Italian and 72 percent in English.
Our final study sample comprises the 1,059 students from 76 schools who completed the baseline survey and whose parents approved the informed consent and completed the baseline survey themselves by the end of the enrollment period (i.e., April 25th). Appendix Figure A.2 reports the geographical distribution, with the number of students and schools for each region. To assess the representativeness of our self-selected sample of schools, in Appendix Table A.I we compare the provinces with and without schools that took part in the TOP program. We focus on the province level because of data availability. While the breakdown of the population by education level is quite comparable (the differences are statistically significant but extremely small), the provinces with schools in TOP tend to have a slightly higher immigrant share and a lower unemployment rate. These differences are fully explained by the regional divide, with 65 percent of provinces in the North having at least one school included in the program, compared to 15 percent in the South and Islands, as shown in Appendix Figure A.2 (the Center is equally represented). The regional imbalance is not surprising, given that the North was by far the part of the country hardest hit by COVID-19. It makes sense that schools in areas hit hard by the pandemic early on were more likely to apply, most likely because they foresaw that schools would not reopen until the end of the school year.
2.2.3 Recruiting and training tutors
Thanks to the collaboration with the rectors of three large Italian universities in Milan, we sent a message to all students enrolled in undergraduate and graduate programs.9 The message explained that a team of researchers was launching an online tutoring program and that we were recruiting “volunteers interested in helping middle school students who were struggling to keep up with their classes and with their homework”. We required volunteers to be currently enrolled in university and fluent in Italian. Applicant tutors had to complete a baseline survey, indicating among other things the subjects in which they would feel comfortable tutoring and their availability for either 3 or 6 hours per week. The number of applications from volunteers reached 2,000 by the end of the enrollment period, far exceeding our expectations.
8 If the parents did not respond within a few days, the research team followed up with a phone call to check that they had received the information, and eventually sent the consent form and baseline surveys to a new contact provided by the family.
9 The three universities were Bicocca, Bocconi and Statale, which enroll approximately 33,000, 14,000 and 61,000 students, respectively.
As our volunteers were not trained professionals, we hired a team of pedagogical experts
to train and support the tutors.10 Within a few weeks, they set up an online learning
platform with a self-training program that included slides and videos. The topics included:
how to approach students; tools and online platforms for effective online tutoring; learning
disorders; and tips to help students in math, Italian and English. The platform also
included a supervised forum where tutors could ask questions and share their experiences.
Finally, the pedagogical team organized regular group meetings with around 20 tutors, as
well as one-on-one meetings on demand to offer support in specific circumstances. Based
on the information reported by our tutors at endline, around 80 percent of them used
the training platform, 50 percent watched the videos and followed the online training, 8
percent used the forum, and 36 percent and 12 percent joined at least one small group or
individual meeting, respectively.
The tutor training was an important component of TOP and it ensured that our volun-
teers, even without professional training, could offer a high quality service to their tutees
and could receive professional advice and support in case of need. However, this was the most expensive part of TOP (see Section 7 for cost estimates) and, given our budget constraints, it limited the number of tutors that could be trained and supported to 530.
2.3 Experimental Design
2.3.1 Randomization
We randomized the allocation of the 1,059 students in our sample into two groups: a treatment group that received tutoring (530 students) and a control group that did not (529 students). To allow students to start as soon as possible, we processed applications on a rolling basis by creating ‘blocks’ of around 100 student applicants. We stratified the randomization at the block level, where blocks were created depending on the timing of baseline completion.11 Appendix Table A.II shows that the treatment and control groups do not differ according to baseline characteristics collected from students and parents, including gender, immigration status, learning disorders, grade, interest in the subjects taught, and parental education and occupation.
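A minimal sketch of block-stratified randomization as described above: applicants are grouped into blocks by the timing of baseline completion, and within each block half are assigned to treatment at random. This is an illustration under stated assumptions, not the authors' code: it simply shuffles within each block (it does not reproduce the within-block ordering by school ID mentioned in footnote 11), it uses hypothetical equal-sized blocks, and it gives the extra student in an odd-sized block to treatment.

```python
import random
from collections import defaultdict

def assign_within_blocks(students, seed=2020):
    """Randomly assign half of each block to treatment.

    students: list of (student_id, block_id) pairs.
    Returns a dict mapping student_id -> True (treatment) or False (control).
    In an odd-sized block the extra student goes to treatment (an assumption).
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    blocks = defaultdict(list)
    for student_id, block_id in students:
        blocks[block_id].append(student_id)
    assignment = {}
    for block_id in sorted(blocks):
        ids = sorted(blocks[block_id])  # fix the order before shuffling
        rng.shuffle(ids)
        n_treat = (len(ids) + 1) // 2  # half of the block, rounding up
        for position, student_id in enumerate(ids):
            assignment[student_id] = position < n_treat
    return assignment

# Hypothetical: 1,059 applicants split by arrival order into 10 blocks
applicants = [(i, i // 106) for i in range(1059)]
assignment = assign_within_blocks(applicants)
n_treated = sum(assignment.values())
print(n_treated, len(assignment) - n_treated)  # → 530 529
```

Stratifying by block guarantees balance in arrival timing between arms by construction, which matters here because early and late applicants may differ systematically.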
10 The team was led by Prof. Giulia Pastori and Prof. Andrea Mangiatordi, both at Bicocca University, and included six other members with teaching and pedagogical expertise.
11 After reaching around 100 completed applications including parental consent, baseline survey of parents and students, we created a ‘block’ and randomly assigned 50 percent of students to the treatment and 50 percent to the control group. Within each block, we ordered the observations by school ID and
Of the 1,059 students included in the experimental design, 712 completed the endline test. Attrition rates differed between the treatment and control groups, which is not surprising given that students who received a tutor remained engaged with the program until June, while control students had to be contacted after not receiving a tutor. As shown in Appendix Table A.III, on average 67 percent of students completed the endline test: 46 percent of the control group and 88 percent of the treatment group. Children with college-educated fathers or with greater familiarity with computers are more likely to complete the endline test, with the effect driven mainly by students in the control group (column 4, Appendix Table A.III). Compared to students in grade 8, students in grade 6 in the control group were more likely to complete the endline, possibly because grade 8 students were taking the final middle school exam during the endline period, and control students may have been relatively less motivated to devote time to the survey.
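The overall completion rate is the size-weighted average of the arm-specific rates, which gives a quick consistency check on the rounded figures above. The sketch uses the arm sizes from the randomization (529 control, 530 treated) and the rounded rates reported in the text.

```python
# Reported completion rates by arm (rounded in the text) and arm sizes
n_control, n_treated = 529, 530
rate_control, rate_treated = 0.46, 0.88

# The overall rate is the size-weighted average of the two arm-specific rates
overall = (rate_control * n_control + rate_treated * n_treated) / (n_control + n_treated)
print(f"{overall:.3f}")  # → 0.670

# Implied number of completers versus the 712 reported (gap reflects rounding)
implied = rate_control * n_control + rate_treated * n_treated
print(round(implied))  # → 710
```

The 42-percentage-point gap between arms is the differential attrition that motivates the robustness checks.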
[Insert Table I]
Table I reports the balance table restricting the sample to students who completed the final survey. Overall, most characteristics are balanced between the treatment and control groups. If anything, compared to the full sample shown in Appendix Table A.II, the control group is marginally positively selected in terms of parental education (as highlighted above). Given the direction of this imbalance in response rates, one may expect an underestimate of the treatment effect. Nonetheless, we present several robustness checks, including inverse probability-weighted estimates of treatment effects and the inclusion of different sets of controls (Appendix Table A.XII).
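The text does not detail the inverse probability weighting specification. The sketch below assumes the simplest version: the probability of completing the endline is estimated as the completion share within cells defined by treatment arm and one baseline characteristic (a stand-in for a logit or probit on baseline covariates), and each completer is weighted by the inverse of that probability. In the made-up example, control students without college-educated parents rarely complete the test, so the unweighted comparison understates the effect, which is the direction of bias anticipated above.

```python
from collections import Counter
from statistics import mean

def ipw_effect(records):
    """Inverse-probability-weighted treated-minus-control difference among completers.

    records: dicts with 'treated' (bool), 'cell' (baseline covariate cell),
    'completed' (bool) and 'y' (outcome, used only when completed).
    The completion probability is estimated as the completion share within each
    (treated, cell) pair, a simple stand-in for a logit/probit model.
    """
    total = Counter((r["treated"], r["cell"]) for r in records)
    done = Counter((r["treated"], r["cell"]) for r in records if r["completed"])
    weighted = {True: [0.0, 0.0], False: [0.0, 0.0]}  # arm -> [sum of w*y, sum of w]
    for r in records:
        if r["completed"]:
            key = (r["treated"], r["cell"])
            w = total[key] / done[key]  # inverse of the estimated completion probability
            weighted[r["treated"]][0] += w * r["y"]
            weighted[r["treated"]][1] += w
    return weighted[True][0] / weighted[True][1] - weighted[False][0] / weighted[False][1]

def rec(treated, cell, completed, y=None):
    return {"treated": treated, "cell": cell, "completed": completed, "y": y}

# Made-up data: control students without college-educated parents rarely complete
records = (
    [rec(False, "college", True, 10)] * 2
    + [rec(False, "no_college", True, 2)]
    + [rec(False, "no_college", False)] * 3
    + [rec(True, "college", True, 12)] * 2
    + [rec(True, "no_college", True, 5)] * 4
)
naive = (mean(r["y"] for r in records if r["completed"] and r["treated"])
         - mean(r["y"] for r in records if r["completed"] and not r["treated"]))
print(round(naive, 3), round(ipw_effect(records), 3))  # → 0.0 2.667
```

Reweighting gives the single low-education control completer a weight of 4, restoring that group's share in the control mean.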
Among the 530 treated students, teachers identified 427 as needing help in more than one subject. We randomly assigned one third of these 427 students to an ‘intense’ version of the program with 6 hours of tutoring per week instead of 3. This allows us to estimate the impact of treatment intensity in Section 5.1. We present the balance table for random assignment to intense tutoring in Appendix Table A.IV.12
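A minimal sketch of the intensity randomization among treated students who need help in more than one subject, assuming a simple random draw of a fixed number of students (143 of 427, the figures reported in the Introduction). In practice the allocation also depended on the availability tutors had declared, which the sketch ignores; the function name and IDs are hypothetical.

```python
import random

def assign_intensity(multi_subject_ids, n_intense, seed=0):
    """Draw n_intense students at random for 6 h/week; the others keep 3 h/week."""
    rng = random.Random(seed)
    ids = sorted(multi_subject_ids)  # rng.sample needs a sequence
    intense = set(rng.sample(ids, n_intense))
    return {sid: 6 if sid in intense else 3 for sid in ids}

# Hypothetical IDs for the 427 treated students needing help in several subjects
hours = assign_intensity(range(427), n_intense=143)
print(sum(h == 6 for h in hours.values()), sum(h == 3 for h in hours.values()))  # → 143 284
```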
2.3.2 Tutor allocation
We assigned tutors to students following a step-by-step procedure. First, we restricted the sample of tutors to those currently enrolled in university and fluent in Italian. Given the high number of volunteers, we decided to further restrict the sample to tutors with previous tutoring experience and/or specific training (e.g., to support students with learning disorders or immigrants).
12 The table shows some imbalances in the education level of the mother, with a higher share of mothers with at least a high-school diploma among students in the 6h treatment vs. 3h treatment group. We control for these baseline characteristics in all regressions.
Second, we divided tutors into different groups depending on their expertise in the
various subjects (math, Italian, English or combinations of these), their time availability
(3 vs. 6 hours per week), and their training (general, specific for immigrants, specific for
students with learning disorders). Within each group, we randomly ordered the tutors.
Third, we randomly assigned treated students to tutors taken from the relevant group,
considering the subjects they needed help with, whether they needed intense tutoring, and
their characteristics (learning disorders and immigration status). Note that only 4 percent
of tutors had specific training on learning disorders, while 32 percent of the students in
our sample have learning disorders. Hence, the great majority of students with learning
disorders were supported by a tutor who had no training other than the support provided
by our pedagogical team. Similarly, only 1 percent of tutors had studied specifically to
work with immigrant children, who constitute 22 percent of our sample.
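The three-step allocation described above can be sketched as follows. The field names (`enrolled`, `fluent_italian`, `experience`, `trained`, `group`, `needs`) are hypothetical. So is the encoding of a group as a hashable key combining subjects, weekly hours, and training type. The sketch also makes two assumptions the text does not specify: students are processed in random order, and a student whose group has run out of tutors stays unmatched.

```python
import random
from collections import defaultdict

def allocate_tutors(tutors, students, seed=0):
    """Return {student_id: tutor_id}, drawing tutors at random within matching groups."""
    rng = random.Random(seed)
    # Step 1: keep enrolled, Italian-fluent tutors with experience and/or training
    eligible = [t for t in tutors
                if t["enrolled"] and t["fluent_italian"] and (t["experience"] or t["trained"])]
    # Step 2: group tutors (subjects, hours, training) and shuffle within each group
    pools = defaultdict(list)
    for t in eligible:
        pools[t["group"]].append(t["id"])
    for ids in pools.values():
        rng.shuffle(ids)
    # Step 3: for each treated student, take the next tutor from the group matching their needs
    order = list(students)
    rng.shuffle(order)
    matches = {}
    for s in order:
        pool = pools.get(s["needs"])  # .get does not create empty groups
        if pool:
            matches[s["id"]] = pool.pop()
    return matches
```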
As expected given the allocation procedure, tutors assigned to students differ from
the overall sample of tutors who applied. Appendix Table A.V reports the differences
in characteristics between assigned tutors and those who applied but were not assigned to a student.
Column 1 of Appendix Table A.VI provides summary statistics from the tutor baseline
survey for the tutors who were assigned to students. Note that 530 students were
assigned to the treatment, but 7 dropped out before starting the tutoring, so we
assigned only 523 tutors. The great majority of tutors are female (70 percent) and were
born in Italy (98 percent). Most were moved by a desire to help others when applying to TOP
(83 percent) and have previous experience as volunteers (83 percent). In terms of degree
program, about 34 percent of the tutors attend a STEM major or medical school, 28
percent an economics/business major, 14 percent a humanities major, and only 7 percent
a major in education.13
Columns 2 and 3 of Appendix Table A.VI show summary statistics separately for tutors
who offered to be available for 3 vs. 6 hours per week. The last two columns
report the p-value for the null hypothesis that the difference is zero and the standardized difference.
Tutors who made themselves available for 6 hours are less likely to come from economics
or business and more likely to come from humanities. They are also more likely to be
born outside Italy and to have training to work with immigrants. Although students are
randomly assigned to a high- vs. low-intensity treatment, we should keep in mind that
they get a ‘package’ of different tutor characteristics when assigned to 6 vs. 3 hours of
tutoring.
13 The relatively high share of tutors from economics and business is due to the fact that one of the
three universities from which we recruited, Bocconi, specializes in those subjects.
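The text does not define the standardized difference reported in the last column. A common convention in the covariate-balance literature divides the difference in means by the square root of the average of the two group variances. The sketch below uses that convention, with sample variances. Treat it as an assumption about the exact formula behind the table.

```python
import math
from statistics import mean, variance

def standardized_difference(x_a, x_b):
    """(mean_a - mean_b) / sqrt((var_a + var_b) / 2), using sample variances."""
    pooled_sd = math.sqrt((variance(x_a) + variance(x_b)) / 2)
    return (mean(x_a) - mean(x_b)) / pooled_sd
```

Unlike a t-statistic, this measure does not grow with the sample size. This makes it a useful complement to the p-value when comparing groups of unequal size, such as the 3-hour and 6-hour tutors.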
2.3.3 Implementation
We matched all tutors with students by April 25, as shown in the timeline (Appendix
Figure A.1). The tutoring lasted from mid-April to the beginning of June 2020.14 In June
2020, we collected the endline surveys.
After each meeting, tutors were required to record some information about the session
using a management tool prepared by the research team. The information included the
day and time of the meeting, whether the student had done the homework assigned,
and whether he/she had exerted effort during the tutoring session. On average, treated
students had 14 tutoring meetings over the course of the program, for a total of 17
hours over 34 days. The distribution of the number of meetings and tutoring days is
presented in Appendix Figure A.3. Less than 5 percent of students chose not to start the
tutoring (and hence have zero meetings). Math was by far the most common subject, covered
by 78 percent of the students. The full distribution of subjects covered in the meetings,
as reported in tutors’ registries, is displayed in Appendix Figure A.4.
3 Data and Empirical Strategy
We build a unique dataset merging the baseline surveys of parents, students, and tutors
with endline data coming from (i) the results of a standardized test administered by us
and taken by the students, and (ii) surveys of parents, students, tutors and teachers of the
classes in which our treated and control students were enrolled. We report the summary
statistics of our main outcomes measured at endline in Appendix Table A.VII.15 The
relevant survey questions are reported in Online Appendix B.
3.1 Student achievement
One of our main outcomes of interest is student learning. In normal years, standardized
test scores are collected in May/June from all Italian students in grade 8 by the Institute
for the Evaluation of the Italian Schooling System (INVALSI). However, due to the
pandemic, these tests were not administered in 2020. In collaboration with two expert
middle school teachers, we designed a (shorter) standardized test very close in format to
the national standardized test. Our test included seven multiple choice questions in math,
seven in Italian, and five in English.
14 Some tutors voluntarily decided to support the students during the summer and in the following
academic year.
15 The variables labeled as ‘outcomes reported by parent’ or ‘outcomes reported by teachers’ do not
refer to parents or teachers themselves, but to the answers that parents/teachers gave about a given child.
The test was administered to treatment and control students by enumerators. The
research team sent each student the link to the test, which required a password
to access. The enumerator called each parent to set a time for the test. During
the test, the student was on a video call with the enumerator, opened the link to
the questionnaire on his/her own device, and entered the password given in real time by
the enumerator. At that point, the test could start. Enumerators were clearly instructed
not to help children during the test. Once the student had completed and submitted the
test online, the enumerators were available to discuss any doubts and answer potential questions.
By design, TOP tutors did not follow a specific curriculum during the program. Instead,
they helped students with the homework assigned by school teachers.
For this reason, the test we administered covered the basic achievement expected from
students in each grade. On average, treated and control students correctly answered 56
percent of the questions (Appendix Table A.VII, line 1): 67 percent in math,
48 percent in Italian, and 50 percent in English. The assessment covered a wide range of
competencies, and very few students reached the ceiling of all correct answers.
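As a consistency check on these figures, we can weight each subject's share of correct answers by its number of questions (7 + 7 + 5 = 19). This approximately reproduces the overall 56 percent. The subject shares are the rounded values from the text, so the result is only approximate.

```python
# Questions per subject and average share correct, as reported in the text (rounded).
questions = {"math": 7, "italian": 7, "english": 5}
share_correct = {"math": 0.67, "italian": 0.48, "english": 0.50}

def overall_share(questions, share_correct):
    """Share of all questions answered correctly, weighting subjects by question count."""
    total = sum(questions.values())
    return sum(questions[s] * share_correct[s] for s in questions) / total

overall = overall_share(questions, share_correct)  # 10.55 / 19, about 0.555
```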
3.2 Student, parent, and teacher surveys
We asked students, parents, and teachers to complete a questionnaire that we sent by email
and/or SMS. The questions covered four main sets of outcomes: academic achievement
and beliefs, educational aspirations, socio-emotional skills, and psychological well-being.
Teachers were asked to answer the same questions for each child in their class who was
either treated or control in TOP. The main outcomes in our empirical analysis are
indexes built by extracting the first principal component from the variables in each category,
standardized to have mean zero and standard deviation one in the control group.
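A minimal sketch of such an index in plain Python follows, under three assumptions. First, each variable is z-scored over the full sample before extracting the component. Second, the leading eigenvector of the correlation matrix is found by power iteration. Third, its sign is fixed so that loadings are positive on balance. The paper does not state these implementation details, and the function name is hypothetical.

```python
import math
from statistics import mean, pstdev

def first_pc_index(rows, is_control, n_iter=500):
    """First-principal-component index, standardized to mean 0 and SD 1 in the control group.

    rows: one list of k outcome variables per student; is_control: parallel booleans.
    """
    k = len(rows[0])
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    z = [[(row[j] - mus[j]) / sds[j] for j in range(k)] for row in rows]
    n = len(z)
    # correlation matrix of the standardized variables
    corr = [[sum(r[a] * r[b] for r in z) / n for b in range(k)] for a in range(k)]
    # power iteration for the leading eigenvector (loadings of the first component)
    v = [1.0] * k
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:  # the sign of a principal component is arbitrary
        v = [-x for x in v]
    scores = [sum(r[j] * v[j] for j in range(k)) for r in z]
    # rescale so that the control group has mean 0 and standard deviation 1
    ctrl = [s for s, c in zip(scores, is_control) if c]
    mu_c, sd_c = mean(ctrl), pstdev(ctrl)
    return [(s - mu_c) / sd_c for s in scores]
```

Standardizing on the control group means a treatment coefficient on the index reads directly in control-group standard deviations.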
Academic outcomes and beliefs. We asked children and parents about their beliefs on the
number of questions the child answered correctly in each subject of the test described in Section 3.1.16
Notably, 64 percent of the students and 71 percent of the parents are
‘overconfident’, in the sense that they expect a higher number of correct answers than the
student actually obtained in the test. Teachers’ beliefs tend to be closer to actual performance:
teachers expect their students to answer 49 percent of the questions correctly on average,
and only 36 percent of them are overconfident about children’s performance.17
16 Children were asked about this at the end of the test. Parents were asked this question in their
endline survey, which typically took place after the child had taken the test (neither the test nor the child’s
answers were shared with the parent). We expect an impact of TOP not only on academic outcomes, but
also on the beliefs and expectations of parents and teachers (Rosenthal, 1973). Overall, the data show
that students and their parents are overconfident about their performance, with an average expected share
of correct answers equal to 67 percent (for students) and 71 percent (for parents), against an actual share
of correct answers of only 56 percent in the test.
To obtain a measure of achievement different from the standardized test score, we asked
each teacher to assign a grade from 1 to 10 to every child in our study (treated or control)
who was in one of their classes. The average grade was 5.65, that is, just below the
pass grade of 6 in the Italian school context. This is consistent with the target of our
intervention being children who were struggling to keep up with school work. We also
asked children how they would rate their own school performance on a 1 to 10 scale, and
the average was 6.29, somewhat more optimistic than the teachers but definitely not high.
Aspirations. Low goals and ambitions may lead students into an “aspiration trap”
(Genicot and Ray, 2017; La Ferrara, 2019). Children from disadvantaged backgrounds
may underinvest in their education, dropping out of school or choosing easier and less
profitable high school tracks (Carlana et al., 2021). We hypothesized that TOP may
have a direct effect on students’ aspirations, by providing an alternative role model (the
tutor) that may induce them to revise their goals. In our survey, we collected information
from students on their long-term educational goals (e.g., attending university) and on their
short-term plans (e.g., the type of high school they wanted to enroll in).18 Among the
students in our sample, only 15 percent are interested in a top-tier academic high school,
while around one third are planning to attend a vocational high school.19 As for long-term
goals, 39 percent of the students at endline tell us that they are considering university
education, and the figure is similar for parents (35 percent). The share is much
lower when we ask teachers up to what level the student should continue studying: only
14 percent say ‘university’. Finally, we also collected a measure of self-efficacy (Bandura
et al., 1999), asking students (and parents) whether, regardless of what they would like to do
in the future, they think they (their children) would be capable of successfully attending
university if they wanted to.
17 We tried to interview the math, Italian and English teachers for each child. For cases where one of
the teachers did not reply, we calculate the beliefs (and the teacher-assigned grade) as the average over
the subjects for which data are available.
18 In Italy, after grade 8, students need to choose their high school track. The schooling system is
organized in top-tier academic tracks (scientific and classical lyceum), other academic tracks (linguistic,
pedagogical, and other types of lyceum), technical tracks (with technological or economic focus, e.g.,
accounting), and vocational tracks.
19 On average, in Italy 32 percent of students are enrolled in a top-tier track and 14 percent in a
vocational track. As expected from the targeting, at baseline, the sample of students who applied to
TOP tends to include more low-achieving and low-aspiring students.
Socio-emotional skills. Social distancing and school closure can result in a lack of
opportunities to develop not only cognitive, but also socio-emotional skills in the classroom
(Alan et al., 2019). In our endline survey we collected several outcomes to capture socio-
emotional skills. First, in order to measure perseverance, we asked students to answer a
logic question. At the end of the question, we asked them whether they wanted to answer
a new question with the same level of difficulty, with a higher level of difficulty or whether
they wanted to give up. We use their choice as an outcome measure of perseverance in
a real effort task. Second, we measure ‘grit’ following the Short Grit Scale developed by
Duckworth and Quinn (2009). Starting from 8 questions on a 5-point scale, we add up all
the points and divide by 40. The maximum score on this scale is 1 (extremely gritty), and
the lowest is 0 (not at all gritty). We asked the same questions to children and parents,
finding a high correlation between their answers (0.64). Third, we collected a measure of
‘locus of control’ to capture the extent to which students believe they can control the
outcome of events in their lives, as opposed to believing that fate and luck determine it
(Rotter, 1966). To calculate the final score, we start from 4 questions on a 5-point scale,
add up all the points and divide by 20. For this outcome too, the maximum score is 1
(high locus of control) and the lowest is 0 (low locus of control).
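Several scores in this section and in Section 3.3 follow the same rule: sum the item points and divide by the maximum attainable sum. The divisors are 8 × 5 = 40 for grit, 4 × 5 = 20 for locus of control, 9 × 4 = 36 for the ChilD-S, and 2 × 4 = 8 for tutor empathy. The sketch below implements that rule. The text does not state how items are coded. If they run from 1 to 5, the lowest attainable grit score under this rule is 8/40 = 0.2 rather than 0. The min-max rescaling shown as a second function maps the range exactly onto [0, 1], but it is an alternative, not the paper's described method. Reverse-coded items, if any, would need flipping first, and that step is omitted here.

```python
def scale_score(points, points_per_item):
    """Sum of item points divided by the maximum attainable sum (rule described in the text)."""
    return sum(points) / (len(points) * points_per_item)

def rescaled_score(points, low, high):
    """Alternative min-max version: lowest possible sum maps to 0, highest to 1."""
    n = len(points)
    return (sum(points) - n * low) / (n * (high - low))
```

For example, a student answering 1 on all eight grit items scores 0.2 under the first rule and 0.0 under the second. Both rules give 1.0 when all items are answered 5.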
Well-being. Last but not least, we want to understand whether interacting with the
tutor helped students feel less isolated and happier, possibly overcoming depression.
For this purpose, we collected two measures of psychological well-being from
students and their parents. The first is the Children’s Depression Screener (ChilD-S)
developed by Frühe et al. (2012), which is calculated by aggregating a battery of 9 questions.20
The answers are given on a 4-point Likert scale; we add up all the points and divide by
36. For this outcome too, the maximum score is 1 (high level of depression), and the
minimum is 0 (no depression). The second measure is a proxy for happiness: we asked
whether students were feeling happy or unhappy during the lockdown, on a scale from 1 to
10 (10 being the maximum happiness). The correlation between the depression measure
reported by parents and the one reported by students is 0.56, while for happiness it is
3.3 Tutor Survey
On top of the baseline information, we asked all the volunteers who had applied to be
tutors to complete a very short endline survey in September 2020, six months after the
start of the program. Almost all the tutors who were recruited into TOP completed the
endline survey, while only around one third of those who were not assigned a student did
so. Appendix Table A.VIII shows the differences in observable characteristics between the
tutors who participated in TOP (‘treated’ tutors) and the others (‘control’ tutors). Once
we account for the criteria used to assign students to tutors (e.g., tutors from STEM
are over-represented in treatment because math was the subject most in demand by the
students), very few significant differences appear. This allows us to investigate how
participation in TOP affected some outcomes measured at the tutor level.
20 For a detailed list, see Online Appendix B.
The first outcome is empathy. We collected two standard questions on a 4-point Likert
scale, asking respondents if they (i) “find it easy to put themselves in somebody else’s
shoes”; and (ii) “are able to make decisions without being influenced by people’s feelings”.
We sum all points and divide by 8 to obtain a variable ranging from 0 to 1.
The second set of outcomes concerns views on the role of hard work and effort to achieve
success in life. The index we build aggregates answers to three separate questions on (i)
income differences and effort; (ii) the importance of hard work versus luck and connections;
and (iii) the prospects of getting a well-paid job after studying hard, independent of family
background. We aggregate the variables in a similar way as described above.
Finally, in addition to the short endline, tutors recruited into TOP also provided
further information on their tutoring experience (e.g., how satisfied they
were).21
3.4 Empirical strategy
To assess the impact of TOP on the various outcomes we collected, we estimate the
following OLS regression:
$$Y_{ir} = \alpha_r + \beta\,\mathit{Treated}_i + \gamma X_i + \varepsilon_{ir} \tag{1}$$
where $Y_{ir}$ is the relevant outcome for student $i$ who was assigned to treatment or control
in randomization round $r$; $\alpha_r$ denotes randomization round fixed effects; $\mathit{Treated}_i$ is an
indicator for whether the student was assigned a tutor in the TOP program; $X_i$ is a vector
of student-level controls measured at baseline, including gender, immigrant status, the grade
in which the student is enrolled, mother’s and father’s education, mother’s and father’s
employment type, learning disability, interest in the different subjects, perseverance,
belief in the importance of luck, and familiarity with computers; and $\varepsilon_{ir}$ is an error term. We
estimate robust standard errors. We also correct for multiple hypothesis testing using
Westfall-Young stepdown adjusted p-values, which control the family-wise error rate
(FWER) while allowing for dependence among p-values.
21 TOP tutors received a longer questionnaire in June that included the questions on their experience
during TOP, and then in September the same short questionnaire that control tutors received.
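Equation (1) can be estimated with any regression package. To make the mechanics explicit, here is a minimal from-scratch sketch with hypothetical function names. The round fixed effects $\alpha_r$ enter as one dummy per round, which absorbs the intercept. The robust standard error for $\beta$ uses the HC1 sandwich formula, the variant Stata reports under `robust`; whether the paper used exactly HC1 is an assumption.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_round_fe(y, treated, controls, rounds):
    """OLS of y on Treated, baseline controls and round dummies.

    Returns (beta_hat, hc1_se) for the Treated coefficient.
    """
    levels = sorted(set(rounds))
    # columns: Treated, controls..., one dummy per randomization round
    X = [[float(t)] + [float(v) for v in c] + [1.0 if r == lv else 0.0 for lv in levels]
         for t, c, r in zip(treated, controls, rounds)]
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    coef = solve(xtx, xty)
    resid = [yi - sum(c * x for c, x in zip(coef, row)) for row, yi in zip(X, y)]
    # h = first column of (X'X)^-1, so Var(beta) = sum_i e_i^2 (x_i' h)^2 (sandwich form)
    h = solve(xtx, [1.0] + [0.0] * (k - 1))
    var_b = sum((sum(hj * xj for hj, xj in zip(h, row)) * e) ** 2 for row, e in zip(X, resid))
    var_b *= n / (n - k)  # HC1 small-sample scaling
    return coef[0], math.sqrt(var_b)
```

Solving $(X'X)h = e_1$ gives only the column of the inverse that the Treated coefficient needs. Writing the variance as a sum of squares keeps it non-negative even when the residuals are numerically zero.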
4 Results
4.1 Online classes and homework
We start by assessing how participation in the program affected key ‘inputs’ in the learning
process on the part of the students. In particular, we consider the time devoted to
homework and the quality of their participation in regular (online) classes offered by their schools.
[Insert Figure 1]
Figure 1 shows the distribution of time devoted to homework (in minutes) during the
last month of school, as reported by students (panel a) and parents (panel b), as well as
the teachers’ assessment of how regularly the student handed in their homework (panel c).
For each graph, blue bars refer to students in the control group and red ones to students
in the TOP program.
Panel (a) shows that the majority of the students report doing between 30 minutes and
two hours of homework each day, with a small fraction reporting less than 30 minutes and
about 20 percent reporting more than 2 hours. Importantly, the distribution for treated
students is clearly shifted to the right relative to that for control students, with a marked
reduction in those who report less than 1 hour and a clear increase in those who report
more than 1.5 hours.
When we consider parents’ reports (panel b), we see some discrepancy in the levels
reported: parents are more likely to report very low values (30 minutes or less) and less
likely to report more than 150 minutes. Here too, however, parents
of students enrolled in TOP report comparatively more time devoted to homework by
their children.
The bottom panel in Figure 1 shows how school teachers perceive students’ commitment
to homework. For control students, 12 percent of the teachers report that they never hand
in any homework, 28 percent say sometimes, 31 percent most of the time and 29 percent
always. The corresponding figures for students in TOP are 4 percent, 23 percent, 35
percent and 38 percent. This confirms that our program did induce students to exert
more effort in homework than they would otherwise have exerted.
[Insert Table II]
In Table II we consider a broader set of outcomes which includes not only homework,
but also attendance at online classes, behavior during classes, and students’ liking of
the subjects. Each outcome is regressed on the treatment dummy and on the controls
detailed in equation (1). We have different sources reporting on the various outcomes,
namely students (columns 1-4), parents (columns 5-6) and teachers (columns 7-9).
Consistent with the data in Figure 1, we find that treatment increased the time devoted
to homework: the average effect is about 10 minutes per day (column 1) or 9 minutes per
day (column 5), depending on whether it is reported by students or by parents.22 This
represents approximately an 11 percent increase over the mean for the control group. The
regularity of homework completion as reported by teachers is also significantly higher for
treated students. This is shown in column 7, where we estimate an ordered logit model
using as an outcome the categorical variable described in Figure 1(c).
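The minutes-per-day outcome in columns 1 and 5 is built by assigning interval midpoints to the answer categories of Figure 1, panels (a) and (b) (footnote 22). In the sketch below, the bin edges are hypothetical and inferred from the ranges mentioned in the discussion of Figure 1. The value used for the open-ended top category is also an assumption, since a midpoint is undefined without an upper bound.

```python
# Hypothetical answer categories and bounds (minutes per day); Figure 1's exact bins may differ.
BINS = {
    "<30": (0, 30),
    "30-60": (30, 60),
    "60-90": (60, 90),
    "90-120": (90, 120),
    "120-150": (120, 150),
    ">150": (150, 180),  # assumed upper bound for the open-ended top category
}

def minutes_from_category(label):
    """Midpoint of the interval corresponding to an answer category."""
    low, high = BINS[label]
    return (low + high) / 2

def mean_minutes(labels):
    """Average minutes per day over a list of categorical answers."""
    values = [minutes_from_category(label) for label in labels]
    return sum(values) / len(values)
```

Estimates based on this variable depend on the assumed value for the top category. A quick robustness check is to vary that bound and re-estimate.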
During lockdown, classes were offered online and students were supposed to connect
every day and attend them. Compliance with this requirement was not always full,
though: sometimes less motivated students pretended to have internet problems and
did not connect, or connected for part of the class and then left. In columns 2 and
6, we find that the probability of regularly attending online classes, as reported by the
children and the parents, respectively, is not significantly related to treatment.23 This is not true,
however, when we consider teachers’ reports (column 8). In this case, students in the
TOP program are 9.4 percentage points more likely to attend classes regularly, a 16
percent increase over the control group mean. The discrepancy is not surprising given
the difference in average values of the dependent variable reported by the three
categories of respondents. Children and parents report regular attendance in 83 and 88
percent of the cases, respectively, while teachers report it in only 57 percent of the cases
for the same students. Reporting bias by children and parents may introduce too much
noise for us to detect a treatment effect. By contrast, the positive impact of TOP is clear
if one takes teachers’ reports as more reliable, which is reasonable given that teachers
have no incentive to over-report good behavior.
Column 3 of Table II shows that treated students are 8 percentage points less likely
to report that they found it difficult to follow classes online and use their school’s online
platform during the last month of school, representing a 10 percent decrease relative to the
control group mean.
22 The continuous dependent variable expressed in minutes per day and used in columns 1 and 5 is
constructed by assigning midpoint values to the intervals displayed in Figure 1, panels (a) and (b).
23 We asked students whether in the last month of school they had been following online classes regularly,
and we posed the same question to parents regarding their children. The dependent variable in columns
2 and 6 is a dummy taking value 1 if the answer is “Yes, every time there was an online class”.
Column 9 shows that treated students also behaved better during school hours. While
teachers report behavioral problems during the last month of school for 83 percent of the
students in the control group, this fraction is 6.4 percentage points lower among students
in the TOP program.
Overall, these results indicate that both the ‘quantity’ dimension of class attendance
and the ‘quality’ of learning from classwork were positively affected by our program,
suggesting a potential complementarity between the work done by the tutor after school
and that done by the teachers during school hours.
Tutors also seem to have contributed to making the subjects more interesting for their
tutees. Column 4 shows that treated students have a 5 percentage points higher proba-
bility of liking the subjects of math, literature or English relative to control students.24
Given how little our target population likes these subjects (only 28 percent answer in the
affirmative in the control group) this is a sizeable increase.
4.2 Academic outcomes and beliefs
In Table III we study the impact of TOP on academic performance and beliefs.25
[Insert Table III]
The dependent variable in column 1 is our key measure of performance, that is, the
fraction of correct answers given by the student in the standardized test we administered
at the end of the program, which covered the subjects of math, Italian and English (see
Section 3.1 for a detailed description). We find that the share of correct answers in the
test is 4.5 percentage points higher for treated students, a 9 percent increase over the
average of 53 percent correct answers in the control group. The effect is highly significant
(p-value 0.013) and corresponds to a 0.26 SD increase in the index of performance. This
is an impressive result if we take into account two factors. First, the median duration
of tutoring was five weeks. Second, tutors did not specifically prepare the students for
this type of test (multiple choice tests are not typically assigned as homework in Italian
schools), but rather focused on helping students find a method for studying and doing
regular homework.
24 We asked students how much they liked the three subjects in which tutoring was offered, on a 5-point
scale from “Not at all” to “Very much”. The dependent variable in column 4 is the mean of three dummies
taking value 1 if the answer is 4 or 5 in math, Italian, and English, respectively.
25 The outcomes in this Table are average values across all three subjects: math, Italian, and English. For
Beliefs and Overconfidence, there are a few cases with missing information for one subject.
For those cases, we take the average over the subjects for which we have information.
In columns 2 to 4 we estimate the effect of the program on the beliefs held by students,
parents and teachers about the number of correct responses given by students in the test.
In all three cases we find a positive impact, which remains significant for teachers (at
the 5 percent level) and for students (at the 10 percent level) even after accounting for
multiple hypothesis testing.
Given that TOP led to actual performance improvements, the positive effect on expec-
tations is not surprising. However, as discussed in section 3.2, students and parents on
average tend to over-estimate the number of correct answers to the test, while teachers
tend to under-estimate it. This can also be seen in the means of the dependent variables
for the control group reported at the bottom of Table III. In columns 5 to 7, we test
whether the program helped re-align individual beliefs with actual performance, using
as dependent variable the dummy ‘Overconfidence’, which takes value 1 if the expected
number of correct answers exceeds the actual one.26 The estimated coefficients point in
the expected direction (lower overconfidence) for students and parents, although they are
not significant, while the effect on teachers is a precise zero.
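The following is a minimal sketch of the overconfidence outcome. It computes the dummy 1{expected correct answers > actual correct answers} for each subject and then averages it over subjects with a non-missing belief, following the convention in footnote 25. Whether the paper averages per-subject dummies in this way, or instead compares totals, is an assumption. The function and argument names are hypothetical.

```python
def overconfidence(expected, actual):
    """Average over subjects of 1{expected correct answers > actual correct answers}.

    expected, actual: dicts mapping subject -> number of correct answers.
    A None expectation marks a missing belief, and that subject is skipped.
    """
    flags = [int(expected[s] > actual[s]) for s in actual if expected.get(s) is not None]
    return sum(flags) / len(flags) if flags else None
```

For a respondent who overestimates in math, underestimates in Italian, and gave no belief for English, the outcome is 0.5.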
Finally, in the last two columns of Table III we use an alternative measure of academic
performance. In the endline survey for students (implemented a few days before the
students took the standardized test) we asked them how they would rank themselves
compared to their classmates, on a scale from 1 to 10. The 10-point scale is akin to the
grading scale used in Italian schools, where 6 indicates a pass. In the teacher endline
survey, we posed the same question to the teacher for each of the students in our sample
that was in a given teacher’s class. We see that students and teachers’ evaluations do not
diverge much on average: the control group mean is 6.2 in the students’ answers and
5.5 in the teachers’ answers. Treatment significantly increases the two outcomes by 0.25 and
0.33, respectively, corresponding to 0.18 SD for both outcomes.
While the results so far represent average impacts pooling math, Italian and English,
Appendix Table A.IX reports the effects separately by subject. Impacts are positive
across the board, but in terms of significance the most robust effects are detected on
math performance (panel A, columns 1, 4 and 5). This is not surprising, as most students
focused on math during the tutoring sessions (see Appendix Figure A.4).
26 The number of observations is lower in columns 6 and 7 compared to columns 3 and 4 because we
have to restrict the sample to cases in which both the students and their parents (or teachers) completed
the endline survey.
4.3 Aspirations
In Table IV we estimate the impact of the program on students’ aspirations and perceived
ability to achieve educational goals, as reported by the students (columns 1-4), their
parents (columns 5-6) and their teachers (column 7).
[Insert Table IV]
The direction of the effects suggests that TOP had a small positive impact. Starting
from long term goals, TOP students and their parents appear more likely to report that
in the future they plan to enroll in university (columns 1 and 5) and teachers are more
likely to say that they should do so (column 7). None of these effects is statistically
significant, though. A similar consideration applies to self-efficacy: we asked students
and their parents, aside from their intentions, how much they thought the student would
be able to attend university.27 Perceptions here are low on average (only 21 percent of
students and 29 percent of parents in the control group respond in the affirmative), and
the positive coefficient on the treatment dummy is not significant at conventional levels
(columns 2 and 6).
Finally, a more immediate choice for our students (in terms of time horizon) concerns
the high school track in which they plan to enroll after middle school: vocational, technical
or academic. Treated students are 6 percentage points less likely to say that they plan to
attend the least prestigious track, that is, vocational (column 3). The effect corresponds
to a decrease of almost 20 percent relative to students in the control group, although it is
not statistically significant at conventional levels once we adjust for multiple hypothesis testing.
Overall, the results in Table IV do not show robust evidence of a significant effect of
TOP on individual aspirations, although they qualitatively point in a positive direction.
4.4 Socio-emotional skills
We next test whether the program affected students’ socio-emotional skills, in particular
their reactions in the face of obstacles and their perceived ability to control what happens
in their lives.
[Insert Table V]
27 The original scale for the response was from 1 to 5, where 1 indicated “not at all” and 5 “very much”.
The dependent variable in columns 2 and 6 is a dummy taking value 1 if the original response was 4 or
5. Results are very similar when estimating an ordered logit model with the original question.
As explained in section 3.2, to measure perseverance we gave students a logic task and,
after they completed it, asked whether they wanted another one of the same level of
difficulty, a more difficult one, or to stop. The estimates in
columns 1 and 2 of Table V (panel A) point to an increase in the probability of asking
for the more difficult task and a decrease in the probability of giving up among treated
students, but neither effect is statistically significant. The effect is also insignificant, and
quantitatively negligible, when the outcome is the index of ‘grit’ proposed by Duckworth and
Quinn (2009) (columns 3-4, panel A).
While perseverance and grit do not appear significantly affected by the interaction with
a tutor, students’ ‘locus of control’ does. Column 5 shows that students in TOP believe
to a greater extent that they (rather than fate or luck) can control the outcome of events
in their lives. The magnitude of the treatment effect corresponds to a 0.19 SD increase
over the control group mean. A possible interpretation of this finding is that students
who worked with a tutor saw positive results on the academic front (as shown in Table
III), and thus understood that success in school was not a matter of luck. They may
then have extrapolated this belief to life in general.
4.5 Psychological Well-being
An important goal of our program, in addition to the academic component, was to help
students navigate the psychological difficulties that the lockdown and isolation from their
friends may have created. Among other things, the tutor represented someone to talk to
outside one’s own immediate family: a different voice and a connection with the outside world.
In panel B of Table V we estimate the impact of the program on two measures of
psychological well-being: the Children’s Depression Screener of Frühe et al. (2012)
(columns 1 and 3) and a self-reported index of happiness (columns 2 and 4).28 We construct
both measures using the student’s own answers (columns 1-2) and using the parent’s answers
about their child (columns 3-4).
Column 1 shows that students in TOP report fewer symptoms of depression. The magnitude
of the effect corresponds to a 0.16 SD decrease. The effect is qualitatively similar but
not statistically significant when based on the parents’ reports. Correspondingly, happiness
increases: this time the effect based on parents’ responses is more precisely estimated (but
not very different in magnitude from that based on students’ answers). The coefficient in
column 4 corresponds to a 0.16 SD increase.
28 Both variables have been normalized so that they range from 0 to 1, as explained in Section 3.2.
These results suggest that TOP played an important role not only in improving learning
outcomes of students who would have otherwise lagged behind, but also in mitigating
potential mental health problems associated with the pandemic and with the strict regime
of lockdown.
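Several effects above are reported both as coefficients on a 0-1 scale and in SD units. A minimal sketch of that conversion follows, under stated assumptions: no covariates (so the treatment coefficient equals the difference in means), a min-max rescaling of a 1-5 item to [0, 1] (the exact normalization is the one described in Section 3.2 and may differ), and purely hypothetical answers rather than TOP data.

```python
from statistics import mean, stdev


def normalize_0_1(value, lo, hi):
    """Min-max rescaling of a response on [lo, hi] to [0, 1] (assumed rule)."""
    return (value - lo) / (hi - lo)


def effect_in_control_sd(treated, control):
    """Treated-minus-control difference in means, in units of the control-group SD."""
    return (mean(treated) - mean(control)) / stdev(control)


# Hypothetical 1-5 answers, for illustration only (not the TOP data)
control_raw = [2, 3, 3, 4, 2, 3]
treated_raw = [3, 3, 4, 4, 2, 4]
control = [normalize_0_1(v, 1, 5) for v in control_raw]
treated = [normalize_0_1(v, 1, 5) for v in treated_raw]

effect_raw = effect_in_control_sd(treated_raw, control_raw)
effect_norm = effect_in_control_sd(treated, control)
print(round(effect_norm, 3))
```

Because the rescaling is linear, the effect expressed in SD units is identical on the raw and normalized scales; only the raw coefficient changes.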
4.6 Summary of Main Results and Robustness
Our main outcomes are related to four dimensions, and in all previous tables we reported
p-values adjusted for multiple hypothesis testing within each family of outcomes. To
summarize the key results, we now report the impact of TOP on standardized test per-
formance and on three summary indexes, constructed using principal component analysis
and standardizing the outcome to have mean zero and standard deviation one for students
in the control group.
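A summary index of this kind can be sketched as follows. This is an illustration under assumptions, not the authors' code: each component is z-scored on the full sample, the loadings are the leading eigenvector of the components' correlation matrix (found by power iteration), and the resulting score is rescaled to mean zero and standard deviation one in the control group. The function names and data are hypothetical.

```python
from statistics import mean, pstdev


def zscore(col):
    """Standardize one component outcome over the full sample."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def first_pc_loadings(cols, iters=500):
    """Leading eigenvector of the components' correlation matrix (power iteration)."""
    z = [zscore(c) for c in cols]
    n, k = len(z[0]), len(z)
    corr = [[sum(z[i][t] * z[j][t] for t in range(n)) / n for j in range(k)]
            for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    if sum(v) < 0:  # fix the arbitrary eigenvector sign so loadings are mostly positive
        v = [-x for x in v]
    return z, v


def summary_index(cols, is_control):
    """First-component score, rescaled to mean 0 and SD 1 in the control group."""
    z, v = first_pc_loadings(cols)
    scores = [sum(v[j] * z[j][t] for j in range(len(v))) for t in range(len(z[0]))]
    ctrl = [s for s, c in zip(scores, is_control) if c]
    m, s = mean(ctrl), pstdev(ctrl)
    return [(x - m) / s for x in scores]


# Three hypothetical component outcomes for eight students (illustrative only)
cols = [[1, 2, 3, 4, 5, 6, 7, 8],
        [2, 1, 4, 3, 6, 5, 8, 7],
        [1, 3, 2, 5, 4, 7, 6, 8]]
is_control = [True, False] * 4
index = summary_index(cols, is_control)
```

Standardizing on the control group means a treatment coefficient on the index reads directly as an effect in control-group SD units.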
[Insert Figure 2]
The main results on the impact of TOP are reported in Figure 2 (and in the corre-
sponding Appendix Table A.XI). The strongest improvement for the treatment group is
in test performance, with an increase of 0.26 SD compared to the control group. This
impact is comparable in magnitude to the average impact of large-scale in-person tu-
toring interventions, as reported in the meta-analysis by Nickow et al. (2020).29 The
overall impact on aspirations, socio-emotional skills, and well-being is also positive with
an improvement between 0.14 and 0.17 SD and a p-value of about 0.10 when adjusted for
multiple hypothesis testing across the four summary indexes.
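This passage does not say which multiple-testing correction produces the adjusted p-values. Purely as a stand-in, the sketch below applies the Holm step-down adjustment, which controls the family-wise error rate under arbitrary dependence across the four indexes; the paper's own procedure may differ and may be less conservative. The raw p-values are hypothetical.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1,
        # and adjusted values are kept non-decreasing in rank.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for four summary indexes (illustrative only)
raw = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(raw))
```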
Appendix Table A.X provides a robustness analysis of the main results presented so
far. First, in columns 1 and 2, we present the OLS estimates and standard errors without
including the baseline controls: we find that the results are very similar to the main results
reported in Tables III, IV, V, and in Appendix Table A.XI.
29 We consider large-scale tutoring interventions involving more than 400 observations and implemented
mainly by non-professional tutors. Unlike our intervention, which targets middle school students, most of
the previous tutoring experiments that were causally evaluated and included in the meta-analysis by
Nickow et al. (2020) focus on elementary school children.
Second, in columns 3 and 4, we re-estimate the effect of treatment on our main outcomes,
choosing the set of control variables systematically with the double post-LASSO procedure
of Belloni et al. (2012). We include all baseline characteristics that are sufficiently
correlated with treatment (after imposing the LASSO penalty, given that the regression
includes many variables) and those that are sufficiently correlated with the outcome (again
after imposing the LASSO penalty) (Ludwig et al., 2017).30 We list the controls
selected using LASSO for each outcome in Appendix Table A.XII. Including the variables
picked by LASSO in the second step makes no substantial difference in most results,
with the exception of the Aspiration index, where the estimated effect is still positive but
smaller in magnitude and not significant at conventional levels.
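The three steps of the double post-LASSO procedure (footnote 30) can be sketched in pure Python. This is a minimal illustration under simplifying assumptions, not the authors' implementation: covariates are standardized, the penalties `lam_y` and `lam_d` are fixed by hand rather than set by the data-driven rule of Belloni et al. (2012), the LASSO is solved by plain coordinate descent, and the data are simulated. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(col):
    """Rescale a covariate to mean 0 and (population) variance 1."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def lasso_support(y, x_cols, lam, sweeps=200):
    """Indices of covariates with non-zero LASSO coefficients.

    Minimizes (1/2n) * ||y - mean(y) - Z b||^2 + lam * ||b||_1 by cyclic
    coordinate descent, where Z holds the standardized covariates.
    """
    n = len(y)
    z_cols = [standardize(c) for c in x_cols]
    ybar = mean(y)
    resid = [v - ybar for v in y]  # residuals with every coefficient at zero
    beta = [0.0] * len(z_cols)
    for _ in range(sweeps):
        for j, z in enumerate(z_cols):
            # Inner product of z_j with the partial residual that leaves z_j out
            rho = sum(z[t] * (resid[t] + z[t] * beta[j]) for t in range(n)) / n
            new = max(abs(rho) - lam, 0.0) * (1.0 if rho > 0 else -1.0)  # soft-threshold
            if new != beta[j]:
                resid = [resid[t] - z[t] * (new - beta[j]) for t in range(n)]
                beta[j] = new
    return {j for j, b in enumerate(beta) if b != 0.0}


def ols(y, cols):
    """OLS coefficients (intercept first), solving the normal equations."""
    X = [[1.0] + [c[t] for c in cols] for t in range(len(y))]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv], b[p], b[piv] = A[piv], A[p], b[piv], b[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * q for a, q in zip(A[r], A[p])]
                b[r] -= f * b[p]
    return [b[i] / A[i][i] for i in range(k)]


def double_post_lasso(y, d, x_cols, lam_y, lam_d):
    """Treatment coefficient from OLS of y on d plus the union of selected controls."""
    keep = sorted(lasso_support(y, x_cols, lam_y) | lasso_support(d, x_cols, lam_d))
    coefs = ols(y, [d] + [x_cols[j] for j in keep])
    return coefs[1], keep


# Simulated data (not the TOP sample): only covariate 0 matters for y
random.seed(1)
n = 200
x_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(5)]
d = [1.0 if random.random() < 0.5 else 0.0 for _ in range(n)]
y = [0.5 * d[t] + x_cols[0][t] + random.gauss(0, 0.1) for t in range(n)]
tau, keep = double_post_lasso(y, d, x_cols, lam_y=0.1, lam_d=0.1)
print(round(tau, 3), keep)
```

Taking the union of the two selected sets is what protects the treatment estimate: a covariate that predicts treatment is kept even if its link to the outcome is too weak to survive the first LASSO.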
Finally, in the last two columns we present inverse probability-weighted estimates of
treatment effects. The estimated effects are almost unchanged and, if anything, slightly
larger, owing to the minor imbalances shown in Table I, with the control group being somewhat
more positively selected than the treatment group in terms of parental background.
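The passage does not detail how the propensity score is estimated. A minimal sketch of an inverse probability-weighted contrast follows, assuming a single discrete baseline characteristic, so that the propensity score is simply the treated share within each cell, and using normalized (Hajek) weights. The data are hypothetical and constructed so that the control group is positively selected; as a result, the naive difference in means understates the true effect.

```python
from collections import defaultdict


def ipw_ate(y, d, cell):
    """Normalized inverse-probability-weighted difference in means.

    The propensity score is the share of treated units within each covariate
    cell, so every cell must contain both treated and control units.
    """
    n_cell, n_treated = defaultdict(int), defaultdict(int)
    for di, ci in zip(d, cell):
        n_cell[ci] += 1
        n_treated[ci] += di
    num1 = den1 = num0 = den0 = 0.0
    for yi, di, ci in zip(y, d, cell):
        p = n_treated[ci] / n_cell[ci]
        if di:
            num1 += yi / p
            den1 += 1 / p
        else:
            num0 += yi / (1 - p)
            den0 += 1 / (1 - p)
    return num1 / den1 - num0 / den0


# Hypothetical data: cell "B" has better outcomes but is mostly in control
cell = ["A"] * 4 + ["B"] * 4
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [3, 3, 3, 1, 5, 3, 3, 3]  # baseline 1 in A and 3 in B; true effect is 2
treated_y = [yi for yi, di in zip(y, d) if di]
control_y = [yi for yi, di in zip(y, d) if not di]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
print(naive, ipw_ate(y, d, cell))
```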
These different robustness checks provide a consistent and overall positive picture of
the impact of TOP on student outcomes.
5 Mechanisms and Heterogeneous Treatment Effects
In this section we explore treatment heterogeneity along several dimensions, to better un-
derstand the ways in which the program had an impact. We start by considering features
such as the number of hours of tutoring and the technology used to connect virtually,
and then we move to study the role of students’ and tutors’ characteristics. We present
evidence on heterogeneity in two ways: (i) augmenting our benchmark specification with
an interaction term between treatment and the relevant characteristic; and (ii) applying
an honest causal forest algorithm (Wager and Athey, 2018).
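For a binary characteristic G, approach (i) estimates y = a + b·T + c·G + e·T·G. The sketch below assumes no other regressors, a simplification because the benchmark specification also includes baseline controls. Under that assumption the model is saturated, so the coefficients can be read off the four cell means. The treatment coefficient is the effect in the reference subgroup, and the interaction coefficient is the difference between subgroup effects. The data and the function name are hypothetical.

```python
from statistics import mean


def interaction_coefficients(y, treat, group):
    """Coefficients of y = a + b*T + c*G + e*T*G + u for binary T and G.

    With no other regressors the model is saturated, so OLS reproduces the
    four cell means exactly and the coefficients follow from them.
    """
    cell = {(t, g): mean([yi for yi, ti, gi in zip(y, treat, group) if (ti, gi) == (t, g)])
            for t in (0, 1) for g in (0, 1)}
    a = cell[0, 0]
    b = cell[1, 0] - cell[0, 0]            # treatment effect when G = 0
    c = cell[0, 1] - cell[0, 0]            # control-group gap between subgroups
    e = (cell[1, 1] - cell[0, 1]) - b      # extra treatment effect when G = 1
    return a, b, c, e


# Hypothetical standardized scores; G = 1 could stand for, e.g., smartphone users
y = [0.0, 0.2, 0.4, 0.6, -0.3, -0.1, 0.1, 0.1]
treat = [0, 0, 1, 1, 0, 0, 1, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
a, b, c, e = interaction_coefficients(y, treat, group)
print(b, b + e)  # treatment effect for G = 0 and for G = 1
```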
5.1 Treatment Intensity
As described in section 2.3, students received a different number of hours of tutoring
depending on how many subjects they needed help in and on the availability of tutors.
Among all the students who needed help in more than one subject, we randomly chose
which students would be matched with a tutor who had offered to help for 3 versus 6
hours per week. We thus have exogenous variation that we can exploit to understand how
the impact of the program varies with the intensity of the tutoring.
[Insert Table VI]
30 The double post LASSO procedure is based on three steps. First, we fit a LASSO regression predicting
the dependent variable and we select all variables with non-zero estimated coefficient after the introduction
of a penalty term that shrinks the estimated regression coefficients towards zero to reduce over-fitting.
Second, we fit a LASSO regression predicting the treatment variable and following the same procedure
of step one. Finally, we fit a linear regression of the outcome variable on the treatment variable including
the covariates selected in the first or second step.
In Table VI we regress our main outcome indexes on the treatment dummy and on
an indicator for whether the student got a 6-hour tutor (“Higher treatment intensity”).31
Column 1 shows that the impact of the standard, 3-hour-per-week version of the program is
a 0.2 SD increase in academic performance, as measured by our standardized test score.
Having a tutor for 6 hours a week adds another 0.22 SD, leading to an overall impact of
0.42 SD. This is a remarkable effect, and it suggests that, in this range of hours, the impact
of additional hours of tutoring is roughly linear. The magnitude of the effect of the more
intense treatment relative to the less intense one is not surprising in light of the
meta-analysis by Nickow et al. (2020), which shows that the average effect of tutoring on
learning almost doubles going from 1-2 days per week to 4-5 days (from 0.24 to 0.41
standard deviations).
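Because the 6-hour indicator is nested within treatment, the total effect of the intense version is the sum of the two coefficients. A minimal sketch follows, assuming no covariates, in which case the coefficients are simple differences in group means. The scores and the function name are hypothetical; the group means are chosen to mirror the 0.2 and 0.22 SD estimates in column 1.

```python
from statistics import mean


def intensity_effects(y, treated, high):
    """Coefficients of y = a + b*Treated + c*High, where High implies Treated."""
    control = [yi for yi, t in zip(y, treated) if not t]
    basic = [yi for yi, t, h in zip(y, treated, high) if t and not h]
    intense = [yi for yi, h in zip(y, high) if h]
    b = mean(basic) - mean(control)    # effect of the 3-hour version
    c = mean(intense) - mean(basic)    # extra effect of moving to 6 hours
    return b, c, b + c                 # b + c: total effect of the 6-hour version


# Hypothetical standardized test scores (illustrative only)
y = [-0.1, 0.1, 0.1, 0.3, 0.32, 0.52]
treated = [0, 0, 1, 1, 1, 1]
high = [0, 0, 0, 0, 1, 1]
print(intensity_effects(y, treated, high))
```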
While doubling the hours of tutoring had a sizeable impact on learning, when we look
at other outcomes it did not generate significant gains on top of the gains from the basic
version of the program. In columns 2 to 4 of Table VI the coefficient on ‘Intense treatment’
is positive but never significant. Based on this evidence, it seems that three hours of
interaction with the tutor were already enough to generate the bulk of the improvements
on these soft skills.
A different dimension of intensity of the tutoring is how productive the hours were, in
terms of attention and effort exerted by the student. This is, of course, an endogenous
variable, hence we cannot provide any causal evidence in this direction. However, we
can descriptively examine the patterns that emerge by exploiting information from the
tutor registries. For each session, tutors recorded whether the effort exerted by their
student during that session was ‘poor’, ‘fair’ or ‘very good’. Tutors also recorded the
same information for the homework that the student was supposed to complete before each
session. In Appendix Table A.XIII we find that treated students who exerted higher than
average effort during the session and in completing assignments have higher performance
and educational aspirations at endline, while they are not significantly different from
the rest of the treated students in socio-emotional skills and psychological well-being. It
should be stressed once more that these are not causal impacts.
31 Overall, 27 percent of the treated students received a tutor for 6 hours per week (the mean of this
variable in the full sample of treated and control students is 0.135). Of the 530 treated students, 427
needed help in more than one subject, and one third of them received the 6-hour tutoring.
5.2 Devices and Internet Connection
The key feature of TOP is the virtual nature of the interaction between tutor and student.
By definition, the program requires a minimum technological input, namely an internet
connection and a device that the two can use to have a video call. When we recruited
middle school students, we told school principals that the beneficiaries should have access
to a tablet or PC and to an internet connection for at least 3 hours per week. We have
to a tablet or PC and to an internet connection for at least 3 hours per week. We have
information on these aspects in the tutor endline survey and the registries, where tutors
recorded for each meeting the type of device used by the student and the occurrence
of technical issues. Based on these sources, 22 percent of the students mainly used a
smartphone to connect, and 77 percent of the students had technical issues during at
least one of the meetings.
[Insert Table VII]
In Panel A of Table VII we test whether the impact of the program was different for
students who connected using a smartphone, compared to those who used a PC or a
tablet. We find that it was not, except for aspirations, where the effect on students who
used a smartphone is zero. Importantly, column 1 shows that, compared to an increase in
test score of 0.27 SD for the students who connected with better devices, the impact for
students who connected through a smartphone was 0.22 SD, significant at the 5 percent
level. Considering that this may be a more disadvantaged group of students, the sizeable
effect is particularly encouraging. It also leaves room for optimism in considering online
tutoring as a tool that could be applied in contexts where the population may have lower
income and/or only have access to smartphones.
In Panel B of Table VII we consider the occurrence of technical issues during the tutor-
ing sessions. While the coefficient of the variable ‘Technical issues’ is never statistically
significant, its sign points to a lower effectiveness of the treatment in the presence of
technical problems. This may be due to disruptions in the learning process as well as to
shortening of the duration of the sessions.
5.3 Students’ and Parents’ Characteristics
An important dimension of heterogeneity pertains to student demographics and socioeconomic
background.
[Insert Figure 3]
In Figure 3 we show the impact of treatment separately for different sub-groups of
students, split according to predetermined characteristics: gender, immigrant status, and
learning disorders. The figure shows the estimated impact (relative to the control group)
and associated 95 percent confidence interval, for our four main outcome indexes. All
indexes are standardized so that the coefficients represent the effect of treatment in units
of standard deviations.
Starting from the top-left panel, we find that boys and girls benefited from the program
to the same extent, as did native and immigrant students. One category that benefited
significantly more was that of students with learning disorders: for them, performance in
the standardized test increased by over 0.5 SD, significantly more than for treated students
without learning disorders. This is a group of students that may have faced particular
difficulties with the learning methods and materials that schools provided during distance
learning, in most cases not tailored to the needs of students with dyslexia, dyscalculia,
or other disorders. Our program helped alleviate such difficulties. This is noteworthy
because only 20 out of 523 tutors had specific training on learning disorders, so the vast
majority of the 149 students in the treatment group who had a learning disorder got a
tutor without specific training.32
When we look at aspirations and socio-emotional skills, the treatment effect appears
to be similar across all these subgroups – the only pattern worth mentioning is that
aspirations increase for natives but not for immigrants, possibly because the latter face
different types of barriers when planning their future education (Carlana et al., 2021).
The outcome for which heterogeneity in treatment effects is most striking is psychological
well-being (bottom-right panel of Figure 3). TOP worked equally well for boys and girls and
for students with and without learning disorders (with a marginally higher benefit for
students with learning disorders). But when we compare native and immigrant students, it
is clear that the increased happiness and reduced depression we detected in Table V are
entirely driven by immigrant students. The magnitude of the effect for this group is a
striking 0.77 SD increase in well-being. One possible interpretation is that immigrant
students have a less dense network of friendships, and hence felt more isolated during the
lockdown: they were prevented from meeting classmates in school, and they may not have been
included in conversations happening online through WhatsApp or other group chats. In fact,
among students in the control group, immigrants have on average 0.52 SD lower well-being
than natives. Meeting regularly with a tutor proved particularly beneficial for the
psychological well-being of these students.
32 We did prepare a module on how to teach students with learning disorders as part of our online
support for tutors, so this was probably a useful resource for them. Indeed, we know from the endline
survey that 62 percent of tutors assigned to students with learning disorders watched the videos, compared
to 47 percent of other tutors. They were also 12 percent more likely to participate in the group meetings
for tutors organized by the pedagogical experts and 20 percent more likely to ask for a one-on-one meeting
with an expert to get recommendations on how to effectively help their student.
Next, we examine impact heterogeneity by the socioeconomic status of the family, as measured
in the parents’ questionnaire. We focus on three characteristics: education (less
than high school, high school diploma, or higher); type of employment (none, blue collar
job, or white collar job); and whether at least one parent worked from home during the
lockdown. The results are reported in Figure 4.33
[Insert Figure 4]
While the effects are not very precisely estimated, it appears that the gains in academic
performance are concentrated among students whose parents have a high school degree
or less (which is the case for about 90 percent of our sample). Treated students
whose parents did not complete high school (45 percent of the sample) also appear to
have benefited more from the program in terms of aspirations and socio-emotional skills.
Impacts on psychological well-being are similar across the parents’ education gradient.
Turning to parental employment, we see that having a tutor improved students’ performance
significantly more for the children of blue-collar mothers than for those of white-collar
mothers. These children also benefit significantly more in terms of happiness and reduced
depression.34
What seems to emerge as a consistent pattern, at least qualitatively, is that TOP had
a bigger impact on students whose parents both worked outside the home. These are the
students who may have received the least support from parents in terms of schoolwork,
and also may have been monitored less during distance learning. Regular meetings with
a tutor had particular relevance in these cases.
33 The Figure shows the effect using mothers’ education and occupation. The results are quantitatively
and qualitatively very similar using fathers’ education and occupation.
34 The category ‘no job’ typically shows an intermediate coefficient, which may result from the fact that
it pools families where the parent is (involuntarily) unemployed and families that have chosen to keep
one parent (typically the mother) at home. Also, parents who do not work can stay home and help their
children with homework, as we discuss shortly.
5.4 Tutor characteristics
After discussing impact heterogeneity in terms of students’ and parents’ characteristics,
we investigate whether tutors’ characteristics played a significant role in explaining the
effects of the program.
[Insert Table VIII]
In Table VIII we explore three sets of tutor baseline characteristics: gender, academic
performance, and pro-social attitudes. For each of these characteristics, we report the
coefficient of the treatment dummy interacted with the relevant subgroup, and a p-value
for the null hypothesis that the two coefficients are the same.
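A minimal sketch of such an equality test follows. It is a Wald (z) test using a normal approximation, with a hypothetical `cov` argument for the covariance between the two estimates. The default of zero is only appropriate when the coefficients are estimated independently, for example on disjoint samples. When both come from one regression, the covariance should be taken from the estimated variance matrix. The numbers are hypothetical.

```python
from math import erfc, sqrt


def equality_pvalue(b1, se1, b2, se2, cov=0.0):
    """Two-sided p-value of a Wald (z) test of H0: b1 == b2."""
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2 - 2 * cov)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Hypothetical subgroup coefficients and standard errors
print(round(equality_pvalue(0.30, 0.10, 0.10, 0.10), 3))
```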
Panel A shows that the effect of TOP on our four main outcomes of interest did not differ
on the basis of the gender of the tutor. Also when we distinguish possible combinations
of gender of the tutor and gender of the student (panel B), we fail to detect significant
pairwise differences: same gender pairs do not perform significantly better or worse than
mixed gender ones.
We also find that tutors’ GPA (a proxy for their academic ability) did not significantly
affect the impact of the program: treated students benefited equally from interacting with
a tutor above and below the median GPA in their faculty.35
One possibility is that what matters is not how well a tutor does in his/her university
exams, but rather the type of program they are enrolled in. For example, tutors who are
enrolled in a STEM degree may be more effective if the subject in which the student needs
help is math rather than Italian. In Appendix Table A.XIV we estimate the impact
of the program on students’ performance, disaggregating students’ average test score into
separate scores for math (top panel), Italian (intermediate panel) and English (bottom
panel). The sample only includes treated students, since by design tutor characteristics
are only available for students who are in the TOP program. For each subject, we consider
two proxies for tutors’ proficiency in that subject. The first is whether the tutor expressed
a preference for that subject when they signed up for our program (variable ‘Volunteer
in [subject]’ in the table). The second proxy is an objective measure, specifically: being
enrolled in a STEM degree, for math performance; being enrolled in a Humanities degree,
for Italian; and having an international certification in English (e.g., TOEFL, IELTS,
etc.) for English. Interestingly, we do not find significant differences for math and Italian,
nor a consistent pattern, when we look at what they volunteered to teach and the faculty
they are enrolled in. For English, both proxies point to a higher effectiveness of more
‘competent’ tutors, although neither difference is statistically significant.
35 We standardize the GPA within faculty to account for potential differences in grading criteria, number
of credits, etc. across programs.
Finally, the last two panels of Table VIII capture tutors’ pro-social attitudes and
motivation. We compare the impact of tutors with and without previous volunteering
experience (panel D) and of tutors who, when asked at baseline what motivated them to take
part in the project, replied “To make myself useful” (variable ‘Help others’ in panel E).
Note that our tutors are generally highly pro-social: 82 percent had previous experience as
a volunteer and 83 percent joined TOP to be useful to others. For this reason, it is not
too surprising that we do not detect significant differences in the outcomes of students
who were assigned different types of tutors. The one outcome for which tutors’ motivation
seems to make a difference is aspirations (column 2), where the positive impact is entirely
driven by the more pro-social tutors.
5.5 Heterogeneous Treatment Effects using Causal Forest
To complement the above analysis with a more systematic approach, we estimate hetero-
geneous treatment effects using a causal forest algorithm. We follow Athey and Imbens
(2016) and Wager and Athey (2018) and apply their method to understand who benefits
most from the tutoring. Online Appendix C describes our methodology in detail.
In a nutshell, we estimate the Conditional Average Treatment Effect (CATE), including in
the causal forest both demographics (e.g., gender, immigrant dummy, parental education and
occupation, etc.) and other controls (e.g., school grades, interest in different subjects,
familiarity with computers, etc.). We use the predicted treatment effect for each
individual, given the covariates, to investigate treatment heterogeneity, dividing the
sample into two groups: the top and bottom halves of the predictions.
Appendix Table A.XV reports the mean of each baseline characteristic for the students
above and below the median of predicted impact. Overall, the results are consistent
with our analysis in the previous sub-sections. Students who have learning disorders,
lower initial grades and parents with less skilled occupations (e.g., blue collar mothers)
are over-represented among the students with the highest predicted impact on perfor-
mance. Immigrants and students with blue collar mothers are over-represented among
the students with the highest enhancement in their well-being.
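The median split and covariate comparison described above can be sketched as follows. The per-student CATE predictions are taken as given; in practice they would come from a causal forest implementation, which is not reproduced here. The predictions, covariate names, and function name are hypothetical. Students exactly at the median are assigned to the bottom half, which is an arbitrary choice.

```python
from statistics import mean, median


def compare_halves(cate, covariates):
    """Mean of each covariate above vs. at-or-below the median predicted effect."""
    cut = median(cate)
    top = [i for i, c in enumerate(cate) if c > cut]
    bottom = [i for i, c in enumerate(cate) if c <= cut]
    return {
        name: (mean(vals[i] for i in top), mean(vals[i] for i in bottom))
        for name, vals in covariates.items()
    }


# Hypothetical predicted effects and binary baseline characteristics
cate = [0.45, 0.10, 0.30, 0.05, 0.60, 0.20]
covariates = {
    "learning_disorder": [1, 0, 0, 0, 1, 0],
    "immigrant": [0, 1, 1, 0, 1, 0],
}
for name, (hi, lo) in compare_halves(cate, covariates).items():
    print(f"{name}: top half {hi:.2f}, bottom half {lo:.2f}")
```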
Overall, the most disadvantaged children seem to have benefited the most from the
tutoring. However, heterogeneity depending on parents’ or students’ characteristics is not
stark. It is worth emphasizing that the sample of students included in TOP had been
already selected by school principals and teachers on the basis of their being most in
need of the tutoring intervention. This may have led to lower heterogeneity in treatment
effects.
Finally, Appendix Table A.XVI reports the characteristics of the most and least effective
tutors for all four main outcomes.36 Overall, also for tutor characteristics, we do not find
evidence of strong heterogeneity based on observable characteristics.
36 In this table we restrict the sample to treated students, as students in the control group are not
assigned a tutor.
6 The Impact of TOP on Tutors
The primary purpose of the project was to improve outcomes for students who were the
direct beneficiaries of the intervention. However, the volunteering experience of being a
TOP tutor may have affected tutors’ capacity to empathize, as well as their perception of
the relative importance of hard work versus luck in achieving success in life. We
administered a short questionnaire (as described in Section 3.3) to volunteers who applied
to the TOP tutoring program, regardless of whether they were assigned a student. As
mentioned in Section 2.3.2, the assignment of tutors to students was random, conditional
on a set of baseline characteristics used for the allocation of tutors (e.g., subject and time
availability). Around half of the respondents who completed the endline questionnaire
had been randomly assigned to a student. Appendix Table A.VIII shows that the char-
acteristics of tutors who experienced TOP and who did not experience TOP are overall
balanced once we take into account the allocation criteria.
[Insert Table IX ]
Table IX shows the impact of tutoring on the two key outcomes (the Empathy index
and the Hard Work index) described in Section 3.3, when controlling for the factors used
in the assignment of tutors to students (time and subject availability, previous training
and tutoring experience, and regular enrollment in university).
We find that participating in TOP increased tutors’ empathy by 3.4 percentage points,
a 0.27 SD increase compared to volunteers who were not assigned a student to supervise.
We do not find any economically or statistically significant effect on tutors’ perceptions
of the role of hard work to achieve success in life.37
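The two figures reported for empathy together imply the scale of the Empathy index. Dividing the effect in percentage points by the effect in SD units recovers the standard deviation used for standardization. That this is the SD among volunteers not assigned a student is an assumption, since the passage does not state it:

```latex
\sigma_{\text{empathy}} \approx \frac{3.4\ \text{pp}}{0.27} \approx 12.6\ \text{pp}
```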
Finally, we also asked tutors whether they were satisfied with their tutoring experience
and whether they would be interested in volunteering again during the following academic
year.38 Appendix Table A.XVIII shows the baseline characteristics of students and tutors
correlated with a higher level of satisfaction (columns 1-2) and willingness to tutor again
(columns 3-4). Tutors matched to students whose parents have a blue-collar job report
higher satisfaction and willingness to repeat the experience. Being enrolled in STEM (and,
marginally, in Economics) is negatively correlated with the intention to tutor for an extra
year, possibly due to exam or program requirements rather than dissatisfaction (in fact,
column 1 shows that tutors enrolled in Economics report higher satisfaction than others).
Similarly, a higher GPA is negatively associated with the intention to tutor for an extra
year, but not with satisfaction.
37 For completeness, we present the ordered logit results for the individual questions used to build the
empathy and hard work indexes in Appendix Table A.XVII. Panel A shows the results using the endline
collected in September, six months after the beginning of the intervention, while Panel B shows the results
when imputing the value for the first endline collected in June for those tutors who did not complete the
second endline. The results are qualitatively and quantitatively very similar with both samples.
38 The original scale for the response on satisfaction was from 1 to 5, where 1 indicated “not at all
satisfied” and 5 “very satisfied”. For the question on whether tutors would like to volunteer during the
following academic year the answer was from 1 to 3, where 1 indicates “no”, 2 “I need to think about it”,
and 3 “yes”.
7 Conclusions
School closure due to the COVID-19 outbreak has created massive learning losses and
adverse psychological effects for children, especially the most vulnerable and those from
low socioeconomic background (Agostinelli et al., 2020; Azevedo et al., 2020; Orgilés et al.,
2020; Golberstein et al., 2020). In this paper, we show that online tutoring can be an
effective tool to help students during the pandemic, improving their academic outcomes
but also their psychological well-being and development of socio-emotional skills.
We exploit the over-subscription of an innovative online tutoring program in Italy, “TOP”,
by schools and students to evaluate its impact using a randomized controlled trial. We
find that the one-on-one support provided virtually by volunteer university students for
around 5 weeks increased performance in a standardized test by 0.26 SD, psychological
well-being by 0.17 SD, and aspirations and socio-emotional skills by 0.15 and 0.14 SD,
respectively.
In-person tutoring, especially when implemented by professionals and teachers and/or
for several days per week, has proved highly effective in several contexts (Nickow et al.,
2020; Fryer Jr, 2017). However, these programs are widely viewed as “too costly to be
undertaken on a large scale” (Ander et al., 2016). Our TOP intervention achieves
sizeable results on learning and other life outcomes while keeping costs very low.
The program relies on volunteer university students as tutors, driven mainly by intrinsic
motivation and supported by a team of pedagogical experts. Volunteer tutors represent a
viable and effective solution to reach a large number of students in need of support. The
overall cost of the program per pupil was around 50 euro, covering the organizational and
pedagogical support.39
39 This excludes the research costs, namely the incentives for families to complete the endline survey
and the salaries of the enumerators who supervised the endline test score data collection.
Even after schools reopen following the COVID-19 outbreak, virtual tutoring implemented
by volunteer university students may provide an effective tool to help vulnerable children
and prevent inequalities from emerging, in a cost-effective way. Indeed, the design of the
intervention easily adapts to ‘normal’ school times, when the role of tutors may be that of
helping to target learning at the right level for students. Future evidence from other
countries and time periods may help us better understand the scope for exploiting this
versatile educational tool.
References
Agcom (2019). Educare digitale lo stato di sviluppo della scuola digitale un sistema
complesso ed integrato di risorse digitali abilitanti. Studio del Servizio Economico-
Statistico Agcom.
Agostinelli, F., Doepke, M., Sorrenti, G., Zilibotti, F., et al. (2020). When the great
equalizer shuts down: Schools, peers, and parents in pandemic times. IZA Discussion
Paper No. 13965.
Alan, S., Boneva, T., and Ertac, S. (2019). Ever failed, try again, succeed better: Re-
sults from a randomized educational intervention on grit. The Quarterly Journal of
Economics, 134(3):1121–1162.
Ander, R., Guryan, J., and Ludwig, J. (2016). Improving academic outcomes for disadvan-
taged students: Scaling up individualized tutorials. The Hamilton Project – Brookings.
Angrist, N., Bergman, P., and Matsheng, M. (2020). School’s out: Experimental evi-
dence on limiting learning loss using “low-tech” in a pandemic. NBER Working Paper,
(w28205).
Ashraf, N., Bandiera, O., and Jack, B. K. (2014). No margin, no mission? A field
experiment on incentives for public service delivery. Journal of Public Economics,
120:1–17.
Athey, S. and Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects.
Proceedings of the National Academy of Sciences, 113(27):7353–7360.
Azevedo, J. P., Hasan, A., Goldemberg, D., Iqbal, S. A., and Geven, K. (2020). Simulating
the potential impacts of COVID-19 school closures on schooling and learning outcomes: A
set of global estimates. Policy Research Working Paper Series 9284, The World Bank.
Bacher-Hicks, A., Goodman, J., and Mulhern, C. (2020). Inequality in household adap-
tation to schooling shocks: Covid-induced online learning engagement in real time.
Journal of Public Economics, 193:104345.
Bandura, A., Freeman, W., and Lightsey, R. (1999). Self-efficacy: The exercise of control.
Springer.
Banerjee, A., Banerji, R., Berry, J., Duflo, E., Kannan, H., Mukherji, S., and Walton, M.
(2015). Teaching at the right level: Evidence from randomized evaluations in India.
NBER Working Paper, 22746.