
Problems Teaching Listening Online


Marc Jones
Department of Global Innovation Studies, Faculty of Global and Regional Studies, Toyo University

Due to the move to online remote teaching, teachers and students have been required to change the way they undertake teaching and learning. Predictably, these changes have led to difficulties, in particular when teaching listening. Problems discussed include: use of streaming media playback resulting in effects that can hinder listening comprehension; learner use of subtitles and potential overdependence as well as negative effects upon phonological acquisition; and assessment of listening when learners are capable of cheating. Solutions suggested are shifts in control of media, ways to mitigate cognitive load (Sweller et al., 2011) due to environmental factors, form-focused interventions for listening difficulties, creation of websites to store files or links to media, and options for assessment that use a collaborative, humanistic approach. While important for online listening pedagogy, many of the suggested interventions may also be useful for classroom instruction.

KEYWORDS: listening, online, pedagogy, infrastructure, assessment

According to survey-based research carried out five years ago, most teachers in Japan want more training in teaching listening (Jones, 2016). Given the strains and stresses of post-COVID-19 teaching, would the proportions be even higher and would this be generalizable to contexts outside Japan? Arguably this is highly probable. According to a survey with a very small sample, university English teachers did not receive much training to prepare for online remote teaching (Jones, 2020c). This is likely to exacerbate problems further. In this article I detail some of the problems that may arise in teaching listening online and some of the possible solutions or mitigations that teachers can use.

Streaming video can be ‘bumpy’ in Zoom according to some of my students. It can also be difficult to stream video on other internet meeting sites. This can result in audiovisual lag, with information on screen not corresponding to the correct audio. Such lags can generate McGurk effects (McGurk & MacDonald, 1976), where perceived phonemic information is actually different to the audio. An example of this may be the audio produced being /bi/, the visual information being that of /gi/, and the resulting perception being /di/ (Green & Kuhl, 1989). When listening is already a skill that students have problems with, and they may feel that their perception cannot be fully relied upon, such effects are likely to add to negative feelings regarding the possibility of task completion. Additionally, even when McGurk effects are not produced, a mismatch between what is heard and what is seen can create more work for the brain (Kolozsvári et al., 2019). Some learners are likely to notice the mismatch and move on, whereas others will become distracted, which may lead to feelings of being overwhelmed, particularly if such effects are frequent.

Ways to get around this may be to use an alternative method of delivery. Instead of teachers controlling a video stream through a meeting service, it may be more beneficial to share the stream and time codes with learners. Alternatively, if it is a video on YouTube (YouTube, no date) the start and end times can be manipulated by adding codes at the end of the URL in embed codes (detailed in Jones, 2020b). If teachers use a type of internet site called a learning management system (LMS) for distributing files to students and managing student assignments, they may also use downloaded video and edit it into sections. Some popular LMS are Moodle (Moodle, no date), Canvas (Instructure, no date) and Google Classroom (Google, no date). If using video from a paid streaming site such as Netflix, legal issues aside, it may be the case that students already have their own accounts with the service and they can be given season and episode numbers, and possibly time codes.

If the only reason for teacher control of video files or streams is to prevent student access to L1 subtitles, this raises the question of how we expect learners to take responsibility for their own learning choices. Making clear the expectation of using no subtitles on a first play, in order to practice listening to the target section, is likely to lead to the vast majority of students following instructions. Subtitles themselves have mixed merits in listening pedagogy.

The advantages of subtitles, according to Wisniewska & Mora (2020), are that L1 subtitles improve understanding of meaning and L2 subtitles improve pronunciation. This is especially useful in autonomous listening because the process should be enjoyable to ensure that it is a repeated activity. However, the effects of orthography in subtitles may interfere with learning. Sokolović-Perović et al. (2019) found an effect on phoneme length among Japanese learners of English, in that phonemes represented by double letters were lengthened in production despite there being no long consonants in English. Bassetti (2007) found that Pinyin (romanized text) may interfere with learning of Chinese by ‘non-native speakers’. In both Sokolović-Perović et al. (2019) and Bassetti (2007), learners were substantially familiar with the script involved, even if their reading of the words was not orthodox in the case of Sokolović-Perović et al. (2019). However, Showalter and Hayes-Harb (2013) found that novel orthography (tone marks) can be positive for English learners of Chinese. It may be the case that the unfamiliar script means that assumptions about that orthography are non-existent and therefore cannot be carried over to the L2 schema. Whether subtitles actually improve listening is difficult to confirm, but Wisniewska & Mora (2020) found no significant benefits for phonological accuracy in perception. Therefore, it is unlikely that orthography is going to aid learning of new sounds among learners.

So why do I encourage listeners to use subtitles if they are not going to learn new sounds? The subtitles are not there as a learning aid, but more as a way to check what was heard on the first pass through the text. Alternatively, if learners do use subtitles while listening, there is the small chance that they will hear something that does not match with their expectations based upon the subtitles they read. This mismatch is salient, and therefore noticeable. Whether or not Schmidt’s (1990) Noticing Hypothesis is correct, learners do need to perceive a form in order to be able to process it (Pienemann, 1997) and therefore need to pay attention to form if it is unfamiliar. Therefore, through this process it is hoped that the mismatch becomes a learning episode.

As stated above, students have reported problems with the audio in streaming shared video recordings through video chat services. Though this appears to be a problem, it is actually a sign that teachers can free themselves of the need to regulate the recorded media that their students listen to. By providing a link in a chat box or a file in a learning management system, teachers can let students access the recording themselves and can provide a time limit for everyone to regroup in the virtual classroom. If more time is made available than is required to watch the recording in real time, it is also possible for students to revisit problematic sections, which can foster reflection on what exactly is difficult about that particular part of the text. Additionally, websites can be created easily using numerous free services and are useful places to store links or recordings that are not sensitive. Freeing teachers of the responsibility of managing the media and passing it on to learners opens up a pathway to greater learner responsibility and agency overall, and therefore fewer cumbersome responsibilities (alluded to as ‘monkeys’ in Waters, 1998) for the teacher. While teachers may be unsure of learners’ capacity to take on greater agency and therefore autonomous learning, Benson (2013, p. 840) reminds us that “autonomous language learning is more likely to be self-initiated and carried out without the intervention, or even knowledge, of language teachers.”

Regarding difficult sections of recordings, Field (2008, p. 90) recommends ‘micro-listening’ which “ideally feature single sentences, pairs of sentences or very short sections of text, drawn from published, off-air or internet recordings.” Essentially, this involves simple decoding, or drawing attention to features of spoken language that cause difficulties. By exposing them in isolation to learners, they become less difficult, because there is no need to attempt to retain the information mentally while also paying attention to upcoming auditory information. With fewer distractions, one would hope that the features in micro-listening can be salient enough for noticing (Schmidt, 1990). However, this is not only suitable for classroom work but is also a way for learners to troubleshoot their listening difficulties independently.

I have found micro-listening to work well with partial YouTube (no date) embeds in a Moodle (no date) page or another webpage such as a WordPress (no date) blog (partial embeds do not work in Google applications). To do this, minor adjustments need to be made in the URL to provide the start and end time in seconds, as referenced above in Jones (2020b). The minimum length of a partial embed is 2 seconds. If learners can provide time codes, this can be carried out as a reactive Focus on Form (FonF) (Long, 2015), and so has an even greater connection to the lesson. Additionally, if learners are taught how to create such embeds, they can revisit their own problematic sections in their own time. While embeds shorter than 2 seconds are not possible, with editing software it is both possible and realistic to use clips shorter than this. However, this is likely to be time consuming, particularly for teachers unfamiliar with multimedia software. Due to this, a case-by-case evaluation needs to be made regarding whether the effort is worth the potential pedagogical benefit. When the video is not embedded, learners may also be able to share an extremely short excerpt by sharing screens and sound and using the pause button.
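The URL adjustment described above can be sketched in code. This is a minimal illustration rather than the procedure set out in Jones (2020b): it assumes YouTube's standard embed address and its `start` and `end` player parameters, both given in whole seconds. The function names and the `VIDEO_ID` placeholder are hypothetical, and the 2-second minimum is taken from the observation above.

```python
from html import escape
from urllib.parse import urlencode

# Minimum partial-embed length reported in the article (an observation, not an API constant).
MIN_CLIP_SECONDS = 2


def to_seconds(time_code: str) -> int:
    """Convert a time code such as '1:23' or '0:01:23' into whole seconds."""
    seconds = 0
    for part in time_code.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def partial_embed_url(video_id: str, start: str, end: str) -> str:
    """Build an embed URL that plays only the section between two time codes."""
    start_s, end_s = to_seconds(start), to_seconds(end)
    if end_s - start_s < MIN_CLIP_SECONDS:
        raise ValueError(f"Clip must be at least {MIN_CLIP_SECONDS} seconds long")
    query = urlencode({"start": start_s, "end": end_s})
    return f"https://www.youtube.com/embed/{video_id}?{query}"


def embed_iframe(video_id: str, start: str, end: str) -> str:
    """Wrap the URL in an iframe tag for pasting into an LMS page or blog post."""
    url = escape(partial_embed_url(video_id, start, end), quote=True)
    return f'<iframe width="560" height="315" src="{url}" allowfullscreen></iframe>'


if __name__ == "__main__":
    # A learner reports trouble between 1:23 and 1:30 of a video.
    print(partial_embed_url("VIDEO_ID", "1:23", "1:30"))
    # prints https://www.youtube.com/embed/VIDEO_ID?start=83&end=90
```

Because the functions accept time codes as learners would read them off the player, a learner's report such as "1:23 to 1:30" can be turned into a micro-listening clip without any manual arithmetic.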

After micro-listening, it is probably advisable to return to the recording at the macro level and allow students to hear the shortened excerpt in context again. As with recasts in spoken error treatment, teachers usually intend the treatment to be not the end point but the start of rectifying miscommunication. With micro-listening as FonF, there ought to be an opportunity to reconnect the excerpt to its original context and then aim to rectify the miscommunication that occurred while trying to parse the text. This can provide an affordance for reflection on aspects of the listening process or features of speech that cause difficulties, and on potential strategies to try in order to overcome those difficulties.

Due to the high cognitive load (Sweller et al., 2011) that can be involved in listening due to phonemic discrimination, lexical segmentation, parsing of message and semantic and/or pragmatic evaluation, the amount of listening work assigned needs to be carefully considered. If there is too much in the stream of speech that needs to be attended to in working memory, this can cause students to feel overwhelmed. Being overwhelmed is likely to be stressful and thus to affect working memory (Baddeley, 1992), because attention is divided between one’s own affective state and the speech stream. Therefore, if teachers provide breaks in the speech stream, and thereby remove the need to attend to it for prolonged periods, learners can focus upon listening only, and teachers may then increase the length of listening periods in order to train working memory to handle L2 speech over a longer period. Additionally, by teaching a systematic notetaking method, teachers can help students develop the skills to manage information in longer streams of L2 speech that their working memory alone cannot handle.

As detailed in an article written about difficulties teaching listening in a physical environment (Jones, 2020a), several environmental factors can impact learners’ working memory and therefore the executive function relating to task focus as well as the phonological loop (Baddeley, 1992), which is used to attend to sound and speech. We cannot control the actual learning environment so we must advise on it. Whereas in the physical classroom temperature and air flow are regulated centrally or by someone physically present who notices their impact, it is obviously not possible for teachers online to notice factors in students’ physical environments. Additionally, if there are distractions present, this can be another factor affecting quality of attention. However, all of these can be mitigated with a short reminder at the beginning of a lesson. It may appear to be overly patrician and even patronizing at first mention, but when considering that students may become absorbed in solving the problems of their own learning and language acquisition, it is a useful prompt. Speaking from my own experience, it may also be useful for teachers who may otherwise be tempted to sit in the same position for several online teaching sessions in a day with little movement or air flow in their room.

There has been a move, particularly in higher education and particularly in North America, toward online proctoring software for examinations (Moro, 2020; Watters, 2020). While there has been criticism of this, as well as student protest (Harwell, 2020), it appears to continue unabated. However, particularly for the purposes of an ESL/EFL/ESOL course, sleepwalking toward a situation where we assume the presence of bad actors is probably counterproductive. Cormier (2021) describes an arms-race situation that emerges in the higher education context, with students likely to use websites that provide correct answers to exams in order to deal with the increasing difficulty and workload involved in keeping up with teachers’ increasing ‘rigour’ due to the anxiety of online teaching. This may be due to the lack of means for heuristic formative assessment, such as whether students look confused, whether they appear to be struggling, how much they appear to be writing notes, etc. However, as language teachers and in particular listening teachers, we should hope that our students are communicating in the target language or collaborating on ways to deal with the comprehensibility and intelligibility challenges that different examples of spoken language provide.

It is my belief that we should assume that students are collaborating during listening assessment. In fact, this is a natural condition for many of the listening tasks we assign in English for (General) Academic Purposes (EAP, or EGAP) and General English, where collaboration with peers to make sense of a difficult lecture or a speech act that is not wholly comprehensible is not only common but assumed to be good practice. The trade-off with this is a loss of granularity in assessment, which may be difficult to justify in comparison to a standardised test. The factor to consider in this is whether we are educating students to solve problems they are likely to face on an ongoing basis, or whether we are educating students to solve problems they are likely to face only during their institutionalised education.

Some approaches to listening that have worked for me are provided below, with the caveat that they are unlikely to work in every context due to an array of factors such as student orientation toward autonomous learning, technology familiarity, general language proficiency, etc.

Independent listening journals have been a useful tool for me to assess my students’ listening skills development because they show how much the listening skills I teach in class are portable to an independent listening context. I require a set of notes taken during listening, a reaction to and a summary of the text, as well as new language items learned from it. I also ask students to log whether they used subtitles to assist their listening or whether they listened without subtitles. The final stage is students logging the difficulties they faced when listening to that particular text, as specifically as possible, and considering strategies they could employ to work beyond these difficulties. This stage of self-reflection fosters a greater sense of responsibility for one’s own learning, and teacher assistance is requested in a more positive, specific way, which enables more effective instruction in solving listening problems. Learners also use the strategies and reflect upon them in a way that allows them to develop longer-term developmental strategies for their listening skills, such as an intention to listen to a wider variety of Englishes or a wider range of genres to gain greater familiarity. Furthermore, because the journals are kept over a period of time, collaboration occurs as a way of providing interesting listening material and experiences between students, and any poor academic behaviour such as plagiarism is easy to observe through a simple journal comparison. This reduces the need for ‘policing’ student behaviour, because any infraction of rules is not only documented but also submitted by the students themselves without relying on surveillance technology.

In the online environment, the use of tests becomes somewhat more difficult, or at least different. When conducting listening tests, the use of an LMS such as Moodle (no date) has been useful, if only because it can be used as file storage and the medium for the test itself. Additionally, if test questions are input with answers, they can be automatically marked. This is a lot of work upfront but can save time later. Additionally, longer recording clips can be used with summarizing tasks. While summarizing tasks cannot be graded automatically, placing key words in the answer section typically used for automatic marking can serve as a reminder and thus cut down the time taken to mark rather complex test questions.
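One way to reduce the upfront work of entering questions with answers is to prepare them as text and import them in bulk. The sketch below assumes a Moodle site that accepts question imports in GIFT format; the article does not say how its own test questions were entered, so this is an illustration only. The function names and the sample question are hypothetical.

```python
# Characters with special meaning in GIFT that must be backslash-escaped.
GIFT_SPECIAL = "~=#{}:"


def gift_escape(text: str) -> str:
    """Backslash-escape GIFT control characters in question or answer text."""
    return "".join("\\" + ch if ch in GIFT_SPECIAL else ch for ch in text)


def short_answer_question(title: str, prompt: str, answers: list[str]) -> str:
    """Format one short-answer question; every listed answer is marked correct."""
    if not answers:
        raise ValueError("At least one accepted answer is required")
    accepted = " ".join(f"={gift_escape(answer)}" for answer in answers)
    return f"::{gift_escape(title)}::{gift_escape(prompt)} {{{accepted}}}"


def build_gift(questions: list[str]) -> str:
    """Join questions with blank lines, which GIFT uses to separate them."""
    return "\n\n".join(questions) + "\n"


if __name__ == "__main__":
    question = short_answer_question(
        "Clip 1 Q1",
        "What time does the train leave?",
        ["9:15", "nine fifteen"],  # accepted spellings of the same answer
    )
    print(build_gift([question]))
```

Listing several accepted forms of an answer matters for listening tests, where learners may write a number as digits or as words and both should be marked correct.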

One of the main issues with teaching listening online is considering the locus of control in the lessons. Teachers may be accustomed to being responsible for control of recorded media, the modality it is shared in, and also how and whether parts of it are revisited. By shifting this to students, teachers not only create a more egalitarian learning environment in general, but may also assist in developing responsibility for learning among the individual students rather than creating conditions for overreliance upon teacher intervention. Obviously, more research is required into the conditions of online learning and the different types and magnitudes of autonomy learners experience and how these translate to language acquisition. However, my hope is that with greater learner collaboration in the online learning environment, teachers and students can co-create something more equitable and more conducive to listening skill development than appears to be the case with existing models of instruction both in the classroom and online.


Baddeley, A. 1992. Working Memory. Science. 255(5044), pp.556–559.

Bassetti, B. 2007. [Post-print] Effects of hanyu pinyin on pronunciation in learners of Chinese as a foreign language. In: Guder, A. and Jiang, X. and Wan, Y. (eds.) The Cognition, Learning and Teaching of Chinese Characters. Beijing: Beijing Language and Culture University Press.

Benson, P. 2013. Learner Autonomy. TESOL Quarterly. 47(4), pp.839–843.

Cormier, D. 2021. After Cheggification – A way forward (Part 1). 10 February. Dave’s Educational Blog. [Online]. [Accessed 11 February 2021]. Available from:

Field, J. 2008. Listening in the language classroom. Cambridge, UK; New York: Cambridge University Press.

Google. [no date]. Google for Education. Solutions Built for Teachers and Students | Google for Education. [Online]. [Accessed 7 May 2021]. Available from:

Green, K. P., and Kuhl, P. K. 1989. The role of visual information in the processing of place and manner features in speech perception. Perception & Psychophysics. 45(1), pp.34–42.

Harwell, D. 2020. Cheating-detection companies made millions during the pandemic. Now students are fighting back. Washington Post. [Online]. 12 November. [Accessed 7 May 2021]. Available from:

Hayes-Harb, R., Nicol, J. and Barker, J. 2010. Learning the Phonological Forms of New Words: Effects of Orthographic and Auditory Input. Language and Speech. 53(3), pp.367–381.

Instructure. [no date]. Canvas Overview. Instructure. [Online]. [Accessed 11 June 2021]. Available from:

Iverson, P., Kuhl, P. K., Akahane-Yamada, R., Diesch, E., Tohkura, Y., Kettermann, A., and Siebert, C. 2003. A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition. 87(1), pp.B47–B57.

Jones, M. 2016. Teachers’ Beliefs and Practices Regarding Listening and Pronunciation in EFL. Explorations in Teacher Development. 23(1), pp.11–16.

Jones, M. 2020a. Exploring Difficulties Faced in Teaching Elective English Listening Courses at Japanese Universities. Listening Education Online Journal. 10(1), pp.12–20.

Jones, M. 2020b. How can I Teach Listening Online? 10 April. Freelance Teacher Self Development. [Online]. [Accessed 7 May 2021]. Available from:

Jones, M. 2020c. [Preprint] Technology Expenses and Education among University English Language Teachers. SocArXiv.

Kolozsvári, O. B., Xu, W., Leppänen, P. H. T., and Hämäläinen, J. A. 2019. Top-Down Predictions of Familiarity and Congruency in Audio-Visual Speech Perception at Neural Level. Frontiers in Human Neuroscience. 13.

Long, M. H. 1991. Focus on Form: A Design Feature in Language Teaching Methodology. In: de Bot, K., Ginsberg, R. B., and Kramsch, C. eds. Foreign Language Research in Cross-Cultural Perspective. Amsterdam: John Benjamins, pp.39–52.

McGurk, H. and MacDonald, J. 1976. Hearing lips and seeing voices. Nature. 264(5588), pp.746–748.

Moro, J. 2020. Against Cop Shit. 13 February. Jeffrey Moro. [Online]. [Accessed 7 May 2021]. Available from:

Pienemann, M. 1999. Language processing and second language development: processability theory. Amsterdam: Benjamins.

Schmidt, R. W. 1990. The Role of Consciousness in Second Language Learning. Applied Linguistics. 11(2), pp.129–158.

Showalter, C. E. and Hayes-Harb, R. 2013. Unfamiliar orthographic information and second language word learning: A novel lexicon study. Second Language Research. 29(2), pp.185–200.

Sokolović-Perović, M., Bassetti, B. and Dillon, S. 2019. English orthographic forms affect L2 English speech production in native users of a non-alphabetic writing system. Bilingualism: Language and Cognition. 23(3), pp.1–11.

Sweller, J., Ayres, P. and Kalyuga, S. 2011. Cognitive Load Theory. New York, NY: Springer New York.

Tyler, M. D. 2019. PAM-L2 and Phonological Category Acquisition in the Foreign Language Classroom. In: Nyvad, A. M., Hejná, M., Højen, A., Jespersen, A. B., and Sørensen, M. H. eds. A Sound Approach to Language Matters: In Honor of Ocke-Schwen Bohn. Aarhus: Aarhus University, pp.607–630.

Vandergrift, L. 1997. The Cinderella of Communication Strategies: Reception Strategies in Interactive Listening. The Modern Language Journal. 81(4), pp.494–505.

Waters, A. 1998. Managing monkeys in the ELT classroom. ELT Journal. 52(1), pp.11–18.

Watters, A. 2020. Cheating, Policing, and School Surveillance. 6 October. Hack Education. [Online]. [Accessed 7 May 2021]. Available from:

Wisniewska, N. and Mora, J. C. 2020. Can captioned video benefit second language pronunciation? Studies in Second Language Acquisition. 42(3), pp.599–624.

WordPress. [no date]. WordPress. [Online]. [Accessed 7 May 2021]. Available from:

YouTube. [no date]. YouTube. [Online]. [Accessed 7 May 2021]. Available from: