Processing texts and Corpora: An introduction

Enrolment in this course is restricted. Log in to check whether you meet the requirements.

This MOOC was produced as part of the Edvance project – Digital Education Hub per la Cultura Digitale Avanzata. The project is funded by the European Union – Next Generation EU, Component 1, Investment 3.4 “Didattica e competenze universitarie avanzate”.

About this course

This MOOC introduces basic notions and techniques for text and corpus processing: textual data formats and annotation, corpus creation and management, corpus query tools, introduction to corpus research. It is designed both as a self-standing introduction, and as a quick run-through of concepts, issues and techniques to be investigated and practiced further through classroom activities.

The course is structured around four thematic Weeks, guiding you in the exploration of methods and techniques for collecting and analyzing texts and corpora (collections of texts in digital format).

  • Week 1 - Understanding digital text
  • Week 2 - Building and using plain text corpora
  • Week 3 - Building, accessing and using annotated corpora
  • Week 4 - Applications of corpus methods

Each Week has three modules and each module three units. In addition to video and textual lessons, all units include self-assessment quizzes and supplementary materials for you to better understand the course content.

Target

While the MOOC was developed with Bachelor-level students in mind, it may also interest prospective university students, Master-level and doctoral students, and life-long learners.

Outcomes

At the end of the course, the participant:

  • Knows the basic principles of text and corpus analysis;
  • Knows the basic methods of text and corpus analysis;
  • Knows the main tools for text and corpus analysis;
  • Is able to select reliable and appropriate texts in order to carry out simple linguistic analyses;
  • Is able to structure texts into corpora according to principles of representativeness, comparability, and interoperability;
  • Is able to conduct and evaluate simple linguistic studies based on textual information from corpora, applying appropriate extraction, interpretation and description procedures.

Requirements

None

Activities

The MOOC spans four weeks; each week has three modules, and each module has three units. Every module is accompanied by a self-assessment quiz, supplementary learning materials, and bibliographic references. Before the lessons start, a preliminary quiz is offered to assess awareness of the course topics and stimulate interest in them. The MOOC concludes with a final assessment test, which includes some of the questions from the self-assessment quizzes as well as new ones.

Open Badge

Participants who complete the course will be awarded an Open Badge from BESTR. Participants who log in to the platform with University of Bologna, EDUGAIN, CIE or SPID authentication and correctly answer at least 60% of the questions overall will receive an email with instructions to download their Open Badge from the BESTR website.

Subtitles

English subtitles available.

Subtitles are available for each video and can be switched on or off. To revisit key passages, you can move through the video content by clicking on the attached text.

EQF level

5

ISCED-F

0231 Language acquisition
0232 Literature and linguistics
0288 Inter-disciplinary programmes and qualifications involving arts and humanities

Categories

ENG: Arts and humanities; Information Technology and Computer Science;
ITA: Arte e discipline umanistiche; Tecnologie dell'Informazione e della Comunicazione

SDGs

4 - Quality Education

FAQ

For further information, see the FAQ page.

Course Professors

Silvia Bernardini

Bachelor’s degree in Traduzione (1997, Bologna); MPhil in English and Applied Linguistics (1999, Cantab); PhD in Translation Studies (2008, Middlesex). She is Professor of English Linguistics and Translation at the Department of Interpreting and Translation (DIT) of the University of Bologna, Italy. Her research interests focus on applications of corpus linguistics to language and translator education and on corpus-based translation studies. She teaches courses at BA and MA level and has given numerous invited talks at international conferences on these topics.

Daniele Polizzi

Bachelor’s degree in Lingue e Letterature/Studi Interculturali (2020, Palermo); Master’s degree in Specialized Translation (2023, Bologna). He is a doctoral student in Translation, Interpreting and Intercultural Studies at the University of Bologna, specialising in English Linguistics and Translation. He holds a 1st level Master’s in clinical linguistics (2023) and worked as a research assistant in the “UNITE” PRIN project (2024). His research interests focus on corpus linguistics and corpus-assisted translation studies.

Novella Tedesco

Bachelor’s degree in Mediazione Linguistica e Culturale (Università L’Orientale, Naples); Master’s degree in Specialised Translation (2022, Bologna). She is a doctoral student in Translation, Interpreting and Intercultural Studies at the University of Bologna, specialising in English Linguistics and Translation. She received specialised training in FAIR data management, Open Science and AI for academic purposes. Her research interests include sociolinguistics, corpus linguistics, language teaching and linguistic ethnography.

Collaborators

Thanks to: Olga Arsić, Viktória Víg, Eszter Víg.