Bibliography

Speech corpora and spoken language corpora



Speech corpora and spoken language corpora: general works

Speech corpora: general works

Carré, R. (1991). Los bancos de sonidos. In J. Vidal Beneyto (Ed.), Las industrias de la lengua (p. 108–118). Fundación Sánchez Ruipérez; Pirámide.

Carré, R. (1992). Speech databases. In W. A. Ainsworth (Ed.), Advances in speech, hearing and language processing (Vol. 2, p. 199–216). JAI Press.

Draxler, C. (2000). Speech databases. In F. van Eynde and D. Gibbon (Ed.), Lexicon development for speech and language processing (p. 169–206). Kluwer. https://doi.org/10.1007/978-94-010-9458-0

Draxler, C., Harrington, J. and Schiel, F. (2017). Towards the next generation of speech tools and corpora. Computer Speech & Language, 46, 175–178. https://doi.org/10.1016/j.csl.2017.05.007

Gibbon, D., Moore, R. and Winski, R. (Ed.). (1998). Spoken language system and corpus design. Mouton de Gruyter. https://doi.org/10.1515/9783110809817

1.– User’s guide; 2.– System design; 3.– SL corpus design; 4.– SL corpus collection; 5.– SL corpus representation.

Lamel, L. and Cole, R. A. (1997). Spoken language corpora. In R. A. Cole, J. Mariani, H. Uszkoreit, A. Zaenen and V. Zue (Ed.), Survey of the state of the art in Human Language Technology (p. 450–454). Cambridge University Press; Giardini. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.366.9300

Llisterri, J. (1996). Els corpus lingüístics orals. In L. Payrató, E. Boix, M.-R. Lloret and M. Lorente (Ed.), Corpus, corpora: Actes del 1r i 2n coŀloquis lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2) (p. 27–70). Promociones y Publicaciones Universitarias. http://hdl.handle.net/2445/111985

Llisterri, J. (1999). Corpus orals per a la fonètica i les tecnologies de la parla. Actes del I Congrés de Fonètica Experimental / Actas del I Congreso de Fonética Experimental / Proceedings of the I Congress of Experimental Phonetics. Tarragona, 22, 23 i 24 de febrer de 1999 (p. 27–38). Universitat Rovira i Virgili; Universitat de Barcelona. https://joaquimllisterri.cat/publicacions/Resum_tarragona_99.html

Llisterri, J., Machuca, M. J., de la Mota, C., Riera, M. and Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del discurso oral, 8, 289–325. https://joaquimllisterri.cat/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

Pols, L. C. W. (1987). Speech technology and corpus linguistics. In W. Meijs (Ed.), Corpus linguistics and beyond: Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora (p. 285–294). Brill.

Schiel, F., Draxler, C., Baumann, A., Ellbogen, T. and Steffen, A. (2012, March 21). The production of speech corpora (Version 2.5). Bavarian Archive for Speech Signals. https://doi.org/10.5282/ubm/epub.13693

van Son, R. J. J. H. (2017, October 30). Notes on corpus construction. University of Amsterdam, Phonetic Sciences. https://www.fon.hum.uva.nl/rob/NotesOnCorpora/NotesOnCorpusConstruction.pdf

The working corpus

Spoken language corpora: general works

Adolphs, S. and Carter, R. (2013). Spoken corpus linguistics: From monomodal to multimodal. Routledge. https://doi.org/10.4324/9780203526149

I.– Monomodal spoken corpus analysis: 1.– Making a start: Building and analyzing a spoken corpus; 2.– Corpus and spoken interaction: Multi-word units in spoken English; 3.– From concordance to discourse: Responses to speakers; 4.– Case studies in applied spoken corpus linguistics; II.– Multimodal spoken corpus analysis: 5.– Sound evidence: Prosody and spoken corpora; 6.– Moving beyond the text; 7.– Developing a framework for analysing ‘headtalk’ and ‘handtalk’: First steps; 8.– Future directions.

Alcántara, M. (2008). Los retos en el análisis de los corpus de última generación. In R. Monroy and A. Sánchez (Ed.), 25 años de Lingüística Aplicada en España: hitos y retos / 25 years of Applied Linguistics in Spain: Milestones and challenges (p. 701–706). Servicio de Publicaciones de la Universidad de Murcia. https://www.um.es/lacell/aesla/contenido/pdf/6/alcantar.pdf

McCarthy, M. and O’Keeffe, A. (2013). Analyzing spoken corpora. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (p. 104–112). John Wiley & Sons. https://doi.org/10.1002/9781405198431.wbeal0028

Thompson, P. (2005). Spoken language corpora. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice. Oxford Text Archive. http://hdl.handle.net/20.500.12024/2951

Wichmann, A. (2008). Speech corpora and spoken corpora. In A. Lüdeling and M. Kytö (Ed.), Corpus linguistics: An international handbook (Vol. 1, p. 187–207). Walter de Gruyter.

Corpus linguistics and language resources: general works

Corpus linguistics and language resources: journals


Speech corpora and spoken language corpora: edited collections and conference proceedings

Durand, J., Gut, U. and Kristoffersen, G. (Ed.). (2014). The Oxford handbook of corpus phonology. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.001.0001

1.– Introduction (J. Durand, U. Gut, G. Kristoffersen); I.– Phonological corpora: Design, compilation, and exploitation: 2.– Corpus design (U. Gut, H. Voormann); 3.– Data collection (B. Birch); 4.– Corpus annotation: Methodology and transcription systems (E. Delais-Roussarie, B. Post); 5.– On automatic phonological transcription of speech corpora (H. Strik, C. Cucchiarini); 6.– Statistical corpus exploitation (H. Moisl); 7.– Corpus archiving and dissemination (P. Wittenburg, P. Trilsbeek, F. Wittenburg); 8.– Metadata formats (D. Broeder, D. van Uytvanck); 9.– Data formats for phonological corpora (L. Romary, A. Witt); II.– Applications: 10.– Corpus and research in phonetics and phonology: Methodological and formal considerations (E. Delais-Roussarie, H. Yoo); 11.– A corpus-based study of apicalization of /s/ before /l/ in Oslo Norwegian (G. Kristoffersen, H. G. Simonsen); 12.– Corpora, variation, and phonology: An illustration from French liaison (J. Durand); 13.– Corpus-based investigations of child phonological development: Formal and practical considerations (Y. Rose); 14.– Corpus phonology and second language acquisition (U. Gut); III.– Tools and methods: 15.– ELAN: Multimedia Annotation Application (H. Sloetjes); 16.– EMU (T. John, L. Bombien); 17.– The use of Praat in corpus research (P. Boersma); 18.– Praat scripting (C. Brinckmann); 19.– The PhonBank project: Data and software-assisted methods for the study of phonology and phonological development (Y. Rose, B. MacWhinney); 20.– EXMARaLDA (T. Schmidt, K. Wörner); 21.– ANVIL: The video annotation research tool (M. Kipp); 22.– Web-based archiving and sharing of phonological corpora (A. Tchobanov); IV.– Corpora: 23.– The IViE Corpus (F. Nolan, B. Post); 24.– French phonology from a corpus perspective: The PFC programme (J. Durand, B. Laks, C. Lyche); 25.– Two Norwegian speech corpora: NoTa-Oslo and TAUS (K. Hagen, H. G. Simonsen); 26.– The LeaP corpus (U. Gut); 27.– The Diachronic Electronic Corpus of Tyneside English: Annotation practices and dissemination strategies (J. C. Beal, K. P. Corrigan, A. J. Mearns, H. Moisl); 28.– The Lanchart corpus (F. Gregersen, M. Maegaard, N. Pharao); 29.– Phonological and phonetic databases at the Meertens institute (M. van Oostendorp); 30.– The VALIBEL speech database (A. C. Simon, M. Francard, P. Hambye); 31.– Prosody and discourse in the Australian Map Task Corpus (J. Fletcher, L. Stirling); 32.– A phonological corpus of L1 acquisition of Taiwan Southern Min (J. S. Tsay).

Pols, L. C. W. (Ed.). (1990). Speech input / output assessment and speech databases [Special issue]. Speech Communication, 9(4). https://www.sciencedirect.com/journal/speech-communication/vol/9/issue/4

Pols, L. C. W. and van Bezooijen, R. (Ed.). (1989). Speech Input/Output Assessment and Speech Databases, ESCA Tutorial and Research Workshop. Noordwijkerhout, The Netherlands, September 20-23, 1989. ISCA Archive. https://www.isca-speech.org/archive_open/sioa_89/index.html

Véronis, J. (Ed.). (2004). Le traitement automatique des corpus oraux [Numéro spécial]. Traitement automatique des langues, 45(2). https://tal.revuesonline.com/resnum.jsp?langue=en&editionId=539

LREC, International Conference on Language Resources and Evaluation


Speech corpora and spoken language corpora: standards

Broeder, D. and van Uytvanck, D. (2014). Metadata formats. In J. Durand, U. Gut and G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 150–165). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.008

Romary, L. and Witt, A. (2014). Data formats for phonological corpora. In J. Durand, U. Gut and G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 166–190). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.005

Tchobanov, A. (2014). Web-based archiving and sharing of phonological corpora. In J. Durand, U. Gut and G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 437–468). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.029

Wittenburg, P., Trilsbeek, P. and Wittenburg, F. (2014). Corpus archiving and dissemination. In J. Durand, U. Gut and G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 133–149). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.031

TEI, Text Encoding Initiative

Johansson, S. (1995). The approach of the Text Encoding Initiative to the encoding of spoken discourse. In G. Leech, G. Myers and J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 82–98). Longman. https://doi.org/10.4324/9781315843162

Johansson, S. (1995). The encoding of spoken texts. In N. Ide and J. Véronis (Ed.), Text Encoding Initiative: Background and context (p. 149–158). Kluwer. https://doi.org/10.1007/978-94-011-0325-1_12

TEI Consortium (Ed.). (2021). 8. Transcriptions of speech. In TEI P5: Guidelines for Electronic Text Encoding and Interchange [Version 4.2.2 Last updated on 9th April 2021]. Text Encoding Initiative. https://tei-c.org/release/doc/tei-p5-doc/en/html/TS.html


EAGLES (Expert Advisory Group on Language Engineering Standards) Spoken Language Working Group – ISLE (International Standards for Language Engineering) Natural Interactivity and Multimodality Working Group

Dybkjær, L., Berman, S., Bernsen, N. O., Carletta, J., Heid, U. and Llisterri, J. (2001). Requirements specification for a tool in support of annotation of natural interaction and multimodal data (Deliverable D11.2, Final report). IST-1999-10647 ISLE, International Standards for Language Engineering, Natural Interactivity and Multimodality Working Group. https://joaquimllisterri.cat/publicacions/Dybkjaer_et_al_01_annotation_multimodality.pdf

Dybkjær, L., Berman, S., Kipp, M., Wegener Olsen, M., Pirrelli, V., Reithinger, N. and Soria, C. (2001). Survey of existing tools, standards and user needs for annotation of natural interaction and multimodal data (Deliverable D11.1, Final Report). ISLE Natural Interactivity and Multimodality Working Group. http://spokendialogue.dk/Publications/2001f/D11.1-14.2.2001-F.pdf

Gibbon, D., Moore, R. and Winski, R. (Ed.). (1997). Handbook of standards and resources for spoken language systems. Mouton de Gruyter.

Gibbon, D., Moore, R. and Winski, R. (Ed.). (1998). Spoken language system and corpus design. Mouton de Gruyter. https://doi.org/10.1515/9783110809817

1.– User’s guide; 2.– System design; 3.– SL corpus design; 4.– SL corpus collection; 5.– SL corpus representation.

Gibbon, D., Moore, R. and Winski, R. (Ed.). (1998). Spoken language characterisation. Mouton de Gruyter. https://doi.org/10.1515/9783110804041

1.– User’s guide; 2.– Spoken language lexica; 3.– Language models; 4.– Physical characterisation and description.

Gibbon, D., Moore, R. and Winski, R. (Ed.). (1998). Spoken language reference materials. Mouton de Gruyter. https://doi.org/10.1515/9783110804508

1.– User’s guide; A.– Character codes and computer readable alphabets; B.– SAMPA computer readable phonetic alphabet; C.– SAM file formats; D.– SAM recording protocols; E.– SAM software tools; F.– EUROPEC recording tools; G.– Digital storage media; H.– Database management systems; I.– Speech standards; J.– EUROM-1 database overview; K.– Polyphone project overview; L.– European speech resources; M.– Transcription and documentation conventions for Speechdat; N.– The Bavarian Archive for Speech Signals.

Gibbon, D., Mertins, I. and Moore, R. K. (2000). Handbook of multimodal and spoken dialogue systems: Resources, terminology and product evaluation. Springer. https://doi.org/10.1007/978-1-4615-4501-9

1.– Representation and annotation of dialogue; 2.– Audio-visual and multimodal speech-based systems; 3.– Consumer off-the-shelf (COTS) product and service evaluation; 4.– Terminology for spoken language systems; 5.– Reference materials.

Llisterri, J. (1996, May). Preliminary recommendations on spoken texts (EAGLES Document EAG-TCWG-SPT/P). Expert Advisory Group on Language Engineering Standards. http://www.ilc.cnr.it/EAGLES96/spokentx/spokentx.html

Winski, R., Moore, R. K. and Gibbon, D. (1995). EAGLES Spoken Language Working Group: Overview and results. Fourth European Conference on Speech Communication and Technology (EUROSPEECH’95). Madrid, Spain, September 18-21, 1995 (Vol. 1, p. 841–845). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1995/e95_0841.html

EAGLES, Expert Advisory Group on Language Engineering Standards

SAM, Speech Assessment Methodologies

Chan, D., Fourcin, A., Gibbon, D., Granström, B., Huckvale, M., Kokkinakis, G., Kvale, K., Lamel, L., Lindberg, B., Moreno, A., Mouropoulos, J., Senia, F., Trancoso, I., t’Veld, C. and Zeiliger, J. (1995). EUROM – A spoken language resource for the EU – The SAM Projects. Fourth European Conference on Speech Communication and Technology (EUROSPEECH’95). Madrid, Spain, September 18-21, 1995 (Vol. 1, p. 867–870). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1995/e95_0867.html

Fourcin, A. and Dolmazon, J.-M. (1991). Speech knowledge, standards and assessment. Actes du XIIème Congrès International de Sciences Phonétiques / Proceedings of the XIIth International Congress of Phonetic Sciences. 19-24 août 1991. Aix-en-Provence, France / August 19-24, 1991. Aix-en-Provence, France (Vol. 5, p. 430–433). Université de Provence, Service des Publications.

Fourcin, A., Harland, G., Barry, W. J. and Hazan, V. (Ed.). (1989). Speech input and output assessment: Multilingual methods and standards. Ellis Horwood.

Gibbon, D., Moore, R. and Winski, R. (Ed.). (1998). EUROM.1 database overview. In Spoken language reference materials (p. 172–173). Mouton de Gruyter. https://doi.org/10.1515/9783110804508-016

SpeechDat

Draxler, C., van den Heuvel, H. and Tropf, H. (1998). SpeechDat experiences in creating large multilingual speech databases for teleservices. In A. J. Rubio Ayuso, N. Gallardo, R. Castro and A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 1, p. 361–366). European Language Resources Association (ELRA). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.2745

van den Heuvel, H., Bonafonte, A., Boudy, J., Dufour, S., Lockwood, P., Moreno, A. and Richard, G. (1999). SpeechDat-Car: Towards a collection of speech databases for automotive environments. Proceedings of the Workshop on Robust Methods for Speech Recognition in Adverse Conditions. Tampere, Finland, May 25-26, 1999 (p. 135–138). https://hdl.handle.net/2066/76428

van den Heuvel, H., Boudy, J., Comeyne, R., Euler, S., Moreno, A. and Richard, G. (1999). The SpeechDat-Car multilingual speech databases for in-car applications: Some first validation results. Proceedings, Sixth European Conference on Speech Communication and Technology (EUROSPEECH’99). Budapest, Hungary, September 5-9, 1999 (p. 2279–2282). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1999/e99_2279.html

van den Heuvel, H., Hall, P., Höge, H., Moreno, A., Rincón, A. and Senia, F. (2004). SALA II across the finish line: A large collection of mobile telephone speech databases from North and Latin America completed. LREC 2004. 4th International Conference on Language Resources and Evaluation. Lisbon, Portugal, 26-28 May 2004 (p. 97–100). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/summaries/288.html

Moreno, A. (2000). SALA: SpeechDat across Latin America. In C. Draxler (Ed.), XLDB – Very Large Telephone Speech Databases, Workshop Proceedings. Athens, Greece, May 29, 2000 (p. 16–19). European Language Resources Association (ELRA); Universität München, Institut für Phonetik und Sprachliche Kommunikation. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.1864

Moreno, A., Comeyne, R., Haslam, K., van den Heuvel, H., Höge, H., Horbach, S. and Micca, G. (2000). SALA: SpeechDat across Latin America. Results of the first phase. LREC 2000. 2nd International Conference on Language Resources and Evaluation (LREC 2000). Athens, Greece, 31 May - 2 June 2000. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2000/html/summary/10.htm

Moreno, A., Gedge, O., van den Heuvel, H., Höge, H., Horbach, S., Martin, P., Pinto, E., Rincón, A., Senia, F. and Sukkar, R. (2002). SpeechDat across all America: SALA II. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 16–20). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/269.htm

Moreno, A., Höge, H., Köhler, J. and Mariño, J. B. (1998). SpeechDat across Latin America: Project SALA. In A. J. Rubio Ayuso, N. Gallardo, R. Castro and A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 1, p. 367–370). European Language Resources Association (ELRA).

Moreno, A., Lindberg, B., Draxler, C., Richard, G., Choukri, K., Euler, S. and Allen, J. (2000). SpeechDat-Car: A large speech database for automotive environments. LREC 2000. 2nd International Conference on Language Resources and Evaluation. Athens, Greece, 31 May - 2 June 2000. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2000/html/summary/373.htm

van Velden, J. G., Langmann, D. and Pawlewski, M. (1996). Specification of speech data collection over mobile telephone networks (SD1.1.2/1.2.2, version 2.3). LE2-4001 SpeechDat. http://147.83.50.136/projects/BDG/docs.html

Winski, R. (1997). Definition of corpus, scripts and standards for fixed network (SD1.1.1, version 4.1). LE2-4001 SpeechDat. http://147.83.50.136/projects/BDG/docs.html


Validation of language resources

Boves, L. (1998). ELRA Validation manual for SLR (Spoken Language Resources) (Deliverable 6.1.1). ELRA Distribution Agency (ELDA). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.4354

van den Heuvel, H., Iskra, D., Sanders, E. and de Vriend, F. (2008). Validation of spoken language resources: An overview of basic aspects. Language Resources and Evaluation, 42(1), 41–73. https://doi.org/10.1007/s10579-007-9049-1

Schiel, F., Baumann, A., Draxler, C., Ellbogen, T., Hoole, P. and Steffen, A. (2012, March 21). The validation of speech corpora (version 1.10). Bavarian Archive for Speech Signals. https://doi.org/10.5282/ubm/epub.13698


Speech corpora and spoken language corpora: design and creation

Design and creation of speech corpora

✓ = Recommended reading: introductory level

Alcácer, N., Castro, M. J., Galiano, I., Granell, R., Grau, S. and Griol, D. (2004). Adquisición de un corpus de diálogo: DIHANA. In E. Sanchis (Ed.), Actas de las III Jornadas en Tecnología del Habla (3JTH). Universitat Politècnica de València, 17 al 19 de noviembre de 2004 (p. 131–136). Red Temática en Tecnología del Habla; Departamento de Sistemas Informáticos y Computación (DSIC). http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/III/actas3JTH.pdf

Andernach, T., Deville, G. and Mortier, L. (1993). The design of a real world Wizard of Oz experiment for a speech driven telephone directory information service. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 2, p. 1165–1168). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_1165.html

✓ Birch, B. (2014). Data collection. In J. Durand, U. Gut and G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 27–45). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.004

Boulianne, G., Kenny, P., Lennig, M., O’Shaughnessy, D. and Mermelstein, P. (1994). Books on tape as training data for continuous speech recognition. Speech Communication, 14(1), 61–70. https://doi.org/10.1016/0167-6393(94)90057-4

Campbell, N. (1998). Design of speech corpora for use in concatenative speech synthesis. In A. J. Rubio Ayuso, N. Gallardo, R. Castro and A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 2, p. 1309–1312). European Language Resources Association (ELRA).

Eskenazi, M., Hogan, C., Allen, J. and Frederking, R. (1998). Issues in database design: Recording and processing speech from new populations. In A. J. Rubio Ayuso, N. Gallardo, R. Castro and A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 2, p. 1289–1294). European Language Resources Association (ELRA).

Fraser, N. M. and Gilbert, G. N. (1991). Simulating speech systems. Computer Speech & Language, 5(1), 81–99. https://doi.org/10.1016/0885-2308(91)90019-m

✓ Gibbon, D., Moore, R. and Winski, R. (Ed.). (1998). SL corpus design. In Spoken language system and corpus design (p. 79–118). Mouton de Gruyter. https://doi.org/10.1515/9783110809817.79

✓ Gut, U. and Voormann, H. (2014). Corpus design. In J. Durand, U. Gut and G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 13–26). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.003

Hozjan, V., Kacic, Z., Moreno, A., Bonafonte, A. and Nogueiras, A. (2002). Interface databases: Design and collection of a multilingual emotional speech database. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 2024–2028). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/174.htm

Krstulović, S., Bimbot, F., Boëffard, O., Charlet, D., Fohr, D. and Mella, O. (2006). Optimizing the coverage of a speech database through a selection of representative speaker recordings. Speech Communication, 48(10), 1319–1348. https://doi.org/10.1016/j.specom.2006.07.002

Lamel, L., Rosset, S., Bennacef, S., Bonneau-Maynard, H., Devillers, L. and Gauvain, J.-L. (1995). Development of spoken language corpora for travel information. Fourth European Conference on Speech Communication and Technology (EUROSPEECH’95). Madrid, Spain, September 18-21, 1995 (p. 1961–1964). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1995/e95_1961.html

Machuca, M. J. (2006). Corpus para el desarrollo de sistemas de diálogo. In J. Llisterri and M. J. Machuca (Ed.), Los sistemas de diálogo (p. 61–79). Universitat Autònoma de Barcelona; Fundación Duques de Soria.

Millar, J. B. and Hawkins, S. R. (1990). Selecting representative speakers. In J. Laver, M. Jack and A. Gardener (Ed.), Speaker Characterization in Speech Technology, ESCA Tutorial and Research Workshop (p. 161–166). ISCA Archive. https://www.isca-speech.org/archive_open/scst_90/scst_161.html

✓ Niebuhr, O. and Michaud, A. (2015). Speech data acquisition: The underestimated challenge. Kieler Arbeiten zur Linguistik und Phonetik (KALIPHO), 3, 1–42. https://www.isfas.uni-kiel.de/de/linguistik/forschung/arbeitsberichte/kalipho-2-3

de Oliveira, L. C., Paulo, S., Figueira, L., Mendes, C., Nunes, A. and Godinho, J. (2008). Methodologies for designing and recording speech databases for corpus based synthesis. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis and D. Tapias (Ed.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008). Marrakech, Morocco, 28-30 May, 2008. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/summaries/741.html

Schiel, F. and Türk, U. (2006). Wizard-of-Oz recordings. In W. Wahlster (Ed.), SmartKom: Foundations of multimodal dialogue systems (p. 541–570). Springer. https://doi.org/10.1007/3-540-36678-4_34

van Zanten, E., Damen, L. and van Houten, E. (1993). Collecting data for a speech database. In V. J. van Heuven and L. C. W. Pols (Ed.), Analysis and synthesis of speech: Strategic research towards high-quality text-to-speech generation (p. 207–222). de Gruyter Mouton. https://doi.org/10.1515/9783110879001.207

Phonetic aspects in the design of speech corpora

Design and creation of spoken language corpora

✓ = Recommended reading: introductory level

✓ Adolphs, S. and Knight, D. (2010). Building a spoken corpus: What are the basics? In A. O’Keeffe and M. McCarthy (Ed.), The Routledge handbook of corpus linguistics (p. 38–52). Routledge. https://doi.org/10.4324/9780203856949

Cid, M. and Ross, P. (2006). La construcción de un corpus de habla pública de Chile: criterios para la selección de una muestra representativa. Onomázein, 13, 21–33. http://onomazein.letras.uc.cl/03_Numeros/N13/N13.html

Čermák, F. (2009). Spoken corpora design: Their constitutive parameters. International Journal of Corpus Linguistics, 14(1), 113–123. https://doi.org/10.1075/ijcl.14.1.07cer

Crowdy, S. (1993). Spoken corpus design. Literary and Linguistic Computing, 8(4), 259–265. https://doi.org/10.1093/llc/8.4.259

Freitas, T. (2008). Recolha e transcrição de corpora orais. In E. Fernández Rei and X. L. Regueira (Ed.), Perspectivas sobre a oralidade (p. 297–324). Consello da Cultura Galega; Instituto da Lingua Galega. http://consellodacultura.gal/publicacion.php?id=10

Hidalgo, A. (1993). El habla juvenil: una propuesta metodológica para la extracción de un corpus oral representativo. In J. Fernández-Barrientos (Ed.), Jornadas Internacionales de Lingüística Aplicada / International Conference of Applied Linguistics. Robert J. Di Pietro in Memoriam. Actas / Proceedings (Vol. 1, p. 66–75). Universidad de Granada, Instituto de Ciencias de la Educación.

Hidalgo, A., Gallardo Paúls, B., Pons, S., Briz, A., Ruiz Gurillo, L., Gómez Molina, J. R. and Gómez Capuz, J. (1997). La elaboración de un corpus de español coloquial: problemas metodológicos previos. In E. Serra, B. Gallardo Paúls, M. Veyrat, D. Jorques and A. Alcina (Ed.), Panorama de la investigació lingüística a l’Estat Espanyol: Actes del I Congrés de Lingüística General (Vol. 2, p. 7–14). Universitat de València.

Izre’el, S., Hary, B. and Rahav, G. (2001). Designing CoSIH: The Corpus of Spoken Israeli Hebrew. International Journal of Corpus Linguistics, 6(2), 171–197. https://doi.org/10.1075/ijcl.6.2.01izr

de Klerk, V. (2002). Starting with Xhosa English… towards a spoken corpus. International Journal of Corpus Linguistics, 7(1), 21–42. https://doi.org/10.1075/ijcl.7.1.02dek

Love, R. (2020). Overcoming challenges in corpus construction: The Spoken British National Corpus 2014. Routledge. https://doi.org/10.4324/9780429429811

1.– Introduction; I.– Before corpus construction: Theory and design: 2.– Why a new Spoken BNC and why now?; 3.– Theoretical challenges in corpus design; II.– During corpus construction: Theory meets practice: 4.– Challenges in data collection; 5.– Challenges in transcription, part I – Conventions; 6.– Challenges in transcription, part II – Who said what?; 7.– Challenges in corpus processing and dissemination; III.– After corpus construction: Evaluating the corpus: 8.– Evaluating the Spoken BNC2014; 9.– Conclusions and further construction work.

✓ Moreno Fernández, F. (1997). La formación de corpus de lengua hablada. In F. Moreno Fernández (Ed.), Trabajos de sociolingüística hispánica (p. 93–114). Universidad de Alcalá, Servicio de Publicaciones.

✓ Moreno Fernández, F. (1999). La formación de corpus–corpora de lengua hablada. In J. de las Cuevas and D. Fasla (Ed.), Contribuciones al estudio de la lingüística aplicada (p. 447–464). Asociación Española de Lingüística Aplicada (AESLA).

Põldvere, N., Frid, J., Johansson, V. and Paradis, C. (2021). Challenges of releasing audio material for spoken data: The case of the London–Lund Corpus 2. Research in Corpus Linguistics (RiCL), 9(1), 35–62. https://doi.org/10.32714/ricl.09.01.04

Recalde, M. and Vázquez, V. (2009). Problemas metodológicos en la formación de corpus orales. In P. Cantos and A. Sánchez (Ed.), A survey of corpus-based research / Panorama de investigaciones basadas en corpus (p. 37–49). Asociación Española de Lingüística de Corpus. https://gramatica.usc.es/~vvazq/pdf_publ/Recalde_Vazquez_2009.pdf

Summers, D. (1993). Longman/Lancaster English language corpus – Criteria and design. International Journal of Lexicography, 6(3), 181–208. https://doi.org/10.1093/ijl/6.3.181

Taylor, L. (1996). The compilation of the Spoken English Corpus. In G. Knowles, A. Wichmann and P. Alderson (Ed.), Working with speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus (p. 20–37). Longman.

Torruella, J. and Llisterri, J. (1999). Diseño de corpus textuales y orales. In J. M. Blecua, G. Clavería, C. Sánchez and J. Torruella (Ed.), Filología e informática: nuevas tecnologías en los estudios filológicos (p. 45–77). Universitat Autònoma de Barcelona, Departamento de Filología Española, Seminario de Filología e Informática; Editorial Milenio. https://joaquimllisterri.cat/publicacions/Torruella_Llisterri_99.pdf

Recording techniques

Recording

Data collection techniques



Tools for collecting, processing and managing speech corpora

Boersma, P. (2014). The use of Praat in corpus research. In J. Durand, U. Gut and G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 342–360). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.016

Beller, G., Veaux, C., Degottex, G., Obin, N., Lachantin, P. and Rodet, X. (2008). IrcamCorpusTools : plate-forme pour les corpus de parole. Traitement automatique des langues, 49(3), 77–103. https://www.atala.org/index.php/content/ircamcorpustools-plate-forme-pour-les-corpus-de-parole

Chevalier, G., Kasparian, S. and Silberztein, M. (2004). Éléments de solution pour le traitement automatique d’un français oral régional. Traitement automatique des langues, 45(2), 41–62. https://tal.revuesonline.com/article.jsp?articleId=5768

Chiyah, F. J., Lopes, J., Liu, X. and Hastie, H. (2020). CRWIZ: A framework for crowdsourcing real-time Wizard-of-Oz dialogues. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk and S. Piperidis (Ed.), Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 11-16, 2020 (p. 288–297). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.36.pdf

Draxler, C. and Jänsch, K. (2004). SpeechRecorder – a universal platform independent multi-channel audio recording software. LREC 2004. 4th International Conference on Language Resources and Evaluation. Lisbon, Portugal, 26-28 May 2004 (p. 559–562). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/summaries/242.htm

Draxler, C. and Jänsch, K. (2008). WikiSpeech: A content management system for speech databases. INTERSPEECH 2008 – 9th Annual Conference of the International Speech Communication Association. Brisbane, Australia, September 22-26, 2008 (p. 1646–1649). ISCA Archive. https://www.isca-speech.org/archive_v0/interspeech_2008/i08_1646.html

Ferragne, E., Flavier, S. and Fressard, C. (2013). ROCme! software for the recording and management of speech corpora. Interspeech 2013. Lyon, France, August 25-29, 2013 (p. 1864–1865). ISCA Archive. https://www.isca-speech.org/archive/interspeech_2013/ferragne13_interspeech.html

Fonollosa, J. A. R. and Moreno, A. (1998). Automatic database acquisition software for ISDN PC cards and analogic boards. In A. J. Rubio Ayuso, N. Gallardo, R. Castro and A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 2, p. 1325–1328). European Language Resources Association (ELRA).

Fryda, P. and Kopeček, I. (1998). PHC format for managing data in phonetic corpora. In A. J. Rubio Ayuso, N. Gallardo, R. Castro and A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 2, p. 1283–1287). European Language Resources Association (ELRA).

Garofolo, J. S. and Pallett, D. S. (1989). Use of CD-ROM for speech database storage and exchange. First European Conference on Speech Communication and Technology (EUROSPEECH ’89). Paris, France, September 27-29, 1989 (Vol. 2, p. 309–312). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1989/e89_2309.html

Garrido, J. M. (2020). Eines computacionals per a la creació i explotació de corpus orals en català. Zeitschrift für Katalanistik / Revista Alemanya d’Estudis Catalans, 33, 131–154. https://dialnet.unirioja.es/servlet/articulo?codigo=7712169

Gibbon, D., Moore, R. i Winski, R. (Ed.). (1998). Spoken language reference materials. Mouton de Gruyter. https://doi.org/10.1515/9783110804508

1.– User’s guide; A.– Character codes and computer readable alphabets; B.– SAMPA computer readable phonetic alphabet; C.– SAM file formats; D.– SAM recording protocols; E.– SAM software tools; F.– EUROPEC recording tools; G.– Digital storage media; H.– Database management systems; I.– Speech standards; J.– EUROM-1 database overview; K.– Polyphone project overview; L.– European speech resources; M.– Transcription and documentation conventions for Speechdat; N.– The Bavarian Archive for Speech Signals.

Goldman, J.-P., Scherrer, Y., Glikman, J., Avanzi, M., Benzitoun, C. i Boula de Mareüil, P. (2018). Crowdsourcing regional variation data and automatic geolocalisation of speakers of European French. A N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis i T. Tokunaga (Ed.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, May 7-12, 2018 (p. 3336–3342). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2018/summaries/517.html

Harrington, J. (2010). Some tools for building and querying annotated speech databases. Phonetic analysis of speech corpora (p. 20–45). Wiley-Blackwell. https://www.phonetik.uni-muenchen.de/~jmh/research/pasc010808/pasc.pdf

Hughes, T., Nakajima, K., Ha, L., Vasu, A., Moreno, P. J. i LeBeau, M. (2010). Building transcribed speech corpora quickly and cheaply for many languages. Interspeech 2010. Makuhari, Japan, September 26-30, 2010 (p. 1914–1917). ISCA Archive. https://doi.org/10.21437/Interspeech.2010-551

Jacobson, M. (2004). Gestion de corpus oraux annotés : méthodes et outils. JEP 2004. 25èmes Journées d’Études sur la Parole. Fès, Maroc, 19-22 avril 2004. http://www.afcp-parole.org//doc/Archives_JEP/2004_XXVe_JEP_Fes/actes/jep2004/Jacobson.pdf

John, T. i Bombien, L. (2014). EMU. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 321–341). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.022

Jonell, P., Oertel, C., Kontogiorgos, D., Beskow, J. i Gustafson, J. (2018). Crowdsourced multimodal corpora collection tool. A N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis i T. Tokunaga (Ed.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, May 7-12, 2018 (p. 728–734). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2018/summaries/9.html

Miguel, A., Galiano, I., Granell, R., Hurtado, L. F., Sánchez, J. A. i Sanchis, E. (2003). La plataforma de adquisición de diálogos en el proyecto DIHANA. Procesamiento del Lenguaje Natural, 31, 341–342. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3200/1691

Milde, J.-T. i Gut, U. (2002). The TASX-environment: An XML-based toolset for time aligned speech corpora. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 1922–1927). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/293.htm

Nogueiras, A. i Moreno, A. (1998). NaniBD: A set of tools for transcribing and validating speech databases. A A. J. Rubio Ayuso, N. Gallardo, R. Castro i A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 2, p. 1359–1365). European Language Resources Association (ELRA). http://hdl.handle.net/2117/22969

Ogawa, H., Nishikawa, H., Tokunaga, T. i Yokono, H. (2020). Gamification platform for collecting task-oriented dialogue data. A N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk i S. Piperidis (Ed.), Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 11-16, 2020 (p. 7084–7093). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.876.pdf

Oostdijk, N., Kristoffersen, G. i Sampson, G. (Ed.). (2004). Compiling and Processing Spoken Language Corpora. Lisboa, Portugal, 24 May 2004. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/ws/ws1.pdf

Ribeiro, C., Trancoso, I. i Serralheiro, A. (1993). A software tool for speech collection, recognition and reproduction. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 1, p. 179–182). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_0179.html

Schmidt, T. i Wörner, K. (2014). EXMARaLDA. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 402–419). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.030

Stan, A. (2020). RECOApy: Data recording, pre-processing and phonetic transcription for end-to-end speech-based applications. Interspeech 2020. Shanghai, China, 25-29 October 2020 (p. 586–590). ISCA Archive. https://doi.org/10.21437/Interspeech.2020-1184

Soria, C., Bernsen, N. O., Cadée, N., Carletta, J., Dybkjær, L., Evert, S., Heid, U., Isard, A., Kolodnytsky, M., Lauer, C., Lezius, W., Noldus, L. P. J. J., Pirrelli, V., Reithinger, N. i Vögele, A. (2002). Advanced tools for the study of natural interactivity. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 357–363). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/236.htm

Titeux, H., Riad, R., Cao, X.-N., Hamilakis, N., Madden, K., Cristia, A., Bachoud-Lévi, A.-C. i Dupoux, E. (2020). Seshat: A tool for managing and verifying annotation campaigns of audio data. A N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk i S. Piperidis (Ed.), Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 11-16, 2020 (p. 6976–6982). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.861.pdf

Véronis, J. (2004). Le traitement automatique des corpus oraux. Traitement automatique des langues, 45(2), 7–14. https://tal.revuesonline.com/article.jsp?articleId=5766

Transcription and labelling of spoken corpora

✓ = Recommended readings: introductory level

✓ Delais-Roussarie, E. i Post, B. (2014). Corpus annotation: Methodology and transcription systems. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 46–88). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.002

✓ Llisterri, J. (1999). Transcripción, etiquetado y codificación de corpus orales. Revista Española de Lingüística Aplicada, (Volumen monográfico «Panorama de la investigación en lingüística informática»), 53–82. https://dialnet.unirioja.es/servlet/articulo?codigo=227025

Oostdijk, N. i Boves, L. (2008). Preprocessing speech corpora: Transcription and phonological annotation. A A. Lüdeling i M. Kytö (Ed.), Corpus linguistics: An international handbook (Vol. 1, p. 187–207). Walter de Gruyter.

Levels of labelling in spoken corpora

Barry, W. J. i Fourcin, A. (1992). Levels of labelling. Computer Speech & Language, 6(1), 1–14. https://doi.org/10.1016/0885-2308(92)90041-2

Marchal, A., Nguyen, N. i Hardcastle, W. J. (1995). Multitiered phonetic approach to speech labelling. A C. Sorin, J. Mariani, H. Méloni i J. Schoentgen (Ed.), Levels in speech communication: Relations and interactions. A tribute to Max Wajskop / Hommage à Max Wajskop (p. 149–158). Elsevier.

Tillmann, H. G. i Pompino-Marschall, B. (1993). Theoretical principles concerning segmentation, labelling strategies and levels of categorical annotation for spoken language database systems. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 3, p. 1691–1694). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_1691.html

Criteria for the transcription and labelling of spoken corpora

Autesserre, D., Pérennou, G. i Rossi, M. (1989). Methodology for the transcription and labeling of a speech corpus. Journal of the International Phonetic Association, 19(1), 2–15. https://doi.org/10.1017/s0025100300005867

Campbell, N. (2002). Labelling natural conversational speech data. Proceedings of the Fall Meeting of the Acoustical Society of Japan (p. 273–274). http://www.speech-data.jp/nick/pubs/lncsd.pdf

Johnson, K. (2004). Aligning phonetic transcriptions with their citation forms. Acoustics Research Letters Online, 5(2), 19–24. https://doi.org/10.1121/1.1635751

Keating, P., MacEachern, P., Shryock, A. i Domínguez, S. (1994). A manual for phonetic transcription: Segmentation and labeling of words in spontaneous speech. UCLA Working Papers in Phonetics, 88, 91–120. https://escholarship.org/uc/item/6p1293fd

Croot, K. i Taylor, B. (1995). Criteria for acoustic-phonetic segmentation and word labelling in the Australian national database of spoken language. Macquarie University, Speech, Hearing and Language Research Centre. https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.518.1682

Lander, T. (1997, 15 de maig). The CSLU labeling guide. Oregon Graduate Institute, Center for Spoken Language Understanding. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=E19995BC0CD47D42F799143BBCAEC71E?doi=10.1.1.163.165&rep=rep1&type=pdf

Pitt, M. A., Johnson, K., Hume, E., Kiesling, S. i Raymond, W. (2005). The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, 45(1), 89–95. https://doi.org/10.1016/j.specom.2004.09.001

Roach, P., Roach, H., Dew, A. i Rowlands, P. (1990). Phonetic analysis and the automatic segmentation and labeling of speech sounds. Journal of the International Phonetic Association, 20(1), 15–21. https://doi.org/10.1017/s002510030000400x

Validation of the transcription and labelling of spoken corpora

Artstein, R. i Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596. https://doi.org/10.1162/coli.07-034-R2

Barry, W. i Grice, M. (1991). Auditory and visual factors in speech database analysis. Speech, Hearing and Language: Work in Progress, 5, 9–32.

Callejas, Z. i López-Cózar, R. (2009). Bases para evaluar la anotación de corpus de emociones espontáneas. Procesamiento del Lenguaje Natural, 43, 66–73. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5

Cole, R., Oshika, B. T., Noel, M., Lander, T. i Fanty, M. (1994). Labeler agreement in phonetic labelling of continuous speech. Third International Conference on Spoken Language Processing (ICSLP 94). Yokohama, Japan, September 18-22, 1994 (p. 2131–2134). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1994/i94_2131.html

Eisen, B. (1993). Reliability of speech segmentation and labelling at different levels of transcription. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 1, p. 673–676). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_0673.html

Grice, M. i Barry, W. (1991). Phonetic units by ear and eye. Phonetics and Phonology of Speaking Styles: Reduction and Elaboration in Speech Communication. Barcelona, Catalonia, Spain, September 30 - October 2, 1991 (p. 29-1–29-5). ISCA Archive. https://www.isca-speech.org/archive_open/ppospst/pp91_029.html

van Hoeckel, C. J. M. (1989). The reliability of manual labelling of continuous speech. A L. C. W. Pols i R. van Bezooijen (Ed.), Speech Input/Output Assessment and Speech Databases, ESCA Tutorial and Research Workshop. Noordwijkerhout, The Netherlands, September 20-23, 1989 (p. 179–182). ISCA Archive. https://www.isca-speech.org/archive_open/sioa_89/sia_2179.html

Kvale, K. i Foldvik, A. K. (1991). Manual segmentation and labelling of continuous speech. Phonetics and Phonology of Speaking Styles: Reduction and Elaboration in Speech Communication. Barcelona, Catalonia, Spain, September 30 - October 2, 1991 (p. 37-1–37-5). ISCA Archive. https://www.isca-speech.org/archive_open/ppospst/pp91_037.html

Pitt, M. A., Johnson, K., Hume, E., Kiesling, S. i Raymond, W. (2005). The Buckeye corpus of conversational speech: Labeling conventions and a test of transcriber reliability. Speech Communication, 45(1), 89–95. https://doi.org/10.1016/j.specom.2004.09.001

Raymond, W. D., Pitt, M. A., Johnson, K., Hume, E., Makashay, M. J., Dautricourt, R. i Hilts, C. (2002). An analysis of transcription consistency in spontaneous speech from the Buckeye corpus. A J. H. L. Hansen i B. Pellom (Ed.), 7th International Conference on Spoken Language Processing (ICSLP 2002 – INTERSPEECH 2002). Denver, Colorado, USA, September 16-20, 2002 (p. 1125–1128). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_2002/i02_1125.html

Strangert, E. i Heldner, M. (1995). Labelling of boundaries and prominences by phonetically experienced and non-experienced transcribers. Phonum. Reports from the Department of Phonetics, Umeå University, 3, 85–109. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.2357

Transcription and labelling of multilingual spoken corpora

Barry, W. i Dalsgaard, P. (1993). Speech database annotation: The importance of a multi-lingual approach. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (p. 13–20). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_0013.html

Dalsgaard, P., Andersen, O. i Barry, W. (1991). Multi-lingual acoustic-phonetic features for a number of European languages. Second European Conference on Speech Communication and Technology (EUROSPEECH ’91). Genova, Italy, September 24-26, 1991 (Vol. 2, p. 685–688). ISCA Archive. https://www.isca-speech.org/archive/eurospeech_1991/index.html

Dalsgaard, P., Andersen, O. i Barry, W. J. (1991). The cross-language validity of acoustic-phonetic features in label alignment. Actes du XIIème Congrès International de Sciences Phonétiques / Proceedings of the XIIth International Congress of Phonetic Sciences. 19-24 août 1991. Aix-en-Provence, France / August 19-24, 1991. Aix-en-Provence, France (Vol. 5, p. 382–385). Université de Provence, Service des Publications.

van Erp, A., Houben, C. G. J., Barry, W. J., Grice, M., Boë, L.-J., Braun, G., Cosi, P., Dyhr, N., Pérennou, G., Vigouroux, N. i Autesserre, D. (1989). A unified approach to the labelling of speech: First multilingual results. First European Conference on Speech Communication and Technology (EUROSPEECH ’89). Paris, France, September 27-29, 1989 (Vol. 2, p. 88–91). ISCA Archive. https://www.isca-speech.org/archive/eurospeech_1989/index.html

Transcription and labelling of multimodal corpora

Alcántara, M. (2007). La anotación del habla en un corpus de vídeo. Procesamiento del Lenguaje Natural, 38, 131–139. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/2723/1241

Baldry, A. i Thibault, P. J. (2006). Multimodal transcription and text analysis: A multimedia toolkit and coursebook with associated on-line course. Equinox.

1.– Introduction: Multimodal texts and genres; 2.– The printed page; 3.– The web page; 4.– Film texts and genres.

Baldry, A. i Thibault, P. J. (2006). Multimodal corpus linguistics. A G. Thompson i S. Hunston (Ed.), System and corpus: Exploring connections (p. 164–183). Equinox.

Dybkjær, L. i Bernsen, N. O. (2002). Natural interactivity resources: Data, annotation schemes and tools. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 349–356). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/213.htm

Dybkjær, L., Bernsen, N. O., Broeder, D. i Wittenburg, P. (2003). Introduction to and summary of the final NIMM WG guidelines (Deliverable D7.1, Final report). ISLE Natural Interactivity and Multimodality Working Group. http://spokendialogue.dk/Publications/2003e/D7.1-14.2.2003-F.pdf

Dybkjær, L., Bernsen, N. O., Wegener Knudsen, M., Llisterri, J., Machuca, M. J., Martin, J.-C., Pelachaud, C., Riera, M. i Wittenburg, P. (2003). Guidelines for the creation of NIMM annotation schemes (Deliverable D9.2 Final Report). IST-1999-10647 ISLE, International Standards for Language Engineering, ISLE Natural Interactivity and Multimodality Working Group. https://joaquimllisterri.cat/publicacions/Dybkjaer_et_al_03_Guidelines_NIMM_annotation_schemes.pdf

Knight, D., Evans, D., Carter, R. i Adolphs, S. (2009). HeadTalk, HandTalk and the corpus: Towards a framework for multi-modal, multi-media corpus development. Corpora, 4, 1–32. https://doi.org/10.3366/E1749503209000203

Sloetjes, H. (2014). ELAN: Multimedia annotation application. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 305–320). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.019

Steininger, S. (2000). Transliteration of language and labeling of emotion and gestures in SmartKom. A D. Broeder, H. Cunningham, N. Ide, D. Roy, H. Thompson i P. Wittenburg (Ed.), Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources, Proceedings. Athens, Greece, 29-30 May 2000 (p. 49–51). European Language Resources Association (ELRA); Max Planck Institute for Psycholinguistics. https://www.mpi.nl/ISLE/documents/papers/Steininger_paper.pdf

Steininger, S., Schiel, F. i Rabold, S. (2006). Annotation of multimodal data. A W. Wahlster (Ed.), SmartKom: Foundations of multimodal dialogue systems (p. 571–596). Springer. https://doi.org/10.1007/3-540-36678-4_35

Villaseñor, L., Massé, A. i Pineda, L. A. (2000). A multimodal dialogue contribution coding scheme. A D. Broeder, H. Cunningham, N. Ide, D. Roy, H. Thompson i P. Wittenburg (Ed.), Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources, Proceedings. Athens, Greece, 29-30 May 2000 (p. 52–56). European Language Resources Association (ELRA); Max Planck Institute for Psycholinguistics. https://www.mpi.nl/ISLE/documents/papers/villasenor_paper.pdf

Wegener Knudsen, M., Bernsen, N. O., Dybkjær, L., Hansen, T. K., Mapelli, V., Martin, J.-C., Paulsson, N., Pelachaud, C. i Wittenburg, P. (2003, febrer). Guidelines for the creation of NIMM data resources (Deliverable D8.2, Final report). ISLE Natural Interactivity and Multimodality Working Group. http://spokendialogue.dk/Publications/2003g/D8.2-17.2.2003-F.pdf

Wegener Knudsen, M., Martin, J.-C., Dybkjær, L., Machuca, M. J., Bernsen, N. O., Carletta, J., Heid, U., Sotaro, K., Llisterri, J., Pelachaud, C., Poggi, I., Reithinger, N., van Elswijk, G. i Wittenburg, P. (2002). Survey of multimodal annotation schemes and best practice (Deliverable D9.1 Final Report). IST-1999-10647 ISLE, International Standards for Language Engineering, Natural Interactivity and Multimodality Working Group. https://joaquimllisterri.cat/publicacions/Wegener_Knudsen_et_al_02_Survey_multimodal_annotation_schemes.pdf

Tools for the transcription and labelling of spoken corpora

Tools for the transcription and labelling of spoken corpora: general works

Cosi, P. (2002). Metodologie e sistemi per l’annotazione linguistica. Quaderni dell’Istituto di Fonetica e Dialettologia, 4. https://www.academia.edu/19050378/METODOLOGIE_E_SISTEMI_PER_L_ANNOTAZIONE_LINGUISTICA

Dellwo, V. (2003). Tools for a combined analysis of speech and gestures. A M. J. Solé, D. Recasens i J. Romero (Ed.), 15th International Congress of Phonetic Sciences. Barcelona, Spain, August 3-9, 2003 (p. 351–354). ICPhS Archive. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/p15_0351.html

Dybkjær, L., Berman, S., Kipp, M., Wegener Olsen, M., Pirrelli, V., Reithinger, N. i Soria, C. (2001). Survey of existing tools, standards and user needs for annotation of natural interaction and multimodal data (Deliverable D11.1, Final Report). ISLE Natural Interactivity and Multimodality Working Group. http://spokendialogue.dk/Publications/2001f/D11.1-14.2.2001-F.pdf

Garg, S., Martinovski, B., Robinson, S., Stephan, J., Tetreault, J. i Traum, D. R. (2004). Evaluation of transcription and annotation tools for a multi-modal, multi-party dialogue corpus. LREC 2004. 4th International Conference on Language Resources and Evaluation. Lisbon, Portugal, 26-28 May 2004 (p. 2163–2166). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/summaries/758.htm

Jacobson, M. (2002). Les outils modernes pour la transcription de corpus de parole. Revue PArole, 22–23–24, 213–230.

Jacobson, M. (2004). Gestion de corpus oraux annotés : méthodes et outils. JEP 2004. 25èmes Journées d’Études sur la Parole. Fès, Maroc, 19-22 avril 2004. http://www.afcp-parole.org//doc/Archives_JEP/2004_XXVe_JEP_Fes/actes/jep2004/Jacobson.pdf

Rohlfing, K., Loehr, D., Duncan, S., Brown, A., Franklin, A., Kimbara, I., Milde, J.-T., Parrill, F., Rose, T., Schmidt, T., Sloetjes, H., Thies, A. i Wellinghoff, S. (2006). Comparison of multimodal annotation tools – workshop report. Discourse and Conversation Analysis, 7, 99–123. http://www.gespraechsforschung-online.de/fileadmin/dateien/heft2006/tb-rohlfing.pdf

Tools for the transcription and labelling of spoken corpora: specific works

Praat

Allwood, J., Grönqvist, L., Ahlsén, E. i Gunnarsson, M. (2003). Annotation and tools for an activity based spoken language corpus. A J. van Kuppevelt i R. W. Smith (Ed.), Current and new directions in discourse and dialogue (p. 1–18). Kluwer. https://doi.org/10.1007/978-94-010-0019-2_1

Bailly, G., Barbe, T. i Wang, H. (1992). Automatic labelling of large prosodic databases: Tools, methodology and links with a text-to-speech system. A G. Bailly i C. Benoît (Ed.), Talking machines: Theories, models and designs (p. 323–333). Elsevier.

Barras, C., Geoffrois, E., Wu, Z. i Liberman, M. (1998). Transcriber: A free tool for segmenting, labeling and transcribing speech. A A. J. Rubio Ayuso, N. Gallardo, R. Castro i A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (p. 1373–137). European Language Resources Association (ELRA). http://transag.sourceforge.net/publications/Transcriber-LREC_1998.pdf

Barras, C., Geoffrois, E., Wu, Z. i Liberman, M. (2001). Transcriber: Development and use of a tool for assisting speech corpora production. Speech Communication, 33(1–2), 5–22. https://doi.org/10.1016/s0167-6393(00)00067-4

Bernsen, N. O., Dybkjær, L. i Kolodnytsky, M. (2003). An interface for annotating natural interactivity. A J. van Kuppevelt i R. W. Smith (Ed.), Current and new directions in discourse and dialogue (p. 35–62). Kluwer. https://doi.org/10.1007/978-94-010-0019-2_3

Bernsen, N. O., Dybkjær, L. i Kolodnytsky, M. (2002). The NITE workbench: A tool for annotation of natural interactivity and multimodal data. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 43–49). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/214.htm

Bigi, B. (2015). SPPAS – Multi-lingual approaches to the automatic annotation of speech. The Phonetician, 111–112, 54–69. http://www.isphs.org/Phonetician/Phonetician_111-112.pdf#page=54

Blache, P. i Hirst, D. (2000). Multi-level annotation for spoken language corpora. Proceedings, Sixth International Conference on Spoken Language Processing (ICSLP 2000). Beijing, China, October 16-20, 2000 (p. 481–484). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_2000/i00_1481.html

Boëffard, O., Cherbonnel, B., Emerard, F. i White, S. (1993). Automatic segmentation and quality evaluation of speech unit inventories for concatenation-based, multilingual PSOLA text-to-speech systems. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 2, p. 1449–1452). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_1449.html

Cassidy, S. i Harrington, J. (2001). Multi-level annotation in the Emu speech database management system. Speech Communication, 33(1–2), 61–77. https://doi.org/10.1016/s0167-6393(00)00069-8

Chan, D. S. F. i Fourcin, A. (1993). Automatic annotation using multi-sensor data. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 1, p. 187–190). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_0187.html

Cosi, P. (1993). SLAM: Segmentation and labelling automatic module. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 1, p. 665–668). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_0665.html

Cosi, P. (1995). SLAM: A PC-based multilevel-segmentation tool. A A. J. Rubio Ayuso i J. M. López Soler (Ed.), Speech recognition and coding: New advances and trends (p. 124–127). Springer. https://doi.org/10.1007/978-3-642-57745-1_22

Currie Hall, K., Mackie, J. S. i Lo, R. Y.-H. (2019). Phonological CorpusTools. International Journal of Corpus Linguistics, 24(4), 522–535. https://doi.org/10.1075/ijcl.18009.hal

Dours, C., de Calmès, M., Kabré, H., Pecarte, J. M., Pérennou, G. i Vigouroux, N. (1989). A multi-level automatic segmentation system: SAPHO and VERIPHONE. First European Conference on Speech Communication and Technology (EUROSPEECH ’89). Paris, France, September 27-29, 1989 (Vol. 2, p. 83–86). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1989/e89_2083.html

Dybkjær, L., Berman, S., Bernsen, N. O., Carletta, J., Heid, U. i Llisterri, J. (2001). Requirements specification for a tool in support of annotation of natural interaction and multimodal data (Deliverable D11.2, Final report). IST-1999-10647 ISLE, International Standards for Language Engineering, Natural Interactivity and Multimodality Working Group. https://joaquimllisterri.cat/publicacions/Dybkjaer_et_al_01_annotation_multimodality.pdf

Dybkjær, L., Berman, S., Kipp, M., Wegener Olsen, M., Pirrelli, V., Reithinger, N. i Soria, C. (2001). Survey of existing tools, standards and user needs for annotation of natural interaction and multimodal data (Deliverable D11.1, Final Report). ISLE Natural Interactivity and Multimodality Working Group. http://spokendialogue.dk/Publications/2001f/D11.1-14.2.2001-F.pdf

Dybkjær, L. i Bernsen, N. O. (2000). The MATE workbench. ISLE/EAGLES Workshop «Meta-descriptions and annotation schemes for multimodal/multimedia language resources and data architectures and software support for large corpora». Athens, Greece, 29-30 May, 2000. European Language Resources Association (ELRA). https://www.mpi.nl/ISLE/documents/papers/dybkjaer_paper.pdf

Estève, Y., Deléglise, P. i Jacob, B. (2004). Système de transcription automatique de la parole et logiciels libres. Traitement automatique des langues, 45(2), 15–39. https://tal.revuesonline.com/article.jsp?articleId=5767

Eychenne, J. i Courdès-Murphy, L. (2019). Phonometrica: An open platform for the analysis of speech corpora. Proceedings of the Seoul International Conference on Speech Sciences 2019 (SICSS 2019). Seoul National University, Seoul, Korea, 15-16 November 2019 (p. 107–108). https://sicss2019.files.wordpress.com/2019/11/sicss2019proceedings.pdf

Garrido, J. M. (2010). A tool for automatic F0 stylisation, annotation and modelling of large corpora. Speech Prosody 2010, Fifth International Conference. Chicago, IL, USA, May 10-14, 2010 (p. 1–4). ISCA Archive. https://www.isca-speech.org/archive_v0/sp2010/sp10_041.html

Goldman, J.-P. (2011). EasyAlign: An automatic phonetic alignment tool under Praat. A P. Cosi, R. de Mori, G. di Fabbrizio i R. Pieraccini (Ed.), INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy, August 27-31, 2011 (p. 3233–3236). ISCA Archive. https://www.isca-speech.org/archive_v0/interspeech_2011/i11_3233.html

Goldman, J.-P. i Schwab, S. (2014). EasyAlign Spanish: An (semi-)automatic segmentation tool under Praat. A Y. Congosto, M. L. Montero i A. Salvador (Ed.), Fonética experimental, educación superior e investigación (Vol. 1, p. 629–640). Arco/Libros. http://latlcui.unige.ch/phonetique/easyalign/GoldmanSchwab-EasyAlignSpanish-5thCFE-2011.pdf

Gonzalez, S., Grama, J. i Travis, C. E. (2020). Comparing the performance of forced aligners used in sociophonetic research. Linguistics Vanguard, 6(1), Article 20190058. https://doi.org/10.1515/lingvan-2019-0058

Hernáez, I., Barandiarán, J., Monte, E. i Etxebarria, B. (1993). A segmentation algorithm based on acoustical features using a self organizing neural network. Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 1, p. 661–663). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_0661.html

Jeong, C.-G. i Jeong, H. (1996). Automatic phone segmentation and labeling of continuous speech. Speech Communication, 20(3–4), 291–311. https://doi.org/10.1016/s0167-6393(96)00064-7

Kabré, H., Pérennou, G. i Vigouroux, N. (1991). Automatic labelling of speech signal into phonetic events. Actes du XIIème Congrès International de Sciences Phonétiques / Proceedings of the XIIth International Congress of Phonetic Sciences. 19-24 août 1991. Aix-en-Provence, France / August 19-24, 1991. Aix-en-Provence, France (Vol. 5, p. 450–453). Université de Provence, Service des Publications.

Kipp, M. (2001). Anvil: A generic annotation tool for multilingual dialogue. A P. Dalsgaard, B. Lindberg, H. Benner i T. Zheng-Hua (Ed.), EUROSPEECH 2001 Scandinavia, 7th European Conference on Speech Communication and Technology, 2nd INTERSPEECH Event. Aalborg, Denmark, September 3-7, 2001 (p. 1367–1370). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_2001/e01_1367.html

Kipp, M. (2014). ANVIL: The video annotation research tool. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 420–436). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.024

Martin, P. (2004). WinPitch corpus, a text to speech alignment tool for multimodal corpora. LREC 2004. 4th International Conference on Language Resources and Evaluation. Lisbon, Portugal, 26-28 May 2004 (p. 537–540). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/summaries/780.htm

McKelvie, D., Isard, A., Mengel, A., Møller, M. B., Grosse, M. i Klein, M. (2001). The MATE workbench – An annotation tool for XML coded speech corpora. Speech Communication, 33(1–2), 97–112. https://doi.org/10.1016/s0167-6393(00)00071-6

Mertens, P. (2004). The Prosogram: Semi-automatic transcription of prosody based on a tonal perception model. A B. Bel i I. Marlien (Ed.), Speech Prosody 2004, International Conference. Nara, Japan, March 23-26, 2004 (p. 549–552). ISCA Archive. https://www.isca-speech.org/archive_v0/sp2004/sp04_549.html

Mertens, P. (2004). Un outil pour la transcription de la prosodie dans les corpus oraux. Traitement automatique des langues, 45(2), 109–130. https://tal.revuesonline.com/article.jsp?articleId=5771

Milde, J.-T. i Gut, U. (2002). The TASX-environment: An XML-based toolset for time aligned speech corpora. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 1922–1927). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/293.htm

Moreno, A., Armas, P., Mariño, J. B. i Masgrau, E. (1989). Automatic segmentation of Spanish speech into syllables. First European Conference on Speech Communication and Technology (EUROSPEECH ’89). Paris, France, September 27-29, 1989 (Vol. 2, p. 75–78). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1989/e89_2075.html

Nogueiras, A. i Moreno, A. (1998). NaniBD: A set of tools for transcribing and validating speech databases. A A. J. Rubio Ayuso, N. Gallardo, R. Castro i A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 2, p. 1359–1365). European Language Resources Association (ELRA). http://hdl.handle.net/2117/22969

Rose, Y. i MacWhinney, B. (2014). The PhonBank project: Data and software-assisted methods for the study of phonology and phonological development. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 380–401). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.023

Rose, Y., MacWhinney, B., Byrne, R., Hedlund, G., Maddocks, K., O’Brien, P. i Wareham, T. (2006). Introducing Phon: A software solution for the study of phonological acquisition. A D. Bamman, T. Magnitskaia i C. Zaller (Ed.), Proceedings of the 30th Annual Boston University Conference on Language Development (p. 489–500). Cascadilla Press. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4769870/

Schmidt, T. (2004). Transcribing and annotating spoken language with EXMARaLDA. Proceedings of the LREC Workshop on XML based richly annotated corpora Lisbon, Portugal, 29 May 2004 (p. 69–74). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/ws/ws15.pdf

Sjölander, K. i Beskow, J. (2000). WaveSurfer: An open source speech tool. Proceedings, Sixth International Conference on Spoken Language Processing (ICSLP 2000). Beijing, China, October 16-20, 2000 (Vol. 4, p. 464–467). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_2000/i00_4464.html

Tamburini, F. i Caini, C. (2004). Automatic annotation of speech corpora for prosodic prominence. A N. Oostdijk, G. Kristoffersen i G. Sampson (Ed.), Compiling and Processing Spoken Language Corpora. Lisboa, Portugal, 24 May 2004 (p. 53–58). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/ws/ws1.pdf

Tamburini, F. i Caini, C. (2005). An automatic system for detecting prosodic prominence in American English continuous speech. International Journal of Speech Technology, 8(1), 33–44. https://doi.org/10.1007/s10772-005-4760-z

Titeux, H., Riad, R., Cao, X.-N., Hamilakis, N., Madden, K., Cristia, A., Bachoud-Lévi, A.-C. i Dupoux, E. (2020). Seshat: A tool for managing and verifying annotation campaigns of audio data. A N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk i S. Piperidis (Ed.), Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 11-16, 2020 (p. 6976–6982). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.861.pdf

Torre, D., Hernández Gómez, L. A. i Villarrubia, L. (2003). Automatic phonetic segmentation. IEEE Transactions on Speech and Audio Processing, 11(6), 617–625. https://doi.org/10.1109/TSA.2003.813579

Vorstermans, A., Martens, J.-P. i van Coile, B. (1996). Automatic segmentation and labelling of multi-lingual speech data. Speech Communication, 19(4), 271–293. https://doi.org/10.1016/s0167-6393(96)00037-4

Weisser, M. (2003). SPAACy – A semi-automated tool for annotating dialogue acts. International Journal of Corpus Linguistics, 8(1), 63–74. https://doi.org/10.1075/ijcl.8.1.03wei

Eines per a l’anàlisi i la transcripció de la parla

Representació fonètica de corpus orals

✓ = Lectures recomanades: nivell introductori

✓ Delais-Roussarie, E. i Post, B. (2014). Corpus annotation: Methodology and transcription systems. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 46–88). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.002

✓ Llisterri, J. (1999). Transcripción, etiquetado y codificación de corpus orales. Revista Española de Lingüística Aplicada, (Volumen monográfico «Panorama de la investigación en lingüística informática»), 53–82. https://dialnet.unirioja.es/servlet/articulo?codigo=227025

Llisterri, J., Machuca, M. J., de la Mota, C., Riera, M. i Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del discurso oral, 8, 289–325. https://joaquimllisterri.cat/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

Transcripció fonètica

Transcripció fonètica automàtica

Representació fonètica de corpus orals: nivell segmental

Allen, G. D. (1988). The PHONASCII system. Journal of the International Phonetic Association, 18(1), 9–25. https://doi.org/10.1017/s0025100300003509

Center for Spoken Language Understanding. (1995). IPA, Worldbet, and OGIbet English broad phonetic labels. Oregon Graduate Institute of Science & Technology. https://dipaola.org/stanford/facade/lipsync/refbet.pdf

Cuétara, J. O. (2004). Fonética de la ciudad de México: aportaciones desde las tecnologías del habla [Tesi de màster, Universidad Nacional Autónoma de México]. Universidad Nacional Autónoma de México, Departamento de Ciencias de la Computación. http://turing.iimas.unam.mx/~luis/DIME/publicaciones/tesis/Cuetara_Tesis_MLH-UNAM.pdf

Esling, J. H. (1988). 7.1 Computer coding of IPA symbols and 7.3 Detailed phonetic representation of computer data bases. Journal of the International Phonetic Association, 18(2), 99–106. https://doi.org/10.1017/s0025100300003704

Esling, J. H. (1990). Computer coding of the IPA: Supplementary Report. Journal of the International Phonetic Association, 20(1), 22–26. https://doi.org/10.1017/s0025100300004011

Esling, J. H. i Gaylord, H. (1993). Computer codes for phonetic symbols. Journal of the International Phonetic Association, 23(2), 83–97. https://doi.org/10.1017/s0025100300004898

Gaylord, H. (1995). Character representation. Computers and the Humanities, 29(1), 51–73. https://www.jstor.org/stable/30200343

Gibbon, D., Moore, R. i Winski, R. (Ed.). (1998). Character codes and computer readable alphabets. A Spoken language reference materials (p. 30–59). Mouton de Gruyter. https://doi.org/10.1515/9783110804508-007

Gibbon, D., Moore, R. i Winski, R. (Ed.). (1998). SAMPA computer readable phonetic alphabet. A Spoken language reference materials (p. 60–107). Mouton de Gruyter. https://doi.org/10.1515/9783110804508-008

Gurlekian, J. A., Colantoni, L. i Torres, H. M. (2001). El alfabeto fonético SAMPA y el diseño de corpora fonéticamente balanceados. Fonoaudiológica, 47(3), 58–69.

Hieronymus, J. L. (1994). ASCII phonetic symbols for the world’s languages: Worldbet. AT&T Bell Laboratories. https://dipaola.org/stanford/facade/lipsync/worldbet.pdf

Llisterri, J. i Mariño, J. B. (1993). Spanish adaptation of SAMPA and automatic phonetic transcription (Technical Report SAM-A/UPC/001/V1). ESPRIT-6819 SAM-A, Speech Technology Assessment in Multilingual Applications. https://joaquimllisterri.cat/publicacions/SAMPA_Spanish_93.pdf

Losada, R. M. (2004). Unha adaptación do SAMPA para a lingua galega. A R. Álvarez Blanco, F. Fernández Rei i A. Santamarina (Ed.), A lingua galega: historia e actualidade. Actas do I Congreso Internacional. 16-20 de setembro de 1996, Santiago de Compostela (p. 615–625). Consello da Cultura Galega; Instituto da Lingua Galega. http://consellodacultura.gal/publicacion.php?id=243

Moreno, A. i Mariño, J. B. (1998). Spanish dialects: Phonetic transcription. The 5th International Conference on Spoken Language Processing, incorporating the 7th Australian International Speech Science and Technology Conference. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998 (Paper 0598). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1998/i98_0598.html

Pineda, L. A., Castellanos, H., Cuétara, J. O., Galescu, L., Juárez, J., Llisterri, J., Pérez, P. i Villaseñor, L. (2010). The Corpus DIMEx100: Transcription and evaluation. Language Resources and Evaluation, 44(4), 347–370. https://doi.org/10.1007/s10579-009-9109-9

Schmidt, M., Fitt, S., Scott, C. i Jack, M. A. (1993). Phonetic transcription standards for European names (ONOMASTICA). Third European Conference on Speech Communication and Technology (EUROSPEECH’93). Berlin, Germany, September 22-25, 1993 (Vol. 1, p. 279–282). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1993/e93_0279.html

The IPA 1989 Kiel Convention Workgroup 9 report: Computer coding of IPA symbols and computer representation of individual languages. (1989). Journal of the International Phonetic Association, 19(2), 81–82. https://doi.org/10.1017/s002510030000387x

Uraga, E. i Pineda, L. A. (2000). A set of phonological rules for Mexican Spanish. A A. Gelbukh (Ed.), International Conference CICLing-2000. Conference on Intelligent text processing and Computational Linguistics. Proceedings. February 13 to 19, 2000, Mexico City, Mexico. Instituto Politécnico Nacional, Centro de Investigación en Computación. https://www.cicling.org/2000/book/Pineda-3.pdf

Uraga, E. i Pineda, L. A. (2002). Automatic generation of pronunciation lexicons for Spanish. A A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing: Third International Conference, CICLing 2002, Mexico City, Mexico, February 17–23, 2002 Proceedings (p. 330–338). Springer. https://doi.org/10.1007/3-540-45715-1_34

Wells, J. C. (1986). A standardized machine-readable phonetic notation. International Conference on Speech Input/Output: Techniques and applications (p. 134–137). The Institution of Electrical Engineers.

Wells, J. C. (1987). Computer-coded phonetic transcription. Journal of the International Phonetic Association, 17(2), 94–114. https://doi.org/10.1017/s0025100300003303

Wells, J. C. (1989). Computer-coded phonemic notation of individual languages of the European Community. Journal of the International Phonetic Association, 19(1), 31–54. https://doi.org/10.1017/s0025100300005892

Wells, J. C. (1994). Computer-coding the IPA: A proposed extension of SAMPA. Speech, Hearing and Language: Work in Progress, 8, 271–289. https://www.phon.ucl.ac.uk/home/sampa/ipasam-x.pdf

Wells, J. C. (1996, 18 de març). SAMPA for Spanish. University College London. https://www.phon.ucl.ac.uk/home/sampa/spanish.htm

Wells, J. C. (1999–2015). SAMPA Computer Readable Phonetic Alphabet. University College London. https://www.phon.ucl.ac.uk/home/sampa/index.html

Wells, J. C. (2000, 3 de maig). Computer-coding the IPA: A proposed extension of SAMPA. University College London. https://www.phon.ucl.ac.uk/home/sampa/x-sampa.htm

Wells, J. C. (2003). Phonetic symbols in word processing and on the web. A M. J. Solé, D. Recasens i J. Romero (Ed.), 15th International Congress of Phonetic Sciences. Barcelona, Spain, August 3-9, 2003 (p. 3105–3108). ICPhS Archive. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/p15_3105.html

Representació fonètica de corpus orals: nivell suprasegmental

Arvaniti, A. i Baltazani, M. (2000). Greek ToBI: A system for the annotation of Greek speech corpora. LREC 2000. 2nd International Conference on Language Resources and Evaluation. Athens, Greece, 31 May - 2 June 2000. European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2000/html/summary/7.htm

Batliner, A., Kompe, R., Kießling, A., Mast, M., Niemann, H. i Nöth, E. (1998). M = Syntax + Prosody: A syntactic-prosodic labelling scheme for large spontaneous speech databases. Speech Communication, 25(4), 193–222. https://doi.org/10.1016/s0167-6393(98)00037-5

Dybkjær, L., Bernsen, N. O., Wegener Knudsen, M., Llisterri, J., Machuca, M. J., Martin, J.-C., Pelachaud, C., Riera, M. i Wittenburg, P. (2003). Guidelines for the creation of NIMM annotation schemes (Deliverable D9.2 Final Report). IST-1999-10647 ISLE, International Standards for Language Engineering, ISLE Natural Interactivity and Multimodality Working Group. https://joaquimllisterri.cat/publicacions/Dybkjaer_et_al_03_Guidelines_NIMM_annotation_schemes.pdf

Estruch, M., Garrido, J. M., Llisterri, J. i Riera, M. (1996–1997). Una aproximación fonética al estudio de la entonación. Philologia Hispalensis, 11(1), 281–293. https://doi.org/10.12795/PH.19961997.v11.i01.19

Estruch, M., Garrido, J. M., Llisterri, J. i Riera, M. (2007). Técnicas y procedimientos para la representación de las curvas melódicas. Revista de Lingüística Teórica y Aplicada, 45(2), 59–87. https://doi.org/10.4067/S0718-48832007000200007

Fernández Rei, E. i Escourido, A. B. (2008). Problemas metodológicos en la adquisición de datos prosódicos a partir de corpora. A A. Pamies, M. C. Amorós i J. M. Pazos (Ed.), Language Design. Journal of Theoretical and Experimental Linguistics. Special Issue 2: Experimental Prosody (p. 249–258). Método Ediciones. http://elies.rediris.es/Language_Design/LD-SI-2/28-FernandezRei-Escourido_dobleOK_.pdf

Garrido, J. M. (2010). A tool for automatic F0 stylisation, annotation and modelling of large corpora. Speech Prosody 2010, Fifth International Conference. Chicago, IL, USA, May 10-14, 2010 (p. 1–4). ISCA Archive. https://www.isca-speech.org/archive_v0/sp2010/sp10_041.html

Garrido, J. M. (2018). Using large corpora and computational tools to describe prosody: An exciting challenge for the future with some (important) pending problems to solve. A I. Feldhausen, J. Fliessbach i M. del M. Vanrell (Ed.), Methods in prosody: A Romance language perspective (p. 3–43). Language Science Press. https://doi.org/10.5281/zenodo.1441335

Knowles, G. (1991). Prosodic labelling: The problem of tone group boundaries. A S. Johansson i A.-B. Stenström (Ed.), English computer corpora: Selected papers and research guide (p. 149–164). de Gruyter Mouton. https://doi.org/10.1515/9783110865967.149

Knowles, G. (1996). The value of prosodic transcriptions. A G. Knowles, A. Wichmann i P. Alderson (Ed.), Working with speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus (p. 87–106). Longman.

Knowles, G. i Lawrence, L. (1987). Automatic intonation assignment. A R. Garside, G. Leech i G. Sampson (Ed.), The computational analysis of English: A corpus-based approach (p. 139–148). Longman.

Llisterri, J. (1994). Prosody encoding survey. WP 1 Specifications and Standards, T1.5. Markup Specifications (Deliverable 1.5.3, Final version). LRE-62050 MULTEXT, Multilingual Text Tools and Corpora. https://joaquimllisterri.cat/publicacions/Prosody_encoding_94.pdf

Llisterri, J. (1995, 13 d’agost). Spanish prosodic labelling [Comunicació]. Workshop on Prosodic Labelling, Stockholm, Sweden. https://joaquimllisterri.cat/publicacions/Spanish_Prosodic_Labelling.pdf

Mertens, P. (1991). Intonation. A C. Blanche-Benveniste, M. Bilger, C. Rouget i K. van den Eynde (Ed.), Le français parlé : études grammaticales (p. 159–176). Éditions du CNRS.

Pickering, B., Williams, B. i Knowles, G. (1996). Analysis of transcriber differences in the SEC. A G. Knowles, A. Wichmann i P. Alderson (Ed.), Working with speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus (p. 61–86). Longman.

Pitrelli, J. F., Beckman, M. E. i Hirschberg, J. (1994). Evaluation of prosodic transcription labeling reliability in the ToBI framework. Third International Conference on Spoken Language Processing (ICSLP 94). Yokohama, Japan, September 18-22, 1994 (Vol. 2, p. 123–126). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1994/i94_0123.html

Quazza, S. i Garrido, J. M. (1998). Supported coding schemes: 6. Prosody (MATE Deliverable D1.1). LE Telematics Project LE4 8370. https://joaquimllisterri.cat/publicacions/MATED1.1.6Prosody/D11_6_Prosody.html

Quazza, S. i Garrido, J. M. (2000, 8 de gener). MATE dialogue annotation guidelines: Prosody (MATE Deliverable D2.1). LE Telematics Project LE4 8370. http://www.andreasmengel.de/pubs/mdag.pdf

Ramírez Verdugo, M. D. (2003). A new approach to the analysis and annotation of speech and prosody based on computerized cross-linguistic corpora. Procesamiento del Lenguaje Natural, 31, 343–344. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3201/1692

Roach, P. i Arnfield, S. (1995). Linking prosodic transcription to the time dimension. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 149–160). Longman. https://doi.org/10.4324/9781315843162

Roseano, P. i Fernández Planas, A. M. (2013). Transcripció fonètica i fonològica de l’entonació: una proposta d’etiquetatge automàtic. Estudios de Fonética Experimental, 22, 275–334. https://raco.cat/index.php/EFE/article/view/275413

Silverman, K., Beckman, M. E., Pitrelli, J. F., Ostendorf, M., Wightman, C. W., Price, P., Pierrehumbert, J. B. i Hirschberg, J. (1992). ToBI: A standard for labelling English prosody. Second International Conference on Spoken Language Processing (ICSLP’92). Banff, Alberta, Canada, October 13-16, 1992 (p. 867–870). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1992/i92_0867.html

Wells, J. C. (1995, 19 de setembre). SAMPROSA (SAM prosodic transcription). University College London. https://www.phon.ucl.ac.uk/home/sampa/samprosa.htm

Williams, B. (1996). The formulation of an intonation transcription system for British English. A G. Knowles, A. Wichmann i P. Alderson (Ed.), Working with speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus (p. 38–58). Longman.

Elements suprasegmentals: transcripció

INTSINT, International Transcription System for Intonation

Astésano, C., Espesser, R., Hirst, D. i Llisterri, J. (1997). Stylisation automatique de la fréquence fondamentale : une évaluation multilingue. Actes du 4e Congrès Français d’Acoustique. Marseille, France. 14-18 avril, 1997 (Vol. 1, p. 441–443). Teknea. https://joaquimllisterri.cat/publicacions/Astesano_et_al_97.pdf

Baqué, L. i Estruch, M. (2003). Modelo de Aix-en-Provence. A P. Prieto (Ed.), Teorías de la entonación (p. 123–153). Ariel. https://sites.google.com/site/lorrainebaqueuab/publis/ModeloAix-en-ProvenceV3.pdf

Caelen-Haumont, G. i Auran, C. (2004). INTSMEL : un outil pour l’analyse des contours proéminents de F0. Bulletin PFC (Phonologie du Français Contemporain), 115–125. https://halshs.archives-ouvertes.fr/hal-00256394/

Campione, E., Flachaire, E., Hirst, D. i Véronis, J. (1997). Stylisation and symbolic coding of F0: A quantitative model. A A. Botinis (Ed.), Intonation: Theory, models, and applications. Athens, Greece, September 18-20, 1997 (p. 71–74). ISCA Archive. https://www.isca-speech.org/archive_open/int_97/inta_071.html

Campione, E., Flachaire, E., Hirst, D. i Véronis, J. (1998). Évaluation de modèles d’étiquetage automatique de l’intonation. Actes des XXIIèmes Journées d’Études sur la Parole. Martigny, Suisse, 15-19 Juin 1998 (p. 99–102). http://www.afcp-parole.org/doc/Archives_JEP/1998_XXIIe_JEP_Martigny/1998_XXIIe_JEP_Martigny.pdf

Campione, E., Hirst, D. i Véronis, J. (2000). Automatic stylisation and modelling of French and Italian intonation. A A. Botinis (Ed.), Intonation: Analysis, modelling and technology (p. 185–208). Kluwer. https://doi.org/10.1007/978-94-011-4317-2_9

Campione, E. i Véronis, J. (1998). A statistical study of pitch target points in five languages. The 5th International Conference on Spoken Language Processing, incorporating the 7th Australian International Speech Science and Technology Conference. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998 (Paper 0845). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1998/i98_0845.html

Campione, E. i Véronis, J. (2000). Une évaluation de l’algorithme de stylisation mélodique MOMEL. Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence (TIPA), 19, 27–44. https://hal.archives-ouvertes.fr/hal-00285557

Campione, E. i Véronis, J. (2001). Étiquetage prosodique semi-automatique des corpus oraux. TALN 2001. Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Tours, France, juillet 2001 (p. 122–131). https://aclanthology.org/2001.jeptalnrecital-long.10/

Campione, E. i Véronis, J. (2001). Semi-automatic tagging of intonation in French spoken corpora. A P. Rayson, A. Wilson, T. McEnery, A. Hardie i K. Shereen (Ed.), Proceedings of the Corpus Linguistics 2001 Conference. Lancaster University, UK, 29 March - 2 April 2001 (p. 90–99). Lancaster University, University Centre for Computer Corpus Research on Language. http://ucrel.lancs.ac.uk/publications/CL2003/CL2001%20conference/papers/campione.pdf

Di Cristo, A., Hirst, D., Boudouresques, N. i Louis, M. (2002). Écrire l’intonation : le système INTSINT, fondements théoriques et illustrations. Revue PArole, 22–23–24, 175–212.

Estruch, M. (2000). Évaluation de l’algorithme de stylisation mélodique MOMEL et du système de codage symbolique INTSINT avec un corpus de passages en catalan. Travaux Interdisciplinaires du Laboratoire Parole et Langage d’Aix-en-Provence (TIPA), 19, 45–61. https://hal.archives-ouvertes.fr/hal-00285558

Giordano, R. (2005). Analisi prosodica e trascrizione intonativa in INTSINT. A F. Albano Leoni i R. Giordano (Ed.), Italiano parlato: Analisi di un dialogo (con un cdrom contenente il materiale audio variamente elaborato e altri materiali) (p. 231–256). Liguori Editore.

Hirst, D. (1991). Intonation models: Towards a third generation. Actes du XIIème Congrès International de Sciences Phonétiques / Proceedings of the XIIth International Congress of Phonetic Sciences. 19-24 août 1991. Aix-en-Provence, France / August 19-24, 1991. Aix-en-Provence, France (Vol. 1, p. 305–310). Université de Provence, Service des Publications.

Hirst, D. (1999). The symbolic coding of segmental duration and tonal alignment: An extension to the INTSINT system. Proceedings, Sixth European Conference on Speech Communication and Technology (EUROSPEECH’99). Budapest, Hungary, September 5-9, 1999 (p. 1639–1642). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1999/e99_1639.html

Hirst, D. (2000). ProZed: A multilingual prosody editor for speech synthesis. IEE Seminar on State of the Art in Speech Synthesis. London, UK, 13 April 2000 (p. 4.1–4.7). IEE. https://doi.org/10.1049/ic:20000321

Hirst, D. (2002). Automatic analysis of prosody for multilingual speech corpora. A E. Keller, G. Bailly, A. I. C. Monaghan, J. Terken i M. Huckvale (Ed.), Improvements in speech synthesis: COST 258: The naturalness of synthetic speech (p. 320–327). John Wiley & Sons.

Hirst, D. (2005). Phonetic and phonological annotation of speech prosody. A R. Savy i C. Crocco (Ed.), «Analisi prosodica» Teorie, modelli e sistemi di annotazione: Atti del 2o Convegno Nazionale AISV 2005. Università degli Studi di Salerno, Campus di Fisciano, 30 Novembre - 2 Dicembre 2005 (p. 33–42). EDK Editore. https://www.aisv.it/PubblicazioniAISV/II_AISV/TR/Invited/TR_Hirst.pdf

Hirst, D. (2007). A Praat plugin for MOMEL and INTSINT with improved algorithms for modelling and coding intonation. A J. Trouvain i W. J. Barry (Ed.), 16th International Congress of Phonetic Sciences. Saarbrücken, Germany, 6-10 August, 2007 (p. 1233–1236). http://www.icphs2007.de/conference/Papers/1443/index.html

Hirst, D. (2011). The analysis by synthesis of speech melody: From data to models. Journal of Speech Sciences, 1(1), 55–83. https://doi.org/10.20396/joss.v1i1.15011

Hirst, D. i Di Cristo, A. (1998). A survey of intonation systems. A D. Hirst i A. Di Cristo (Ed.), Intonation systems: A survey of twenty languages (p. 1–44). Cambridge University Press.

Hirst, D., Di Cristo, A. i Espesser, R. (2000). Levels of representation and levels of analysis for the description of intonation systems. A M. Horne (Ed.), Prosody: Theory and experiment. Studies presented to Gösta Bruce (p. 51–88). Kluwer. https://doi.org/10.1007/978-94-015-9413-4_4

Hirst, D., Di Cristo, A., Le Besnerais, M., Najim, Z., Nicolas, P. i Roméas, P. (1993). Multi-lingual modelling of intonation patterns. A D. House i P. Touati (Ed.), ESCA Workshop on Prosody. Lund, Sweden, September 27-29, 1993 (p. 204–207). ISCA Archive. https://www.isca-speech.org/archive_open/prosody_93/pro3_204.html

Hirst, D. i Espesser, R. (1993). Automatic modelling of fundamental frequency using a quadratic spline function. Travaux de l’Institut de Phonétique d’Aix, 15, 71–85. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.3623

Hirst, D., Ide, N. i Véronis, J. (1994). Coding fundamental frequency patterns for multi-lingual synthesis with INTSINT in the MULTEXT project. Second ISCA/IEEE Workshop on Speech Synthesis. Mohonk Mountain House, New Paltz, NY, USA, September 12-15, 1994 (p. 77–80). ISCA Archive. https://www.isca-speech.org/archive_open/ssw2/ssw2_077.html

Hirst, D., Nicolas, P. i Espesser, R. (1991). Coding the Fo of a continuous text in French: An experimental approach. Actes du XIIème Congrès International de Sciences Phonétiques / Proceedings of the XIIth International Congress of Phonetic Sciences. 19-24 août 1991. Aix-en-Provence, France / August 19-24, 1991. Aix-en-Provence, France (Vol. 5, p. 234–237). Université de Provence, Service des Publications.

Llisterri, J. (1996). Prosody tools efficiency and failures. WP 4 Corpus, T4.6 Speech markup and validation (Deliverable 4.5.2. Final Report). LRE-62050 MULTEXT, Multilingual Text Tools and Corpora. https://joaquimllisterri.cat/publicacions/Prosody_tools_96.pdf

Riera, M. (2001). Anàlisi acústica dels moviments tonals del grup accentual en català [Treball d’investigació de Tercer Cicle, Universitat Autònoma de Barcelona]. Universitat Autònoma de Barcelona, Departament de Filologia Espanyola, Grup de Fonètica. https://joaquimllisterri.cat/publicacions_GF/Riera2001.pdf

Véronis, J. i Campione, E. (1998). Towards a reversible symbolic coding of intonation. The 5th International Conference on Spoken Language Processing, incorporating the 7th Australian International Speech Science and Technology Conference. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998 (Paper 0846). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1998/i98_0846.html

Transcripció, etiquetatge i codificació de corpus de llengua oral

Abou Haidar, L. (Ed.). (2002). Transcription de la parole normale et pathologique. Revue PArole, 22–23–24.

Albelda, M. (2005). Sistemas de transcripción de los corpus orales del español. A M. L. Carrió (Ed.), Perspectivas interdisciplinares de la lingüística aplicada (Vol. 2, p. 381–387). Asociación Española de Lingüística Aplicada (AESLA); Universitat Politècnica de València.

Allwood, J., Grönqvist, L., Ahlsén, E. i Gunnarsson, M. (2003). Annotation and tools for an activity based spoken language corpus. A J. van Kuppevelt i R. W. Smith (Ed.), Current and new directions in discourse and dialogue (p. 1–18). Kluwer. https://doi.org/10.1007/978-94-010-0019-2_1

Ávila, A. M. (1996). Problemas prácticos en la realización de corpus orales: la transliteración del corpus oral del proyecto de investigación de las variedades vernáculas malagueñas (VUM). A J. de D. Luque i A. Pamies (Ed.), Actas del Primer Simposio de Historiografía Lingüística (p. 103–112). Método Ediciones.

Ballester, A., Santamaría, C. i Marcos Marín, F. (1993). Transcription conventions used for the Corpus of Spoken Contemporary Spanish. Literary and Linguistic Computing, 8(4), 283–292. https://doi.org/10.1093/llc/8.4.283

Bilger, M., Blasco, M., Cappeau, P., Pallaud, B., Sabio, F. i Savelli, M.-J. (1997). Transcription de l’oral et interprétation : illustration de quelques difficultés. Recherches sur le français parlé, 14, 57–86. https://repository.ortolang.fr/api/content/recherches-francais-parle/v1/pdf/volume_14/57_14_RSFP.pdf

Bilger, M. (Coord.). (2008). Données orales : les enjeux de la transcription. Presses Universitaires de Perpignan.

Bladas, Ò. (2009). Manual de transcripció del discurs oral: materials de treball. Universitat de Barcelona, Departament de Filologia Catalana, Secció de Lingüística Catalana. http://hdl.handle.net/2445/106301

Blanche-Benveniste, C. (1997). Transcription et technologies. Recherches sur le français parlé, 14, 87–100. https://repository.ortolang.fr/api/content/recherches-francais-parle/v1/pdf/volume_14/87_14_RSFP.pdf

Blanche-Benveniste, C. (2002). Réflexions sur les transcriptions de corpus français parlé. Revue PArole, 22–23–24, 91–118.

Blanche-Benveniste, C. i Jeanjean, C. (1987). Le français parlé : transcription et édition. Didier.

Bloom, L. (1993). Transcription and coding for child language research: The parts are more than the whole. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 149–168). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Brazil, D. (1987). Representing pronunciation. A J. Sinclair (Ed.), Looking up: An account of the COBUILD project in lexical computing (p. 160–166). Collins.

Briz, A. i Grupo Val.Es.Co. (2002). La transcripción de la lengua hablada: el sistema del grupo Val.Es.Co. Español actual, 77, 57–85.

Briz, A. (Coord.). (1995). La transcripción. Signos y convenciones. A La conversación coloquial (Materiales para su estudio) (p. 39–48). Universitat de València, Departamento de Filología Española.

Briz, A. i Gómez Molina, J. R. (1992). Scheme of study of colloquial Spanish: Some methodological considerations. Lynx. Panorámica de estudios lingüísticos, 3, 111–124.

Cappeau, P. (1997). Données erronées : quelles erreurs commettent les transcripteurs? Recherches sur le français parlé, 14, 115–126. https://repository.ortolang.fr/api/content/recherches-francais-parle/v1/pdf/volume_14/115_14_RSFP.pdf

Cerdán, L. i Llobera, M. (1997). Actuación de los profesores en el aula: desarrollo de un modelo semiótico de transcripción. Revista Española de Lingüística Aplicada (RESLA), 12, 115–140. https://dialnet.unirioja.es/servlet/articulo?codigo=870447

Christodoulides, G. i Avanzi, M. (2015). Automatic detection and annotation of disfluencies in spoken French corpora. INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association. Dresden, Germany, September 6-10, 2015 (p. 1849–1853). ISCA Archive. https://www.isca-speech.org/archive_v0/interspeech_2015/i15_1849.html

Cook, G. (1995). Theoretical issues: Transcribing the untranscribable. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 35–53). Longman. https://doi.org/10.4324/9781315843162

Crowdy, S. (1994). Spoken corpus transcription. Literary and Linguistic Computing, 9(1), 25–28. https://doi.org/10.1093/llc/9.1.25

Crowdy, S. (1995). The BNC spoken corpus. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 224–234). Longman. https://doi.org/10.4324/9781315843162

Chafe, W. (1993). Prosodic and functional units of language. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 33–44). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Chafe, W. (1995). Adequacy, user-friendliness, and practicality in transcribing. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 54–61). Longman. https://doi.org/10.4324/9781315843162

Creer, S. i Thompson, P. (2004). Processing spoken language data: The BASE experience. A N. Oostdijk, G. Kristoffersen i G. Sampson (Ed.), Compiling and Processing Spoken Language Corpora. Lisboa, Portugal, 24 May 2004 (p. 20–27). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/ws/ws1.pdf

Draxler, C., van den Heuvel, H., van Hessen, A., Calamai, S., Corti, L. i Scagliola, S. (2020). A CLARIN transcription portal for interview data. A N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk i S. Piperidis (Ed.), Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 11-16, 2020 (p. 3353–3359). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.411.pdf

Du Bois, J. W. (1991). Transcription design principles for spoken discourse research. Pragmatics, 1(1), 71–106. https://doi.org/10.1075/prag.1.1.04boi

Du Bois, J. W. i Schuetze-Coburn, S. (1993). Representing hierarchy: Constituent structure for discourse databases. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 221–262). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Du Bois, J. W., Schuetze-Coburn, S., Cumming, S. i Paolino, D. (1993). Outline of discourse transcription. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 45–90). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Dybkjær, L., Berman, S., Kipp, M., Wegener Olsen, M., Pirrelli, V., Reithinger, N. i Soria, C. (2001). Survey of existing tools, standards and user needs for annotation of natural interaction and multimodal data (Deliverable D11.1, Final Report). ISLE Natural Interactivity and Multimodality Working Group. http://spokendialogue.dk/Publications/2001f/D11.1-14.2.2001-F.pdf

Dybkjær, L. i Bernsen, N. O. (2002). Natural interactivity resources: Data, annotation schemes and tools. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, Canary Islands, Spain, 29-31 May, 2002 (p. 349–356). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2002/sumarios/213.htm

Dybkjær, L. i Bernsen, N. O. (2000). The MATE markup framework. A L. Dybkjær, K. Hasida i D. Tram (Ed.), Proceedings of the 1st SIGdial workshop on Discourse and Dialogue. Hong Kong, October 7-8, 2000 (p. 19–28). Association for Computational Linguistics. https://doi.org/10.3115/1117736.1117739

Edwards, J. A. (1992). Design principles in the transcription of spoken discourse. A J. Svartvik (Ed.), Directions in corpus linguistics: Proceedings of Nobel Symposium 82. Stockholm, 4-8 August 1991 (p. 129–147). de Gruyter Mouton. https://doi.org/10.1515/9783110867275.129

Edwards, J. A. (1993). Principles and contrasting systems of discourse transcription. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 3–23). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Edwards, J. A. (1995). Principles and alternative systems in the transcription, coding and mark-up of spoken discourse. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 19–34). Longman. https://doi.org/10.4324/9781315843162

Ehlich, K. (1993). HIAT: A transcription system for discourse data. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 123–148). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Fink, G. A., Johanntokrax, M. i Schaffranietz, B. (1995). A flexible formal language for the orthographic transcription of spontaneous spoken dialogues. Fourth European Conference on Speech Communication and Technology (EUROSPEECH’95). Madrid, Spain, September 18-21, 1995 (Vol. 1, p. 871–874). ISCA Archive. https://www.isca-speech.org/archive/eurospeech_1995/index.html

Freitas, T. (2008). Recolha e transcrição de corpora orais. A E. Fernández Rei i X. L. Regueira (Ed.), Perspectivas sobre a oralidade (p. 297–324). Consello da Cultura Galega; Instituto da Lingua Galega. http://consellodacultura.gal/publicacion.php?id=10

Gallardo Paúls, B. (2004). La transcripción del lenguaje afásico. A B. Gallardo Paúls i M. Veyrat (Ed.), Estudios de lingüística clínica II: lingüística y patología (p. 83–115). Universitat de València. https://roderic.uv.es/handle/10550/30824

Garrard, P., Haigh, A.-M. i de Jager, C. (2011). Techniques for transcribers: Assessing and improving consistency in transcripts of spoken language. Literary and Linguistic Computing, 26(4), 389–405. https://doi.org/10.1093/llc/fqr018

Gibbon, D., Mertins, I. i Moore, R. K. (Ed.). (2000). Representation and annotation of dialogue. A Handbook of multimodal and spoken dialogue systems: Resources, terminology and product evaluation (p. 1–101). Springer. https://doi.org/10.1007/978-1-4615-4501-9_1

González Ledesma, A., de la Madrid, G., Alcántara, M., de la Torre, R. i Moreno Sandoval, A. (2004). Orality and difficulties in the transcription of spoken corpora. A N. Oostdijk, G. Kristoffersen i G. Sampson (Ed.), Compiling and Processing Spoken Language Corpora. Lisboa, Portugal, 24 May 2004 (p. 12–19). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/ws/ws1.pdf

Gumperz, J. J. i Berenz, N. (1993). Transcribing conversational exchanges. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 91–122). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Hennoste, T., Koit, M., Rääbis, A. i Valdisoo, M. (2004). Developing a dialogue act coding scheme: An experience of annotating the Estonian Dialogue Corpus. A N. Oostdijk, G. Kristoffersen i G. Sampson (Ed.), Compiling and Processing Spoken Language Corpora. Lisboa, Portugal, 24 May 2004 (p. 40–47). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2004/ws/ws1.pdf

Johansson, S. (1995). The approach of the Text Encoding Initiative to the encoding of spoken discourse. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 82–98). Longman. https://doi.org/10.4324/9781315843162

Johansson, S. (1995). The encoding of spoken texts. A N. Ide i J. Véronis (Ed.), Text Encoding Initiative: Background and context (p. 149–158). Kluwer. https://doi.org/10.1007/978-94-011-0325-1_12

Lampert, M. D. i Ervin-Tripp, S. M. (1993). Structured coding for the study of language and social interaction. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 169–206). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Lebaupin, A. i Leroy, M. (2002). Transcription des indices segmentaux, suprasegmentaux et posturo-mimo-gestuels chez le jeune enfant. Revue PArole, 22–23–24, 231–244.

Leech, G., Weisser, M., Wilson, A. i Grice, M. (1998, 16 d’octubre). Survey and guidelines for the representation and annotation of dialogue (WP4-4). LE-EAGLES Integrated Resources Working Group. https://www.lancaster.ac.uk/fass/projects/eagles/delivera/wp4final.htm

Lindsay, J. i O’Connell, D. C. (1995). How do transcribers deal with audio recordings of spoken discourse? Journal of Psycholinguistic Research, 24(2), 101–115. https://doi.org/10.1007/bf02143958

Llisterri, J. (1996, maig). Preliminary recommendations on spoken texts (EAGLES Document EAG-TCWG-SPT/P). Expert Advisory Group on Language Engineering Standards (EAGLES). http://www.ilc.cnr.it/EAGLES96/spokentx/spokentx.html

Llisterri, J. (1999). Transcripción, etiquetado y codificación de corpus orales. Revista Española de Lingüística Aplicada, (Volumen monográfico «Panorama de la investigación en lingüística informática»), 53–82. https://dialnet.unirioja.es/servlet/articulo?codigo=227025

MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk: Vol. 1. Transcription format and programs (3a ed.). Psychology Press. https://doi.org/10.4324/9781315805672

I.– Transcription format: 1.– Introduction; 2.– Principles; 3.– CHAT outline; 4.– File headers; 5.– Words; 6.– Morphemes; 7.– Utterances; 8.– Scoped symbols; 9.– Dependent tiers; 10.– CA transcription; 11.– Signed language—BX; 12.– Extending chat; 13.– UNIBETs; 14.– Error coding; 15.– Speech act codes; 16.– Morphosyntactic coding; 17.– Word lists; 18.– Recording techniques; 19.– Symbol summary; II.– The programs: 1.– Introduction; 2.– Tutorial; 3.– The editor; 4.– Features; 5.– Analysis commands; 6.– Options; 7.– Exercises.

Manuel du transcripteur. (2005, febrer). Transcriber: A tool for segmenting, labeling and transcribing speech. http://trans.sourceforge.net/en/transguidFR.php

Maurer, B. (1999). Retour à Babel : les systèmes de transcription. A L.-J. Calvet i P. Dumont (Dir.), L’enquête sociolinguistique (p. 149–166). L’Harmattan.

Nelson, G. (1995). The International Corpus of English: Mark-up for spoken language. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application. Longman. https://doi.org/10.4324/9781315843162

Nelson, G. (1997). Standardizing wordforms in a spoken corpus. Literary and Linguistic Computing, 12(2), 79–85. https://doi.org/10.1093/llc/12.2.79

O’Connell, D. C. i Kowal, S. (1994). Some current transcription systems for spoken discourse: A critical analysis. Pragmatics, 4(1), 81–107. https://doi.org/10.1075/prag.4.1.04con

O’Connell, D. C. i Kowal, S. (1999). Transcription and the issue of standardization. Journal of Psycholinguistic Research, 28(2), 103–120. https://doi.org/10.1023/A:1023265024072

Ochs, E. (1979). Transcription as theory. A E. Ochs i B. B. Schieffelin (Ed.), Developmental pragmatics (p. 43–72). Academic Press.

Pallaud, B. (2002). Erreurs d’écoute dans la transcription de données orales. Revue PArole, 22–23–24, 267–294.

Payne, J. (1995). The COBUILD spoken corpus: Transcription conventions. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 203–207). Longman. https://doi.org/10.4324/9781315843162

Payrató, L. (1995). Transcripción del discurso coloquial. A L. Cortés (Ed.), El español coloquial: Actas del I simposio sobre análisis del discurso oral. Almería, 23-25 de noviembre de 1994 (p. 43–70). Servicio de Publicaciones de la Universidad de Almería.

Payrató, L. (1996). Transcripció del discurs coŀloquial. A L. Payrató, E. Boix, M.-R. Lloret i M. Lorente (Ed.), Corpus, corpora: Actes del 1r i 2n coŀloquis lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2) (p. 181–216). Promociones y Publicaciones Universitarias. http://hdl.handle.net/2445/111985

Peppé, S. (1995). The Survey of English Usage and the London-Lund Corpus: Computerizing manual prosodic transcription. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 187–202). Longman. https://doi.org/10.4324/9781315843162

Pino, M. (1998). Transcripción, codificación y almacenamiento de los textos orales del corpus CREA. Versión 2.0, Instituto de Lexicografía, Real Academia Española, 29/07/1997. A J. A. Samper, C. E. Hernández Cabrera i M. Troya (Ed.), Macrocorpus de la norma lingüística culta de las principales ciudades del mundo hispánico (MC-NLCH) [CD-ROM]. Universidad de las Palmas de Gran Canaria; Asociación de Lingüística y Filología de la América Latina.

Pino, M. i Sánchez, M. (1999). El subcorpus oral del banco de datos CREA-CORDE (Real Academia Española): procedimientos de transcripción y codificación. Oralia. Análisis del discurso oral, 2, 83–138.

Rehbein, I., Schalowski, S. i Wiese, H. (2014). Annotating spoken language. A Ş. Ruhi, M. Haugh, T. Schmidt i K. Wörner (Ed.), Best practices for spoken corpora in linguistic research (p. 75–94). Cambridge Scholars Publishing.

Romero, C., O’Connell, D. C. i Kowal, S. (2002). Notation systems for transcription: An empirical investigation. Journal of Psycholinguistic Research, 31(6), 619–631. https://doi.org/10.1023/A:1021217105211

Sanmartín, J. (2006). Datos conversacionales y su transcripción: el corpus Val.Es.Co y el corpus PerLA. A Y. Bürki i E. De Stefani (Ed.), Trascrivere la lingua / Transcribir la lengua: Dalla filologia all’analisi conversazionale / De la filología al análisis conversacional (p. 275–283). Peter Lang.

Senia, F. i van Velden, J. G. (1997, 16 de gener). Specification of orthographic transcription and lexicon conventions (Deliverable SD1.3.2, Final report). LRE-4001 SpeechDat. http://147.83.50.136/projects/BDG/docs.html

Sinclair, J. (1995). From theory to practice. A G. Leech, G. Myers i J. Thomas (Ed.), Spoken English on computer: Transcription, mark-up and application (p. 99–112). Longman. https://doi.org/10.4324/9781315843162

Slobin, D. (1993). Coding child language data for crosslinguistic analysis. A J. A. Edwards i M. D. Lampert (Ed.), Talking data: Transcription and coding in discourse research (p. 207–220). Lawrence Erlbaum. https://doi.org/10.4324/9781315807928

Steininger, S. (2000). Transliteration of language and labeling of emotion and gestures in SmartKom. A D. Broeder, H. Cunningham, N. Ide, D. Roy, H. Thompson i P. Wittenburg (Ed.), Meta-Descriptions and Annotation Schemes for Multimodal/Multimedia Language Resources, Proceedings. Athens, Greece, 29-30 May 2000 (p. 49–51). European Language Resources Association (ELRA); Max Planck Institute for Psycholinguistics. https://www.mpi.nl/ISLE/documents/papers/Steininger_paper.pdf

Swann, J. (2010). Transcribing spoken interaction. A S. Hunston i D. Oakey (Ed.), Introducing applied linguistics: Concepts and skills (p. 163–176). Routledge. https://doi.org/10.4324/9780203875728

TEI Consortium (Ed.). (2021). 8. Transcriptions of speech. A TEI P5: Guidelines for Electronic Text Encoding and Interchange [Last updated on 9th April 2021] (Version 4.2.2). TEI Consortium. https://tei-c.org/release/doc/tei-p5-doc/en/html/TS.html

Villena Ponsoda, J. A. (1994). Pautas y procedimientos de representación del corpus oral de la Universidad de Málaga: informe preliminar. A M. Alvar Ezquerra i J. A. Villena Ponsoda (Ed.), Estudios para un corpus del español (p. 73–102). Universidad de Málaga.

Villena Ponsoda, J. A., Ávila, A. M., Sánchez Bohorques, J. M. i Lasarte, M. C. (2010). Problemas de anotación e intercambio en los corpus orales: estrategias para la transformación de textos etiquetados en documentos XML. El caso de los corpus PRESEEA. Oralia. Análisis del discurso oral, 13, 261–323.

Wray, A. i Bloomer, A. (2006). Transcribing speech orthographically. Projects in linguistics: A practical guide to researching language (2a ed., p. 185–195). Hodder Education.

Projectes sobre corpus orals i corpus de llengua oral

Projectes sobre corpus orals i corpus de llengua oral: català

Corpus orals en català per a estudis fonètics

Benet, A., Cortés, S. i Lleó, C. (2012). Phonoprosodic corpus of spoken Catalan (PhonCAT). A T. Schmidt i K. Wörner (Ed.), Multilingual corpora and multilingual corpus analysis (p. 215–230). John Benjamins. https://doi.org/10.1075/hsm.14.15ben

Garrido, J. M., Aguilar, L. i Escudero, D. (2011). GLISSANDO, un corpus de habla anotado para estudios prosódicos en catalán y español. A A. Hidalgo, Y. Congosto i M. Quilis Merín (Ed.), El estudio de la prosodia en España en el siglo XXI: perspectivas y ámbitos (p. 321–332). Universitat de València, Facultat de Filologia, Traducció i Comunicació.

Garrido, J. M., Escudero, D., Aguilar, L., Cardeñoso, V., Rodero, E., de la Mota, C., González, C., Vivaracho, C., Rustullet, S., Larrea, O., Laplaza, Y., Vizcaíno, F., Estebas, E., Cabrera, M. i Bonafonte, A. (2013). Glissando: A corpus for multidisciplinary prosodic studies in Spanish and Catalan. Language Resources and Evaluation, 47(4), 945–971. https://doi.org/10.1007/s10579-012-9213-0

Corpus orals en català per al desenvolupament de les tecnologies de la parla

Bonafonte, A., Adell, J., Esquerra, I., Gallego, S., Moreno, A. i Pérez, J. (2008). Corpus and voices for Catalan speech synthesis. A N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis i D. Tapias (Ed.), Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008). Marrakech, Morocco, 28-30 May, 2008 (p. 3325–3329). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2008/summaries/835.html

Esquerra, I., Bonafonte, A. i Febrer, A. (1998). A bilingual Spanish-Catalan database of units for concatenative synthesis. Workshop on Language Resources for European Minority Languages. Granada, Spain, 27 May, 1998. http://ixa2.si.ehu.eus/saltmil/index.php/en/activities-mainmenu-73/saltmil-workshops-mainmenu-77/30.html

Esquerra, I., Nadeu, C., Villarrubia, L. i León, P. (1998). Design of a phonetic corpus for speech recognition in Catalan. Workshop on Language Resources for European Minority Languages. Granada, Spain, 27 May, 1998. http://ixa2.si.ehu.eus/saltmil/index.php/en/activities-mainmenu-73/saltmil-workshops-mainmenu-77/30.html

Moreno, A., Febrer, A. i Márquez, L. (2006). Generation of language resources for the development of speech technologies in Catalan. Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy, 24-26 May, 2006 (p. 1632–1635). European Language Resources Association (ELRA). http://lrec-conf.org/proceedings/lrec2006/summaries/679.html

Recursos per les tecnologies de la parla. (s. d.). CoŀlectivaT. https://collectivat.cat/rap

Villarrubia, L., León, P., Hernández Gómez, L., Nadeu, C., Esquerra, I., Hernando, J., García Mateo, C. i Docío, L. (1998). VOCATEL and VOGATEL: Two telephone speech databases of Spanish minority languages (Catalan and Galician). Workshop on Language Resources for European Minority Languages. Granada, Spain, 27 May, 1998. http://ixa2.si.ehu.eus/saltmil/index.php/en/activities-mainmenu-73/saltmil-workshops-mainmenu-77/30.html

Corpus de llengua oral en català

Alturo, N., Bladas, Ò., Payà, M. i Payrató, L. (Ed.). (2004). Corpus oral de registres: materials de treball. Publicacions i Edicions de la Universitat de Barcelona.

Boix, E. (1996). Els materials de llengua oral dels corpus de català contemporani de la UB (CUB). A L. Payrató, E. Boix, M.-R. Lloret i M. Lorente (Ed.), Corpus, corpora: Actes del 1r i 2n coŀloquis lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2) (p. 93–114). Promociones y Publicaciones Universitarias. http://hdl.handle.net/2445/111985

Boix, E., Àlamo, M., Galindo, M. i Vila, F. X. (Ed.). (2007). Corpus de varietats socials: materials de treball. Publicacions i Edicions de la Universitat de Barcelona.

Grup d’Estudi de la Variació (GEV). (2002–2011). Corpus de Català Contemporani de la Universitat de Barcelona (CCCUB). Dipòsit Digital de la Universitat de Barcelona. http://diposit.ub.edu/dspace/handle/2445/10410

Payrató, L. i Alturo, N. (Ed.). (2002). Corpus oral de conversa coŀloquial: materials de treball. Publicacions i Edicions de la Universitat de Barcelona.

Viaplana, J., Lloret, M.-R., Perea, M.-P. i Clua, E. (2007). COD. Corpus Oral Dialectal. Promociones y Publicaciones Universitarias.

Vila, M., González, S., Martí, M. A., Llisterri, J. i Machuca, M. J. (2010). ClInt: A bilingual Spanish-Catalan spoken corpus of clinical interviews. Procesamiento del Lenguaje Natural, 45, 105–111. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/796

Projectes sobre corpus orals i corpus de llengua oral: espanyol

Corpus orals en espanyol per a estudis fonètics

Campione, E. i Véronis, J. (1998). A multilingual prosodic database. The 5th International Conference on Spoken Language Processing, incorporating the 7th Australian International Speech Science and Technology Conference. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998 (Paper 0844). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1998/i98_0844.html

Cantero, F. J. (2016). Corpus de habla espontánea para el estudio de la entonación. A A. M. Fernández Planas (Ed.), 53 reflexiones sobre aspectos de la fonética y otros temas de lingüística (p. 151–160). Laboratori de Fonètica, Universitat de Barcelona. http://stel.ub.edu/labfon/amper/homenaje-eugenio-martinez-celdran/53reflexiones/22-FJCantero.pdf

Gabriel, C. (2012). The Hamburg Corpus of Argentinean Spanish (HaCASpa). A T. Schmidt i K. Wörner (Ed.), Multilingual corpora and multilingual corpus analysis (p. 183–198). John Benjamins. https://doi.org/10.1075/hsm.14.13gab

Garrido, J. M., Aguilar, L. i Escudero, D. (2011). GLISSANDO, un corpus de habla anotado para estudios prosódicos en catalán y español. A A. Hidalgo, Y. Congosto i M. Quilis Merín (Ed.), El estudio de la prosodia en España en el siglo XXI: perspectivas y ámbitos (p. 321–332). Universitat de València, Facultat de Filologia, Traducció i Comunicació.

Garrido, J. M., Escudero, D., Aguilar, L., Cardeñoso, V., Rodero, E., de la Mota, C., González, C., Vivaracho, C., Rustullet, S., Larrea, O., Laplaza, Y., Vizcaíno, F., Estebas, E., Cabrera, M. i Bonafonte, A. (2013). Glissando: A corpus for multidisciplinary prosodic studies in Spanish and Catalan. Language Resources and Evaluation, 47(4), 945–971. https://doi.org/10.1007/s10579-012-9213-0

Hidalgo, A. i Congosto, Y. (2011). PROE: Corpus para la caracterización prosódica de los registros orales del español. A A. Hidalgo, Y. Congosto i M. Quilis Merín (Ed.), El estudio de la prosodia en España en el siglo XXI: perspectivas y ámbitos (p. 333–349). Universitat de València, Facultat de Filologia, Traducció i Comunicació.

Llisterri, J., Machuca, M. J. i Ríos, A. (2019). VILE-P: un corpus para el estudio prosódico de la variación inter e intralocutor. A J. M. Lahoz-Bengoechea i R. Pérez Ramón (Ed.), Subsidia: Tools and Resources for Speech Sciences / Subsidia: herramientas y recursos para las ciencias del habla (p. 117–123). Universidad de Málaga. https://hdl.handle.net/10630/18177

Mora, E., Pietrosemoli, L., Cavé, C., Obediente, E. i La Cruz, E. (2005). Un corpus de pares mínimos para el español de Venezuela. Lengua y Habla, 9, 117–121. https://dialnet.unirioja.es/servlet/articulo?codigo=4002077

MULTEXT Prosodic database (ISLRN 098-719-242-965-4; Versió 1.0). (1998). European Language Resources Association (ELRA). http://www.islrn.org/resources/098-719-242-965-4/

Pineda, L. A., Cuétara, J. O., Castellanos, H., López, I. i Villaseñor, L. (2004). DIMEx100: A new phonetic and speech corpus for Mexican Spanish. A C. Lemaître, C. A. Reyes i J. A. González (Ed.), IBERAMIA 2004. 9th Ibero-American Conference on AI, Puebla, México, November 22-26, 2004. Proceedings (p. 974–984). Springer. https://doi.org/10.1007/978-3-540-30498-2_97

Pustka, E., Gabriel, C., Meisenburg, T., Burkard, M. i Dziallas, K. (2018). (Inter-)Fonología del Español Contemporáneo (I)FEC: metodología de un programa de investigación para la fonología de corpus. Loquens, 5(1), 1–16. https://doi.org/10.3989/loquens.2018.046

Renato, A. C. i Álvarez, J. A. (2004). Corpora of Latin American Spanish for research in prosody and synthesis. A A. W. Black i K. Lenzo (Ed.), Fifth ISCA ITRW on Speech Synthesis (SSW5). Pittsburgh, PA, USA, June 14-16, 2004 (p. 221–222). ISCA Archive. https://www.isca-speech.org/archive_open/ssw5/ssw5_221.html

Torreira, F. i Ernestus, M. (2010). The Nijmegen Corpus of Casual Spanish. A N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner i D. Tapias (Ed.), Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010). Valletta, Malta, May 17-23, 2010 (p. 2981–2985). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2010/summaries/271.html

Corpus orals en espanyol per al desenvolupament de les tecnologies de la parla

Llisterri, J., Machuca, M. J., Mota, C., Riera, M. i Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del discurso oral, 8, 289–325. https://joaquimllisterri.cat/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

Albayzín

Albayzín corpus (ELRA-S0089). Paris: ELDA, Evaluations and Language Resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0089/

Base de datos oral del español Albayzín [5 CD-ROM]. (1999). Universitat Politècnica de València; Universidad Politécnica de Madrid; Universidad de Granada; Universitat Autònoma de Barcelona; Universitat Politècnica de Catalunya.

Casacuberta, F., García, R., Llisterri, J., Nadeu, C., Pardo, J. M. i Rubio, A. (1991). Development of Spanish corpora for speech research (Albayzín). A G. Castagneri (Ed.), Proceedings of the workshop on international cooperation and standardization of speech databases and speech I/O assessment methods. Chiavari, Italy, September 26-28, 1991. https://joaquimllisterri.cat/publicacions/Casacuberta_et_al_91.pdf

Casacuberta, F., García, R., Llisterri, J., Nadeu, C., Pardo, J. M. i Rubio, A. (1992). Desarrollo de corpus para la investigación en tecnologías del habla (Albayzín). Procesamiento del Lenguaje Natural, 12, 35–42. https://joaquimllisterri.cat/publicacions/Casacuberta_et_al_92_Corpus_Albayzin.pdf

Díaz, J., Rubio, A., Peinado, A., Segarra, E., Prieto, N. i Casacuberta, F. (1993). Development of task-oriented Spanish speech corpora. EUROSPEECH 1993. Proceedings of the 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September, 1993.

Díaz Verdejo, J. E., Peinado, A. M., Rubio, A. J., Segarra, E., Prieto, N. i Casacuberta, F. (1998). Albayzín: A task-oriented Spanish speech corpus. LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. Granada, Spain, 28-30 May 1998 (Vol. 1, p. 497–502). European Language Resources Association (ELRA).

Llisterri, J. i Poch, D. (1991). Phonetic criteria for the development of a speech database in Spanish (the Albayzín project). A G. Castagneri (Ed.), Proceedings of the workshop on international cooperation and standardization of speech databases and speech I/O assessment methods. Chiavari, Italy, September 26-28, 1991. https://joaquimllisterri.cat/publicacions/Llisterri_Poch_91_Albayzin.pdf

Moreno, A., Poch, D., Bonafonte, A., Lleida, E., Llisterri, J., Mariño, J. B. i Nadeu, C. (1993). Albayzín speech database: Design of the phonetic corpus. Eurospeech 1993. Proceedings of the 3rd European Conference on Speech Communication and Technology. Berlin, Germany, 21-23 September, 1993 (Vol. 1, p. 175–178). https://joaquimllisterri.cat/publicacions/Moreno_et_al_93_Albayzin_Phonetic_Corpus.pdf

Ahumada

Ortega García, J., González Rodríguez, J., Marrero Aguiar, V., Díaz Gómez, J. J., García Jiménez, R., Lucena Molina, J. i Sánchez Molero, J. A. G. (1998). Speaker recognition-oriented ‘Ahumada’ large speech corpus. LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. Granada, Spain, 28-30 May 1998 (Vol. 2, p. 1101–1106). European Language Resources Association (ELRA).

Ortega García, J., González Rodríguez, J., Marrero Aguiar, V., Díaz Gómez, J. J., García Jiménez, R., Lucena Molina, J. i Sánchez Molero, J. A. G. (1998). AHUMADA: A large speech corpus in Spanish for speaker identification and verification. ICASSP 1998. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing. Seattle, Washington, USA, 12-15 May, 1998 (p. 773–776).

Ortega, J., González Rodríguez, J. i Marrero, V. (2000). AHUMADA: A large speech corpus in Spanish for speaker characterization and identification. Speech Communication, 31(2–3), 255–264. https://doi.org/10.1016/S0167-6393(99)00081-3

EUROM

Chan, D., Fourcin, A., Gibbon, D., Granström, B., Huckvale, M., Kokkinakis, G., Kvale, K., Lamel, L., Lindberg, B., Moreno, A., Mouropoulos, J., Senia, F., Trancoso, I., Veld, C. i Zeiliger, J. (1995). EUROM - A spoken language resource for the EU. EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995 (Vol. 1, p. 867–870). https://www.phon.ucl.ac.uk/resource/eurom1/eurospeech95eurom.pdf

Fourcin, A. i Dolmazon, J. M. (on behalf of the SAM Project). (1991). Speech knowledge, standards and assessment. Actes du XIIème Congrès International des Sciences Phonétiques. Aix-en-Provence, France, 19-24 août 1991 (Vol. 5, p. 430–433). Université de Provence, Service des Publications.

Llisterri, J., Aguilar, L., Blecua, B., Machuca, M. J., de la Mota, C., Ríos, A., . . . Salavedra, J. (1993). Spanish EUROM.1: Phonetic contents (Report D6, SAM-A/UPC/002). ESPRIT Project 6819 (SAM-A) Speech Technology Assessment in Multilingual Applications. https://joaquimllisterri.cat/publicacions/Llisterri_et_al_1993_Spanish_EUROM1_Phonetic_contents.pdf

Moreno, A. (1993, setembre). EUROM-1 Spanish database (Report D6, SAM-A/UPC/003).

LC-STAR, Lexica and Corpora for Speech-to-Speech Translation Components

Arranz, V., Castell, N., Crego, J. M., Giménez, J., de Gispert, A. i Lambert, P. (2004). Bilingual connections for trilingual corpora: An XML approach. LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. Lisbon, Portugal, 26-28 May, 2004. European Language Resources Association (ELRA).

Arranz, V., Castell, N. i Giménez, J. (2003). Development of language resources for speech-to-speech translation. RANLP 2003. International Conference on Recent Advances in Natural Language Processing. Borovets, Bulgaria, 10-12 September 2003.

Arranz, V., Castell, N. i Giménez, J. (2004). Creación de recursos lingüísticos para la traducción automática. A E. Sanchis Arnal (Ed.), Actas de las III Jornadas en Tecnología del Habla. Valencia, 17-19 de noviembre de 2004. Organizadas por la Red Temática en Tecnología del Habla. Departamento de Sistemas Informáticos y Computación, Facultad de Informática, Universidad Politécnica de Valencia.

Bisani, M., Bonafonte, A., Castell, N., Hartikainen, E., Maltese, G., Moreno, A., Shammass, S. i Ziegenhain, U. (2003). Lexicon and corpora for speech to speech translation (LC-STAR) [XIX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Alcalá, 10, 11 y 12 de septiembre de 2003]. Procesamiento del Lenguaje Natural, 31, 317–319. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3189/1680

Conejero, D., Giménez, J., Arranz, V., Bonafonte, A., Pascual, N., Castell, N., Moreno, A. (2003) "Lexica and corpora for speech-to-speech translation: A trilingual approach", in EUROSPEECH 2003 - INTERSPEECH 2003. Proceedings of the 8th European Conference on Speech Communication and Technology. 1 - 4 September, 2003. Geneva, Switzerland. p. 1593-1596. https://www.cs.upc.edu/~nlp/papers/conejero03.pdf

de Vriend, F., Castell, N., Giménez, J., Maltese, G. (2004) "LC-STAR: XML-coded phonetic lexica and bilingual corpora for speech-to-speech translation", in Papillon 2004. 5th Workshop on Multilingual Lexical Databases. 30 August - 1 September 2004. Grenoble, France.

Fersøe, H., Hartikainen, E., van den Heuvel, H., Maltese, G., Moreno, A., Shammass, S., Ziegenhain, U. (2004) "Creation and validation of large lexica for speech-to-speech translation purposes", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004. Lisbon, Portugal. Paris: ELRA, European Language Resources Association.

Hartikainen, E., Maltese, G., Moreno, A., Shammass, S., Ziegenhain, U. (2003) "Large lexica for speech-to-speech translation: From specification to creation", in EUROSPEECH 2003 - INTERSPEECH 2003. Proceedings of the 8th European Conference on Speech Communication and Technology. 1 - 4 September, 2003. Geneva, Switzerland.

SpeechDat - SALA, SpeechDat Across Latin America / SpeechDat Across All America

AURORA Project Database - Subset of SpeechDat-Car Spanish database (AURORA/CD0003-02). Paris: ELDA, Evaluations and Language Resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-AURORA-CD0003_02/#spanish

Chilean Spanish FDB-500 (U-S0317). Universitat Politècnica de Catalunya, 1998. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://universal.elra.info/product_info.php?products_id=2374

Colombian Spanish Speech Database (EELRA-U-S 0003). Universitat Politècnica de Catalunya, 1998. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://universal.elra.info/product_info.php?products_id=1548

Draxler, C., van den Heuvel, H., Tropf, H. (1998) "SpeechDat Experiences in Creating Large Multilingual Speech Databases for Teleservices", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. p. 361-366. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.50.2745

Gurlekian, J., Colantoni, L., Torres, H., Rincón, A., Moreno, A., Mariño, J. (2001) "Database for an automatic speech recognition system for Argentine Spanish", in Proceedings of the IRCS Workshop on Linguistic Databases. 11-13 December 2001, University of Pennsylvania, Philadelphia, PA, USA. p. 92-98. https://www.researchgate.net/publication/2413819_Database_for_an_Automatic_Speech_Recognition_System_for_Argentine_Spanish

van den Heuvel, H., Bonafonte, A., Boudy, J., Dufour, S., Lockwood, P., Moreno, A., Richard, G. (1999) "SpeechDat-Car: Towards a collection of speech databases for automotive environments", in Proceedings. Workshop on Robust Methods for Speech Recognition in Adverse Conditions (p. 135-138). Tampere, Finland. http://hdl.handle.net/2066/76428

van den Heuvel, H., Boudy, J., Comeyne, R., Euler, S., Moreno, A., Richard, G. (1999) "The SpeechDat-Car multilingual speech databases for in-car applications: some first validation results", in EUROSPEECH 1999. Proceedings of the 6th European Conference on Speech Communication and Technology. 5 - 9 September, 1999. Budapest, Hungary. http://hdl.handle.net/2066/76427

van den Heuvel, H., Hall, P., Höge, H., Moreno, A., Rincón, A., Senia, F. (2004) "SALA II across the finish line: a large collection of mobile telephone speech databases from North and Latin America completed", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2004/

Moreno, A. (2000) "SALA: SpeechDat Across Latin America", in Proceedings of the 1st Workshop on Very Large Databases. May, 2000. Athens, Greece. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.1864

Moreno, A., Comeyne, R., Haslam, K., van den Heuvel, H., Höge, H., Horbach, S., Micca, G. (2000) "SALA: SpeechDat across Latin America. Results of the First Phase", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: European Language Resources Association. p. 877-882. http://www.lrec-conf.org/proceedings/lrec2000/html/summary/10.htm

Moreno, A., Gedge, O., van den Heuvel, H., Höge, H., Horbach, S., Martin, P., Pinto, E., Rincón, A., Senia, F., Sukkar, R. (2002) "SpeechDat across all America: SALA II", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association.

Moreno, A., Höge, H., Köhler, J., Mariño, J.B. (1998) "SpeechDat Across Latin America. Project SALA", in Rubio, A., Gallardo, N., Castro, R., Tejada, A. (Eds.) LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. p. 367-370.

Moreno, A., Lindberg, B., Draxler, C., Richard, G., Choukri, K., Euler, S., Allen, J. (2000) "SPEECHDAT-CAR. A Large Speech Database for Automotive Environments", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2000/html/summary/10.htm

Moreno, A., Senia, F., Rincón, A. (2002) The complete SALA II project specifications. Version 1.6. SALA II Technical Report. November 29, 2002.

SALA II Spanish from Mexico database (ELDA-S0171). Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0171/

SALA II Spanish Mobile Network Database collected in Venezuela (ELDA-S0167). ATLAS, Applied Technologies on Language and Speech, Barcelona. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0167/

SALA Spanish Colombian Database (ELRA-ST81). Universitat Politècnica de Catalunya, 2000. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://universal.elra.info/product_info.php?products_id=1466

SALA Spanish Venezuelan Database (ELDA-S0141). Universitat Politècnica de Catalunya, 2000. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0141/

Spanish SpeechDat (M) DB1 (ELDA-S0065). Universitat Politècnica de Catalunya, 1999. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0065/

Spanish SpeechDat (M) DB2 (ELDA-S0066). Universitat Politècnica de Catalunya, 1999. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0066/

Spanish SpeechDat Database for the Mobile Telephone Network (ELDA-S0119). Universitat Politècnica de Catalunya, 2003. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0119/

Spanish SpeechDat-Car Database (ELDA-S0140). Universitat Politècnica de Catalunya, 2001. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0140/

Spanish SpeechDat(II) FDB-1000 (ELDA-S0101). Universitat Politècnica de Catalunya, 1997. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0101/

Altres corpus orals en espanyol

1997 HUB-4 Broadcast News Evaluation Non English Test Material (LDC2001S91). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC2001S91

1997 HUB-5 Spanish Evaluation (LDC2002S25). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC2002S25

1997 HUB-5 Spanish Transcripts (LDC2003T04). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC2003T04

1997 Spanish Broadcast News Speech (Hub-4NE) (LDC98S74). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC98S74

1997 Spanish Broadcast News Transcripts (Hub-4NE) (LDC98T29). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC98T29

22 Language Corpus. Center for Spoken Language Understanding, Oregon Graduate Institute Science University. https://catalog.ldc.upenn.edu/LDC2005S26

Alcácer, N., Castro, M.J., Galiano, I., Granell, R., Grau, S., Griol, D. (2004) "Adquisición de un corpus de diálogo: DIHANA", in Sanchis Arnal, E. (Ed.) Actas de las III Jornadas en Tecnología del Habla. Valencia, del 17-19 de noviembre de 2004. Organizadas por la Red Temática en Tecnología del Habla. Valencia: Departamento de Sistemas Informáticos y Computación, Facultad de Informática, Universidad Politécnica de Valencia. p. 131-136. http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/III/actas3JTH.pdf

ANITA (Audio eNhancement In Telecom Applications) (ELDA-S0156) EADS Telecom. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0156/

Bordel, G., Ezeiza, A., López de Ipiña, K., Méndez, M., Peñagarikano, M., Rico, T., Tovar, C., Zulueta, E. (2004) "Development of Resources for a Bilingual Automatic Index System of Broadcast News in Basque and Spanish", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. p. 881-884.

CALLFRIEND Spanish-Caribbean Dialect (LDC96S57). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC96S57

CALLFRIEND Spanish-Non-Caribbean Dialect (LDC96S58). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC96S58

CALLHOME Spanish Dialogue Act Annotation (LDC2001T61). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC2001T61

CALLHOME Spanish Lexicon (LDC96L16). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC96L16

CALLHOME Spanish Speech (LDC96S35). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC96S35

CALLHOME Spanish Transcripts (LDC96T17). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC96T17

Cieri, C., Campbell, J.P., Nakasone, H., Miller, D., Walker, K. (2004) "The Mixer corpus of multilingual, multichannel speaker recognition data", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. p. 627-630.

Esquerra, I., Bonafonte, A. i Febrer, A. (1998). A bilingual Spanish-Catalan database of units for concatenative synthesis. Workshop on Language Resources for European Minority Languages. Granada, Spain, 27 May, 1998. http://ixa2.si.ehu.eus/saltmil/index.php/en/activities-mainmenu-73/saltmil-workshops-mainmenu-77/30.html

Esteve, J., Tapias, D., Torrecilla, J.C. (1994) "La base de datos VESTEL", Comunicaciones de Telefónica I+D 5, 2: 44-54.

Miguel, A., Galiano, I., Granell, R., Hurtado, L. F., Sánchez, J. A. i Sanchis, E. (2003). La plataforma de adquisición de diálogos en el proyecto DIHANA. Procesamiento del Lenguaje Natural, 31, 341–342. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3200/1691

García Mateo, C., Diéguez, J., Docío, C., Cardenal, A. (2004) "Transcrigal: A bilingual system for automatic indexing of broadcast news", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. p. 2061-2064.

Guevara-Rukoz, A., Demirşahin, I., He, F., Chu, S.-H. C., Sarin, S., Pipatsrisawat, K., Gutkin, A., Butryna, A. i Kjartansson, O. (2020). Crowdsourcing Latin American Spanish for low-resource text-to-speech. A N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk i S. Piperidis (Ed.), Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 11-16, 2020 (p. 6504–6513). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.801.pdf

Gurlekian, J., Rodríguez, H., Colantoni, J., Torres, H. (2001) "Development of a prosodic database for an Argentine Spanish text to speech system", in Proceedings of the IRCS Workshop on Linguistic Databases. 11-13 December 2001, University of Pennsylvania, Philadelphia, PA, USA. p. 99-104. https://www.researchgate.net/publication/2586101_Development_of_a_Prosodic_Database_for_an_Argentine_Spanish_Text_to_Speech_System

Hennebert, J., Melin, H., Petrovska, D., Genoud, S. (2000) "POLYCOST: A telephone-speech database for speaker recognition", Speech Communication 31, 2-3: 265-270. https://doi.org/10.1016/S0167-6393(99)00082-5

Hozjan, V., Kacic, Z., Moreno, A., Bonafonte, A., Nogueiras, A. (2002) "Interface Databases: Design and Collection of a Multilingual Emotional Speech Database", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association. p. 2024-2028. http://www.lrec-conf.org/proceedings/lrec2002/sumarios/174.htm

Hub-5 Spanish Telephone Speech Corpus (LDC98S70). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC98S70

Hub-5 Spanish Transcripts (LDC98T27). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC98T27

Iskra, D., Grosskopf, B., Marasek, K., van den Heuvel, H., Diehl, F., Kiessling, A. (2002) "SPEECON Speech Databases for Consumer Devices: Database Specification and Validation", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002. Las Palmas de Gran Canaria, Spain. Paris: ELRA, European Language Resources Association. p. 329-333. http://www.lrec-conf.org/proceedings/lrec2002/sumarios/177.htm

Lamel, L.F., Adda, G., Adda-Decker, M., Corredor-Ardoy, C., Gangolf, J.J., Gauvain, J.L. (1998) "A Multilingual Corpus for Language Identification", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. 2, p. 1115-1122. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.5217

Lander, T.L., Cole, R.A., Oshika, B., Noel, M. (1995) "The OGI 22 Language Telephone Speech Corpus", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, p. 817-820.

LATINO-40 Spanish Read News (LDC95S28). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC95S28

López Cózar, R., Rubio, A.J., García, P., Segura, J.C. (1998) "A Spoken Dialogue System based on Dialogue Corpus Analysis", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. p. 55-58.

Martin, A., Miller, D., Przybocki, M., Campbell, J., Nakasone, H. (2004) "Conversational telephone speech corpus collection for the NIST speaker recognition evaluation 2004", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004. Lisbon, Portugal. Paris: ELRA, European Language Resources Association. http://www.lrec-conf.org/proceedings/lrec2004/summaries/542.htm

MICROADES, ATLAS Spanish Microphone Database (ELDA-S0165). ATLAS, Applied Technologies on Language and Speech, Barcelona. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0165/

Montero, J. M., Gutiérrez Arriola, J. M., Colás, J., Macías, J., Enríquez, E. i Pardo, J. M. (1999). Development of an emotional speech synthesiser in Spanish. Proceedings, Sixth European Conference on Speech Communication and Technology (EUROSPEECH’99). Budapest, Hungary, September 5-9, 1999 (p. 2099–2102). ISCA Archive. https://www.isca-speech.org/archive_v0/eurospeech_1999/e99_2099.html

Montero, J. M., Gutiérrez Arriola, J. M., Palazuelos, S., Enríquez, E., Aguilera, S. i Pardo, J. M. (1998). Emotional speech synthesis: From speech database to TTS. The 5th International Conference on Spoken Language Processing, incorporating the 7th Australian International Speech Science and Technology Conference. Sydney Convention Centre, Sydney, Australia, 30th November - 4th December 1998 (Paper 1037). ISCA Archive. https://www.isca-speech.org/archive_v0/icslp_1998/i98_1037.html

Multilanguage Telephone Speech. Center for Spoken Language Understanding, Oregon Graduate Institute. https://catalog.ldc.upenn.edu/LDC2006S35

Muthusamy, Y., Holliman, E., Wheatley, B., Picone, J., Godfrey, J. (1995) "Voice Across Hispanic America: A Telephone Speech Corpus of American Spanish", in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. May, 1995. Detroit, Michigan, USA. p. 85-88.

Muthusamy, Y.K., Cole, R.A., Oshika, B.T. (1992) "The OGI multi-language telephone speech corpus", in ICSLP 1992. Proceedings of the 2nd International Conference on Spoken Language Processing. 12 - 16 October, 1992. Banff, Alberta, Canada. Edmonton: The University of Alberta. p. 895-898.

OGI Multilanguage Corpus (LDC94S17). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC94S17

Ortega Giménez, A., Sukno, F., Lleida Solano, E., Frangi Caregnato, A., Miguel Artiaga, A., Buera Rodríguez, L., Zacur, E. (2004) "Base de Datos Audiovisual y Multicanal en Castellano para Reconocimiento Automático del Habla Multimodal en el Automóvil", in Sanchis Arnal, E. (Ed.) Actas de las III Jornadas en Tecnología del Habla. Valencia, del 17-19 de noviembre de 2004. Organizadas por la Red Temática en Tecnología del Habla. Valencia: Departamento de Sistemas Informáticos y Computación, Facultad de Informática, Universidad Politécnica de Valencia. p. 125-130. http://diec.unizar.es/intranet/articulos/uploads/Base%20de%20Datos%20Audiovisual%20y%20Multicanal%20en%20Castellano%20para%20Reconocimiento%20Automatico%20del%20Habla%20Multimodal%20en%20el%20Automovil.pdf

Ortega Giménez, A., Sukno, F., Lleida Solano, E., Frangi Caregnato, A., Miguel Artiaga, A., Buera Rodríguez, L., Zacur, E. (2004) "AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. p. 763-766. http://diec.unizar.es/intranet/articulos/uploads/lrec04def2.pdf.pdf

Pineda, L. A., Castellanos, H., Cuétara, J. O., Galescu, L., Juárez, J., Llisterri, J., Pérez, P. i Villaseñor, L. (2010). The Corpus DIMEx100: Transcription and evaluation. Language Resources and Evaluation, 44(4), 347–370. https://doi.org/10.1007/s10579-009-9109-9

Pineda, L. A., Cuétara, J. O., Castellanos, H., López, I. i Villaseñor, L. (2004). DIMEx100: A new phonetic and speech corpus for Mexican Spanish. A C. Lemaître, C. A. Reyes i J. A. González (Ed.), IBERAMIA 2004. 9th Ibero-American Conference on AI, Puebla, México, November 22-26, 2004. Proceedings (p. 974–984). Springer. https://doi.org/10.1007/978-3-540-30498-2_97

Renato, A. C. i Álvarez, J. A. (2004). Corpora of Latin American Spanish for research in prosody and synthesis. A A. W. Black i K. Lenzo (Ed.), Fifth ISCA ITRW on Speech Synthesis (SSW5). Pittsburgh, PA, USA, June 14-16, 2004 (p. 221–222). ISCA Archive. https://www.isca-speech.org/archive_open/ssw5/ssw5_221.html

Siemund, R., Höge, H., Kunzmann, S., Marasek, K. (2000) "SPEECON - Speech Data for Consumer Devices", in LREC 2000. Proceedings of the Second International Conference on Language Resources and Evaluation. 31 May - 2 June, 2000, Athens, Greece. Paris: ELRA, European Language Resources Association. Vol. 2, p. 883-886. http://www.lrec-conf.org/proceedings/lrec2000/html/summary/63.htm

Spanish Speecon database (ELRA-SD165). Siemens AG - Universitat Politècnica de Catalunya. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://universal.elra.info/product_info.php?products_id=1335

Spanish Speech Corpus 1 (Appen) (ELDA-S0149). Appen, Australia. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0149/

Spanish TTS Speech Corpus (Appen) (ELDA-S0150). Appen, Australia. Paris: ELDA, Evaluations and Language resources Distribution Agency. http://catalog.elra.info/en-us/repository/browse/ELRA-S0150/

Tapias, A., Acero, A., Esteve, J., Torrecilla, J.C. (1994) "The VESTEL Telephone Speech Database", in ICSLP 1994. Proceedings of the 3rd International Conference on Spoken Language Processing. 18 - 22 September, 1994. Yokohama, Japan. p. 1811-1814.

Tlatoa Common Questions Corpus. Tlatoa, Grupo de Investigación en Tecnologías del Habla. Centro de Investigación en Tecnologías de Información y Automatización, Universidad de las Américas. Puebla, México.

Tlatoa/OGI Spanish TTS Corpus. Tlatoa, Grupo de Investigación en Tecnologías del Habla. Centro de Investigación en Tecnologías de Información y Automatización, Universidad de las Américas. Puebla, México.

de la Torre Munilla, C., Hernández-Gómez, L.A., Tapias, D. (1995) "CEUDEX: a Data Base Oriented to Context-Dependent Units Training in Spanish for Continuous Speech Recognition", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, p. 845-848.

Trancoso, I. (1995) "The ONOMASTICA Interlanguage Pronunciation Lexicon", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 1. p. 829-832.

Uraga, E., Gamboa, C. (2004) "VOXMEX Speech Database: Design of a Phonetically Balanced Corpus", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. p. 1471-1474.

VAHA, Voice Across Hispanic America (Polyphone II) (LDC96S41). Philadelphia, PA: Linguistic Data Consortium. https://catalog.ldc.upenn.edu/LDC96S41

Villaseñor, L., Montes, M., Vaufreydaz, D., Serignat, J.-F. (2003) "Elaboración de un corpus balanceado para el cálculo de modelos acústicos usando la web", in CIC 2003. XII Congreso Internacional de Computación. 13-17 de octubre de 2003. Ciudad de México, México. http://www.corpus.unam.mx/cursocorpus/Villasennor-ElaboracionCorpus-version-final.pdf

Villaseñor, L., Montes, M., Vaufreydaz, D., Serignat, J.-F. (2004) "Experiments on the construction of a phonetically balanced corpus from the web", in Gelbukh, A. (Ed.) CICLing-2004. Proceedings of the 5th International Conference on Intelligent Text Processing and Computational Linguistics. 15-21 February, 2004. Seoul, Korea. Berlin - Heidelberg: Springer (Lecture Notes in Computer Science, 2945) p. 416-419. http://ccc.inaoep.mx/~mmontesg/publicaciones/2004/PhoneticallyBalancedCorpus-cicling04.pdf

Corpus de llengua oral en espanyol

C-ORAL-ROM, Corpus integrado de referencia en lenguas romances

Alcántara Plá, M., Moreno Sandoval, A., de la Madrid Heitzmann, G., González Ledesma, A., Ares Chicote, F. (2003) "C-ORAL-ROM. Corpus integrado de referencia en lenguas romances", XIX Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural. Universidad de Alcalá, 10, 11 y 12 de septiembre de 2003. Procesamiento del Lenguaje Natural 31: 301-302. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/3181/1672

Cresti, E., Bacelar do Nascimento, F., Moreno Sandoval, A., Véronis, J., Martin, P., Choukri, K. (2004) "The C-ORAL-ROM CORPUS. A Multilingual Resource of Spontaneous Speech for Romance Languages", in LREC 2004. Proceedings of the 4th International Conference on Language Resources and Evaluation. 26-28 May, 2004, Lisbon, Portugal. Paris: ELRA, European Language Resources Association. http://lrec-conf.org/proceedings/lrec2004/summaries/357.html/

Cresti, E., Moneglia, M. (Eds.) (2005) C-ORAL-ROM. Integrated Reference Corpora for Spoken Romance Languages. Amsterdam: John Benjamins (Studies in Corpus Linguistics 15) (including DVD).

Cresti, E., Moneglia, M., Bacelar do Nascimento, F., Moreno Sandoval, A., Véronis, J., Martin, P., Choukri, K., Mapelli, V., Falavigna, D., Cid, A., Blum, C. (2002) "The C-ORAL-ROM Project. New methods for spoken language archives in a multilingual romance corpus", in LREC 2002. Proceedings of the Third International Conference on Language Resources and Evaluation. 27 May - 2 June, 2002, Las Palmas de Gran Canaria, Spain. Paris: European Language Resources Association. http://lrec-conf.org/proceedings/lrec2002/sumarios/290.htm

Moreno Sandoval, A. (2002) "La evolución de los corpus de habla espontánea: la experiencia del LLI-UAM", in Rubio Ayuso, A. (Ed.) Actas de las II Jornadas en Tecnologías del Habla. Granada, del 16 al 18 de diciembre de 2002. Organizadas por la Red Temática en Tecnologías del Habla. Granada: Universidad de Granada, Departamento de Electrónica y Tecnología de Computadores. http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/II/articulos/15.pdf

Corpus de conversación coloquial - Grupo Val.Es.Co

Briz, A. (Coord.) (1995) La conversación coloquial (Materiales para su estudio). València: Universitat de València, Facultad de Filología, Departamento de Filología Española (Lengua Española) (Cuadernos de Filología, Anejo XVI).

Briz, A. (Coord.) (2001) Corpus de conversaciones coloquiales. Anejo 1 de Oralia. Madrid: ArcoLibros.

Briz, A. et al. (1993) "La elaboración de un corpus de español coloquial. Problemas metodológicos previos", Cahiers du Centre Interdisciplinaire des Sciences du Langage, Actes du Colloque "Le Dialogue en question". Université de Toulouse-Le Mirail, Valencia, 1994. p. 103-109.

Briz, A. (1996) "El corpus de conversación coloquial del grupo Val.Es.Co", in Payrató, Ll., Boix, E., Lloret, M.-R., Lorente, M. (Eds.) Corpus, Corpora. Actes del 1er i 2on Col·loquis Lingüístics de la Universitat de Barcelona (CLUB-1, CLUB-2). Barcelona: Promociones y Publicaciones Universitarias SA. p. 255-296.

Briz, A. et al. (1995) "La elaboración de un corpus de español coloquial. Problemas metodológicos previos", in Actas del I Congreso de Lingüística General. València: Universitat de València.

Briz, A., Gómez Molina, J.R. (1992) "Scheme of Study of Colloquial Spanish: Some Methodological Considerations", LynX, A Monographic Series in Linguistics and World Perception 3: 111-124

Cabedo, A. i Pons, S. (Eds.). (n.d.). Corpus Val.Es.Co 2.0. Valencia: Val.Es.Co. (Valencia, Español Coloquial), Departamento de Filología Española, Universidad de Valencia. Retrieved from http://www.valesco.es

Val.Es.Co. (n.d.). Sistema de transcripción. Val.Es.Co., Valencia Español Coloquial. Valencia: Val.Es.Co. (Valencia, Español Coloquial), Departamento de Filología Española, Universidad de Valencia. Retrieved from https://www.uv.es/valesco/sistema.pdf

CREA, Corpus de Referencia del Español Actual - Subcorpus Oral

Pino Moreno, M., Sánchez Sánchez, M. (1999) "El subcorpus oral del banco de datos CREA-CORDE (Real Academia Española): Procedimientos de transcripción y codificación", Oralia 2: 83-138.

Corpus Oral de Referencia del Español Contemporáneo

Marcos Marín, F. (1991) "Corpus lingüístico de referencia de la lengua española", Boletín de la Academia Argentina de Letras 56: 129-155.

Marcos Marín, F., Zumárraga, V. (1991) "El corpus de referencia de la lengua española", Razón y Fe 223/1, 109, Marzo 1991: 285-293.

Marcos Marín, F., Ballester, A., Santamaría, C. (1993) "Transcription Conventions used for the Corpus of Spoken Contemporary Spanish", Literary and Linguistic Computing 8, 4: 283-292. https://www.researchgate.net/publication/31460368_Transcription_Conventions_used_for_the_Corpus_of_Spoken_Contemporary_Spanish

Marcos Marín, F., Nicolás Martínez, M.C. (2003) "El etiquetado del Corpus Oral de Referencia del Español Contemporáneo", in Scarano, A. (Ed.) Macro-syntaxe et Pragmatique. L’analyse linguistique de l’oral. Roma: Bulzoni, 2003, 321-328.

Moreno Sandoval, A. (2002) "La evolución de los corpus de habla espontánea: la experiencia del LLI-UAM", in Rubio Ayuso, A. (Ed.) Actas de las II Jornadas en Tecnologías del Habla. Granada, del 16 al 18 de diciembre de 2002. Organizadas por la Red Temática en Tecnologías del Habla. Granada: Universidad de Granada, Departamento de Electrónica y Tecnología de Computadores. http://lorien.die.upm.es/~lapiz/rtth/JORNADAS/II/articulos/15.pdf

Norma lingüística culta de las ciudades del mundo hispánico

Macrocorpus de la norma lingüística culta de las principales ciudades del mundo hispánico (MC-NLCH). Preparado por José Antonio Samper Padilla, Clara Eugenia Hernández Cabrera y Magnolia Troya Déniz. Edición en CD-ROM. Las Palmas de Gran Canaria: Servicio de Publicaciones de la Universidad de Las Palmas de Gran Canaria, 1998.

Cuestionario para el estudio coordinado de la norma lingüística culta de las principales ciudades de Iberoamérica y de la Península Ibérica. I Fonética y Fonología. Madrid: PILEI - CSIC (Departamento de Geografía Lingüística I ), 1973.

Esgueva, M., Cantarero, M. (1981) El habla de la ciudad de Madrid. Materiales para su estudio. Madrid: CSIC.

Lope Blanch, J.M. (1986) El estudio del español hablado culto. Historia de un proyecto. México: Universidad Nacional Autónoma de México (Publicaciones del Centro de Lingüística Hispánica, 22)

Lope Blanch, J.M. (Coord.) (1971) El habla de la ciudad de México. Materiales para su estudio. México: Universidad Nacional Autónoma de México.

Lope Blanch, J.M. (Coord.) (1976) El habla popular de la ciudad de México. Materiales para su estudio. México: Universidad Nacional Autónoma de México.

Lope Blanch, J.M. (Coord.) (1995) El habla popular de la República Mexicana. Materiales para su estudio. México: Universidad Nacional Autónoma de México - El Colegio de México (Publicaciones del Centro de Lingüística Hispánica, 43).

Samper, J. A. (1995) "Macrocorpus de la norma lingüística culta de las principales ciudades de España y América", Lingüística (Publicación de la Asociación de Lingüística y Filología de la América Latina) 7: 263-293.

Samper, J. A. (2014). Cincuenta años del proyecto de estudio de la norma culta hispánica. Lingüística Española Actual, 36(1), 149-170.

PRESEEA, Proyecto para el Estudio Sociolingüístico del Español de España y de América

Ávila Muñoz, A. M., Lasarte Cervantes, M. C. i Villena Ponsoda, J. A. (Eds.). (2008). El español hablado en Málaga II. Corpus oral para su estudio sociolingüístico. Nivel de estudios medio (incluye un CD-ROM). Málaga: Editorial Sarriá.

I. Introducción; 1., PRESEEA y la investigación del español en el siglo XXI; 2., El proyecto PRESEEA-Málaga. Estudio Sociolingüístico del Español Urbano de Málaga (ESESUMA); II. Corpus y lingüística de corpus; 3., La lingüística de corpus. Una herramienta necesaria en la metodología (socio)lingüística actual; 4., Niveles de acceso a los corpus orales transcritos. Aplicación al macrocorpus PRESEEA; 5., Corpus PRESEEA-Málaga: nivel de estudios medio. Transcripción y etiquetado. Referencias bibliográficas. Transliteraciones. Entrevista 25. Entrevista 28. Entrevista 40.

Briceño, D. L., Fernández, M. F., Maldonado, J., Velazco, J. i Palm, P. (2010). Un nuevo corpus sociolingüístico del habla de Mérida: PRESEEA-MÉRIDA-VE. Lengua y Habla, 14, 1-11. Retrieved from http://erevistas.saber.ula.ve/index.php/lenguayhabla/article/view/1080

Lasarte Cervantes, M. C., Sánchez Sáez, J. M., Ávila Muñoz, A. M. i Villena Ponsoda, J. A. (Eds.). (2008). El español hablado en Málaga III. Corpus oral para su estudio sociolingüístico. Nivel de estudios superior (incluye un CD-ROM). Málaga: Editorial Sarriá.

Martín Butragueño, P. i Lastra, Y. (2011). Corpus sociolingüístico de la ciudad de México. Materiales de PRESEEA-México. Volumen I. Hablantes de instrucción superior. México, D.F.: El Colegio de México.

Parte primera. Introducción; I., Metodología; 1., El proyecto PRESEEA; 2., El proyecto PRESEEA-Málaga. Estudio Sociolingüístico del Español Urbano de Málaga (ESESUMA); II., Etiquetado del corpus. Problemas de anotación e intercambio; 0., Objetivo; 1., Niveles de acceso a los corpus orales transcritos y generación de tipos; 2., Intercambio de documentos de distinto nivel; 3., Transformación y validación de documentos a XML; 4., Conclusiones; 5., Apéndices; III., Referencias bibliográficas; Parte segunda. Muestra de transliteración; Entrevista 46; Entrevista 65.

Moreno Fernández, F. (1997) "Metodología del ’Proyecto para el Estudio Sociolingüístico del Español de España y de América’", in Moreno Fernández, F. (Ed.) Trabajos de sociolingüística hispánica. Alcalá de Henares: Universidad de Alcalá, Servicio de Publicaciones (Ensayos y Documentos, 27) p. 137-167.

Moreno Fernández, F. (2003) Metodología del "Proyecto para el estudio sociolingüístico del español de España y de América" (Preseea). Versión revisada, Octubre de 2003. https://preseea.linguas.net/Metodolog%C3%ADa.aspx

PRESEEA. (2011). Guía PRESEEA para la investigación lingüística. Versión 2.0. PRESEEA, Proyecto para el estudio sociolingüístico del español de España y de América. Retrieved from https://preseea.linguas.net/Portals/0/Metodologia/GUIA_PRESEEA_INVESTIGACION_LINGUISTICA.pdf

Vida Castro, M. (Ed.). (2007). El español hablado en Málaga I. Corpus oral para su estudio sociolingüístico. Nivel de estudios bajo (incluye un CD-ROM). Málaga: Editorial Sarriá.

Villena Ponsoda, J. A., Ávila Muñoz, A. M., Sánchez Bohorques, J. M. i Lasarte Cervantes, M. C. (2010). Problemas de anotación e intercambio en los corpus orales: Estrategias para la transformación de textos etiquetados en documentos XML. El caso de los corpus PRESEEA. Oralia. Análisis del Discurso Oral, 13, 261-323.

Other spoken language corpora in Spanish

Alvar Ezquerra, M., Villena Ponsoda, J.A. (Coord.) (1994) Estudios para un corpus del español. Málaga: Universidad de Málaga (Analecta Malacitana, Anejo 7)

Azorín Fernández, D., Martínez Linares, M.A., Santamaría Pérez, M.I. (1999) "Léxico y creación léxica en un corpus oral de lenguaje juvenil", in Fernández González, J., Fernández Juncal, C., Marcos Sánchez, M., Prieto de los Mozos, E., Santos Río, L. (Eds.) Lingüística para el siglo XXI. III Congreso de Lingüística General (CLG3). Salamanca: Ediciones de la Universidad de Salamanca (Aquilafuente, 9). vol 1, p. 217-228.

Barcala, M., Domínguez, E., Fernández, A., Rivas, R., Santalla, M. P., Vázquez, V. i Villapol, R. (2018). El corpus ESLORA de español oral: diseño, desarrollo y explotación. CHIMERA: Romance Corpora and Linguistic Studies, 5(2), 131–151. https://doi.org/10.15366/chimera2018.5.2.003

Domínguez, C.L. (1997) "El habla de Mérida: un corpus de estudio", Lengua y Habla 2.

Domínguez, C.L., Mora, E. (Coords.) (1998) El habla de Mérida. Mérida (Venezuela): Universidad de Los Andes.

Gallardo Paúls, B., Sanmartín Sáez, J. (2005) Afasia fluente. Materiales para su estudio (Volumen 1 del corpus PerLA). València: Universitat de València.

Gallardo Paúls, B., Moreno Campos, V. (2005) Afasia no fluente. Materiales y análisis pragmático (Volumen 2 del corpus PerLA). València: Universitat de València.

Hernández Sacristán, C., Fernández Peña, L. (1992) Conversación infantil. Materiales para su estudio en niños desde los cinco a los nueve años. Valencia: Promolibro.

Martín Zorraquino, M.A. (1991) "Estudio sociolingüístico del habla de Zaragoza: problemas y primeros resultados", in Actas del Congreso de Lingüistas Aragoneses, Zaragoza, 1991. p. 169-200.

Rodríguez Yáñez, J.P., Lorenzo, A., Ramallo, F., Acuña Ferreira, V., Álvarez López, S., Ameal Guerra, A., Casares Berg, H., Valverde Juncal, M. (2001) "El Corpus Informatizado de Fala Bilingüe Galego/Castelán de la Universidad de Vigo: presentación y problemas de identificación y etiquetado de los códigos gallego y castellano", in Moreno, A.I., Colwell, V. (Eds.) Perspectivas recientes sobre el discurso. Recent perspectives on discourse. León: Secretariado de Publicaciones y Medios Audiovisuales, Universidad de León - AESLA, Asociación Española de Lingüística Aplicada. (+ CD-ROM). p. 188.

Solís, I. (2018). Corpus españoles dialógicos para el análisis de la conversación. CHIMERA: Romance Corpora and Linguistic Studies, 5(1), 117–129. https://doi.org/10.15366/chimera2018.5.1.010

Vann, R.E. (2003) "Digitizing and transcribing field recordings of Catalonian Spanish", in 3rd E-MELD (Electronic Metastructure for Endangered Languages Data) Workshop on Digitizing and Annotating Texts and Field Recordings. 11-13 July 2003. LSA Institute, Michigan State University. http://emeld.org/workshop/2003/paper-Vann.html

Vázquez Veiga, N. (1995) "’Corpus de lengua hablada en la ciudad de A Coruña’: el rol del entrevistador en la conversación semidirigida", Moenia. Revista Lucense de Lingüística & Literatura 1: 181-202.

Vera Luján, A. (1998) "Los medios de comunicación como recurso lingüístico (proyecto de acopio y distribución de materiales lingüísticos. Instituto Cervantes, España)", in La lengua española y los medios de comunicación. México: Siglo XXI Editores en coedición con la Secretaría de Educación Pública (México) y el Instituto Cervantes (España). Vol 2. p. 1331-1338. https://congresosdelalengua.es/zacatecas/paneles-ponencias/tecnologias/proyectos/vera.htm

Vann, R. E. (2009). Materials for the sociolinguistic description and corpus-based study of Spanish in Barcelona: Toward a documentation of colloquial Spanish in naturally occurring groups. Lewiston, NY: The Edwin Mellen Press.

Vila, M., González, S., Martí, M. A., Llisterri, J. i Machuca, M. J. (2010). ClInt: A bilingual Spanish-Catalan spoken corpus of clinical interviews. Procesamiento del Lenguaje Natural, 45, 105-111. Retrieved from http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/796

Applications of speech corpora

Applications of speech corpora: phonetic and phonological studies

✓ = Recommended reading

Altenberg, B. (1987) "Predicting text segmentation into tone units", in W. Meijs (Ed.), Corpus Linguistics and Beyond. Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi. p. 49-60; in Sampson, G., McCarthy, D. (Eds.) (2004) Corpus Linguistics: readings in a widening discipline. London - New York: Continuum International.

Campbell, N. (1990) "Measuring Speech-Rate in the Spoken English Corpus", in Aarts, J., Meijs, W. (Eds.) Theory and Practice in Corpus Linguistics. Amsterdam: Rodopi (Language and Computers, Studies in Practical Linguistics 4). p. 61-81.

Campbell, N. (1996) "Speech timing in the SEC", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 214-232.

Carrera, J. (1998) "Estudi del comportament dels segments /bl/, /gl/ i /r/", in Payrató, Ll. (Ed.) Oralment. Estudis de variació funcional. Barcelona: Publicacions de l’Abadia de Montserrat (Biblioteca Milà i Fontanals, 29). p. 57-74.

Castellanos, A., Benedí, J.-M., Casacuberta, F. (1996) "An analysis of general acoustic-phonetic features for Spanish speech produced with the Lombard effect", Speech Communication 20, 1-2: 23-36.

✓ Cole, J. i Hasegawa-Johnson, M. (2012). Corpus phonology with speech resources. A A. C. Cohn, C. Fougeron i M. K. Huffman (Ed.), The Oxford handbook of laboratory phonology (p. 431–440). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199575039.013.0017

Cuétara Priede, J.O. (2004) Fonética de la ciudad de México. Aportaciones desde las tecnologías del habla. Tesis para obtener el título de Maestro en Lingüística Hispánica. Maestría en Lingüística Hispánica, Posgrado en Lingüística, Universidad Nacional Autónoma de México.

✓ Delais-Roussarie, E. i Yoo, H.-Y. (2014). Corpus and research in phonetics and phonology: Methodological and formal considerations. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 193–213). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.009

Hidalgo Navarro, A. (1997) La entonación coloquial. Función demarcativa y unidades de habla. Cuadernos de Filología (Anejo XXI). Valencia: Departamento de Filología Española (Lengua Española), Facultat de Filologia, Universitat de València.

Keating, P.A., Blankenship, B., Byrd, D., Flemming, E., Todaka, Y. (1992) "Phonetic analysis of the TIMIT corpus of American English at UCLA", UCLA Working Papers in Phonetics 81: 1-16.

Keating, P.A., Byrd, D., Flemming, E., Todaka, Y. (1994) "Phonetic analysis of word and segment variation using the TIMIT corpus of American English", Speech Communication 14, 1: 131-142.

Knowles, G. (1992) "Pitch contours and tones in the Lancaster/IBM spoken English corpus", in Leitner, G. (Ed.) New Directions in English Language Corpora. Methodology, Results, Software Development. Berlin: Mouton de Gruyter. p. 289-300.

Knowles, G. (1996) "From text structure to prosodic structure", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 146-167.

Knowles, G., Wichmann, A., Alderson, P. (Eds.) (1996) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman.

López Escobedo, F. (2004) El estudio de los diptongos del español de México para su aplicación en un reconocedor de habla. Tesis de Licenciatura en Lengua y Literaturas Hispánicas. Facultad de Filosofía y Letras, Universidad Nacional Autónoma de México.

Maddieson, I. (1991) "Testing the universality of phonological generalizations with a phonetically specified segment database: results and limitations", UCLA Working Papers in Phonetics 78: 11-25.

Martín Butragueño, P. (2003) "Hacia una descripción prosódica de los marcadores discursivos. Datos del español de México", in Martín Butragueño, P., Herrera Z. E. (Eds.) La tonía. Dimensiones fonéticas y fonológicas. México: El Colegio de México, Centro de Estudios Lingüísticos y Literarios (Cátedra Jaime Torres Bodet, Estudios de Lingüística 4). p. 375-402.

Mora, E., Pietrosemoli, L., Cavé, C., Obediente, E. i La Cruz, E. (2005). Un corpus de pares mínimos para el español de Venezuela. Lengua y Habla, 9, 117-121.

Mora, J.C. (1998) "L’elisió i la intrusió contextual en la llengua oral: una anàlisi fonètica del català", in Payrató, Ll. (Ed.) Oralment. Estudis de variació funcional. Barcelona: Publicacions de l’Abadia de Montserrat (Biblioteca Milà i Fontanals, 29). p. 75-90.

Ortiz Lira, H. (2003) "Los acentos tonales en un corpus de español de Santiago de Chile: su distribución y realización", in Martín Butragueño, P., Herrera Z. E. (Eds.) La tonía. Dimensiones fonéticas y fonológicas. México: El Colegio de México, Centro de Estudios Lingüísticos y Literarios (Cátedra Jaime Torres Bodet, Estudios de Lingüística 4). p. 303-318.

Pickering, B. (1996) "Distributional features of TSMs in the SEC", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 109-128.

Pršir, T., Goldman, J.-P. i Auchlin, A. (2013). Variation prosodique situationnelle: étude sur corpus de huit phonogenres en français. A P. Mertens i A. C. Simon (Ed.), Proceedings of the Prosody-Discourse Interface Conference 2013 (IDP-2013). Leuven, Belgium, September 11-13, 2013 (p. 107–111). https://www.arts.kuleuven.be/ling/cohistal/conference/idp2013/proceedings.html

Pršir, T., Goldman, J.-P. i Auchlin, A. (2014). Prosodic features of situational variation across nine speaking styles in French. Journal of Speech Sciences, 4(1), 41–60. https://doi.org/10.20396/joss.v4i1.15051

Rosado Robledo, L. (2003) El contacto dialectal: el caso de los inmigrantes yucatecos en la ciudad de México. Tesis de licenciatura. México: Universidad Nacional Autónoma de México.

Rudin, E., Elmer, W. (1993) "The ’Survey of English Dialects’ as a phonetic database for research in areal and variationist linguistics", in Fernández-Barrientos Martín, J. (Ed.) Jornadas Internacionales de Lingüística Aplicada/International Conference of Applied Linguistics. Robert J. Di Pietro in Memoriam. Actas/Proceedings. Granada: Instituto de Ciencias de la Educación de la Universidad de Granada. Vol. 2, p. 666-673.

Samper, J.A. (1996) "El debilitamiento de /d/ en la norma culta de Las Palmas de Gran Canaria", in Arjona, M., López, J., Enríquez, A., López, G., Novella, M.A. (Eds.) Actas del X Congreso Internacional de la Asociación de Filología y Lingüística de la América Latina. Veracruz, México, 11-16 de abril de 1993. México: Universidad Nacional Autónoma de México. p. 791-796.

Samper, J.A., Troya, M. (2001) "Valores formánticos de la /e/ en sílaba abierta en la norma culta de Las Palmas de Gran Canaria", Estudios de Fonética Experimental (Universitat de Barcelona) 11: 41-66.

Stenström, A.-B. (1986) "A Study of Pauses as Demarcators in Discourse and Syntax", in Aarts, J., Meijs, W. (Eds.) Corpus Linguistics II. New Studies in the Analysis and Exploitation of Computer Corpora. Amsterdam: Rodopi. p. 203-218.

Stenström, A.-B. (1988) "Adverbial Commas and Prosodic Segmentation", in Kytö, M., Ihalainen, M., Rissanen, M. (Eds.) Corpus Linguistics. Hard and Soft. Proceedings of the Eighth International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi. p. 15-34.

Taylor, L. (1996) "The correlation between punctuation and tone group boundaries", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 129-145.

Wichmann, A. (1991) "A study of up-arrows in the Lancaster/IBM Spoken English Corpus", in Johansson, S., Stenström, A.-B. (Eds.) English Computer Corpora. Selected Papers and Research Guide. Berlin: Mouton de Gruyter. p. 165-178.

Wichmann, A. (1996) "Prosodic style: a corpus-based approach", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 168-188.

Williams, B. (1996) "The status of corpora as linguistic data", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 3-19.

Applications of speech corpora: speech technologies

Atwell, E. (1996) "Machine learning from corpus resources for speech and handwriting recognition", in Thomas, J., Short, M. (Eds.) Using Corpora for Language Research. Studies in Honour of Geoffrey Leech. London: Longman. p. 151-166.

Baker, J.M. (1993) "Dictation, Directories and Data Bases. Emerging PC Applications for Large Vocabulary Speech Recognition", in EUROSPEECH 1993. Proceedings of the 3rd European Conference on Speech Communication and Technology. 21 - 23 September, 1993. Berlin, Germany. Vol. 1, p. 3-12.

Boulianne, G., Kenny, P., Lennig, M., O’Shaughnessy, D., Mermelstein, P. (1994) "Books on tape as training data for continuous speech recognition", Speech Communication 14, 1: 61-70.

Bertenstam, J., Blomberg, M., Carlson, R., Elenius, K., Granström, B., Gustafson, J., Hunnicutt, S., Högberg, J., Lindell, R., Neovius, L., Nord, L., Serpa-Leitao, A., Ström, N. (1995) "The Waxholm Application DataBase", in EUROSPEECH 1995. Proceedings of the 4th European Conference on Speech Communication and Technology. 18 - 21 September, 1995. Madrid, Spain. Vol 1, p. 833-836.

Burger, S., Draxler, C. (1998) "Identifying Dialects of German from Digit Strings", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. II. p. 1053-1057. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.3384

Draxler, C. (Ed.) (2000) Proceedings of the Workshop on Very Large Telephone Speech Databases. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 29 May 2000. European Language Resources Association.

Draxler, C., van den Heuvel, H., Tropf, H. (1998) "SpeechDat Experiences in Creating Large Multilingual Speech Databases for Teleservices", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I. p. 361-366.

Höge, H. (1998). Spoken language resources for voice driven man machine interfaces. A A. J. Rubio Ayuso, N. Gallardo, R. Castro i A. Tejada (Ed.), First International Conference on Language Resources and Evaluation: Proceedings. Granada, Spain, 28-30 May, 1998 (Vol. 1, p. 209–216). European Language Resources Association (ELRA).

Kenny, P., Boulianne, G., Garudadri, H., Trudelle, S., Hollan, R., Lennig, M., O’Shaughnessy, D. (1994) "Experiments in continuous speech recognition using books on tape", Speech Communication 14, 1: 49-60.

Lamel, L., Rosset, S., Bennacef, S., Bonneau-Maynard, H., Devillers, L., Gauvain, J.L. (1995) "Development of Spoken Language Corpora for Travel Information", in Eurospeech’95. Proceedings of the 4th European Conference on Speech Communication and Technology. Madrid, Spain, 18-21 September, 1995. Vol 3, p. 1961-1964.

Llisterri, J., Machuca, M. J., Mota, C., Riera, M. i Ríos, A. (2005). Corpus orales para el desarrollo de las tecnologías del habla en español. Oralia. Análisis del Discurso Oral, 8, 289-325. Retrieved from https://joaquimllisterri.cat/publicacions/Llisterri_Machuca_Mota_Riera_Rios_05_Corpus_Orales_Tecnologias_Habla_Espanol.pdf

Machuca, M. J. (2006) "Corpus para el desarrollo de sistemas de diálogo", in Llisterri, J., Machuca, M. J. (Eds.) Los sistemas de diálogo. Bellaterra - Soria: Universitat Autònoma de Barcelona, Servei de Publicacions - Fundación Duques de Soria (Manuals de la Universitat Autònoma de Barcelona, Lingüística, 45). p. 61-79.

Mariño, J.B., Padrell, J., Moreno, A., Nadeu, C. (2000) "Monolingual and bilingual Spanish-Catalan speech recognizers developed from SpeechDat databases", in Draxler, C. (Ed.) Proceedings of the Workshop on Very Large Telephone Speech Databases. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 29 May 2000. European Language Resources Association. p. 57-61.

Melin, H. (1999). Databases for speaker recognition: Activities in the COST250 Working Group. COST250 Workshop on speaker recognition in telephony. Rome, Italy, 10-12 November, 1999. https://www.speech.kth.se/ctt/publications/papers/cost250-00_wg2fr.pdf

Pickering, B. (1996) "Synthesising fundamental frequency contours: experimental results", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 203-213.

Pols, L. C. W. (1987) "Speech Technology and Corpus Linguistics", in W. Meijs (Ed.) Corpus Linguistics and Beyond. Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi.

Pols, L.C.W. (1990) "How useful are speech databases for rule synthesis development and assessment?", in ICSLP 1990. Proceedings of the 1st International Conference on Spoken Language Processing. 19 - 22 November, 1990. Kobe, Japan. Vol 2, p. 1289-1292.

Pols, L.C.W., van Santen, J.P.H., Abe, M., Kahn, D., Keller, R. (1998) "The use of large text corpora for evaluating text-to-speech systems", in LREC 1998. Proceedings of the First International Conference on Language Resources and Evaluation. 28 - 30 May 1998. Granada, Spain. Paris: ELRA, European Language Resources Association. Vol. I, p. 637-640; in Sampson, G., McCarthy, D. (Eds.) (2004) Corpus Linguistics: readings in a widening discipline. London - New York: Continuum International.

Riley, M., Byrne, W., Finke, M., Khudanpur, S., Ljolje, A., McDonough, J., Nock, H., Saraclar, M., Wooters, C. i Zavaliagkos, G. (1999). Stochastic pronunciation modelling from hand-labelled phonetic corpora. Speech Communication, 29(2–4), 209–224. https://doi.org/10.1016/S0167-6393(99)00037-0

Williams, B., Alderson, P. (1996) "Synthesizing British English intonation", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 191-202.

Applications of speech corpora: linguistic analysis

Adolphs, S. (2010). Using a corpus to study spoken language. A S. Hunston i D. Oakey (Ed.), Introducing applied linguistics: Concepts and skills (p. 180–188). Routledge. https://doi.org/10.4324/9780203875728

Azorín, D., Martínez, M.A., Santamaría, M.I. (1999) "Léxico y creación léxica en un corpus oral de lenguaje juvenil", in Fernández, J., Fernández, C., Marcos, M., Prieto, E., Santos, L. (Eds.) Lingüística para el siglo XXI. III Congreso de Lingüística General (CLG3). Salamanca: Ediciones de la Universidad de Salamanca (Aquilafuente, 9). vol 1, p. 217-228.

Bendazzoli, C., Monti, C., Sandrelli, A., Russo, M., Baroni, M., Bernardini, S., Mack, G., Ballardini, E., Mead, P. (2004) "Towards the creation of an electronic corpus to study directionality in simultaneous interpreting", in Compiling and Processing Spoken Language Corpora. LREC 2004, International Conference on Language Resources and Evaluation. 24th May 2004. Lisboa, Portugal. p. 33-39.

Bentivoglio, P., Sedano, M. (1993) "Investigación sociolingüística: sus métodos aplicados a una experiencia venezolana", Boletín de Lingüística 8: 3-35.

Berglund, Y. (1999) "Exploiting a large spoken corpus: An end-user’s way to the BNC", International Journal of Corpus Linguistics 4,1: 29-52.

Biber, D., Johansson, S., Leech, G., Conrad, S., Finegan, E. (1999) Longman Grammar of Spoken and Written English. London: Pearson Education.

Blanche-Benveniste, C. (1997) Approches de la langue parlée en français. Paris: Ophrys (Collection L’Essentiel Français).

Blanche-Benveniste, C., Bilger, M., Rouget, Ch., van den Eynde, K. (1991) Le français parlé. Etudes grammaticales. Paris: Editions du Centre National de la Recherche Scientifique (Sciences du Langage)

Bortolini, U. (1997) "L’uso del sistema CHILDES nell’analisi fonologica del linguaggio infantile", in Bortolini, U., Pizzuto, E. (Eds.) Il Progetto CHILDES-Italia. Contributi di ricerca sulla lingua italiana. Tirrenia: Edizioni del Cerro. p. 13-42; in Quaderni del Centro di Studio per le Ricerche di Fonetica 16 (1997): 3-34.

Carter, R., McCarthy, M. (1997) Exploring Spoken English. Cambridge: Cambridge University Press.

Garcés Gómez, M. P. (1994) "Elementos de cohesión en el español hablado: ’pues’", in Alvar Ezquerra, M., Villena Ponsoda, J.A. (Coord.) Estudios para un corpus del español. Málaga: Universidad de Málaga (Analecta Malacitana, Anejo 7).

Garcés Gómez, M. P. (1994) "Funciones y valores de ’entonces’ en el español hablado", in Alvar Ezquerra, M., Villena Ponsoda, J.A. (Coord.) Estudios para un corpus del español. Málaga: Universidad de Málaga (Analecta Malacitana, Anejo 7).

González Salgado, J.A. (2005) "Los corpus sonoros en la investigación de la lengua hablada", CLAC, Círculo de Lingüística Aplicada a la Comunicación 24. http://webs.ucm.es/info/circulo/no24/gsalgado.htm

Hidalgo Navarro, A. (1997) La entonación coloquial. Función demarcativa y unidades de habla. Cuadernos de Filología (Anejo XXI). Valencia: Departamento de Filología Española (Lengua Española), Facultat de Filologia, Universitat de València.

Hidalgo Navarro, A. (1998) "Alternancia de turnos y conversación. Sobre el papel regulador de los segmentos en el habla simultánea", Lingüística Española Actual 22, 2: 217-138.

Hidalgo Navarro, A. (1998) "Expresividad y función pragmática de la entonación en la conversación coloquial", Oralia. Análisis del discurso oral 1: 69-92.

Hidalgo Navarro, A. (2001) "Entonación y conversación: sucesión de turnos y superposiciones de habla", in de Bustos, J. J., Charaudeau, P., Girón, J.L., Iglesias, S., López Alonso, C. (coord.) Lengua, discurso, texto. I Simposio Internacional de Análisis del Discurso. Madrid: Visor. p. 1597-1609.

Hidalgo Navarro, A. (2001) "Modalidad oracional y entonación. Notas sobre el funcionamiento pragmático de los rasgos suprasegmentales en la conversación", Moenia. Revista Lucense de Lingüística & Literatura 7: 271-292.

Hidalgo Navarro, A. (2003) "Microestructura discursiva y segmentación informativa en la conversación coloquial", ELUA, Estudios de Lingüística Aplicada, Universidad de Alicante 17: 367-385.

Jiménez Ruiz, J.L. (1999) "Campo de realización de la preposición "hasta" en el Corpus de la Variedad Juvenil Universitaria Alicantina", in Fernández, J., Fernández, C., Marcos, M., Prieto, E., Santos, L. (Eds.) Lingüística para el siglo XXI. III Congreso de Lingüística General (CLG3). Salamanca: Ediciones de la Universidad de Salamanca (Aquilafuente, 9). vol 2, p. 963-972.

Lorenzo Suárez, A.M., Gómez Guinovart, J. (1996) "Aspectos de análise lingüístico-cuantitativa automática do galego oral", in Gómez Guinovart, J., Lorenzo Suárez, A. (Eds.) Lingüística e informática. Santiago de Compostela: Tórculo Edicións. p. 57-86.

Martín Butragueño, P. (2003) "Hacia una descripción prosódica de los marcadores discursivos. Datos del español de México", in Martín Butragueño, P., Herrera Z. E. (Eds.) La tonía. Dimensiones fonéticas y fonológicas. México: El Colegio de México, Centro de Estudios Lingüísticos y Literarios (Cátedra Jaime Torres Bodet, Estudios de Lingüística 4). p. 375-402.

McCarthy, M. (1999) Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.

Rudin, E., Elmer, W. (1993) "The ’Survey of English Dialects’ as a phonetic database for research in areal and variationist linguistics", in Fernández-Barrientos Martín, J. (Ed.) Jornadas Internacionales de Lingüística Aplicada/International Conference of Applied Linguistics. Robert J. Di Pietro in Memoriam. Actas/Proceedings. Granada: Instituto de Ciencias de la Educación de la Universidad de Granada. Vol. 2, p. 666-673.

Stenström, A.-B., Svartvik, J. (1994) "Imparsable speech: Repeats and other nonfluencies in spoken English", in Oostdijk, N., de Haan, P. (Eds.) Corpus-based Research into Language. Amsterdam: Rodopi. p. 241-254.

Williams, B. (1996) "The status of corpora as linguistic data", in Knowles, G., Wichmann, A., Alderson, P. (Eds.) Working with Speech: Perspectives on research into the Lancaster/IBM Spoken English Corpus. London i New York: Longman. p. 3-19.

Applications of speech corpora: second language acquisition

Phonetics and second language acquisition: corpora and corpus-based studies

Applications of speech corpora: first language acquisition

Rose, Y. i MacWhinney, B. (2014). The PhonBank project: Data and software-assisted methods for the study of phonology and phonological development. A J. Durand, U. Gut i G. Kristoffersen (Ed.), The Oxford handbook of corpus phonology (p. 308–401). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199571932.013.023

Rose, Y., MacWhinney, B., Byrne, R., Hedlund, G., Maddocks, K., O’Brien, P. i Wareham, T. (2006). Introducing Phon: A software solution for the study of phonological acquisition. A D. Bamman, T. Magnitskaia i C. Zaller (Ed.), Proceedings of the 30th Annual Boston University Conference on Language Development (p. 489–500). Cascadilla Press. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4769870/

Rose, Y. i Stoel-Gammon, C. (2015). Using PhonBank and Phon in studies of phonological development and disorders. Clinical Linguistics & Phonetics, 29(8–10), 686–700. https://doi.org/10.3109/02699206.2015.1041609

Applications of speech corpora: forensic phonetics

Corpora for research in forensic phonetics

Applications of speech corpora: clinical phonetics

Corpora of disordered speech

Applications of speech corpora: documentation of minoritized languages

Graaf, T. de (2002) "The use of archives and fieldwork for the study of the endangered languages of Russia", in Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, 26-27 May 2002. p. 29-1 - 29-4.

I-Wen Su, L. (2002) "Documentation of Formosan languages", in Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, 26-27 May 2002. p. 32-1 - 32-8.

Jacobson, M. (2004) "Corpus oraux en linguistique de terrain", in Véronis, J. (Ed.) Le traitement automatique des corpus oraux, Traitement automatique des langues 45, 2: 63-88.

Kokkinakis, G., Coutsogeorgopoulos, H., Dermatas, H., Kaitsas, G. (2000) "Electronic dictionary of pronunciation and usage of the Graecanic dialect of Southern Italy", in Ó Cróinín, D. (Ed.) Proceedings of the Workshop on Developing Language Resources for Minority Languages: Reusability and Strategic Priorities. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 30 May 2000. European Language Resources Association. p. 30-40.

Levin, L., Vega, R., Carbonell, J., Brown, R., Lavie, A., Cañulef, E., Huenchullan, C. (2002) "Data collection and language technologies for Mapudungun", in Proceedings of the International LREC Workshop on Resources and Tools in Field Linguistics. LREC 2002. Third International Conference on Language Resources and Evaluation. Las Palmas, 26-27 May 2002. p. 18-1 - 18-4.

Ljublinskaja, M., Sherstinova, T., Kuznetsova, E. (2000) "Digital sounded lexicon of Nenets", in Ó Cróinín, D. (Ed.) Proceedings of the Workshop on Developing Language Resources for Minority Languages: Reusability and Strategic Priorities. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 30 May 2000. European Language Resources Association. p. 71-74.

Mercier, G., Siroux, J., Favereau, F., Louis, F. (2000) "Courseware based on speech technology for Breton language pronunciation learning: Speech data bases and bilingual spoken dictionary", in Ó Cróinín, D. (Ed.) Proceedings of the Workshop on Developing Language Resources for Minority Languages: Reusability and Strategic Priorities. LREC 2000, Second International Conference on Language Resources and Evaluation. Athens, Greece, 30 May 2000. European Language Resources Association. p. 11-18.

Speech corpora and spoken language corpora – Bibliography
Joaquim Llisterri, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona