


25th NoDaLiDa / 11th Baltic-HLT 2025: Tallinn, Estonia
- Richard Johansson, Sara Stymne:
Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies, NoDaLiDa/Baltic-HLT 2025, Tallinn, Estonia, March 3-4, 2025. University of Tartu Library 2025, ISBN 978-9908-53-109-0
- Ali Al-Laith, Alexander Conroy, Kirstine Nielsen Degn, Jens Bjerring-Hansen, Daniel Hershcovich:
Annotating and Classifying Direct Speech in Historical Danish and Norwegian Literary Texts. 1-7
- Diego Alves:
Diachronic Analysis of Phrasal Verbs in English Scientific Writing. 8-16
- Elsa Andersson, Johan Falkenjack, Arne Jönsson:
Applying and Optimising a Multi-Scale Probit Model for Cross-Source Text Complexity Classification and Ranking in Swedish. 17-27
- Bjarki Ármannsson, Hinrik Hafsteinsson, Jóhannes B. Sigtryggsson, Atli Jasonarson, Einar Freyr Sigurðsson, Steinþór Steingrímsson:
Playing by the Rules: A Benchmark Set for Standardized Icelandic Orthography. 28-36
- Bjarki Ármannsson, Finnur Ágúst Ingimundarson, Einar Freyr Sigurðsson:
An Icelandic Linguistic Benchmark for Large Language Models. 37-47
- Maria Berger:
Transfer-Learning German Metaphors Inspired by Second Language Acquisition. 48-54
- Matthieu Pierre Boyer, Mathieu Dehouck:
Comparative Concepts or Descriptive Categories: a UD Case study. 55-65
- Noel Chia, Ines Rehbein, Simone Paolo Ponzetto:
Investigating the effectiveness of Data Augmentation and Contrastive Learning for Named Entity Recognition. 66-79
- Sander Bijl de Vroe, George Stampoulidis, Kai Hakala, Aku Rouhe, Mark van Heeswijk, Jussi Karlgren:
Comparing Human and Machine Translations of Generative Language Model Evaluation Datasets. 80-85
- Aleksei Dorkin, Kairit Sirts:
GliLem: Leveraging GliNER for Contextualized Lemmatization in Estonian. 86-97
- Tita Enstad, Trond Trosterud, Marie Iversdatter Røsok, Yngvil Beyer, Marie Roald:
Comparative analysis of optical character recognition methods for Sámi texts from the National Library of Norway. 98-108
- Naome A. Etori, Arturs Kanepajs, Kevin Lu, Randu Karisa:
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama. 109-120
- Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares:
Better Benchmarking LLMs for Zero-Shot Dependency Parsing. 121-135
- Artem Fedorchenko, Tanel Alumäe:
Optimizing Estonian TV Subtitles with Semi-supervised Learning and LLMs. 136-141
- Pascale Feldkamp, Márton Kardos, Kristoffer L. Nielbo, Yuri Bizzoni:
Modeling Multilayered Complexity in Literary Texts. 142-158
- Lea Fischbach, Caroline Kleen, Lucie Flek, Alfred Lameli:
Does Preprocessing Matter? An Analysis of Acoustic Feature Importance in Deep Learning for Dialect Classification. 159-169
- Emilie Francis:
Language of the Swedish Manosphere with Swedish FrameNet. 170-180
- Steinunn Rut Friðriksdóttir, Dan Saattrup Nielsen, Hafsteinn Einarsson:
Hotter and Colder: A New Approach to Annotating Sentiment, Emotions, and Bias in Icelandic Blog Comments. 181-191
- Yaroslav Getman, Tamás Grósz, Katri Hiovain-Asikainen, Tommi Lehtonen, Mikko Kurimo:
Towards large-scale speech foundation models for a low-resource minority language. 192-200
- Ona de Gibert, Tommi Nieminen, Yves Scherrer, Jörg Tiedemann:
OpusDistillery: A Configurable End-to-End Pipeline for Systematic Multilingual Distillation of Open NMT Models. 201-208
- Ona de Gibert, Dayyán O'Brien, Dusan Varis, Jörg Tiedemann:
Mind the Gap: Diverse NMT Models for Resource-Constrained Environments. 209-216
- Isidora Glisic, Caitlin Richter, Anton Karl Ingason:
Testing relevant linguistic features in automatic CEFR skill level classification for Icelandic. 217-222
- Rob van der Goot, Anette Jensen, Emil Allerslev Schledermann, Mikkel Wildner Kildeberg, Nicolaj Larsen, Mike Zhang, Elisa Bassignana:
MorSeD: Morphological Segmentation of Danish and its Effect on Language Modeling. 223-229
- Emil Häglund, Johanna Björklund:
Opinion Units: Concise and Contextualized Representations for Aspect-Based Sentiment Analysis. 230-240
- Þórir Hrafn Harðarson, Hrafn Loftsson, Stefán Ólafsson:
Aligning Language Models for Icelandic Legal Text Summarization. 241-251
- Johannes Heinecke, Maria Boritchev, Frédéric Herledan:
Question-parsing with Abstract Meaning Representation enhanced by adding small datasets. 252-257
- Erik Henriksson, Otto Tarkka, Filip Ginter:
FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering. 258-268
- Tollef Emil Jørgensen, Jens Breitung:
Margins in Contrastive Learning: Evaluating Multi-task Retrieval for Sentence Embeddings. 269-278
- Andra Kalnaca, Tatjana Pakalne, Kristine Levane-Petrova:
Database of Latvian Morphemes and Derivational Models: ideas and expected results. 279-286
- Jurgita Kapociute-Dzikiene, Toms Bergmanis, Marcis Pinnis:
Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States. 287-295
- Elisabeth Kaukonen, Ahmed Sabir, Rajesh Sharma:
How Aunt-Like Are You? Exploring Gender Bias in the Genderless Estonian Language: A Case Study. 296-301
- Indrek Kiissel, Liisi Piits, Heete Sahkai, Indrek Hein, Liis Ermus, Meelis Mihkla:
Estonian isolated-word text-to-speech synthesiser. 302-306
- Kätriin Kukk, Danila Petrelli, Judit Casademont, Eric J. W. Orlowski, Michal Dzielinski, Maria Jacobson:
BiaSWE: An Expert Annotated Dataset for Misogyny Detection in Swedish. 307-312
- Maria Kunilovskaya, Iuliia Zaitova, Wei Xue, Irina Stenger, Tania Avgustinova:
Predictability of Microsyntactic Units across Slavic Languages: A translation-based Study. 313-322
- Jenny Kunz:
Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT. 323-330
- Murathan Kurfali, Shorouq Zahra, Evangelia Gogoulou, Luise Dürlich, Fredrik Carlsson, Joakim Nivre:
SweSAT-1.0: The Swedish University Entrance Exam as a Benchmark for Large Language Models. 331-339
- Hele-Andra Kuulmets, Taido Purason, Mark Fishel:
How Well do LLMs know Finno-Ugric Languages? A Systematic Assessment. 340-353
- Dávid í Lág, Barbara Scalvini, Jon Gudnason:
Mapping Faroese in the Multilingual Representation Space: Insights for ASR Model Optimization. 354-358
- Ilze Lokmane, Mikus Grasmanis, Agute Klints, Gunta Nespore-Berzkalne, Peteris Paikens, Lauma Pretkalnina, Laura Rituma, Madara Stade, Evelina Taurina:
Towards a Derivational Semantics Resource for Latvian. 359-366
- Risto Luukkonen, Jonathan Burdge, Elaine Zosa, Aarne Talman, Ville Komulainen, Väinö Hatanpää, Peter Sarlin, Sampo Pyysalo:
Poro 34B and the Blessing of Multilinguality. 367-382
- Giacomo Magnifico, Eduard Barbu:
Can summarization approximate simplification? A gold standard comparison. 383-389
- Johanna Männistö, Joseph Attieh, Jörg Tiedemann:
A Comparative Study of PEFT Methods for Python Code Generation. 390-396
- Vladislav Mikhailov, Petter Mæhlum, Victoria Ovedie Chruickshank Langø, Erik Velldal, Lilja Øvrelid:
A Collection of Question Answering Datasets for Norwegian. 397-407
- Tommi Nieminen, Jörg Tiedemann, Sami Virpioja:
Incorporating Target Fuzzy Matches into Neural Fuzzy Repair. 408-418
- Joakim Nivre:
Constructions and Strategies in Universal Dependencies. 419-423
- Emil Nuutinen, Iiro Rastas, Filip Ginter:
Finnish SQuAD: A Simple Approach to Machine Translation of Span Annotations. 424-432
- Romina Oji, Jenny Kunz:
How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters. 433-439
- Phoebe Parsons, Knut Kvale, Torbjørn Svendsen, Giampiero Salvi:
Match 'em: Multi-Tiered Alignment for Error Analysis in ASR. 440-447
- Phoebe Parsons, Per Erik Solberg, Knut Kvale, Torbjørn Svendsen, Giampiero Salvi:
Adding Metadata to Existing Parliamentary Speech Corpus. 448-457
- Dmytro Pashchenko, Lisa Yankovskaya, Mark Fishel:
Paragraph-Level Machine Translation for Low-Resource Finno-Ugric Languages. 458-469
- Bolette S. Pedersen, Nathalie Carmen Hau Sørensen, Sanni Nimb, Dorte Haltrup Hansen, Sussi Olsen, Ali Al-Laith:
Evaluating LLM-Generated Explanations of Metaphors - A Culture-Sensitive Study of Danish. 470-479
- Esther Ploeger, Paola Saucedo, Johannes Bjerva, Ross Deans Kristensen-McLachlan, Heather C. Lent:
Tokenization on Trial: The Case of Kalaallisut-Danish Legal Machine Translation. 480-491
- Wessel Poelman, Miryam de Lhoneux:
The Roles of English in Evaluating Multilingual Language Models. 492-498
- Andrei Politov, Oleh Shkalikov, René Jäkel, Michael Färber:
Revisiting Projection-based Data Transfer for Cross-Lingual Named Entity Recognition in Low-Resource Languages. 499-507
- Cristina Reguera-Gómez, Denis Paperno, Maaike H. T. de Boer:
Empathy vs Neutrality: Designing and Evaluating a Natural Chatbot for the Healthcare Domain. 508-517
- Caitlin Richter, Kolbrún Friðriksdóttir, Kormákur Logi Bergsson, Erik Anders Maher, Ragnheiður María Benediktsdóttir, Jon Gudnason:
Assessed and Annotated Vowel Lengths in Spoken Icelandic Sentences for L1 and L2 Speakers: A Resource for Pronunciation Training. 518-524
- Mike Riess, Tollef Emil Jørgensen:
The BRAGE Benchmark: Evaluating Zero-shot Learning Capabilities of Large Language Models for Norwegian Customer Service Dialogues. 525-536
- Egil Rønningstad, Lilja Charlotte Storset, Petter Mæhlum, Lilja Øvrelid, Erik Velldal:
Mixed Feelings: Cross-Domain Sentiment Classification of Patient Feedback. 537-543
- Javier de la Rosa, Vladislav Mikhailov, Lemei Zhang, Freddy Wetjen, David Samuel, Peng Liu, Rolv-Arild Braaten, Petter Mæhlum, Magnus Breder Birkenes, Andrey Kutuzov, Tita Ranveig Enstad, Hans Christian Farsethås, Svein Arne Brygfjeld, Jon Atle Gulla, Stephan Oepen, Erik Velldal, Wilfred Østgulen, Lilja Øvrelid, Aslak Sira Myhre:
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective. 544-560
- Dan Saattrup Nielsen, Kenneth C. Enevoldsen, Peter Schneider-Kamp:
Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks. 561-572
- David Samuel, Vladislav Mikhailov, Erik Velldal, Lilja Øvrelid, Lucas Georges Gabriel Charpentier, Andrey Kutuzov, Stephan Oepen:
Small Languages, Big Models: A Study of Continual Training on Languages of Norway. 573-608
- Barbara Scalvini, Iben Nyholm Debess, Annika Simonsen, Hafsteinn Einarsson:
Rethinking Low-Resource MT: The Surprising Effectiveness of Fine-Tuned Multilingual Models in the LLM Age. 609-621
- Barbara Scalvini, Annika Simonsen, Iben Nyholm Debess, Hafsteinn Einarsson:
Prompt Engineering Enhances Faroese MT, but Only Humans Can Tell. 622-633
- Yves Scherrer, Olli Kuparinen:
Interactive maps for corpus-based dialectology. 634-638
- Carolin M. Schuster, Maria-Alexandra Roman, Shashwat Ghatiwala, Georg Groh:
Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings. 639-650
- Rishabh Shastry, Patricia Chiril, Joshua Charney, David Uminsky:
Entailment Progressions: A Robust Approach to Evaluating Reasoning Within Larger Discourse. 651-660
- Karen de Souza, Alexandre Nikolaev, Maarit Koponen:
Generative AI for Technical Writing: Comparing Human and LLM Assessments of Generated Content. 661-679
- Steinþór Steingrímsson, Einar Freyr Sigurðsson, Atli Jasonarson:
MC-19: A Corpus of 19th Century Icelandic Texts. 680-687
- Mathias Stenlund, Hemanadhan Myneni, Morris Riedel:
Surface-Level Morphological Segmentation of Low-resource Inuktitut Using Pre-trained Large Language Models. 688-696
- Maria Irena Szawerna, Simon Dobnik, Ricardo Muñoz Sánchez, Elena Volodina:
The Devil's in the Details: the Detailedness of Classes Influences Personal Information Detection and Labeling. 697-708
- Christina Tånnander, Jens Edlund:
Braxen 1.0. 709-713
- Sofia Elena Terenziani:
Temporal Relation Classification: An XAI Perspective. 714-728
- Samia Touileb, Vladislav Mikhailov, Marie Ingeborg Kroka, Lilja Øvrelid, Erik Velldal:
Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles. 729-738
- Jesper Vaaben Bornerup, Christian Hardmeier:
Efficient Elicitation of Fictitious Nursing Notes from Volunteer Healthcare Professionals. 739-754
- Teemu Vahtola, Songbo Hu, Mathias Creutz, Ivan Vulic, Anna Korhonen, Jörg Tiedemann:
Analyzing the Effect of Linguistic Instructions on Paraphrase Generation. 755-766
- Thomas Vakili, Martin Hansson, Aron Henriksson:
SweClinEval: A Benchmark for Swedish Clinical Natural Language Processing. 767-775
- Socrates Vakirtzian, Vivian Stamou, Yannis Kazos, Stella Markantonatou:
Dialectal treebanks and their relation with the standard variety: The case of East Cretan and Standard Modern Greek. 776-784
- Søren Vejlgaard Holm, Lars Kai Hansen, Martin Carsten Nielsen:
Danoliteracy of Generative Large Language Models. 785-800
- Huiling You, Samia Touileb, Erik Velldal, Lilja Øvrelid:
NorEventGen: generative event extraction from Norwegian news. 801-811
- Mike Zhang, Max Müller-Eberstein, Elisa Bassignana, Rob van der Goot:
SnakModel: Lessons Learned from Training an Open Danish Large Language Model. 812-825
- Elaine Zosa, Ville Komulainen, Sampo Pyysalo:
Got Compute, but No Data: Lessons From Post-training a Finnish LLM. 826-832
