publications

2025

  1. BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
    Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prévot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, and Leshem Choshen
    2025

2024

  1. BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
    Nikitas Theodoropoulos, Giorgos Filandrianos, Vassilis Lyberatos, Maria Lymperaiou, and Giorgos Stamou
    In The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning, Nov 2024

2022

  1. From {Solution} Synthesis to {Student Attempt} Synthesis for Block-Based Visual Programming Tasks
    Adish Singla and Nikitas Theodoropoulos
    In Proceedings of the 15th International Conference on Educational Data Mining (EDM), Jul 2022