Nikitas Theodoropoulos

Athens, Greece


I like to do research in machine learning, NLP, and cognitive science.
Currently applying to PhD positions.

You can email me at nikitastheodorop@gmail.com
See other ways to connect below.
You can find my CV here.

Research Interests

Through my research, I want to understand fundamental aspects of intelligence, both in machines and humans, with an emphasis on language.
Some of the topics I am interested in are:

  • Computational models of human language acquisition and processing
  • Human-like models that are computation- and sample-efficient
  • Multi-agent interaction and communication
  • Embodied agents and language grounding
  • Multilinguality and low-resource languages
  • Interpretability and the scientific study of language models
  • Language emergence and evolution
Short CV

I graduated with a BSc & MSc in Electrical and Computer Engineering from the National Technical University of Athens (NTUA). I completed my thesis at the Artificial Intelligence and Learning Systems Lab (AILS), supervised by Prof. Giorgos Stamou. In our research, we investigated language modeling with human-like data constraints of at most 100 million words, as part of the 2024 BabyLM Challenge.

During my studies, I interned with the Machine Teaching group at MPI-SWS, where I was advised by Prof. Adish Singla. We worked on AI for programming education, focusing on block-based visual programming (e.g., Scratch). Specifically, we modeled and predicted student behavior: we designed a benchmark for synthesizing a student’s attempt at an unknown task, given the student’s attempt at a known reference task.

In the past, I also worked with Prof. Alexandros Potamianos at the Speech and Language group at NTUA. Our research topic was learning brain-derived (fMRI) word representations and applying them to NLP tasks.

More about me

In my free time, I enjoy indoor climbing 🧗, playing the piano 🎹, and reading 📚.

I also like to learn new and exciting programming languages, and I am interested in open-source, decentralized, and self-hosted software.

News

Oct 15, 2025 We released BabyBabelLM: a multilingual benchmark of developmentally plausible training data for 45 languages! Find more here: babylm.github.io/babybabellm

Publications

  1. BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
    Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prévot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, and Leshem Choshen
    2025
  2. BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
    Nikitas Theodoropoulos, Giorgos Filandrianos, Vassilis Lyberatos, Maria Lymperaiou, and Giorgos Stamou
    In The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning, Nov 2024
  3. From {Solution} Synthesis to {Student Attempt} Synthesis for Block-Based Visual Programming Tasks
    Adish Singla and Nikitas Theodoropoulos
    In Proceedings of the 15th International Conference on Educational Data Mining (EDM), Jul 2022