Nikitas Theodoropoulos

Athens, Greece


I like to do research on machine learning, NLP, and cognitive science.
I am currently applying for PhD positions.

You can email me at nikitastheodorop@gmail.com
See other ways to connect below.
You can find my CV here.

Research Interests

Some of the topics I am interested in are:

  • Computational linguistics
  • Language acquisition
  • Interpretability
  • Multilinguality and low-resource languages
  • Multimodality and language grounding
  • Multi-agent interaction and communication
  • Language evolution
  • Symbolic and neuro-symbolic methods
Short CV

I graduated with a BSc & MSc in Electrical and Computer Engineering from the National Technical University of Athens (NTUA). I completed my thesis at the Artificial Intelligence and Learning Systems lab (AILS) supervised by Prof. Giorgos Stamou. In our research we investigated language modeling with human-like data constraints of at most 100 million words, as part of the 2024 BabyLM Challenge.

During my studies, I interned with the Machine Teaching group at MPI-SWS, where I was advised by Prof. Adish Singla. We worked on AI for programming education, specifically block-based visual programming (e.g., Scratch). Our work focused on modeling and predicting student behavior: we designed a benchmark for synthesizing a student’s attempt at an unknown task, given the student’s attempt at a known reference task.

In the past, I also worked with Prof. Alexandros Potamianos at the Speech and Language group at NTUA. Our research topic was learning brain-derived (fMRI) word representations and applying them in NLP tasks.

More about me

In my free time I enjoy indoor climbing 🧗, playing the piano 🎹, and reading 📚.
I also like to learn new and exciting programming languages, and I am interested in open-source, decentralized, and self-hosted software.

News

Oct 15, 2025 We release BabyBabelLM: A multilingual benchmark of developmentally plausible training data for 45 languages! Find more here: babylm.github.io/babybabellm

Publications

  1. BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
    Jaap Jumelet, Abdellah Fourtassi, Akari Haga, Bastian Bunzeck, Bhargav Shandilya, Diana Galvan-Sosa, Faiz Ghifari Haznitrama, Francesca Padovani, Francois Meyer, Hai Hu, Julen Etxaniz, Laurent Prévot, Linyang He, María Grandury, Mila Marcheva, Negar Foroutan, Nikitas Theodoropoulos, Pouya Sadeghi, Siyuan Song, Suchir Salhan, Susana Zhou, Yurii Paniv, Ziyin Zhang, Arianna Bisazza, Alex Warstadt, and Leshem Choshen
    2025
  2. BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
    Nikitas Theodoropoulos, Giorgos Filandrianos, Vassilis Lyberatos, Maria Lymperaiou, and Giorgos Stamou
    In The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning, Nov 2024
  3. From Solution Synthesis to Student Attempt Synthesis for Block-Based Visual Programming Tasks
    Adish Singla, and Nikitas Theodoropoulos
    In Proceedings of the 15th International Conference on Educational Data Mining (EDM), Jul 2022