LLM Evaluation & AI Safety
Designing evaluation pipelines for LLM robustness, moral-capability analysis and failure-mode discovery. Experience with synthetic data generation, custom metrics, adversarial prompting, value-alignment tests and running computational experiments on shared infrastructure.
Efficient LLM Serving
Building low-latency LLM serving pipelines with vLLM, quantisation and open-weight models for multi-user research environments and real-time HRI experiments.
Robotics & Human-Robot Interaction
Integrating LLM-side interaction modules into ROS2-based socially assistive robotics systems, with structured dialogue, ASR/TTS coordination and real-time constraints for lab-based HRI studies.
ML Systems & HPC
Training and evaluating transformer-based and BERT-style models on shared HPC infrastructure and multi-GPU servers, using Slurm/HTCondor workflows with reproducible experiment configuration and analysis pipelines.
Reinforcement Learning & Optimisation
Research-level exposure to reinforcement learning, including from-scratch PPO and critic-free GRPO implementations for continuous-control experiments, plus coursework honours in Reinforcement Learning.
Production Software & Data Platforms
Designing production Python/TypeScript microservices, APIs, CI/CD pipelines, Kubernetes/MicroK8s deployments and Kafka/NiFi data-processing systems for real-time monitoring platforms.
Professional Profile
I am a research engineer and ML systems builder working across LLM evaluation, efficient model serving, AI safety, robotics/HRI and production data platforms. I enjoy problems where scientific uncertainty meets engineering constraints: systems that need rigorous modelling, careful evaluation and reliable software.
At CSIC-IIIA, I design evaluation workflows for moral capabilities, robustness and model behaviour in text classifiers and LLMs. My work includes synthetic data generation, custom metric design, BERT-style classifier fine-tuning, open-weight LLM evaluation and computational experiments on shared HPC and multi-GPU infrastructure.
I also build efficient LLM-serving pipelines for real-time human-robot interaction, using vLLM, quantisation and structured dialogue components integrated with ROS2-based robotic systems. The goal is practical AI behaviour under real interaction constraints, not just offline benchmark performance.
Before AI research, I worked full-time as a software engineer at Axión, building production microservices, on-prem Kubernetes/MicroK8s infrastructure, GitLab CI/CD, Kafka/NiFi ETL pipelines and real-time monitoring systems for public-service and private infrastructure.
At Oxford, my current work focuses on game-theoretic results on learning stability and equilibrium in multi-agent systems, with applications to multi-agent coordination and learning dynamics.
LLM Evaluation
vLLM Serving
ROS2 HRI
HPC Experiments
Kubernetes/Data Platforms