Software Engineer, NLP Enthusiast

About Me

I am an AI Engineer at the Defence Science and Technology Agency (DSTA), working across several areas of Machine Learning, including Natural Language Processing (NLP), Automatic Speech Recognition (ASR) and Computer Vision (CV). On the side, I engage in NLP research, where my interests include Low-resource Languages, Language Representation and Biases in LLMs.

Research Publication

Disentangling Singlish discourse particles with task-driven representation

Abstract: Singlish, or formally Colloquial Singapore English, is an English-based creole language originating from the SouthEast Asian country Singapore. The language contains influences from Sinitic languages such as Chinese dialects, Malay, Tamil and so forth. A fundamental task to understanding Singlish is to first understand the pragmatic functions of its discourse particles, upon which Singlish relies heavily to convey meaning. This work offers a preliminary effort to disentangle the Singlish discourse particles (lah, meh and hor) with task-driven representation learning. After disentanglement, we cluster these discourse particles to differentiate their pragmatic functions, and perform Singlish-to-English machine translation. Our work provides a computational method to understanding Singlish discourse particles, and opens avenues towards a deeper comprehension of the language and its usage.

Full paper available here

Stylistic Evolution and LLM Neutrality in Singlish Language

Abstract: Singlish is a creole rooted in Singapore’s multilingual environment and continues to evolve alongside social and technological change. This study investigates the evolution of Singlish over a decade of informal digital text messages. We propose a stylistic similarity framework that compares lexico-structural, pragmatic, psycholinguistic, and encoder-derived features across years to quantify temporal variation. Our analysis reveals notable diachronic changes in tone, expressivity and sentence construction over the years. Conversely, while some LLMs were able to generate superficially realistic Singlish messages, they do not produce temporally neutral outputs, and residual temporal signals remain detectable despite prompting and fine-tuning. Our findings highlight the dynamic evolution of Singlish, as well as the capabilities and limitations of current LLMs in modeling sociolectal and temporal variations in the colloquial language.

Preprint available here

Experience

Defence Science and Technology Agency (DSTA)

Software Engineer

Sept 2024 - Present

Developing AI models in speech and computer vision, with a focus on low-resource languages and environments.
Building backend systems, leveraging technologies such as Kafka for real-time data streaming and processing.
Working at the intersection of AI and infrastructure to create scalable, efficient, and high-performance solutions.
Developing responsive, user-friendly user interfaces for web applications.

Elettronica Group

AI Engineer (Intern)

June 2023 ‑ Aug 2023

Selected as part of a Global Internship Program that allows students to work in relevant companies worldwide.
Developed methods to improve both visible-light and infrared drone-detection technologies using various computer vision models.

Defence Science and Technology Agency (DSTA)

AI Engineer (Intern)

May 2022 ‑ Aug 2022

Worked with generative LLMs (such as GPT models) to translate low-resource languages.

Education

University of Edinburgh

MSc in Speech and Language Processing

2023-2024

Awarded Grade: Distinction
Dissertation Topic: Towards Representational Learning of Pragmatic Functions of Singlish Discourse Particles Using LLMs
Relevant Classes:
- Accelerated Natural Language Processing
- Speech Processing
- Natural Language Understanding, Generation, and Machine Translation
- Automatic Speech Recognition
- Reinforcement Learning

McGill University

BSc in Computer Science, Minor in Mathematics

2020-2023

GPA 3.93/4.0
Relevant Classes:
- Applied Machine Learning
- Computational Linguistics
- Probabilistic Programming
- NLP and Data Science

Projects

Happify.AI

A Smart Radio

A smart radio that uses facial recognition technology to predict the user's emotion and adjust their song playlist accordingly.

Used DeepFace and Dlib models to predict users' emotions and fatigue levels from their facial expressions. The system enqueues songs based on these predictions, using an algorithm that ensures smooth transitions between songs based on their characteristics (tempo, danceability, etc.), obtained via Spotipy, to optimize the user's mood and focus level.
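As a rough illustration of the smooth-transition idea, and not the project's actual code, the queue can be built by greedily choosing whichever remaining song is closest to the last one queued. Everything in this sketch is an assumption made for illustration: the function names, the sample songs, the Euclidean distance, and the 200 BPM tempo scaling. It also leaves out the emotion and fatigue predictions and the Spotipy lookups.

```python
import math


def feature_distance(a, b):
    """Distance between two songs over tempo and danceability.

    Tempo is divided by 200 (an assumed rough upper bound in BPM) so that
    it sits on a scale comparable to danceability, which is in [0, 1].
    """
    tempo_gap = (a["tempo"] - b["tempo"]) / 200.0
    dance_gap = a["danceability"] - b["danceability"]
    return math.hypot(tempo_gap, dance_gap)


def order_for_smooth_transitions(current, candidates):
    """Greedily queue the remaining song closest to the last one queued."""
    queue = []
    remaining = list(candidates)
    last = current
    while remaining:
        nearest = min(remaining, key=lambda song: feature_distance(last, song))
        remaining.remove(nearest)
        queue.append(nearest)
        last = nearest
    return queue


# Hypothetical feature values, hard-coded here instead of fetched from an API.
songs = [
    {"name": "fast", "tempo": 170.0, "danceability": 0.9},
    {"name": "slow", "tempo": 70.0, "danceability": 0.3},
    {"name": "mid", "tempo": 120.0, "danceability": 0.6},
]
now_playing = {"name": "calm", "tempo": 65.0, "danceability": 0.25}

playlist = [s["name"] for s in order_for_smooth_transitions(now_playing, songs)]
print(playlist)  # ['slow', 'mid', 'fast']
```

A greedy ordering keeps each individual jump small, but it does not minimise the total change across the whole playlist; a predicted mood could be added as a target that biases which song is picked first.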

A Little More About Me

On top of the really exciting and nerdy stuff that I do at work, I am also a competitive multisport (duathlons and triathlons) athlete. Here are some pictures of me in action:

[Image]

Linus Foo