Project Description

At nunu.ai, we build AI agents that can operate apps, games, and software like a human. A core challenge in this space is enabling agents to reliably understand and act in structured environments with minimal supervision.

This project focuses on fine-tuning or adapting Vision-Language Models (VLMs) to efficiently play grid-based mobile games, with a particular emphasis on match-3 and merge-2 mechanics.

These games are deceptively complex: they require spatial reasoning, pattern recognition, planning multiple moves ahead, and adapting to stochastic outcomes. The goal of this project is to take VLMs beyond generic perception and turn them into efficient, decision-capable agents for constrained environments.

You will explore approaches such as:

The end goal is to create a system (or framework) that can play these games consistently and efficiently at a high level and that can serve as a foundation for broader agent capabilities.

This is a combined research and applied engineering project, with room to shape its direction depending on your strengths.

🎯 Responsibilities

📋 Requirements

We hire smart and passionate people who are ready to learn fast. None of these requirements are hard constraints if you’re exceptional: