Experience

Interests

Machine Learning • Generative AI • Deep Learning • Large Language Models • Generative Music • Music Information Retrieval

Projects

I have over a decade’s experience in AI, Machine Learning and Signal Processing, and their application to music analysis and generation, video & audio analytics, and gaming, among other areas. Here I describe my notable past and ongoing projects …

Large Language Models

Unity Muse is an LLM-powered AI assistant for Unity developers. It is available via a browser-based chat interface, where it can respond to queries with advice on how to achieve high-level goals, code that solves low-level tasks, and recommendations of relevant Unity products and tools. More importantly, the same chat interface is also available within the Unity Editor, where awareness of the state of a project and tight integration with the Editor’s capabilities allow Muse not only to tailor its responses to the given project, but also to create and perform (possibly complex and repetitive) project-related tasks on behalf of the developer.

As an engineer on this project since its inception, I have contributed, among other things, to Muse’s ingestion pipeline, making it aware of various useful sources of information, and to its tool-use capabilities, enabling it to invoke external tools and Unity products that can improve its response to a user’s query.
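
To make the tool-use idea concrete, here is a minimal sketch of the general pattern (with a hypothetical tool and made-up names; this is not Unity’s actual API): the model emits a structured tool call, the host executes it, and the result is fed back into the conversation.

```python
import json

def search_docs(query: str) -> str:
    """Hypothetical external tool the assistant can invoke."""
    return f"Top documentation hits for {query!r} ..."

TOOLS = {"search_docs": search_docs}

def handle_assistant_message(message: str) -> str:
    """Run the named tool if the model's reply is a structured tool call."""
    try:
        call = json.loads(message)
    except json.JSONDecodeError:
        return message                          # plain-text answer, no tool needed
    if not isinstance(call, dict) or call.get("name") not in TOOLS:
        return message
    result = TOOLS[call["name"]](**call["arguments"])
    return result                               # fed back into the chat context

print(handle_assistant_message(
    '{"name": "search_docs", "arguments": {"query": "NavMesh baking"}}'))
```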

Gameplay Simulation for Balancing Game Economies

As part of my first project at Unity Labs, I developed a problem-solving agent for simulating different types of player behaviour within a game economy. A game economy is the system that governs how resources (such as coins, chests, levels and so on) are awarded and consumed within a game. Designing a game economy that is both engaging and reasonable is difficult, and game designers tend to rely on spreadsheets to define (static) models of economies and player behaviour. Given the model (an implementation of the components and dynamics) of a game economy, I used a parametrised problem-solving agent to simulate different types of goal-oriented behaviour (maximise or minimise resource expenditure, complete the game as quickly as possible, etc.) within the confines of this model, and in the process generated plots and animations of resource creation and consumption over the course of gameplay, which were meant to help game designers better understand their economies.
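
As a toy illustration of the idea (not the actual Unity Labs code), a greedy agent parametrised by a goal function can be run against a tiny two-resource economy model:

```python
def simulate(goal, steps=10):
    """Greedy agent: at each step take the affordable action that best serves `goal`."""
    state = {"coins": 0, "level": 0}
    actions = {
        "play_match":   {"coins": +10, "level": 0},   # earn coins
        "buy_level_up": {"coins": -25, "level": +1},  # spend coins to progress
    }
    history = []
    for _ in range(steps):
        affordable = [a for a in actions if state["coins"] + actions[a]["coins"] >= 0]
        best = max(affordable, key=lambda a: goal(state, actions[a]))
        for resource, delta in actions[best].items():
            state[resource] += delta
        history.append((best, dict(state)))
    return history

# e.g. a "finish fast" player values levelling up far above hoarding coins
for step in simulate(lambda state, delta: 100 * delta["level"] + delta["coins"]):
    print(step)
```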

The application of the problem-solving agent naturally extended to a Reinforcement Learning agent. I did several weeks of self-study, and even completed a related course on Coursera, in anticipation of this, but the project was unfortunately discontinued for reasons beyond my control.

Music Recommendation

After completing the very well-designed Coursera specialisation on Recommender Systems, I had the opportunity to apply what I had learned straight away in my job at Moodagent. There I focused, in particular, on Collaborative Filtering algorithms that recommend music to users based on the similarities between their interactions with items (songs, playlists, artists, albums, etc.) and those of other users. It was a great opportunity to apply my machine learning experience in a new domain, work with exciting new models, and focus more than I had in the past on the deployment of ML algorithms and their scalability to massive datasets.
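
For a flavour of how interaction-based recommendation works, here is a minimal item-based collaborative filtering sketch in numpy (illustrative only; the production system was far more involved):

```python
import numpy as np

# rows = users, columns = songs; 1 means the user interacted with the song
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
np.fill_diagonal(sim, 0.0)                 # an item should not recommend itself

user = 0
scores = R[user] @ sim                     # similarity of each song to the user's items
scores[R[user] > 0] = -np.inf              # mask songs the user already consumed
print("recommend song:", int(np.argmax(scores)))
```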

Artificial Music Intelligence

Having joined Jukedeck in January 2016, shortly after it was launched, I spent over three years there developing new and effective machine learning models (a host of deep recurrent neural networks) that made up the core of our ingenious AI music composer. This was the first commercially available AI composer that could create full, well-structured pieces of music in a style and mood requested by the user. My key responsibilities included reviewing state-of-the-art machine learning models for music generation, implementing and engineering such models while also developing new models based on them, carrying out comparative evaluations of the quality of music generated by various models, developing and maintaining the code used for these purposes, and communicating my work in the form of reports and technical publications.
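
As a rough illustration of the model family involved (this is not Jukedeck’s code, and the real systems were considerably more elaborate), an LSTM can be trained to predict the next note symbol and then sampled autoregressively:

```python
import torch
import torch.nn as nn

VOCAB = 128  # e.g. MIDI pitch numbers

class MelodyLSTM(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 64)
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, VOCAB)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state              # logits over the next note

@torch.no_grad()
def sample(model, seed, length=32, temperature=1.0):
    notes, state = list(seed), None
    x = torch.tensor([seed])
    for _ in range(length):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        nxt = torch.multinomial(probs, 1).item()
        notes.append(nxt)
        x = torch.tensor([[nxt]])
    return notes

print(sample(MelodyLSTM(), seed=[60, 62, 64]))  # untrained model: random notes
```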

Neural Probabilistic Data Modelling (PhD)

While I was a doctoral student at City, University of London, between September 2012 and July 2016, I researched and developed neural probabilistic models for different types of data. This work began with an experiment in modelling sequences of musical pitch in melodies with neural networks and a class of connectionist models known as Restricted Boltzmann Machines (RBMs). I demonstrated the efficacy of these models in modelling musical melody in comparison to state-of-the-art n-gram models. Together with my fellow PhD student Son N. Tran, I extended this experiment to other connectionist models, and proposed a new addition to the RBM family of models known as the Recurrent Temporal Discriminative Restricted Boltzmann Machine (RTDRBM). We also proposed a theoretical generalisation and extensions of the Discriminative RBM. These new models were benchmarked on standard machine learning tasks such as handwritten digit recognition, part-of-speech tagging and optical character recognition.
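
For context, here is a compact sketch of the building block behind these models: training a binary RBM with one step of contrastive divergence (CD-1). This is illustrative numpy, not the thesis code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 6, 4, 0.1
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    """One CD-1 step: sample hidden units, reconstruct, update weights."""
    global W, b_v, b_h
    ph0 = sigmoid(v0 @ W + b_h)                 # P(h=1 | v0)
    h0 = (rng.random(n_hid) < ph0).astype(float)
    v1 = sigmoid(h0 @ W.T + b_v)                # mean-field reconstruction
    ph1 = sigmoid(v1 @ W + b_h)
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b_v += lr * (v0 - v1)
    b_h += lr * (ph0 - ph1)

data = rng.integers(0, 2, size=(100, n_vis)).astype(float)
for epoch in range(5):
    for v in data:
        cd1_update(v)
```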

Digital Waveguide Modelling

In this work, done during my time at Simon Fraser University, we presented an extension to an existing measurement technique used to estimate the reflection and transmission functions of musical instrument bells within the context of parametric waveguide models. For certain wind instruments the bell is not easily separated from the bore for an isolated measurement; we presented an alternative post-processing technique that overcame this difficulty. The result was a measurement of the saxophone’s round-trip reflection function, from which its transfer function, or its inverse transform, the impulse response, may be constructed.
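
The final step rests on a standard signal-processing fact: the frequency-domain transfer function and the time-domain impulse response are a Fourier-transform pair, so measuring one yields the other. A small numpy illustration, using a synthetic stand-in for the measured reflection:

```python
import numpy as np

fs = 44100
t = np.arange(2048) / fs
# stand-in for a measured round-trip reflection: a delayed, damped, inverted echo
h = np.zeros_like(t)
h[400] = -0.6                       # reflection arriving after ~9 ms
h *= np.exp(-t * 200)               # losses along the bore

H = np.fft.rfft(h)                  # transfer function (frequency domain)
h_rec = np.fft.irfft(H, n=len(h))   # inverse FFT recovers the impulse response
print(np.allclose(h, h_rec))        # True
```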

Melodic Phrase Continuation (MSc)

For my Master’s thesis project at Universitat Pompeu Fabra, I developed a model for automatically generating melodies that are meaningful, interesting, and stylistically similar to a given example melody. The example melodic phrase was first segmented into its component notes using onset and pitch detection algorithms. The detected notes were then clustered by pitch and duration to assign a symbol to each note, and the note onsets were used in a metrical analysis of the underlying rhythm. The generation of new melodies from the resulting symbol sequences relied on variable-order Markov models.
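
The generation step can be sketched in a few lines (illustrative; the thesis implementation differed in its details): a variable-order Markov model that backs off from the longest matching context to shorter ones when sampling the next symbol.

```python
import random
from collections import Counter, defaultdict

def train(sequence, max_order=3):
    """Count next-symbol frequencies for every context up to max_order."""
    model = defaultdict(Counter)
    for order in range(1, max_order + 1):
        for i in range(len(sequence) - order):
            context = tuple(sequence[i:i + order])
            model[context][sequence[i + order]] += 1
    return model

def generate(model, seed, length=16, max_order=3):
    out = list(seed)
    while len(out) < length:
        for order in range(max_order, 0, -1):       # back off to shorter contexts
            context = tuple(out[-order:])
            if context in model:
                counts = model[context]
                out.append(random.choices(list(counts),
                                          weights=list(counts.values()))[0])
                break
        else:
            out.append(random.choice(out))          # no context matched at all
    return out

melody = ["C", "D", "E", "C", "D", "G", "E", "C"]   # note symbols after clustering
print(generate(train(melody), seed=["C", "D"]))
```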

Failure Prediction in Industrial Machinery

During a brief three-month internship with my Master’s thesis supervisor Hendrik Purwins, I assisted in a joint project between UPF and PMC Technologies in which, given a set of sensor measurements obtained from factory equipment at various points in time, a Support Vector Machine predicted when the manufacturing process was likely to fail due to a fault in one of its stages.
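
A minimal sketch of this kind of setup with scikit-learn (the data and features here are made up for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_ok = rng.normal(0.0, 1.0, size=(200, 8))    # 8 sensor channels, healthy operation
X_bad = rng.normal(1.5, 1.2, size=(40, 8))    # drifted readings preceding failure
X = np.vstack([X_ok, X_bad])
y = np.array([0] * 200 + [1] * 40)            # 1 = failure imminent

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
clf.fit(X, y)
print(clf.predict(rng.normal(1.5, 1.2, size=(1, 8))))  # likely [1]
```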

Environmental Audio Classification

This was the second major project I was involved in while working at Siemens – Corporate Technology. The goal was to recognise various audio events in an audio stream, typically from a camera deployed in a place of interest. MFCC features were used to describe the various environmental sounds, and events were explicitly modelled in two categories (non-stationary and quasi-stationary) depending on their nature. A one-pass Dynamic Programming decoding framework was used to classify the modelled sounds. This work was published at both INTERSPEECH and ICASSP.
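
The feature extraction step is easy to sketch with librosa (which is not the toolchain used at Siemens; this is purely illustrative):

```python
import librosa

sr = 22050
y = librosa.chirp(fmin=200, fmax=2000, sr=sr, duration=2.0)  # stand-in audio signal
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # shape: (13, n_frames)

# quasi-stationary events can be summarised by frame statistics ...
event_mean, event_std = mfcc.mean(axis=1), mfcc.std(axis=1)
# ... while non-stationary events keep the full frame sequence for the DP decoder
print(mfcc.shape, event_mean.shape)
```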

Human Action Recognition in Video

I began working on this project soon after I landed my first job at Siemens – Corporate Technology. We developed a system for recognising human activity in video, with potential applications to surveillance and assisted living. We first built an isolated action recognition system that recognises actions in a video when the action boundaries are known, and then enhanced it into a continuous action recognition system that assumes no knowledge of individual action boundaries. Spatio-temporal features derived from the moving silhouette of a person were used to represent actions, and a one-pass Dynamic Programming approach using DTW, combined with an Average-template with Multiple Features representation, was used to classify them.
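
The distance at the heart of the decoder is easy to sketch (illustrative; the real system used engineered silhouette features rather than these toy vectors):

```python
import numpy as np

def dtw(a, b):
    """Dynamic Time Warping distance between two feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

template = np.array([[0.0], [1.0], [2.0], [1.0]])     # e.g. silhouette features
observed = np.array([[0.1], [0.9], [1.1], [2.1], [0.8]])
print(dtw(observed, template))
```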

Video Stitching

This project, also carried out during my time at Siemens – Corporate Technology, involved the development of an image stitching system, which was then extended to operate on successive frames of a video (either from a file or a camera stream).
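
With today’s OpenCV, the core of such a system can be sketched in a few lines using the high-level Stitcher API (which post-dates the original project; the file name below is hypothetical):

```python
import cv2

cap = cv2.VideoCapture("pan.mp4")            # hypothetical input video
frames, i = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if i % 15 == 0:                          # sample every 15th frame
        frames.append(frame)
    i += 1
cap.release()

stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, pano = stitcher.stitch(frames)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", pano)
```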

Traffic Violation Detection System

This was my undergraduate project at the International Institute of Information Technology – Hyderabad (IIIT-H). The objective was to detect common traffic-rule violations (wrong-side driving, speeding, etc.) occurring at a junction near the IIIT-H main gate. The proposed system employed two cameras working simultaneously: the motion of vehicles on the road was analysed using background subtraction and optical flow from the first, wide field-of-view camera, while a picture of the offender’s license plate was obtained with the second. The poster presentation for this project received a prize at the Bachelor Project Showcase Day at IIIT-H.
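
A sketch of the motion-analysis stage with modern OpenCV calls (illustrative; the file name and threshold are hypothetical):

```python
import cv2

cap = cv2.VideoCapture("junction.mp4")       # hypothetical wide-FOV camera feed
bg = cv2.createBackgroundSubtractorMOG2()
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = bg.apply(frame)                   # foreground mask = moving vehicles
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # mean horizontal motion of foreground pixels: the sign encodes direction
    fg_flow = flow[..., 0][mask > 0]
    if fg_flow.size and fg_flow.mean() < -2.0:   # illustrative threshold
        print("possible wrong-side movement")
    prev_gray = gray
cap.release()
```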