Research Interests

• Machine Learning • Neural Networks • Signal Processing • Music Information Retrieval • AI for Creativity & Education • Technology for Well-being


Previous Projects

AI-Driven Music Generation
Affiliation: Jukedeck Ltd.

This project involves the development of new and effective machine learning models that make up the core of an ingenious AI-driven music composition system that creates full, well-structured pieces of music in user-specified styles and moods. My key roles in this project include reviewing state-of-the-art machine learning models that can be employed for music generation, implementing and engineering such models while also developing new models based on them, carrying out comparative evaluations of the quality of music generated by these models, and developing and maintaining the code used for these purposes.


Neural Probabilistic Models for Melody Prediction,
Sequence Labelling and Classification (PhD Thesis)

Affiliation: City University London

This project began with an experiment to model sequences of musical pitch in melodies with a class of purely data-driven connectionist models. It was demonstrated that a set of six such models could perform on par with, or better than, state-of-the-art n-gram models previously evaluated in an identical setting. A new model known as the Recurrent Temporal Discriminative Restricted Boltzmann Machine (RTDRBM) was introduced in the process and found to outperform the rest of the models.


Based on the above success of the RTDRBM, its application was extended to a non-musical sequence labelling task (Optical Character Recognition) by modifying the prediction algorithm used in the melody modelling task. The generalised model outperformed a set of eight baseline models. Furthermore, a theoretical extension to an existing model which was also employed in the above pitch prediction task - the Discriminative Restricted Boltzmann Machine (DRBM) - was proposed, leading to three new variants of the DRBM. The first two of these were evaluated on the benchmark MNIST dataset and shown to perform on par with the original DRBM.


Digital Waveguide Model of the Tenor Saxophone
Affiliation: Simon Fraser University

This work presented an extension to an existing measurement technique used to estimate the reflection and transmission functions of musical instrument bells within the context of parametric waveguide models. We presented an alternative signal post-processing technique that overcame the difficulties stemming from the fact that, for certain wind instruments, the bell is not easily separated from the bore for an isolated measurement. The result was a measurement of the saxophone's round-trip reflection function from which its transfer function, or its inverse, the impulse response, may be constructed.


Automatic Phrase Continuation from Audio Melodies (Master Thesis)
Affiliation: Universitat Pompeu Fabra

The goal of the project was to generate melodies that are stylistically similar to a given example melody while also being meaningful and interesting. An example melodic phrase is first segmented into its component notes using onset and pitch detection algorithms. The detected notes are then clustered by pitch, and the note onsets are used in a metrical analysis to estimate the underlying rhythm. The prediction framework relies on the Variable-length Markov Chain model for melody generation.
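
The idea behind variable-length Markov prediction can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (train_counts, predict_next), the use of MIDI pitch numbers and the simple "longest known context" back-off are all assumptions made for illustration, and a full Variable-length Markov Chain would also prune contexts and smooth probabilities.

```python
from collections import Counter, defaultdict


def train_counts(pitches, max_order=3):
    """Count which pitch follows every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, nxt in enumerate(pitches):
        for order in range(max_order + 1):
            if i - order < 0:
                break  # not enough history for this context length
            context = tuple(pitches[i - order:i])
            counts[context][nxt] += 1
    return counts


def predict_next(counts, history, max_order=3):
    """Return the most frequent continuation of the longest known context."""
    for order in range(min(max_order, len(history)), -1, -1):
        # Slice from an explicit start index so that order 0 gives ().
        context = tuple(history[len(history) - order:])
        if context in counts:
            return counts[context].most_common(1)[0][0]
    raise ValueError("model has no training data")


# Hypothetical example phrase, written as MIDI pitch numbers.
phrase = [60, 62, 64, 60, 62, 64, 60, 62, 65]
model = train_counts(phrase)
```

Calling predict_next(model, [60, 62]) uses the full two-note context, whereas predict_next(model, [99, 64]) falls back to the one-note context (64,) because the longer context never occurred in the example phrase.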


Classification of Environmental Audio
Affiliation: Siemens - Corporate Technology

In this project, the task was to recognize various audio events in an audio stream, typically from a camera deployed in a place of interest. MFCC features were used to describe the environmental sounds. Events were explicitly modeled in two categories (non-stationary and quasi-stationary) depending on their nature. A One-pass Dynamic Programming based decoding framework was used to classify the modeled sounds.


Human Action Recognition from Video
Affiliation: Siemens - Corporate Technology

In this project, an isolated action recognition system, which recognizes actions in a video when the action boundaries are known, was developed first. It was then enhanced into a continuous action recognition system that assumes no knowledge of individual action boundaries in the video. Spatio-temporal features derived from the moving silhouette of a person were used to represent actions. A One-pass Dynamic Programming approach using DTW, combined with the Average-template with Multiple Features representation, was used to classify actions.
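
As a rough illustration of the isolated-recognition case, the sketch below classifies a sequence by its DTW distance to one template per action. It is a simplification under stated assumptions: each frame is reduced to a single hypothetical scalar feature rather than the silhouette-based feature vectors used in the project, the names dtw_distance and classify are invented here, and the one-pass extension to continuous recognition is not shown.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two scalar sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its frame
                cost[i][j - 1],      # b advances, a repeats its frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(sequence, templates):
    """Return the label of the template closest to the sequence under DTW."""
    return min(templates, key=lambda label: dtw_distance(sequence, templates[label]))


# Hypothetical one-dimensional templates, one per action.
templates = {"walk": [1, 2, 3, 2, 1], "jump": [1, 5, 9, 5, 1]}
label = classify([1, 2, 2, 3, 3, 2, 1], templates)
```

Because DTW allows frames to be repeated during alignment, the slower test sequence still matches the "walk" template at zero cost.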


Real-time Camera Based Traffic Violation Detection System (Undergraduate Project)
Affiliation: International Institute of Information Technology - Hyderabad

The objective of this project was to detect common traffic-rule violations (wrong-side driving, speeding, etc.) that occurred at a junction near the IIIT-H main gate. The proposed system employed two cameras working simultaneously. The first, wide field-of-view camera was used to analyze the motion of vehicles on the road using background subtraction and optical flow. The second camera captured a picture of the offender's license plate. The poster presentation for this project received a prize at the Bachelor Project Showcase Day at IIIT-H.
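
The background subtraction step can be sketched as follows. This is a generic running-average scheme, not necessarily the one used in the project; the function names, the learning rate alpha and the threshold are assumptions, and frames are represented as small nested lists of grayscale values to keep the example self-contained.

```python
def update_background(background, frame, alpha=0.1):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Hypothetical 2x2 grayscale frames: one bright pixel appears.
background = [[10, 10], [10, 10]]
frame = [[10, 200], [10, 10]]
mask = foreground_mask(background, frame)
updated = update_background(background, frame, alpha=0.5)
```

Pixels flagged in the mask would then be grouped into moving regions, whose direction and speed could be estimated with optical flow.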


Video Stitching
Affiliation: Siemens - Corporate Technology

This project involved the development of an image stitching system, which was extended to work on successive frames of a video (either from a file or a camera stream).


Srikanth Cherla