Interests
Machine Learning • Deep Learning • Signal Processing • Music Information Retrieval • AI for Creativity & Education • Recommender Systems
Experience
I have over a decade of experience in Machine Learning and Signal Processing with applications in the domains of video, audio, manufacturing and, more recently, music. Here I describe my notable past and ongoing projects …
Music Recommendation
After completing the very well-designed Coursera specialisation on Recommender Systems, I had the opportunity to apply what I learned straight away in my current job at Moodagent. Here I focus in particular on Collaborative Filtering algorithms that recommend music to users based on the similarities between their interactions with items (songs, playlists, artists, albums, etc.) and those of other users. It has so far been a great opportunity to apply my machine learning experience in a new domain, work with exciting new models, and focus more than I did in the past on the deployment of ML algorithms and their scalability to massive datasets.
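As a toy illustration of the idea (a minimal sketch with made-up data, not Moodagent's production system), item-item collaborative filtering can be expressed as scoring unseen songs by their similarity to the songs a user has already interacted with:

```python
# Minimal item-item collaborative filtering sketch (illustrative only):
# recommend songs from cosine similarity between the columns of a
# user-item interaction matrix.
import numpy as np

# Hypothetical implicit-feedback matrix: rows are users, columns are songs,
# entries are play counts.
interactions = np.array([
    [4, 0, 1, 0],
    [5, 1, 0, 0],
    [0, 3, 0, 4],
    [0, 4, 1, 5],
], dtype=float)

# Cosine similarity between items (columns).
norms = np.linalg.norm(interactions, axis=0, keepdims=True)
item_sim = (interactions.T @ interactions) / (norms.T @ norms + 1e-9)

def recommend(user_idx, k=2):
    """Score unseen items by a similarity-weighted sum of the user's history."""
    history = interactions[user_idx]
    scores = item_sim @ history
    scores[history > 0] = -np.inf      # do not re-recommend seen items
    return np.argsort(scores)[::-1][:k]

print(recommend(0))
```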
Artificial Music Intelligence
Having joined Jukedeck in January 2016 shortly after it was launched, I spent over three years there developing new and effective machine learning models (a host of deep recurrent neural networks) that made up the core of our AI music composer. This was the first commercially available AI composer that could create full, well-structured pieces of music in a style and mood requested by the user. My key responsibilities included reviewing state-of-the-art machine learning models for music generation, implementing and engineering such models while also developing new models based on them, carrying out comparative evaluations of the quality of music generated by various models, developing and maintaining the code used for these purposes, and communicating my work in the form of reports and technical publications.
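To give a flavour of the kind of model involved (a minimal sketch under my own assumptions, not Jukedeck's actual architecture), a recurrent music-generation model at its simplest predicts the next musical event from the events that precede it:

```python
# Illustrative sketch: an LSTM that predicts the next musical event from the
# preceding sequence. Hyperparameters and the event vocabulary are assumed.
import torch
import torch.nn as nn

class MelodyLSTM(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.out(h)              # logits over the next event

# Toy usage: a batch of 2 sequences of 16 events (e.g. quantised pitches).
model = MelodyLSTM()
logits = model(torch.randint(0, 128, (2, 16)))
print(logits.shape)                     # torch.Size([2, 16, 128])
```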
Neural Probabilistic Data Modelling (PhD)
While I was a doctoral student at City, University of London between September 2012 and July 2016, I spent my time researching and developing neural probabilistic models for modelling different types of data. This work began with an experiment to model sequences of musical pitch in melodies with neural networks and a class of connectionist models known as Restricted Boltzmann Machines (RBMs). I demonstrated the efficacy of these models in comparison to state-of-the-art n-gram models in modelling musical melody. Together with my fellow PhD student Son N. Tran, I extended this experiment to include other connectionist models, and proposed a new addition to the RBM family of models known as the Recurrent Temporal Discriminative Restricted Boltzmann Machine (RTDRBM). We also proposed a theoretical generalisation and extensions of the Discriminative RBM. These new models were benchmarked on standard machine learning tasks such as Handwritten Digit Recognition, Part-of-Speech Tagging and Optical Character Recognition.
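For readers unfamiliar with RBMs, the sketch below shows the basic training loop with one-step contrastive divergence (CD-1). It is illustrative only and far simpler than the discriminative and recurrent variants developed in the thesis; all sizes and the learning rate are arbitrary assumptions.

```python
# A minimal numpy RBM trained with one-step contrastive divergence (CD-1).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 4, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0):
    """One CD-1 update on a single binary visible vector v0."""
    global W, b_v, b_h
    p_h0 = sigmoid(v0 @ W + b_h)                     # hidden activation probs
    h0 = (rng.random(n_hidden) < p_h0).astype(float) # sample hidden states
    p_v1 = sigmoid(h0 @ W.T + b_v)                   # reconstruction
    p_h1 = sigmoid(p_v1 @ W + b_h)
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v += lr * (v0 - p_v1)
    b_h += lr * (p_h0 - p_h1)

for _ in range(100):
    cd1_step(rng.integers(0, 2, n_visible).astype(float))
```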
Digital Waveguide Modelling
In this work, done during my time at Simon Fraser University, we presented an extension to an existing measurement technique used to estimate the reflection and transmission functions of musical instrument bells within the context of parametric waveguide models. We presented an alternative signal post-processing technique that overcame the difficulty that, for certain wind instruments, the bell is not easily separated from the bore for an isolated measurement. The result was a measurement of the saxophone's round-trip reflection function, from which its transfer function, or its inverse, the impulse response, may be constructed.
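For context, the sketch below shows the kind of parametric waveguide such a reflection function feeds into: a delay-line bore terminated by a simple one-pole reflection filter. The filter coefficient, delay length and termination behaviour are assumptions for illustration, not the measured saxophone data.

```python
# Toy digital waveguide: a delay-line bore with a one-pole reflection filter
# at the termination, excited by an impulse.
import numpy as np

fs = 44100
bore_delay = 100                     # round-trip delay in samples (assumed)
a = 0.95                             # reflection filter coefficient (assumed)

delay_line = np.zeros(bore_delay)
y_prev = 0.0
out = np.zeros(fs // 10)

delay_line[0] = 1.0                  # impulse excitation
for n in range(len(out)):
    x = delay_line[n % bore_delay]
    # One-pole low-pass reflection: y[n] = (1 - a) * x[n] + a * y[n-1],
    # negated because the open end inverts the returning pressure wave.
    y = (1.0 - a) * x + a * y_prev
    y_prev = y
    delay_line[n % bore_delay] = -y
    out[n] = x
```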
Melodic Phrase Continuation (MSc)
For my Master's thesis project at Universitat Pompeu Fabra, I developed a model for automatically generating melodies that are stylistically similar to a given example melody while remaining musically meaningful and interesting. An example melodic phrase was first segmented into its component notes using onset and pitch detection algorithms. The detected notes were clustered based on pitch and duration to generate a symbol for each note, and note onsets were used in a metrical analysis of the underlying rhythm. The generation of new melodies from the resulting sequences of symbols relied on Variable-order Markov Models.
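A minimal sketch of the variable-order Markov idea is shown below (illustrative only; the thesis model additionally relied on the pitch/duration clustering and metrical analysis described above, and the note symbols here are made up):

```python
# Variable-order Markov generation over note symbols with back-off to
# shorter contexts when a longer one has not been observed.
import random
from collections import defaultdict, Counter

def train(symbols, max_order=3):
    """Count continuations for every context up to max_order."""
    model = defaultdict(Counter)
    for order in range(1, max_order + 1):
        for i in range(len(symbols) - order):
            model[tuple(symbols[i:i + order])][symbols[i + order]] += 1
    return model

def generate(model, seed, length=20, max_order=3):
    """Back off from the longest matching context to shorter ones."""
    out = list(seed)
    while len(out) < length:
        for order in range(max_order, 0, -1):
            context = tuple(out[-order:])
            if context in model:
                choices = model[context]
                out.append(random.choices(list(choices),
                                          weights=choices.values())[0])
                break
        else:
            out.append(random.choice(out))   # unmatched context: fall back
    return out

melody = ["C4", "E4", "G4", "E4", "C4", "D4", "E4", "G4", "C5", "G4", "E4", "C4"]
print(generate(train(melody), seed=melody[:2]))
```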
Failure Prediction in Industrial Machinery
During a brief three-month internship with my Master's thesis supervisor Hendrik Purwins, I assisted in a joint project between UPF and PMC Technologies in which, given a set of sensor measurements obtained from factory equipment at various points in time, a Support Vector Machine predicted when the manufacturing process was likely to fail due to a fault in one of its stages.
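The basic setup can be sketched as follows (synthetic data and feature names assumed for illustration, not PMC Technologies' measurements):

```python
# Classify windows of sensor readings as leading to failure or not with an SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical features: summary statistics of sensor channels per time window.
X = rng.standard_normal((500, 8))
y = (X[:, 0] + 0.5 * X[:, 3] + 0.3 * rng.standard_normal(500) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```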
Environmental Audio Classification
This was the second major project I was involved in while working at Siemens – Corporate Technology. The goal was to recognise various audio events in an audio stream, typically from a camera deployed in a place of interest. MFCC features were used to describe the environmental sounds, and events were explicitly modelled in two categories (non-stationary and quasi-stationary) depending on their nature. A one-pass Dynamic Programming decoding framework was used to classify the modelled sounds. This work was published at both INTERSPEECH and ICASSP.
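The feature-extraction step can be illustrated as below. This is a hedged sketch using librosa (which post-dates the original system); the file name and window length are assumptions.

```python
# Extract MFCCs from an audio recording and pool them into fixed-length
# per-second descriptors, the kind of representation an event classifier
# can be trained on.
import numpy as np
import librosa

y, sr = librosa.load("street_scene.wav", sr=16000)   # hypothetical recording
hop = 512
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)

frames_per_window = int(sr / hop)                     # roughly one second
windows = [mfcc[:, i:i + frames_per_window]
           for i in range(0, mfcc.shape[1], frames_per_window)]
features = np.array([np.concatenate([w.mean(axis=1), w.std(axis=1)])
                     for w in windows if w.shape[1] > 1])
print(features.shape)                                 # (n_windows, 26)
```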
Human Action Recognition in Video
I began working on this project soon after I landed my first job at Siemens – Corporate Technology. We developed a system for recognising human activity in video, with potential applications to surveillance and assisted living. First, an isolated action recognition system was developed, which recognises actions in a video when the action boundaries are known. This was then extended into a continuous action recognition system that assumes no knowledge of individual action boundaries in the video. Spatio-temporal features derived from the moving silhouette of a person were used to represent actions, and a one-pass Dynamic Programming approach using DTW, combined with the Average-template with Multiple Features representation, was used to classify them.
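As a simplified illustration of the matching step (a sketch only; the actual system used a one-pass decoder over continuous video, and the feature dimensions here are made up), DTW can compare a test feature sequence against per-action templates:

```python
# Classify a clip by DTW distance to per-action average templates.
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between two sequences of feature vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
templates = {"walk": rng.standard_normal((30, 10)),   # hypothetical average
             "sit": rng.standard_normal((20, 10))}    # silhouette features
test_clip = rng.standard_normal((25, 10))
print(min(templates, key=lambda k: dtw_distance(test_clip, templates[k])))
```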
Video Stitching
This project, also carried out during my time at Siemens – Corporate Technology, involved the development of an image stitching system which was extended to successive frames of a video (either from a file or a camera stream).
Traffic Violation Detection System
This was my undergraduate project at the International Institute of Information Technology – Hyderabad. The objective was to detect common traffic-rule violations (wrong-side driving, speeding, etc.) occurring at a junction near the IIIT-H main gate. The proposed system employed two cameras working simultaneously: the motion of vehicles on the road was analysed using background subtraction and optical flow on footage from the first, wide field-of-view camera, while a picture of the offender's license plate was obtained with the second camera. The poster presentation for this project received a prize at the Bachelor Project Showcase Day at IIIT-H.
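The motion-analysis step can be sketched as below (OpenCV usage and the video file name are assumptions for illustration, not the original implementation):

```python
# Background subtraction to isolate moving vehicles and dense optical flow
# to estimate their direction and speed in the camera's field of view.
import cv2

cap = cv2.VideoCapture("junction.avi")                # hypothetical footage
bg = cv2.createBackgroundSubtractorMOG2()
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = bg.apply(frame)                            # moving-vehicle mask
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # flow[..., 0] and flow[..., 1] give per-pixel motion; its sign within a
    # lane's region of interest indicates wrong-side driving, its magnitude speed.
    prev_gray = gray
cap.release()
```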