Reflections on Three Months of Remote Work

Beginnings…

It started with my wife Nina and me deciding that it would be good for us to move to Hyderabad for three months, from Oct 2, 2016 until Jan 8, 2017, for various reasons. As I was keen on continuing my work at Jukedeck, I proposed the idea of working remotely during this period to Patrick, the COO of the company. After some deliberation and another meeting with Ed (the CEO), much to my delight, the company decided to give it a try, on the condition that we would review the arrangement each month and be quick to act in case of any unexpected (negative) eventualities. I was very excited and at the same time anxious, as this was the first time I had ever worked remotely from home.

Preparation

Before leaving London, I had a quick meeting with my team lead Kevin, who was very supportive of the idea, and we discussed a few things while leaving others to be dealt with as and when needed. First, we decided that I would work from 11AM until 7PM IST instead of my usual hours of 9:30AM to 5:30PM BST, which (given the 4.5-hour time difference between Hyderabad and London) would give me five hours of overlap with my team and three hours during which I would be by myself. We also noted that when daylight saving time ended, clocks in the UK would go back an hour, stretching the time difference from 4.5 to 5.5 hours, and agreed to consider the option of me starting an hour later when that happened. We further agreed that I would update my team on my work every morning on the standup channel we have on Slack. For any brainstorming meetings, I would have the opportunity to propose ideas before the meeting, and again after going through the Google Doc containing its minutes once it finished. And I would stay in touch with my team through Slack and, whenever needed, Skype. We also discussed a few worst-case scenarios: if the arrangement did not work out, I would consider switching to part-time work or even a sabbatical until my return to London in January.

Setting Things Up

On arrival in India, my first task was to set up an office at home without delay. My parents, with whom we were living during these three months, let me use the guest room/study as my office. I had a reasonably quiet space with a desk big enough to work on. Although it went through a few changes as time went by, it essentially looked like this:

Home Office

With that, I was ready to go! In the rest of this post, I’ll write about some of the things that stand out in my memory, leaving aside others that were more mundane and easy to forget.

Participating in Standups Remotely

About two weeks into my move out of London, I started noticing that I was unable to keep up with what some of my teammates were working on. I realised this was because, while I was updating everyone on my work via Slack, the reverse was not happening. I brought this up with Kevin and we decided that the simplest thing would be for me to attend standups via Skype. We started with one of the team members holding a laptop during standups with a Skype session running, which turned out to be a bit cumbersome, on top of the poor audio/video quality. Switching to a mobile phone was less cumbersome but still didn’t help the quality. We then learned that it was possible to send video messages over Skype; while not real-time, these were certainly very clear and allowed me to go over the standup in my own time. So we settled on this.

I suppose the bright side of this arrangement was that standups were brief, concise and to the point. There is a tendency for standups to turn into discussions about something very specific, involving only some of the team members while the others wait without necessarily knowing what the conversation is about. This arrangement certainly avoided such situations, and a couple of my teammates even acknowledged this benefit to me after we adopted it.

Pair Programming Remotely

At one point I was assigned a task that required pair programming with my colleague, Marco. It was the first time either of us had taken part in remote pair programming. The first alternative we tried was the Atom editor plugin atom-pair. It worked; however, as this was around the time when my broadband connection was at its worst, the editor took several minutes to show the text Marco typed on my screen. It was bad. We then decided to switch to a more lightweight alternative using our trusty Jukedeck server, Ada. The setup was as follows. We both connected to Ada via SSH. Once we were in, I started a Tmux session and opened the Python source file in Vim. Marco switched users to become me (he had superuser privileges on the server, so I did not have to share my password with him) and attached himself to the same Tmux session from his end. Despite the lag, this worked like a charm! The setup came with the added benefit that we could open any number of shells through Tmux, and also keep the IPython interpreter running alongside our editor to test our changes. While doing this, we also had a Skype session open where we discussed things. We carried on like this for about five hours with hardly any interruptions and got quite a bit of work done.
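For anyone wanting to replicate this, the core of the setup comes down to a handful of commands. The hostname, user name and session name below are placeholders rather than our actual configuration – a sketch of the arrangement, not a prescription:

```shell
# -- Person A (the host), from their machine --
ssh ada.example.com                 # placeholder server address
tmux new-session -s pairing         # start a named tmux session
vim generator.py                    # open the file to pair on

# -- Person B (the guest), from theirs --
ssh ada.example.com
sudo -iu persona                    # switch to A's user (requires sudo rights)
tmux attach-session -t pairing      # join the same session; both users now
                                    # share one screen and keyboard

# Either side can open extra tmux windows (Ctrl-b c) for a shell or an
# IPython interpreter without disturbing the shared editor.
```

Because both parties attach to the same named session, every keystroke and window is mirrored instantly on the server side; only the terminal output travels over the network, which is why it tolerates a poor connection far better than a synchronised editor plugin.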

For a first attempt, I think this went very well. And it was a win for the very minimal, command-line approach to work that I am strongly in favour of. As an alternative to both participants using one user’s account, a dedicated pair-programming account can be created on the server, with access to all the relevant source repositories and to which multiple users have access. This would help if one or both of the users engaged in pair programming do not have superuser privileges.

Making Presentations Remotely

At Jukedeck, we have what we call Lunch & Learn (L & L) sessions, where a member of the team (or someone the company invites) gives a presentation on a topic that might benefit or interest others. I volunteered to do my first L & L session on “Machine Learning at Jukedeck” on Nov 1, 2016, where I planned to go over the basics of machine learning and how we employ it to power our AI music composer. The setup was fairly simple. We initiated a Skype conversation on my colleague Eliza’s laptop, and I emailed her a copy of my presentation so that she could navigate through it while I spoke from the other end. It went smoothly without any interruptions and the message seemed to get across quite well. I did answer a few questions too, but couldn’t follow a few others due to a poor signal.

I found this to be a nice way to stay in touch with everyone else in the company (apart from my team members, with whom I was liaising every day regarding work) and make my presence felt. I was keen on doing another L & L remotely; however, there was not enough time before my return to London.

Internet Issues

The only thing I wish had worked out better was my internet connection. Although we had a working 10 Mbps connection from BSNL (India’s national ISP), it was far from reliable. There were brief and frequent outages throughout the day on many days, which was frustrating when loading webpages, pulling code changes from GitHub or working via SSH on our company server. My only consolation was the patient and polite customer service, and the courteous technicians they sent out to fix the connection. Fortunately, the worst of my connection woes lasted only the first two weeks, after which things got better.

To add to my troubles, the IP address (of my home router) from which I connected to our server in London changed daily, and since we had IP-based access restrictions in place I had to share my new IP every morning with Marco, who would then allow me to connect from it. We did this for about two weeks before deciding to simply unblock the range of IPs from which I seemed to be connecting. In contrast, in the UK one’s public IP (say, at home) does not seem to change over time, which is what motivated this IP-based access restriction and made it possible in the first place. Now that I’m back, all those IPs are blocked again and things are more secure once more.
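On the server side, an allowlist like this can be expressed in a few firewall rules. The addresses below are taken from the documentation ranges and are purely illustrative, and iptables is just one of several ways to enforce it:

```shell
# Allow SSH from the office IP and from a /24 covering the rotating home
# addresses, then drop all other SSH traffic. Addresses are placeholders.
sudo iptables -A INPUT -p tcp --dport 22 -s 198.51.100.7   -j ACCEPT   # office
sudo iptables -A INPUT -p tcp --dport 22 -s 203.0.113.0/24 -j ACCEPT   # home range
sudo iptables -A INPUT -p tcp --dport 22 -j DROP
```

Widening a single-address rule to a /24 trades a little security for a lot of convenience: instead of editing the rules every morning, any address the ISP hands out within that block is accepted.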

Change in Working Hours

In the second month of my remote work, daylight saving time ended and I was one extra hour ahead of my team back in London. Kevin let me decide whether or not to change when I started my day. Initially I did, so that I would have the same number of overlapping hours with my team. After about two weeks, I found that this was not working out, mainly because I was almost completely losing the most productive part of my day – the morning hours before lunch. Plus, my day typically ended between 8 and 9PM, which nearly ruled out any prospect of making plans for the evening.

I decided to break away from the daily 11-to-7 routine altogether, and started even earlier in the morning on days when I didn’t anticipate much interaction with my team members. By this time, both Kevin and I could see things working well and had the confidence that moving things around a bit was a minor risk to take if there was a chance of me being more productive. And it certainly didn’t make things worse!

Change of Location

Around a month and a half after I first started working from my parents’ guest room, my cozy little home office in the study stopped feeling so cozy. It felt isolating, and I just didn’t look forward to going in there every weekday morning. Clearly, I needed a change of environment. I’d been reading some books about remote work around that time (more on these below), and they suggested trying out cafes or coworking spaces, which would be bustling with enough activity to alleviate the feeling of isolation and lead to a healthier state of mind.

For a start, I moved to the dining room. This helped, as it was a bigger space and I’d see people more often than I did in the study. A friend of mine also put me in touch with one of his friends (a senior of mine from IIIT-Hyderabad), who told me of one of his batchmates who had gone on to found his own data science startup, Predera, with an office in Hyderabad; he was more than happy to let me work from there if I wanted to. As the office was at least a 30-minute drive from where I was, I kept postponing my visit and ultimately never went, but it was certainly very generous of him to keep the offer open!

The Ups

What I found particularly nice about this setup was that there were very few distractions, so it was a joy to code and to review research literature and GitHub pull requests thoroughly. Furthermore, any work-related conversation I had with my colleagues was concise and to the point. I was in the quiet comfort of my home, and the nature of my work – mostly individual, with the occasional discussion with a colleague or two – was well suited to a remote setup. Personally, I didn’t find it hard to motivate myself to stick to a work schedule, which I would attribute to four years of learning to do so during my PhD. Thanks to the lack of distractions, I more often felt that I did justice to the work I took up. One can almost see this as an exercise in self-discipline.

I was also able to skip the roughly hour-long daily commute between my home and the office, which was a noticeable change. I wouldn’t really count it a positive one, as I cycle to work every day in London and rather enjoy it. However, I can imagine that for someone who drives or takes public transport to work, this might be a change they would welcome. In my case, I spent the hour I saved on exercise and running in a nearby park, so there was not much difference in what the time went towards.

I was also reminded on several occasions during the three months of what a fantastic team I was a part of at Jukedeck! All my team members were supportive of my move, patient and creative in dealing with any glitches that arose, and not once showed any signs of disapproval. Kevin was very good at assigning me tasks that were both challenging and that I could work on independently, with some discussion with others in the team. This really minimised delays, as well as the anxiety about not being able to contribute that could otherwise have ensued.

The Downs

While the experience was mostly positive, there were of course some downsides that became evident after just the first couple of weeks. As much as I made the effort to share my input before and after meetings, I felt a little less in control of the direction the meetings took, as this usually involved debate and persuasion, which are much easier in person. This effect would have been far less pronounced if we were a fully remote company, in which case our processes would work the same for everyone. However, this wasn’t the case, and while many things did work out, good communication was the biggest challenge among those that didn’t.

Physical absence from the office did feel isolating on a few occasions. I made up for this by engaging in the occasional friendly banter with colleagues over Slack and responding to their non-work-related posts, which helped me feel like I was still a part of what was happening in the office. It was also important to get out of the house a couple of times a week just for a change of environment; otherwise I felt locked in. I missed all the team outings and lunches, which were a very good opportunity to bond with my teammates at a personal level.

I did see myself falling behind on some new developments in the office, particularly those that came about during meetings I was unable to attend due to the time difference or to poor communication with the team in London during the meeting. At least in my case, Skype (and Slack video chat) did not work as well as I had hoped; I would put the success rate at around 40%.

Again, these downsides were not insurmountable, but I thought it only fair to mention them alongside the things that did work out. I’ll not speculate about how it might have turned out otherwise, but I was certainly happy to be back in person at the Jukedeck office in London after three months.

If You’re Interested in Remote Work Too…

There is an excellent Hacker News post that answers several questions related to remote work, and also contains some very handy links to websites that facilitate remote work and create opportunities for those seeking to make a career out of working remotely. Through it I came across two very well-written books on working remotely. The first is “Remote: Office Not Required” by the founders of 37signals (now Basecamp), a company that has seemingly mastered the art of effective remote work. The second is “The Ultimate Guide to Remote Work” by the folks at Zapier; you can even get a free PDF/MOBI copy of this book on their website. While some degree of (I should note, reasonable) self-marketing went into both books, they are well worth reading for anyone wanting insight into the pros and cons of remote work. Essentially, what all the resources gathered in this Ask HN page suggest is that, thanks to technology, we’re heading towards a world where remote work (at least in the tech sector) is becoming more and more feasible for those seeking a change from 9-to-5 office work. It certainly gave me something to relate to, tips to follow and a feeling of being part of a larger (though not large in an absolute sense) movement.

In Retrospect

As much as I had my apprehensions (as I often do with many things), I think this was a fantastic experience overall. I got to spend time with my parents after nearly two years of being away in the UK busy with my PhD, meet old friends, get married to my lovely wife Nina – whom I must thank for insisting on moving to India for three months – and, last but obviously not least, be part of a work arrangement that was something new and unique in my experience. It got me interested in, and researching, making a career of working remotely, which is something I believe I’m likely to follow up on at some point later in my career.

Merry Christmas!

My lovely wife Nina and I recorded a little video where we play a cover of the song “Have Yourself a Merry Little Christmas” by Hugh Martin and Ralph Blane, to wish all our loved ones a merry Christmas.

So here’s wishing everyone a merry Christmas and a very happy New Year from the both of us!

Let’s Encrypt for Free!

This is an account of how I went from no encryption, to almost buying an SSL certificate, to finally creating and installing a free one on my domain. It started with me setting up an ownCloud server on my hosting account to access and sync my data in the cloud, after going from Dropbox to Copy to Mega and finally to pCloud over a span of five or so years.

Why ownCloud?

Mainly because I have a shared hosting account with Arvixe (an excellent hosting service) with unlimited data storage, and I had been curious how much effort it would take to set up my own cloud storage ever since hearing about ownCloud a couple of months earlier. It turns out that it wasn’t much effort after all. I simply contacted the support team at Arvixe, who made the ownCloud app available on my cPanel, and then it was just a matter of filling in a simple online form with a few details such as where to store the data, which address to access the ownCloud web interface on, and so on. The ownCloud project is fantastic! From what I’ve seen, it has most (if not all) of the features that a company like Dropbox or Mega has to offer. It took me 15 minutes to set things up, install the (Linux) client and sync my cloud storage (a folder on my hosting account) with a local folder.
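As an aside, besides the graphical client, ownCloud also ships a command-line tool for one-shot synchronisation. A minimal sketch, assuming the client package is installed – the server URL, user name and paths are all placeholders:

```shell
# Sync a local folder with the server once; the desktop client does the
# same thing continuously in the background.
owncloudcmd -u myuser -p 'secret' \
  ~/ownCloud \
  https://example.com/owncloud
```

This is handy for headless machines or cron-driven backups where running the full desktop client would be overkill.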

So is that it? It turns out there’s more. Since I’m now transferring data to and from my domain, it is preferable that the connection to the domain be secure. And the connection can be secured with SSL encryption.

SSL Encryption

I won’t go much into TLS/SSL encryption here, as there are plenty of resources online that explain it. It suffices to know that it is a way for a website to secure the connection between itself and a visitor, so that any data exchanged between the two is encrypted and not visible to a (potentially malicious) third party eavesdropping on the connection. This is necessary to prevent what is known as a man-in-the-middle attack, where a hacker intercepts the connection between the website and its visitor and collects the data being transmitted between the two (which may sometimes be confidential, such as credit card numbers, personal identification numbers, etc.) without either the website or its visitor knowing about it.

The Chromium Browser address bar when the connection to the page is not secured (note the “http://”).

There has lately been growing interest on the web in adopting SSL (or rather its successor, TLS) to secure connections between websites and their visitors. Google has even proposed marking websites that don’t adopt it as insecure. At first glance, one can tell whether or not a connection to a website is secure by keeping an eye out for a green lock next to the address bar, and the fact that the address bar says https:// (with the green lock symbol) instead of http://. The s here stands for secure. And if you click on the green lock, a little window pops up showing who the site has been secured by.
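The same check can also be done from the command line: openssl can fetch a site’s certificate and print who issued it and when it expires (example.com is just a stand-in here):

```shell
# Fetch the certificate presented on port 443 and print its subject,
# issuer and validity period.
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```

The `-servername` flag matters when one server hosts several domains, as it tells the server which certificate to present.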

The Chromium Browser address bar when the connection to the page is secured (note the “https://”).

All this emphasis on security and privacy is, in my opinion, justified. So, given that I’m now transmitting my data between my local machine and my domain, I decided it would be a good idea to adopt SSL encryption on my domain. This can be done by obtaining an SSL certificate from a certification authority.

SSL Certificates and Certification Authorities

To obtain an SSL certificate for your domain, you would normally purchase it from a certification authority (CA), or from a reseller who sells it at a cheaper rate minus some of the extra support the CA can offer at a higher price. Some of the most popular CAs are Symantec, GeoTrust, GlobalSign, DigiCert and GoDaddy. Each of these CAs sells you a certificate for a fixed period of time – typically 1 to 3 years – and offers different packages such as Extended Validation, wildcard domain certification, etc. For instance, have a look at what Symantec, GeoTrust and GlobalSign have to offer. The options are very similar but priced differently, depending on the CA’s credibility (which apparently is a major factor in deciding whom to go with) and what is included.

On the other hand, there are companies that purchase certificates from the CAs in bulk and resell them at a cheaper rate. These include websites such as SSL Shopper, or even your own hosting company; I know my hosting company Arvixe resells certificates purchased from GlobalSign. Depending on whether you are purchasing your certificate from a reseller or directly from a CA, the price varies from $17 (the lowest I could find, for a RapidSSL certificate from SSL Shopper) to a few thousand dollars.

A CA or a reseller issues you a certificate following a verification procedure that confirms you are indeed the owner of the domain and that your company is a legitimate one whose credentials have been verified by the issuing authority. The verification process is either manual or fully automatic, and depending on how thoroughly it is done, the issuance of a certificate can take anywhere between a few minutes and a few weeks. I did not complete this process myself (for reasons explained below), but I do recall abandoning a few applications midway because it seemed like a hassle to provide information I didn’t even know the meaning of. And although an expensive, time-consuming and thorough process might make sense for a big company dealing in a lot of financial transactions and exchanges of information with its customers, with a lot at stake, it felt like overkill in my case, when all I wanted to secure was my personal domain and communications with my ownCloud server (remember?).

Now, this all sounded good, and I was almost convinced that I should buy one of the cheaper certificates for a few dollars a year from SSL Shopper. I had gathered all this information over a week of looking things up in my free time, and was quite sure I had covered all the viable options, but I couldn’t help wondering whether it was possible to get an SSL certificate for my personal domain for free. One final DuckDuckGo search led me to a StackOverflow post that answered this question in the affirmative!

Let’s Encrypt

The StackOverflow post pointed me to the Let’s Encrypt initiative, which essentially offers the means to generate an SSL certificate oneself via a fully automated verification process. Not just that, it offers a host of ways in which this can be done, depending on your level of comfort with the command line, cPanel or any other means through which verification can be carried out. I was skeptical, as something like this sounded too good to be true, but it isn’t. The project is sponsored by several well-known organisations such as the Linux Foundation, Mozilla, the EFF and Cisco, and its certificates are accepted by all mainstream browsers. Coincidentally, I later found out that The Site Wizard, which I had referred to several times in the past while choosing a hosting provider, website templates, etc., is also secured by a certificate from Let’s Encrypt!

This was exactly what I wanted: to secure my personal domain so that I can transfer data between my location and my ownCloud server. It does not matter to me (at least for now) how much extra assurance a seal from a known CA such as DigiCert or Symantec gives a visitor to my website. Plus, it’s absolutely free. In my case, I had the certificate generated within minutes through ZeroSSL, with an automated ACME verification process that involved me creating two files with specific content on my domain, which were then verified by the website. There are many alternatives to ZeroSSL, any of which can be used as per one’s convenience. One thing to note is that the certificate issued this way is valid for only three months, but I don’t mind repeating the very simple process when my current certificate expires.
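If you manage your own server rather than shared hosting, the same file-based (ACME) verification can be automated end-to-end with a client such as certbot, which also makes the three-month validity painless. A sketch with placeholder paths and domains:

```shell
# Certbot drops a challenge file under <webroot>/.well-known/acme-challenge/,
# Let's Encrypt fetches it over HTTP, and on success the certificate and key
# are written under /etc/letsencrypt/live/<domain>/.
sudo certbot certonly --webroot \
  -w /var/www/example.com \
  -d example.com -d www.example.com

# Renewal is a one-liner, typically run from cron or a systemd timer:
sudo certbot renew --quiet
```

The manual two-file process I went through with ZeroSSL is exactly this challenge-and-response exchange done by hand.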

Last Words

To conclude, securing one’s website with a TLS/SSL certificate is not as hard or expensive as it may seem at first glance, thanks to Let’s Encrypt. I’m very impressed by this initiative, and found it a perfect fit for my needs given all the other options known to me. The Let’s Encrypt team is currently seeking funding for their operations, and I’m about to donate as a token of my appreciation. So if you are in a similar situation to mine before the research that led me to Let’s Encrypt, I hope you benefit from reading this post!

A Visit to MusicMuni Labs

I was in Bangalore last weekend with my wife Nina who delivered a talk on Music Therapy for Dementia at ARDSICON 2016. During my stay there, I took the opportunity to meet a couple of friends at MusicMuni Labs, a startup that is working on some very cool apps for Music Education in India. This post talks a little about MusicMuni and what they’re up to.

To give you some background, this startup is the brainchild of two of my friends, Gopala Koduri and Sankalp Gulati, together with their mentor Prof. Xavier Serra and other music technology researchers at the Music Technology Group (MTG) of Universitat Pompeu Fabra. The MTG has created several successful startups in the past, and this is one of its newest ventures, employing research carried out as part of the CompMusic project, among other state-of-the-art music technology research, for learning and exploring music of the Hindustani and Carnatic traditions.

The team has so far released two apps for Android – Riyaz and Sarāga – both currently in beta and steadily gaining a user base. This is how the team describes Riyaz, which is apparently their main focus at the moment:

“This android application aims to facilitate music learning for beginner to intermediate level music students by making their practice (riyaz) sessions more efficient. This application includes cutting edge music technologies that employ perceptually relevant models to automatically evaluate how well a student is singing compared to a reference music lesson. Students get a fine grained feedback on their singing.”

And Sarāga is described as follows:

“Sarāga is an android application that provides an enriched listening atmosphere over a collection of Carnatic and Hindustani music. It allows Indian art music connoisseurs and casual listeners to navigate, discover and listen to these music traditions using familiar, relevant and culturally grounded concepts. Sarāga includes inclusive designing of innovative visualizations and inter and intra-song navigation patterns that present musically rich information to the user on a limited screen estate such as mobiles. These time synchronized visualizations of musically relevant facets such as melodic patterns, samas locations and sections provides a user with better understanding and appreciation of these music traditions.”

They’re a very early-stage startup with a small and dedicated team, so I wish them all the very best and look forward to exciting updates from them in the future. Do check out their apps via the links I shared above if you’re interested in the classical music of India. And if you have a passion for music and music technology and are looking to do an internship, they would be happy to hear from you!

MusicMuni Labs – Swapnil, Utkarsh, Gopala and Sankalp (from left-to-right).


Jukedeck @ The Science Museum Lates

I had the opportunity to join my colleagues at Jukedeck – Patrick, Lydia, Eliza, Matt, Katerina and Gabriele – at the Science Museum Lates last night. For those of you unfamiliar with the concept, Lates are adults-only, after-hours theme nights that take place in the Science Museum (in London) on the last Wednesday of every month. They are attended by various organisations that would like to showcase their work relating to a chosen theme, as well as an audience keen on learning more about the science and technology underlying that theme. On the last day of August 2016, it was Jukedeck’s turn to show off its awesome technology at the museum, and some of us volunteered to tag along.

Lydia and me (in the background) explaining what Jukedeck and its technology is about to curious visitors at our stall.

The museum was packed with visitors, and it was great to see so many people interested in our technology! I hardly had time to grab some dinner amidst the constant stream of people wanting to listen to our music and learn more about the underlying algorithms. For me, as someone who does the research and writes the code that generates our music, it was an incredibly rewarding experience to see first-hand the appreciation people had for our work. It’s, in many ways, like giving a poster presentation at a conference, but to a non-technical audience. I enjoyed it very much, and in the future I’ll try my best not to let such opportunities pass – I even look forward to attending the event as a spectator myself! If you happen to be in London when it’s on, I highly recommend attending if you’re interested in science and technology.

Music and Connectionism

The many contributions made during the past three decades to computer-assisted analysis and generation of music with the aid of Connectionist architectures can be seen to have occurred in two waves, in parallel with developments in Connectionist research itself. During the first wave, the founding principles of Connectionism were introduced (Rumelhart et al., 1986) through the idea of Parallel Distributed Processing, according to which mental phenomena occur as a result of simultaneous interactions between simple elementary processing units, as opposed to the then prevailing notion of Sequential Symbolic Processing, which explained the same phenomena in terms of sequential interactions between complex goal-specific units. The significance of this wave was largely theoretical, with a few experimental and empirical results to support the feasibility of the theory. Following several years of reduced interest, the second wave further strengthened the claims made by its precursor through a series of successful high-impact real-world applications. This was owing both to the proposal of newer theories, and to the availability of greater computational power and vast amounts of data that enabled the demonstration of the efficacy of these theories nearly two decades on (Bengio, 2009; LeCun et al., 2012). The innovations that came about as a result of these two phases trickled down to several application domains (Krizhevsky et al., 2012; Hinton et al., 2012; Collobert et al., 2011), of which music is one (Todd and Loy, 1991; Griffith and Todd, 1999; Humphrey et al., 2012). This section reviews notable contributions among the many that demonstrated the application of Connectionism to symbolic music modelling during these two waves, in order to present a historical perspective together with an overview of the techniques employed.

The First Wave

The first set of notable approaches applying Connectionism to the analysis and generation of symbolic music were proposed in the years following the publication of the influential text on Parallel Distributed Processing (Rumelhart et al., 1986). While the breadth of contributions to the field during this period is indeed vast, I present a brief historical perspective only on work involving Feedforward Neural Networks, Recurrent Neural Networks and Boltzmann Machines, and refer the reader to (Rumelhart et al., 1986; Medler, 1998) for more in-depth and comprehensive reviews. Many of the inventions and algorithms proposed during this period persisted through the decades that followed and significantly impacted research in Artificial Intelligence, and the now thriving field of Machine Learning. These were the years that saw the maturation of the previously proposed Perceptron (Rosenblatt, 1958) into the Multi-Layer Perceptron (also known as the Feedforward Neural Network) and the invention of the Backpropagation algorithm (Rumelhart et al., 1988), which offered a simple and efficient means of training this model on data, thus leading to a surge in its popularity. The architecture of the Feedforward Neural Network (FNN) was further adapted into the Recurrent Neural Network (RNN) (Elman, 1990; Jordan, 1986) to deal with sequential data, and likewise the Backpropagation algorithm was extended into Backpropagation Through Time (BPTT) (Werbos, 1990) to train this new architecture. Other algorithms were also proposed around the same time to carry out real-time learning in the RNN architecture (Williams and Zipser, 1989).
Another significant innovation from this period is the Boltzmann Machine family of models (Smolensky, 1986; Hinton et al., 1984), which consists of undirected graphical models that learn joint probability distributions of sets of visible and latent variables through a process of minimisation of an energy function associated with configurations of these variables. Probabilistic inference can be carried out in these models to determine conditional distributions, typically of interest in various prediction tasks.
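
To make the energy-minimisation idea concrete, here is a minimal sketch of the energy function at the heart of the Boltzmann Machine family, shown for the Restricted case (no intra-layer connections) since it is the simplest. The tiny network and the hand-picked parameters are my own illustrative values, not drawn from any cited system.

```python
def rbm_energy(v, h, W, b, c):
    """E(v, h) = -sum_i b_i v_i - sum_j c_j h_j - sum_ij v_i W_ij h_j.
    Lower energy means the (v, h) configuration is more probable under the model."""
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return (-sum(bi * vi for bi, vi in zip(b, v))
            - sum(cj * hj for cj, hj in zip(c, h))
            - interaction)

# Two visible units, two hidden units, arbitrary parameters.
W = [[1.0, -0.5], [0.5, 2.0]]
b, c = [0.1, -0.2], [0.0, 0.3]
energy = rbm_energy([1, 0], [0, 1], W, b, c)
```

Learning in these models amounts to adjusting W, b and c so that configurations observed in the data receive low energy (high probability) relative to all others.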

Contributions to Connectionist theory and Artificial Intelligence, such as the above, generated interest in their adoption into several application domains that foresaw their potential benefits. This included the computer-assisted analysis and synthesis of music. One of the first systems for this purpose, known as HARMONET (Hild et al., 1992), was designed for harmonising chorales in the style of J S Bach. It consists of a symbolic (rule-based) component together with a recurrent neural network, and generates four-part harmonisations of a given chorale melody. The role of the neural network is to generate human-like harmonisations within the rules dictated by music theory, which, when taken literally, tend to result in “aesthetically offensive musical output”. HARMONET divides the harmonisation task into three subtasks. In the first, a harmonic skeleton of the chorale melody is generated for every quarter note of the given melody (which essentially involves determining the bass voice of the chorale) using a recurrent neural network. The network takes as inputs the harmonies generated at previous time-steps, together with the local context and global position (with respect to the beginning of the melody) of the note at the current time-step, to generate a harmony for it. A novel representation for the pitch of each musical note was introduced at this stage, which encodes the harmonic functions that contain the note, thus introducing hand-crafted musicological information as input to the network. This is followed by the generation of the alto and tenor voices, taking into account the given soprano voice in the melody and the bass voice generated in the previous step. Finally, ornamenting eighth notes are added to the result at each chord by another network which takes into account the local harmonic context. The system was evaluated by an audience of music professionals who judged the quality of the harmonisations.
By treating each of the possible harmonisations of the first network above as classes and changing its output units to softmax (Specht, 1990), the system can be used to predict harmonic expectation over time.
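
The softmax transformation mentioned here simply turns a vector of unnormalised scores over candidate harmonies into a probability distribution that can be read as expectation. A minimal sketch (the scores are hypothetical, not from HARMONET):

```python
import math

def softmax(scores):
    # Subtracting the max is the standard numerical-stability trick.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical unnormalised scores for three candidate harmonies at one time-step.
expectation = softmax([2.0, 1.0, 0.1])
```

The outputs sum to one, and the highest-scoring harmony receives the greatest expected probability.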

The work initiated in the context of HARMONET was later extended to create MELONET (Feulner and Hörnel, 1994), a system comprised of a hierarchy of neural networks operating at different time-scales which models melodies as sequences of harmony-based motifs and varies one of the chorale voices generated by HARMONET. It uses what are known as delayed-update neurons in a recurrent network which, by integrating their inputs over a certain time-span, reflect long-term information about the melody input. It works hand-in-hand with HARMONET to generate the said variations. In a subsequent publication, a committee of such neural networks, each of which has learned a specific harmonisation style, was used to recognise different styles of harmonisation according to how expected a given harmonisation is to each network (Hörnel and Menzel, 1998).

Chorale harmonisation has also been the focus in (Bellgard and Tsang, 1994) where, in contrast to HARMONET, the approach relies solely on a connectionist model, the Boltzmann Machine (BM) (Hinton et al., 1984; Smolensky, 1986). Four-part writing is regarded here as the result of a unique set of choices made by the composer between various competing harmonisation techniques, a process that is not clearly defined and is thus essentially imprecise and noisy. This is where the stochastic nature of the model employed is highlighted as an advantage over the deterministic nature of models from previous work, i.e., HARMONET and CHORAL (Ebcioğlu, 1988). The task is viewed as one of pattern completion (or gap-filling), where a given chorale melody is only the partial specification of a complete piece of information, namely the harmonised chorale. A BM learns local harmonic constraints through a series of overlapping time-windows extracted at each time-step in the chorale. Harmonisation is achieved in an identical fashion, but with the learned model slid (in time) along a given chorale melody. Its visible units comprise a mixture of multinomial and binomial units which represent three octaves of musical pitch, musical rest, and phrase-control variables. The energy function associated with a Boltzmann Machine, used to assess the quality of learning in the model, is also used to assess the quality of the harmonies generated by it. Sliding the BM in time along a temporal input gives what is referred to by the authors as the Effective Boltzmann Machine (EBM).
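
The pattern-completion idea can be sketched as clamped Gibbs sampling: the units holding the given melody are fixed, and the free (harmony) units are repeatedly resampled until the joint configuration settles into low energy. The tiny four-unit network and weights below are purely illustrative, not taken from Bellgard and Tsang.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gibbs_complete(visible, clamped, W, steps=50, seed=0):
    """Resample every unclamped binary unit from its conditional distribution."""
    rng = random.Random(seed)
    v = list(visible)
    for _ in range(steps):
        for i in range(len(v)):
            if i in clamped:
                continue  # never resample the given melody units
            net = sum(W[i][j] * v[j] for j in range(len(v)) if j != i)
            v[i] = 1 if rng.random() < sigmoid(net) else 0
    return v

# 2 "melody" units (clamped) + 2 "harmony" units; weights tie unit 2 to unit 0
# and unit 3 to unit 1, so the harmony tends to agree with the clamped melody.
W = [[0, 0, 4, -4],
     [0, 0, -4, 4],
     [4, -4, 0, 0],
     [-4, 4, 0, 0]]
completed = gibbs_complete([1, 0, 0, 0], clamped={0, 1}, W=W)
# Clamped units stay fixed; the free units are stochastic samples.
```

With these strong weights the free units agree with the melody with high probability, but the outcome remains a sample, which is exactly the stochasticity the authors value.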

As music is inherently temporal in nature, recurrent neural networks (RNNs) are a natural choice for modelling musical structure. In one of the first applications of neural networks to music (Todd, 1989), a special case of the RNN known as the Jordan network (Jordan, 1986) was made to memorise and interpolate between melodies of different styles. The network consists of an input, a hidden and an output layer. A part of the input layer consists of a set of plan units that indicate the style of the melody being learned or produced. The rest is a set of context units which maintain a memory of the sequence produced so far by combining the effect of the most recently predicted output in the sequence (which is fed back as input) with an exponentially decaying sum of all of the network's previous inputs. The input layer is fully connected to the hidden layer, which is in turn fully connected to the output layer. The network models sequences of pitches and durations, using a fixed-size time-window of notes in its input context units, and predicts the same number of notes for the next time-step, which are fed back to its input layer. All melodies are transposed into the key of C, and a binary one-hot representation (a vector containing all 0s and a single 1 corresponding to a particular value) is used for pitch. A time-slice representation is used for duration, where the length of a note is given by the number of consecutive evenly spaced time-slices (of eighth-note duration) it spans, with additional information about its onset. The purpose of the network is to memorise melodies that it has come across, associating each melody with a plan, so that it can interpolate between melodies when plans are interpolated, and change melodies dynamically when plans are changed.
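
The context-unit mechanism described above can be sketched in a few lines: memory is an exponentially decaying sum of the network's past outputs fed back as input. The decay constant and the toy one-hot "outputs" here are my own illustrative values.

```python
def update_context(context, fed_back_output, decay=0.5):
    """Jordan-style context update: decayed old memory plus the latest output."""
    return [decay * c + o for c, o in zip(context, fed_back_output)]

context = [0.0, 0.0, 0.0]
# One-hot pitches produced over three time-steps (say C, D, E).
for output in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    context = update_context(context, output)
# Older notes contribute less: context is now [0.25, 0.5, 1.0].
```

The most recent note dominates the memory, while earlier notes fade geometrically, which is what lets a fixed-size input carry information about the whole sequence so far.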

An often-cited work in connectionist music composition is that of Mozer (1991), where an RNN named CONCERT is employed for learning structure in melodies in order to generate novel variations on them. In contrast to the approach described above (Todd, 1989), this network is an Elman RNN (Elman, 1990) and also contains a learning stage (absent in the other) where the backpropagation through time (BPTT) algorithm (Werbos, 1990) is applied to tune the weights of the network to the prediction task. The task is to predict the next note, given the previous one and the state of the hidden layer at the most recent time-step, which accounts for the notes further back in time that are not dealt with explicitly. The shortcomings of the network's architecture in dealing with long-term memory and the global structure of a musical piece are addressed by taking into account the notes in the melody at multiple time-resolutions, and by employing an additional parameter that controls its sensitivity to recent versus less recent notes in a melody. With the generation of aesthetically pleasing melodies being the focus of the network, the task-unaware one-hot representation of notes is abandoned (or retained only for the sake of interpreting results) in favour of a perceptually motivated one, based on earlier empirical observations by Shepard (1982). The model was evaluated by having it extend a C major diatonic scale, learn the structure of diatonic scales, learn random walk sequences of pitches, learn specific kinds of phrase patterns, and generate new melodies in the style of J S Bach.

A different approach, inspired by the Target-note Technique in Bebop jazz, is explored by Toiviainen (1995), wherein, given a typical jazz chord progression, an auto-associator network emulates the creativity of an improviser. The melodies generated by the model rely on the starting notes at any given point in time, together with the current chord, to determine the possible melodic patterns, and on the next chord in the progression to determine the possible target notes to follow. Several constraints that reflect typical practices in jazz improvisation, such as the relationship between the musical pitch of a note in the melody and the root of the current chord, typical chord-types occurring in jazz progressions, and typical syncopation in improvised melodies, influenced the design choices for the architecture of the network. The network relies on the Hebbian learning rule for updating its connections while learning from data. A moving time-window approach was adopted for representing time, where each window corresponded to one half-measure. Thus, in each step of the generative process, the network generated a melody of length equal to a half-measure, which was fed back into it in order to generate the next one, and so on. The fact that such a network learns to generate music from examples in a dataset, much like a typical jazz musician who improvises based on the repertoire that they have attended to over time, is what motivates this approach. The author concludes that “the melodies produced by the network resemble those of a beginning improviser”, based on a qualitative assessment of its generations learned from excerpts of solos played by the trumpet player Clifford Brown over the chord changes of George Gershwin's “I Got Rhythm”.
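
The Hebbian rule mentioned above is worth stating concretely: a connection is strengthened when the units at both of its ends are co-active ("fire together, wire together"). The toy sizes, values and learning rate below are mine, not Toiviainen's.

```python
def hebbian_update(W, x, y, lr=0.1):
    """One Hebbian step: delta W[i][j] = lr * y[i] * x[j] (outer product)."""
    return [[W[i][j] + lr * y[i] * x[j] for j in range(len(x))]
            for i in range(len(y))]

W = [[0.0, 0.0],
     [0.0, 0.0]]
# Present an input pattern x together with an output pattern y.
W = hebbian_update(W, x=[1, 0], y=[1, 1])
# Only the connections from the active input unit are strengthened.
```

Because the update is a simple outer product of co-occurring activity, repeated presentation of patterns from a repertoire gradually encodes their statistics in the weights, which is the sense in which the network "absorbs" a repertoire.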

The above list of connectionist systems for the analysis and synthesis of symbolic music consists of notable contributions among those that laid the foundations for future work on the subject. It is by no means exhaustive, and there exist several others that explore other musical phenomena with connectionist architectures but are considered beyond the scope of this review. I point the inquisitive reader to (Todd and Loy, 1991; Griffith and Todd, 1999) for a comprehensive summary of work carried out in the field during what I refer to here as the first wave of connectionism.

The Second Wave

The second wave of interest in neural networks and connectionism, which has prevailed for nearly a decade (with hardly any signs of subsiding) at the time of writing this post, can be said to have come about towards the end of what is generally known as the AI Winter (Hendler, 2008). Its success has been attributed to the culmination of three key factors: theoretical and empirical advances in connectionist research, the presence of very powerful hardware in modern computers, and the availability of vast amounts of data. This wave brought with it several new innovations in connectionist architectures and algorithms, and also fuelled a revival in the study and application of older ones introduced by its precursor. The theoretically known, but often practically infeasible, concept of a deep neural network (a feedforward neural network with more than one hidden layer) became a reality during this period with the introduction of new methods for pre-training these networks layer-by-layer in an unsupervised fashion before training them on a certain task in a supervised manner (Bengio et al., 2007; Hinton et al., 2006). The Restricted Boltzmann Machine (RBM), a generative unsupervised model which was, in part, responsible for this turnaround, was extended in many different ways: to serve as a supervised learning model and a classifier (Salakhutdinov et al., 2007; Larochelle and Bengio, 2008), to act as a sequence learning model (Sutskever and Hinton, 2007; Sutskever et al., 2009; Taylor et al., 2007), and to handle different types of data (Welling et al., 2004). The RBM, in turn, soared in popularity thanks to the Contrastive Divergence algorithm (Hinton, 2002; Tieleman, 2008), which made it possible to train this model more efficiently than was previously possible. Likewise, the limitation of recurrent neural networks in modelling very long-term memory was also addressed to increase their effectiveness as sequence models (Martens and Sutskever, 2011).
A previously proposed architecture addressing the same issue of long-term memory, the Long Short-Term Memory (LSTM) network (Hochreiter and Schmidhuber, 1997), was also revisited and is now even more widely used as a sequence model, with other models being proposed that were inspired by it (Chung et al., 2014). Another architecture that underwent a breakthrough is the Convolutional Neural Network, which is now the de facto standard for object recognition and related image recognition and classification tasks (Krizhevsky et al., 2012). All these advances had a significant impact on three application areas, Natural Language Processing, Speech Processing and Computer Vision (LeCun et al., 2015), the very tasks in which the past failure of Artificial Intelligence to perform well was an important reason for the drop in interest in the field, i.e. the AI Winter.

This revival of interest in connectionist research inspired a body of work that deals with a diverse set of musical tasks using symbolic music. One such application was in modelling melodies by capturing short melodic motifs in them using a Time Convolutional RBM (TC-RBM) (Spiliopoulou and Storkey, 2011). In contrast to other RBM-based sequence models (Sutskever et al., 2009; Taylor and Hinton, 2009), the TC-RBM does not make use of any recurrent connections and relies on the idea of convolution through time over fixed-length subsequences within a window centred at each time-step (Lee et al., 2009). Furthermore, a weight-sharing mechanism featured in this model helps it achieve translation invariance along time, which is desirable as motifs can occur anywhere in a musical piece. The approach models both the pitch and duration of notes, and uses an implicit representation of time by discretising it into eighth-note intervals. A two-fold evaluation of this model was carried out on the Nottingham Folk Music Database. The qualitative evaluation involved the analysis of the latent distributed representations learned by the TC-RBM when presented with musical data in its visible layer, which were found to convey information about scale, octave and chords. In the quantitative evaluation, the model was made to predict the next k time-steps given a fixed-length context. The prediction log-likelihood was computed approximately by sampling from the model, and the Kullback-Leibler divergence was used to determine the closeness of the model's predictions to the empirical distribution.
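
The shared-weight convolution through time that gives the TC-RBM its translation invariance can be illustrated with a plain 1-D convolution: the same motif detector is applied at every window position, so a motif produces the same response wherever it occurs. The detector and the toy melody below are illustrative values of my own.

```python
def conv1d(sequence, kernel):
    """Valid-mode 1-D convolution: slide the kernel along the sequence."""
    k = len(kernel)
    return [sum(kernel[j] * sequence[t + j] for j in range(k))
            for t in range(len(sequence) - k + 1)]

motif = [1, -1, 1]                       # hypothetical 3-step motif detector
melody = [0, 1, -1, 1, 0, 1, -1, 1, 0]   # the motif appears at two positions
scores = conv1d(melody, motif)
# The detector responds identically at both occurrences of the motif.
```

Because the kernel (the shared weights) is identical at every position, no separate parameters are needed for motifs occurring early versus late in the piece.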

As a continuation of a previously proposed probabilistic-grammar-based approach for generating jazz solos known as Impro-Visor (Keller and Morrison, 2007), a Deep Belief Network (Hinton et al., 2006) (DBN, a probabilistic generative model made up of a stack of the aforementioned Restricted Boltzmann Machines) was experimented with for the same purpose (Bickerman et al., 2010). As modelling entire melodies or solos requires dealing with long-term dependencies that are not feasible with a non-recurrent model such as the DBN, only 4-bar jazz licks (short, coherent melodies) are modelled. As in some of the approaches outlined above, a sliding window is used to model temporal information, with a window-size of one measure (4 beats) of the piece of music, and a step-size of 1 beat. The visible (input) layer of the DBN simultaneously modelled the joint distribution of the chromatic pitch-class, duration and onset, and octave of the melody note, together with the chord underlying the melody, thus allowing the model to associate chords with various melodic features, which is a key factor to consider in jazz music. The model was trained generatively using the Contrastive Divergence algorithm (Hinton, 2002; Tieleman, 2008) on a large corpus of 4-bar jazz licks. With the DBN being a stochastic generative model, novel jazz licks could be sampled one beat at a time from it in generative mode. While it could be demonstrated that the model does indeed generate the desired licks, the authors conclude in favour of their previous grammatical approach to lick generation over the DBN, citing the subjective quality of the generated licks and the large training time of the DBNs to support this choice.
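
The sliding-window scheme described above (a window of one measure, advanced one beat at a time) can be sketched directly. The beat sequence here is a toy stand-in for the actual note features.

```python
def sliding_windows(beats, window=4, step=1):
    """Windows of `window` beats, advanced `step` beats at a time."""
    return [beats[i:i + window]
            for i in range(0, len(beats) - window + 1, step)]

beats = list(range(8))  # two measures of four beats each
windows = sliding_windows(beats)
# [[0, 1, 2, 3], [1, 2, 3, 4], ..., [4, 5, 6, 7]]
```

Each window becomes one training example for the (non-recurrent) model, so temporal context is captured by overlap between consecutive windows rather than by recurrence.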

The approaches described above use non-recurrent models, which have largely been superseded, when it comes to modelling sequential data, by recurrent models that are a more natural fit for temporal data. In an attempt towards style-independent polyphonic music generation in (Boulanger-Lewandowski et al., 2012), an RNN-RBM is made to model sequential information directly from the piano-roll notation (Orio, 2006a). The reason for dealing with this notation is to avoid making any kind of prior assumptions regarding the nature of the modelling task that would simplify it, thus leaving much for the model to determine by itself. The RNN-RBM is a stochastic model and can be understood as a sequence of RBMs which, at each time-step of the sequence, are conditioned on the hidden layer of an RNN. Thus, in addition to the RNN modelling sequential information, the RBM models correlations between variables (MIDI note values) that occur simultaneously at each time-step. The latter is often ignored in standard RNNs, and can be viewed as an advantage of this model given sufficient data, since it also entails the need for a greater number of model parameters. The model is targeted at the task of automatic music transcription and is thus required to model time in seconds, in contrast to other symbolic music modelling approaches that represent time relative to the musical score, thus requiring an additional step of alignment between the audio and symbolic formats. Time, in this model, is represented in terms of consecutive slices of the quantised musical signal. It is trained using mini-batch gradient descent and the Backpropagation Through Time algorithm. The model was found to outperform others addressing the same task, and this work has inspired other very close extensions with the same goal that claim improved performance (Goel et al., 2014; Lyu et al., 2015).
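
A minimal piano-roll of the kind such models consume can be constructed from note events as follows: rows are time-steps, columns are pitches, and a cell is 1 while the pitch sounds. The pitch range and events here are arbitrary toy values, not from the cited corpus.

```python
def piano_roll(events, n_steps, n_pitches):
    """Build a binary time x pitch matrix from (onset, duration, pitch) events."""
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for onset, duration, pitch in events:
        for t in range(onset, min(onset + duration, n_steps)):
            roll[t][pitch] = 1
    return roll

# A two-note chord held for two steps, followed by a single note.
roll = piano_roll([(0, 2, 0), (0, 2, 2), (2, 1, 1)], n_steps=3, n_pitches=3)
# roll == [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
```

Each row of this matrix is what the RBM part of the RNN-RBM models at one time-step, which is where the simultaneous-note correlations mentioned above come from.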

A previous approach by Eck and Schmidhuber (2002) for modelling Blues music with a Long Short-Term Memory (LSTM) RNN can be said to have influenced the approach described above (Boulanger-Lewandowski et al., 2012) in its choice not to incorporate any prior musicological information in order to simplify the modelling task. As mentioned earlier, the LSTM is an enhanced version of the basic RNN and has been shown to successfully model longer temporal dependencies than the latter. Here, once again, successive slices of the musical signal are treated as time-steps. A quantisation step-size of 8 notes per measure was used, and thus the 12-bar blues musical segments used for training the model were each 96 time-steps in length. The first experiment carried out with this model involved having it learn and generate a musical chord structure, from which the authors conclude that this is a fairly straightforward task for the model, as expected given its previous success in tasks involving counting. In the second experiment, both melody and chords are learned, leading to the conclusion that the LSTM is indeed able to generate a blues melody, constrained by the learned chord structure, that sounds better than a random walk across the pentatonic scale and is faithful to the examples in the training set. The evaluation in this case is left to the listener, who is encouraged to visit a webpage containing the pieces of music generated by the network.

A more recent study with the LSTM (Franklin, 2006) carried out further experiments with this model on jazz-related tasks. Here, various note representations were studied in order to incorporate musical knowledge into the network. This can be contrasted with the approach adopted in (Eck and Schmidhuber, 2002; Boulanger-Lewandowski et al., 2012), which avoids making any music-theoretic assumptions. A pitch representation based on major and minor thirds, known as the circle-of-thirds representation, and a duration representation known as the modular-duration representation, which extends that proposed in (Mozer, 1991), were used to train the dual pitch/duration LSTMs. Two experiments were carried out. The first focused on short musical tasks, and only sequences of musical pitch were considered. These included outputting in sequence the four chord tones given a dominant seventh chord as input, determining whether or not a given sequence of notes is ordered chromatically, and reproducing a specific 32-note melody of the form AABA given only the first note as input. A single network was used for all these tasks. In the second experiment, which focused on long musical tasks, the objective was to learn the melody of the song Afro Blue composed by the jazz percussionist Mongo Santamaria. Two separate networks were used to learn musical pitch sequences and note duration sequences respectively. The study concludes in favour of the LSTM, with a detailed qualitative analysis of the results with respect to the authors' expectations.
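
As I read it, the circle-of-thirds representation works as follows: the twelve pitch classes fall into four circles of major thirds (4 semitones) and three circles of minor thirds (3 semitones), and a pitch class is encoded by which circle of each kind it belongs to, i.e. a one-hot over 4 concatenated with a one-hot over 3. The sketch below is my reconstruction under that assumption, not code from Franklin (2006).

```python
def circle_of_thirds(pitch_class):
    """Encode a pitch class (0-11) by its major-third and minor-third circles."""
    major_circle = pitch_class % 4  # stacking major thirds cycles within a class mod 4
    minor_circle = pitch_class % 3  # stacking minor thirds cycles within a class mod 3
    code = [0] * 7
    code[major_circle] = 1
    code[4 + minor_circle] = 1
    return code

# C (0) and E (4) share a major-third circle but not a minor-third circle.
c_code, e_code = circle_of_thirds(0), circle_of_thirds(4)
```

The appeal over a plain one-hot is that harmonically related pitches share bits of their encoding, giving the network a head start on chord-tone relationships.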

Lambert et al. (2015) trained a two-layered RNN on the Mazurka dataset (MAZ), an audio dataset of expressively performed piano music. The first layer of the system is a Gradient Frequency Neural Network (GFNN) (Large et al., 2010), which uses nonlinear oscillators to model metre perception of a periodic signal. The second layer contains LSTM units which model the output of the GFNN and predict rhythmic onsets as a time-series activation function. This work builds on previous experiments involving symbolic data, in which the authors find that the LSTM performs time-series modelling significantly better when GFNNs are used (Lambert et al., 2014a,b). Their GFNN-LSTM model was able to predict rhythmic onsets with an f-measure of 71.4%.

As stated before, there seems to be very little work focusing on connectionist models for information-theoretic music modelling. One such attempt is presented in (Cox, 2010), where the relationship between entropy and meaning in music, inspired by (Meyer, 1956, 1957), is explored with the help of Recurrent Neural Networks that estimate instantaneous entropy for music with multiple parts, in the analysis of a string quartet piece composed by Joseph Haydn. The model considered here contains two components: a long-term model (LTM) and a short-term model (STM) (Conklin and Witten, 1995). The parameters of each model are learned through exposure to appropriate data. The LTM models global stylistic characteristics acquired by a listener over a longer time-span. The STM models context-specific information, available in a melody while it is being processed by the listener, in the generation of expectations. Predictions made by each model are combined using ensemble methods, which has previously been shown to improve the quality of predictions over individual models (Conklin and Witten, 1995; Pearce, 2005). The work demonstrates that the entropies predicted by the model are sensitive to the effects of cadences, resolutions, textural change, and interruptions in music.
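
The LTM/STM combination and the entropy read-out can be sketched as follows. A simple weighted arithmetic mean is used here for the combination; the cited systems use more refined weighting schemes, and the distributions below are toy values of my own.

```python
import math

def combine(p_ltm, p_stm, w=0.5):
    """Mix two predictive distributions and renormalise."""
    mixed = [w * a + (1 - w) * b for a, b in zip(p_ltm, p_stm)]
    z = sum(mixed)
    return [m / z for m in mixed]

def entropy(p):
    """Instantaneous (Shannon) entropy of a predictive distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical next-event predictions over three candidate continuations.
p = combine([0.7, 0.2, 0.1], [0.1, 0.8, 0.1])
uncertainty = entropy(p)
```

High entropy corresponds to moments where the listener's expectations are diffuse (e.g. after an interruption), and low entropy to strongly implied continuations such as cadential resolutions.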

References

  1. Matthew I. Bellgard and C P Tsang. Harmonizing Music the Boltzmann Way. Connection Science, 6(2):281–297, 1994.
  2. Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, et al. Greedy Layer-wise Training of Deep Networks. Advances in Neural Information Processing Systems, 19:153, 2007.
  3. Yoshua Bengio. Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127, 2009.
  4. Greg Bickerman, Sam Bosley, Peter Swire, and Robert Keller. Learning to Create Jazz Melodies using Deep Belief Nets. In International Conference On Computational Creativity, 2010.
  5. Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In International Conference on Machine Learning, 2012.
  6. Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555, 2014.
  7. Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural Language Processing (almost) from Scratch. The Journal of Machine Learning Research, 12:2493–2537, 2011.
  8. Darrell Conklin and Ian H Witten. Multiple viewpoint systems for music prediction. Journal of New Music Research, 24(1):51–73, 1995.
  9. Greg Cox. On the Relationship Between Entropy and Meaning in Music: An Exploration with Recurrent Neural Networks. In Annual Conference of the Cognitive Science Society, pages 429–434, 2010.
  10. Kemal Ebcioğlu. An Expert System for Harmonizing Four-Part Chorales. Computer Music Journal, pages 43–51, 1988.
  11. Douglas Eck and Juergen Schmidhuber. Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks. In IEEE Workshop on Neural Networks for Signal Processing, pages 747–756. IEEE, 2002.
  12. Jeffrey L Elman. Finding Structure in Time. Cognitive science, 14(2):179–211, 1990.
  13. Johannes Feulner and Dominik Hörnel. MELONET: Neural Networks that Learn Harmony-based Melodic Variations. In Proceedings of the International Computer Music Conference, pages 121–121. International Computer Music Association, 1994.
  14. Judy A Franklin. Recurrent Neural Networks for Music Computation. INFORMS Journal on Computing, 18(3):321–338, 2006.
  15. Kratarth Goel, Raunaq Vohra, and JK Sahoo. Polyphonic Music Generation by Modeling Temporal Dependencies using a RNN-DBN. In Artificial Neural Networks and Machine Learning–ICANN 2014, pages 217–224. Springer, 2014.
  16. Niall Griffith and Peter M Todd. Musical Networks: Parallel Distributed Perception and Performance. MIT Press, 1999.
  17. James Hendler. Avoiding Another AI Winter. IEEE Intelligent Systems, 2(23):2–4, 2008.
  18. Hermann Hild, Johannes Feulner, and Wolfram Menzel. Harmonet: A Neural Net for Harmonizing Chorales in the Style of JS Bach. In Advances in Neural Information Processing Systems, pages 267–274, 1992.
  19. Geoffrey E Hinton, Terrence J Sejnowski, and David H Ackley. Boltzmann Machines: Constraint Satisfaction Networks that Learn. Carnegie-Mellon University, Department of Computer Science Pittsburgh, PA, 1984.
  20. Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural computation, 14(8):1771–1800, 2002.
  21. Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18:1527–1554, 2006.
  22. Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. Signal Processing Magazine, IEEE, 29(6):82–97, 2012.
  23. Sepp Hochreiter and Jürgen Schmidhuber. Long Short-term Memory. Neural computation, 9(8):1735–1780, 1997.
  24. Dominik Hörnel and Wolfram Menzel. Learning Musical Structure and Style with Neural Networks. Computer Music Journal (CMJ), 22(4):44–62, 1998.
  25. Eric J Humphrey, Juan Pablo Bello, and Yann LeCun. Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics. In International Society for Music Information Retrieval Conference, pages 403–408, 2012.
  26. Michael I Jordan. Serial Order: A Parallel Distributed Processing Approach. Technical report, Institute for Cognitive Science, University of California San Diego, 1986.
  27. Robert M Keller and David R Morrison. A Grammatical Approach to Automatic Improvisation. In Sound and Music Computing Conference, pages 11–13, 2007.
  28. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Neural Information Processing Systems, volume 1, page 4, 2012.
  29. Andrew Lambert, Tillman Weyde, and Newton Armstrong. Beyond the Beat: Towards Metre, Rhythm and Melody Modelling with Hybrid Oscillator Networks. In Joint 40th International Computer Music Conference and 11th Sound & Music Computing conference, Athens, Greece, 2014a.
  30. Andrew Lambert, Tillman Weyde, and Newton Armstrong. Studying the Effect of Metre Perception on Rhythm and Melody Modelling with LSTMs. In Tenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2014b.
  31. Andrew J. Lambert, Tillman Weyde, and Newton Armstrong. Perceiving and Predicting Expressive Rhythm with Recurrent Neural Networks. In 12th Sound & Music Computing conference, Maynooth, Ireland, 2015.
  32. Edward W. Large, Felix V. Almonte, and Marc J. Velasco. A Canonical Model for Gradient Frequency Neural Networks. Physica D: Nonlinear Phenomena, 239(12):905–911, June 2010.
  33. Hugo Larochelle and Yoshua Bengio. Classification using discriminative restricted Boltzmann machines. In International Conference on Machine Learning, pages 536–543. ACM Press, 2008.
  34. Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. Efficient Backprop. In Neural networks: Tricks of the trade, pages 9–48. Springer, 2012.
  35. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep Learning. Nature, (521):436–444, 2015.
  36. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In International Conference on Machine Learning, pages 609–616. ACM, 2009.
  37. Qi Lyu, Zhiyong Wu, and Jun Zhu. Polyphonic Music Modelling with LSTM-RTRBM. In ACM Conference on Multimedia, pages 991–994. ACM, 2015.
  38. David A Medler. A Brief History of Connectionism. Neural Computing Surveys, 1:18–72, 1998.
  39. James Martens and Ilya Sutskever. Learning Recurrent Neural Networks with Hessian-free Optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1033–1040, 2011.
  40. Leonard B Meyer. Emotion and Meaning in Music. University of Chicago Press, 1956.
  41. Leonard B Meyer. Meaning in music and information theory. The Journal of Aesthetics and Art Criticism, 15(4):412–424, 1957.
  42. Michael C Mozer. Connectionist Music Composition Based on Melodic, Stylistic and Psychophysical Constraints. Music and Connectionism, pages 195–211, 1991.
  43. Nicola Orio. Music Retrieval: A Tutorial and Review, volume 1. Now Publishers Inc., 2006a.
  44. Marcus Thomas Pearce. The Construction and Evaluation of Statistical Models of Melodic Structure in Music Perception and Composition. PhD thesis, City University London, 2005.
  45. Frank Rosenblatt. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological review, 65(6):386, 1958.
  46. David E. Rumelhart, James L. McClelland, and the PDP Research Group, editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. MIT Press, 1986.
  47. Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. Restricted Boltzmann Machines for Collaborative Filtering. In International Conference on Machine Learning, pages 791–798. ACM, 2007.
  48. Roger N Shepard. Geometrical Approximations to the Structure of Musical Pitch. Psychological Review, 89(4):305, 1982.
  49. Paul Smolensky. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, chapter Information Processing in Dynamical Systems: Foundations of Harmony Theory, pages 194–281. MIT Press, 1986.
  50. Donald F Specht. Probabilistic neural networks. Neural Networks, 3(1):109–118, 1990.
  51. Athina Spiliopoulou and Amos Storkey. Comparing probabilistic models for melodic sequences. In Machine Learning and Knowledge Discovery in Databases, pages 289–304. Springer, 2011.
  52. Ilya Sutskever and Geoffrey E Hinton. Learning Multilevel Distributed Representations for High-dimensional Sequences. In International Conference on Artificial Intelligence and Statistics, pages 548–555, 2007.
  53. Ilya Sutskever, Geoffrey E Hinton, and Graham W Taylor. The Recurrent Temporal Restricted Boltzmann Machine. In Advances in Neural Information Processing Systems, pages 1601–1608, 2009.
  54. Graham W Taylor, Geoffrey E Hinton, and Sam Roweis. Modeling Human Motion using Binary Latent Variables. In Advances in Neural Information Processing Systems, pages 1345–1352, 2007.
  55. Graham W Taylor and Geoffrey E Hinton. Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style. In International Conference on Machine Learning, pages 1025–1032. ACM, 2009.
  56. Tijmen Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In International Conference on Machine Learning, pages 1064–1071. ACM, 2008.
  57. Peter M Todd. A Connectionist Approach to Algorithmic Composition. Computer Music Journal, 13(4):27–43, 1989.
  58. Peter M Todd and D Gareth Loy. Music and Connectionism. MIT Press, 1991.
  59. Petri Toiviainen. Modeling the target-note technique of bebop-style jazz improvisation: An artificial neural network approach. Music Perception, pages 399–413, 1995.
  60. Max Welling, Michal Rosen-Zvi, and Geoffrey E Hinton. Exponential Family Harmoniums with an Application to Information Retrieval. In Advances in Neural Information Processing Systems, pages 1481–1488, 2004.
  61. Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.
  62. Ronald J Williams and David Zipser. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Computation, 1(2):270–280, 1989.

The post above is an excerpt from my doctoral thesis (with minor modifications so that it makes sense outside the context of the manuscript), which was accepted in July 2016.

Oral Presentation at the 28th International Joint Conference on Neural Networks

My paper was accepted for oral presentation at the 28th International Joint Conference on Neural Networks, held in the picturesque town of Killarney in Ireland. The title of the paper is quite a mouthful – “Discriminative Learning and Inference in the Recurrent Temporal RBM for Melody Modelling” – and its abstract is the following:

“We are interested in modelling musical pitch sequences in melodies in the symbolic form. The task here is to learn a model to predict the probability distribution over the various possible values of pitch of the next note in a melody, given those leading up to it. For this task, we propose the Recurrent Temporal Discriminative Restricted Boltzmann Machine (RTDRBM). It is obtained by carrying out discriminative learning and inference as put forward in the Discriminative RBM (DRBM), in a temporal setting by incorporating the recurrent structure of the Recurrent Temporal RBM (RTRBM). The model is evaluated on the cross entropy of its predictions using a corpus containing 8 datasets of folk and chorale melodies, and compared with n-grams and other standard connectionist models. Results show that the RTDRBM has a better predictive performance than the rest of the models, and that the improvement is statistically significant.”
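The cross-entropy evaluation mentioned in the abstract can be sketched in a few lines. This is my own minimal illustration, not the paper's evaluation code: it assumes the model's outputs are stacked into an array whose rows are probability distributions over pitch classes, and the targets are the indices of the pitches that were actually observed.

```python
import numpy as np

def mean_cross_entropy(predictions, targets):
    """Mean cross entropy (in bits per note) of predicted next-pitch
    distributions against the pitches actually observed.

    predictions: array of shape (n_notes, n_pitches), each row a distribution.
    targets: array of shape (n_notes,), the index of the observed pitch.
    """
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=int)
    # Pick out the probability assigned to each observed pitch,
    # then average the negative log-probabilities.
    observed = predictions[np.arange(len(targets)), targets]
    return -np.mean(np.log2(observed))
```

A model that always assigns probability 1 to the correct pitch scores 0 bits; lower is better, which is how the models in the paper are compared.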

I presented the paper in the session on Recurrent Neural Networks. The model that we proposed in the paper – the RTDRBM – was the first original Machine Learning contribution of my PhD. And it was a pleasure to collaborate with my friend and colleague Son Tran on this work. He presented a second paper at the conference titled, “Efficient Representation Ranking for Transfer Learning”.

With Son and my supervisor Artur after my presentation.

Yet again a conference has taken me to a place in the world that I probably would never have visited otherwise! This doesn’t at all mean that the visit wasn’t worthwhile. The lush green Irish landscape, the charming town of Killarney and the abounding nature around it, and a friendly and welcoming hostel all made this a very memorable trip! Unfortunately, I had a sore throat and a fever during much of my stay, so I chose Irish coffee over a pint of Guinness (which I heard tastes much better in Ireland) when I had the chance. I regret this, but maybe that’s another reason to visit Ireland once again sometime!

On one of my healthier days in Killarney.

Oral Presentation at the 15th International Society for Music Information Retrieval Conference

We had two papers accepted at the 15th International Society for Music Information Retrieval Conference (ISMIR). Given the fantastic experience I had at ISMIR the year before, I was super-excited to travel to Taipei to attend the conference. The first of these papers is titled, “Multiple Viewpoint Melodic Prediction with Fixed-Context Neural Networks” and is in some ways a continuation of my work from the previous ISMIR conference. The abstract of the paper is as follows:

“The multiple viewpoints representation is an event-based representation of symbolic music data which offers a means for the analysis and generation of notated music. Previous work using this representation has predominantly relied on n-gram and variable order Markov models for music sequence modelling. Recently the efficacy of a class of distributed models, namely restricted Boltzmann machines, was demonstrated for this purpose. In this paper, we demonstrate the use of two neural network models which use fixed-length sequences of various viewpoint types as input to predict the pitch of the next note in the sequence. The predictive performance of each of these models is comparable to that of models previously evaluated on the same task. We then combine the predictions of individual models using an entropy-weighted combination scheme to improve the overall prediction performance, and compare this with the predictions of a single equivalent model which takes as input all the viewpoint types of each of the individual models in the combination.”
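The entropy-weighted combination scheme mentioned at the end of the abstract can be illustrated with a short sketch. This is a generic version of the idea rather than the paper's exact scheme: each model's predicted distribution is weighted by the inverse of its entropy, so that more confident (lower-entropy) models contribute more to the combined prediction. The function names and the `bias` sharpening exponent are my own illustrative choices.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability distribution, in bits."""
    p = np.asarray(p, dtype=float)
    logs = np.log2(p, where=p > 0, out=np.zeros_like(p))  # 0*log(0) := 0
    return -np.sum(p * logs)

def combine(distributions, bias=1.0):
    """Entropy-weighted combination of predicted distributions.

    Lower-entropy (more confident) predictions receive larger weights;
    `bias` > 1 sharpens the weighting, `bias` = 0 gives a plain average.
    """
    distributions = np.asarray(distributions, dtype=float)
    entropies = np.array([entropy(p) for p in distributions])
    weights = (entropies + 1e-12) ** (-bias)  # epsilon guards against H = 0
    combined = np.average(distributions, axis=0, weights=weights)
    return combined / combined.sum()  # renormalise to a distribution
```

For example, combining a confident model's prediction with a uniform one pulls the result towards the confident model, which is the behaviour that improves the overall prediction performance.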

The paper was presented as a poster. The second paper is based on very interesting work I did in collaboration with Siddharth Sigtia and Emmanouil Benetos on automatic transcription of polyphonic music, titled “An RNN-based Music Language Model for Improving Automatic Music Transcription” that Siddharth presented as another poster.

I have to note that this year’s ISMIR organisation was fantastic! Everything from the review process and the information on the website to the venue, the assistance at the venue, and the banquet was very well managed and executed by the organisers. The most interesting part of the conference for me was the keynote lecture, titled “Sound and Music Computing for Exercise and (Re-)habilitation” by Prof. Ye Wang, in which he described the potential of music to serve as a means to rehabilitate and improve the quality of life of individuals with different ailments, and illustrated this with the help of a few projects his group at the National University of Singapore has been working on. It was a very inspiring talk, and I really admire Dr. Wang’s statement regarding the often overlooked direct impact of research and published work on society, which has been the cornerstone of these projects. I have lately taken an interest in Music Therapy and have been going through some literature to see if my own work on music modelling can in some way be applied to achieve therapeutic goals. There were some interesting late-breaking sessions as well that I took part in, including the very successful one on Big Data and Music organised by my supervisor Tillman, where I took notes during the discussion.

And finally, as is always the case when I attend a conference, I did take some time off in Taipei and its surrounding areas. On one evening, I joined some friends and colleagues to go see the tallest building in the city – Taipei 101.

Jan and I with Taipei 101 in the background (Photo Courtesy: Marius Miron)

On another day, a couple of us planned a day-trip to a nearby village called Jiufen where we checked out some temples, the market and the old Japanese mining village on top of a hill.

The gang that went on a day-trip to Jiufen, led by our lovely host Kailie (first from the right).

And on another day, I joined my buddy Marius on a local sightseeing round to see some local museums, Shilin night market, the Chiang Kai Shek Memorial, and other places before eventually taking the long flight back to London.

Taipei was fantastic, and I’d be up for another visit anytime! Last but not least, the hospitality of Fun Taipei hostel made the whole trip a little better each day.

Poster Presentation at the Machine Learning Summer School

I was selected to attend the Machine Learning Summer School in Reykjavik from April 25 to May 4, 2014, and was awarded a travel grant that made my attendance possible. I also proposed to present a poster about my ongoing work on musical pitch prediction with neural networks.

Many of the topics were very new to me, but I found the tutorials on Machine Learning and HCI (Roderick Murray-Smith), Introduction to ML (Neil Lawrence), Deep Learning (Yoshua Bengio), Probabilistic Modelling (Iain Murray), and Reinforcement Learning (David Silver) particularly interesting. The last talk especially seemed to contain much that could be adopted into my own work on music modelling, and I was very tempted to do so. Let’s see how that goes.

I was also a bit stressed carrying out experiments for a paper we were submitting to the 15th International Society for Music Information Retrieval Conference (ISMIR 2014). So fingers crossed that it will all work out.

I managed to travel a little while I was in Reykjavik. This was something that had to be done given how novel a destination Iceland is. I joined the rest of the workshop attendees on the Golden Circle Tour that showed us some fascinating and very alien Icelandic landscapes.

Mount Esjan as seen from my hostel (Kex Hostel) in Reykjavik. Iceland is full of stunning nature!

And finally, I made a last-minute trip to the Blue Lagoon on the day before my return to London.

Last-minute decision to visit the Blue Lagoon that certainly paid off!

It was indeed very fortunate that I was able to attend the summer school in Reykjavik. It has been an incredible learning experience in one of the most unique destinations I have been to in my entire life!

I’m sharing here a copy of the poster (made using Beamer/LaTeX) that I presented.

MLSS 2014 Presentation

Oral Presentation at the 14th International Society for Music Information Retrieval Conference

Having had one paper accepted at the 14th International Society for Music Information Retrieval Conference (ISMIR 2013), I travelled to Brazil for two weeks, spending the first week in Curitiba, where the conference was held, and the rest of the time on holiday in Rio. ISMIR is the leading conference when it comes to research in Music Information Retrieval and other related topics in Music Technology. The paper I presented there was titled, “A Distributed Model for Multiple Viewpoint Melodic Prediction”. Its abstract is the following:

“The analysis of sequences is important for extracting information from music owing to its fundamentally temporal nature. In this paper, we present a distributed model based on the Restricted Boltzmann Machine (RBM) for melodic sequences. The model is similar to a previous successful neural network model for natural language. It is first trained to predict the next pitch in a given pitch sequence, and then extended to also make use of information in sequences of note-durations in monophonic melodies on the same task. In doing so, we also propose an efficient way of representing this additional information that takes advantage of the RBM’s structure. In our evaluation, this RBM-based prediction model performs slightly better than previously evaluated n-gram models in most cases. Results on a corpus of chorale and folk melodies showed that it is able to make use of information present in longer contexts more effectively than n-gram models, while scaling linearly in the number of free parameters required.”
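To give a flavour of how an RBM can be used for prediction as the abstract describes, here is a minimal sketch of scoring candidate next pitches with an RBM's free energy. This is a generic illustration under my own assumptions (one-hot context and candidate vectors, a standard binary RBM), not the paper's actual model or code: a candidate pitch is appended to the context, and candidates with lower free energy receive higher unnormalised probability.

```python
import numpy as np

def free_energy(v, W, b_v, b_h):
    """Free energy of a binary RBM with visible vector v:
    F(v) = -b_v·v - sum_j softplus(b_h_j + v·W[:, j])."""
    return -v @ b_v - np.sum(np.logaddexp(0.0, b_h + v @ W))

def predict_next_pitch(context, candidates, W, b_v, b_h):
    """Distribution over candidate next pitches.

    context: one-hot encoding of the preceding notes (visible units).
    candidates: one-hot vectors, one per possible next pitch.
    Each candidate is appended to the context and scored by negative
    free energy; a softmax turns the scores into probabilities.
    """
    scores = np.array([-free_energy(np.concatenate([context, c]), W, b_v, b_h)
                       for c in candidates])
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()
```

With all parameters at zero every candidate is scored equally, and training shifts the free-energy landscape so that pitches plausible in the given context are scored higher.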

Welcome to ISMIR 2013

The paper was chosen for an oral presentation, and it also won a Best Student Paper Award at the conference. On the final day of the conference, I also organised a late-breaking session on “MIR in Music Education” which is a topic I am very interested in, and also participated in several other sessions organised by others.

Receiving my Best Student Paper prize with other prize recipients.

I also met a very interesting guy named Anderson during my stay at the Knock Knock hostel in Curitiba, who is also a PhD student, doing his research on armadillos!

A very good friend I made during my stay at the Knock Knock Hostel in Curitiba

Then I travelled to Rio de Janeiro for a week, where I stayed in a hostel just a few minutes away from Copacabana beach. I spent my time there hanging out at the many beaches, visiting iconic landmarks such as Cristo Redentor and Sugarloaf Mountain among other places recommended to me by the locals I met in the hostel, and taking a bus tour with some other tourists.

Something very unusual I noticed at Escadaria Selaron!

I was also joined there by my supervisor Tillman and my friend and colleague Reinier, who accompanied me during some sightseeing.

Tillman and Reinier with a bust of Heitor Villa-Lobos

In all, this was a fabulous experience and I thoroughly enjoyed my time in Brazil! I’m sharing a copy of my paper and presentation slides below.

ISMIR 2013 Paper

ISMIR 2013 Presentation