Completed Andrew Ng’s “Structuring Machine Learning Projects” course on Coursera

I successfully completed this course with a mark of 96.7%. It was fairly easy given my experience so far in machine learning and deep learning, but there were a few new ideas I learned here, and others that I investigated in greater depth out of my own curiosity while taking it. I felt that the lectures on Transfer Learning, Multitask Learning and End-to-End ML were quite brief and superficial, so they are not really useful straight after the course unless one studies these topics in greater depth afterwards. The practical advice, however, and the hands-on exercises focused on real-world scenarios were useful, and I wish there had been more of the latter (perhaps optional ones) in the course.

Here’s a link to the certificate I received from Coursera for this course.

My Experience in Applying for a Work Visa at the UK Home Office

This is more a rant out of frustration than anything else, and I hope this will help others get a sense of what a nightmare it can be to deal with the UK Home Office when something goes wrong.

Over two months ago, on the 9th of May, 2017, I submitted an application for a Tier 2 Work visa, with my employer’s backing, to the UK Home Office. For the fee that we paid for this application (GBP 1,354.00), the Home Office website guaranteed us a processing time of 8 weeks or less. However, it has now been nearly 10 weeks and we have received neither a decision nor a status update on my visa application. I called them around 2 weeks ago, when the 8-week period had passed (5th of July, 2017), and after a wait of nearly 30 minutes, the lady who answered my call casually told me that there were delays, that I would hear from them in “a couple of weeks”, and not to panic. I explained to her that the Home Office holding my passport and current residence permit was causing me a lot of inconvenience, and she recommended that I request my documents back through their website. I tried this, but I was not allowed to do so without withdrawing my application altogether, as my employer is not a “Premium Sponsor”. They’re only a young start-up, so I was half expecting this to be the case.

A few days after this, I learned from the Citizens Advice website that I might be able to get more information about the progress of my application, or possibly have it expedited, by contacting my local MP (Ms. Marsha de Cordova, the Labour MP for Battersea), so I did that as well. I’m now waiting for her office to respond. In the meantime, I checked the status of my application on the Home Office website on Saturday (15th July, 2017) and was surprised to see that a decision had been made on the 4th of June, 2017, which meant I should have received my documents back by the 14th of June. Neither has happened, and we’re still in the dark about the status of my application and the whereabouts of my passport and current residence permit.

Earlier today (17th of July, 2017), I once again called the Home Office to inquire about my application in light of the new information that a decision had been made. After waiting nearly an hour, I got through to a representative. All he said he could offer me at that point was to forward the details of my application to one of his colleagues, from whom I would hear back in 3-5 working days. I was also struck by his complete lack of empathy when I expressed my concern and anxiety at being kept in the dark about my application and important documents well beyond the service standard that had been communicated to me. That is beside the point anyway: I can’t expect some random representative of the Home Office to be my shoulder to cry on. I hung up feeling a bit worthless, but whatever.

I’m very unhappy about this entire experience. It seems to defeat the purpose to make a formal complaint about Home Office processes to a department within the Home Office itself, but I’ll do it anyway. The Home Office holding my passport beyond the deadline for processing my application is inconvenient, to say the least. I have had to refrain from travelling outside the UK, and hold off on making financial transactions between the UK and India through my bank, as my passport is required for these purposes. I was prepared to put up with this for the two months that I was told processing would take; now, however, I have absolutely no idea where my documents are, when they will be returned to me, or when a decision on my visa application will be made. I found the responses from the Home Office representatives very unsatisfactory, and I feel that I am being taken for granted, kept in the dark with no sense of urgency about returning my documents.

I’ll wait the 3-5 days as I’ve been told. If anyone reading this has any other advice that might help me, please post it in the comments below. I would appreciate it.

Edit (25th of July, 2017): I got my passport back with the approval, along with the new residence permit. As it turns out, these had been mailed to the address where I was living when I made the application, which I moved out of in mid-May. To their credit, the Home Office did do the job well within time (and I really appreciate the caseworker’s effort in that regard). I do still maintain that their helpdesk is by far the worst I have ever come across, with unsympathetic representatives, incredibly long waiting times, and an inability to give me simple answers about my application. Also, since they had both my email address and telephone number, it would have helped if they had told me through at least one of these channels that my application had been processed, instead of relying solely on Royal Mail.

Reflections on Three Months of Remote Work

Beginnings…

It started with my wife Nina and me deciding, for various reasons, that it would be good for us to move to Hyderabad for three months, from Oct 2, 2016 until Jan 8, 2017. As I was keen on continuing to work at Jukedeck, I proposed to Patrick, the COO of the company, that I work remotely during this period. After some deliberation and another meeting with Ed (the CEO), much to my delight, the company decided to give it a try, on the condition that we would review the arrangement each month and act quickly in case of any unexpected (negative) eventualities. I was very excited and at the same time anxious, as this was the first time I had ever worked remotely from home.

Preparation

Before leaving London, I had a quick meeting with my team lead Kevin, who was very supportive of the idea; we settled a few things and left others to be dealt with as and when needed. First, we decided that I would work from 11AM until 7PM IST instead of my usual 9:30AM to 5:30PM BST. Given the 4.5-hour time difference between Hyderabad and London, this would give me five hours of overlap with my team and three hours on my own. We also noted that when British Summer Time ended, clocks in the UK would go back by one hour, widening the time difference from 4.5 to 5.5 hours, and agreed to consider the option of me starting an hour later when this happened. We also agreed that I would post an update on my work every morning on our standup channel on Slack. If there were any brainstorming meetings, I would have the opportunity to propose ideas before the meeting and again afterwards, once I had gone through the Google Doc containing its minutes. I would stay in touch with my team through Slack and, whenever needed, Skype. We also discussed a few worst-case scenarios: if this arrangement did not work out, I would consider switching to part-time work or even taking a sabbatical until my return to London in January.
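The five hours of overlap and three hours on my own follow from converting both working windows to a common clock. Here is a small sketch of that arithmetic in Python. It assumes fixed UTC offsets (IST at UTC+5:30, BST at UTC+1, GMT at UTC+0) rather than a proper time-zone database, and the function names are just for illustration:

```python
# Times are decimal hours on a 24-hour clock (9.5 means 9:30AM).
# Fixed UTC offsets in hours: an assumption; a real scheduler would use a tz database.
IST = 5.5   # India Standard Time, UTC+5:30, no daylight saving
BST = 1.0   # British Summer Time, UTC+1
GMT = 0.0   # Greenwich Mean Time, UTC+0, after the UK clocks go back


def to_utc(start, end, utc_offset):
    """Convert a local working window to UTC hours."""
    return start - utc_offset, end - utc_offset


def overlap_hours(mine, my_offset, team, team_offset):
    """Hours during which two working windows overlap."""
    my_start, my_end = to_utc(*mine, my_offset)
    team_start, team_end = to_utc(*team, team_offset)
    return max(0.0, min(my_end, team_end) - max(my_start, team_start))


team_hours = (9.5, 17.5)   # 9:30AM to 5:30PM in London
my_hours = (11.0, 19.0)    # 11AM to 7PM in Hyderabad

print(overlap_hours(my_hours, IST, team_hours, BST))      # 5.0 while BST is in force
print(overlap_hours(my_hours, IST, team_hours, GMT))      # 4.0 after the clocks go back
print(overlap_hours((12.0, 20.0), IST, team_hours, GMT))  # 5.0 if I start an hour later
```

Out of an 8-hour day, five overlapping hours leave three on my own. The last two lines show why starting an hour later was the natural fix once the UK clocks went back: without it, the overlap drops from five hours to four.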

Setting Things Up

On arrival in India, my first task was to set up an office at home without any delay. My parents, whom we were living with during these three months, let me use the guest room/study as my office. It was a reasonably quiet space with a big enough desk to work on. Although it went through a few changes as time went by, it essentially looked like this:

Home Office

Once this was set up, I was ready to go! In the rest of this post, I’ll write about the things that stand out in my memory, leaving out the more mundane ones that are easy to forget.

Participating in Standups Remotely

About two weeks after moving out of London, I started noticing that I was unable to keep up with what some of my teammates were working on. I realised this was because, while I was updating everyone on my work through Slack, the reverse was not happening. I brought this up with Kevin and we decided that the simplest thing to do would be for me to attend standups via Skype. We first tried having one of the team members hold a laptop running a Skype session during standups, which turned out to be cumbersome, on top of the poor audio/video quality. Switching to a mobile phone was less cumbersome but did not improve the quality. We then learned that it was possible to send video messages over Skype; while this was not real-time, the recordings were very clear and allowed me to go over the standup in my own time. So we settled on this.

I suppose the bright side of this arrangement was that standups were brief, concise and to the point. There is a tendency for standups to turn into discussions about something very specific involving only some of the team members, while the others wait without necessarily knowing what the conversation is about. This arrangement avoided that, and a couple of my teammates even mentioned this benefit to me after we started doing it.

Pair Programming Remotely

At one point I was assigned a task that required pair programming with my colleague, Marco. It was the first time either of us had done remote pair programming. The first option we tried was an Atom editor plugin called atom-pair. It worked; however, as this was around the time my broadband connection was at its worst, it took several minutes for the text that Marco typed to appear on my screen. It was bad. We then switched to a more lightweight alternative using our trusty Jukedeck server, Ada. The setup was as follows. We both connected to Ada via SSH. Once we were in, I started a Tmux session and opened the Python source file in Vim. Marco switched to my user account (he had superuser privileges on the server, so I did not have to share my password with him for this) and attached to the same Tmux session from his end. Despite the lag, this worked like a charm! The setup had the added benefit that we could open any number of shells through Tmux, and also keep an IPython interpreter running alongside our editor to test our changes. Meanwhile, we had a Skype session open to discuss things. We carried on like this for about 5 hours with hardly any interruptions and got quite a bit of work done.

For a first attempt, I think this went very well, and it was a win for the minimal command-line approach to work that I strongly favour. Instead of both people using one person’s account, one could create a dedicated pair programming account on the server, with access to all the relevant source repositories, that multiple users can log in to. This would help if one or both of the people pair programming do not have superuser privileges.

Making Presentations Remotely

At Jukedeck, we have what we call Lunch & Learn (L & L) sessions, where a member of the team (or someone the company invites) gives a presentation on a topic that might be useful or interesting to others. I volunteered to do my first L & L session, on “Machine Learning at Jukedeck”, on Nov 1, 2016, in which I planned to go over the basics of machine learning and how we employ it to power our AI music composer. The setup was simple. We started a Skype call on my colleague Eliza’s laptop, and I emailed her a copy of my presentation so that she could navigate through it while I spoke from the other end. It went smoothly without any interruptions, and the message seemed to come across quite well. I answered a few questions too, but couldn’t follow a few others due to a poor signal.

I found this a nice way to stay in touch with everyone else in the company (apart from my team members, with whom I was liaising every day about work) and make my presence felt. I was keen to do another L & L remotely; however, there was not enough time for this before my return to London.

Internet Issues

The only thing I wish had worked out better was my internet connection. Although we had a working 10 Mbps connection from BSNL (India’s national ISP), it was far from reliable. On many days there were brief, frequent outages throughout the day, which was frustrating when loading webpages, pulling code changes from GitHub or working remotely via SSH on our company server. My only consolation was the patient and polite customer service, and the courteous technicians they sent to fix the connection. Fortunately, the worst of my connection woes lasted only for the first two weeks, after which things got better.

To add to my troubles, the public IP address of my home router, from which I connected to our server in London, changed daily. Since we had IP-based access restrictions in place, I had to share my new IP every morning with Marco, who would then allow connections from it. We did this for about two weeks, until we decided to simply unblock the range of IPs from which I seemed to be connecting. In contrast, in the UK one’s public IP (say, at home) does not seem to change over time, which is what motivated the IP-based access restriction and made it practical in the first place. Now that I’m back, all those IPs are blocked again and things are more secure.

Change in Working Hours

In the second month of my remote work, British Summer Time ended and the clocks in the UK went back, putting me one extra hour ahead of my team in London. Kevin let me decide whether or not to change when I started my day. Initially I did, so that I would have the same number of overlapping hours with my team. After about two weeks, I found that this was not working out, mainly because I was almost completely losing the most productive part of my day: the morning hours before lunch. Plus, my day typically ended between 8 and 9PM, which nearly ruled out making plans for the evening.

I decided to break away from the daily 11-7 routine altogether, and started even earlier in the morning on days when I didn’t anticipate much interaction with my team members. By this time, both Kevin and I could see things working well and were confident that moving things around a bit was a minor risk to take if there was a chance of me being more productive. And it certainly didn’t make things worse!

Change of Location

About a month and a half after I first started working from my parents’ guest room, my cozy little home office in the study stopped feeling so cozy. It felt isolating, and I just didn’t look forward to going in there every weekday morning. Clearly, I needed a change of environment. I’d been reading some books about remote work around that time (more on these below), and they suggested trying out cafes or coworking spaces, where the bustle of activity might alleviate the feeling of isolation and lead to a healthier state of mind.

For a start, I moved to the dining room. This helped, as it was a bigger space and I saw people more often than I did in the study. A friend of mine also put me in touch with one of his friends (a senior of mine from IIIT-Hyderabad), who told me about one of his batchmates. That batchmate had gone on to found his own Data Science startup, Predera, which had an office in Hyderabad, and he was more than happy to let me work from there if I wanted to. As the office was at least a 30-minute drive from where I was, I kept postponing my visit and ultimately never went, but it was certainly very generous of him to keep the offer open!

The Ups

What I found particularly nice about this setup was that there were very few distractions, so it was a joy to code and to review research literature and GitHub pull requests thoroughly. Furthermore, any work-related conversation I had with my colleagues was concise and to the point. I was in the quiet comfort of my home, and the nature of my work, mostly individual with the occasional discussion with a colleague or two, was well suited to a remote setup. Personally, I didn’t find it hard to motivate myself to stick to a work schedule, and I attribute this to four years of learning to do so during my PhD. Thanks to the lack of distractions, I more often felt that I did justice to the work I took up. One can almost see this as an exercise in self-discipline.

I was also able to skip my roughly hour-long daily commute between home and the office, which was a noticeable change. I wouldn’t really count this as a positive change, as I cycle to work every day in London and rather enjoy it. However, I can imagine that someone who drives or takes public transport to work would welcome it. In my case, I spent the hour I saved exercising and running in a nearby park, so the time went to much the same use as before.

I was also reminded on several occasions during the three months what a fantastic team I was part of at Jukedeck! All my team members were supportive of my move, patient and creative in dealing with any glitches that arose, and not once showed any sign of disapproval. Kevin was very good at assigning me tasks that were challenging and that I could work on independently, with some discussion with others in the team. This really minimised delays, and spared me the anxiety about not being able to contribute that might otherwise have ensued.

The Downs

While the experience was mostly positive, there were of course some downsides that became evident after just the first couple of weeks. As much as I made the effort to share my input before and after meetings, I felt a little less in control of the direction the meetings took, as this usually involved debate and persuasion, which are much easier in person. This effect would be far less pronounced in a fully remote company, where our processes would work the same way for everyone. However, that wasn’t the case here, and while many things did work out, good communication was the biggest challenge among those that didn’t.

Being physically absent from the office did feel isolating on a few occasions. I made up for this with the occasional friendly banter with colleagues over Slack and by responding to their non-work-related posts, which helped me feel like I was still part of what was happening in the office. It was also important to get out of the house a couple of times a week just for a change of environment; otherwise I felt locked in. I missed all the team outings and lunches, which are a very good opportunity to bond with teammates on a personal level.

I did see myself falling behind on some new developments in the office, particularly those that came up in meetings I was unable to attend because of the time difference, or because of poor communication between me and the team in London during the meeting. At least in my case, Skype (and Slack video chat) did not work as well as I had hoped: I would put the success rate at around 40%.

Again, none of these downsides were insurmountable, but I thought it only fair to mention them along with the things that did work out. I won’t speculate about how things would have turned out otherwise, but I sure was happy to be back in person at the Jukedeck office in London after three months.

If You’re Interested in Remote Work Too…

There is an excellent Hacker News post that answers several questions related to remote work. It also contains some very handy links to websites that facilitate remote work and create opportunities for those seeking to make a career out of working remotely. This is where I came across two very well-written books on working remotely. The first is “Remote: Office Not Required” by the founders of 37Signals (now Basecamp), a company that seems to have mastered the art of effective remote work. The second is “The Ultimate Guide to Remote Work” by the folks at Zapier, which you can even get as a free PDF/MOBI copy from their website. While there is some (I should note, reasonable) self-marketing in both books, they are definitely well worth reading for anyone wanting insight into the pros and cons of remote work. Essentially, all the resources gathered in this Ask HN page suggest that, thanks to technology, remote work (at least in the tech sector) is becoming more and more feasible for those seeking a change from 9-5 office work. It certainly gave me something to relate to, tips to follow and a feeling of being part of a larger (though not, in absolute terms, large) movement.

In Retrospect

As much as I had my apprehensions (as I often do about many things), I think this was a fantastic experience overall. I got to spend time with my parents after nearly two years away in the UK busy with my PhD, meet old friends, and get married to my lovely wife Nina, whom I must thank for insisting on moving to India for three months. Last but not least, I got to be part of a work arrangement that was new and unique in my experience. It got me interested in, and researching, making a career out of working remotely, which is something I believe I’m likely to follow up on at some point later in my career.

Jukedeck @ The Science Museum Lates

I had the opportunity to join my colleagues at Jukedeck (Patrick, Lydia, Eliza, Matt, Katerina and Gabriele) at the Science Museum Lates last night. For those of you who are unfamiliar with the concept, Lates are adults-only, after-hours theme nights held at the Science Museum in London on the last Wednesday of every month. They are attended by various organisations that would like to showcase their work relating to a chosen theme, and by an audience that is keen to learn more about the science and technology underlying the theme. On the last day of August 2016, it was Jukedeck’s turn to show off its awesome technology at the museum, and some of us volunteered to tag along.

Lydia and me (in the background) explaining what Jukedeck and its technology is about to curious visitors at our stall.

The museum was packed with visitors, and it was great to see so many people interested in our technology! I hardly had time to grab some dinner amid the constant stream of people wanting to listen to our music and learn more about the underlying algorithms. As someone who does the research and writes the code that generates our music, I found it incredibly rewarding to see first-hand the appreciation people had for our work. In many ways, it’s like presenting a poster at a conference, but to a non-technical audience. I enjoyed it very much, and I’ll try my best not to let such opportunities pass in the future. I also look forward to attending the event myself as a spectator! If you happen to be in London while this event is on, I highly recommend attending if you’re interested in science and technology.

Music and Connectionism

The many contributions made during the past three decades to computer-assisted analysis and generation of music with the aid of Connectionist architectures can be seen to have occurred in two waves, in parallel with developments in Connectionist research itself. During the first wave, the founding principles of Connectionism were introduced (Rumelhart et al., 1986) through the idea of Parallel Distributed Processing. According to this idea, mental phenomena arise from simultaneous interactions between simple elementary processing units. This stood in contrast to the then-prevailing notion of Sequential Symbolic Processing, which explained the same phenomena in terms of sequential interactions between complex goal-specific units. The significance of this wave was largely theoretical, with only a few experimental and empirical results to support the feasibility of the theory. Following several years of reduced interest, the second wave further strengthened the claims made by its precursor through a series of successful high-impact real-world applications. This was owing both to the proposal of newer theories and to the availability of greater computational power and vast amounts of data, which enabled the efficacy of these theories to be demonstrated nearly two decades on (Bengio, 2009; LeCun et al., 2012). The innovations that came about in these two waves trickled down to several application domains (Krizhevsky et al., 2012; Hinton et al., 2012; Collobert et al., 2011), of which music is one (Todd and Loy, 1991; Griffith and Todd, 1999; Humphrey et al., 2012). This section reviews notable contributions, among the many that applied connectionism to symbolic music modelling during these two waves, in order to present a historical perspective together with an overview of the techniques employed.

The First Wave

The first notable approaches applying Connectionism to the analysis and generation of symbolic music were proposed in the years following the publication of the influential text on Parallel Distributed Processing (Rumelhart et al., 1986). While the breadth of contributions to the field during this period is vast, I present a brief historical perspective only on work involving Feedforward Neural Networks, Recurrent Neural Networks and Boltzmann Machines. I refer the reader to (Rumelhart et al., 1986; Medler, 1998) for more in-depth and comprehensive reviews. Many of the inventions and algorithms proposed during this period persisted through the decades that followed and significantly impacted research in Artificial Intelligence and the now thriving field of Machine Learning. These years saw the maturation of the previously proposed Perceptron (Rosenblatt, 1958) into the Multi-Layer Perceptron (also known as the Feedforward Neural Network). They also saw the invention of the Backpropagation algorithm (Rumelhart et al., 1988), which offered a simple and efficient means of training this model on data and thus led to a surge in its popularity. The architecture of the Feedforward Neural Network (FNN) was further adapted to deal with sequential data in the form of the Recurrent Neural Network (RNN) (Elman, 1990; Jordan, 1986). Likewise, the Backpropagation algorithm was extended into Backpropagation Through Time (BPTT) (Werbos, 1990) to train this new architecture. Other algorithms were also proposed around the same time to carry out real-time learning in the RNN architecture (Williams and Zipser, 1989).
Another significant innovation from this period is the Boltzmann Machine family of models (Smolensky, 1986; Hinton et al., 1984). These are undirected graphical models that learn joint probability distributions over sets of visible and latent variables by associating an energy with each configuration of these variables, with lower-energy configurations being more probable. Learning adjusts the model so that configurations resembling the training data receive low energy. Probabilistic inference can then be carried out in these models to determine conditional distributions, which are typically of interest in various prediction tasks.
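To make the energy-based view concrete, here is a minimal sketch of a fully connected Boltzmann machine over binary units. It assumes the standard formulation with symmetric weights W, no self-connections and biases b, where E(s) = -sum_{i<j} W_ij s_i s_j - sum_i b_i s_i and P(s) is proportional to exp(-E(s)). The sketch shows the energy, the conditional probability of one unit given all the others, and a Gibbs sweep in which some units are clamped. Learning is not shown, and the function names and toy weights are my own:

```python
import math
import random


def energy(state, weights, biases):
    """E(s) = -sum_{i<j} W_ij s_i s_j - sum_i b_i s_i for binary units s."""
    n = len(state)
    pairwise = sum(
        weights[i][j] * state[i] * state[j]
        for i in range(n) for j in range(i + 1, n)
    )
    unary = sum(b * s for b, s in zip(biases, state))
    return -pairwise - unary


def prob_unit_on(i, state, weights, biases):
    """p(s_i = 1 | all other units): a logistic function of unit i's net input."""
    net = biases[i] + sum(weights[i][j] * state[j] for j in range(len(state)) if j != i)
    return 1.0 / (1.0 + math.exp(-net))


def gibbs_sweep(state, weights, biases, clamped=(), rng=random):
    """Resample every unclamped unit once; clamped units keep their given values."""
    state = list(state)
    for i in range(len(state)):
        if i not in clamped:
            state[i] = 1 if rng.random() < prob_unit_on(i, state, weights, biases) else 0
    return state


# Hypothetical toy model: units 0 and 1 "like" being on together.
W = [[0.0, 2.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [-1.0, -1.0, -1.0]
rng = random.Random(0)
sample = gibbs_sweep([1, 0, 0], W, b, clamped={0}, rng=rng)  # unit 0 fixed to 1
```

Clamping the units whose values are known (say, a given melody) and repeatedly sweeping over the rest draws samples from their conditional distribution. This is the kind of inference described above.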

Contributions to Connectionist theory and Artificial Intelligence, such as the above, generated interest in their adoption into several application domains that foresaw their potential benefits. This included the computer-assisted analysis and synthesis of music. One of the first systems for this purpose, known as HARMONET (Hild et al., 1992), was designed for harmonising chorales in the style of J S Bach. It consists of a symbolic (rule-based) component together with a recurrent neural network, and generates four part harmonisations of a given chorale melody. The role of the neural network is to generate human-like harmonisations within the rules dictated by music theory, which when taken literally tend to result in “aesthetically offensive musical output”. HARMONET divides the harmonisation task into three subtasks. In the first, a harmonic skeleton of the chorale melody is generated for every quarter note of the given melody (which essentially involves determining the bass voice of the chorale) using a recurrent neural network. The network takes as inputs harmonies generated at previous time-steps, and also the local context and global position (with respect to the beginning of the melody) of the note at the current time-step to generate a harmony for it. A novel representation for the pitch of each musical note was introduced at this stage which encodes the harmonic functions that contain the note, thus introducing hand-crafted musicological information as input to the network. This is followed by the generation of the alto and tenor voices taking into account the given soprano voice in the melody, and the bass voice generated in the previous step. Finally, ornamenting eighth notes are added to the result at each chord by another network which takes into account the local harmonic context. The system was evaluated by an audience of music professionals who judged the quality of the harmonisations. 
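As a rough illustration of encoding a pitch by the harmonic functions that contain it, here is a sketch in Python. The function set is my own simplification and not the exact encoding used by Hild et al. (1992). It uses the three primary triads of C major and their relative minors, labelled in German function-theory notation:

```python
# Pitch classes: C=0, C#=1, ..., B=11.
# Hypothetical, simplified set of harmonic functions in C major;
# HARMONET's actual function set and encoding differ.
FUNCTIONS = {
    "T": {0, 4, 7},    # tonic: C E G
    "S": {5, 9, 0},    # subdominant: F A C
    "D": {7, 11, 2},   # dominant: G B D
    "Tp": {9, 0, 4},   # tonic parallel (relative minor): A C E
    "Sp": {2, 5, 9},   # subdominant parallel: D F A
    "Dp": {4, 7, 11},  # dominant parallel: E G B
}
FUNCTION_NAMES = list(FUNCTIONS)


def encode_pitch(pitch):
    """Binary vector with a 1 for every harmonic function whose chord contains the pitch."""
    pitch_class = pitch % 12
    return [1 if pitch_class in FUNCTIONS[name] else 0 for name in FUNCTION_NAMES]


print(encode_pitch(0))  # C -> [1, 1, 0, 1, 0, 0]
print(encode_pitch(4))  # E -> [1, 0, 0, 1, 0, 1]
```

Unlike a one-hot code, pitches that share harmonic functions, such as C and E, end up with overlapping codes. This is the kind of musicological prior that such a representation feeds to the network.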
By treating each of the possible harmonisations produced by HARMONET’s first (harmonic skeleton) network as a class and using softmax output units (Specht, 1990), the system can also be used to predict harmonic expectation over time.
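A softmax layer turns a network’s raw output activations into a probability distribution over harmony classes. That distribution can be read as the expectation of each harmony at a given time-step. Below is a minimal sketch with hypothetical class labels and activation values. The surprisal (negative log probability) is an extra I have added to show how "unexpected" a harmony is:

```python
import math


def softmax(activations):
    """Map raw output activations to a probability distribution."""
    top = max(activations)                       # subtract the max for numerical stability
    exps = [math.exp(a - top) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output activations, one per candidate harmony class at one time-step.
harmonies = ["T", "S", "D", "Tp"]
scores = [2.0, 0.5, 1.0, -1.0]
expectation = dict(zip(harmonies, softmax(scores)))

# Surprisal in bits: low for expected harmonies, high for unexpected ones.
surprisal = {h: -math.log2(p) for h, p in expectation.items()}
```

A harmony that actually occurs but received a low probability (high surprisal) marks a point where the music departs from what the network has learned to expect.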

The work initiated in the context of HARMONET was later extended to create MELONET (Feulner and Hörnel, 1994). It is a system comprising a hierarchy of neural networks operating at different time-scales, which models melodies as sequences of harmony-based motifs and varies one of the chorale voices generated by HARMONET. It uses what are known as delayed-update neurons in a recurrent network which, by integrating their inputs over a certain time-span, reflect long-term information about the melody input. It works hand-in-hand with HARMONET to generate these variations. In a subsequent publication, a committee of such neural networks, each of which had learned a specific harmonisation style, was used to recognise different styles of harmonisation. The recognition was based on how expected a given harmonisation is under each network (Hörnel and Menzel, 1998).
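The idea of a unit that integrates its input over a time-span can be sketched as follows. This is a hypothetical, simplified unit and not the exact formulation in Feulner and Hörnel (1994). It averages its input over a window of `period` time-steps and only updates its output at the end of each window, so its output changes on a slower time-scale than its input:

```python
class DelayedUpdateUnit:
    """Hypothetical sketch: integrates its input at every time-step but only
    updates its visible output once every `period` steps (e.g. once per motif)."""

    def __init__(self, period):
        self.period = period
        self.accumulator = 0.0   # running sum of inputs in the current window
        self.output = 0.0        # value seen by the rest of the network
        self.steps = 0

    def step(self, x):
        self.accumulator += x
        self.steps += 1
        if self.steps % self.period == 0:
            # Expose the mean input over the window, then start a new window.
            self.output = self.accumulator / self.period
            self.accumulator = 0.0
        return self.output


unit = DelayedUpdateUnit(period=4)
outputs = [unit.step(x) for x in [1, 1, 1, 1, 0, 0, 0, 0]]
# outputs == [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
```

Stacking units with different periods is one simple way to get a hierarchy operating at different time-scales, in the spirit of the description above.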

Chorale harmonisation is also the focus of (Bellgard and Tsang, 1994) where, in contrast to HARMONET, the approach relies solely on a connectionist model: the Boltzmann Machine (BM) (Hinton et al., 1984; Smolensky, 1986). Four-part writing is regarded here as the result of a unique set of choices made by the composer between various competing harmonisation techniques. As these choices are not clearly defined, the process is essentially imprecise and noisy. This is where the stochastic nature of the model is highlighted as an advantage over the deterministic models of previous work, i.e., HARMONET and CHORAL (Ebcioğlu, 1988). The task is viewed as one of pattern completion (or gap-filling), where a given chorale melody is only a partial specification of a complete piece of information, namely the harmonised chorale. A BM learns local harmonic constraints from a series of overlapping time-windows extracted at each time-step in the chorale. Harmonisation is achieved in the same fashion, but with the learned model slid (in time) along a given chorale melody. Its visible units consist of a mixture of multinomial and binomial units, which represent three octaves of musical pitch, musical rest and phrase-control variables. The energy function associated with a Boltzmann machine, used to assess the quality of learning in the model, is also used to assess the quality of the harmonies it generates. Sliding the BM in time along a temporal input gives what the authors refer to as the Effective Boltzmann Machine (EBM).
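The sliding-window idea behind the EBM, and its use of energy to judge a harmonisation, can be sketched in Python. This relies on three simplifying assumptions of my own:

- a hand-written toy energy function that penalises large melodic leaps stands in for a trained Boltzmann machine;
- gap-filling tries every candidate exhaustively rather than sampling from the model;
- the function names are hypothetical.

```python
def sliding_windows(sequence, width):
    """All overlapping windows of `width` consecutive time-steps (stride 1)."""
    return [sequence[t:t + width] for t in range(len(sequence) - width + 1)]


def total_energy(sequence, width, window_energy):
    """Score a whole piece by summing a window-level energy over every window.
    Lower is better, mirroring the use of energy as a measure of quality."""
    return sum(window_energy(w) for w in sliding_windows(sequence, width))


def fill_gap(sequence, candidates, width, window_energy):
    """Pattern completion: try each candidate for the single None in `sequence`
    and keep the one that gives the lowest total energy."""
    gap = sequence.index(None)

    def completed(c):
        return sequence[:gap] + [c] + sequence[gap + 1:]

    return min(candidates, key=lambda c: total_energy(completed(c), width, window_energy))


def leap_energy(window):
    """Toy stand-in for a learned energy: sum of absolute pitch intervals (MIDI numbers)."""
    return sum(abs(a - b) for a, b in zip(window, window[1:]))


print(total_energy([60, 62, 64, 65], 3, leap_energy))               # 7
print(fill_gap([60, None, 64, 65], [55, 62, 70], 3, leap_energy))   # 62
```

A real BM completes the pattern by clamping the known units and sampling the unknown ones. The exhaustive search here simply makes the "lowest energy wins" logic easy to follow.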

As music is inherently temporal, recurrent neural networks (RNNs) are a natural choice for modelling musical structure. In one of the first applications of neural networks to music (Todd, 1989), a special case of the RNN known as the Jordan network (Jordan, 1986) was made to memorize and interpolate between melodies of different styles. The network consists of an input layer, a hidden layer and an output layer. Part of the input layer consists of a set of plan units that indicate the style of the melody being learned or produced. The rest is a set of context units which maintain a memory of the sequence produced so far by combining the most recently predicted output in the sequence (which is fed back as input) with an exponentially decreasing sum of all of the network’s previous inputs. The input layer is fully connected to the hidden layer, which is in turn fully connected to the output layer. The network models sequences of pitches and durations. It uses a fixed-size time-window of notes in its input context units and, for the next time-step, predicts the same number of notes as there are in the input time-window. These predictions are fed back to its input layer. All melodies are transposed into the key of C, and a binary one-hot representation (a vector containing all 0s and a single 1 corresponding to a particular value) is used for pitch. A time-slice representation is used for duration, where the length of a note is given by the number of consecutive, evenly spaced time-slices (of eighth-note duration) it occupies, with additional information about its onset. The purpose of the network is to memorize the melodies it has been exposed to. Each melody is associated with a plan, so that the network can interpolate between melodies when plans are interpolated and change melodies dynamically when plans are changed.
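The pitch, duration and context representations described above can be sketched as follows. This is a minimal illustration that assumes a one-octave pitch range and a context decay of 0.5, both hypothetical choices rather than Todd's exact parameters. Plan units, rests and the network itself are omitted.

```python
# Hypothetical pitch range: one octave of MIDI pitches starting at middle C (C4 = 60).
PITCH_RANGE = list(range(60, 72))


def one_hot(pitch, pitch_range=PITCH_RANGE):
    """Binary vector of all 0s with a single 1 at the pitch's position."""
    vec = [0] * len(pitch_range)
    vec[pitch_range.index(pitch)] = 1
    return vec


def to_time_slices(melody):
    """Expand (pitch, duration in eighth notes) pairs into eighth-note slices.

    Each slice is (one-hot pitch vector, onset flag). The onset flag is 1 only
    on the first slice of a note, so repeated notes remain distinguishable.
    """
    slices = []
    for pitch, n_eighths in melody:
        for k in range(n_eighths):
            slices.append((one_hot(pitch), 1 if k == 0 else 0))
    return slices


def update_context(context, output, decay=0.5):
    """Jordan-style context units: the latest output plus a decayed copy of the
    previous context, i.e. an exponentially decreasing sum of past outputs."""
    return [decay * c + o for c, o in zip(context, output)]


melody = [(60, 2), (64, 1)]  # a C quarter note followed by an E eighth note
slices = to_time_slices(melody)

context = [0.0] * len(PITCH_RANGE)
for pitch_vec, _onset in slices:
    context = update_context(context, pitch_vec)

print(context[0], context[4])  # 0.75 1.0
```

After the three slices, the older C contributes a decayed 0.75 and the most recent E contributes 1.0. This is how the context units keep a fading trace of what came before.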

An often cited work in connectionist music composition is that of Mozer (1991), where an RNN named CONCERT is employed to learn structure in melodies in order to generate novel variations on them. In contrast to the approach of Todd (1989) described above, this network uses an Elman RNN (Elman, 1990). It also includes a learning stage (absent in the former) in which the backpropagation through time (BPTT) algorithm (Werbos, 1990) is applied to tune the weights of the network to the prediction task. The task is to predict the next note given the previous one and the state of the network’s hidden layer at the most recent time-step, which accounts for notes further back in time that are not dealt with explicitly. The shortcomings of the network’s architecture in dealing with long-term memory and the global structure of a musical piece are addressed in two ways: by taking into account the notes in the melody at multiple time-resolutions, and by employing an additional parameter that controls its sensitivity to recent versus less recent notes in a melody. As the focus of the network is the generation of aesthetically pleasing melodies, the task-unaware one-hot representation of notes is abandoned (or retained only for the sake of interpreting results). It is replaced by a perceptually motivated representation based on earlier empirical observations by Shepard (1982). The model was evaluated by having it extend a C major diatonic scale, learn the structure of diatonic scales, learn random-walk sequences of pitches, learn specific kinds of phrase patterns, and generate new melodies in the style of J. S. Bach.
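The idea behind a perceptually motivated pitch representation can be illustrated with a simplified, hypothetical helix-style encoding. It combines pitch height with coordinates on the chroma circle and on the circle of fifths, so that octave-related and fifth-related pitches end up close together. This sketch follows the spirit of Shepard (1982) under an assumed scaling and is not Mozer's exact representation.

```python
import math


def helix_encoding(midi_pitch, height_scale=1 / 12):
    """Hypothetical Shepard-inspired pitch encoding (not Mozer's exact scheme).

    Returns [height, chroma_x, chroma_y, fifths_x, fifths_y].
    """
    pc = midi_pitch % 12                                 # pitch class, C = 0
    chroma_angle = 2 * math.pi * pc / 12                 # position on the chroma circle
    fifths_angle = 2 * math.pi * ((7 * pc) % 12) / 12    # position on the circle of fifths
    return [
        midi_pitch * height_scale,                       # pitch height (one unit per octave)
        math.cos(chroma_angle), math.sin(chroma_angle),
        math.cos(fifths_angle), math.sin(fifths_angle),
    ]


def fifths_distance(p, q):
    """Euclidean distance between two pitches on the circle-of-fifths component only."""
    return math.dist(helix_encoding(p)[3:], helix_encoding(q)[3:])


# C4 vs G4 (a fifth apart) and C4 vs C#4 (a semitone apart).
print(fifths_distance(60, 67), fifths_distance(60, 61))
```

In a one-hot code every pair of distinct pitches is equally far apart. Here C4 and C5 differ only in height, and C and G sit next to each other on the circle-of-fifths component even though they are seven semitones apart.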

A different approach, inspired by the target-note technique in bebop jazz, is explored by Toiviainen (1995), wherein, given a typical jazz chord progression, an auto-associator network emulates the creativity of an improviser. To determine the possible melodic patterns, the melodies generated by the model rely on the starting notes at any given point in time together with the current chord. To determine the possible target notes to follow, they rely on the next chord in the progression. The design of the network’s architecture was influenced by several constraints that reflect typical practices in jazz improvisation, such as the relationship between the pitch of a melody note and the root of the current chord, the chord types typically occurring in jazz progressions, and the typical syncopation of improvised melodies. The network relies on the Hebbian learning rule to update its connections while learning from data. A moving time-window approach was adopted for representing time, where each window corresponded to one half-measure. Thus, in each step of the generative process, the network generated a melody one half-measure long, which was fed back into it in order to generate the next one, and so on. This approach is motivated by the fact that such a network learns to generate music from examples in a dataset, much like a typical jazz musician who improvises based on the repertoire she or he has paid attention to over time. The author assessed the network qualitatively after it had learned from excerpts of solos played by the trumpet player Clifford Brown over the chord changes of George Gershwin’s “I’ve Got Rhythm”. Based on this assessment, he concludes that “the melodies produced by the network resemble those of a beginning improviser”.
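The Hebbian rule mentioned above strengthens the connection between two units in proportion to their joint activity. The following is a minimal sketch of a Hebbian auto-associator that shows how a stored pattern can be completed from a corrupted cue. It assumes bipolar (+1/-1) unit activations, a learning rate of 1 and a hypothetical toy pattern. Toiviainen's network, its inputs and its chord constraints are not reproduced.

```python
def hebbian_train(patterns, lr=1.0):
    """Auto-associative Hebbian learning: w_ij += lr * x_i * x_j (no self-connections)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += lr * x[i] * x[j]
    return weights


def recall(weights, cue):
    """One synchronous update: each unit takes the sign of its net input.

    Units with zero net input keep their value from the cue.
    """
    out = []
    for i in range(len(cue)):
        net = sum(weights[i][j] * cue[j] for j in range(len(cue)))
        out.append(1 if net > 0 else -1 if net < 0 else cue[i])
    return out


stored = [1, -1, 1, -1, 1, -1]       # hypothetical bipolar pattern
W = hebbian_train([stored])
noisy_cue = [1, -1, 1, -1, 1, 1]     # last unit flipped
print(recall(W, noisy_cue))          # [1, -1, 1, -1, 1, -1]
```

Learning uses only locally available activity, with no error signal. The network nevertheless completes the pattern because each unit is "voted" back into agreement by the units it co-occurred with during training.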

The above list of connectionist systems for the analysis and synthesis of symbolic music consists of notable contributions among those that laid the foundations for future work on the subject. It is by no means exhaustive, and several other works explore further musical phenomena with connectionist architectures that are beyond the scope of this review. I point the inquisitive reader to Todd and Loy (1991) and Griffith and Todd (1999) for a comprehensive summary of work carried out in the field during what I refer to here as the first wave of connectionism.

The Second Wave

The second wave of interest in neural networks and connectionism can be said to have come about towards the end of what is generally known as the AI Winter (Hendler, 2008). At the time of writing, it has prevailed for nearly a decade, with hardly any signs of subsiding. Its success has been attributed to the culmination of three key factors: theoretical and empirical advances in connectionist research, very powerful hardware in modern computers, and the availability of vast amounts of data. This wave brought with it several innovations in connectionist architectures and algorithms, and it also fuelled a revival in the study and application of older ones introduced during the first wave. The concept of a deep neural network (a feedforward neural network with more than one hidden layer) was theoretically known but often practically infeasible. It became a reality during this period with the introduction of new methods for pre-training these networks layer by layer in an unsupervised fashion before training them on a certain task in a supervised manner (Bengio et al., 2007; Hinton et al., 2012). The Restricted Boltzmann Machine (RBM), a generative unsupervised model, was in part responsible for this turnaround. It was extended in many different ways: to serve as a supervised learning model and a classifier (Salakhutdinov et al., 2007; Larochelle and Bengio, 2008), to serve as a sequence learning model (Sutskever and Hinton, 2007; Sutskever et al., 2009; Taylor et al., 2007), and to handle different types of data (Welling et al., 2004). The RBM, in turn, soared in popularity thanks to the Contrastive Divergence algorithm (Hinton, 2002; Tieleman, 2008), which made it possible to train the model more efficiently than before. Likewise, the limitation of recurrent neural networks in modelling very long-term memory was also addressed to increase their effectiveness as sequence models (Martens and Sutskever, 2011).

The Long Short-Term Memory (LSTM) network (Hochreiter and Schmidhuber, 1997), a previously proposed architecture addressing the same issue of long-term memory, was also revisited. It is now even more widely used as a sequence model, and other models inspired by it have been proposed (Chung et al., 2014). Another architecture that underwent a breakthrough is the Convolutional Neural Network, which is now the de facto standard for object recognition and related image recognition and classification tasks (Krizhevsky et al., 2012). All these advances had a significant impact on three application areas, namely Natural Language Processing, Speech Processing and Computer Vision (Lecun et al., 2015). These are the very tasks on which the past failure of Artificial Intelligence to perform well was an important reason for the drop in interest in the field, i.e., the AI Winter.

This revival of interest in connectionist research inspired a body of work that deals with a diverse set of musical tasks using symbolic music. One such application was in modelling melodies by capturing the short melodic motifs in them using a Time Convolutional RBM (TC-RBM) (Spiliopoulou and Storkey, 2011). In contrast to other RBM-based sequence models (Sutskever et al., 2009; Taylor and Hinton, 2009), the TC-RBM does not make use of any recurrent connections. It instead relies on the idea of convolution through time over fixed-length subsequences within a window centred at each time-step (Lee et al., 2009). Furthermore, a weight-sharing mechanism in this model helps it achieve translation invariance along time, which is desirable as motifs can occur anywhere in a musical piece. The approach models both the pitch and the duration of notes, and uses an implicit representation of time by discretising it into eighth-note intervals. A two-fold evaluation of this model was carried out on the Nottingham Folk Music Database. A qualitative evaluation involved the analysis of the latent distributed representations learned by the TC-RBM when presented with musical data in its visible layer, which were found to convey information about scale, octave and chords. In a quantitative evaluation, the model was made to predict the next k time-steps given a fixed-length context. The prediction log-likelihood was computed approximately by sampling from the model, and the Kullback-Leibler divergence was used to determine the closeness of the model’s predictions to the empirical distribution.
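The Kullback-Leibler divergence used in this evaluation measures how far a predicted distribution q is from an empirical distribution p: KL(p || q) = Σ_x p(x) log(p(x)/q(x)). The following is a minimal sketch over a toy pitch vocabulary. The direction of the divergence, the small epsilon that guards against zero probabilities, and the example data are assumptions, not details taken from Spiliopoulou and Storkey (2011).

```python
import math
from collections import Counter


def empirical_distribution(events, vocab):
    """Relative frequency of each symbol in `vocab` among the observed events."""
    counts = Counter(events)
    return {v: counts[v] / len(events) for v in vocab}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; terms where p(x) = 0 contribute nothing."""
    return sum(px * math.log(px / max(q[x], eps)) for x, px in p.items() if px > 0)


vocab = ["C", "E", "G"]
observed_next = ["C", "E", "C", "G"]                   # hypothetical continuations
p = empirical_distribution(observed_next, vocab)       # {'C': 0.5, 'E': 0.25, 'G': 0.25}
q_good = {"C": 0.5, "E": 0.25, "G": 0.25}              # model that matches the data
q_poor = {"C": 0.1, "E": 0.45, "G": 0.45}              # model that does not
print(kl_divergence(p, q_good), kl_divergence(p, q_poor))
```

The divergence is zero when the model's predictions match the empirical frequencies exactly. It grows as the model puts too little probability on events that actually occur.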

Impro-Visor is a previously proposed approach to generating jazz solos based on a probabilistic grammar (Keller and Morrison, 2007). As a continuation of that work, a Deep Belief Network (DBN) (Hinton et al., 2006), a probabilistic generative model made up of a stack of the aforementioned Restricted Boltzmann Machines, was experimented with for the same purpose (Bickerman et al., 2010). Modelling entire melodies or solos requires dealing with long-term dependencies that a non-recurrent model such as the DBN cannot feasibly handle, so only 4-bar jazz licks (short, coherent melodies) are modelled. As in some of the approaches outlined above, a sliding window is used to model temporal information, with a window size of one measure (4 beats) and a step size of 1 beat. The visible (input) layer of the DBN simultaneously modelled the joint distribution of the chromatic pitch class, duration, onset and octave of the melody note, together with the chord underlying the melody. This allows the model to associate chords with various melodic features, which is a key consideration in jazz. The model was trained generatively using the Contrastive Divergence algorithm (Hinton, 2002; Tieleman, 2008) on a large corpus of 4-bar jazz licks. As the DBN is a stochastic generative model, novel jazz licks could be sampled from it one beat at a time in generative mode. The authors could demonstrate that the model does indeed generate the desired licks. However, they conclude in favour of their previous grammatical approach to lick generation over the DBN, citing the subjective quality of the generated licks and the long training time of DBNs in support of this choice.
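The windowing scheme described above uses a one-measure window of 4 beats that advances 1 beat at a time. It can be sketched as follows. Each beat is represented here by a hypothetical (melody token, chord) pair. The actual DBN input encoding of pitch class, duration, onset, octave and chord is not reproduced.

```python
def sliding_windows(beats, window=4, step=1):
    """Overlapping windows of `window` beats, advancing `step` beats each time."""
    return [beats[i:i + window] for i in range(0, len(beats) - window + 1, step)]


# A hypothetical 4-bar lick in 4/4: 16 beats, each a (melody token, chord symbol) pair.
chords = ["C7"] * 4 + ["F7"] * 4 + ["C7"] * 4 + ["G7"] * 4
lick = [(f"note{i}", chord) for i, chord in enumerate(chords)]

windows = sliding_windows(lick)
print(len(windows))  # 13
```

With a step smaller than the window, most windows straddle a bar line. As a result, the model also sees melody and chord pairs across chord changes, not only within a single measure.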

The approaches described above use non-recurrent models. For sequential data, these have largely been superseded by recurrent models, which are a more natural fit for temporal data. In an attempt at style-independent polyphonic music generation, Boulanger-Lewandowski et al. (2012) use an RNN-RBM to model sequential information directly from the piano-roll notation (Orio, 2006b). This notation was chosen to avoid making any prior assumptions that would simplify the modelling task, thus leaving much for the model to determine by itself. The RNN-RBM is a stochastic model and can be understood as a sequence of RBMs which, at each time-step of the sequence, are conditioned on the hidden layer of an RNN. Thus, in addition to the RNN modelling sequential information, the RBM models correlations between variables (MIDI note values) that occur simultaneously at each time-step. Standard RNNs often ignore such simultaneous correlations. Modelling them can be viewed as an advantage of this model given sufficient data, since doing so also entails a greater number of model parameters. The model is targeted at the task of automatic music transcription and is thus required to model time in seconds. Other symbolic music modelling approaches instead represent time relative to the musical score and would therefore require an additional step of alignment between the audio and symbolic formats. Time in this model is represented as consecutive slices of the quantised musical signal. The model is trained using mini-batch gradient descent and the Backpropagation Through Time algorithm. The authors report that it outperforms other models addressing the same task. This work has also inspired very close extensions with the same goal that claim improved performance (Goel et al., 2014; Lyu et al., 2015).
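A piano-roll representation of the kind used as input here can be sketched as a binary matrix with one row per time frame and one column per pitch. Several active notes in one frame form exactly the simultaneous correlations that the RBM part of the model captures. The frame rate and the (pitch, onset, offset) note format are assumptions for illustration. The 88 columns cover the piano range, MIDI 21 (A0) to 108 (C8).

```python
def piano_roll(notes, frames_per_sec=4, low=21, high=108):
    """Binary piano roll: one row per time frame, one column per MIDI pitch.

    `notes` holds (midi_pitch, onset_sec, offset_sec) tuples. Onsets and offsets
    are quantised to the nearest frame; a note is active on frames [onset, offset).
    """
    n_frames = round(max(off for _, _, off in notes) * frames_per_sec)
    roll = [[0] * (high - low + 1) for _ in range(n_frames)]
    for pitch, onset, offset in notes:
        start = round(onset * frames_per_sec)
        stop = round(offset * frames_per_sec)
        for t in range(start, min(stop, n_frames)):
            roll[t][pitch - low] = 1
    return roll


# Hypothetical input: a C4 held for 1 s, with an E4 and a G4 entering at 0.5 s.
notes = [(60, 0.0, 1.0), (64, 0.5, 1.0), (67, 0.5, 1.0)]
roll = piano_roll(notes)

# List the MIDI pitches active in each frame.
active = [[21 + i for i, on in enumerate(row) if on] for row in roll]
print(active)  # [[60], [60], [60, 64, 67], [60, 64, 67]]
```

The representation makes no assumptions about key, metre or voice. It only records which pitches sound in each frame, which is why it leaves so much for the model to learn by itself.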

A previous approach by Eck and Schmidhuber (2002) to modelling Blues music with a Long Short-Term Memory (LSTM) RNN can be said to have influenced the one described above (Boulanger-Lewandowski et al., 2012) in its choice not to incorporate any prior musicological information that would simplify the modelling task. As mentioned earlier, the LSTM is an enhanced version of the basic RNN and has been shown to successfully model longer temporal dependencies than the latter. Here, once again, successive slices of the musical signal are treated as time-steps. A quantisation of 8 time-steps per measure was used, so the 12-bar blues segments used for training the model were each 96 time-steps long. The first experiment carried out with this model involved having it learn and generate a musical chord structure. From this, the authors conclude that the task is fairly straightforward for the model, as expected given its previous success in tasks involving counting. In the second experiment, both melody and chords are learned. This leads to the conclusion that the LSTM is indeed able to generate a blues melody that is constrained by the learned chord structure, sounds better than a random walk across the pentatonic scale, and is faithful to the examples in the training set. The evaluation in this case is left to the listener, who is encouraged to visit a webpage containing the pieces of music generated by the network.
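The arithmetic behind the segment length is 12 bars × 8 time-steps per bar = 96 time-steps. A small sketch makes the mapping from a musical position to a time-step index explicit. The (bar, eighth) addressing with 1-based bars and the assumption of 4/4 time with eighth-note steps are illustrative choices.

```python
BARS = 12
STEPS_PER_BAR = 8  # assumed eighth-note quantisation in 4/4


def step_index(bar, eighth):
    """Map a 1-based bar number and a 0-based eighth-note position to a time-step."""
    if not (1 <= bar <= BARS and 0 <= eighth < STEPS_PER_BAR):
        raise ValueError("position outside the 12-bar segment")
    return (bar - 1) * STEPS_PER_BAR + eighth


segment_length = BARS * STEPS_PER_BAR
print(segment_length)     # 96
print(step_index(12, 7))  # 95, the last time-step of the segment
```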

A more recent study with the LSTM (Franklin, 2006) carried out further experiments with this model on jazz-related tasks. Here, various note representations were studied in order to incorporate musical knowledge into the network. This contrasts with the approach adopted by Eck and Schmidhuber (2002) and Boulanger-Lewandowski et al. (2012), which avoids making any music-theoretic assumptions. Two representations were used to train the dual pitch/duration LSTMs. The first is a pitch representation based on major and minor thirds, known as the circle-of-thirds representation. The second is a duration representation known as the modular-duration representation, which extends the one proposed by Mozer (1991). Two experiments were carried out. The first focused on short musical tasks, and only sequences of musical pitch were considered. These tasks included outputting in sequence the four chord tones given a dominant seventh chord as input, determining whether or not a given sequence of notes is ordered chromatically, and reproducing a specific 32-note melody of the form AABA given only its first note as input. A single network was used for all these tasks. The second experiment focused on long musical tasks. Its objective was to learn the melody of the song Afro Blue, composed by the jazz percussionist Mongo Santamaria. Two separate networks were used to learn musical pitch sequences and note duration sequences respectively. The study concludes in favour of the LSTM and provides a detailed qualitative analysis of the results with respect to the authors’ expectations.
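The circle-of-thirds idea can be sketched as follows. The 12 pitch classes fall into four circles of major thirds (steps of 4 semitones) and three circles of minor thirds (steps of 3 semitones). A pitch class can therefore be identified by which circle of each kind it lies on, using 4 + 3 = 7 bits. This is a simplified, hypothetical encoding. Franklin's actual representation may order the bits differently and add further information such as octave.

```python
def circle_of_thirds(pitch_class):
    """7-bit encoding: one-hot over 4 major-third circles + one-hot over 3 minor-third circles."""
    major = [0] * 4
    minor = [0] * 3
    major[pitch_class % 4] = 1  # circles {0,4,8}, {1,5,9}, {2,6,10}, {3,7,11}
    minor[pitch_class % 3] = 1  # circles {0,3,6,9}, {1,4,7,10}, {2,5,8,11}
    return major + minor


def decode(bits):
    """Recover the pitch class: the unique p in 0..11 lying on both indicated circles."""
    major_idx, minor_idx = bits[:4].index(1), bits[4:].index(1)
    return next(p for p in range(12) if p % 4 == major_idx and p % 3 == minor_idx)


codes = {p: circle_of_thirds(p) for p in range(12)}
print(codes[0])  # C: [1, 0, 0, 0, 1, 0, 0]
print(codes[4])  # E: [1, 0, 0, 0, 0, 1, 0], same major-third circle as C
```

Because 3 and 4 share no common factor, the pair (p mod 4, p mod 3) is different for each of the 12 pitch classes, so the 7 bits are lossless. Unlike a one-hot code, pitches a third apart share bits, which gives the network musically meaningful similarity between chord tones.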

Lambert et al. (2015) trained a two-layered RNN on the Mazurka dataset (MAZ), an audio dataset of expressively performed piano music. The first layer of the system is a Gradient Frequency Neural Network (GFNN) (Large et al., 2010), which uses nonlinear oscillators to model metre perception of a periodic signal. The second layer contains LSTM units which model the output of the GFNN and predict rhythmic onsets as a time-series activation function. This work builds on previous experiments involving symbolic data, in which the authors find that the LSTM performs time-series modelling significantly better when GFNNs are used exclusively (Lambert et al., 2014a,b). Their GFNN-LSTM model was able to predict rhythmic onsets with an F-measure of 71.4%.
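For reference, the F-measure is the harmonic mean of precision and recall, F = 2PR/(P + R). Below is a minimal sketch for onset prediction. It assumes greedy one-to-one matching of predicted onsets to reference onsets within a hypothetical ±50 ms tolerance, which is not necessarily the evaluation protocol of Lambert et al.

```python
def onset_f_measure(predicted, reference, tolerance=0.05):
    """F-measure of predicted onset times (seconds) against reference onsets.

    Each prediction is matched to the first still-unmatched reference onset
    within +/- `tolerance` seconds; each reference onset is matched at most once.
    """
    unmatched = sorted(reference)
    true_positives = 0
    for p in sorted(predicted):
        match = next((r for r in unmatched if abs(r - p) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reference = [0.0, 0.5, 1.0, 1.5]   # hypothetical annotated onsets (s)
predicted = [0.0, 0.5, 1.02, 2.0]  # hypothetical model output (s)
print(onset_f_measure(predicted, reference))  # 0.75
```

Three of the four predictions fall within tolerance of a reference onset, so precision and recall are both 0.75. The late prediction at 2.0 s counts as a false positive, and the missed onset at 1.5 s counts as a false negative.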

As stated before, there seems to be very little work on connectionist models for information-theoretic music modelling. One such attempt is presented by Cox (2010), which explores the relationship between entropy and meaning in music, inspired by Meyer (1956, 1957). The study uses Recurrent Neural Networks that estimate instantaneous entropy for music with multiple parts to analyse a string quartet piece composed by Joseph Haydn. The model considered here contains two components: a long-term model (LTM) and a short-term model (STM) (Conklin and Witten, 1995). The parameters of each model are learned through exposure to appropriate data. The LTM models global stylistic characteristics acquired by a listener over a longer time-span. The STM models context-specific information, available in a melody while it is being processed by the listener, in the generation of expectations. Predictions made by each model are combined using ensemble methods, which have previously been shown to improve the quality of predictions over those of individual models (Conklin and Witten, 1995; Pearce, 2005). The work demonstrates that the entropies predicted by the model are sensitive to the effects of cadences, resolutions, textural change and interruptions in music.
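Instantaneous entropy here is the Shannon entropy of the model's predictive distribution over the next event, H = -Σ_x p(x) log2 p(x). The sketch below computes this entropy and combines an LTM prediction with an STM prediction. It uses one simple ensemble scheme that weights each model by the inverse of its entropy, so the more confident model counts for more. The weighting scheme and the toy distributions are assumptions. The combination rules in Conklin and Witten (1995), Pearce (2005) and Cox (2010) may differ.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def combine(ltm, stm, eps=1e-9):
    """Entropy-weighted arithmetic mean of two predictive distributions.

    Each model's weight is 1 / (its entropy + eps), so a lower-entropy
    (more confident) prediction contributes more to the result.
    """
    w_ltm = 1.0 / (entropy(ltm) + eps)
    w_stm = 1.0 / (entropy(stm) + eps)
    total = w_ltm + w_stm
    return {s: (w_ltm * ltm[s] + w_stm * stm[s]) / total for s in ltm}


# Hypothetical next-note predictions over the same four pitches.
ltm = {"C": 0.25, "E": 0.25, "G": 0.25, "B": 0.25}  # broad stylistic expectation
stm = {"C": 0.7, "E": 0.1, "G": 0.1, "B": 0.1}      # current context strongly suggests C
mixed = combine(ltm, stm)
print(entropy(ltm), entropy(stm), entropy(mixed))
```

The combined prediction leans towards the confident short-term model while keeping some of the long-term model's spread. Its entropy lies between the two, which is the quantity one would track event by event when analysing a piece.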

References

  1. Matthew I. Bellgard and C P Tsang. Harmonizing Music the Boltzmann Way. Connection Science, 6(2):281–297, 1994.
  2. Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle, et al. Greedy Layer-wise Training of Deep Networks. Advances in Neural Information Processing Systems, 19:153, 2007.
  3. Yoshua Bengio. Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127, 2009.
  4. Greg Bickerman, Sam Bosley, Peter Swire, and Robert Keller. Learning to Create Jazz Melodies using Deep Belief Nets. In International Conference On Computational Creativity, 2010.
  5. Nicolas Boulanger-Lewandowski, Yoshua Bengio, and Pascal Vincent. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In International Conference on Machine Learning, 2012.
  6. Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555, 2014.
  7. Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural Language Processing (almost) from Scratch. The Journal of Machine Learning Research, 12:2493–2537, 2011.
  8. Darrell Conklin and Ian H Witten. Multiple viewpoint systems for music prediction. Journal of New Music Research, 24(1):51–73, 1995.
  9. Greg Cox. On the Relationship Between Entropy and Meaning in Music: An Exploration with Recurrent Neural Networks. In Annual Conference of the Cognitive Science Society, pages 429–434, 2010.
  10. Kemal Ebcioğlu. An Expert System for Harmonizing Four-Part Chorales. Computer Music Journal, pages 43–51, 1988.
  11. Douglas Eck and Juergen Schmidhuber. Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks. In IEEE Workshop on Neural Networks for Signal Processing, pages 747–756. IEEE, 2002.
  12. Jeffrey L Elman. Finding Structure in Time. Cognitive science, 14(2):179–211, 1990.
  13. Johannes Feulner and Dominik Hörnel. MELONET: Neural Networks that Learn Harmony-based Melodic Variations. In Proceedings of the International Computer Music Conference, pages 121–121. International Computer Music Association, 1994.
  14. Judy A Franklin. Recurrent Neural Networks for Music Computation. INFORMS Journal on Computing, 18(3):321–338, 2006.
  15. Kratarth Goel, Raunaq Vohra, and JK Sahoo. Polyphonic Music Generation by Modeling Temporal Dependencies using a RNN-DBN. In Artificial Neural Networks and Machine Learning–ICANN 2014, pages 217–224. Springer, 2014.
  16. Niall Griffith and Peter M Todd. Musical Networks: Parallel Distributed Perception and Performance. MIT Press, 1999.
  17. James Hendler. Avoiding Another AI Winter. IEEE Intelligent Systems, 23(2):2–4, 2008.
  18. Hermann Hild, Johannes Feulner, and Wolfram Menzel. Harmonet: A Neural Net for Harmonizing Chorales in the Style of JS Bach. In Advances in Neural Information Processing Systems, pages 267–274, 1992.
  19. Geoffrey E Hinton, Terrence J Sejnowski, and David H Ackley. Boltzmann Machines: Constraint Satisfaction Networks that Learn. Carnegie-Mellon University, Department of Computer Science Pittsburgh, PA, 1984.
  20. Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural computation, 14(8):1771–1800, 2002.
  21. Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18:1527–1554, 2006.
  22. Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. Signal Processing Magazine, IEEE, 29(6):82–97, 2012.
  23. Sepp Hochreiter and Jürgen Schmidhuber. Long Short-term Memory. Neural computation, 9(8):1735–1780, 1997.
  24. Dominik Hörnel and Wolfram Menzel. Learning Musical Structure and Style with Neural Networks. Computer Music Journal (CMJ), 22(4):44–62, 1998.
  25. Eric J Humphrey, Juan Pablo Bello, and Yann LeCun. Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics. In International Society for Music Information Retrieval Conference, pages 403–408, 2012.
  26. Michael I Jordan. Serial Order: A Parallel Distributed Processing Approach. Technical report, Institute for Cognitive Science, University of California San Diego, 1986.
  27. Robert M Keller and David R Morrison. A Grammatical Approach to Automatic Improvisation. In Sound and Music Computing Conference, pages 11–13, 2007.
  28. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Neural Information Processing Systems, volume 1, page 4, 2012.
  29. Andrew Lambert, Tillman Weyde, and Newton Armstrong. Beyond the Beat: Towards Metre, Rhythm and Melody Modelling with Hybrid Oscillator Networks. In Joint 40th International Computer Music Conference and 11th Sound & Music Computing conference, Athens, Greece, 2014a.
  30. Andrew Lambert, Tillman Weyde, and Newton Armstrong. Studying the Effect of Metre Perception on Rhythm and Melody Modelling with LSTMs. In Tenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2014b.
  31. Andrew J. Lambert, Tillman Weyde, and Newton Armstrong. Perceiving and Predicting Expressive Rhythm with Recurrent Neural Networks. In 12th Sound & Music Computing conference, Maynooth, Ireland, 2015.
  32. Edward W. Large, Felix V. Almonte, and Marc J. Velasco. A Canonical Model for Gradient Frequency Neural Networks. Physica D: Nonlinear Phenomena, 239(12):905–911, June 2010.
  33. Hugo Larochelle and Yoshua Bengio. Classification using discriminative restricted Boltzmann machines. In International Conference on Machine Learning, pages 536–543. ACM Press, 2008.
  34. Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. Efficient Backprop. In Neural networks: Tricks of the trade, pages 9–48. Springer, 2012.
  35. Yann Lecun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  36. Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In International Conference on Machine Learning, pages 609–616. ACM, 2009.
  37. Qi Lyu, Zhiyong Wu, and Jun Zhu. Polyphonic Music Modelling with LSTM-RTRBM. In ACM Conference on Multimedia, pages 991–994. ACM, 2015.
  38. David A Medler. A Brief History of Connectionism. Neural Computing Surveys, 1:18–72, 1998.
  39. James Martens and Ilya Sutskever. Learning Recurrent Neural Networks with Hessian-free Optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1033–1040, 2011.
  40. Leonard B Meyer. Emotion and Meaning in Music. University of Chicago Press, 1956.
  41. Leonard B Meyer. Meaning in music and information theory. The Journal of Aesthetics and Art Criticism, 15(4):412–424, 1957.
  42. Michael C Mozer. Connectionist music composition based on melodic, stylistic and psychophysical constraints. Music and Connectionism, pages 195–211, 1991.
  43. Nicola Orio. Music Retrieval: A Tutorial and Review, volume 1. Now Publishers Inc., 2006a.
  44. Marcus Thomas Pearce. The Construction and Evaluation of Statistical Models of Melodic Structure in Music Perception and Composition. PhD thesis, City University London, 2005.
  45. Frank Rosenblatt. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological review, 65(6):386, 1958.
  46. David E. Rumelhart, James L. McClelland, and the PDP Research Group, editors. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. MIT Press, 1986.
  47. Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. Restricted Boltzmann Machines for Collaborative Filtering. In International Conference on Machine Learning, pages 791–798. ACM, 2007.
  48. Roger N Shepard. Geometrical Approximations to the Structure of Musical Pitch. Psychological Review, 89(4):305, 1982.
  49. Paul Smolensky. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. chapter Information Processing in Dynamical Systems: Foundations of Harmony Theory, pages 194–281. MIT Press, 1986.
  50. Donald F Specht. Probabilistic neural networks. Neural Networks, 3(1):109–118, 1990.
  51. Athina Spiliopoulou and Amos Storkey. Comparing probabilistic models for melodic sequences. In Machine Learning and Knowledge Discovery in Databases, pages 289–304. Springer, 2011.
  52. Ilya Sutskever and Geoffrey E Hinton. Learning Multilevel Distributed Representations for High-dimensional Sequences. In International Conference on Artificial Intelligence and Statistics, pages 548–555, 2007.
  53. Ilya Sutskever, Geoffrey E Hinton, and Graham W Taylor. The Recurrent Temporal Restricted Boltzmann Machine. In Advances in Neural Information Processing Systems, pages 1601–1608, 2009.
  54. Graham W Taylor, Geoffrey E Hinton, and Sam Roweis. Modeling Human Motion using Binary Latent Variables. In Advances in Neural Information Processing Systems, pages 1345–1352, 2007.
  55. Graham W Taylor and Geoffrey E Hinton. Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style. In International Conference on Machine Learning, pages 1025–1032. ACM, 2009.
  56. Tijmen Tieleman. Training restricted boltzmann machines using approximations to the likelihood gradient. In International Conference on Machine Learning, pages 1064–1071. ACM, 2008.
  57. Peter M Todd. A Connectionist Approach to Algorithmic Composition. Computer Music Journal, 13(4):27–43, 1989.
  58. Peter M Todd and D Gareth Loy. Music and Connectionism. MIT Press, 1991.
  59. Petri Toiviainen. Modeling the target-note technique of bebop-style jazz improvisation: An artificial neural network approach. Music Perception, pages 399–413, 1995.
  60. Max Welling, Michal Rosen-Zvi, and Geoffrey E Hinton. Exponential Family Harmoniums with an Application to Information Retrieval. In Advances in Neural Information Processing Systems, pages 1481–1488, 2004.
  61. Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.
  62. Ronald J Williams and David Zipser. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Computation, 1(2):270–280, 1989.

The above post is an excerpt from my doctoral thesis (with minor modifications so that it makes sense outside the context of the manuscript), accepted in July 2016.