Completed the Course “Machine Learning with Big Data” offered by UCSD on Coursera

I successfully completed this course with a 98.9% mark. This course was more focused than the others so far. The machine learning theory it covered was very basic and well-suited to beginners, so I skimmed through it fairly quickly. Nevertheless, it was a good refresher on models such as Naive Bayes, Decision Trees and k-Means Clustering. What I found particularly useful was the introduction to the KNIME and Spark ML frameworks and the exercises where one had to apply these ML models to some example datasets.
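For a flavour of what those exercises were like, here's a minimal Spark ML k-Means sketch of my own; the file path and column names are hypothetical placeholders, not from the course.

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("kmeans-sketch").getOrCreate()

# Load a table of numeric features (placeholder path and column names)
df = spark.read.csv("measurements.csv", header=True, inferSchema=True)

# Spark ML expects the features collected into a single vector column
assembler = VectorAssembler(inputCols=["x1", "x2", "x3"], outputCol="features")
features = assembler.transform(df)

# Fit k-Means and attach a cluster label ("prediction") to each row
model = KMeans(k=3, seed=42, featuresCol="features").fit(features)
model.transform(features).show()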

I think this course and the last one, with their greater focus on ML in the context of Big Data, were more hands-on and closer to what I was looking for when I first started this specialisation.

And here’s the certificate that I was awarded on completing the course.

Completed the Course “Big Data Integration and Processing” offered by UCSD on Coursera

I successfully completed this course with a 97.7% mark. This course was once again broad and touched upon several big data technologies through a series of lectures, assignments and hands-on exercises. The focus was mainly on querying JSON data using MongoDB, analysing data using Pandas, and programming in Spark (Spark SQL, Spark Streaming, Spark MLlib and Spark GraphX). All of these were things I was curious about, and it was great that the course introduced them. There was also an exercise on analysing tweets using both MongoDB and Spark. They had one section on something called Splunk, which I thought was a waste of time, but I guess they have to keep their sponsors happy.
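As a taste of the MongoDB part, querying JSON tweet documents might look something like this sketch using pymongo; the database, collection and field names are my hypothetical placeholders, not the course's.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
tweets = client["demo_db"]["tweets"]

# Case-insensitive keyword search, projecting only two fields
cursor = tweets.find(
    {"text": {"$regex": "big data", "$options": "i"}},
    {"text": 1, "user.screen_name": 1, "_id": 0},
).limit(5)
for doc in cursor:
    print(doc)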

This specialisation so far (I'm halfway through) has been fairly introductory and lacking in depth. It's been good to the extent that I'm now aware of all these different technologies and would know where to start if I were to use them for some specific application. As I expected, this course was more hands-on, which was great!

And here’s the certificate that I was awarded on completing the course.

New Guitar Video – Man in the Box by Alice in Chains!

This is one of the first AiC songs I heard, and it got me into the band. While I had the broken finger, I took some time off the fast licks and finger-intensive playing to work on wah coordination, and this was a great song to begin with. I used a Vox wah pedal here that I bought years ago. A bit squeaky, but it worked alright. There are still a few rough edges in this final recording, which reflects the amount of time I was willing to spend on perfecting it. Another Rush song coming up after this!

The Tensorflow Datasets API for Sequence Data (Code Examples)

This post was originally meant to be an entire tutorial (with a link to the GitHub repository) on how to use the Tensorflow Datasets API and how it contrasts with the more widely used placeholder approach for passing data into Tensorflow graphs. Unfortunately, I'm unable to set aside the time to write about it in the detail I had originally intended, so I'm sharing the code here with a few notes to help one make use of it.

First off, here is the link to the GitHub repository. It contains two main scripts – placeholder_vs_iterators.py and generator_vs_tfrecord.py. The first script implements three ways in which data can be passed into the Tensorflow graph; note that in all cases this is sequence data. The first is the standard placeholder approach that most are familiar with. The second uses iterators, and the third uses feedable iterators, to input data to the graph. The latter two are what I gathered to be the new methods for passing data into graphs that the Tensorflow Datasets API introduces. The script can be invoked with an integer command-line argument (1, 2 or 3) that chooses between the three approaches.
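To make the contrast concrete, here is a condensed toy sketch of mine (not the repository code) of the three approaches, written against the TF 1.x API the scripts target; the array shapes are arbitrary.

import numpy as np
import tensorflow as tf

data = np.random.rand(8, 5).astype(np.float32)  # 8 toy "sequences" of length 5

# 1) Placeholder: data is fed in from Python on every session.run call
x_ph = tf.placeholder(tf.float32, shape=[None, 5])

# 2) Iterator: the Dataset lives inside the graph and is iterated directly
dataset = tf.data.Dataset.from_tensor_slices(data).batch(4)
iterator = dataset.make_initializable_iterator()
x_it = iterator.get_next()

# 3) Feedable iterator: a string handle selects among iterators at run time
handle_ph = tf.placeholder(tf.string, shape=[])
feedable = tf.data.Iterator.from_string_handle(
    handle_ph, dataset.output_types, dataset.output_shapes)
x_feed = feedable.get_next()

with tf.Session() as sess:
    print(sess.run(x_ph, feed_dict={x_ph: data}))           # approach 1
    sess.run(iterator.initializer)
    print(sess.run(x_it))                                    # approach 2
    handle = sess.run(iterator.string_handle())
    sess.run(iterator.initializer)
    print(sess.run(x_feed, feed_dict={handle_ph: handle}))   # approach 3

The feedable iterator is handy when one wants to switch between, say, training and validation datasets at run time without rebuilding the graph.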

The second script is generator_vs_tfrecord.py. Having played around a bit with the two new data input approaches, I decided to stick with plain iterators while examining three different ways in which one can iterate through data and pass it into the graph during training. The first takes unbatched sequences via a generator function and applies certain standard preprocessing steps (zero-padding, batching, etc.) before using the data to train the model. The second approach begins with data that has already been zero-padded and batched, and passes that to the model via a generator function during training. The final approach first creates a Tensorflow Record file following the SequenceExample Protocol Buffer, reads sequences from this file, and zero-pads and batches them before passing them to the graph during training. The third approach is what I would consider the most dependent on the Tensorflow Datasets API, whereas the other two rely to a greater extent on Numpy. This script is also invoked with an integer command-line argument (1, 2 or 3) that chooses between the three approaches.
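As a rough toy illustration of the first of these approaches (again my own sketch, not the repository code), a generator can hand variable-length sequences to the Dataset, which then takes care of the zero-padding and batching itself:

import numpy as np
import tensorflow as tf

def gen():
    # Yield variable-length 1-D float sequences, one at a time
    for length in [3, 5, 2, 7]:
        yield np.random.rand(length).astype(np.float32)

dataset = (tf.data.Dataset
           .from_generator(gen, output_types=tf.float32,
                           output_shapes=tf.TensorShape([None]))
           .padded_batch(2, padded_shapes=tf.TensorShape([None])))

iterator = dataset.make_initializable_iterator()
batch = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    print(sess.run(batch))  # shape (2, longest_length_in_batch), zero-padded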

So there you have it! The rest of the code contains global constants for batch size, sequence length, etc. that can be changed as needed, basic training loops and a simple LSTM model to get the above examples working. When I first started using this API around the time Tensorflow 1.4 was released, I found that there were few fully working examples that used it (and that is often exactly what one is looking for when getting started), so I decided to share this code. One can realise more complex data pipelines with this handy API by building on these examples.
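To give an idea of how those pieces fit together, here's a hedged sketch of the kind of wiring involved: an iterator feeding pre-padded toy batches into a small LSTM with a basic training loop. The constants, model and data here are illustrative stand-ins, not the repository's.

import numpy as np
import tensorflow as tf

BATCH_SIZE, MAX_LEN, HIDDEN = 4, 10, 32

# Toy pre-padded data: 32 sequences of length MAX_LEN with one feature each
x = np.random.rand(32, MAX_LEN, 1).astype(np.float32)
y = np.random.rand(32, 1).astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(BATCH_SIZE)
iterator = dataset.make_initializable_iterator()
inputs, targets = iterator.get_next()

cell = tf.nn.rnn_cell.LSTMCell(HIDDEN)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
pred = tf.layers.dense(outputs[:, -1, :], 1)       # predict from last timestep
loss = tf.losses.mean_squared_error(targets, pred)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(3):
        sess.run(iterator.initializer)             # rewind the data each epoch
        while True:                                # run until the data runs out
            try:
                _, loss_val = sess.run([train_op, loss])
            except tf.errors.OutOfRangeError:
                break
        print("epoch", epoch, "last batch loss", loss_val)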

To understand more about the API and Google Protocol Buffers while going through the code, I refer readers to the following useful links; a small SequenceExample round trip of my own follows the list:

The Tensorflow Datasets API Blog Post

The Official Tensorflow Documentation

A blog post that helped me understand the SequenceExample Protocol Buffer format better.

A useful StackOverflow post on fixed and variable length features.

The definition of the Example and SequenceExample ProtoBufs
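To make the SequenceExample material above concrete, here's a toy round trip of my own (the file name and feature keys are made up for illustration): variable-length sequences are serialised as SequenceExamples into a TFRecord file, then read back, parsed, zero-padded and batched by the Dataset pipeline.

import numpy as np
import tensorflow as tf

def make_sequence_example(seq):
    # One float per timestep in the "values" feature list; the sequence
    # length goes into the context as a scalar int64 feature
    steps = [tf.train.Feature(float_list=tf.train.FloatList(value=[float(v)]))
             for v in seq]
    return tf.train.SequenceExample(
        context=tf.train.Features(feature={
            "length": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[len(seq)]))}),
        feature_lists=tf.train.FeatureLists(feature_list={
            "values": tf.train.FeatureList(feature=steps)}))

with tf.python_io.TFRecordWriter("toy.tfrecord") as writer:
    for length in [3, 5, 2, 7]:
        writer.write(make_sequence_example(
            np.random.rand(length)).SerializeToString())

def parse(record):
    _, sequence = tf.parse_single_sequence_example(
        record,
        context_features={"length": tf.FixedLenFeature([], tf.int64)},
        sequence_features={"values": tf.FixedLenSequenceFeature([], tf.float32)})
    return sequence["values"]

dataset = (tf.data.TFRecordDataset("toy.tfrecord")
           .map(parse)
           .padded_batch(2, padded_shapes=tf.TensorShape([None])))
iterator = dataset.make_initializable_iterator()
batch = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    print(sess.run(batch))  # zero-padded batches of parsed sequences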

New Guitar Video – Jacob’s Ladder by Rush!

I just finished recording and uploading the video of me playing this piece on YouTube, and I'm happy to share the link here! I've been learning to play it for a while now, and it was one of the pieces I could actually play as my broken little finger was recovering after a minor bicycle accident. It's so much fun to play, with the 11/8 (or 11/16, whatever) time sections early on and the 13/8 (or 13/16, whatever) time sections towards the end of the piece! It's fairly straightforward otherwise, but it really requires some attention because of the repetitions, which one tends to get carried away with. I love the solo Alex Lifeson composed on this one, as I do most of his very clever guitar work! And as much as I could perhaps play it better, I'll go easy on myself and settle for what I have in this video.

I've been learning another awesome song by Rush that I'll hopefully record and upload soon, so stay tuned!

Completed Andrew Ng’s “Convolutional Neural Networks” course on Coursera

I successfully completed this course with a 100.0% mark. Unlike the other two courses I had done as part of this Deep Learning specialisation, there was much for me to learn in this one. I had only skimmed a couple of papers on convolutional networks in the past and hadn't really implemented any aspects of this class of models, beyond helping colleagues fix bugs in their code. So I was stoked to do this course, and I was not disappointed. Andrew Ng designs and delivers his lectures very well, and this course was no exception. The programming assignments and quizzes were engaging and moderately challenging. The idea of 1D, 2D and 3D convolutions was explained clearly and in sufficient depth in the lectures. The course also covered some state-of-the-art convolutional architectures such as VGG Net, Inception Net and Network-in-Network, as well as applications such as Object and Face Recognition and Neural Style Transfer, to all of which convolutional networks are a cornerstone. The reading list for the course was also very useful and interesting. All in all, a great resource in my opinion for someone interested in this topic! And as usual, here's the certificate I received on completing this course.

Completed the Course “Big Data Modeling and Management Systems” offered by UCSD on Coursera

I successfully completed this course with a 100.0% mark. It was quite broad and covered a range of topics somewhat superficially, from relational databases and their relation to Big Data Management Systems, to the various alternatives that exist for processing different types of big data. As with the first course, there were a lot of new names to grasp and connections to be made between the things they represented. The assignments were straightforward and involved running a few specific command-line tools and spreadsheet commands to process data and carry out some basic analysis, just to get a feel for data tables and how one might go about extracting information from them. The final assignment involved completing a partially specified relational database design for a game. In my opinion, its goals could have been more precise, its connection to the course material clearer, and, it being a peer-graded assignment, its evaluation criteria better defined. Quite a few learners seem to have lost out because of this last shortcoming, with peers unable to evaluate their assignments properly. And as usual, here's the certificate that I was awarded on completing the course.

It looks like the upcoming courses in this specialisation contain more practical, hands-on exercises, so I'm looking forward to that in the coming weeks!

(Automated) Curriculum Learning

I've lately spent some time reading about Curriculum Learning and experimenting with the algorithms described in two of the papers in this domain:

Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009, June). Curriculum learning. In Proceedings of the 26th annual international conference on machine learning (pp. 41-48). ACM.

Graves, A., Bellemare, M. G., Menick, J., Munos, R., & Kavukcuoglu, K. (2017). Automated Curriculum Learning for Neural Networks. arXiv preprint arXiv:1704.03003.

The first of the above can be considered important because, with its empirical results supporting Curriculum Learning, it revived researchers' interest in the technique. The second is one of the recently proposed approaches for Curriculum Learning that I thought would be interesting to understand in greater depth.

I’ve summarised my thoughts on these in a short presentation. I hope to share my code and results not too long from now as well.

Completed the Course “Introduction to Big Data” offered by UCSD on Coursera

I successfully completed this course with a 98.9% mark. It was easy and covered mostly definitions, some history of big data, big data jargon and very basic principles. There was an emphasis on what constitutes big data (in terms of size, variety, complexity, etc.), what kinds of analyses one can carry out on big data, what sources it can come from, and what tools one could use to analyse it. When it came to the latter, the course offered a brief introduction to the Hadoop ecosystem, which I found particularly interesting as I had never worked with any of the software in this ecosystem. There was also a simple assignment that gave one a taste of what working with Hadoop could be like. Here's a link to the certificate I received from Coursera on completing this course.

Looking forward to the remaining courses in the Big Data specialisation!

Completed Andrew Ng’s “Improving Deep Neural Networks” course on Coursera

I successfully completed this course with a 100.0% mark. Once again, this course was easy given my experience so far in machine learning and deep learning. However, as with the previous course I completed in the same specialisation, there were a few things that made it worth attending. I particularly found the sections on Optimisation (exponential moving averages, the Momentum, RMSProp and Adam optimisers, etc.), Batch Normalisation and, to some extent, Dropout useful. Here's a link to the certificate from Coursera for this course.
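To jot down the gist of that Optimisation section, here's a quick numpy sketch of the Adam update as covered in the lectures: two exponential moving averages, one of the gradients (the Momentum part) and one of their squares (the RMSProp part), with bias correction for the early steps. The hyperparameter values are the usual defaults.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # EMA of gradients (Momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # EMA of squared gradients (RMSProp)
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v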

I’m looking forward to the course on Convolutional Neural Networks!