Completed Learn TypeScript Course on Codecademy

Back in December, I completed the C# course on Codecademy. This was mainly to broaden the scope of my contribution to the project I was involved with at Unity and to step outside the Python bubble I had limited myself to for so long. This was shortly before I went on paternity leave. On returning from my leave, however, I found that a lot had changed. Most relevant to this post, the team had decided to move away from the Unity-oriented C# implementation of the product to one in TypeScript.

I, for one, had never programmed in TypeScript before, but this was not going to deter me from giving it a shot. Having had a positive experience learning C# on Codecademy, I was pleased to see that they had a course on TypeScript as well and signed up for it immediately. What was different this time was that I wasn’t learning TypeScript solely from the course. I already had a team implementing a product in this language, and this served as a very valuable means of getting hands-on experience with it while I familiarised myself with the fundamentals and features of the language through the course. I noticed quite a few similarities to Python (a focus on scripting, nvm playing a role similar to pyenv for managing language versions and packages, and the npm / npx commands resembling the python command), and this helped me hit the ground running. What unexpectedly helped me a lot was my use of mypy to make my Python code type-safe over the past year or so, thanks to my colleague Matti’s insistence. I felt totally at home adding and manipulating the types of variables in TypeScript, which is something that would otherwise have taken me a while to get used to. Anyway, now I’m actively contributing to our new codebase thanks to the kind feedback and reviews of my colleagues, and it feels good to be getting better at a new way of expressing myself in code :-).

Over the past three years, I have (and might I add, serendipitously) ended up programming in Standard ML, Racket (both as part of an excellent series of courses on functional programming offered on Coursera), C# and now TypeScript. It’s hard to measure how much better a programmer this has made me, but it has certainly broadened my perspective on what one can do across programming languages, and on how it’s only a matter of getting used to some basic (often superficial) differences in how one reads and writes code before being able to apply what one has learnt or used in a previous language. The theoretical concepts, of course, are very similar. It’s just that some languages make it easier to do certain things than others. This was one of the things that was emphasised in the Coursera courses on Programming Languages. Over the past year I have worked with some excellent programmers at Unity, some of whom have been academically involved with Programming Languages, and I bet any wisdom I have to offer in this little post would only scratch the surface of what they might have to say on the subject. Anyway, I really look forward to seeing where things go from here for me!

Oh, and of course, I did get a shiny new certificate of completion from Codecademy!

Completed Learn C# Course on Codecademy

It’s been just over a year since I started working at Unity, and it might come as a surprise to some that I haven’t coded in C# during all this time. Why a surprise? Because Unity is written in C#, and so is much of what is created using Unity. To be fair, I was able to get on with my work using just Python, given that it was mostly research and prototyping. I did try learning to code in C# through some Unity tutorials back in December 2020, but those turned out to be more advanced than I could handle and (at least the ones I had come across) didn’t cover the absolute basics well enough, so I shelved that effort then.

Fast-forward to a year later in December 2021, and I found myself having to go on a month-long leave to be in India while my wife and I expected our first child. Other than taking care of her and catching up with my parents, whom I hadn’t visited in around two years owing to the pandemic, I had quite a bit of free time on my hands. So I decided to take up the task of learning C# once again. This time, I came across Codecademy, where I found introductory courses for different programming languages, including C#. And I’m talking real beginner stuff – how to declare variables, conditionals, loops, etc. This was great because, although I could understand these basics well enough without a course, it seemed to lead gracefully on to more advanced concepts such as classes, interfaces, inheritance, LINQ, etc.

So I got started with the course. For convenience, Codecademy provides a browser-based editor and terminal in which to write and compile one’s code. I actually found it annoying because it was quite sluggish, didn’t have auto-completion and gave me no idea of how I would set everything up to write and run C# code locally on my own machine. So I took some time to research how this is done – installing the Mono compiler and the necessary Dotnet libraries for Linux in order to compile C# code and build Dotnet projects respectively. Not just that, it took a while to then install the necessary Omnisharp libraries in order to make auto-completion work in my editor of choice – Vim! I’ll try to write a more detailed post on this sometime later, but no promises – the baby is here now ;-).

Once all that was done, I was ready to go! I got done with the course in about a month, at my own steady pace. Repetition was the key – I made sure to write every piece of code myself on my local machine, although a lot of it was the same, such as imports, base class and main function declarations and so on. That really helped with developing fluency and also with getting a sense of what is and isn’t needed in different scenarios. The course, I must say, is very well designed. The first few chapters bordered on boring for me because I was already familiar with much of what was there, having programmed in C / C++ in the long past, but once I got to the chapter on Classes and Objects, things started to get more interesting! What helped was that over the past year, I had maintained some discipline in writing typed Python code with the help of MyPy and also in using abstract classes via the ABC Python module. C# being strongly typed, the practice of using types in Python obviously helped. But working with abstract classes in Python certainly made it easier to understand interfaces and inheritance in C#! Same when it came to references, because I had read up quite a bit on mutability in Python. And finally, the section on LINQ was a lot of fun, and bore resemblance to the kind of step-by-step data-processing code I wrote about a year ago in PySpark.
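
To give a rough flavour of the habit I mean, here is a tiny sketch of the MyPy-checked, ABC-based style of Python that made C# interfaces and inheritance feel familiar. The classes and names here are made up for this post, not taken from anything I actually wrote at work:

```python
# A made-up sketch of typed Python using abstract base classes; the names
# here are purely illustrative.
from abc import ABC, abstractmethod
from typing import List


class Shape(ABC):
    """Plays a role loosely similar to a C# interface / abstract class."""

    @abstractmethod
    def area(self) -> float:
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area(shapes: List[Shape]) -> float:
    # MyPy checks that every element really provides Shape.area().
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(2.0, 3.0), Rectangle(1.0, 4.0)]))
```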

So, now that I’m done with this course, I have this shiny certificate acknowledging my effort!

Next, my plan is to head over to CodeWars and get started with a few Katas to further consolidate what I have learned so far! And perhaps think of an idea that would help me better understand how to organise my C# code into a larger project. Still a long, exciting way to go!

Getting Started with Python Pandas

I finally decided to get myself familiar with pandas while working on a recent side-project related to recommender systems. When I got started with it, I was still stubborn that I could achieve most of the data pre-processing I needed with Python modules like glob, json, numpy and scipy. True as that may be, I found myself spending way too much time writing routines to process the data itself and not getting anywhere close to working on the actual project. This was very reminiscent of the time a few years ago when I got immersed in writing code to manually compute gradients for various neural network architectures, getting nowhere in developing a music prediction model, before finally deciding to make my life easier with theano! And so, this seemed like the perfect time to get started with learning pandas.

In the past I’ve found that, especially when it comes to learning useful features of new Python modules, a hands-on, practical approach is much better than reviewing documentation and learning various features of a module without much of an application context. So I started looking around for tutorial-style introductions to pandas. In the process I came across two invaluable resources that I thought I’d highlight in this blog post. They really aren’t much, but they gave me a surprisingly thorough (and quick) start towards employing pandas in my own project.

Kaggle Learn

Kaggle Learn has a bunch of very well-organised, basic introductory Micro-courses on various Data Science topics, from Machine Learning to Data IO and Visualisation. I got started with the Pandas Micro-course, which proved to be the ideal starting point for someone like me who had never used the module previously. This can be followed up with some of the other Micro-courses, such as the ones on data visualisation or embeddings, which help one understand various concepts better through application. In fact, that’s what I’m planning to do as well!

Pandas Exercises on GitHub

So the Pandas Micro-course was a great starting point, but it left me wanting more practice on the topic, as I still didn’t feel totally fluent. It was then that I stumbled upon a fantastic compilation of Pandas exercises on GitHub by Guilherme Samora. So I cloned the repository, loaded the exercises up in Jupyter Notebook and got down to solving them one after another! This really did help me get more fluent with the rich set of tools that Pandas has to offer.
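
To give a small taste of the kind of grouping and aggregation these exercises had me practise, here is a tiny example of my own (the data is invented for this post and is not from the exercise repository):

```python
# A tiny, made-up example of typical pandas grouping / aggregation;
# the data below is invented purely for illustration.
import pandas as pd

ratings = pd.DataFrame({
    "user": ["a", "a", "b", "b", "c"],
    "item": ["x", "y", "x", "z", "y"],
    "rating": [5, 3, 4, 2, 1],
})

# Average rating per item, sorted from best to worst.
item_means = (
    ratings.groupby("item")["rating"]
    .mean()
    .sort_values(ascending=False)
)
print(item_means)

# Number of ratings each user has given.
print(ratings["user"].value_counts())
```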

By the time I was done with Guilherme’s exercises (only a couple of days after starting with the Kaggle Micro-course), I felt ready to apply my newly acquired pandas skills to my own project, and to discover more about the module through it. There certainly were plenty more resources that a quick Google search returned, but none appealed to me as much at first glance as the two I finally went with.

I’m sure I have only scratched the surface when it comes to useful pandas learning resources, and I’m very curious to hear about those that others have found useful, and why, so that I can look them up as well! So do feel free to share them in the comments below.

Tensorflow Tip: Pretrain and Retrain

I recently ran into a situation where I had to first train a neural network on one dataset, save it, and then load it up later to train it on a different dataset (or with a different training procedure). I implemented this in Tensorflow and thought I’d share a stripped-down version of the script here, as it could serve as an instructive example of the use of Tensorflow sessions. Note that this is not necessarily the best way of doing it, and it might indeed be simpler to load the original graph and continue training it directly by making its parameters trainable, or something else along those lines.

The script can be found here. In the first stage of this script (the pre-training stage) there is only a single graph, which contains the randomly initialised and trained model. One might as well avoid explicitly defining a graph, as Tensorflow’s default graph will be used for this purpose. This model (together with its parameters) is saved to a file and then loaded for the second, re-training stage. In this second stage, there are two graphs. The first graph is loaded from the saved file and contains the pre-trained model, whose parameter values we wish to assign to the second model before training the latter on a different dataset. The parameters of the second model are randomly initialised prior to this assignment step. In order for the assignment to work across graphs, I found it necessary to read the parameters of the first model out as numpy tensors and assign the values of these numpy tensors to the corresponding parameters of the second model.
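
For those who would rather not read the whole script, here is a minimal sketch of the idea in Tensorflow 1.x. The toy model, variable names and checkpoint path below are invented for this post and differ from what the actual script uses:

```python
# A minimal, illustrative sketch of the cross-graph parameter copy described
# above (Tensorflow 1.x). The toy model and file paths are made up.
import tensorflow as tf


def build_model():
    # Stand-in for a real model: a single trainable weight matrix.
    return tf.get_variable("W", shape=[4, 4],
                           initializer=tf.random_normal_initializer())


# --- Stage 1: pre-train in one graph and save it to disk ---
graph_a = tf.Graph()
with graph_a.as_default():
    w_a = build_model()
    saver = tf.train.Saver()
    with tf.Session(graph=graph_a) as sess:
        sess.run(tf.global_variables_initializer())
        # ... pre-training steps on the first dataset would go here ...
        saver.save(sess, "/tmp/pretrained.ckpt")

# --- Stage 2a: reload the saved graph and read its parameters out as numpy ---
graph_b = tf.Graph()
with graph_b.as_default():
    with tf.Session(graph=graph_b) as sess:
        loader = tf.train.import_meta_graph("/tmp/pretrained.ckpt.meta")
        loader.restore(sess, "/tmp/pretrained.ckpt")
        pretrained = {v.op.name: sess.run(v) for v in tf.trainable_variables()}

# --- Stage 2b: build a fresh model and assign the numpy values into it ---
graph_c = tf.Graph()
with graph_c.as_default():
    w_c = build_model()
    with tf.Session(graph=graph_c) as sess:
        sess.run(tf.global_variables_initializer())  # random init first
        for v in tf.trainable_variables():
            if v.op.name in pretrained:
                v.load(pretrained[v.op.name], sess)  # copy value across graphs
        # ... re-training steps on the second dataset would go here ...
```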

The Tensorflow Datasets API for Sequence Data (Code Examples)

This post was originally meant to be an entire tutorial (with a link to the GitHub repository) on how to use the Tensorflow Datasets API and how it contrasts with the more widely used placeholder approach for passing data into Tensorflow graphs. Unfortunately, I’m unable to set aside the time to write about it in as much detail as I had originally intended, and so I’m sharing the code with a few notes to help one make use of it.

First off, here is the link to the GitHub repository. It contains two main scripts – placeholder_vs_iterators.py and generator_vs_tfrecord.py. The first script implements three ways in which data can be passed into the Tensorflow graph. Note that in all cases this is sequence data. The first is the standard placeholder approach that most are familiar with. The second uses iterators and the third uses feedable iterators to input data to the graph. The latter two are what I gather to be the new methods that the Tensorflow Datasets API introduces for passing data into graphs. The script can be invoked with an integer command-line argument (1, 2 or 3) that chooses between the three approaches.
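
As a quick taste of the latter two approaches, here is a simplified standalone sketch (Tensorflow 1.x) with toy scalar data rather than sequences; it is not an excerpt from the script itself:

```python
# A simplified sketch of the iterator and feedable-iterator approaches
# (Tensorflow 1.x), using toy data invented for this post.
import numpy as np
import tensorflow as tf

train_data = np.arange(10, dtype=np.float32)
val_data = np.arange(100, 105, dtype=np.float32)

# Approach 2: an initialisable iterator over a Dataset.
dataset = tf.data.Dataset.from_tensor_slices(train_data).batch(2)
iterator = dataset.make_initializable_iterator()
next_batch = iterator.get_next()

# Approach 3: a feedable iterator, chosen at run time via a string handle.
handle = tf.placeholder(tf.string, shape=[])
feedable = tf.data.Iterator.from_string_handle(
    handle, dataset.output_types, dataset.output_shapes)
next_fed_batch = feedable.get_next()

val_dataset = tf.data.Dataset.from_tensor_slices(val_data).batch(2)
val_iterator = val_dataset.make_one_shot_iterator()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    print(sess.run(next_batch))  # [0. 1.]

    train_handle = sess.run(iterator.string_handle())
    val_handle = sess.run(val_iterator.string_handle())
    # The same next_fed_batch op now reads from whichever dataset we feed.
    print(sess.run(next_fed_batch, feed_dict={handle: train_handle}))  # [2. 3.]
    print(sess.run(next_fed_batch, feed_dict={handle: val_handle}))    # [100. 101.]
```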

The second script is generator_vs_tfrecord.py. Having played around a bit with the two new data input approaches – iterators and feedable iterators – I decided to stick with the former while examining three different ways in which one can iterate through data and pass it into the graph during training. The first takes unbatched sequences via a generator function and applies certain standard preprocessing steps (zero-padding, batching, etc.) to them before using the data to train the model. The second approach begins with data that has already been zero-padded and batched and passes that to the model via a generator function during training. The final approach first creates a Tensorflow Record file following the SequenceExample Protocol Buffer, reads sequences from this file, and zero-pads and batches them before passing them to the graph during training. The third approach is what I would consider the most dependent on the Tensorflow Datasets API, whereas the other two rely to a greater extent on Numpy. This script is also invoked with an integer command-line argument (1, 2 or 3) that chooses between the three approaches.
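
Here is roughly what the first of these three ways looks like in isolation – again a standalone toy sketch rather than code from the repository, with the sequence lengths and feature dimension made up:

```python
# A standalone toy sketch of feeding variable-length sequences through a
# generator and letting the Datasets API zero-pad and batch them (TF 1.x).
import numpy as np
import tensorflow as tf

FEATURE_DIM = 3
BATCH_SIZE = 2


def sequence_generator():
    # Yield a handful of random sequences of varying length.
    for length in [4, 2, 5, 3]:
        yield np.random.rand(length, FEATURE_DIM).astype(np.float32)


dataset = tf.data.Dataset.from_generator(
    sequence_generator,
    output_types=tf.float32,
    output_shapes=tf.TensorShape([None, FEATURE_DIM]))

# Zero-pad each batch to the length of its longest sequence.
dataset = dataset.padded_batch(
    BATCH_SIZE, padded_shapes=tf.TensorShape([None, FEATURE_DIM]))

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    batch = sess.run(next_batch)
    print(batch.shape)  # (2, 4, 3): two sequences, padded to length 4
```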

So there you have it! The rest of the code contains global constants for batch size, sequence length, etc. that can be changed as needed, basic training loops and a simple LSTM model to get the above examples working. When I first got started with this API around the time Tensorflow 1.4 was released, I found that there were few fully working examples that used it (and that is often exactly what one is looking for when getting started), so I decided to share this code. One can realise more complex data pipelines with this handy API by building on these examples.

I refer readers to the following useful links to understand more about the API and Google Protocol Buffers while going through the code:

The Tensorflow Datasets API Blog Post

The Official Tensorflow Documentation

A blog post that helped me understand the SequenceExample Protocol Buffer format better.

A useful StackOverflow post on fixed and variable length features.

The definition of the Example and SequenceExample ProtoBufs