Scratching the surface of Artificial Intelligence, Machine Learning & Transfer Learning

In November 2019, I started to learn about artificial intelligence and machine learning. I found an article through Google about a new website they were promoting called “Teachable Machine”.

The goal of Teachable Machine is to introduce machine learning, a branch of artificial intelligence, to everyday people. All of the capture, training, and output happens within the browser. The images, sounds, or poses that you train the model with never leave your browser; they are processed entirely in the browser.

I remember getting on a train from Kings Cross to York and wondering whether I could train an artificial intelligence on the train while we were heading north. I had bought two different types of drink, and my colleague had a drink as well. We spent about 20 minutes waving drink cans in front of my laptop, to the amusement of the passengers around us, who enquired what we were doing. When I said I was training an artificial intelligence, the looks on the train were very funny.

Within 20 minutes, we had trained an artificial intelligence to recognise three different types of drink can. I was very pleased with myself, and very excited to show my wife.

That week, I bought a lot of different types of beer and took them round to my friend's house, where we attempted to train an AI to tell the difference between the beer cans and replicate the experience I'd had on the train. This proved hard because one of my friends was wearing a jumper that was very similar in colour to the tins of beer. The joke at the end of the night was that we had made a jumper model instead of a beer model.

I started researching, as everyone does, on YouTube, and I came across “The Coding Train” by Dan Shiffman. He had done a series on the new version of Teachable Machine (v2). I watched his first video and was bowled over by his exuberant style, and by how he could teach quite complicated things very easily.

I watched, re-watched, and played around with p5.js and ml5.js throughout Christmas.

Machine learning is a huge field, and thankfully one of my colleagues, Heather Dawe (https://www.linkedin.com/in/heatherldawe/), is a data science expert. I spent a few hours asking her questions to get my head around the field.


Dan Shiffman has a lot of great content, and I eventually caught up with his series and eagerly awaited the videos on convolutional neural networks – these seemed the most useful for an idea I was starting to form. Once I realised that I could train my own convolutional neural network (CNN) with any input, it got me really thinking.

In February, I watched a video of someone who had built a magic wand using the new Arduino Nano BLE Sense board. They had uploaded a TensorFlow Lite model onto the Arduino Nano. I thought this was clever: you could wave a magic wand around, and it would understand what spells you were trying to cast. But when I looked at other people's examples, I realised that it was actually going to be quite complicated – you had to press a button to indicate that you wanted to train the model with a spell. I thought this wasn't very practical; plus, the model lives on the wand itself, which makes it harder for a kid to train.

I realised that what Dan had illustrated in The Coding Train series was that the intelligence could be put into the browser, instead of onto the device itself. So I started to look at the Arduino Nano 33 BLE and BLE Sense; the BLE Sense board has both a Bluetooth Low Energy chip and a nine-degrees-of-freedom inertial measurement unit (IMU). If I could get the data off the Nano and into the browser at a fast enough rate, then I could start to feed a CNN in the browser.

One of the issues I found was that Bluetooth Low Energy (BLE) could only transmit up to 251 bytes per packet, which was much slower than the rate of data being produced by the IMU. This wasn't a problem for the other projects I'd seen, because they all did the processing on the board itself.

So I experimented for a week to find the best way to transmit the data. I opted to sample the IMU at 100Hz (100 times per second), with the IMU configured as a FIFO queue so the movements came out in the same order they occurred; without this, movements weren't being classified correctly. The data was sent over BLE at 10Hz (10 times per second), with each packet containing 10 movements from the IMU. The net effect was that no motion data was lost.
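The batching described above can be sketched roughly like this (the helper name and sample shape are my own, not from the original project) – ten 3-axis readings packed as 32-bit floats come to 120 bytes, comfortably under the 251-byte BLE limit:

```javascript
// Sketch: batch ten 3-axis IMU samples into one BLE payload.
// 10 samples x 3 axes x 4 bytes = 120 bytes, under the 251-byte limit.
const SAMPLES_PER_PACKET = 10;
const AXES = 3; // x, y, z

function packSamples(samples) {
  // samples: array of 10 objects {x, y, z}, oldest first (FIFO order)
  const packet = new Float32Array(SAMPLES_PER_PACKET * AXES);
  samples.forEach((s, i) => {
    packet[i * AXES + 0] = s.x;
    packet[i * AXES + 1] = s.y;
    packet[i * AXES + 2] = s.z;
  });
  // packet.buffer (120 bytes) is what would be written to the
  // BLE characteristic ten times per second.
  return packet;
}
```

Keeping the samples in FIFO order inside each packet is what preserves the overall ordering the model was trained on.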

I could now transmit the information from the wand to my browser.

The website that I wrote has two pages. The first is a training page, which allows you to train the convolutional neural network with the spells you are trying to cast. The second is the post-training page, where the fully trained model can be used to determine which spell you are casting.

One of the biggest things I found during training was that the model initially didn't understand the difference between a wand at rest and one that was moving. I spent a lot of time recording data of the wand just hanging by my side, or moving around but not in a way that was a spell. This effectively creates the resting-state classification of the model.

The model starts with this single classification, and then you can add new classifications to it, which are the spells.

To make it more interesting, I took inspiration from fantasy games where characters progress through different levels of sorcery. This was to encourage people to train each spell 10 times. Once you'd done that, you could save that spell, give it a name, and then create the next one.

So initially, I trained the wand with two spells, “circle” and “Zorro”.

One thing I was worried about was that the model would take too long to classify each spell. There are certain constraints on the spells: each spell has to be cast within a two-second window, in order to fill up a buffer of information.

The buffer stores the x, y, and z coordinates from the IMU, constantly sending these to the model in the same sequence they were presented during training:

X1,Y1,Z1,X2,Y2,Z2,…,X200,Y200,Z200

The ring buffer is constantly written to as new Bluetooth data is received, and its contents are constantly presented to the model for classification.

I initially worried that classification would take a long time; however, once I tuned the ring buffer to the correct length (600) and matched that length to the number of input nodes on the model, it turned out not to be a problem.

There is no real sophistication here: the ring buffer stores 2 seconds of movement sampled at 100Hz from the IMU. That is 200 readings of X, Y, Z coordinates, or 600 values in total. The CNN likewise has 600 input nodes, and the number of output nodes matches the number of spells + 1 (the extra one being the resting state).
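A minimal sketch of such a ring buffer (class and method names are my own, not from the original project) – note that the snapshot has to be unrolled into chronological order before it is handed to the model's input nodes:

```javascript
// Minimal ring buffer sketch. Capacity 600 = 2 seconds x 100Hz x 3 axes.
class RingBuffer {
  constructor(capacity = 600) {
    this.buf = new Float32Array(capacity);
    this.head = 0;       // next write position
    this.filled = false; // true once we have wrapped at least once
  }

  // Append one IMU reading; the oldest values are overwritten once full.
  push(x, y, z) {
    for (const v of [x, y, z]) {
      this.buf[this.head] = v;
      this.head = (this.head + 1) % this.buf.length;
      if (this.head === 0) this.filled = true;
    }
  }

  // Snapshot in chronological order (oldest first) – this is what gets
  // presented to the model's input nodes on every classification pass.
  snapshot() {
    if (!this.filled) return this.buf.slice(0, this.head);
    return Float32Array.from([
      ...this.buf.slice(this.head),
      ...this.buf.slice(0, this.head),
    ]);
  }
}
```

Because the buffer is overwritten continuously, the model always sees the most recent two seconds of movement, whatever instant the classification happens to run.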

When the spells are recorded, they are normalised in JavaScript using a min/max normalisation method (https://stats.stackexchange.com/questions/70801/how-to-normalize-data-to-0-1-range).
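Assuming the standard min/max formula from the linked answer, (v − min) / (max − min), which maps every value into the range [0, 1], the normalisation looks something like this (the function name is my own):

```javascript
// Min/max normalisation: rescale values into the range [0, 1].
function minMaxNormalise(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0); // avoid division by zero
  return values.map(v => (v - min) / (max - min));
}
```

Normalising each recording keeps the inputs in a consistent range regardless of how vigorously a particular spell was waved.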

The results were quite spectacular considering it was an old laptop: despite only having a fifth-generation i5 processor, the model classifies within six milliseconds. With the confidence threshold set to 0.8 (80%), the model was good at determining the spell being cast.

Here are the videos I watched to learn all of this:

https://www.youtube.com/watch?v=jmznx0Q1fP0
https://www.youtube.com/watch?v=TOrVsLklltM
https://www.youtube.com/watch?v=fFzvwdkzr_c
https://www.youtube.com/watch?v=yNkAuWz5lnY
https://www.youtube.com/watch?v=KTNqXwkLuM4
https://www.youtube.com/watch?v=j-ZLDEnhT3Q
https://www.youtube.com/watch?v=EnblyAdZG8U
https://www.youtube.com/watch?v=OIo-DIOkNVg
https://www.youtube.com/watch?v=psfZzffno3k
https://www.youtube.com/watch?v=UaKab6h9Z0I
https://www.youtube.com/watch?v=FYgYyq-xqAw
https://www.youtube.com/watch?v=8HEgeAbYphA
https://www.youtube.com/watch?v=qPKsVAI_W6M
https://www.youtube.com/watch?v=Lfv3WJnYhX0

Creative Commons “Attribution-Share Alike”