What about using microcontrollers to run machine learning models? Microcontrollers are less resource constrained than they used to be, and can be augmented with single-board computers. What does this mean for the field?

This is a transcript of Phase Dock LIVE, December 12, 2020 with Chris Lehenbauer and Brandon Satrom.

It has been edited for length, clarity, and readability. Occasional timestamps are shown in square brackets [4:52] so you can watch the action and animations as they are described in the video.

We’ve split it into three parts for easy access.

Part Three: Machine Learning at the edge with microcontrollers and single-board computers.

[29:30] Brandon: When I started getting into machine learning a couple of years ago, the idea of using microcontrollers to run machine learning models seemed impossible. You might think, “Gosh, how does that even work?,” because microcontrollers are very resource constrained.

There are two things that are interesting here. One is that most of us are no longer using Arduino Unos, which have only a couple of kilobytes of RAM, whereas a modern ESP32 has roughly half a megabyte. Devices have gotten beefier, and their processors have gotten faster. The ESP32 is a dual-core microcontroller, which would have been nuts to think about even 3, 4, or 5 years ago.

But, in addition to that, there has been a ton of work by the team at Google and others to make it easy for machine learning to work on microcontrollers.

[30:58] I’ll go back to sharing my screen.

This is another book I recommend, called TinyML. It's by Pete Warden, who is at Google, and Daniel Situnayake, who is at a machine learning start-up called Edge Impulse. I recommend checking out their stuff as well, especially Pete Warden’s blog.

[Editor’s note: all referenced resources can be found at the end of the blog post.]

They wrote an entire book on how you can take a model and pair it with an accelerometer or speaker or a MEMS microphone or something along those lines and actually do machine learning on microcontrollers.

This is fascinating to me. How this works is that frameworks like TensorFlow can now pack models into much smaller footprints than they could even two years ago, so you can fit the models on resource-constrained devices.

If you’re building something on a microcontroller, you follow a very similar process to building the model. Remember from my earlier slide, you create the data, not the algorithm. You’re walking the machine through the process of training itself based on the framework you set out.

[32:12] If I were to build a machine learning model for a microcontroller, basically doing a gradient descent, I’d use a very simple y = mx + b. I’m just fitting the slope of a line. This is a “toy” machine learning model because we know the algorithm for this, but it’s a nice way to look at machine learning on devices because you’re taking an algorithm and laying it over [your model] so you can see what it looks like on constrained devices.

What you do is, if I have a set of x’s and y’s, I’m asking the computer to tell me what m and b need to be to give me the correct line.
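
[Editor’s note: here is a minimal sketch, in Python with TensorFlow/Keras, of the kind of “toy” linear model Brandon describes. The slope and intercept values are made up for illustration; this is not the code from the talk.]

```python
import numpy as np
import tensorflow as tf

# Create the data, not the algorithm: sample points along y = 2x + 1
# (the slope and intercept here are made-up values for illustration).
xs = np.random.uniform(-1.0, 1.0, size=(200, 1)).astype(np.float32)
ys = 2.0 * xs + 1.0

# A single dense unit is enough to learn a slope (m) and an intercept (b).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(units=1),
])
model.compile(optimizer="sgd", loss="mean_squared_error")
model.fit(xs, ys, epochs=200, verbose=0)

# The learned weights approximate m and b.
kernel, bias = model.layers[0].get_weights()
print("learned m ≈", kernel[0][0], "learned b ≈", bias[0])
```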

Like I did before, if you use a framework like TensorFlow, you build the model, then you use TensorFlow Lite to convert it into a much smaller file than you would have in a desktop environment. I’m getting into the weeds here, but for folks who are doing MCU development, this is the important part.
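
[Editor’s note: a minimal sketch of that conversion step, continuing from the Keras model above. The output file name is made up for illustration.]

```python
import tensorflow as tf

# Convert the trained Keras model (from the sketch above) into a compact
# .tflite flatbuffer that can be shipped to a small device.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("linear_model.tflite", "wb") as f:
    f.write(tflite_model)

print("converted model size in bytes:", len(tflite_model))
```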

The whole reason this works on microcontrollers is because most machine learning models are nothing more than numbers.

They are just arrays and matrices of numbers. Because that is true, if you can pack a model into an array (in this case a C array), then you can run it on a microcontroller, as long as the microcontroller has enough flash storage and enough RAM.
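
[Editor’s note: the TensorFlow Lite for Microcontrollers documentation typically suggests a tool like xxd -i to turn the .tflite file into a C array; the Python sketch below does the same thing by hand. The variable and file names are made up for illustration.]

```python
# Write the .tflite bytes out as a C array so the model can be compiled
# directly into microcontroller firmware.
def write_c_array(tflite_bytes: bytes, var_name: str, path: str) -> None:
    lines = [f"const unsigned char {var_name}[] = {{"]
    for i in range(0, len(tflite_bytes), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in tflite_bytes[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(tflite_bytes)};")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_c_array(tflite_model, "g_linear_model_data", "linear_model_data.h")
```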

Like my regression model here, models can look very large, but still be small enough to fit easily on a device.

[34:21] Then you have an array of numbers that you can load into your C program. You can capture real-time data. In this case we’re just choosing a random number between zero and one. I’m not running the algorithm–I’m running the model to say: “If I give you this random number, knowing what you know now that you’ve trained yourself on the regression model, tell me what that’s going to look like.” Running inference is what that’s called: running a model to make a prediction based on the data. Then you get the results, the x values and the y values. That’s my slope.
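
[Editor’s note: a rough sketch of that inference loop using the desktop TensorFlow Lite interpreter in Python; on an actual microcontroller this would use the TensorFlow Lite for Microcontrollers C++ API instead. The file name is the made-up one from the sketches above.]

```python
import numpy as np
import tensorflow as tf

# Load the converted model and run inference on a random x between 0 and 1.
interpreter = tf.lite.Interpreter(model_path="linear_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

x = np.random.uniform(0.0, 1.0, size=(1, 1)).astype(np.float32)
interpreter.set_tensor(input_details["index"], x)
interpreter.invoke()
y = interpreter.get_tensor(output_details["index"])
print("x =", x[0][0], "predicted y =", y[0][0])
```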

[35:00] And because we’re on a microcontroller, you can render the output of your regression model on a connected screen or something along those lines.

Another place where this ends up getting applied is gesture detection. I don’t have source [code] for this demo but I’ll share a GIF of it here. [35:15] If you have an accelerometer on a device, you can train a model on a microcontroller to detect a circle, or a slope, or a “W,” or something along those lines. This is actually in the TinyML book I referenced earlier.

That’s a very powerful set of ideas. You can take that much insight and you can pack it down so small that it can fit inside of a smart speaker or a smart watch or something.

Chris: That is what blew me away. Ultimately, we’re talking about two or three things. You need enough compute power, you need enough storage, and ultimately, as Pete Warden would say, you need enough power [to run the whole thing]. But assuming that you can solve that last problem, the other two have only become viable over the last few years as microcontrollers have gotten faster with more storage crammed onto them. All of a sudden this is viable and it’s stunning.

Brandon: These things have certainly been important: the compute power, the storage on devices, the fact that there are frameworks like TensorFlow Lite for Microcontrollers. [36:49] But one of the things Pete Warden and the team at Google discovered that is so fascinating… and this is a bit counter-intuitive… is that another way they can make these models work is with something called quantization, where instead of putting very large float[ing point] numbers on a microcontroller you can actually use straight-up integers, whole numbers like ones and zeros, 500 or 35,000. You can use whole numbers and you still get a reasonably accurate model that is more likely to fit on a small device.
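
[Editor’s note: a minimal sketch of the post-training quantization Brandon mentions, using the TensorFlow Lite converter with the Keras model from the earlier sketches. The representative data here is random and purely for illustration.]

```python
import numpy as np
import tensorflow as tf

# Ask the converter to store weights (and, with a representative dataset,
# activations) as 8-bit integers instead of 32-bit floats, shrinking the
# model so it is more likely to fit on a small device.
def representative_data():
    for _ in range(100):
        yield [np.random.uniform(-1.0, 1.0, size=(1, 1)).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
quantized_model = converter.convert()

print("quantized model size in bytes:", len(quantized_model))
```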

This is still an emerging space. We’re not at the point where we’re doing live streaming video on microcontrollers, but you can do person detection with an Arducam. It’s a snapshot. You can detect in a frame that the camera is watching if a person is there or not there. Models are evolving and our ability to run them faster is evolving. It’s really cool that we’re getting to that place.

Chris: I remember what Pete said in one of his blogs was that he envisions a point at which even very small, stupid devices should be able to hear a voice, respond to a limited set of instructions and even, perhaps, know that you are looking at them and that they should pay attention and execute your instructions. Which is really kind of astonishing. Even that level of interactivity is both good and scary.

Brandon: It is scary. You’re right. But it’s also encouraging. I’m glad you brought up Pete again because one of the stories I heard from him is this. We say that we think our phones are listening to us when the reality is that they are not. What is listening for us to say “Hey Siri” is not the core processor on the phone, it’s something called a digital signal processor, or DSP. The reason that DSP is constructed like that is so your phone’s battery doesn’t die while listening 24/7 for the “wake” word. It’s not recording; it’s only listening for a digital signal that matches the magic words to wake it up. Anything else gets processed, but not stored. Not interpreted. Not looked at. It’s just gone. I find it a bit comforting to know that it’s not the core processor of the phone listening. It’s just this tiny little low-power DSP.

That realization is what sent Pete and Google on this journey of really trying to unlock the potential of machine learning on microcontrollers. Because effectively a DSP is just a microcontroller. It’s a very inexpensive part of your phone.

In a world where we worry about privacy…and I do, like everyone does… it’s comforting to know that even our phones are structured in a way that is advantageous for batteries, but it has the side effect of being good for us from a privacy standpoint.

Chris: We’ve done a pretty good arc here, where we’ve gone from the theory (“What is it?”) to implementation (“How do you do this?”) to practical application (“How does that look on really small devices like microcontrollers?”).

So we’ve talked about microcontrollers…and it’s astonishing that you can do anything meaningful on microcontrollers. But you’ve got a lot more horsepower available to you on single-board computers. It blows me away that you can get a Jetson Nano for $99. I don’t understand how that can be. Just size-wise I’m going to show my screen. [41:34]

This is a Jetson Nano (left) and this is a Particle Argon (right center) mounted on a breakout board that my friend Andre Rossouw designed. You can see the size relative to my hand. The Nano is bigger than a microcontroller, but it’s so affordable and so much more powerful. And then compare that to an ESP32 in terms of size; of course, things are getting a lot smaller than even the ESP32.

I want to juxtapose these two thoughts. Yes, microcontrollers are small and cheap, but you’ve got this range. Plus, a lot of times people couple single-board computers and microcontrollers so really there is a lot of potential.

Brandon: [42:40] Absolutely. [Jetson] Nanos are a great example. I’ve done a bunch of work using the Google Coral. The Nano and the Coral are both Raspberry Pi-compatible. They’ve got a 40-pin header so you can slot any Pi HAT on top of them. The Coral is a little closer to being Pi-sized. [Other options are the] Google [Coral USB Accelerator or the Intel Neural] Compute Stick, which can be used with a Raspberry Pi 4 that has USB-C.

In any of these cases, for around $100, you are able to do some pretty serious machine learning. That includes doing video streaming and being able to do facial detection, person detection, or car detection on streaming video from a detached camera. That’s absolutely amazing.

Chris: It’s crazy and a tremendous amount of fun. Phase Dock has an opportunity now to collaborate on some robotics work in the educational space. I’m hoping the opportunity pans out, because robotics with machine learning would be great. For example, KUKA has a robot. It’s basically a mobile platform with a 4 or 5 degree-of-freedom arm on it. It’s small. It’s called a youBot. It’s smart enough that it can move around a room, recognize all the red blocks, pick them up and take them away. So now you have this autonomous vehicle being sent into an environment where it doesn’t know what it’s going to find. You give it some nebulous instructions. It will figure stuff out, go do that and go home. So it’s exciting, and it’s a lot of fun to talk about.

OK. We should start to wind down now. What parting thoughts do you have for someone who wants to get into machine learning?

Brandon: I have some parting thoughts, but before we get into that, I want to talk a bit more about privacy. This is one of my favorite aspects of machine learning with microcontrollers. It creates this world where…when you are doing your prediction or inferencing on a microcontroller… privacy is built in. I know this is something we all think about in relation to our smart phones. Let me talk about it in the context of personally identifiable information.

[46:40] One of the examples I like to give is with machine vision, especially since a lot of the conversation about vision, image detection, and even smart speakers involves using the cloud. They use their own servers for processing. Think about this in the context of something like person detection. Or in my case, I built an emotion detection demo that uses Azure cloud services.

When I’m running this demo, I want to see a prediction of the emotion on my face. [47:28]

I guess it thinks I was angry when I took that picture. But in addition to predicting the result, now the cloud has the photo. They have a picture of me. And yes, it’s individual pixels, but it’s pixels that can be reconstructed into a likeness of me and used potentially in a negative way. And yes, I did this myself. But if I were constructing a system that used photos of other people, I would be inadvertently violating their privacy.

If, instead, I were to do vision detection at the edge, on a microcontroller or something like a Raspberry Pi, and only send up the prediction without sending the image itself, what I get out of my model is something like this. [49:29] This is just a JSON object, but it has what I want to know, which is “What emotions were detected?” I see 66% anger and 33% disgust. I get information about facial hair, gender, age, head pose, whatever. But I’m not sending any pixels back to the cloud. I’m sending the JSON, basically just a string. None of this information can be used against me or reconstructed to say “Oh, that’s Brandon Satrom.” Based on this data, you could not determine that it is about me.
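
[Editor’s note: an illustrative sketch of sending only the prediction upstream. The field names and numbers below are invented to mirror what is shown on screen; they are not the actual service response format.]

```python
import json

# Only this small JSON payload leaves the device; no pixels are transmitted.
# The fields and values are invented for illustration.
prediction = {
    "emotion": {"anger": 0.66, "disgust": 0.33},
    "facialHair": {"beard": 0.4, "moustache": 0.1},
    "gender": "male",
    "age": 35,
    "headPose": {"pitch": 0.0, "roll": -2.1, "yaw": 5.3},
}
payload = json.dumps(prediction)
print(payload)  # this string, not the image, is what gets sent to the cloud
```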

One of my favorite things about machine learning with microcontrollers is that most of the time, we just want the output. We don’t need the raw data. In fact, even if we are using wifi or powerful computers in the cloud, do we really want to use all the bandwidth to send streaming video or high resolution photos? Chances are, we don’t.

I love this idea that we can respect privacy and create solutions that allow us to get what we want and that people can stay safe as a result.

Chris: That’s a good way to close the circle, I think. The promise of IoT has been “Oh, you can gather all this data.” And the disappointment of IoT has been “What do you do with all this data?” If you don’t analyze it and extract some meaning from it, why did you gather it?

So despite gathering all the data, we’re basically throwing the value away. And what I just heard you say was: If we were to analyze the data at the edge and extract the meaning there, you could throw away the input data. You don’t care about that because you’ve already extracted what you want, which is the meaning. That takes up a lot less space as well as addressing a lot of the privacy concerns you mentioned so it solves a lot of problems at one time.

Maybe that’s why we really care about machine learning. It helps us get more benefit out of this whole IoT thing.

Brandon: I agree. That’s a great way to put it, Chris.

The video wraps up with thanks, future LIVE events and a sneak peek at a project Brandon is working on for Blues Wireless.

Resources to learn more about machine learning:

Books:

    • TinyML: Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers by Pete Warden and Daniel Situnayake (O’Reilly); referenced throughout this conversation.

Articles/Blog posts:

    • Machine Learning on Microcontrollers by Helen Leigh; Make: Magazine, Volume 75 (should be available on newsstands until February 2021). An extensive introductory/overview and how-to article, available online to subscribers.
    • Why the future of machine learning is tiny: blog post by Pete Warden, staff researcher at Google, who leads their TensorFlow Mobile team.

Videos: