The Fundamental Guide to Machine Learning Hardware for the Edge

One of the best parts of rolling out Kosmos has been seeing the benefits of our smart insight features powered by machine learning.

As data is passed to the cloud, our models make predictions about the future state of a user’s application, as well as flag any data points that appear anomalous.

However, as we contemplate the future of Kosmos, we are increasingly interested in pushing these tools from the cloud to the edge. We thought we’d share the five devices we’re most excited to experiment with.

Taking Machine Learning Hardware to the Edge

But before we dive in, why run machine learning on the edge in the first place?


Reduction in Network Bandwidth

One of the strongest motivators is the reduction in network bandwidth. Instead of pushing data up to the cloud to be evaluated, the device can evaluate it locally. Sending less data over the network also leads directly to a reduction in power consumption.

Consider the fact that encrypting data to be evaluated remotely (and decrypting the reply) can consume 5 times more power than evaluating locally. While power consumption may not be as large a concern in traditional systems, it quickly becomes important when deploying devices in less accessible and controllable environments.
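
To make that ratio concrete, here is a back-of-the-envelope sketch in Python. The battery budget and the per-evaluation cost are hypothetical numbers chosen for easy arithmetic, not measurements, and the sketch reads the figure above as remote evaluation costing five times the energy of local evaluation.

```python
def evaluations_per_charge(battery_mj, cost_per_evaluation_mj):
    """Whole evaluations a battery budget can pay for (both in millijoules)."""
    return battery_mj // cost_per_evaluation_mj


# Hypothetical figures for illustration only.
BATTERY_MJ = 10_000_000             # assumed 10 kJ energy budget
LOCAL_COST_MJ = 10                  # assumed cost of one local evaluation
REMOTE_COST_MJ = 5 * LOCAL_COST_MJ  # the 5x ratio quoted in the text

local = evaluations_per_charge(BATTERY_MJ, LOCAL_COST_MJ)
remote = evaluations_per_charge(BATTERY_MJ, REMOTE_COST_MJ)

print(f"local evaluations per charge:  {local}")
print(f"remote evaluations per charge: {remote}")
```

Under these assumptions, the same battery covers five times as many local evaluations as remote ones, which matters most for devices that are hard to reach and recharge.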

Earlier Evaluation

It follows that evaluation will also occur earlier. Because evaluation happens on your own hardware, where data arrives at a higher rate than it is pushed to the cloud for storage, less time passes between the edge registering an anomalous value and the user receiving an alert.

Reducing the time to an alert can translate into real dollar savings for our customers who use Kosmos to monitor assets like their manufacturing processes.

Better Security

Another consideration is security. A user has more control over the privacy and security of their models when they reside on the edge.

Online algorithms provide even more privacy: because the model is updated on the device by the edge-specific data it receives, it never needs to touch the cloud. For cases where privacy is integral, this can keep any data at all from leaving the device.
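
As a rough illustration of what an online algorithm on the edge could look like, here is a minimal Python sketch of an anomaly detector that updates a running mean and variance (Welford’s algorithm) one reading at a time. This is not how Kosmos works; the class name, threshold, and warm-up length are all assumptions made for the example.

```python
import math


class OnlineAnomalyDetector:
    """Running mean/variance (Welford's algorithm) with a z-score threshold.

    Hypothetical example: names and defaults are illustrative assumptions.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # flag readings this many std devs away
        self.warmup = warmup        # readings to observe before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, value):
        """Score the reading against the current model, then fold it in."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            anomalous = std > 0 and abs(value - self.mean) > self.threshold * std

        # Welford update: no history is stored, only three numbers.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return anomalous


detector = OnlineAnomalyDetector()
readings = [20.0, 20.2, 19.8, 20.1, 19.9, 20.0, 20.2, 19.8, 20.1, 19.9, 35.0]
for reading in readings:
    if detector.update(reading):
        print(f"anomaly: {reading}")
```

The model state is three numbers that live on the device, so neither the readings nor the model has to leave it. One design choice to note: this sketch folds flagged readings into the model too, which a real deployment might prefer to skip.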

5 Hardware Options for Machine Learning on the Edge

Now that we’ve convinced you the edge is the future, let’s explore the machine learning hardware options currently available:


Coral by Google

First up is Coral, a range of products from Google that bring ML to the edge via TensorFlow Lite.

The USB Accelerator is particularly exciting, as it is easy to integrate into existing systems. Each Edge TPU is able to perform 4 trillion operations per second (TOPS), and while your mileage may vary depending on what you are trying to accomplish, our research showed that this is the fastest chip for inferencing on our list.

Although not an issue here at Temboo, one major drawback is that it only supports TensorFlow Lite. Another issue is that the Edge TPU is not capable of backpropagation, which limits its ability to train models.
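
For a sense of what that integration might look like, here is a minimal sketch of loading a model onto the USB Accelerator with the TensorFlow Lite Python runtime. It assumes the tflite_runtime package and the Edge TPU runtime are installed and that the model has already been compiled for the Edge TPU. The helper names and the model file name are hypothetical; the shared-library names follow Coral’s published Python examples.

```python
import platform

# Edge TPU runtime library per operating system, as used in Coral's examples.
EDGETPU_SHARED_LIB = {
    "Linux": "libedgetpu.so.1",
    "Darwin": "libedgetpu.1.dylib",
    "Windows": "edgetpu.dll",
}


def edgetpu_library(system=None):
    """Return the Edge TPU library name for the given (or current) OS."""
    return EDGETPU_SHARED_LIB[system or platform.system()]


def make_interpreter(model_path):
    """Build a TensorFlow Lite interpreter that delegates to the Edge TPU."""
    # Imported lazily so this module still loads where tflite_runtime is absent.
    from tflite_runtime.interpreter import Interpreter, load_delegate

    delegate = load_delegate(edgetpu_library())
    interpreter = Interpreter(
        model_path=model_path,
        experimental_delegates=[delegate],
    )
    interpreter.allocate_tensors()
    return interpreter
```

Calling make_interpreter("model_edgetpu.tflite") (a hypothetical file name) would then give you an interpreter ready for the usual set_tensor, invoke, and get_tensor calls.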


Intel Neural Compute Stick 2

If you don’t want to be locked into TensorFlow, the Intel Neural Compute Stick 2 provides an alternative.

Using Intel’s OpenVINO toolkit, you can run models from multiple frameworks and exchange formats via its intermediate representation format. Their developer guide has a comprehensive list of technologies supported.

One reason that the NCS2 doesn’t top our list is that its focus is heavily on computer vision, which is not currently a priority at Temboo. Like the Coral, the NCS2 is designed for inferencing, not training.

NVIDIA Jetson TX2


The NVIDIA Jetson TX2 is exciting because all aspects of machine learning can be executed on the edge, not just inferencing. It pairs a multi-core ARM CPU with a GPU that has 256 NVIDIA CUDA cores, making it one of the better-suited systems for training models on the edge.

The Jetson TX2 consumes more power than the devices we’ve discussed so far, but this is to be expected because it is not an accelerator like the NCS2 or Coral USB Accelerator. The Jetson combines the main CPU and the accelerator into one board, making it a great gateway with built-in model training and execution. It’s also the most expensive option on our list.


Sipeed MAIX-I

The Sipeed MAIX-I is a low-cost chip with AI acceleration built right in. This single chip combines a dual-core processor, support for neural networks through a KPU, and Wi-Fi connectivity into one convenient package.

While its throughput is lower than that of the other accelerators, it pushes ML about as far to the edge as possible. Currently the MAIX-I appears optimized for video and audio applications, which could fundamentally change the way our users deploy sensors. Instead of just tracking motion, a sensor could couple that with a model that categorizes the source of the motion (e.g. human detected).


Xilinx FPGAs

We’re ending our list with Xilinx, which specializes in FPGAs (field-programmable gate arrays) like the Zynq-7000 SoC. While an FPGA may be slower than a modern ASIC (application-specific integrated circuit), it allows you to future-proof your product.

By planning for an FPGA accelerator in early chip design stages, the FPGA can then be configured with the best ML accelerator for the application when the product is actually ready for release. In a field as rapidly changing as ML, this flexibility is critical.

And the Winner is…

While we’re itching to get our hands on all of these devices, we’re most excited for the Coral USB Accelerator. It’s powerful and should integrate well within our existing framework, making it easier to get a meaningful model up and running quickly.

Any that we missed? Let us know the machine learning hardware you’re most excited about in the comments below! Or check out our job postings if you’re ready to help us build out the next iteration of Kosmos.