The Temboo Blog

Make Smart Predictions with Amazon Machine Learning

The Temboo Choreo library just got a new addition: Amazon’s Machine Learning service. It’s an excellent way to get started with data-driven predictions in any application without bringing on a Machine Learning specialist. If you’re looking for a straightforward supervised Machine Learning solution and your application doesn’t call for a custom implementation, Amazon Machine Learning may be just the tool you’re looking for. It’s a version of one of the Machine Learning implementations that Amazon itself uses internally, so scalability is certainly a feature.

In this article, we’ll give you a crash course in the basic concepts of Machine Learning, tell you where to start with Amazon Machine Learning and its API, and give you a few pointers on how you might approach applying Machine Learning to your Internet of Things application.

Amazon Machine Learning is designed for supervised Machine Learning tasks. Not so sure what we mean by that? No worries, just read on.

Putting Machine Learning into Context

Here’s an example of just one of the countless tasks that can be done with Machine Learning:

Let’s say you make your living selling houses and you want to make pricing the houses you sell an easier task. You might decide to train a Machine Learning model to predict the price at which you should sell the house. To do so, you collect all the data you can find about houses sold in your area in the last five years or so.

Your data set includes several data points describing each house’s features, like its area in square feet and the number of bedrooms it has. For each house, you also include the price at which it was sold. You make sure your data is clean and ready for processing, and you feed it into your Machine Learning model to “teach” it how to think about house prices. Then you test your model on data for other houses whose sale prices you already know. How closely its predictions match the actual sale prices tells you how representative your initial data set was, and thus how helpful your Machine Learning model will be.

If your model is effective, you can give it data for a house you’re hoping to sell, and then get a good prediction for the price you should sell it for. What you do with that prediction is up to you.

Machine Learning Fundamentals

The underlying basis of a Machine Learning application is a statistical model that improves itself as it is given more data. In his book Deep Learning with Python, Google AI researcher François Chollet describes the distinction between classical programming and Machine Learning as follows: classical programming uses rules and data to produce answers. Machine Learning, on the other hand, uses data and answers to produce rules.
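As a rough illustration of that distinction, here is a minimal sketch that puts a hand-written pricing rule next to one derived from examples. The price-per-square-foot rate, the function names, and the sample figures are all made up for illustration, and a real model would use many more features than area alone.

```python
# Classical programming: a person supplies the rule, the program produces answers.
def price_from_rule(area_sqft, rate_per_sqft=100.0):
    return area_sqft * rate_per_sqft


# Machine Learning: the program is given data and answers, and derives the rule.
def learn_rate(areas, prices):
    """Least-squares fit of price = rate * area (a line through the origin)."""
    numerator = sum(area * price for area, price in zip(areas, prices))
    denominator = sum(area * area for area in areas)
    return numerator / denominator


# Hypothetical observations: areas in square feet and the prices they sold for.
areas = [1000, 1500, 2000]
prices = [120000, 180000, 240000]

# The "rule" (a rate per square foot) now comes from the data, not from us.
learned_rate = learn_rate(areas, prices)
estimate = learned_rate * 1800  # apply the learned rule to a new house
```

In the first function the rate is something you decided. In the second it is whatever best fits the examples you provided, which is why the quality of those examples matters so much.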

It’s important to understand that not every problem is a good candidate for Machine Learning. Many of the questions you may have for your data set are perfectly answerable using classic analytical methods. So when is Machine Learning a good choice to solve the task at hand? Amazon Machine Learning’s documentation helpfully identifies the following two simple rules for deciding whether ML is the right choice:

Types of Machine Learning

The major types of Machine Learning are characterized by the kind of feedback given to the Machine Learning algorithm during its “learning” phase. Though Amazon Machine Learning is for performing supervised Machine Learning tasks, it is useful to have a basic understanding of the capabilities of each of the following major types:

Terms to Know

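| Area (sqft) | Floors | BR | BA | Yard (sqft) | Garage | Pool | School District | Exterior | Roof Material | Year Built | Year Sold | Price |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1601 | 1 | 3 | 2 | 11902 | y | n | 109 | Wood | Asphalt | 1974 | 2008 | 145111 |
| 1840 | 1 | 3 | 2 | 128840 | n | y | 108 | Brick | Asphalt | 1946 | 2017 | 220000 |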

In Context: If we want our house pricing model to tell us the number of thousands of dollars a given house might sell for, then our prediction task is a regression task.

On the other hand, if you would like to predict the value of a target that can only be one of a finite set of values, that’s known as a classification task. There are two types of classification tasks: binary classification tasks predict a target that has two possible values, and multiclass classification tasks predict a target that has three or more possible values.

In Context: If for some reason we only needed to know whether the given house would sell for more or less than $300,000, then that’s a binary classification task. If we had “bucketed” the house prices into sub-ranges such as “less than $200K”, “$200K-$400K”, “$400K-$700K”, and “more than $700K”, and we only wanted predictions about which of those ranges the given house would fall into, that would be a multiclass classification task.
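To make the three task types concrete, the sketch below turns the same sale price into a regression target, a binary target, and a multiclass target. The function names are ours, and the handling of prices that land exactly on a boundary is an arbitrary assumption, since the ranges above don’t specify it.

```python
def regression_target(price):
    """Regression: a numeric target, here the price in thousands of dollars."""
    return price / 1000


def binary_target(price, threshold=300_000):
    """Binary classification: exactly two possible values.
    Assumption: a price equal to the threshold is labeled "less"."""
    return "more" if price > threshold else "less"


def multiclass_target(price):
    """Multiclass classification: one of three or more possible values.
    Assumption: each boundary price falls into the range shown below."""
    if price < 200_000:
        return "less than $200K"
    if price < 400_000:
        return "$200K-$400K"
    if price <= 700_000:
        return "$400K-$700K"
    return "more than $700K"


# The two sale prices from the sample table above.
for price in (145111, 220000):
    print(regression_target(price), binary_target(price), multiclass_target(price))
```

Which of these you train on depends entirely on the question you want answered. The feature columns stay the same and only the target column changes.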

The Supervised Machine Learning Process with Amazon Machine Learning

The quickest way to understand the process is to work hands-on through Amazon’s interactive Machine Learning tutorial, but we’ll give you a quick rundown and show which Temboo Choreos you can use at each step along the way.

1. Prepare a Training Dataset

The majority of your effort in performing any type of Machine Learning task will be spent on the most important step of all: preparing a dataset for training your machine learning model.

Collecting and cleaning the data requires care, and the time you take to plan ahead will have a strong influence on the worth of the Machine Learning model that results from training with this data.

Before you do anything else, you should determine what question you’re asking of your data. Consider whether you really need a numerical value as an answer, or whether your problem is actually a classification task.

The principle of garbage in, garbage out applies to Machine Learning as much as it does to anything that relies on statistical methods or data processing. It’s important to make sure that your training data is made up of observations consisting of data points that are relevant to your target.

It’s possible that not all of the data you have available will be meaningful features for your eventual training dataset. Giving your data a thorough check-up with traditional analytical methods will help you determine which features should stay in your training data and which ones should get thrown out. 

Here’s what to look for when analyzing your potential training data.

You may find it necessary to perform some preliminary computations on your initial feature values in order to turn them into useful values for ML model training purposes.
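As one example of such a preliminary computation, the sketch below derives two new values from a row of the sample house data. Both derived features (“Age at Sale” and “Yard to Area Ratio”) are hypothetical, and whether they actually help a model is something you would need to check against your own data.

```python
def add_derived_features(house):
    """Return a copy of the observation with two illustrative derived features."""
    enriched = dict(house)
    # How old the house was when it sold may be more telling than either year alone.
    enriched["Age at Sale"] = house["Year Sold"] - house["Year Built"]
    # Yard size relative to the size of the house, rounded to two decimal places.
    enriched["Yard to Area Ratio"] = round(
        house["Yard (sqft)"] / house["Area (sqft)"], 2
    )
    return enriched


# First row of the sample table, limited to the columns used here.
house = {
    "Area (sqft)": 1601,
    "Yard (sqft)": 11902,
    "Year Built": 1974,
    "Year Sold": 2008,
    "Price": 145111,
}
enriched = add_derived_features(house)
```

Applying the same function to every row before you export your CSV keeps the training data and any later prediction data consistent.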

2. Train the Machine Learning Model

Once your data is collected, cleaned, and properly formatted, it’s time to upload it to your AWS S3 bucket and create a new data source from the Amazon Machine Learning console.

There are Temboo Choreos for every step of the training process, but unless you’re regularly creating multiple Machine Learning models, it’s probably more efficient to take care of these steps in the ML Console than to do it programmatically. Here are the Choreos you would need:

Temboo also has Choreos for AWS S3, which can come in handy when uploading new datasets for creating data sources.

3. Evaluate the Accuracy of the Machine Learning Model

When training a machine learning model in supervised machine learning, we set aside a portion of the training data set for testing the model after we have trained it. This way, we can get an approximate idea of whether our Machine Learning model turned out to be accurate after training.
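The idea of holding data back can be sketched in a few lines. This is not how Amazon Machine Learning implements it. The 30% holdout fraction, the made-up house data, and the one-feature stand-in model are all assumptions chosen to keep the example small, and root mean square error (RMSE) is used here as a common accuracy measure for regression.

```python
import math
import random


def split_holdout(rows, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of the rows, then set aside a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    holdout_size = round(len(shuffled) * holdout_fraction)
    return shuffled[holdout_size:], shuffled[:holdout_size]


def rmse(predicted, actual):
    """Root mean square error: lower means predictions are closer to reality."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical labeled observations: (area in square feet, sale price).
houses = [
    (1000, 118000), (1200, 149000), (1400, 165000), (1601, 145111),
    (1700, 210000), (1840, 220000), (2000, 236000), (2200, 270000),
    (2500, 295000), (3000, 362000),
]

train, holdout = split_holdout(houses)

# Stand-in model: a single price-per-square-foot rate fitted on the training rows only.
rate = sum(a * p for a, p in train) / sum(a * a for a, _ in train)

# Score the model on rows it has never seen.
predictions = [rate * area for area, _ in holdout]
error = rmse(predictions, [price for _, price in holdout])
```

The important detail is that the holdout rows play no part in fitting the rate, so the error gives you an honest estimate of how the model behaves on new houses.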

Amazon Machine Learning does this for you automatically and provides simple utilities for evaluating your ML model in the future, should you have new labeled data available to use for the evaluation.

If you would like to regularly and programmatically evaluate your ML Model using a new labeled dataset, you’ll want to use these Choreos:

4. Use the Model to Make Predictions

Now you’re ready to benefit from all your hard work and generate predictions from your data.

In Amazon Machine Learning, there are two ways to make predictions. You can make individual predictions in realtime, or you can make batch predictions for multiple observations all at once. Choose the method that matches how urgently your application needs those predictions. The difference between the two is that batch predictions can handle multiple rows of observation data and take more time to produce.

The dataset you’ll send to Amazon Machine Learning to generate predictions should look exactly the same as your training dataset, with one exception: it won’t include the target value.

For batch predictions, you’ll need to upload a properly formatted CSV file containing rows of observations to AWS S3, and then create an Amazon Machine Learning data source from that file.
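If your new observations come out of the same pipeline as your training data, one way to prepare the batch file is to strip the target column before uploading. The sketch below does this with Python’s standard csv module. The column name “Price” and the shortened sample rows are assumptions based on the house example.

```python
import csv
import io


def drop_target_column(csv_text, target="Price"):
    """Return the same CSV with the target column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept_columns = [name for name in reader.fieldnames if name != target]

    output = io.StringIO()
    # extrasaction="ignore" skips the target value still present in each row dict.
    writer = csv.DictWriter(
        output, fieldnames=kept_columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return output.getvalue()


training_csv = (
    "Area (sqft),BR,BA,Price\n"
    "1601,3,2,145111\n"
    "1840,3,2,220000\n"
)
prediction_csv = drop_target_column(training_csv)
```

Every other column keeps its name and position, which is what lets the data source built from this file line up with the schema of your training data.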

For realtime predictions, you can either manually enter observation data in the Amazon Machine Learning console, or you can use the Amazon Machine Learning API. When using the API, you’ll need to build a JSON string containing your set of observations. It may take a bit of experimentation to get your JSON string properly formatted to match the schema of your training dataset.
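Here is a minimal sketch of building that JSON string in Python. It assumes the request shape used by the Amazon Machine Learning Predict operation, in which the observation is sent as a “Record” mapping each attribute name to a string value. The model ID and endpoint URL are placeholders, and the attribute names must be replaced with the ones in your own schema.

```python
import json


def build_predict_request(model_id, endpoint, observation):
    """Build the body of a realtime prediction request for one observation."""
    # Every value is sent as a string, including numbers.
    record = {name: str(value) for name, value in observation.items()}
    return {
        "MLModelId": model_id,
        "PredictEndpoint": endpoint,
        "Record": record,
    }


# One observation without the target value. Names mirror the sample table.
observation = {
    "Area (sqft)": 1601,
    "Floors": 1,
    "BR": 3,
    "BA": 2,
    "Garage": "y",
    "Pool": "n",
}

request = build_predict_request(
    model_id="ml-EXAMPLE",                     # placeholder, not a real model ID
    endpoint="https://example.invalid/predict",  # placeholder endpoint URL
    observation=observation,
)
json_string = json.dumps(request, indent=2)
```

If a prediction request is rejected, comparing the keys in “Record” against the attribute names in your data source schema is a good first thing to check.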

Machine Learning for the IoT

Using supervised Machine Learning in conjunction with the Internet of Things presents some exciting possibilities and interesting challenges. IoT has tremendous potential to generate vast amounts of data, and the opportunity is ripe for Machine Learning to play an impactful role.

The first challenge of Machine Learning for IoT is the relatively limited amount of computational resources found on embedded devices. This is the very reason that the cloud and edge device model has become so prevalent: we’re offloading intensive processing to a more powerful, more central computer. It’s the perfect context for a Machine Learning cloud service.

Perhaps the greatest hurdle for any ML application, IoT or otherwise, is collecting a meaningful and representative training dataset. Understand that this step may be very time consuming, depending on the nature of the data you’re collecting.

To build your initial training dataset, consider the following: depending on what your model will be predicting, you may need to collect data points from multiple devices, perhaps in disparate physical locations. You may also consider gathering some of your data from third-party sources, such as local weather data from a weather service API, or generating Natural Language metrics using a service like the Google Cloud Natural Language API.

Depending on the number of sources that pool to create your dataset, you may need to gather all of your data points in a central location to then properly format and send to AWS. It’s up to you whether you do that locally on your own server or a gateway device, through a cloud service’s database API, or with something as simple as Google Sheets.
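Wherever you choose to do it, the gathering step often comes down to lining up readings from different sources so that each row describes one observation. The sketch below merges readings by timestamp and writes them out as CSV. The field names, the readings themselves, and the choice of timestamp as the join key are all hypothetical.

```python
import csv
import io


def merge_by_timestamp(*sources):
    """Combine readings from several sources into one row per timestamp."""
    merged = {}
    for source in sources:
        for reading in source:
            merged.setdefault(reading["timestamp"], {}).update(reading)
    return [merged[timestamp] for timestamp in sorted(merged)]


def to_csv(rows, columns):
    """Write rows as CSV, leaving a blank where a source had no reading."""
    output = io.StringIO()
    writer = csv.DictWriter(
        output, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return output.getvalue()


# Made-up readings from a device and from a third-party weather source.
device_readings = [
    {"timestamp": "2018-01-01T00:00", "vibration": 0.12, "temperature": 71.5},
    {"timestamp": "2018-01-01T01:00", "vibration": 0.15, "temperature": 72.0},
]
weather_readings = [
    {"timestamp": "2018-01-01T00:00", "outdoor_humidity": 40},
]

rows = merge_by_timestamp(device_readings, weather_readings)
csv_text = to_csv(rows, ["timestamp", "vibration", "temperature", "outdoor_humidity"])
```

The blank left in the second row is a reminder that merged data usually needs another cleaning pass, since you still have to decide how to treat observations with missing values.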

Applications of Machine Learning in IoT include predictive maintenance and optimizing equipment performance and system efficiency. For example, to understand factory equipment, you might begin gathering data about energy expenditure, vibration, temperature, product scrap rates, other product metrics, and machinery malfunctions. That’s just the beginning. Anywhere you place devices to monitor physical conditions could have potential as a Machine Learning data source.

Further Reading on Machine Learning

Without a doubt, Amazon Machine Learning’s documentation is among its best features. It offers thorough and accessible explanations of what you need to know about how machine learning works, as well as guides on every step of the process and an interactive tutorial project, so you’ll be prepared to make the most of Amazon Machine Learning.

The beauty of Amazon Machine Learning is that you really don’t need to know all of the ins and outs of Machine Learning in order to use it, but it is a fascinating field and there are many excellent free resources out there to understand it better. Here are just a few of them:
