Build your own prediction software

A small and quick taste of what you can achieve using supervised machine learning with Create ML


Can computers do predictions? Yes.
How do they do them?
Well, let’s roll up our sleeves!

First of all, what is a prediction?
A prediction is a statement about something that has not happened yet. For example, announcing the result of a soccer match before it finishes is a prediction.
In my hometown (Napoli) you can hear a recurring prediction, usually in late September:

“Chist è l’anno buon!” (This is “the good” year!)

This is about the local soccer team trying to win the league. Sadly, the last time this happened was 30 years ago.

I know what you are thinking. You are wondering if there is a way to make more accurate predictions. You can do it if you know the law.

No, you don’t have to be a lawyer.

Let’s try this: suppose that a triangle has a base measuring 10 cm and a height of 5 cm. Predict its area.

Correct! The area is 25 cm². Did you measure the area? No? How did you do that? Ahh, I see you know the law. You did (10 x 5)/2 = 25.

What computers can do is something related to this. With machine learning, computers can extract a law from inputs and outputs. In the triangle example, 10 and 5 are the inputs and 25 is the output. The more (inputs, output) couples a computer can learn from, the more accurate the extracted law will be. That’s because many different laws, or relationships, can link the same inputs and output.
If I had come to you asking what the relationship between 10, 5 and 25 is, you could have answered (10 x 2) + 5 = 25.
This is a correct answer, but it is not the right relationship for our purpose of calculating the area of any triangle. If we used it to predict the area of other triangles, we would almost always be wrong. That’s the point. Computers can find one common law, or relationship, that binds every (inputs, output) couple you give them.
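To make the point concrete, here is a minimal Python sketch that checks both candidate laws, first against our triangle alone and then against two triangles. The function names and the second triangle (base 4, height 3, area 6) are made up for illustration.

```python
def area_law(base, height):
    # The real law for the area of a triangle.
    return base * height / 2


def other_law(base, height):
    # Another relation that also turns 10 and 5 into 25.
    return base * 2 + height


def fits_all(law, examples):
    # True only if the law reproduces the output of every example.
    return all(law(base, height) == area for base, height, area in examples)


# (base, height, area) couples; the second one is added for illustration.
examples = [(10, 5, 25), (4, 3, 6)]

# With a single example, both laws look equally good.
print(fits_all(area_law, examples[:1]), fits_all(other_law, examples[:1]))

# With a second example, only the real law survives.
print(fits_all(area_law, examples), fits_all(other_law, examples))
```

One more example is enough to rule out the wrong law here, which is why more data usually means a better extracted law.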

1 | 1
2 | 4
3 | 9

Can you find the common relation between inputs and outputs?
I’m sure you can!
( x | x² )
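A computer can find this relation too. The sketch below is one simple way to do it, under a big assumption that I state up front: we tell the program that the law has the shape y = a * x^p, and it only has to find a and p. It does so by fitting a straight line to the logarithms of the data. The function name is made up for illustration.

```python
import math

pairs = [(1, 1), (2, 4), (3, 9)]


def fit_power_law(pairs):
    """Least-squares fit of y = a * x**p, done as a straight line in log-log space."""
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line is the exponent p.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    p = covariance / variance
    # Intercept of the line is log(a).
    a = math.exp(mean_y - p * mean_x)
    return a, p


a, p = fit_power_law(pairs)
print("a =", round(a, 6), "p =", round(p, 6))

# Use the extracted law on an input that was not in the data.
print("prediction for x = 4:", round(a * 4 ** p, 6))
```

The fit should recover a close to 1 and p close to 2, that is, the x² relation, and it can then answer for inputs it has never seen.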

What do we do now? We will find a dataset, train and test a regressor model, and then make a prediction with it.

dataset: a set of (input, output) couples
regressor model: a piece of code that can extract a common relation between elements in a dataset
train: let the regressor model search for the common relation between the elements in a dataset
test: check that the relation found by the regressor model is correct

You will need Xcode with Create ML, Python 3 and a clone of this repository I made for you ❤.

What we will predict

We want to predict the durability of a gear. We start with some features of gears:

  • number of teeth: integer number between 10 and 50

  • heat dissipation: floating point number between 0.2 and 0.8

  • viscosity of lubricant: integer number between 1 and 5

  • temperature: floating point number between 18 and 120

These four are the inputs, durability is the output. The file training.csv in the repository contains enough data to start.

The testing.csv file contains 160k rows, and each row contains one (inputs, output) couple

The file testing.csv is very similar to training.csv. Its purpose is to let the computer verify its acquired knowledge. The computer extracts a common relation from the data contained in training.csv, and then verifies that the extracted relation also works with the data contained in testing.csv.
Basically, the computer studies, takes a recap quiz on what it just studied and then checks how many correct/wrong answers it gave.
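The study-then-quiz idea can be sketched in plain Python. This is not what Create ML does internally: I swapped in a much simpler learner (linear least squares on the logarithms of the values), and I generate a small stand-in dataset with the relation this article reveals in its bonus step instead of reading the real CSV files. All function names are hypothetical.

```python
import math
import random


def simulate(teeth, heat, viscosity, temperature):
    # Stand-in for the real data: the relation revealed in the bonus step.
    return teeth * heat * viscosity / temperature


def make_rows(count, rng):
    # Each row is one (inputs, output) couple, with inputs in the article's ranges.
    rows = []
    for _ in range(count):
        inputs = (rng.randint(10, 50), rng.uniform(0.2, 0.8),
                  rng.randint(1, 5), rng.uniform(18, 120))
        rows.append((inputs, simulate(*inputs)))
    return rows


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    m = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def features(inputs):
    # A constant term plus the logarithm of every input.
    return [1.0] + [math.log(value) for value in inputs]


def train(rows):
    # "Study": least squares on log(output), solved through the normal equations.
    xs = [features(inputs) for inputs, _ in rows]
    ys = [math.log(output) for _, output in rows]
    k = len(xs[0])
    ata = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    aty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(weights, inputs):
    return math.exp(sum(w * f for w, f in zip(weights, features(inputs))))


def rmse(weights, rows):
    # "Quiz": root mean squared error on rows the model may never have seen.
    squared = [(predict(weights, inputs) - output) ** 2 for inputs, output in rows]
    return math.sqrt(sum(squared) / len(rows))


rng = random.Random(0)
training = make_rows(200, rng)
testing = make_rows(50, rng)

weights = train(training)
print("error on the training rows:", rmse(weights, training))
print("error on the testing rows: ", rmse(weights, testing))
```

Because the stand-in data has no noise and the learner happens to match the shape of the relation, both errors should come out tiny. With real measurements, the error on the testing rows is the number to watch.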

The file predict.csv contains our question. We want a prediction for the durability of a gear that has 30 teeth, with a heat dissipation value of 0.7, lubricated with a 4-star premium oil, operating at 53 °C.

predict.csv should not contain durability. We are asking for a prediction of that!
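If you want to build your own question file, a sketch like the following could do it. The first two column names appear later in this article; "viscosity" and "temperature" are placeholders I made up, so check the header of training.csv in the repository and use the real names.

```python
import csv
import io

# "number_of_tooth" and "heat_dissipation" are quoted in this article;
# "viscosity" and "temperature" are placeholder names, not taken from the repository.
COLUMNS = ["number_of_tooth", "heat_dissipation", "viscosity", "temperature"]


def build_predict_csv(gears):
    # Write only the inputs: there is deliberately no durability column.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(gears)
    return buffer.getvalue()


# The gear from our question: 30 teeth, 0.7 heat dissipation, 4-star oil, 53 °C.
text = build_predict_csv([(30, 0.7, 4, 53)])
print(text)
```

Saving `text` to a file named predict.csv gives a file with one header line and one gear, ready to hand to the trained model.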

Step 1

Open Xcode and, in the menu bar, choose
Xcode -> Open Developer Tool -> Create ML and then
File -> New Project

We will use the Tabular Regressor.

Step 2

It’s time for our model to study. Select the training and testing files, click on Select Features and be sure you are telling the model to use all of the inputs.

Training and testing will require time

Predict durability using number_of_tooth, heat_dissipation, . . .

As soon as you click on the play button in the upper part of the screen the model will start its training, and after that, it will start testing itself.

Step 3

Our model is now ready. Under Metrics, we can see how the model evaluated its freshly acquired knowledge. It measures the errors it made: the lower, the better.
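Create ML's regressor metrics typically include a root mean squared error and a maximum error. Assuming those are the numbers on your screen, this sketch shows how such values can be computed from predictions and true outputs. Only the first pair (1.40 predicted, 1.58 expected) comes from this article; the other values are invented for the example.

```python
import math


def root_mean_squared_error(predicted, actual):
    # Square every error, average them, then take the square root.
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def maximum_error(predicted, actual):
    # The single worst miss, ignoring its sign.
    return max(abs(p - a) for p, a in zip(predicted, actual))


# First pair taken from this article; the remaining values are made up.
predicted = [1.40, 2.10, 0.95]
actual = [1.58, 2.00, 1.00]

print("root mean squared error:", round(root_mean_squared_error(predicted, actual), 4))
print("maximum error:", round(maximum_error(predicted, actual), 4))
```

Both numbers are in the same unit as durability, so you can read them as "how far off the model tends to be" and "how far off it was at its worst".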

Congratulations! Your model has now graduated!

Now it’s time to answer our question. Click on the Output rectangle, then drag and drop the file predict.csv into the left section, or add it through the plus button in the lower left corner. As soon as you add the file, the model will give you the prediction.

Abracain!

Bonus step

How can we be sure that the prediction is accurate? Usually, it’s not an easy check. We would have to somehow snoop inside the model and check that it found the right relation between inputs and output. This time we have an easier way to do this.

30 x 0.7 x 4 / 53 ≈ 1.58, reasonably close to the predicted 1.40

If the correct relation is the one that gave us 1.58, we can be satisfied with the prediction the model made. I can assure you that this is the correct relation because I used it to generate the data.

Let’s look inside the dataGenerator.py file:

As you can see, the output is bound to inputs by that relation. And our model predicted a value not far from the one we can obtain by applying that relation.
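If you don't have the repository at hand, here is a rough sketch of what such a generator might look like. It is not the actual dataGenerator.py: it only combines the input ranges listed earlier with the relation above. The function names, the rounding, and the "viscosity" and "temperature" column names are my assumptions.

```python
import csv
import io
import random

# The last three column names are partly guesses; the real file may differ.
COLUMNS = ["number_of_tooth", "heat_dissipation", "viscosity", "temperature", "durability"]


def durability(teeth, heat, viscosity, temperature):
    # The relation used in the bonus step: teeth x heat x viscosity / temperature.
    return teeth * heat * viscosity / temperature


def random_gear(rng):
    # Inputs drawn from the ranges listed in "What we will predict".
    return (rng.randint(10, 50),
            round(rng.uniform(0.2, 0.8), 2),
            rng.randint(1, 5),
            round(rng.uniform(18, 120), 1))


def generate_csv(rows, seed=0):
    # Returns the CSV text; write it to a file to get something like training.csv.
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for _ in range(rows):
        gear = random_gear(rng)
        writer.writerow([*gear, round(durability(*gear), 4)])
    return buffer.getvalue()


print(generate_csv(3))

# The gear from our question, computed directly with the relation.
print(round(durability(30, 0.7, 4, 53), 2))
```

The last line recomputes the value for our gear, which is the number we compared the model's prediction against.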

Wrap up

dataGenerator.py represents a gear simulator. It produces datasets made of gear features and their durability. We used them to train a model. The model learned how the gear simulator works and can predict the outputs of the simulator.

The model, just by looking at the simulator’s inputs and outputs, is able to replicate the simulator. Think about it.

Suppose that we never had the simulator and that we collected our data only from experiments and measurements. Now we can build a simulator using the trained model.

Can computers predict the future? Yes, but you have to tell them the past.

References

Don’t forget to check out the code repository to do everything by yourself.

I wrote this article as part of a project done at the Apple Developer Academy in Naples (Italy) in 2020.