Giovanni Gorgone

Build your own prediction software

Giovanni Gorgone — Sun, 04 Jun 2023 10:22:57 GMT

Can computers make predictions? Yes.
How do they do them?
Well, let’s roll up our sleeves!

First of all, what is a prediction?
A prediction is a statement about something that has not happened yet. For example, calling the result of a soccer match before it ends is a prediction.
In my hometown (Napoli) you can hear a recurring prediction, usually in late September:

“Chist è l’anno buon!” (This is “the good” year!)

This is about the local soccer team trying to win the league. Sadly, the last time this happened was 30 years ago.

I know what you are thinking. You are wondering if there is a way to make more accurate predictions. You can do it if you know the law.

No, you don’t have to be a lawyer.

Let’s try this: suppose a triangle has a base of 10 cm and a height of 5 cm. Predict its area.

Correct! The area is 25 cm². Did you measure the area? No? How did you do that? Ahh, I see you know the law. You did (10 x 5)/2 = 25.

What computers can do is related to this. With machine learning, computers can extract a law from inputs and outputs. In the triangle example, 10 and 5 are the inputs and 25 is the output. The more (inputs, output) pairs a computer can learn from, the more accurate the law it extracts will be. That’s because many laws, or relationships, can fit the same inputs and output.
If I had asked you what the relationship between 10, 5 and 25 is, you could have answered (10 x 2) + 5 = 25.
This is a correct answer, but it is not the right relationship for calculating the area of any triangle. If we use it to predict the area of other triangles, we will almost always be wrong. That’s the point. Computers can find one common law, or relationship, that binds every (inputs, output) pair you give them.
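To see why a single example is not enough, here is a small Python sketch that checks both candidate laws against a few triangles. Only the first triangle comes from the text; the other two, and the function names, are made up for illustration.

```python
# Two candidate "laws" that both explain the single example (10, 5) -> 25.
def half_base_times_height(base, height):
    return base * height / 2

def double_base_plus_height(base, height):
    return base * 2 + height

# (base, height, area) examples; only the first one comes from the text,
# the other two are extra triangles added for illustration.
examples = [(10, 5, 25), (4, 3, 6), (6, 2, 6)]

def fits_all(law, data):
    """Return True if the law reproduces the output of every example."""
    return all(law(base, height) == area for base, height, area in data)

print(fits_all(half_base_times_height, examples[:1]))   # True
print(fits_all(double_base_plus_height, examples[:1]))  # True
print(fits_all(half_base_times_height, examples))       # True
print(fits_all(double_base_plus_height, examples))      # False
```

With one example both laws look equally good; with three, only the area formula survives.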

1 | 1
2 | 4
3 | 9

Can you find the common relation between inputs and outputs?
I’m sure you can!
( x | x² )

What do we do now? We will find a dataset, train and test a regressor model, and then make a prediction with it.

dataset: a set of (inputs, output) pairs
regressor model: a piece of code that can extract a common relation between the elements of a dataset
train: let the regressor model search for the common relation between the elements of a dataset
test: check how well the relation found by the regressor model holds on data it has not seen during training
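These four terms can be sketched in a few lines of Python. This is a toy under a strong assumption: the relation has the form output = w * x, so "training" only has to find one number. Real regressors, including the one used later in this article, are far more flexible. The data values and function names are invented for illustration.

```python
# A toy "regressor" that assumes the relation has the form output = w * x
# and searches for the best w with least squares.
def train(dataset):
    """Return the w that minimizes the squared error over (x, output) pairs."""
    numerator = sum(x * y for x, y in dataset)
    denominator = sum(x * x for x, _ in dataset)
    return numerator / denominator

def evaluate(w, dataset):
    """The "test" step: mean absolute error of the learned relation."""
    return sum(abs(w * x - y) for x, y in dataset) / len(dataset)

# Each pair is (base * height, area) for a triangle; values chosen for illustration.
training = [(50, 25), (12, 6), (30, 15)]
testing = [(8, 4), (100, 50)]

w = train(training)
print(w)                     # 0.5
print(evaluate(w, testing))  # 0.0
```

The model recovers the "divide by two" part of the triangle law from the training pairs, and the testing pairs confirm it on triangles it never saw.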

You will need Xcode with Create ML, Python 3 and a clone of this repository I made for you ❤.

What we will predict

We want to predict the durability of a gear. We start with some features of gears:

number of teeth: integer number between 10 and 50
heat dissipation: floating point number between 0.2 and 0.8
viscosity of lubricant: integer number between 1 and 5
temperature: floating point number between 18 and 120

These four are the inputs, durability is the output. The file training.csv in the repository contains enough data to start.

The testing.csv file contains 160k rows, and each row contains one (inputs, output) pair.

The file testing.csv is very similar to training.csv. Its purpose is to let the computer verify what it has learned. The computer extracts a common relation from the data in training.csv, and then verifies that the relation also works with the data in testing.csv.
Basically, the computer studies, takes a recap quiz on what it just studied and then checks how many correct/wrong answers it gave.

The file predict.csv contains our question. We want a prediction for the durability of a gear that has 30 teeth and a heat dissipation value of 0.7, is lubricated with a 4-star premium oil, and operates at 53 °C.

predict.csv should not contain durability. We are asking for a prediction of that!
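As a rough sketch, a file like predict.csv could be produced with Python's csv module. The first two column names appear later in this article's Create ML caption; "viscosity" and "temperature" are guesses, so check the header of the CSV files in the repository before relying on them.

```python
import csv
import io

# Column names: the first two appear in the Create ML caption in this article;
# "viscosity" and "temperature" are guesses, check the repository's CSV header.
columns = ["number_of_tooth", "heat_dissipation", "viscosity", "temperature"]

# The question from the text: 30 teeth, 0.7 heat dissipation, 4-star oil, 53 degrees.
question = {"number_of_tooth": 30, "heat_dissipation": 0.7,
            "viscosity": 4, "temperature": 53}

# Write to an in-memory buffer; swap in open("predict.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerow(question)

print(buffer.getvalue())
```

Note that there is no durability column: that is the value we are asking the model for.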

Step 1

Open Xcode, then in the menu bar go to
Xcode -> Open Developer Tool -> Create ML and then
File -> New Project

We will use the Tabular Regressor.

Step 2

It’s time for our model to study. Select the training and testing files, click on Select Features and be sure you are telling the model to use all of the inputs.

Training and testing will require time

Predict durability using number_of_tooth, heat_dissipation, . . .

As soon as you click on the play button in the upper part of the screen the model will start its training, and after that, it will start testing itself.

Step 3

Our model is now ready. In the Metrics section, we can see how the model evaluated its freshly acquired knowledge. The metrics measure the errors it made: the lower, the better.
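To get a feel for what error metrics of this kind measure, here is a minimal Python sketch of two common ones, root mean squared error and maximum error. The exact metrics and definitions Create ML reports are described in its documentation; the numbers below are made up only to exercise the functions.

```python
import math

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def maximum_error(actual, predicted):
    """Largest absolute difference over all rows."""
    return max(abs(a - p) for a, p in zip(actual, predicted))

# Made-up durability values, only to exercise the functions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]

print(root_mean_squared_error(actual, predicted))  # 1.0
print(maximum_error(actual, predicted))            # 2.0
```

A single bad answer out of four is enough to push both numbers up, which is why lower values mean a better-trained model.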

Congratulations! Your model has graduated!

Now it’s time to answer our question. Click on the Output rectangle, then drag and drop the file predict.csv into the left section or add it through the plus button in the lower left corner. As soon as you add the file, the model will give you the prediction.

Abracain!

Bonus step

How can we be sure that the prediction is accurate? Usually, it’s not an easy check. We would have to somehow look inside the model and check that it found the right relation between inputs and output. This time we have an easier way to do this.

30 x 0.7 x 4 / 53 ≈ 1.58, close to the prediction of 1.40 (a difference of about 0.18)

If the correct relation is the one that gave us 1.58, we can be satisfied with the prediction the model made. I can assure you that this is the correct relation because I used it to generate the data.

Let’s look inside the dataGenerator.py file:

As you can see, the output is bound to inputs by that relation. And our model predicted a value not far from the one we can obtain by applying that relation.
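The listing from dataGenerator.py is not reproduced here. As a rough sketch of what a generator built on that relation could look like, the code below uses the input ranges given earlier in this article. The function names, the seed, and the absence of any noise are assumptions, not the repository's code.

```python
import random

def durability(teeth, heat_dissipation, viscosity, temperature):
    """The relation quoted above: teeth x heat dissipation x viscosity / temperature."""
    return teeth * heat_dissipation * viscosity / temperature

def generate_rows(count, seed=0):
    """Yield (teeth, heat_dissipation, viscosity, temperature, durability) tuples."""
    rng = random.Random(seed)
    for _ in range(count):
        teeth = rng.randint(10, 50)          # integer between 10 and 50
        heat = rng.uniform(0.2, 0.8)         # float between 0.2 and 0.8
        viscosity = rng.randint(1, 5)        # integer between 1 and 5
        temperature = rng.uniform(18, 120)   # float between 18 and 120
        yield teeth, heat, viscosity, temperature, durability(
            teeth, heat, viscosity, temperature)

rows = list(generate_rows(3))
print(len(rows))                             # 3
print(round(durability(30, 0.7, 4, 53), 2))  # 1.58
```

The last line reproduces the manual check from the bonus step for the gear described in predict.csv.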

Wrap up

dataGenerator.py represents a gear simulator. It produces datasets made of gear features and their durability. We used them to train a model. The model learned how the gear simulator works and can predict the outputs of the simulator.

Just by looking at the simulator’s inputs and outputs, the model is able to replicate the simulator. Think about it.

Suppose that we never had the simulator and that we collected our data only from experiments and measurements. Now we can build a simulator using the trained model.

Can computers predict the future? Yes, but you have to tell them the past.

References

Don’t forget to check out the code repository to try everything yourself.

Code repository
Create ML documentation

I wrote this article as part of a project done at the Apple Developer Academy in Naples (Italy) in 2020.

Hyrum’s law

Giovanni Gorgone — Mon, 03 Apr 2023 23:15:23 GMT

You are making a change to the code, small and harmless. You stop, knowing that you will make someone angry. Ever had this feeling? It is exactly what Hyrum’s law predicts:

“No matter what a contract exposes, given a sufficiently large number of users of an API, they will rely on all observable aspects of the exposed system, even those not made explicit in the contract itself.”

To understand it fully, let’s analyze the elements involved.

What is an API? An API is the boundary between the user and the product/service.

The API of a TV remote control consists of its buttons. Users press them without needing to know what happens inside the remote control. What happens inside the remote control is called the “implementation”. What is the advantage of this distinction? The implementation can change without the user having to worry about it.

The contract of an API is the set of guarantees about the product/service given to the user.

Simplifying, the remote control in question guarantees an on/off button, two buttons to raise and lower the volume, and two buttons to move to the previous and next channel.

Suppose the first implementation of the remote control has no check on the battery charge. When the charge is low, the remote control keeps working provided you move closer to the receiver. In the second implementation, the remote control stops working if the battery charge is below 25%.

Even though these aspects are not part of the contract, with a sufficiently large number of users we will get the following scenario:

A user of the first implementation figures out that if the remote control doesn’t work, they can move closer to the receiver to make it work. If the remote control still doesn’t work, then the batteries must be replaced.

The same user, moving to the second implementation, expects the same behavior. If the remote control doesn’t work, they move closer to the receiver to make it work. The remote control still doesn’t work, though, because in this case the implementation depends on the charge level and not on the distance from the receiver. The user wrongly concludes that the remote control or the receiver is broken, rather than that the batteries are flat.

This is exactly the situation Hyrum’s law describes: a user of an API is relying on an observable behavior of the system, even though the API contract gives no guarantee about it.

In software engineering this law is also known as the “Law of implicit interfaces”. It is worth keeping in mind while evolving a system that must guarantee backward compatibility. Both changes to the contract (explicit interface) and changes to the other system behaviors that users can observe (implicit interface) can break that compatibility.

The effects of this law can be mitigated in two ways: the provider of an API should limit the exposure of observable system behaviors that are not stated in the contract, and the user should rely only on the API contract. This makes it easier to introduce changes into a system.

https://www.hyrumslaw.com

Hyrum’s law

Giovanni Gorgone — Mon, 03 Apr 2023 23:12:30 GMT

You are making a small, harmless change to the code. You stop: you know that you'll upset someone. Have you ever felt this way? This is exactly what Hyrum's law states:

“With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.”

To understand it better, let's analyze the elements involved.

What is an API? An API is the boundary between the user and the product/service.

The API of a TV remote control is composed of its buttons. They are used by users who don't need to know what's happening inside the remote control. What happens inside the remote control is called "implementation". What's the advantage of this distinction? The advantage is that the implementation can change without the user having to worry about it.

The contract of an API is the set of guarantees about the product/service provided to the user.

Simply put, the remote control in question guarantees an on/off button, two buttons to increase and decrease the volume, and two buttons to switch to the previous and the next channel.

Suppose the first implementation of the remote control doesn't check the battery charge. When the charge is low, the remote control continues to work as long as you get close to the receiver. In the second implementation, the remote control stops working if the battery charge is less than 25%.

Even though these aspects are not covered by the contract, with a sufficiently large number of users, we will have the following scenario:

A user of the first implementation figures out that if the remote control doesn't work, he can move closer to the receiver to make it work. If the remote control still doesn't work, then he must replace the batteries.

The same user, switching to the second implementation, expects the same behavior. If the remote control doesn't work, he moves closer to the receiver to make it work. However, the remote control still doesn't work because in this case the implementation depends on the battery level and not on the distance from the receiver. The user incorrectly concludes that the remote control or the receiver is broken, rather than that the batteries are dead.

That’s a case of Hyrum's law: an API user is relying on an observable system behavior, even though the API contract doesn't guarantee it.
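The remote control story can be sketched in Python. Everything here is hypothetical and only illustrates the scenario: the class names, the 10 m range, and the idea that range shrinks linearly with the charge are assumptions made for the example.

```python
class RemoteV1:
    """First implementation: no battery check; a weak battery shortens the range."""
    def __init__(self, battery_percent):
        self.battery_percent = battery_percent

    def press_power(self, distance_m):
        # Hypothetical model: the reachable distance shrinks with the charge.
        max_range_m = 10 * self.battery_percent / 100
        return distance_m <= max_range_m


class RemoteV2:
    """Second implementation: refuses to work below 25% charge."""
    def __init__(self, battery_percent):
        self.battery_percent = battery_percent

    def press_power(self, distance_m):
        if self.battery_percent < 25:
            return False
        return distance_m <= 10


def user_diagnosis(remote):
    """The user's habit, learned on V1: if it fails, step closer and retry."""
    if remote.press_power(distance_m=5):
        return "works"
    if remote.press_power(distance_m=1):
        return "works up close"
    return "remote or receiver is broken"


print(user_diagnosis(RemoteV1(battery_percent=20)))  # works up close
print(user_diagnosis(RemoteV2(battery_percent=20)))  # remote or receiver is broken
```

Both classes offer the same button, so the contract is unchanged. Only the behavior outside the contract differs, and that is exactly what the user's habit depended on.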

In software engineering, this law is also known as the "Law of implicit interfaces". Keep it in mind when evolving a system that needs to stay backward compatible. Both changes to the contract (explicit interface) and changes to other observable system behaviors (implicit interface) can break that compatibility.

The effects of this law can be mitigated in two ways: the API provider should limit exposure of observable system behaviors not described in the contract, and the user should rely only on the API contract. This makes it easier to introduce changes in a system.

https://www.hyrumslaw.com