Artificial Intelligence (AI) and Machine Learning (ML) are big topics these days, no matter what the domain is – finance, fintech, politics, health, education, science, blockchain, and so on. Over the past five years, some startups have even used them as an opportunity to show off, making their products and services look more valuable to customers and investors. But would it still be possible to catch up if you missed the start and only had some basic knowledge of PHP or JavaScript?

ML, AI or ChatGPT can change everything if they are integrated into the value proposition. However, this is not something to be taken lightly: ML can be difficult to access because it calls for specialists, people who have studied the field or who have dedicated time to getting trained and certified on the subject, which in certain cases makes it harder for the general public to fully understand. There will be no mention of formulas, operations, series of numbers, or variances in this blog post.

Since I am not a math nerd, I had not taken the time, prior to this article, to fully understand the deep skeleton of Machine Learning programs: the probabilities, the calculations, the components, etc. Perhaps I should, but I am just a PHP developer, working with my partner Ben to sell a new, innovative B2B timesheet solution, so if you tell me that by staying within that language world I can do some Machine Learning relatively quickly, without taking courses on Udemy, I am all ears.


Machine Learning introduction and existing types

We will begin with some context, theory, definition, and a brief overview of what Machine Learning is nowadays. Machine Learning (ML) is a field of inquiry that seeks to understand and develop methods for learning, i.e. methods that utilize data to improve performance on a particular task. Machine Learning is considered part of Artificial Intelligence (AI), so if you (or any program or script of your invention) engage in Machine Learning, you are engaged in Artificial Intelligence.

I found Jason Brownlee PhD’s blog Machine Learning Mastery to be very thorough and accurate, so I will include references from it to support my arguments in this first section. In my understanding, you should begin Machine Learning by picking one of three main learning styles: supervised, unsupervised and semi-supervised. Then, once the style is determined, a series of specific algorithms is available, chosen according to how you wish to perform the main activity of your Machine Learning: modeling a problem based on its interaction with an environment or input data.

There are already many things that can be done based on these two notions (learning style and algorithm). A lot of free documentation is available online and opens up a lot of potential; however, if you hear about the following terms one day, please keep in mind that they are part of what are referred to as “subfields” of Machine Learning: Computational Intelligence, Computer Vision (CV), Natural Language Processing (NLP), Recommender Systems, Reinforcement Learning and Graphical Models.


If you decide to focus on the algorithms themselves, you will find a lot explained online. Machine Learning is not in its infancy: it has been worked on and developed by a number of scientists and specialists for decades, so even by focusing on a single one of them, things can become complicated. In this really comprehensive blog post, Jason Brownlee discusses a total of 12 families of algorithms, which will not be explained in detail here; you can read about them over there.

A total of 63 different algorithms are thus available to you at minimum, quite a large number. However, this is by no means an exhaustive list; more are available if you become involved in ML communities and online groups. Here is an interactive circle chart illustrating Machine Learning’s world and sub-worlds. You can explore each subset and find out what’s inside or nearby.

Among the previous listing, one algorithm in particular caught my attention: the Naive Bayes Machine Learning Algorithm. Why? Because of a unique, well-documented and easy-to-approach PHP library I discovered on GitHub, which allows any developer to take advantage of Naive Bayes in a very simple way, with a lot of possibilities. Originally a Node.js library, a PHP version with 5 releases has been made available by Tom Noogen from Minneapolis.

Bayes’ theorem, which was first introduced by Reverend Thomas Bayes in 1764, provides a way to infer probabilities from observations. Bayesian Machine Learning utilizes Bayes’ theorem to predict occurrences.
Vitalflux.com

Using Bayes’ theorem, Bayesian Machine Learning makes inferences from data based on Bayesian statistics. An inference can be used to predict the weather more accurately, recognize emotional expressions in speech, estimate gas emissions, and much more! Tom Noogen’s PHP library can be used for categorizing any text content into any arbitrary set of categories. For example: is an email spam, or not spam? Is a piece of text expressing positive emotions, or negative emotions? But to achieve this, the library needs a Dataset…

Let’s begin with a key component called a Dataset

A Dataset is a collection of various types of data stored in a digital format. Data is the key component of any Machine Learning project. Datasets primarily consist of images, texts, audio, videos, numerical data points, etc., and are used to solve various Artificial Intelligence challenges such as: image or video classification, object detection, face recognition, emotion classification, speech analytics, sentiment analysis, stock market prediction, etc.

In the case of this article, if you wish to test the Naive Bayes algorithm in PHP using a dedicated library, the first thing you should do is review the library’s documentation and determine what type of data is needed before beginning. The developers of the library will guide you towards a specific Dataset format. You must first find the ideal Dataset, in other words get a file with the data already in it and in the right format, before you can begin coding your PHP script.


The Usage section of the niiknow/bayes PHP library on GitHub explains how to teach a machine to determine whether a text message is positive or negative. It can be an SMS, a tweet, or any other form of short text communication. The more a machine learns about your lessons and teaching in your code, the more accurate it will be in making Predictions in another piece of your code.

The notion of Prediction is also a key concept in Machine Learning; it means that if you later come to the same machine with a completely new, unseen message, it will be able to solve the same problem by telling you whether that new message is positive or negative (or neutral), since it has been trained and taught to distinguish between positive and negative text messages.

The main objective of this PHP library is to demonstrate how Naive Bayes can be used. It is pretty basic and simple, and it has a wide variety of applications that can be imagined. The most important thing is to train your machine, to make it learn a lot of things, repetitively, in a similar manner to the training of a child. This is when Datasets come into play.


It is not uncommon for these files to be huge, with millions of lines in them, and each line should be considered a lesson. Hence, if you make your machine learn a single lesson one million times, such as distinguishing between positive and negative messages, you can be confident that after that enormous amount of learning, the machine will be able to identify the differences between them, not on its own but in a supervised way.

Kaggle is one of the most interesting platforms where you can find free Datasets. I decided to recommend downloading the Twitter US Airline Sentiment Dataset in order to apply Naive Bayes to our case. This is because it contains data and lines in an ideal format with regard to the documentation of the niiknow/bayes PHP library: each line represents a tweet with an indication as to whether it is positive or negative. Here is the file on GitHub.

The file is a CSV, so it is just a table with columns and lines. Here, a program or a person has already done the preliminary job of telling whether each tweet is positive or negative; but you don’t care at this point, you are interested in the final result: using the already prepared information in this file to train a machine. I have created a lighter version of it here with only two columns: the message and the verdict (positive, negative, neutral).
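For reference, the scripts below read this file as plain semicolon-separated text and assume the verdict comes in the first column and the tweet text in the second; if your copy of the file is laid out the other way around, simply swap the two column indexes in the code. Purely as a made-up illustration of that layout (these lines are not taken from the real Dataset), a few rows could look like this:

negative;My flight was cancelled again and nobody answers the phone
positive;Great crew and an on-time landing, thank you!
neutral;What time does boarding start for flight 1234?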

This Dataset can be used to train a Model in one program

The niiknow/bayes PHP library suggests that each time you wish to teach something to the machine (which, at this point, is still an abstract thing whose exact location you are unsure of), you have to write a line of code with the text message and how it should be considered. This is not convenient, because if you wish to teach 1000 lessons one after another directly in your PHP code, you will have to write out each text message yourself, and your PHP file will grow accordingly. That seems unimaginable and absurd to me, and it is not something I recommend you do.

<?php
require_once (__DIR__ . '/vendor/autoload.php');

$classifier = new \Niiknow\Bayes();

// teach it positive phrases 

$classifier->learn('amazing, awesome movie!! Yeah!! Oh boy.', 'positive');
$classifier->learn('Sweet, this is incredibly, amazing, perfect, great!!', 'positive');

// teach it a negative phrase 

$classifier->learn('terrible, shitty thing. Damn. Sucks!!', 'negative');

It is for this reason that you should find a way in PHP to loop over each line of the Dataset file and perform the learn() operation at each iteration. In that situation, you do not read the contents of the file yourself, but rather trust it blindly. If there are 14 000 tweets, for example, you may be able to perform some manual eye checks, but this quickly becomes overwhelming. Datasets are created for this purpose: a database you wish to quickly exploit and see the results of, without reading each entry one by one yourself.

According to the previous chunk of PHP code, you must create an instance of the PHP class \Niiknow\Bayes, after which there is a method learn() for making your instance virtually learn something. This is where the Model notion comes into play: your instance should be considered as a Model, defined here. Unfortunately, this is not an ideal solution since, regardless of how many lessons you wish to teach, the entire education will be lost once your script execution terminates, so if you want to do something based on the teaching, you have to do it within the same piece of code, right after the call to the learn() method.

The next time your PHP code is executed, the same process will occur, first a training and then an action resulting from it. In a perfect scenario, the learning of the lessons for the machine/Model and the execution of actions based on that learning should be separate steps, ideally in two separate PHP scripts. The first will allow the $classifier to learn something, the second will inherit from that, keep the same instance, load what was previously learned, and perform some actions, such as making Predictions.

php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
php composer-setup.php
php composer.phar require niiknow/bayes
php -r "unlink('composer-setup.php');"
vi script-1.php

Now let’s rewrite that first piece of code with a function dedicated to performing the “learning action“, plus some new code using a classic PHP way to read a CSV file (the Dataset) and parse each line. Prior to that, I think it is a good idea to give you the list of terminal commands to be executed, such as getting Composer to download the dependencies necessary to use the \Niiknow\Bayes library. The first code example block indicates that the class is already available by importing a file called autoload.php; this is the file generated by Composer once the dependencies have been retrieved locally. The second code block contains the commands, leading to the editing of a new PHP script called script-1.php.

<?php
/* Filename: script-1.php */

require_once (__DIR__ . '/vendor/autoload.php');

// method for doing each time the same learning operation 

function teachSomethingToTheModel (
&$MLModelInstance,
$something,
$meaning
) {

$MLModelInstance->learn($something, $meaning);

}

// Model instantiation 

$classifier = new \Niiknow\Bayes();

// OPEN the Dataset file

$CSVFileReadingInstance = fopen(__DIR__ . '/tweets.csv','r');

// READ the Dataset file and TRAIN the Model with it

// read one semicolon-separated line of the Dataset into an array of columns

while (
(
$CSVFileLineTextSplit = fgetcsv($CSVFileReadingInstance, 0, ';')
) !== FALSE
) {

if (
isset($CSVFileLineTextSplit[1]) &&
isset($CSVFileLineTextSplit[0])
) {

teachSomethingToTheModel(
$classifier,
$CSVFileLineTextSplit[1],
$CSVFileLineTextSplit[0]
);

}

}

// CLOSE the Dataset file

fclose($CSVFileReadingInstance);

Although this is now better, the Model $classifier lives along with the script execution, and dies with it. There is nothing persistent, nothing that can be stored somewhere awaiting another script to perform a task with it. For instance, you can plan to train and learn your Model one day, but its actual utility and use might be scheduled one year from that day, based on all the training that occurred during the entire year. It is therefore crucial to keep track of your machine’s progress and evolution.

That is the reason why this Naive Bayes PHP library is awesome by nature. There is a second method available once the class has been instantiated: toJson(). This means that following the training and the learning of a lesson, a copy of the Model’s brain can be generated at any moment, containing an account of everything the Model has learned up to that point, all in JSON format. This has one significant advantage: the content can now be stored in a JSON file, as plain text, anywhere on your computer or in a cloud storage service, and then be exchanged. Here is the new code, with an additional custom method wrapping the library’s toJson() method:

<?php
/* Filename: script-1.php */

require_once (__DIR__ . '/vendor/autoload.php');

// method for doing each time the same learning operation 

function teachSomethingToTheModel (
&$MLModelInstance,
$someTextToLearn,
$meaning
) {

$MLModelInstance->learn($someTextToLearn, $meaning);

}

// method for saving a JSON screenshot of the Model into a file 

function backupTheModel (
&$MLModelInstance,
$backupFilePath
) {

file_put_contents($backupFilePath, $MLModelInstance->toJson(), LOCK_EX);

}

// Model instantiation 

$classifier = new \Niiknow\Bayes();

// Fill & train the Model and save it to a JSON backup file

if (
!file_exists(__DIR__ . '/model-brain.json')
) {

// OPEN the Dataset file

$CSVFileReadingInstance = fopen(__DIR__ . '/tweets.csv','r');

// READ the Dataset file and TRAIN the Model with it

// read one semicolon-separated line of the Dataset into an array of columns

while (
(
$CSVFileLineTextSplit = fgetcsv($CSVFileReadingInstance, 0, ';')
) !== FALSE
) {

if (
isset($CSVFileLineTextSplit[1]) &&
isset($CSVFileLineTextSplit[0])
) {

teachSomethingToTheModel(
$classifier,
$CSVFileLineTextSplit[1],
$CSVFileLineTextSplit[0]
);

}

}

// CLOSE the Dataset file

fclose($CSVFileReadingInstance);

// Save the trained Model into a JSON backup file

backupTheModel(
$classifier,
__DIR__ . '/model-brain.json'
);

exit ('Model trained.');

}
else {

exit ('Nothing to do.');

}

It is up to you to secure that file. JSON is a human-readable format, but after three months of training and exporting, the machine’s brain file may weigh six megabytes, ten megabytes after six months, and 100 megabytes at the end of a year, for instance. There will always be a single file containing the entire learning effort you have conducted. Instead of starting the process from zero each time, you can separate it into multiple stages, then transmit the file to a different PHP script which will re-use it or complete it.
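If the size of that brain file ever becomes a concern, one simple option, shown here as a minimal sketch that is not part of the niiknow/bayes library, is to keep a compressed copy of it, since plain JSON text compresses very well with PHP’s built-in zlib functions:

<?php
/* Illustrative only: compressing and restoring the Model brain file */

$brainFilePath = __DIR__ . '/model-brain.json';

// create a compressed copy of the JSON brain file (level 9 = smallest output)
file_put_contents($brainFilePath . '.gz', gzencode(file_get_contents($brainFilePath), 9), LOCK_EX);

// later, restore the plain JSON text before giving it to fromJson()
$restoredModelJson = gzdecode(file_get_contents($brainFilePath . '.gz'));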

Let’s make a Prediction program using the Model and 1 variable

The moment when Machine Learning begins to be interesting for me is when Predictions start to be made. This should open up one’s mind to many practical applications and use cases. In accordance with what I stated previously, please create a new PHP file, logically called script-2.php, where you will start some new PHP code, while staying in the same folder as all the files created until now.

It is important to always keep in mind that the previous PHP file, script-1.php, has only one specific purpose: to train a Machine Learning Model based on the Naive Bayes Machine Learning Algorithm to discriminate between positive, negative, and neutral short text messages. The results of this learning have been stored in a brain file called model-brain.json, in JSON format.

Taking this brain file and its education into consideration, we now intend to determine, on a supervised basis, whether any new and unknown short text message is positive, negative, or neutral. The niiknow/bayes PHP library offers two final methods on each instantiated class, in addition to learn() and toJson(), for a total of 4 PHP class functions: the fromJson() method and the categorize() method. These are key for our latest file. Please take a look:

<?php
/* Filename: script-2.php */

require_once (__DIR__ . '/vendor/autoload.php');

// method for reading the content of a JSON Model screenshot file 

function loadTheModelBackupFile (
&$MLModelInstance,
$backupFilePath
) {

$MLModelInstance->fromJson(
file_get_contents($backupFilePath, false)
);

}

// method for making Predictions and figuring out the category of a short text message 

function challengeTheModel (
&$MLModelInstance,
$shortTextMessage
): string {

return $MLModelInstance->categorize($shortTextMessage);

}

// Model instantiation 

$classifier = new \Niiknow\Bayes();

// loading the education brain file of past trainings 

loadTheModelBackupFile($classifier, __DIR__ . '/model-brain.json');

// Retrieve unpredictable data from somewhere or from the user
// TODO: make it more secure, filtered...

$inputShortTextMessage = $_GET['message'] ?? 'Default short text message to be challenged'; 

// Challenge the ML Model with something to trigger its response (classification, problem solving)

$MLprediction = challengeTheModel($classifier, $inputShortTextMessage);

exit (
'ML Model prediction: the short text message is ' . $MLprediction . '.'
);

It should be noted that without the fromJson() method, the $classifier instance is an empty resource, waiting for a fresh lesson, for a fresh teaching, something that needs to be learned, so you must provide it with information and the meaning of it with the learn() method. There is, however, no such thing in this second PHP file, the learning has to be performed in a previous operation.

The script wishes to retrieve everything that has been learned, so it utilizes the education contained within the brain JSON file model-brain.json for the purpose of challenging what has been processed and understood, solving new problems by making Predictions on the basis of new information and short text messages submitted. This is a form of Artificial Intelligence, but via supervised learning.

Naive Bayes algorithms are commonly used for classification, or categorization, and they perform well when analyzing large amounts of data in many real-world situations. They can also be extremely fast compared to more sophisticated methods. For this reason, the PHP library provides a categorize() method. This is the moment when you test your Model to see if it has retained what you have taught it. This is its test.
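To give a rough idea of what such a test could look like, here is a small sketch that replays the labelled lines of the Dataset through categorize() and counts how often the Model agrees with the verdict already written in the file. It reuses the same semicolon-separated tweets.csv and the same brain file as before; a more serious evaluation would keep some lines aside and never train on them:

<?php
/* Illustrative sketch: counting how many Dataset verdicts the Model reproduces */

require_once (__DIR__ . '/vendor/autoload.php');

$classifier = new \Niiknow\Bayes();
$classifier->fromJson(file_get_contents(__DIR__ . '/model-brain.json'));

$totalLines = 0;
$correctPredictions = 0;

$CSVFileReadingInstance = fopen(__DIR__ . '/tweets.csv', 'r');

while (($CSVFileLineTextSplit = fgetcsv($CSVFileReadingInstance, 0, ';')) !== FALSE) {

    if (!isset($CSVFileLineTextSplit[0], $CSVFileLineTextSplit[1])) {
        continue;
    }

    $totalLines++;

    // compare the Prediction with the verdict already present in the Dataset
    if ($classifier->categorize($CSVFileLineTextSplit[1]) === $CSVFileLineTextSplit[0]) {
        $correctPredictions++;
    }

}

fclose($CSVFileReadingInstance);

echo 'Correct Predictions: ' . $correctPredictions . ' out of ' . $totalLines . PHP_EOL;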


The script-2.php snippet above suggests that you take a variable from any classic source, as a string input, and confront it with your Model (after loading its brain, its memory, instead of starting from zero); the Model will then respond with a positive, negative or neutral opinion regarding that variable. If the brain JSON file is empty, the script will not be able to answer anything reliable.
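To try the two scripts locally, one possible sequence (just a suggestion, assuming PHP’s built-in web server, since script-2.php reads the message from $_GET) is to run the training script once from the command line, then serve the folder with the built-in server and, from a second terminal, query the second script with any message you like:

php script-1.php
php -S localhost:8080
curl "http://localhost:8080/script-2.php?message=amazing+crew+and+great+flight"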

By training your Model without limits, you can more confidently trust its future Predictions. In these examples, training and predicting are done separately, with a JSON file as the glue. This collection of PHP code snippets and examples has been uploaded to GitHub. Feel free to fork the repo or watch timeNough on that excellent online platform. There is also a way to Sponsor the repo, a new feature.

https://github.com/timenough/mvp/tree/master/mvp/machine-learning

Transpose all this into timeNough’s value proposition

At the time of this article, timeNough is only available as a prototype, which means that a number of features have yet to be developed. After seeing how straightforward it is to approach Machine Learning algorithms while staying in the PHP world, we (timeNough’s founders, Arnaud and Ben) began to think about how to incorporate some AI and ML into that prototype, and perhaps sell it to early adopters later on. All that while keeping end users’ comfort in mind.


Let me share with you what we have in mind based on what timeNough does. Maybe you are currently working on a similar application, or on solving a problem that could be addressed with the Naive Bayes Machine Learning classification algorithm. The key to this will be to keep an open mind and demonstrate an ability to imagine, just as I did. Think about Workforce Behavior Prediction.

timeNough was developed here to simplify the lives of employees in corporate environments when dealing with time-sensitive cases, such as clock-in and clock-out times, pauses, lateness situations, and vacation days. It is made of 8 bots, interacting via email or SMS, in order to reduce the number of interfaces. A large number of interfaces is often the de facto situation in the workplace, leading to stress, anxiety, burnout, a reduction in productivity, and a lower retention rate for companies (employees leaving toxic offices).

Now imagine that each time a signal is sent to a bot, an event log is generated and directly sent to script-1.php as a short text message, and at the same moment, based on the company’s policy and way of thinking, this event is automatically translated into a meaning, which can be: good instead of “positive” and bad instead of “negative”. We now have a system that will feed and train a Model in a way that is unique to the company, and that changes from one company to another.

<?php
// event log message generated after Employee n°1 signals 

$classifier_n1->learn('John is a project manager, 5 years in the company and arrives at the office 30 minutes late, for the 1st time in 25 days', 'good');

$classifier_n1->learn('John is a project manager, 5 years in the company and arrives at the office 2 hours late, for the 3rd time in 2 days', 'bad');

// event log message generated after Employee n°2 signals 

$classifier_n1->learn('Lara is an intern, 1 month in the company and arrives at the office 1 hour late, for the 2nd time in 4 days', 'bad');

// relying on a specific method to teach the meaning 

$employee2EventMessage = 'Lara is an intern, 1.5 month in the company and arrives at the office 55 minutes late, for the 7th time in 3 days';
$classifier_n1->learn($employee2EventMessage, methodFiguringOutTheMeaning($employee2EventMessage));

Now, you may be wondering, where does this lead? Simply to a separate and independent program (made of 2 script files), driven by Machine Learning, which will be able to decide and act alone based on these event messages, after it has been trained enough. For example, through these tardiness event messages, the program may be taught that coming to work three hours late once in 3 months is bad, but that coming to work 10 minutes late on 5 consecutive days is good or neutral.

If the categorization, the act of determining the meaning of each event message received, takes place in the script-2.php file, the script may be able to execute operations on its own after a while, depending on the keywords it deduces and the patterns it identifies. If a series of 15 “bad” Predictions about an Employee n°1 is detected in script-2.php, an alert email notification could be sent to Manager X and Manager Y, for instance…
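As a very rough sketch of that counting logic (everything below is hypothetical and not part of the niiknow/bayes library: the $employeeStreaks array, the notifyManagers() helper and the already trained $classifier_n1 instance are all assumptions for illustration):

<?php
/* Illustrative only: tracking consecutive "bad" Predictions for one employee */

$employeeStreaks = []; // in a real setup, persisted in a database or a file between executions
$employeeName = 'John';

// $classifier_n1 is the first Model, already trained with the event messages
$prediction = $classifier_n1->categorize($employeeEventMessage);

if ($prediction === 'bad') {

    $employeeStreaks[$employeeName] = ($employeeStreaks[$employeeName] ?? 0) + 1;

    if ($employeeStreaks[$employeeName] === 15) {
        // hypothetical helper alerting Manager X and Manager Y by email
        notifyManagers($employeeName, $employeeStreaks[$employeeName]);
    }

} elseif ($prediction === 'good') {

    // a "good" Prediction breaks the series and resets the count
    $employeeStreaks[$employeeName] = 0;

}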

If the series reaches 30, a more severe action will be initiated, and so on if there is an escalation. After receiving an event message leading to a good Prediction keyword, the series will be broken and the count will be reset to zero for this Employee n°1. All of that can be translated into a second Model based on Naive Bayes, where the Predictions are no longer 3 adjectives such as good, bad and neutral, but specific commands to be executed.

<?php
// translation of the previous Model Predictions in new exploitable messages 

$classifier_n2->learn('John has generated 15 "bad" predictions in a row', '_do_action_1_');

$classifier_n2->learn('John has generated 30 "bad" predictions in a row', '_do_action_2_');

$classifier_n2->learn('Lara has generated 20 "good" predictions in a row', '_do_action_3_');

// relying on a specific method to teach the action 

$postPredictionMessage = 'Lara has generated 56 "good" predictions in a row';
$classifier_n2->learn($postPredictionMessage, methodFiguringOutTheActionToDo($postPredictionMessage));

// now based on the learning, let's execute the appropriate action 

$actionCodeToBeExecuted = $classifier_n2->categorize('Lara has generated 57 "good" predictions in a row');

Clearly this logic must be well defined and concretized by the client company (as we are talking here about supervised ML), but the concept is now clear. This eliminates the need for manual intervention by a member of HR, who would otherwise have to review the employee’s logs and raise an alarm accordingly. Data will automatically trigger actions in this case, without human intervention or conditions hard-coded by a developer. We have a first Model feeding and training a second Model.

The logic of the prototype remains unchanged, but behind the scenes there are machines, two ML engines able to perform extra tasks based on the data produced by the usage of the prototype, while staying compliant with the client organization’s rules and principles. This has to be developed further, of course, but you now have a better understanding of how Machine Learning can be applied to our existing prototype without interfering with its logic.

Final thoughts and perspectives

I believe that Machine Learning and Artificial Intelligence are no longer inaccessible topics, and I hope I was able to help you see opportunities and possible applications through my examples. I have listed just 63 existing algorithms; there are many more available online to discover, leading to rabbit holes and hours spent trying to fully understand them, so I chose to concentrate on only one type of algorithm here, the Bayesian algorithms for classification, since they can be handled relatively easily and their positive/negative way of classifying messages speaks for itself.


The developer of the niiknow/bayes PHP library, who has simplified the process through four basic but efficient methods, deserves a high five. The most common concepts in Machine Learning are there: training a Model with a Dataset and making Predictions. I particularly appreciated the library’s ability to create a buffer between the two, separating them so that what should be done first and what should be done later can be visualized with palpable impact.

In the end, it is like a wrapper or an API: you do not have to deal with what is behind the 4 methods, you just use them. Depending on the complexity of the problem you have to solve, classification may not be a good fit for you, even in PHP. But if you have a solid background in mathematics, a reliable library by developers who tried to make your life and your understanding of ML easier, plus a legit Dataset, I am pretty sure you can accomplish great things.

I am also convinced that Naive Bayes is not the best Machine Learning algorithm for automating actions based on event message classifications, so if you have a better algorithm in mind, along with a certain understanding of the bots system we want to propose through timeNough (you can test our prototype anytime here), please let me know in the comments below. My article does not mention RubixML/ML, which offers a different approach and perspective on the topic, available here.
Thank you for your attention.