When you start to peel away the layers of a machine learning model, you’ll find it’s just a bunch of matrix multiplications under the hood. Whether the input data to your model is images, text, categorical, or numerical it’ll all be converted into matrices.
The basic matrix multiplication seems to be i*w+b=p (i=inputs, w=weights, b=bias, p=prediction).
Free form text might be converted into matrices with the Bag-of-words model.
Learning paradigms
There are three major learning paradigms:
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised vs non-supervised learning
In supervised learning algorithms, a dependent variable is assigned the role of a target variable.
Then, known values are provided for the target variable.
Supervised learning
There are two categories of supervised learning:
Classification (which predicts a class (from a set of discrete classes) that data belongs to)
Regression (which predicts a number that data belongs to)
Some classification algorithms are
K-Nearest-Neighbors (K-NN, one of the simplest)
Naïve Bayes
decision trees
Logistic regression
Logistic regression is a special case of regression where the output is binary (a boolean yes or no). Logistic regression might be used to determine if an email message is spam or not. In a way, the logistic regression is classifier with two classes)
Unsupervised learning
The two main(?) categories of unsupervised learning are
Clustering
Association
Regression
Examples of regression include
Linear regression
Support vector regression (SVR)
Regression trees
Cost function
The cost function estimates how bad (or wrong) the performance of a model is.
The goal of machine learning is to minimize the cost function.
Automatic summarization
Automatic summarization tries to determine the semantic meaning of a text.
Approaches:
Extraction: select a subset of words, phrases or sentences
Abstraction: build an internal semantic representation
Automatic summarization is used, for examples, in search engines.
Deep learning
Deep learning is a class (or subfield) of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input.
Deep learning is also referred to as representation learning (see Jones, The learning machines, 2014).
Training the model, hyperparameters etc.
Setting so called hyperparameters allows to control the training process of a model.
Hyperparameters include
Number of epochs
How many times the entire training data set is passed through the neural net
Batch size
The number of samples that is processed processed in one forward/backward pass. The model's parameters are adjusted once per batch (rather than after each individual sample)
Learning rate
A scalar with which the gradient is multiplied to adjust the model's parameter.
An epoch consists of the following two parts
Train loop: Use the training samples to try to converge to the optimal parameters
Validation/test loop: Use the test samples to check if the model performance is improving.
A larger batch size may result in faster training (for example because of parallel processing), but also requires more memory.
A larger learning rate may lead to faster convergence at the risk of overshooting or the loss function becoming oscillating.
The samples of batches are typically reshuffled for each epoch to reduce model overfitting.
Maximum inner product search (MIPS)
MIPS refers to the problem of finding the vector («embedding») in a dataset that has the maximum inner product with a given query vector.
MIPS is essential to solve large scale classification and retrieval tasks such as recommendation systems.
MLOPs (Machine Learning Operations)
MLOps is a core function of Machine Learning engineering.
MLOps focuses on streamlining the process of taking machine learning models to production, and then maintaining and monitoring them.
MLOps is a collaborative function, often comprising data scientists, devops engineers and IT.
Misc
The terms machine learning, pattern recognition, data mining and knowledge discovery in databases (KDD) overlap in scope. Thus, they are hard to separate.
RBLM = Rule based machine learning
See also
Over- and undertraining: Over- and underfitted models. Overfitting usually happens when training was performed too long.