Search notes:

Machine learning

Matrix operations

When you start to peel away the layers of a machine learning model, you’ll find it’s just a bunch of matrix multiplications under the hood. Whether the input data to your model is images, text, categorical, or numerical, it will all be converted into matrices.
The basic operation of a single layer seems to be the affine transform p = i·w + b (i = inputs, w = weights, b = bias, p = prediction).
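A minimal NumPy sketch of that operation (the shapes and values are invented for illustration):

  import numpy as np

  # One input sample with 3 features (e.g. three numerical columns)
  i = np.array([[0.5, 1.2, -0.3]])      # shape (1, 3)

  # Weights map 3 input features to 2 outputs; one bias value per output
  w = np.array([[0.1, 0.4],
                [0.2, -0.5],
                [0.7, 0.3]])            # shape (3, 2)
  b = np.array([0.05, -0.1])            # shape (2,)

  # The affine transform of a single layer: p = i·w + b
  p = i @ w + b
  print(p)                              # shape (1, 2)
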
Free-form text might be converted into matrices with the bag-of-words model.
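A rough bag-of-words sketch using scikit-learn's CountVectorizer (the example sentences are made up):

  from sklearn.feature_extraction.text import CountVectorizer

  docs = ["the cat sat on the mat", "the dog sat on the log"]

  vectorizer = CountVectorizer()
  X = vectorizer.fit_transform(docs)         # sparse document-term matrix

  print(vectorizer.get_feature_names_out())  # the learned vocabulary
  print(X.toarray())                         # word counts per document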

Learning paradigms

There are three major learning paradigms: supervised, unsupervised, and reinforcement learning.

Supervised vs. unsupervised learning

In supervised learning, one variable is designated as the target (dependent) variable.
Known values (labels) for the target variable are then provided during training.

Supervised learning

There are two categories of supervised learning:
  • Classification (which predicts which class, from a set of discrete classes, the data belongs to)
  • Regression (which predicts a continuous numerical value)
Some classification algorithms are
  • K-Nearest-Neighbors (K-NN, one of the simplest)
  • Naïve Bayes
  • Decision trees
  • Logistic regression
Logistic regression is a special case of regression where the output is binary (a boolean yes or no). It might be used, for example, to determine whether an email message is spam. In that sense, logistic regression is a classifier with two classes.
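A rough two-class classifier sketch with scikit-learn's LogisticRegression (the spam features and labels are invented):

  from sklearn.linear_model import LogisticRegression

  # Made-up features per email: [number of links, number of "free"/"win" words]
  X = [[0, 0], [1, 0], [5, 3], [7, 4], [0, 1], [6, 5]]
  y = [0, 0, 1, 1, 0, 1]                # 0 = not spam, 1 = spam (invented labels)

  model = LogisticRegression()
  model.fit(X, y)

  print(model.predict([[4, 2]]))        # predicted class for a new email
  print(model.predict_proba([[4, 2]]))  # class probabilities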

Unsupervised learning

The two main(?) categories of unsupervised learning are
  • Clustering (see the k-means sketch below)
  • Association
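A minimal clustering sketch using scikit-learn's KMeans (the 2-D points are invented):

  from sklearn.cluster import KMeans

  # Invented 2-D points forming two rough groups
  points = [[1, 2], [1, 4], [2, 3], [8, 8], [9, 9], [8, 10]]

  kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
  kmeans.fit(points)

  print(kmeans.labels_)           # cluster index assigned to each point
  print(kmeans.cluster_centers_)  # coordinates of the two cluster centers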

Regression

Examples of regression include predicting continuous quantities such as prices or temperatures; the simplest form is linear regression, which fits a straight line to the data.
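A rough linear-regression sketch with scikit-learn (the sizes and prices are made up):

  from sklearn.linear_model import LinearRegression

  # Made-up training data: x = size in square meters, y = price
  X = [[30], [50], [70], [90]]
  y = [100, 160, 220, 280]

  model = LinearRegression()
  model.fit(X, y)

  print(model.coef_, model.intercept_)  # learned slope and intercept
  print(model.predict([[60]]))          # predicted price for a 60 m² flat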

Cost function

The cost function estimates how bad (or wrong) a model's predictions are.
The goal of training is to minimize the cost function.
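One common cost function for regression is the mean squared error; a minimal sketch with invented predictions and targets:

  import numpy as np

  # Invented model predictions and true target values
  predictions = np.array([2.5, 0.0, 2.1, 7.8])
  targets     = np.array([3.0, -0.5, 2.0, 8.0])

  # Mean squared error: average of the squared differences
  mse = np.mean((predictions - targets) ** 2)
  print(mse)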

Automatic summarization

Automatic summarization tries to determine the semantic meaning of a text and condense it into a shorter version.
Approaches include extractive summarization (selecting key sentences from the original text) and abstractive summarization (generating new sentences).
Automatic summarization is used, for example, in search engines.

Deep learning

Deep learning is a class (or subfield) of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input.
Deep learning is also referred to as representation learning (see Jones, The learning machines, 2014).
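A minimal sketch of stacking two layers in NumPy, just to illustrate the idea of progressively transforming representations (the weights are random, not trained):

  import numpy as np

  rng = np.random.default_rng(0)

  x = rng.normal(size=(1, 4))          # one input sample with 4 raw features

  # Layer 1: 4 -> 8, followed by a ReLU non-linearity
  W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
  h = np.maximum(0, x @ W1 + b1)       # lower-level representation

  # Layer 2: 8 -> 3, producing a higher-level representation / output
  W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
  out = h @ W2 + b2

  print(out.shape)                     # (1, 3)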

Training the model, hyperparameters etc.

Setting so-called hyperparameters allows one to control a model's training process.
Hyperparameters include:
  • Number of epochs: how many times the entire training data set is passed through the neural net.
  • Batch size: the number of samples processed in one forward/backward pass. The model's parameters are adjusted once per batch (rather than after each individual sample).
  • Learning rate: a scalar by which the gradient is multiplied to adjust the model's parameters.
An epoch consists of the following two parts
A larger batch size may result in faster training (for example because of parallel processing), but also requires more memory.
A larger learning rate may lead to faster convergence, at the risk of overshooting or the loss oscillating.
The samples are typically reshuffled into new batches for each epoch to reduce model overfitting.
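A rough sketch of where these hyperparameters appear in a plain NumPy training loop (linear model trained with mean squared error; all data and values are invented):

  import numpy as np

  rng = np.random.default_rng(0)

  # Invented data: 100 samples, 3 features, linear target plus noise
  X = rng.normal(size=(100, 3))
  true_w = np.array([1.5, -2.0, 0.5])
  y = X @ true_w + 0.1 * rng.normal(size=100)

  # Hyperparameters
  num_epochs = 20
  batch_size = 10
  learning_rate = 0.05

  w = np.zeros(3)                       # model parameters to be learned

  for epoch in range(num_epochs):
      order = rng.permutation(len(X))   # reshuffle the samples each epoch
      for start in range(0, len(X), batch_size):
          idx = order[start:start + batch_size]
          xb, yb = X[idx], y[idx]
          error = xb @ w - yb
          grad = 2 * xb.T @ error / len(idx)   # gradient of the MSE w.r.t. w
          w -= learning_rate * grad            # one update per batch

  print(w)                              # should end up close to true_w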

Maximum inner product search (MIPS)

MIPS refers to the problem of finding the vector («embedding») in a dataset that has the maximum inner product with a given query vector.
MIPS is essential for solving large-scale classification and retrieval tasks such as recommendation systems.
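A brute-force MIPS sketch in NumPy (the embeddings and query are invented; large-scale systems typically use approximate methods instead):

  import numpy as np

  rng = np.random.default_rng(0)

  embeddings = rng.normal(size=(1000, 64))   # dataset of 1000 vectors
  query = rng.normal(size=64)                # query vector

  scores = embeddings @ query                # inner product with every vector
  best = int(np.argmax(scores))              # index of the maximum inner product

  print(best, scores[best])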

MLOps (Machine Learning Operations)

MLOps is a core function of Machine Learning engineering.
MLOps focuses on streamlining the process of taking machine learning models to production, and then maintaining and monitoring them.
MLOps is a collaborative function, often comprising data scientists, devops engineers and IT.

Misc

The terms machine learning, pattern recognition, data mining and knowledge discovery in databases (KDD) overlap in scope. Thus, they are hard to separate.
RBML = rule-based machine learning

See also

Over- and undertraining: over- and underfitted models. Overfitting typically happens when a model is trained for too long or is too complex relative to the training data.
pattern recognition
Spark
The Python library scikit-learn.
Azure Machine Learning
There are experiments with 8-bit floats to improve the computational efficiency of machine learning.
Machine learning: Math
Generative pre-trained transformers (GPT)

Links

TensorFlow.js is a library for developing and training ML models in JavaScript and deploying them in the browser or on Node.js.
ManimML is a Python project focused on providing animations and visualizations of common machine learning concepts.

Index