Search notes:

model

A statistical model is a formal representation of a theory that embodies a set of statistical assumptions. It is typically expressed as a mathematically function ( probability functions) that relates random variables.
Thus, it is a non-deterministic mathematical model.
Purposes:

Variables

Types of variable

A variable is either
  • continuous
  • discrete or
  • categorical (qualitative).
Categorical variables come in three types:
  • Binary (mutually distinct: male or female)
  • Practical
  • Ordinal
Categorical variables (for example hair color or gender) don't have a natural order.
The number of children in a household is a discrete variable: there cannot be 2.3 children.
Continuous variables come in two types:
  • Ratio (Needs to have a 0-point, such as height)
  • Interval (Temperature is an interval variable because 2° is not twice as hot as 1°)

Independent variable

German: erklärende/unabhängige/prädiktor/eoxogene Variable
Goes on x-axis (abscissa)
Aka
  • explanatory variable
  • feature (machine learning and pattern recognition)
  • input variable
  • predictor variable
  • regressor
  • covariate
  • control variable (used in econometrics for covariate)
  • controlled variable
  • manipulated variable
  • exposure variable
  • risk factor (medical statistics)

Dependent variable

German: interessierende/endogene Variable - Zielvariable
Goes on y-Axis (ordinate)
Aka
  • response variable
  • regressand
  • predicted variable
  • measured variable
  • explained variable
  • experimental variable
  • responding variable
  • outcome variable
  • output variable
  • label

Over- and underfitted models

An overfitted model contains more parameters than can be justified by the data.
An underfitted model cannot adequately capture the underlying structure of the data, for example when fitting a linear model to non-linear data.
Over- and underfitting typically occur in machine learning (where it is called over- and undertraining) and regression analysis.
Techniques to reduce overfitting:

TODO

Neural networks are non-linear models.

Index