Regression is one of the core tasks in machine learning: given some input, the model predicts a single floating-point number as the target. Examples include predicting the price of a house, estimating the age of the universe, or calculating the probability that an image shows a dog. The age-of-the-universe example shows that regression is not only used in machine learning, and the dog-image example shows that regression and classification can be very similar: a logistic regression model can be turned into a classifier by choosing a threshold value (e.g. 0.5).

A big difference between regression and classification lies in the targets and the scoring functions. Classification has only a finite set of possible targets, while regression has infinitely many. Below is a list of common scoring functions.

## Scoring functions

In the following, \(y\) is the ordered list of targets, \(y^P\) is the list of predictions in the same order, and \(\bar{y}\) is the mean of \(y\).

Name | Range | Better | Definition and Usage |
---|---|---|---|
MAE | $[0, \infty)$ | lower | $f(y, y^P) = \frac{1}{\lvert y \rvert} \sum_{y_i, y_i^P \in (y, y^P)} \lvert y_i - y_i^P \rvert$ |
MSE | $[0, \infty)$ | lower | $f(y, y^P) = \frac{1}{\lvert y \rvert} \sum_{y_i, y_i^P \in (y, y^P)} (y_i - y_i^P)^2$ |
$R^2$ | $(-\infty, 1]$ | higher | $f(y, y^P) = 1 - \frac{\sum (y_i - y_i^P)^2}{\sum (y_i - \bar{y})^2}$ |
Explained Variance | $(-\infty, 1]$ | higher | $f(y, y^P) = 1 - \frac{Var(y - y^P)}{Var(y)}$ |
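
These four metrics can be implemented directly with NumPy as a quick sanity check. The example targets and predictions below are made up for illustration:

```python
import numpy as np

def mae(y, y_pred):
    """Mean absolute error: average of |y_i - y_i^P|."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y - y_pred))

def mse(y, y_pred):
    """Mean squared error: average of (y_i - y_i^P)^2."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return np.mean((y - y_pred) ** 2)

def r2(y, y_pred):
    """Coefficient of determination R^2."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def explained_variance(y, y_pred):
    """1 - Var(y - y^P) / Var(y)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    return 1.0 - np.var(y - y_pred) / np.var(y)

y      = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]
print(mae(y, y_pred))  # 0.5
print(mse(y, y_pred))  # 0.375
```

Note that \(R^2\) and explained variance only differ when the residuals \(y - y^P\) have a non-zero mean; an unbiased model gets the same score from both.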


## Regression Models

### Trivial Models

There are some straightforward "models" for regression. They do learn from the targets, but they ignore the input features completely:

- Arithmetic mean: \(\frac{1}{n}\sum_{i=1}^n {y_i}\)
- Median: Sort all \(y_i\) and take the value in the middle
- Minimum and maximum
- q-Quantile: Sort the \(y_i\) and take the first value after going through \(q \in [0, 1]\) of the input. For \(q = 0.5\), this is the median.
- Other "means" like the geometric mean
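
All of these baselines can be computed in one line of NumPy each; the training targets below are made up for illustration:

```python
import numpy as np

y = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

# Each "model" ignores the input features entirely and always
# predicts the same constant derived from the training targets.
mean_pred   = np.mean(y)                  # arithmetic mean
median_pred = np.median(y)                # median = 0.5-quantile
q90_pred    = np.quantile(y, 0.9)         # q-quantile for q = 0.9
geo_pred    = np.exp(np.mean(np.log(y)))  # geometric mean (needs y_i > 0)

print(mean_pred, median_pred)  # 3.6 2.0
```

The outlier 10.0 pulls the mean up to 3.6 while the median stays at 2.0, which is why the median baseline is often preferred for skewed targets.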

### Linear regression

Linear regression tries to fit a line to the data by minimizing the squared vertical distance between the data points and the line. This usually gives pretty good results.

The model looks like this:

\[y(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n\]

If one defines \(x \in \mathbb{R}^{n+1}\) by prepending a constant \(1\) to the features, one can also write it in a vectorized form:

\[y(x) = \beta^T x\]

Stacking all \(m\) training inputs as rows of a design matrix \(X \in \mathbb{R}^{m \times (n+1)}\), the parameters can be "learned" (calculated) with the normal equation:

\[\beta = (X^T X)^{-1} X^T y\]
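
A minimal NumPy sketch of this closed-form fit, on made-up noiseless data from the line \(y = 2x + 1\) (in practice `np.linalg.lstsq` is numerically preferable to forming \((X^T X)^{-1}\) explicitly):

```python
import numpy as np

# Toy data generated from y = 2x + 1 (hypothetical, noiseless).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Solves the least-squares problem min ||X beta - y||^2,
# i.e. the same solution as the normal equation.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [1. 2.]
```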

### Logistic regression

Logistic regression tries to fit the logistic function \(y(x) = \frac{1}{1+e^{-x}}\) to the data. This function is nice as its values lie in \((0, 1)\) and can thus be used to represent a probability.
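
In the full model, the argument of the logistic function is itself a linear function of the features, \(w \cdot x + b\). A sketch with hypothetical (not learned) parameters `w` and `b` shows how thresholding the output at 0.5 yields the classifier mentioned at the start:

```python
import numpy as np

def logistic(x):
    """Logistic (sigmoid) function, values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

w, b = 2.0, -1.0  # hypothetical parameters, normally learned from data
x = np.array([-2.0, 0.0, 0.5, 3.0])

p = logistic(w * x + b)        # predicted probabilities
labels = (p >= 0.5).astype(int)  # threshold at 0.5 -> class labels

print(logistic(0.0))  # 0.5
print(labels)         # [0 0 1 1]
```

Since the logistic function is monotonic, thresholding the probability at 0.5 is the same as thresholding the linear score \(w \cdot x + b\) at 0.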

### Trees

You can also use trees for regression. One way to do this is to "bucket" the observations and apply one of the trivial models to each bucket. Such models can only predict values within the range they observed during training.
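
The idea can be sketched as a one-level "tree" (a decision stump): try each threshold on a single feature, predict the bucket mean on each side, and keep the split with the lowest squared error. The data below is made up, and real tree learners apply this recursively:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on x that minimizes the summed squared
    error when each bucket predicts the mean of its targets."""
    best_t, best_sse = None, np.inf
    for t in np.unique(x)[1:]:  # candidate thresholds between points
        left, right = y[x < t], y[x >= t]
        sse = ((left - left.mean()) ** 2).sum() \
            + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.0, 1.0,  5.0,  5.0,  5.0])

t = best_split(x, y)
left_mean, right_mean = y[x < t].mean(), y[x >= t].mean()
print(t, left_mean, right_mean)  # 10.0 1.0 5.0
```

Because every leaf predicts a mean of observed targets, the predictions are bounded by the minimum and maximum of the training targets, which is the limitation mentioned above.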
