Formulation

Let $\mathcal{X} \subseteq \mathbb{R}^D$ and $\mathcal{Y} \subseteq \mathbb{R}$ be the (input) feature and target space respectively. A linear regression model takes the form

$$y = \mathbf{w}^\top \mathbf{x} + \epsilon,$$

where $\mathbf{x} \in \mathcal{X}$ is an input, $y \in \mathcal{Y}$ is the target, $\mathbf{w} \in \mathbb{R}^D$ is the weight (parameter) vector, and $\epsilon$ is an error term. (A bias term can be absorbed into $\mathbf{w}$ by appending a constant $1$ to $\mathbf{x}$.)

Linear regression can also be formulated in matrix form. Suppose there are $N$ instances in total, i.e., $\mathbf{x}_1, \dots, \mathbf{x}_N \in \mathcal{X}$ and $y_1, \dots, y_N \in \mathcal{Y}$. Then

$$\mathbf{y} = \mathbf{X}\mathbf{w} + \boldsymbol{\epsilon},$$

where $\mathbf{y} = (y_1, \dots, y_N)^\top$, $\mathbf{X} \in \mathbb{R}^{N \times D}$ is the design matrix whose $n$-th row is $\mathbf{x}_n^\top$, and $\boldsymbol{\epsilon} = (\epsilon_1, \dots, \epsilon_N)^\top$.

The values $\epsilon_n$ are called error terms, or sometimes noise; they capture all factors that influence $y$ other than $\mathbf{x}$. It is common to assume that the noise follows a zero-mean Gaussian distribution:

$$\epsilon \sim \mathcal{N}(0, \sigma^2).$$

Then, the likelihood function can be written as

$$p(y \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(y \mid \mathbf{w}^\top \mathbf{x}, \sigma^2),$$

or equivalently

$$p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \prod_{n=1}^{N} \mathcal{N}(y_n \mid \mathbf{w}^\top \mathbf{x}_n, \sigma^2)$$

over the full dataset, assuming the data points are drawn independently from the distribution.
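As a concrete sketch, the i.i.d. Gaussian likelihood can be evaluated numerically. Everything below (the data, `w_true`, `sigma`) is synthetic and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate y = Xw + eps with Gaussian noise (hypothetical example values).
N, D = 100, 3
X = rng.normal(size=(N, D))
w_true = np.array([1.5, -2.0, 0.5])
sigma = 0.3
y = X @ w_true + rng.normal(scale=sigma, size=N)

def log_likelihood(w, X, y, sigma):
    """Log of prod_n N(y_n | w^T x_n, sigma^2), assuming i.i.d. samples."""
    resid = y - X @ w
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - resid @ resid / (2 * sigma**2)

# The true weights should score higher than a heavily perturbed guess.
print(log_likelihood(w_true, X, y, sigma) > log_likelihood(w_true + 1.0, X, y, sigma))
```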

Basis Functions

The linear regression model assumes a linear relationship between $\mathbf{x}$ and $y$. However, in some cases the relationship may be nonlinear. To accommodate such cases, it is possible to apply a nonlinear transformation $\phi$ (called a basis function) to the input $\mathbf{x}$, transforming the input into some other form $\boldsymbol{\phi}(\mathbf{x})$, so that the model becomes $y = \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}) + \epsilon$. As long as the parameters of $\phi$ are fixed, the model remains linear in the parameters, even if it is not linear in the inputs. Common choices of basis functions include:

  • Polynomial: $\phi_j(x) = x^j$
  • Gaussian: $\phi_j(x) = \exp\left(-\dfrac{(x - \mu_j)^2}{2s^2}\right)$
  • Sigmoidal: $\phi_j(x) = \sigma\left(\dfrac{x - \mu_j}{s}\right)$, where $\sigma(a) = \dfrac{1}{1 + e^{-a}}$
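A minimal sketch of these three basis families in code, for a one-dimensional input; the centres and width below are hypothetical choices, not prescribed values:

```python
import numpy as np

def polynomial_basis(x, degree):
    # phi_j(x) = x^j for j = 0..degree (j = 0 yields the bias column)
    return np.stack([x**j for j in range(degree + 1)], axis=1)

def gaussian_basis(x, centers, s):
    # phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))
    return np.exp(-(x[:, None] - centers[None, :])**2 / (2 * s**2))

def sigmoid_basis(x, centers, s):
    # phi_j(x) = 1 / (1 + exp(-(x - mu_j) / s))
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centers[None, :]) / s))

x = np.linspace(-1.0, 1.0, 5)
print(polynomial_basis(x, 2).shape)                    # (5, 3)
print(gaussian_basis(x, np.array([0.0, 0.5]), 0.2).shape)  # (5, 2)
```

Each function maps $N$ scalar inputs to an $N \times M$ feature matrix, after which the usual linear machinery applies unchanged.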

Parameter Estimation

Analytic

Maximum Likelihood

To maximise the likelihood, we can equivalently minimise the negative log likelihood:

$$-\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \left(y_n - \mathbf{w}^\top \mathbf{x}_n\right)^2 + \frac{N}{2} \log\left(2\pi\sigma^2\right).$$

The MLE is the point where $\nabla_{\mathbf{w}} \left(-\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w})\right) = \mathbf{0}$.

Solving the equation yields

$$\mathbf{X}^\top \mathbf{X} \mathbf{w} = \mathbf{X}^\top \mathbf{y},$$

and therefore,

$$\hat{\mathbf{w}}_{\mathrm{MLE}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{y}.$$

This is also called the normal equation.
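The closed-form solution can be checked numerically. The sketch below uses synthetic data, and solves the normal equation with a linear solve rather than an explicit inverse, which is the numerically preferred route:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem (w_true and the noise scale are illustrative).
N, D = 200, 3
X = rng.normal(size=(N, D))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=N)

# Solve X^T X w = X^T y directly instead of forming (X^T X)^{-1}.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w_hat, w_true, atol=0.1))
```

With enough samples and small noise, the estimate recovers the generating weights up to the noise level.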

Mean Squared Error

Alternatively, we can define a loss function and find the optimal point that minimises the loss. A common choice is the mean squared error (MSE), which is given as follows:

$$\mathcal{L}(\mathbf{w}) = \frac{1}{N} \sum_{n=1}^{N} \left(y_n - \mathbf{w}^\top \mathbf{x}_n\right)^2.$$

Now it suffices to find the point $\hat{\mathbf{w}}$ such that $\nabla_{\mathbf{w}} \mathcal{L}(\hat{\mathbf{w}}) = \mathbf{0}$.

Notice that, up to a constant factor, the equation is the same as in the MLE scenario. The optimal parameters are therefore given as:

$$\hat{\mathbf{w}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{y}.$$
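A short sketch illustrating that the MSE minimiser coincides with the least-squares solution; the data is synthetic, and `np.linalg.lstsq` serves as an independent fit for comparison:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data for illustration only.
N, D = 100, 2
X = rng.normal(size=(N, D))
y = X @ np.array([1.0, -3.0]) + rng.normal(scale=0.2, size=N)

def mse(w, X, y):
    # L(w) = (1/N) * sum_n (y_n - w^T x_n)^2
    resid = y - X @ w
    return resid @ resid / len(y)

w_mle = np.linalg.solve(X.T @ X, X.T @ y)        # normal-equation solution
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # library least-squares fit

print(np.allclose(w_mle, w_lstsq))                  # same minimiser
print(mse(w_mle, X, y) <= mse(w_mle + 0.5, X, y))   # optimum has lower loss
```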

Numerical

Solving the normal equation involves inverting (or factorising) a matrix of size $D \times D$, which costs $O(D^3)$ in general. This can be very expensive if $D$, the input dimension, becomes large. In such cases, finding the optimal solution numerically, for example with gradient descent, can be an alternative.
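One common numerical approach is batch gradient descent on the MSE. The sketch below uses synthetic data; the step size and iteration count are illustrative, not tuned recommendations:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic regression problem for illustration.
N, D = 500, 4
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + rng.normal(scale=0.05, size=N)

w = np.zeros(D)
lr = 0.1  # illustrative step size
for _ in range(500):
    grad = (2.0 / N) * X.T @ (X @ w - y)  # gradient of the MSE w.r.t. w
    w -= lr * grad

print(np.allclose(w, w_true, atol=0.05))
```

Each iteration costs $O(ND)$, so no $D \times D$ matrix is ever inverted, which is the point of the numerical route.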
