Formulation
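Let $\mathcal{X} \subseteq \mathbb{R}^D$ and $\mathcal{Y} \subseteq \mathbb{R}$ be the (input) feature and target space respectively. A linear regression model takes the form
$$y = \mathbf{w}^\top \mathbf{x} + \epsilon$$
where $\mathbf{x} \in \mathcal{X}$ is the input, $y \in \mathcal{Y}$ is the target, $\mathbf{w} \in \mathbb{R}^D$ is the parameter (weight) vector, and $\epsilon$ is an error term. A bias term can be absorbed into $\mathbf{w}$ by appending a constant feature $1$ to $\mathbf{x}$.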
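Linear regression can also be formulated in matrix form. Suppose there are $N$ instances in total, i.e., $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_N)^\top \in \mathbb{R}^{N \times D}$ and $\mathbf{y} = (y_1, \dots, y_N)^\top \in \mathbb{R}^N$. Then,
$$\mathbf{y} = \mathbf{X}\mathbf{w} + \boldsymbol{\epsilon}$$
where $\mathbf{w} \in \mathbb{R}^D$ is the parameter vector and $\boldsymbol{\epsilon} = (\epsilon_1, \dots, \epsilon_N)^\top$ collects the error terms.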
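As a small illustration of how the matrix form stacks the per-instance model, the sketch below compares the two on toy data. The numbers, and the names `X`, `w`, `eps`, are made up for the example and are not taken from the text.

```python
import numpy as np

# Hypothetical toy data: N = 3 instances, D = 2 features.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # shape (N, D); row i is x_i
w = np.array([0.5, -1.0])           # shape (D,)
eps = np.array([0.1, -0.2, 0.0])    # shape (N,); one error term per instance

# Per-instance form: y_i = w^T x_i + eps_i
y_loop = np.array([w @ X[i] + eps[i] for i in range(X.shape[0])])

# Matrix form: y = X w + eps
y_matrix = X @ w + eps
```

Both arrays hold the same targets; the matrix form simply evaluates all $N$ instances in one product.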
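The values $\epsilon$ are called error terms, or sometimes noise, and capture all other factors that influence $y$ other than $\mathbf{x}$. It is common to assume that they follow a zero-mean Gaussian distribution with variance $\sigma^2$:
$$\epsilon \sim \mathcal{N}(0, \sigma^2)$$
Then, the likelihood function can be written as
$$p(y \mid \mathbf{x}, \mathbf{w}) = \mathcal{N}(y \mid \mathbf{w}^\top \mathbf{x}, \sigma^2)$$
or equivalently
$$p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \prod_{i=1}^{N} \mathcal{N}(y_i \mid \mathbf{w}^\top \mathbf{x}_i, \sigma^2) = \mathcal{N}(\mathbf{y} \mid \mathbf{X}\mathbf{w}, \sigma^2 \mathbf{I})$$
in multidimensional cases, assuming the data points are drawn independently from the distribution.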
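The factorisation under independence can be checked numerically. The following is a minimal sketch, assuming i.i.d. Gaussian noise with a known variance; the function names, the synthetic data, and the value `sigma2 = 0.25` are illustrative assumptions.

```python
import numpy as np

def gaussian_log_likelihood(X, y, w, sigma2):
    """Joint log-likelihood log p(y | X, w) under i.i.d. Gaussian noise."""
    residuals = y - X @ w
    n = y.shape[0]
    return (-0.5 * n * np.log(2.0 * np.pi * sigma2)
            - residuals @ residuals / (2.0 * sigma2))

def per_point_log_density(X, y, w, sigma2):
    """log N(y_i | w^T x_i, sigma2) for every instance i."""
    residuals = y - X @ w
    return -0.5 * np.log(2.0 * np.pi * sigma2) - residuals**2 / (2.0 * sigma2)

# Synthetic data drawn from the assumed model.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
sigma2 = 0.25
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=50)

joint = gaussian_log_likelihood(X, y, w_true, sigma2)
summed = per_point_log_density(X, y, w_true, sigma2).sum()
```

Working in log space turns the product over data points into a sum, so `joint` and `summed` agree up to floating-point error.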
Basis Functions
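The linear regression model assumes a linear relationship between $\mathbf{x}$ and $y$. However, in some cases the relationship may be nonlinear. To accommodate such cases, it is possible to apply a nonlinear transformation $\phi$ (called a basis function) to the input $\mathbf{x}$, transforming the input into some other form $\phi(\mathbf{x})$, so that the model becomes $y = \mathbf{w}^\top \phi(\mathbf{x}) + \epsilon$. As long as the parameters of $\phi$ are fixed, the model remains linear in the parameters $\mathbf{w}$, even if it is not linear in the inputs. Common choices of basis functions for a scalar input $x$ include:
- Polynomial: $\phi_j(x) = x^j$
- Gaussian: $\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2s^2}\right)$, with centre $\mu_j$ and width $s$
- Sigmoidal: $\phi_j(x) = \frac{1}{1 + \exp\left(-(x - \mu_j)/s\right)}$, with centre $\mu_j$ and slope scale $s$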
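The three families can be sketched as functions that map a vector of scalar inputs to a design matrix. This assumes the usual forms $x^j$, $\exp(-(x-\mu_j)^2/(2s^2))$, and the logistic sigmoid of $(x-\mu_j)/s$; the function names, the centres, and the width `s` are illustrative choices, not prescribed by the text.

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns x^0, x^1, ..., x^degree."""
    return np.stack([x**j for j in range(degree + 1)], axis=1)

def gaussian_basis(x, centers, s):
    """Column j is exp(-(x - mu_j)^2 / (2 s^2))."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * s**2))

def sigmoidal_basis(x, centers, s):
    """Column j is the logistic sigmoid of (x - mu_j) / s."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - centers[None, :]) / s))

x = np.linspace(-1.0, 1.0, 5)            # five scalar inputs
centers = np.array([-0.5, 0.0, 0.5])     # assumed centres mu_j

Phi_poly = polynomial_basis(x, degree=3)        # shape (5, 4)
Phi_gauss = gaussian_basis(x, centers, s=0.5)   # shape (5, 3)
Phi_sig = sigmoidal_basis(x, centers, s=0.5)    # shape (5, 3)
```

Any of these matrices can take the place of the raw inputs in the regression, since the parameters still enter linearly.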
Parameter Estimation
Analytic
Maximum Likelihood
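To maximise the likelihood, we can equivalently minimise the negative log likelihood (NLL). Under Gaussian noise with variance $\sigma^2$,
$$\mathrm{NLL}(\mathbf{w}) = -\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \frac{N}{2} \log(2\pi\sigma^2) + \frac{1}{2\sigma^2} \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert^2$$
The MLE is the point where $\nabla_{\mathbf{w}} \mathrm{NLL}(\mathbf{w}) = \mathbf{0}$, that is,
$$-\frac{1}{\sigma^2} \mathbf{X}^\top (\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{0}$$
Solving the equation yields the normal equations
$$\mathbf{X}^\top \mathbf{X} \mathbf{w} = \mathbf{X}^\top \mathbf{y}$$
and therefore, assuming $\mathbf{X}^\top \mathbf{X}$ is invertible,
$$\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$
This is also called the normal solution.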
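A minimal sketch of the closed-form estimate, assuming $\mathbf{X}^\top \mathbf{X}$ is invertible. It solves the linear system $\mathbf{X}^\top \mathbf{X} \mathbf{w} = \mathbf{X}^\top \mathbf{y}$ with `np.linalg.solve` rather than forming the inverse explicitly, and compares the result with NumPy's least-squares routine. The helper name and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Solve X^T X w = X^T y for w without computing an explicit inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data from a known parameter vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=200)

w_hat = fit_normal_equations(X, y)

# Cross-check against NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Solving the system directly is generally preferred over inverting $\mathbf{X}^\top \mathbf{X}$, since it is cheaper and numerically more stable.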
Mean Square Error
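Alternatively, we can define a loss function and find the point that minimises it. A common choice of loss function is the mean squared error (MSE), which is given as follows:
$$\mathcal{L}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 = \frac{1}{N} \lVert \mathbf{y} - \mathbf{X}\mathbf{w} \rVert^2$$
Now it suffices to find the point such that $\nabla_{\mathbf{w}} \mathcal{L}(\mathbf{w}) = \mathbf{0}$, that is,
$$-\frac{2}{N} \mathbf{X}^\top (\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{0}$$
Notice that, up to a positive constant factor, the equation is the same as in the MLE scenario. The optimal parameters are given as:
$$\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$$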
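The stationarity condition can be verified numerically: the MSE gradient, $-\frac{2}{N} \mathbf{X}^\top (\mathbf{y} - \mathbf{X}\mathbf{w})$, should vanish at the closed-form solution. The sketch below assumes synthetic data, and the names `mse` and `mse_gradient` are illustrative.

```python
import numpy as np

def mse(X, y, w):
    """Mean squared error of the linear model with parameters w."""
    residuals = y - X @ w
    return residuals @ residuals / y.shape[0]

def mse_gradient(X, y, w):
    """Gradient of the MSE with respect to w."""
    return -2.0 / y.shape[0] * X.T @ (y - X @ w)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=0.3, size=100)

# Closed-form minimiser and the gradient evaluated there.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
grad_at_optimum = mse_gradient(X, y, w_hat)

# A perturbed parameter vector for comparison; its loss is higher.
w_other = w_hat + 0.1
```

Because the MSE is a convex quadratic in $\mathbf{w}$, a vanishing gradient identifies a global minimiser.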
Numerical
Computing the normal solution involves inverting a matrix of size $D \times D$. This can be a very expensive computation if $D$, the input dimension, becomes large. In such cases, finding the optimal solution numerically can be an alternative.
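One possible numerical approach is batch gradient descent on the MSE. The text does not name a specific method, so this choice, along with the learning rate, step count, and function name, is an assumption made for illustration.

```python
import numpy as np

def fit_gradient_descent(X, y, learning_rate=0.1, n_steps=500):
    """Minimise the MSE by batch gradient descent, starting from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        grad = -2.0 / n * X.T @ (y - X @ w)   # gradient of the MSE at w
        w -= learning_rate * grad              # step against the gradient
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

w_gd = fit_gradient_descent(X, y)

# Closed-form solution, for comparison only.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)
```

Each step costs on the order of $ND$ operations and never forms or inverts $\mathbf{X}^\top \mathbf{X}$. The learning rate must be small enough for the iteration to converge, and a suitable value depends on the scale of the features.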