Linear Regression Mathematics

The reason I wanted to learn the derivation of the linear regression formula began with reading “The 100-Page Machine Learning Book.” While it covered concepts concisely, the formulas felt dropped in without context, which left me frustrated. Seeing equations appear without a clear explanation of their origin felt incomplete. I need to understand the logical path that leads to them; it is not enough to memorise the final equations, especially if I want to grasp what the model is actually trying to do.

Linear regression may look modest next to modern methods, but it is often the first topic introduced in machine learning because it sets the stage for understanding optimization, gradients, and the relationship between dependent and independent variables. Deriving the formulas is not strictly necessary, and I realize that many skip straight to implementation. But personally, once I can see how the pieces fit together mathematically, the formulas stop feeling like isolated artifacts and start making sense as part of a cohesive framework. This strengthens comprehension and provides a stronger foundation for implementation.

On the surface, it models a straight-line relationship, but digging into its math reveals the foundations of optimization and statistical reasoning. Learning the derivation shows exactly why minimizing squared errors leads to closed-form solutions for the slope and intercept. That process introduces the concept of a cost function, whose minimization later generalizes to gradient-based methods like stochastic gradient descent.
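For quick reference, here is a minimal sketch of what the PDF derives in full, assuming the usual simple-regression setup with data points $(x_i, y_i)$, $i = 1, \dots, n$, and model $\hat{y}_i = w x_i + b$ (the symbols $w$ for slope and $b$ for intercept are my own notation, not necessarily the PDF's). Setting the partial derivatives of the squared-error cost to zero gives the closed-form slope and intercept:

\[
J(w, b) = \sum_{i=1}^{n} \bigl(y_i - (w x_i + b)\bigr)^2,
\qquad
w = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - w\bar{x},
\]

where $\bar{x}$ and $\bar{y}$ are the sample means of the inputs and outputs.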

This document contains the derivations from the top. It is mostly for me, for when I need to recall where things come from. Writing LaTeX on the web is annoying and inefficient, hence the PDF.