Linear Regression in Go - Part 2

In the previous post we covered the hypothesis function, the function that predicts a value from the features of a new, unseen example. In this post we're going to build the cost function, a way to measure the error of the hypothesis for a specific set of weights.

For convenience, here is the hypothesis function from the previous post:

$h\_\theta(x) = \theta^T x$
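As a reminder of what $\theta^T x$ means in code, here is a minimal dependency-free sketch using plain slices instead of gonum. The function name `Hypothesis` and the sample numbers are hypothetical, chosen only for illustration; the product is just the dot product of the weights and the features.

```go
package main

import "fmt"

// Hypothesis computes h_θ(x) = θᵀx: the dot product of the weights
// theta and the feature vector x. Both slices must have the same length.
func Hypothesis(theta, x []float64) float64 {
	if len(theta) != len(x) {
		panic("theta and x must have the same length")
	}
	sum := 0.0
	for i := range theta {
		sum += theta[i] * x[i]
	}
	return sum
}

func main() {
	// Hypothetical weights and features: 1*1 + 2*4 + 3*5 = 24.
	theta := []float64{1, 2, 3}
	x := []float64{1, 4, 5}
	fmt.Println(Hypothesis(theta, x)) // prints 24
}
```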

As we said, linear regression is a supervised algorithm. This means we need to train it on a list of examples in order to find the values of the vector $\theta$ that minimize the average error (we'll discuss a common pitfall called overfitting later).

The cost function calculates the error of a given $\vec{\theta}$ on a training set. The following graph shows a subset of the previous examples, just 5 houses. The green line is the plot of the hypothesis function; the thin red lines are the differences between each value in the training set and the value predicted by our hypothesis:

[Figure: 5 houses; green line = hypothesis, red bars = prediction error]

To calculate the error we’ll use the following function:

$J(\theta) = \frac{1}{2m}\sum\_{i=1}^{m}(y\_i - h\_\theta(x\_i))^2$

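It's half the mean of the squared differences between the predicted value $h_\theta(x_i)$ and the actual value $y_i$. The factor $\frac{1}{2}$ doesn't change which $\theta$ minimizes $J$; it's a common convention because it cancels the 2 that appears when differentiating the square, which keeps the gradient used to minimize $J$ simpler.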
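To make the formula concrete before switching to matrices, here is a minimal sketch of $J(\theta)$ with plain Go slices and no gonum dependency. The name `CostSlices` and the tiny training set are hypothetical; each row of `X` is assumed to start with a constant 1 for the intercept, which is an assumption of this example only.

```go
package main

import "fmt"

// CostSlices computes J(θ) = 1/(2m) · Σ (y_i − h_θ(x_i))² using plain slices.
// X holds one row of n features per training example (m rows in total),
// y holds the m target values and theta the n weights.
func CostSlices(X [][]float64, y, theta []float64) float64 {
	m := len(X)
	if m == 0 || len(y) != m {
		panic("X and y must have the same, non-zero number of rows")
	}
	sum := 0.0
	for i, row := range X {
		if len(row) != len(theta) {
			panic("every row of X must have len(theta) features")
		}
		// h_θ(x_i) = θᵀx_i
		h := 0.0
		for j, v := range row {
			h += theta[j] * v
		}
		d := y[i] - h
		sum += d * d
	}
	// half the mean of the squared errors
	return sum / (2 * float64(m))
}

func main() {
	// Hypothetical training set: column 0 is the constant 1, column 1 a feature.
	X := [][]float64{{1, 1}, {1, 2}, {1, 3}}
	y := []float64{2, 4, 6}

	fmt.Println(CostSlices(X, y, []float64{0, 2})) // perfect fit, prints 0
	fmt.Println(CostSlices(X, y, []float64{0, 1})) // errors 1, 2, 3: (1+4+9)/6 ≈ 2.33
}
```

A perfect fit gives zero cost, and the cost grows quadratically as the predictions drift away from the targets.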

Note that $X$ is now an $m \times n$ matrix, where $m$ is the number of examples in our training set (5 houses in the graph above) and $n$ is the number of features (house size, number of bathrooms, number of bedrooms and so on). Each row of $X$ is one example $x_i$, so the product $X\theta$ gives the predictions for all $m$ examples at once.
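The product $X\theta$ is what `h.Mul(x, theta)` computes in the gonum implementation. As a dependency-free sketch of the same idea, with a hypothetical `Predict` helper and made-up numbers, each row of the $m \times n$ matrix is multiplied by $\theta$ to give one prediction:

```go
package main

import "fmt"

// Predict returns the m predictions X·θ, one per row of the m×n matrix X.
func Predict(X [][]float64, theta []float64) []float64 {
	preds := make([]float64, len(X))
	for i, row := range X {
		if len(row) != len(theta) {
			panic("each row of X must have exactly len(theta) features")
		}
		// preds[i] = θᵀx_i
		for j, v := range row {
			preds[i] += theta[j] * v
		}
	}
	return preds
}

func main() {
	// Hypothetical 3×2 matrix: m = 3 examples, n = 2 features.
	X := [][]float64{{1, 1}, {1, 2}, {1, 3}}
	theta := []float64{1, 2}
	fmt.Println(Predict(X, theta)) // prints [3 5 7]
}
```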

Here is our Go implementation using gonum's mat64 package:

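```go
import (
	"math"

	"github.com/gonum/matrix/mat64"
)

// Cost computes J(θ) = 1/(2m) · Σ (h_θ(x_i) − y_i)² for an m×n feature
// matrix x, a vector y of m targets and a vector theta of n weights.
func Cost(x *mat64.Dense, y, theta *mat64.Vector) float64 {
	// initialize receivers
	m, _ := x.Dims()
	h := mat64.NewDense(m, 1, make([]float64, m))
	squaredErrors := mat64.NewDense(m, 1, make([]float64, m))

	// predictions for every example: h = X·θ (an m×1 column)
	h.Mul(x, theta)
	// squared difference between each prediction v and its target
	squaredErrors.Apply(func(r, c int, v float64) float64 {
		return math.Pow(v-y.At(r, c), 2)
	}, h)
	// half the mean of the squared errors
	j := mat64.Sum(squaredErrors) / (2.0 * float64(m))

	return j
}
```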
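The element-wise `Apply` over `squaredErrors` followed by a sum is one way to compute $J$. Since $(y_i - h_\theta(x_i))^2 = (h_\theta(x_i) - y_i)^2$, the same quantity can also be written in vectorized form as the dot product of the residual vector $X\theta - y$ with itself:

$$J(\theta) = \frac{1}{2m}\sum\_{i=1}^{m}(h\_\theta(x\_i) - y\_i)^2 = \frac{1}{2m}(X\theta - y)^T(X\theta - y)$$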

As usual, the full code is here and a test is here.

In part 3 we'll build the method that minimizes the error by choosing the best values for $\theta$. You can find part 3 here.