Linear Regression in Go - Part 2
In the previous post we covered the hypothesis function, the function that predicts a value from the features of a new, unseen case. In this post we're going to build the cost function, a way to measure the error of the hypothesis for a specific set of weights.
For convenience, this is the function we discussed earlier:
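In the usual notation, the hypothesis is a weighted sum of the features, with $x_0 = 1$ for the intercept term:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^T x$$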
As we said, linear regression is a supervised algorithm: we need to train it with a list of examples in order to find the values of the $\theta$ vector for which the average error is minimized (we'll discuss a common pitfall called overfitting later).
The cost function calculates the error of a given set of $\theta$ values against a training set. In the following graph there is a subset of the previous examples, just 5 houses. The green line is the result of plotting the hypothesis function; the thin red lines are the differences between each value in the training set and the value predicted by our hypothesis:
To calculate the error we’ll use the following function:
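In the usual notation (this is the same quantity the Go code below computes):

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$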
It's basically the mean of the squared differences between the predicted value $h_\theta(x^{(i)})$ and the actual value $y^{(i)}$ (the extra 2 in the denominator just makes the derivative cleaner later on).
Consider that $x$ is now an $m \times n$ matrix, where $m$ is the number of examples in our training set (in the graph plotted here we have 5 houses) and $n$ is the number of features (house size, number of bathrooms, number of bedrooms, and so on).
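In matrix form this means the predictions for all $m$ examples can be computed with a single multiplication, which is exactly what the implementation below does:

$$h = x\theta, \qquad x \in \mathbb{R}^{m \times n},\quad \theta \in \mathbb{R}^{n},\quad h \in \mathbb{R}^{m}$$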
So here is our Go implementation using the gonum mat64 matrix package:
func Cost(x *mat64.Dense, y, theta *mat64.Vector) float64 {
	// m is the number of examples in the training set
	m, _ := x.Dims()

	// receivers for the predictions and the per-example squared errors
	h := mat64.NewDense(m, 1, make([]float64, m))
	squaredErrors := mat64.NewDense(m, 1, make([]float64, m))

	// h = x * theta: apply the hypothesis to every example at once
	h.Mul(x, theta)

	// square the difference between each prediction and the observed value
	squaredErrors.Apply(func(r, c int, v float64) float64 {
		return math.Pow(v-y.At(r, 0), 2)
	}, h)

	// J(theta): the sum of the squared errors divided by 2m
	return mat64.Sum(squaredErrors) / (2.0 * float64(m))
}
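Here is a minimal sketch of how Cost might be called, with made-up numbers: a design matrix with a column of ones for the intercept plus one feature (house size), the observed prices, and a candidate θ. It assumes the Cost function above and imports of fmt and github.com/gonum/matrix/mat64.

func main() {
	// 5 houses: a column of ones for the intercept plus the size feature
	x := mat64.NewDense(5, 2, []float64{
		1, 50,
		1, 80,
		1, 100,
		1, 120,
		1, 150,
	})
	// observed prices (made-up numbers)
	y := mat64.NewVector(5, []float64{110, 170, 215, 250, 310})
	// a candidate set of weights: price = 10 + 2*size
	theta := mat64.NewVector(2, []float64{10, 2})
	fmt.Println(Cost(x, y, theta)) // 2.5: only the third house is off (predicted 210, actual 215)
}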
As usual, the full code is here and a test is here.
In part 3 we're going to build the method that minimizes the error by choosing the proper $\theta$ values. You can find part 3 here.