Linear Regression in Go - Part 1
Python is becoming the de facto standard for Big Data and Machine Learning, in particular because of some amazing tools like IPython Notebook, which helps you visualize your data, and scikit-learn, which implements some of the most popular machine learning algorithms.
So implementing an ML algorithm in Go is purely an exercise.
What is Linear Regression
Linear regression is a supervised machine learning algorithm used to predict a continuous value; for example, it can be used to predict prices in the market.
The term supervised refers to the fact that the algorithm needs to be trained with a learning dataset; we’ll see more examples of supervised algorithms in the future.
Here is a plot of real data about house prices in Windsor, ON — X axis is lot size, Y axis is price.
As we can see, bigger lots tend to cost more. The red line is our best guess at the relationship:
This red line is called the hypothesis function (or prediction) and looks like:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
where $x$ is our feature (the lot size) and the result $h_\theta(x)$ is our price prediction.
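As a minimal sketch, the single-feature hypothesis can be written as a plain Go function. The name `predictPrice` and the weights used in `main` are made up for illustration; they are not fitted to the Windsor data.

```go
package main

import "fmt"

// predictPrice evaluates the single-feature hypothesis
// h(x) = theta0 + theta1*x, where x is the lot size.
// The function name is hypothetical, chosen for this sketch.
func predictPrice(theta0, theta1, lotSize float64) float64 {
	return theta0 + theta1*lotSize
}

func main() {
	// Illustrative weights only, not learned from any dataset.
	theta0, theta1 := 30000.0, 5.0
	fmt.Println(predictPrice(theta0, theta1, 4000)) // 30000 + 5*4000
}
```

The intercept `theta0` shifts the line up or down, and `theta1` is its slope: how much the predicted price changes per unit of lot size.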
But we could have more features, like the number of bathrooms or the number of bedrooms; we can even use polynomial functions of the features. A more complicated example is:
$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$$
Here $x_1$ is still our lot size, but $x_2$ is now a quadratic function of it (for example $x_2 = x_1^2$), and $x_3$ might be the number of bedrooms.
In this case, what the machine learning algorithm will do is find the right weights for this function to give the best results, so it will find the vector $\theta$ of weights.
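To make the multi-feature case concrete, here is a rough sketch in Go. It assumes three inputs: the lot size, the square of the lot size (as the quadratic term), and the number of bedrooms. The helper names `features` and `hypothesis` and the weights in `main` are hypothetical, chosen only for this illustration.

```go
package main

import "fmt"

// features builds the inputs from raw measurements (an assumed layout):
// lot size, lot size squared, number of bedrooms.
func features(lotSize, bedrooms float64) []float64 {
	return []float64{lotSize, lotSize * lotSize, bedrooms}
}

// hypothesis computes theta[0] + theta[1]*x[0] + theta[2]*x[1] + ...
// theta must have exactly one more element than x.
func hypothesis(theta, x []float64) float64 {
	h := theta[0] // intercept term
	for i, xi := range x {
		h += theta[i+1] * xi
	}
	return h
}

func main() {
	// Illustrative weights only; a learning algorithm would find these.
	theta := []float64{1, 2, 3, 4}
	x := features(2, 3)
	fmt.Println(hypothesis(theta, x)) // 1 + 2*2 + 3*4 + 4*3
}
```

Note that the hypothesis stays linear in the weights even though one of the inputs is a squared term, which is why this is still linear regression.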
Using Matrices and Vectors
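If we arbitrarily define a new value $x_0$ to be equal to 1 we can rewrite the hypothesis function as follows:
$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n = \sum_{i=0}^{n} \theta_i x_i$$
where $n$ is the number of features and $x$ is something like this:
$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$
But since this sum is exactly the inner product of the vectors $\theta$ and $x$, the two forms are equal and we can get to the vectorized format:
$$h_\theta(x) = \theta^T x$$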
This is not just easier to read; it is also independent of the number of features and can benefit from computationally optimized functions like the ones in the gonum matrix package. gonum uses BLAS and LAPACK implementations; you can find more details here.
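Before bringing in a library, the vectorized form can be sketched with plain slices to show why it does not depend on the number of features. This is only an illustration using the standard library: `withBias` and `dot` are hypothetical helpers, and a naive loop like this has none of the BLAS optimizations mentioned above.

```go
package main

import "fmt"

// withBias prepends the constant x0 = 1 to the feature slice.
func withBias(x []float64) []float64 {
	return append([]float64{1}, x...)
}

// dot returns the inner product of two equal-length slices.
// Callers are assumed to pass slices of the same length.
func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	// Illustrative weights and features, not learned from data.
	theta := []float64{1, 2, 3}
	x := withBias([]float64{4, 5})
	fmt.Println(dot(theta, x)) // 1*1 + 2*4 + 3*5
}
```

Adding a feature only makes both slices one element longer; the `dot` function itself does not change.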
This first post ends with the hypothesis function written in Go taking advantage of the mat64.Dot function:
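// Hypothesis returns the prediction for the feature vector x (with x0 = 1)
// given the weights theta, computed as their dot product.
func Hypothesis(x, theta *mat64.Vector) float64 {
	return mat64.Dot(x, theta)
}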
You can find the whole file here and its test here.
In the next post about linear regression we’ll implement the cost function and gradient descent. The cost function measures the error of a specific set of $\theta$ values, while gradient descent is a procedure that converges to the optimal values.
You can find part 2 here