Linear Regression

Although it has "linear" in the name, this family of techniques has much more to offer than estimating lines; in fact, you can estimate a number of non-linear functions by combining linear regression with non-linear transformations of the inputs.

The name of simple linear regression, which fits the model $y = w_0 + w_1 x$, can be deceiving. With a name like linear regression, it's easy to assume we're talking about the fact that $y(x)$ is a linear function of $x$, and in that equation it is. More importantly, though, $y$ is also a linear function of the weights, $w_0$ and $w_1$. As long as the model is a linear function of the weights, it can also (and more excitingly!) be used to estimate the parameters of non-linear functions of $x$.

Standard linear model

Let's start with the standard linear model: \begin{equation} y = w_0 + w_1 x \end{equation}

Here we have two parameters: the intercept, $w_0$, and the slope, $w_1$. This is a linear function in $x$ and in the weights $w_i$. What if we had reason to believe that the $y$-values were a non-linear function of the $x$-values, say one that depends on the square of $x$, i.e., $x^2$? That would mean $y = w_0 + w_1 x^2$. You scratch your head and say, "that's not linear!" And you'd be right - with respect to $x$. However, it's perfectly linear in $w_0$ and $w_1$.
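To make that concrete, here's a minimal sketch (separate from the example further below, with made-up coefficients 0.5 and 2.0) showing that fitting $y = w_0 + w_1 x^2$ with scikit-learn just means handing the regression $x^2$ as the feature instead of $x$:

import numpy as np
from sklearn import linear_model

# Hypothetical data from y = 0.5 + 2*x^2 plus a little noise
rng = np.random.default_rng(0)
x = rng.random((50, 1)) * 2
y = 0.5 + 2.0 * x**2 + rng.standard_normal((50, 1)) * 0.1

# The "linear" fit is on the transformed feature x^2
mdl = linear_model.LinearRegression()
mdl.fit(x**2, y)
print(mdl.intercept_, mdl.coef_)  # estimates of w_0 and w_1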

Linear regression of a non-linear function

This is where linear models can have lots of power. Instead of $y = w_0 + w_1 x$, let's replace $x$ with the function $z(x)=x^3$. Now we have $y = w_0 + w_1 z(x)$, which is still a linear model in the weights, $w$. Here, $z(x)$ could be any function of $x$. To make it more interesting, we'll make it a polynomial function of $x$: our data will be drawn from the function $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$. What we'll do is try fitting a number of different possible non-linear models of $x$ through polynomial functions of different orders,

\begin{equation} y_n(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_n x^n \end{equation}

We'll vary the polynomial order of our model and see how well it matches our data.
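Stacking the $N$ data points makes the linearity in the weights explicit: each transformed feature $x^k$ becomes a column of a design matrix, and the model is a matrix-vector product with the weights (this is exactly the matrix the poly helper in the code below constructs, up to order 9):

\begin{equation} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^n \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} \end{equation}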

In [1]:
%config InlineBackend.figure_format = 'retina'
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

plt.close()

# Generate some data with noise
N = 10
size = (N,1)
interval = (0,2)
noise_std = 0.1
a = [1, 1.5, -2., 0.65]

x = interval[0] + np.random.random(size) * (interval[1] - interval[0])  # uniform samples over the interval
noise = np.random.standard_normal(size) * noise_std
f = lambda a, x: a[0] + a[1] * x + a[2] * x**2 + a[3] * x**3
y = f(a, x) + noise

# Generate an example of the actual function
x_true = np.linspace(interval[0],interval[1],100)[np.newaxis].T
y_true = f(a, x_true)

# Plot the function and the data
fig = plt.figure(figsize=(8, 5), facecolor='w', edgecolor='k')
plt.plot(x_true,y_true,
         color=0.5*np.array([1,1,1]),
         linewidth=3,
         linestyle='--',
         label='Target')
plt.plot(x,y,
         linestyle='none',
         linewidth=0.5,
         marker='o',
         markersize=4,
         markerfacecolor='r',
         markeredgecolor='k',
         zorder=3,
         label='Data')
plt.axis([0,2,0,2.5])
plt.xlabel('x')
plt.ylabel('y')

# Fit progressively more complex polynomial models
order_to_plot = [1,3,5,7]
nModels = len(order_to_plot)
def poly(x):
    # Build the design matrix with columns [1, x, x^2, ..., x^9]
    ones = np.ones(x.shape)
    z_values = (ones,x,x**2,x**3,x**4,x**5,x**6,x**7,x**8,x**9)
    return np.concatenate(z_values,axis=1)

z = poly(x)
z_true = poly(x_true)


colors = plt.cm.viridis(np.linspace(0,1,nModels))
mdl = linear_model.LinearRegression(fit_intercept=False)
for i in order_to_plot:
    if i == 0:
        # An order-0 model just predicts the mean of the observed y values
        y_hat = np.mean(y) * np.ones(x_true.shape)
    else:
        # Keep columns [1, x, ..., x^i] of the design matrix
        cz = z[:,0:i + 1]
        cz_true = z_true[:,0:i + 1]
        mdl.fit(cz,y)
        a_hat = mdl.coef_[0]  # fitted weights (y has a single column)
        y_hat = np.matmul(a_hat,cz_true.T)

    plt.plot(x_true,y_hat,
             color=colors[order_to_plot.index(i)],
             label='Order = {}'.format(i))

# Plot legend outside axis
ax = plt.gca()
axes_bbox = ax.get_position()
ax.set_position([axes_bbox.x0, axes_bbox.y0, axes_bbox.width*0.8, axes_bbox.height])
plt.legend(bbox_to_anchor=(1.02, 0, .102, 1))
plt.show()
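As a quick sanity check (this isn't part of the cell above, but it reuses its mdl, z, y, and a variables), we can refit just the order-3 model and compare the recovered weights against the true coefficients:

# Refit the correctly specified order-3 model: columns [1, x, x^2, x^3]
mdl.fit(z[:, 0:4], y)
print('True a:      {}'.format(a))
print('Estimated a: {}'.format(mdl.coef_[0]))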

Closing thoughts

As you can see, the model of the correct order tends to produce results closest to the underlying process. The higher-order models overfit the data and result in highly contorted polynomial functions. This brings up a number of further points to explore (and we will in future posts):

  1. Overfitting - when the estimated model fits nearly every data point perfectly, that's one sign of overfitting. It often occurs when the model is overly complex for the data, as the 7th-order polynomial is in this case. One way to prevent overfitting is through model validation.
  2. Bias-variance tradeoff - the tradeoff between overfitting and underfitting, and how well the estimated model generalizes.
  3. Model validation - to avoid overfitting, avoid evaluating performance on the same data used to train the model. There are a number of methods for this, including cross-validation (see the sketch after this list).
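To make that last point concrete, here's a minimal sketch of cross-validating the polynomial order. It assumes the z and y arrays from the notebook cell above are still in scope; cross_val_score with a 5-fold split is just one reasonable choice among many:

from sklearn import linear_model
from sklearn.model_selection import cross_val_score

# Score each polynomial order on held-out folds instead of the training data
for order in range(1, 8):
    scores = cross_val_score(linear_model.LinearRegression(fit_intercept=False),
                             z[:, 0:order + 1], y.ravel(),
                             scoring='neg_mean_squared_error', cv=5)
    print('Order {}: mean held-out MSE = {:.3f}'.format(order, -scores.mean()))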

One tangential but important distinction in regression terminology is between simple linear regression, multiple linear regression, and multivariate linear regression:

  • simple linear regression - regression with one independent variable ($x$)
  • multiple linear regression - regression with more than one independent variable ($x_1, x_2, \ldots$)
  • multivariate linear regression - regression where more than one dependent variable ($\mathbf{y}$) is predicted

In the example here, we started with the case of simple linear regression and moved to multiple linear regression as we explored the polynomial functions, $z(x)$.
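For what it's worth, scikit-learn's LinearRegression covers all three cases; the distinction shows up only in the shapes of the arrays you pass in. The following is a minimal, made-up sketch (random data, hypothetical variable names) purely to illustrate the vocabulary:

import numpy as np
from sklearn import linear_model

rng = np.random.default_rng(1)
X_one = rng.random((20, 1))    # one independent variable
X_many = rng.random((20, 3))   # several independent variables
Y_one = rng.random((20,))      # one dependent variable
Y_many = rng.random((20, 2))   # several dependent variables

linear_model.LinearRegression().fit(X_one, Y_one)    # simple linear regression
linear_model.LinearRegression().fit(X_many, Y_one)   # multiple linear regression
linear_model.LinearRegression().fit(X_many, Y_many)  # multivariate (multi-output) regression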