Here are some tools we can use to do a full linear regression:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import linregress
import pandas as pd
Here's one of our commonly used datasets:
df = pd.read_csv('https://www.marksmath.org/data/cdc.csv')
df.head()
Let's grab a sample from there and see how weight is related to height.
sam = df.sample(200, random_state=1)
sam.plot.scatter('height', 'weight')
Now, let's perform a linear regression:
lr = linregress(sam.height, sam.weight)
lr
Well, there are a number of things that we'll need to interpret here. The first is the regression line, which is defined by the slope and intercept:
def f(x): return lr.slope*x + lr.intercept
sam.plot.scatter('height', 'weight')
plt.plot([55,80], [f(55), f(80)], 'black')
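The result holds more than the slope and intercept. Here's a rough sketch of how the other fields might be read. The height and weight numbers in it are simulated stand-ins (an assumption for illustration, not the CDC sample), so the values it prints won't match the output above.

```python
import numpy as np
from scipy.stats import linregress

# Simulated stand-in data (hypothetical, not the CDC sample).
rng = np.random.default_rng(1)
height = rng.normal(67, 4, size=200)                     # inches
weight = 5 * height - 170 + rng.normal(0, 25, size=200)  # pounds

res = linregress(height, weight)

r_squared = res.rvalue**2  # fraction of the variation in weight explained by the line
print('slope:    ', res.slope)      # predicted change in weight per extra inch
print('intercept:', res.intercept)  # predicted weight at height 0 (an extrapolation)
print('r:        ', res.rvalue)     # correlation coefficient, between -1 and 1
print('r^2:      ', r_squared)
print('p-value:  ', res.pvalue)     # tests the null hypothesis that the true slope is 0
print('stderr:   ', res.stderr)     # standard error of the slope estimate
```

A small p-value suggests the slope is distinguishable from zero, while r^2 says how much of the scatter the line accounts for.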
For homework, you might want to just enter a small data set, like so:
x = [1,2,3,8]
y = [4,3,6,9]
plt.plot(x,y,'bo')
And do a regression:
lr = linregress(x, y)
lr
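If you'd like to check a homework answer by hand, the slope and intercept come from the usual least squares formulas: the slope is Sxy/Sxx, and the intercept is the mean of y minus the slope times the mean of x. Here's a minimal sketch on the same four points; the names `sxx`, `sxy`, `syy`, and `check` are just labels chosen for this example.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([1, 2, 3, 8])
y = np.array([4, 3, 6, 9])

# Sums of squared and cross deviations from the means
sxx = np.sum((x - x.mean())**2)                # 29.0
sxy = np.sum((x - x.mean()) * (y - y.mean()))  # 23.0
syy = np.sum((y - y.mean())**2)                # 21.0

slope = sxy / sxx                        # 23/29, about 0.793
intercept = y.mean() - slope * x.mean()  # about 2.724
r = sxy / np.sqrt(sxx * syy)             # about 0.932

# These should agree with linregress up to rounding error
check = linregress(x, y)
print(slope, check.slope)
print(intercept, check.intercept)
print(r, check.rvalue)
```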
And visualize it:
def f(x): return lr.slope*x + lr.intercept
plt.plot(x,y, 'bo')
plt.plot([1,8], [f(1), f(8)], 'black')
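One more check you might find handy is the residuals, which are the vertical gaps between each point and the line. This sketch recomputes the fit so that it runs on its own; `y_hat` and `residuals` are names made up for this example.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([1, 2, 3, 8])
y = np.array([4, 3, 6, 9])
lr = linregress(x, y)

y_hat = lr.slope * x + lr.intercept  # height of the line at each x
residuals = y - y_hat                # positive means the point sits above the line

print(residuals)
print(residuals.sum())  # essentially 0 for a least squares line with an intercept
```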