Reproducing Linear Regression in Machine Learning

Prediction function

$$
h_\theta(x)=\theta_0+\theta_1x
$$

In the code below this is written as $y = wx + b$, with slope $w=\theta_1$ and intercept $b=\theta_0$.

Linear regression with n variables

$$
h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n
$$

$$
h_\theta(x)=\sum_{i=0}^{n}\theta_ix_i=\theta^Tx
$$

where $x_0=1$, so the intercept $\theta_0$ is absorbed into the vector form.
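Purely as an illustration (the values of theta and x below are made up, not from the original), the vector form is just a dot product once $x_0=1$ is prepended to the feature vector:

import numpy as np

theta = np.array([1.0, 2.0, 3.0])  # theta_0, theta_1, theta_2 (illustrative values)
x = np.array([1.0, 4.0, 5.0])      # x_0 = 1, followed by the features x_1, x_2
h = theta @ x                      # h_theta(x) = theta^T x = 1 + 2*4 + 3*5 = 24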

Cost function

$$
J(\theta_0,\theta_1,\ldots,\theta_n)=\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2
$$

Gradient descent

$$
\theta_j := \theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1,\ldots,\theta_n)
$$
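Differentiating $J$ gives the gradient that the update rule needs:

$$
\frac{\partial}{\partial\theta_j}J(\theta_0,\ldots,\theta_n)=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}
$$

The implementation below relies on sklearn to fit the parameters rather than hand-written gradient descent. Purely as a sketch of the update rule above, a one-variable version could look like the following (the names step_gradient, gradient_descent, learning_rate and num_iters are illustrative assumptions, not from the original):

import numpy as np

def step_gradient(w, b, data, learning_rate):
    # One gradient-descent step for J = 1/(2m) * sum((w*x + b - y)^2)
    x = data[:, 0]
    y = data[:, 1]
    m = len(data)
    error = w * x + b - y
    grad_w = np.sum(error * x) / m   # dJ/dw
    grad_b = np.sum(error) / m       # dJ/db
    return w - learning_rate * grad_w, b - learning_rate * grad_b

def gradient_descent(data, learning_rate=0.0001, num_iters=1000):
    # Start from w = b = 0 and repeatedly apply the update rule
    w, b = 0.0, 0.0
    for _ in range(num_iters):
        w, b = step_gradient(w, b, data, learning_rate)
    return w, b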

Python implementation

Import the required libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

Prepare the data

data = np.array([[32, 31], [53, 68], [61, 62], [47, 71], [59, 87], [55, 78], [52, 79], [39, 59], [48, 75], [52, 71],
[45, 55], [54, 82], [44, 62], [58, 75], [56, 81], [48, 60], [44, 82], [60, 97], [45, 48], [38, 56],
[66, 83], [65, 118], [47, 57], [41, 51], [51, 75], [59, 74], [57, 95], [63, 95], [46, 79],
[50, 83]])

Read the data

x = data[:,0]
y = data[:,1]

Define the cost function

def cost(w, b, data):
    total_cost = 0
    m = len(data)
    # Accumulate the squared error over all m samples
    for i in range(m):
        x = data[i, 0]
        y = data[i, 1]
        total_cost += ((w * x + b) - y) ** 2
    # Matches J = 1/(2m) * sum((h(x) - y)^2)
    return total_cost / (2 * m)

Reshape the data

x_new = x.reshape(-1,1)
y_new = y.reshape(-1,1)
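sklearn's LinearRegression expects the feature matrix to be two-dimensional, with shape (n_samples, n_features), which is why the one-dimensional x is turned into a column vector here. A quick shape check (purely illustrative):

print(x.shape)      # (30,)   one-dimensional array of 30 values
print(x_new.shape)  # (30, 1) 30 samples, 1 feature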

Fit the model to obtain the parameters w and b

lr = LinearRegression()
lr.fit(x_new,y_new)
w = lr.coef_[0][0]
b = lr.intercept_[0]
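With the fitted model, a prediction can also be made directly through sklearn; the input value 50 below is only an example:

print(w, b)                          # fitted slope and intercept
print(lr.predict(np.array([[50]])))  # prediction for x = 50, i.e. w * 50 + b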

Evaluate the cost function

cost_value = cost(w, b, data)  # evaluate the cost for the fitted w and b

Plot the results

plt.scatter(x,y)
pred_y = w * x + b
plt.plot(x,pred_y,c='r')
plt.show()

Result

Running the script shows a scatter plot of the data with the fitted regression line drawn in red.