Multiple Regression
Multiple regression is like linear regression, but with more than one independent variable,
meaning that we try to predict a value based on two or more variables.
Take a look at the data set below. It contains some information about cars.
Loading the CSV into a DataFrame:
import pandas as pd
df = pd.read_csv('Documents/Machine Learning/cars.csv')
print(df.to_string())
    Car         Model       Volume  Weight  CO2
0   Toyoty      Aygo        1000    790     99
1   Mitsubishi  Space Star  1200    1160    95
2   Skoda       Citigo      1000    929     95
3   Fiat        500         900     865     90
4   Mini        Cooper      1500    1140    105
5   VW          Up!         1000    929     105
6   Skoda       Fabia       1400    1109    90
7   Mercedes    A-Class     1500    1365    92
8   Ford        Fiesta      1500    1112    98
9   Audi        A1          1600    1150    99
10  Hyundai     I20         1100    980     99
11  Suzuki      Swift       1300    990     101
12  Ford        Fiesta      1000    1112    99
13  Honda       Civic       1600    1252    94
14  Hundai      I30         1600    1326    97
15  Opel        Astra       1600    1330    97
16  BMW         1           1600    1365    99
17  Mazda       3           2200    1280    104
18  Skoda       Rapid       1600    1119    104
19  Ford        Focus       2000    1328    105
20  Ford        Mondeo      1600    1584    94
21  Opel        Insignia    2000    1428    99
22  Mercedes    C-Class     2100    1365    99
23  Skoda       Octavia     1600    1415    99
24  Volvo       S60         2000    1415    99
25  Mercedes    CLA         1500    1465    102
26  Audi        A4          2000    1490    104
27  Audi        A6          2000    1725    114
28  Volvo       V70         1600    1523    109
29  BMW         5           2000    1705    114
30  Mercedes    E-Class     2100    1605    115
31  Volvo       XC70        2000    1746    117
32  Ford        B-Max       1600    1235    104
33  BMW         216         1600    1390    108
34  Opel        Zafira      1600    1405    109
35  Mercedes    SLK         2500    1395    120
Tip: use to_string() to print the entire DataFrame.
By default, when a DataFrame has more rows than pandas' display limit (the display.max_rows option, 60 by default), printing it shows only the first 5 and the last 5 rows.
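To see the difference between a plain print and to_string(), here is a minimal sketch using a hypothetical 100-row DataFrame (made up for illustration, not the cars data), which is longer than the default display.max_rows limit:

```python
import pandas as pd

# A hypothetical 100-row DataFrame: longer than pandas' default
# display.max_rows limit of 60, so print(df) would truncate it.
df = pd.DataFrame({"value": range(100)})

truncated = repr(df)   # what print(df) shows: first rows, a ".." row, last rows
full = df.to_string()  # every row, no truncation

print(truncated)
print(len(full.splitlines()))  # header line + 100 data rows
```

The truncated view also ends with a "[100 rows x 1 columns]" footer, while to_string() lists all 100 rows.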
We can predict the CO2 emission of a car based on the size of the engine alone,
but with multiple regression we can throw in more variables,
like the weight of the car, to try to make the prediction more accurate.
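Written as a formula, the model that scikit-learn's LinearRegression fits (ordinary least squares) predicts CO2 as a constant plus one coefficient times each independent variable, with the constants chosen to minimize the sum of squared prediction errors over the n cars in the data set. Here b_0 corresponds to the fitted object's intercept_ attribute, and b_1 and b_2 to the values in coef_:

```latex
\widehat{\mathrm{CO2}} = b_0 + b_1 \cdot \mathrm{Weight} + b_2 \cdot \mathrm{Volume}

(b_0, b_1, b_2) = \arg\min_{b_0,\, b_1,\, b_2} \sum_{i=1}^{n}
  \left( \mathrm{CO2}_i - b_0 - b_1 \mathrm{Weight}_i - b_2 \mathrm{Volume}_i \right)^2
```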
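In the code, we read the CSV file, put the independent values (Weight and Volume) in X and the dependent value (CO2) in y,
and fit a LinearRegression object to them. The fitted regression object is then ready to predict CO2 values based on a car's weight and volume: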
Tip: It is common to name the list of independent values with an upper-case X,
and the list of dependent values with a lower-case y.
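A quick way to see why X is selected with double brackets and y with single brackets: scikit-learn expects X to be two-dimensional (one row per car, one column per independent value) and y to be one-dimensional. This sketch uses only the first three cars of the table as a small illustrative subset:

```python
import pandas as pd

# The first three cars from the table above (a small illustrative subset).
df = pd.DataFrame({
    "Weight": [790, 1160, 929],
    "Volume": [1000, 1200, 1000],
    "CO2": [99, 95, 95],
})

X = df[["Weight", "Volume"]]  # double brackets -> 2-D DataFrame
y = df["CO2"]                 # single brackets -> 1-D Series

print(type(X).__name__, X.shape)  # DataFrame (3, 2)
print(type(y).__name__, y.shape)  # Series (3,)
```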
import pandas
from sklearn import linear_model

df = pandas.read_csv("Documents/Machine Learning/cars.csv")

# Independent values (features) and dependent value (target)
X = df[['Weight', 'Volume']]
y = df['CO2']

# Fit a linear regression model to the data
regr = linear_model.LinearRegression()
regr.fit(X, y)

# Predict the CO2 emission of a car where the weight is 2300 kg and the volume is 1300 cm3
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
[107.2087328]
We have predicted that a car with a 1.3-liter engine and a weight of 2300 kg will release approximately 107 grams of CO2 for every kilometer it drives.
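If you want to try this without the cars.csv file, the sketch below types the Weight, Volume and CO2 columns from the table directly into a DataFrame (an assumption made only so the example is self-contained). It also passes the new car to predict() as a DataFrame with the same column names as X: recent scikit-learn versions warn ("X does not have valid feature names") when a model fitted on a DataFrame is given a plain list.

```python
import pandas as pd
from sklearn import linear_model

# Weight, Volume and CO2 columns typed in from the table above,
# so this sketch runs without the cars.csv file.
weight = [790, 1160, 929, 865, 1140, 929, 1109, 1365, 1112, 1150, 980, 990,
          1112, 1252, 1326, 1330, 1365, 1280, 1119, 1328, 1584, 1428, 1365, 1415,
          1415, 1465, 1490, 1725, 1523, 1705, 1605, 1746, 1235, 1390, 1405, 1395]
volume = [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500, 1500, 1600, 1100, 1300,
          1000, 1600, 1600, 1600, 1600, 2200, 1600, 2000, 1600, 2000, 2100, 1600,
          2000, 1500, 2000, 2000, 1600, 2000, 2100, 2000, 1600, 1600, 1600, 2500]
co2 = [99, 95, 95, 90, 105, 105, 90, 92, 98, 99, 99, 101,
       99, 94, 97, 97, 99, 104, 104, 105, 94, 99, 99, 99,
       99, 102, 104, 114, 109, 114, 115, 117, 104, 108, 109, 120]

df = pd.DataFrame({"Weight": weight, "Volume": volume, "CO2": co2})
X = df[["Weight", "Volume"]]
y = df["CO2"]

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Use the same column names as X so scikit-learn can match the features.
new_car = pd.DataFrame([[2300, 1300]], columns=["Weight", "Volume"])
predicted_co2 = regr.predict(new_car)
print(predicted_co2)
```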
Coefficient
A coefficient is a factor that describes the relationship with an unknown variable.
Example: if x is a variable, then 2x is two times x. x is the unknown variable, and the number 2 is the coefficient.
In this case, we can ask for the coefficient value of weight against CO2,
and of volume against CO2. The answers we get tell us how the predicted CO2 changes if we increase,
or decrease, one of the independent values while keeping the other one the same.
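To see that the fitted coefficients really are these "how much does y change per unit" factors, here is a minimal sketch on hypothetical toy data generated from a known rule, y = 3 + 2*x1 + 5*x2. Because the data fit that rule exactly, LinearRegression should recover 2 and 5 as the coefficients and 3 as the intercept:

```python
import numpy as np
from sklearn import linear_model

# Hypothetical toy data built from a known rule: y = 3 + 2*x1 + 5*x2.
X = np.array([[1, 0], [0, 1], [2, 1], [3, 4], [5, 2]])
y = 3 + 2 * X[:, 0] + 5 * X[:, 1]

regr = linear_model.LinearRegression().fit(X, y)
print(regr.coef_)       # close to [2. 5.]: y rises by 2 per unit of x1, by 5 per unit of x2
print(regr.intercept_)  # close to 3.0
```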
Print the coefficient values of the regression object:
import pandas
from sklearn import linear_model

df = pandas.read_csv("Documents/Machine Learning/cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

# coef_ holds one coefficient per column of X, in the same order: [Weight, Volume]
print(regr.coef_)
[0.00755095 0.00780526]
Conclusion
Result Explained
The result array represents the coefficient values of weight and volume.
Weight: 0.00755095
Volume: 0.00780526
These values tell us that if the weight increases by 1 kg (with the volume unchanged), the CO2 emission increases by 0.00755095 g.
And if the engine size (Volume) increases by 1 cm3 (with the weight unchanged), the CO2 emission increases by 0.00780526 g.
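The coefficients are exactly what predict() uses: the prediction is the intercept plus each coefficient times its value. This is a minimal sketch, assuming only the first eight cars of the table as training data so it runs without cars.csv (with this subset the fitted numbers differ from the full-data coefficients above); it compares a hand-computed prediction with predict():

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# The first eight cars from the table above (a subset, so the fitted
# numbers will differ from the full-data coefficients in the text).
df = pd.DataFrame({
    "Weight": [790, 1160, 929, 865, 1140, 929, 1109, 1365],
    "Volume": [1000, 1200, 1000, 900, 1500, 1000, 1400, 1500],
    "CO2": [99, 95, 95, 90, 105, 105, 90, 92],
})
X = df[["Weight", "Volume"]]
regr = linear_model.LinearRegression().fit(X, df["CO2"])

# A linear model predicts intercept_ + coef_[0] * Weight + coef_[1] * Volume.
new_car = pd.DataFrame([[2300, 1300]], columns=["Weight", "Volume"])
by_hand = regr.intercept_ + np.dot(new_car.iloc[0].to_numpy(), regr.coef_)
by_sklearn = regr.predict(new_car)[0]

print(by_hand, by_sklearn)  # the two agree up to floating-point rounding
```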
I think that is a fair guess, but let's test it!
We have already predicted that if a car with a 1300cm3 engine weighs 2300kg, the CO2 emission will be approximately 107g.
What if we increase the weight by 1000 kg?
import pandas
from sklearn import linear_model

df = pandas.read_csv("Documents/Machine Learning/cars.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']

regr = linear_model.LinearRegression()
regr.fit(X, y)

# Same volume as before (1300 cm3), but 1000 kg heavier (3300 kg)
predictedCO2 = regr.predict([[3300, 1300]])
print(predictedCO2)
[114.75968007]
We have predicted that a car with a 1.3-liter engine and a weight of 3300 kg will release approximately 115 grams of CO2
for every kilometer it drives.
This shows that the weight coefficient of 0.00755095 is correct:
107.2087328 + (1000 * 0.00755095) = 114.7596828 ≈ 114.75968
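Why does this check work? Writing the fitted model as CO2 ≈ b_0 + b_1·Weight + b_2·Volume, where b_1 and b_2 are the values in coef_ and b_0 is the intercept, the intercept and volume terms cancel when only the weight changes by some amount Δ:

```latex
\widehat{\mathrm{CO2}}(\mathrm{Weight} + \Delta, \mathrm{Volume}) - \widehat{\mathrm{CO2}}(\mathrm{Weight}, \mathrm{Volume})
  = \bigl(b_0 + b_1 (\mathrm{Weight} + \Delta) + b_2 \mathrm{Volume}\bigr)
  - \bigl(b_0 + b_1 \mathrm{Weight} + b_2 \mathrm{Volume}\bigr)
  = b_1 \Delta
```

With Δ = 1000 and b_1 = 0.00755095 this gives 7.55095 g, whatever the engine volume is; the tiny gap to the printed 114.75968007 comes from the coefficient being printed with only 8 decimals.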