DS – How to calculate Stats from an input CSV file using Pandas

What is Data?
Data is a collection of information.
One purpose of Data Science is to structure data, making it interpretable and easy to work with.

Data can be categorized into two groups:
Structured data
Unstructured data

How to Structure Data?
We can use an array or a database table to structure or present data.

Python Libraries
Python has libraries with large collections of mathematical functions and analytical tools.

We will use the following libraries:

Pandas – This library is used for structured data operations, like import CSV files, create dataframes, and data preparation
Numpy – This is a mathematical library. Has a powerful N-dimensional array object, linear algebra, Fourier transform, etc.
Matplotlib – This library is used for visualization of data.
SciPy – This library has linear algebra modules

Create a DataFrame with Pandas

A data frame is a structured representation of data.

Let’s define a data frame with 3 columns and 5 rows with fictional numbers:

import pandas as pd

d = {‘col1’: [1, 2, 3, 4, 7], ‘col2’: [4, 5, 6, 9, 5], ‘col3’: [7, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)

print(df)

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6    12
3     4     9     1
4     7     5    11

Count the number of columns: use df.shape[1] for column

import pandas as pd

d = {‘col1’: [1, 2, 3, 4, 7], ‘col2’: [4, 5, 6, 9, 5], ‘col3’: [7, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)
count_column = df.shape[1]

print(“Number of columns:”)
print(count_column)

Number of columns:
3

Count the number of rows: use df.shape[0] for row

import pandas as pd

d = {‘col1’: [1, 2, 3, 4, 7], ‘col2’: [4, 5, 6, 9, 5], ‘col3’: [7, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)
count_row = df.shape[0]

print(“Number of rows:”)
print(count_row)

Number of rows:
5

Dataset of Full Health Data

import pandas as pd

full_health_data = pd.read_csv(“Documents/Data Science/data.csv”, header=0, sep=”,”)

print(full_health_data.to_string())

     Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work  Hours_Sleep
0          60            110        130              409         0.0          8.0
1          60            117        145              479         0.0          8.0
2          60            103        135              340         8.0          7.5
3          45            109        175              282         8.0          8.0
4          45            117        148              406         0.0          6.5
5          60            102        127              300         0.0          7.5
6          60            110        136              374         0.0          7.5
7          45            104        134              253         0.0          9.0
8          30            109        133              195         8.0          8.0
9          60             98        124              269         7.5          8.0
10         60            103        147              329         8.0          7.0
11         60            100        120              250         0.0          5.0
12         60            106        128              345         8.5          7.5
13         60            104        132              379         8.5          7.5
14         60             98        123              275         9.0          7.0
15         60             98        120              215         6.5          7.0
16         60            100        120              300         9.0          8.0
17         45             90        112              180         8.0          8.0
18         60            103        123              323         8.0          8.0
19         45             97        125              243         8.0          7.0
20         60            108        131              364         8.0          7.0
21         45            100        119              282         0.0          7.0
22         60            130        101              300         8.0          7.0
23         45            105        132              246         0.0          8.0
24         60            102        126              334         8.0          8.0
25         60            100        120              250         8.0          7.0
26         60             92        118              241         7.0          6.0
27         60            103        132              353         9.0          8.5
28         60            100        132              280         3.0          8.0
29         60            102        129              380         3.0          8.0
30         60             92        115              243         7.5          8.0
31         60            101        124              299         8.5          7.5
32         60             93        113              223         8.0          7.0
33         60            107        136              361         2.0          8.0
34         60            114        140              415         9.0          8.5
35         60            100        120              300         8.0          7.0
36         60            100        120              300         7.0          6.0
37         45            104        129              266         8.0          7.0
38         60             98        126              286         7.5          7.0
39         60            100        122              329         8.0          7.0
40         60            111        138              400         8.0          8.0
41         60            111        131              397         0.0          8.0
42         60             99        119              273         8.0          7.0
43         60            109        153              387         8.0          7.0
44         45            111        136              300        11.0          7.0
45         45            108        129              298         5.0          7.0
46         60            111        139              397         0.0          8.0
47         60            107        136              380         8.5          8.0
48         80            123        146              643         8.5          8.0
49         60            106        130              263         8.0          8.0
50         60            118        151              486         7.0          7.0
51         30            136        175              238         0.0          7.5
52         60            121        146              450         5.0          8.0
53         60            118        121              413         8.0          7.0
54         45            115        144              305         5.0          7.0
55         20            153        172              226         0.0          8.0
56         45            123        152              321         5.0          8.0
57        210            108        160             1376         8.0          8.0
58        160            110        137             1034         8.0          8.0
59        160            109        135              853         8.0          8.0
60         45            118        141              341         8.0          8.0
61         20            110        130              131         0.0          8.0
62        180             90        130              800         0.0          8.0
63        150            105        135              873         8.0          8.0
64        150            107        130              816         3.0          8.0
65         20            106        136              110         3.0          8.0
66        300            108        143             1500         0.0          8.0
67        150             97        129             1115         0.0          8.0
68         90            100        127              700         0.0          8.0
69        150             97        127              953         0.0          8.0
70         45            114        146              304         0.0          8.0
71         90             98        125              563         0.0          8.0
72         45            105        134              251         0.0          8.0
73         45            110        141              300         7.0          7.5
74        120            100        130              500         7.0          7.5
75        270            100        131             1729         0.0          7.0
76         30            159        182              319         5.0          7.5
77         45            149        169              344         0.0          8.0
78         30            103        139              151         4.0          8.0
79        120            100        130              500         0.0          8.0
80         45            100        120              225         8.5          8.0
81         30            151        170              300         0.0          7.0
82         45            102        136              234         6.5          8.0
83        120            100        157             1000         0.0          8.0
84         45            129        103              242         5.0          8.0
85         20             83        107               50         9.5          8.0
86        180            101        127              600         0.0          7.5
87         45            107        137             1200         8.0          8.0
88         30             90        107              105         7.0          8.0
89         15             80        100               50         9.0          8.0
90         20            150        171              127         0.0          7.5
91         20            151        168              229         0.0          7.0
92         30             95        128              128         8.0          7.5
93         25            152        168              244         0.0          8.0
94         30            109        131              188         8.0          7.5
95         90             93        124              604         8.0          8.0
96         20             95        112               77         8.0          8.0
97         90             90        110              500         8.0          8.0
98         90             90        100              500         0.0          8.0
99         90             90        100              500         5.0          8.0
100        30             92        108               92         8.0          8.0
101        30             93        128              124         8.0          8.0
102       180             90        120              800         8.0          8.0
103        30             90        120               86         7.0          8.0
104        90             90        120              500         8.0          8.0
105       210            137        184             1860         0.0          7.5
106        60            102        124              325         0.0          8.0
107        45            107        124              275         0.0          8.0
108        15            124        139              124         5.0          6.0
109        60            108        131              367         0.0          8.0
110        60            108        151              351         0.0          8.0
111        60            116        141              443         0.0          8.0
112        60             97        122              277         0.0          8.0
113        60            105        125              353         0.0          8.0
114        60            103        124              332         0.0          8.0
115        30            112        137              193         0.0          8.0
116        45            100        120              100         0.0          7.5
117        60            119        169              336         0.0          8.0
118        60            107        127              344         0.0          8.0
119        60            111        151              368         0.0          8.0
120        60             98        122              271         0.0          8.0
121        60             97        124              275         0.0          8.0
122        60            109        127              382         0.0          6.0
123        90             99        125              466         0.0          7.5
124        60            114        151              384         0.0         12.0
125        60            104        134              342         0.0          7.5
126        60            107        138              357         0.0          7.5
127        60            103        133              335         0.0          7.5
128        60            106        132              327         0.0          7.5
129        60            103        136              339         0.0          7.5
130        20            136        156              189         8.5          7.5
131        45            117        143              317         0.0          8.0
132        45            115        137              318         0.0          7.5
133        45            113        138              308         0.0          7.0
134        20            141        162              222         0.0          7.5
135        60            108        135              390         0.0          8.0
136        60             97        127              288         0.0          8.0
137        45            100        120              250         0.0          8.0
138        45            122        149              335         0.0          7.5
139        60            136        170              470         8.0          8.0
140        45            106        126              270         8.0          8.0
141        60            107        136              400         9.5          7.0
142        60            112        146              361         9.0          7.5
143        30            103        127              185         9.0          7.5
144        60            110        150              409         8.0          7.0
145        60            106        134              343         8.0          8.5
146        60            109        129              353         8.0          8.0
147        60            109        138              374         8.0          6.0
148        30            150        167              275         0.0          9.0
149        60            105        128              328         9.0          8.0
150        60             97        131              270         9.0          8.0
151        60            100        120              270         8.0          8.0
152        60            114        150              382         0.0          8.0
153        30             80        120              240        10.0          7.0
154        30             85        120              250        10.0          7.0
155        45             90        130              260         8.0          7.0
156        45             95        130              270         8.0          7.0
157        45            100        140              280         0.0          7.0
158        60            105        140              290         7.0          8.0
159        60            110        145              300         7.0          8.0
160        60            115        145              310         8.0          8.0
161        75            120        150              320         0.0          8.0
162        75            125        150              330         8.0          8.0

Performing Functions

How to calculate Stats from an input CSV file using Pandas

mean value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
mean1 = df[‘Calorie_Burnage’].mean()

print (‘Mean Calorie_Burnage: ‘ + str(mean1))

Mean Calorie_Burnage: 382.3680981595092

max value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
max1 = df[‘Calorie_Burnage’].max()

print (‘Max Calorie_Burnage: ‘ + str(max1))

Max Calorie_Burnage: 1860

min value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
min1 = df[‘Calorie_Burnage’].min()

print (‘Min Calorie_Burnage: ‘ + str(min1))

Min Calorie_Burnage: 50

sum value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
sum1 = df[‘Calorie_Burnage’].sum()

print (‘Sum Calorie_Burnage: ‘ + str(sum1))

Sum Calorie_Burnage: 62326

count value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
count1 = df[‘Calorie_Burnage’].count()

print (‘Count Calorie_Burnage: ‘ + str(count1))

Count Calorie_Burnage: 163

median value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
median1 = df[‘Calorie_Burnage’].median()

print (‘Median Calorie_Burnage: ‘ + str(median1))

Median Calorie_Burnage: 320.0

mode value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
mode1 = df[‘Calorie_Burnage’].mode()

print (‘Mode Calorie_Burnage: ‘ + str(mode1))

Mode Calorie_Burnage: 0    300
dtype: int64

standard deviation (std) value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
std1 = df[‘Calorie_Burnage’].std()

print (‘Std Calorie_Burnage: ‘ + str(std1))

Std Calorie_Burnage: 274.2271059590154

variance (var) value for a specific column within a csv file

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
var1 = df[‘Calorie_Burnage’].var()

print (‘Var Calorie_Burnage: ‘ + str(var1))

Var Calorie_Burnage: 75200.50564265704

grouping sum by Hours_Sleep

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
groupby_sum1 = df.groupby([‘Hours_Sleep’]).sum()

print (‘Sum’ + str(groupby_sum1))

Sum             Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work
Hours_Sleep                                                                 
5.0                60            100        120              250         0.0
6.0               255            534        642             1421        27.0
6.5                45            117        148              406         0.0
7.0              1790           3289       4082            10799       200.5
7.5              1710           3174       3946            10173        95.0
8.0              6300           9654      12083            37254       366.5
8.5               180            323        406             1111        26.0
9.0                75            254        301              528         0.0
12.0               60            114        151              384         0.0

grouping count by Hours_Sleep

import pandas as pd

df = pd.read_csv(“Documents/Data Science/data.csv”)
groupby_count1 = df.groupby([‘Hours_Sleep’]).count()

print (‘Count’ + str(groupby_count1))

Count             Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work
Hours_Sleep                                                                 
5.0                 1              1          1                1           1
6.0                 5              5          5                5           5
6.5                 1              1          1                1           1
7.0                31             31         31               31          31
7.5                28             28         28               28          28
8.0                91             91         91               91          91
8.5                 3              3          3                3           3
9.0                 2              2          2                2           2
12.0                1              1          1                1           1

Three lines to make our compiler able to draw:

import sys
import matplotlib
%matplotlib inline #use this instead of natplotlib.use(‘Agg’), this will render plt.show() to work

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

full_health_data = pd.read_csv(“Documents/Data Science/data.csv”, header=0, sep=”,”)

x = full_health_data[“Average_Pulse”]
y = full_health_data[“Calorie_Burnage”]

slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
return slope * x + intercept

mymodel = list(map(myfunc, x))

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.ylim(ymin=0, ymax=2000)
plt.xlim(xmin=0, xmax=200)
plt.xlabel(“Average_Pulse”)
plt.ylabel (“Calorie_Burnage”)
plt.show()

Advertisement