Using ML Data Distribution

Data Distribution

To create big data sets for testing, we use the Python module NumPy, which comes with a number of methods for creating random data sets of any size.

In [1]: import numpy

# Create an array containing 250 random floats between 0 and 5:
x = numpy.random.uniform(0.0, 5.0, 250)
print(x)
[3.99547431e+00 4.05961982e+00 9.03364219e-01 1.16133196e+00
 1.62204487e-01 1.84084337e+00 1.15679072e+00 4.38792336e+00
 2.63030042e+00 2.57213530e-01 2.78454604e+00 2.12727368e+00
 3.93277980e+00 2.89864862e+00 2.50440580e+00 2.88083411e+00
 4.30203692e+00 4.47886694e+00 4.16871579e+00 1.27755199e+00
 1.18752988e+00 4.53121479e+00 1.06732858e+00 9.07497198e-01
 1.23154614e+00 4.79765528e-01 3.31731624e+00 3.88022886e+00
 4.12174136e+00 6.69026894e-01 2.81438968e+00 1.52868682e+00
 2.33222964e+00 2.00802215e+00 4.16463270e+00 2.70134839e+00
 3.81794141e+00 2.66755049e+00 2.05774835e+00 1.80003779e+00
 2.69573296e+00 2.15732462e+00 1.70625280e-01 6.09770833e-01
 9.43530201e-01 8.48967466e-01 8.28191697e-01 1.14552149e+00
 2.95070639e-01 2.85946667e+00 2.93982750e-01 3.31203548e+00
 2.08959629e+00 2.13944998e+00 4.79858546e+00 5.92033483e-01
 8.33240648e-01 1.37012743e+00 2.74934895e-01 3.00178970e+00
 1.27243664e+00 2.94463709e+00 3.27145798e+00 4.68124418e+00
 2.31610409e+00 4.84959757e+00 3.54134618e+00 2.09026934e+00
 1.61065969e+00 1.71635437e+00 4.78018733e-02 4.46462375e+00
 1.06684266e+00 2.94400947e-01 3.29836367e+00 1.33417879e-01
 3.56892520e+00 4.88974239e+00 1.42616554e+00 2.72642258e+00
 1.31838129e+00 1.62208124e+00 1.42352720e+00 2.51593218e+00
 4.43541326e+00 2.36180553e+00 3.60431175e+00 4.86129217e+00
 2.47818153e+00 3.72691349e+00 6.17690241e-01 2.03309151e+00
 2.97481422e+00 5.91907642e-01 4.07360320e-01 4.30536712e+00
 3.58954312e+00 2.64857957e+00 1.26225229e+00 2.32040381e+00
 2.03157692e+00 2.43729263e+00 1.16613073e+00 3.77629954e+00
 1.12478543e+00 3.22288798e+00 1.71405738e+00 6.05841214e-01
 1.71606327e-01 7.91139453e-01 1.45713808e+00 2.14855236e+00
 4.41655275e+00 9.20530679e-01 4.88554120e+00 3.66008597e+00
 3.73294707e+00 3.20111923e+00 6.44882516e-01 2.52544409e+00
 2.46878672e+00 2.98338086e+00 3.74588786e+00 2.56112906e+00
 4.26014112e-01 2.40354909e+00 5.07843037e-01 4.38220647e-02
 3.59857080e-02 3.49121116e+00 1.58486848e-01 2.66921347e+00
 1.20933590e+00 4.84000996e+00 2.83737790e+00 2.57174915e+00
 1.95972129e+00 4.70166575e+00 2.54803555e+00 3.18010269e+00
 1.33050298e+00 2.45983069e+00 3.99717567e+00 3.92165986e+00
 2.86285193e+00 4.68477185e+00 1.12114003e+00 3.56194687e+00
 2.03287888e+00 1.68692799e+00 7.17388279e-01 3.70722681e+00
 1.72481542e+00 2.50739982e+00 4.74636596e+00 9.61703640e-01
 3.18732301e+00 4.17580501e+00 8.57945812e-01 1.67550242e+00
 8.43739802e-01 1.77971245e+00 3.15662066e+00 4.83498076e+00
 3.12139211e+00 7.44331445e-01 1.40245289e+00 1.66685310e+00
 2.83658074e+00 3.75405086e+00 8.41989658e-01 2.47684927e+00
 1.45914659e+00 4.21908172e+00 4.80321327e+00 1.25171106e+00
 2.43830366e+00 1.25817228e+00 2.29947624e+00 1.31154312e+00
 3.65278217e+00 3.17431199e+00 3.09420816e+00 8.12591464e-01
 4.96659809e+00 1.74688272e+00 4.56096114e+00 3.87608103e+00
 2.84675101e+00 5.09378738e-01 4.09301280e+00 1.71099056e+00
 4.64595216e-04 1.14184388e+00 3.69090362e+00 4.88677779e-01
 9.24656601e-01 2.79637464e+00 1.49342826e-01 3.81429745e+00
 2.09787680e+00 1.39557998e+00 1.33986626e+00 2.73433574e+00
 4.96788789e+00 4.73995338e+00 1.45143329e+00 3.54423983e+00
 4.86476249e+00 2.81176675e+00 3.03730423e+00 4.81114308e+00
 1.60367384e+00 4.70313020e+00 2.88329148e+00 3.82982135e+00
 3.51311198e+00 3.29771993e+00 2.60541915e+00 1.11918676e+00
 2.53064741e+00 1.65505301e+00 2.28782772e+00 2.50014682e+00
 3.28284855e+00 1.87703200e+00 1.05429257e+00 2.98734760e+00
 1.79807242e+00 3.22159816e+00 4.88572755e-01 4.44186400e+00
 2.15384964e+00 3.68603296e+00 3.04314723e+00 5.26166939e-01
 2.98411399e+00 2.18538906e+00 1.30792532e+00 2.61774055e+00
 3.54752236e+00 1.10980784e+00 1.08187466e+00 1.02845816e+00
 3.89951143e+00 2.77408780e+00 3.68895805e+00 4.29142801e+00
 4.57842706e+00 1.92368873e+00]

Draw a Histogram

In [2]: import numpy
import matplotlib.pyplot as plt

x = numpy.random.uniform(0.0, 5.0, 250)

# Draw a histogram with 5 bars:
plt.hist(x, 5)
plt.show()

Histogram Explained

We use an array of 250 random values, generated the same way as in the first example, to draw a histogram with 5 bars.

The first bar represents how many values in the array are between 0 and 1.

The second bar represents how many values are between 1 and 2.


In one run, this gave the following result (the exact counts change on every run, because the data is random):

52 values are between 0 and 1
48 values are between 1 and 2
49 values are between 2 and 3
51 values are between 3 and 4
50 values are between 4 and 5
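The per-bar counts can also be computed without drawing anything. The following is a minimal sketch, not part of the original example: it assumes numpy.random.default_rng with an arbitrary seed (0) so that the run is repeatable, and it passes range=(0.0, 5.0) to numpy.histogram so that the bin edges fall exactly on 0, 1, 2, 3, 4 and 5. Without an explicit range, the edges run from the smallest to the largest value in the data, so they only approximately match those whole numbers.

```python
import numpy

# Assumption: a seeded generator, used only to make this sketch repeatable.
rng = numpy.random.default_rng(0)
x = rng.uniform(0.0, 5.0, 250)

# Five equal-width bins with edges fixed at 0, 1, 2, 3, 4, 5.
counts, edges = numpy.histogram(x, bins=5, range=(0.0, 5.0))

# Print one line per bar, in the same form as the result listed above.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{count} values are between {left:g} and {right:g}")
```

The five counts always add up to 250, and for uniform data each one should land near 50.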

In [3]: import numpy
import matplotlib.pyplot as plt

# Create an array with 100000 random numbers, and display them using a histogram with 100 bars:
x = numpy.random.uniform(0.0, 5.0, 100000)
plt.hist(x, 100)
plt.show()

Normal Data Distribution

An array whose values are concentrated around a given value, as in the example below, follows what probability theory calls the normal data distribution, or the Gaussian data distribution, after the mathematician Carl Friedrich Gauss, who came up with the formula for this data distribution.

In [4]: import numpy
import matplotlib.pyplot as plt

# Create 100000 values with mean 5.0 and standard deviation 1.0:
x = numpy.random.normal(5.0, 1.0, 100000)
plt.hist(x, 100)
plt.show()

Note: A normal distribution graph is also known as the bell curve because of its characteristic bell shape.


Histogram Explained

We use the array from the numpy.random.normal() method, with 100000 values, to draw a histogram with 100 bars. We specify that the mean value is 5.0 and the standard deviation is 1.0. This means that the values are concentrated around 5.0: about 68% of them fall within 1.0 of the mean, and about 95% fall within 2.0. As you can see from the histogram, most values are between 4.0 and 6.0, with a peak at approximately 5.0.
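To see how tightly the values cluster around the mean, the same kind of array can be checked numerically. This is a minimal sketch rather than part of the original example: the seeded numpy.random.default_rng generator and the variable names within_one and within_two are assumptions introduced here for illustration.

```python
import numpy

# Assumption: a seeded generator, used only to make this sketch repeatable.
rng = numpy.random.default_rng(0)
x = rng.normal(5.0, 1.0, 100000)

# Sample mean and standard deviation; these should be close to 5.0 and 1.0.
mean = x.mean()
std = x.std()

# Fraction of values within one and two standard deviations of the mean.
within_one = numpy.mean(numpy.abs(x - 5.0) <= 1.0)
within_two = numpy.mean(numpy.abs(x - 5.0) <= 2.0)

print(f"mean: {mean:.3f}, standard deviation: {std:.3f}")
print(f"share between 4.0 and 6.0: {within_one:.1%}")
print(f"share between 3.0 and 7.0: {within_two:.1%}")
```

For a normal distribution the two shares should come out near 68% and 95%, which matches the bell shape of the histogram.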

Today you’ve learned more about charts that can be used for visualizing data distribution. We encourage you to learn by doing and try creating such charts in your data analysis project.
