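Probability is about how likely something is to occur, or how likely it is that something is true.
Probability = Ways / Outcomes, where Ways is the number of ways the event can happen and Outcomes is the total number of equally likely outcomes.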
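As a small worked example of the formula, the sketch below computes the chance of rolling an even number on a fair six-sided die. The helper name probability is an illustrative assumption, and the formula only applies when every outcome is equally likely.

```python
from fractions import Fraction


def probability(ways: int, outcomes: int) -> Fraction:
    """Return ways / outcomes as an exact fraction (hypothetical helper)."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    if not 0 <= ways <= outcomes:
        raise ValueError("ways must be between 0 and outcomes")
    return Fraction(ways, outcomes)


# A fair die has 6 equally likely outcomes; 3 of them (2, 4, 6) are even.
p_even = probability(3, 6)
print(p_even)         # 1/2
print(float(p_even))  # 0.5
```

Using Fraction keeps the result exact, which makes it easy to see that 3 ways out of 6 outcomes reduces to one half.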
This data visualization is based on Google Search Console data from mobile devices, computers, and tablets.
Data Studio allows you to create beautiful dashboards full of charts quickly and easily. It’s very easy to use for sharing reports and dashboards with your internal/external teams if they have a Google account. It enables collaboration within business groups.
With Data Studio, you can connect, analyze, and present data from different sources.
Data Studio is used for data visualization and reporting. Google created it in 2016, and it has gained a lot of traction among data scientists, analysts, and sales and marketing experts.
Data Studio is completely free. There’s no paid version of it. You can use it as an alternative to paid reporting tools such as Tableau and Power BI.
Data Studio is cloud-based:
It’s accessible from any browser with an internet connection. The reports you create are saved automatically in Google Data Studio, so they’re available anytime and anywhere, and you don’t need to worry about losing files.
There are many pre-built templates in Data Studio, allowing you to create beautiful dashboards full of charts quickly and easily. It’s very easy to share reports and dashboards with your internal / external teams if they have a Google account. It enables collaboration within business groups.
With Data Studio, you can connect, analyze, and present data from different sources. You don’t even need to be tech-savvy or know programming languages to get started with Data Studio.
Google Data Studio: Data sources and connectors:
Every time you want to create a report, first, you’ll need to create a data source. It’s important to note that data sources are not your original data. To clarify and avoid confusion, see the explanation below:
When Data Studio was first released, there were only six Google-based data sources you could connect to. But a lot has changed since then!
As of this writing, there are 400+ connectors to access your data from 800+ datasets. Besides Google Connectors, there are also Partner Connectors (third-party connectors).
In the example below, we’ll use the US Office Equipment sample dataset to build different charts that represent the data.
File Upload / Locate File:
CSV files are called Unmapped data because their contents are unknown in advance.
Analyze and Visualize the Data:
Quick Steps to Set Up Data Visualization on Google Data Studio:
Conclusion:
Congratulations! We just went through how to create a Business Intelligence (BI) dashboard using Google Data Studio to visualize and explore a sample Office Equipment dataset.
In this example, we’ll build different data visualization charts from population data. There’s an easy way to create visuals directly from Pandas, and we’ll see how it works in detail in this tutorial.
To easily create interactive visualizations, we need to install Cufflinks. This is a library that connects Pandas with Plotly, so we can create visualizations directly from Pandas (in the past you had to learn workarounds to make them work together, but now it’s simpler). First, make sure you install Pandas and Plotly by running the following commands in the terminal:
Install the following libraries in this order from the Conda command prompt:
pip install pandas
pip install plotly
pip install cufflinks
import pandas as pd
import cufflinks as cf
from IPython.display import display, HTML

cf.set_config_file(sharing='public', theme='ggplot', offline=True)
In this case, I’m using the ‘ggplot’ theme, but feel free to choose any theme you want. Run the command cf.getThemes() to get all the themes available. To create data visualizations with Pandas in the following sections, we only need to use the syntax dataframe.iplot().
The data we’ll use is a population dataframe. First, download the CSV file from Kaggle.com, move the file where your Python script is located, and then read it in a Pandas dataframe as shown below.
# Read the population CSV file into a dataframe
df_population = pd.read_csv('documents/population/population.csv')
# use a list of indexes:
print(df_population.loc[[0,10]])

   country    year    population
0    China  2020.0  1.439324e+09
10   China  1990.0  1.176884e+09

print(df_population.head(10))

  country    year    population
0   China  2020.0  1.439324e+09
1   China  2019.0  1.433784e+09
2   China  2018.0  1.427648e+09
3   China  2017.0  1.421022e+09
4   China  2016.0  1.414049e+09
5   China  2015.0  1.406848e+09
6   China  2010.0  1.368811e+09
7   China  2005.0  1.330776e+09
8   China  2000.0  1.290551e+09
9   China  1995.0  1.240921e+09
This dataframe is almost ready for plotting. We just have to drop null values, reshape it, and then select a few countries to test our interactive plots. The code shown below does all of this.
# dropping null values
df_population = df_population.dropna()

# reshaping the dataframe
df_population = df_population.pivot(index="year", columns="country", values="population")

# selecting 5 countries
df_population = df_population[['United States', 'India', 'China', 'Nigeria', 'Spain']]

print(df_population.head(10))

country  United States         India         China      Nigeria       Spain
year
1955.0     171685336.0  4.098806e+08  6.122416e+08   41086100.0  29048395.0
1960.0     186720571.0  4.505477e+08  6.604081e+08   45138458.0  30402411.0
1965.0     199733676.0  4.991233e+08  7.242190e+08   50127921.0  32146263.0
1970.0     209513341.0  5.551898e+08  8.276014e+08   55982144.0  33883749.0
1975.0     219081251.0  6.231029e+08  9.262409e+08   63374298.0  35879209.0
1980.0     229476354.0  6.989528e+08  1.000089e+09   73423633.0  37698196.0
1985.0     240499825.0  7.843600e+08  1.075589e+09   83562785.0  38733876.0
1990.0     252120309.0  8.732778e+08  1.176884e+09   95212450.0  39202525.0
1995.0     265163745.0  9.639226e+08  1.240921e+09  107948335.0  39787419.0
2000.0     281710909.0  1.056576e+09  1.290551e+09  122283850.0  40824754.0
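To try the drop, reshape, and select steps without downloading the Kaggle file, here is a minimal, self-contained sketch that builds a tiny long-format table in memory. The names csv_text and wide are illustrative assumptions; the China figures are rounded from the printed output above, and the empty Spain 2010 row is a made-up placeholder that only exists to show dropna() at work.

```python
import io

import pandas as pd

# Tiny stand-in for population.csv (illustrative values, see note above)
csv_text = """country,year,population
China,1990,1176884000
China,2000,1290551000
Spain,1990,39202525
Spain,2000,40824754
Spain,2010,
"""

df = pd.read_csv(io.StringIO(csv_text))

# Drop rows with a missing population (removes the Spain 2010 placeholder)
df = df.dropna()

# Reshape from long to wide: one row per year, one column per country
wide = df.pivot(index="year", columns="country", values="population")

# Select and order the columns we want to plot
wide = wide[["Spain", "China"]]
print(wide)
```

Note that pivot() requires each (year, country) pair to appear only once; if the real file contained duplicates, pivot_table() with an aggregation function would be one alternative.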
Let’s make a lineplot to compare how much the population has grown from 1955 to 2020 for the 5 countries selected. As mentioned before, we will use the syntax df_population.iplot(kind='name_of_plot') to make plots as shown below.
df_population.iplot(kind='line',xTitle='Years', yTitle='Population', title='Population (1955-2020)')
We can make a single barplot or barplots grouped by categories. Let’s have a look.
Let’s create a barplot that shows the population of each country in 2020. To do so, first, we select the year 2020 from the index and then transpose rows with columns to get the year in the column. We’ll name this new dataframe df_population_2020 (we’ll use this dataframe again when plotting piecharts).
df_population_2020 = df_population[df_population.index.isin([2020])]
df_population_2020 = df_population_2020.T
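The select-then-transpose step can be checked on a small frame. The sketch below assumes a float year index, as the printed output suggests, and uses the year 2000 with values taken or rounded from the table shown earlier; the names wide, one_year, and by_country are illustrative.

```python
import pandas as pd

# Two-year, two-country stand-in for the reshaped df_population
wide = pd.DataFrame(
    {"China": [1176884000, 1290551000], "Spain": [39202525, 40824754]},
    index=pd.Index([1990.0, 2000.0], name="year"),
)

# Keep only the row for the year 2000 (isin matches 2000 against 2000.0)
one_year = wide[wide.index.isin([2000])]

# Transpose so that countries become rows and the year becomes the column
by_country = one_year.T
print(by_country)
```

After the transpose, each country is a row, which is the shape a bar chart with one bar per country needs.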
Now we can plot this new dataframe with .iplot(). In this case, I’m going to set the bar color to blue using the color argument.
df_population_2020.iplot(kind='bar', color='blue', xTitle='Countries', yTitle='Population', title='Population in 2020')
Now let’s see the evolution of the population at the beginning of each decade.
# filter years out
df_population_sample = df_population[df_population.index.isin([1980, 1990, 2000, 2010, 2020])]

# plotting
df_population_sample.iplot(kind='bar', xTitle='Years', yTitle='Population')
Naturally, all of them increased their population throughout the years, but some did it at a faster rate.
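To put a number on "a faster rate", one option is to compute the percentage growth between two years. The sketch below uses the 1980 and 2000 rows from the table printed earlier for three of the countries; the names pop and growth_pct are illustrative assumptions.

```python
import pandas as pd

# Values copied from the 1980 and 2000 rows of the printed table
pop = pd.DataFrame(
    {
        "United States": [229476354, 281710909],
        "Nigeria": [73423633, 122283850],
        "Spain": [37698196, 40824754],
    },
    index=[1980, 2000],
)

# Percentage growth from 1980 to 2000 for each country
growth_pct = (pop.loc[2000] / pop.loc[1980] - 1) * 100
print(growth_pct.round(1).sort_values(ascending=False))
```

Over this 20-year window, Nigeria grows fastest of the three and Spain slowest, which matches what the grouped bars suggest visually.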
Boxplots are useful when we want to see the distribution of the data. The boxplot will reveal the minimum value, first quartile (Q1), median, third quartile (Q3), and maximum value. The easiest way to see those values is by creating an interactive visualization. Let’s see the population distribution of China.
df_population['China'].iplot(kind='box', color='green', yTitle='Population')
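The five values a boxplot summarizes can also be computed directly with Pandas. This sketch uses the ten China values (1955 to 2000) from the table printed earlier, rounded as displayed; the names china and summary are illustrative. The quartiles Plotly draws may differ slightly from these, since quartile conventions vary between libraries.

```python
import pandas as pd

# China's population for 1955-2000, rounded as shown in the printed table
china = pd.Series(
    [612241600, 660408100, 724219000, 827601400, 926240900,
     1000089000, 1075589000, 1176884000, 1240921000, 1290551000]
)

# Minimum, Q1, median, Q3, and maximum (linear interpolation by default)
summary = china.quantile([0, 0.25, 0.5, 0.75, 1.0])
print(summary)
```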
Let’s say now we want to get the same distribution but for all the selected countries.
df_population.iplot(kind='box', xTitle='Countries', yTitle='Population')
As we can see, we can also filter out any country by clicking on the legends on the right.
A histogram represents the distribution of numerical data. Let’s see the population distribution of the USA and Nigeria.
df_population[['United States', 'Nigeria']].iplot(kind='hist', xTitle='Population')
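A histogram simply counts how many values fall into each bin. As a rough sketch of that idea, the snippet below bins the ten United States values (1955 to 2000) from the table printed earlier into three equal-width bins with pd.cut; the choice of three bins and the names us and counts are assumptions for illustration, and Plotly chooses its own bins by default.

```python
import pandas as pd

# United States population for 1955-2000, from the printed table
us = pd.Series(
    [171685336, 186720571, 199733676, 209513341, 219081251,
     229476354, 240499825, 252120309, 265163745, 281710909]
)

# Assign each value to one of 3 equal-width bins, then count per bin
counts = pd.cut(us, bins=3).value_counts(sort=False)
print(counts)
```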
Let’s compare the population in 2020 again, but now with a piechart. To do so, we’ll use the df_population_2020 dataframe created in the “Single Barplot” section. However, to make a piechart we need the “country” as a column and not as an index, so we use .reset_index() to get the column back. Then we rename the 2020 column so that its label is the string '2020'.
# transforming data
df_population_2020 = df_population_2020.reset_index()
df_population_2020 = df_population_2020.rename(columns={2020: '2020'})

# plotting
df_population_2020.iplot(kind='pie', labels='country', values='2020', title='Population in 2020 (%)')
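The percentages a piechart displays are each value divided by the total. The sketch below repeats the reset_index and rename steps on a one-row frame and then computes those shares. It uses the year 2000 row from the table printed earlier (rounded where the output was in scientific notation) because the 2020 figures for most countries are not shown in this guide; the names row_2000, by_country, and share_pct are illustrative.

```python
import pandas as pd

# One-row stand-in for the wide dataframe, using the printed 2000 row
row_2000 = pd.DataFrame(
    {
        "United States": [281710909],
        "India": [1056576000],
        "China": [1290551000],
        "Nigeria": [122283850],
        "Spain": [40824754],
    },
    index=pd.Index([2000.0], name="year"),
)
row_2000.columns.name = "country"

# Countries as rows, then bring "country" back as a regular column
by_country = row_2000.T.reset_index()

# Rename the numeric column label to a string, as in the guide
by_country = by_country.rename(columns={2000: "2000"})

# Share of the total, which is what each pie slice represents
by_country["share_pct"] = by_country["2000"] / by_country["2000"].sum() * 100
print(by_country)
```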
Although population data is not well suited to a scatterplot (the data follows a common pattern), I’ll make this plot for the purposes of this guide. Making a scatterplot is similar to making a line plot, but we have to add the mode argument.
df_population.iplot(kind='scatter', mode='markers')
Voilà! Now you’re ready to make your own beautiful interactive visualizations with Pandas.