Using Google Data Studio for Data Visualization and Exploration

Data Studio is a data visualization and reporting tool. Google created it in 2016, and it has gained a lot of traction among data scientists, analysts, and sales and marketing experts.

Data Studio is completely free. There’s no paid version of it. You can use it as an alternative to paid reporting tools such as Tableau and Power BI.

Data Studio is cloud-based:

It’s accessible from any browser with an internet connection. The reports you create are saved automatically in the cloud, so they’re available anytime and anywhere, and there’s no need to worry about losing files.

Data Studio comes with many pre-built templates that let you create beautiful dashboards full of charts quickly and easily. Reports and dashboards are easy to share with internal or external teams, as long as they have a Google account, which makes collaboration within business groups straightforward.

With Data Studio, you can connect, analyze, and present data from different sources. You don’t even need to be tech-savvy or know programming languages to get started with Data Studio.

Google Data Studio: Data sources and connectors:

Every time you want to create a report, you first need to create a data source. Note that a data source is not your original data. To avoid confusion, here is how the pieces fit together:

  • The original data, such as data in a Google spreadsheet, MySQL database, LinkedIn, YouTube, or data stored in other platforms and services, is called a dataset.
  • To link a report to the dataset, you need a data connector to create a data source.
  • The data source stores the connection credentials and keeps track of all the fields that are part of that connection.
  • You can have multiple data sources connected to a dataset, and this may come in handy when collaborating with different team members. For example, you may want to share data sources with different connection capabilities for different team members.

When Data Studio was first released, there were only six Google-based data sources you could connect to. But a lot has changed since then! 

As of this writing, there are 400+ connectors to access your data from 800+ datasets. Besides Google Connectors, there are also Partner Connectors (third-party connectors). 

In the example below, we’ll use the US Office Equipment sample dataset to build several charts that represent the data.

  • Open Google Data Studio in your browser.
  • Click the Create button on the left.
  • Open a connection to the data source of interest. In our case, that’s the CSV file of the dataset.

File Upload / Locate File:

  • Upload the CSV file.
  • On the next screen, you’ll see the data file schema for the uploaded CSV file.
  • Within the schema, you can change the data types of existing fields and add new calculated fields if needed.

CSV files are called Unmapped data because their contents are unknown in advance.

Analyze and Visualize the Data:

  • Add the data source, and you’ll land on the report canvas.
  • Pick the charts you want from the Add Charts menu in the toolbar to build the data visualization report.

Quick Steps to Set Up Data Visualization on Google Data Studio:

  1. Open Data Studio.
  2. Familiarize yourself with the dashboard.
  3. Connect your first data source.
  4. Create your first report.
  5. Add some charts.
  6. Customize the formatting and add a title and captions.
  7. Share the report.

Conclusion:

Congratulations! We just went through how to create a business intelligence (BI) dashboard with Google Data Studio to visualize and explore a sample office equipment dataset.

Data Studio lets you build dashboards full of charts quickly, connect to and present data from many different sources, and share reports with internal or external teams who have a Google account, all without knowing a programming language.

How to Create a Python Dash Plotly Dashboard App

In this tutorial, I’ll walk through a practical example of how to create a Python Dash Plotly app. The app builds multiple charts for data visualization using dynamic callbacks, which Plotly.com calls Pattern-Matching Callbacks. I’ll use world population data to build the dashboard.

Introduction:

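Pattern-Matching Callbacks – creating different charts for data visualization with one set of callbacks. Instead of writing a callback for every component, you write it once, and it handles every set of inputs and outputs that matches its ID pattern, including ones that don’t exist yet when the app starts. This gives users much more power and flexibility: here, they can keep adding charts, and each new chart is wired up automatically.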

MATCH fires the callback when the property of any component matching the ID pattern changes. However, instead of passing the values of all matching components into the callback (as ALL does), MATCH passes just a single value: the one from the component that changed. And instead of updating a fixed output, it updates the dynamic output that is “matched” with that component, that is, the one with the same index.

Install / Import the Necessary Python Libraries:

Let’s get started. I’m using a Jupyter Notebook in Anaconda. If you don’t already have the following libraries, open the command prompt and install them with the pip commands shown in the comments, then import them:

import dash     # pip install dash
from dash import dcc, html
from dash.dependencies import Input, Output, State, MATCH
import plotly.express as px   # pip install plotly==5.2.2
import pandas as pd     # pip install pandas
import numpy as np      # pip install numpy

Get Data:

Next, read the CSV file into a pandas DataFrame. I downloaded the file to my computer, but you can also get it from my GitHub repository (the link is in the code comment below).

df = pd.read_csv("Documents/Data Science/population.csv")     #https://github.com/Valnjee/datascience/blob/master/population.csv
print(df)
           country    year    population
0             China  2020.0  1.439324e+09
1             China  2019.0  1.433784e+09
2             China  2018.0  1.427648e+09
3             China  2017.0  1.421022e+09
4             China  2016.0  1.414049e+09
...             ...     ...           ...
4180  United States  1965.0  1.997337e+08
4181  United States  1960.0  1.867206e+08
4182  United States  1955.0  1.716853e+08
4183          India  1960.0  4.505477e+08
4184          India  1955.0  4.098806e+08

[4185 rows x 3 columns]

Cleanse Data:

Make sure to clean the data by dropping all the null values.

# dropping null values
df = df.dropna()
print(df.head(10))
  country    year    population
0   China  2020.0  1.439324e+09
1   China  2019.0  1.433784e+09
2   China  2018.0  1.427648e+09
3   China  2017.0  1.421022e+09
4   China  2016.0  1.414049e+09
5   China  2015.0  1.406848e+09
6   China  2010.0  1.368811e+09
7   China  2005.0  1.330776e+09
8   China  2000.0  1.290551e+09
9   China  1995.0  1.240921e+09
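The year column prints as floats (2020.0) in the output above, most likely because the raw CSV contains missing values and pandas stores a numeric column that holds NaN as float. A small sketch on hypothetical toy rows, assuming the same three columns, shows dropna() removing the incomplete rows and year being cast back to integers afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows in the same shape as population.csv
# (country, year, population); the missing values mimic gaps in the raw file.
raw = pd.DataFrame({
    "country": ["China", "China", "India", None],
    "year": [2020.0, np.nan, 1960.0, 1955.0],
    "population": [1.439324e9, 1.433784e9, 4.505477e8, 4.098806e8],
})

# Drop every row that has at least one missing value, as in the tutorial.
clean = raw.dropna()

# With the NaNs gone, 'year' can safely be cast from float to integer.
clean = clean.astype({"year": "int64"})

print(clean)
print(len(raw) - len(clean), "rows dropped")
```

Casting after dropna() matters: astype("int64") would fail on a column that still contains NaN.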

Form and App Layout Design:

Here we design the layout with Dash HTML components: a title, an Add Chart button, and an empty container Div. Every chart the user adds will go into the container’s children.

app = dash.Dash(__name__)
app.layout = html.Div([
    html.H1("The World Population Dashboard with Dynamic Callbacks", style={"textAlign":"center"}),
    html.Hr(),
    html.P("Add as many charts for Data Visualization:"),
    html.Div(children=[
        html.Button('Add Chart', id='add-chart', n_clicks=0),
    ]),
    html.Div(id='container', children=[])
])

First Callback:

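Every click on the Add Chart button triggers this callback. It builds a new child, a Div holding a graph, a dcc.RadioItems with four chart types, and three dropdowns, and appends it to div_children. Each component in the new child gets an id dictionary whose index is the current n_clicks value, so all the components of one chart share the same index. Because Dash also runs callbacks once when the page first loads, the dashboard starts with one chart (index 0).

Output – the container’s children, which now include the new chart.

State – the container’s current children. They are passed in without triggering the callback, so the charts that already exist are kept.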

@app.callback(
    Output('container', 'children'),
    [Input('add-chart', 'n_clicks')],
    [State('container', 'children')]
)
def display_graphs(n_clicks, div_children):
    new_child = html.Div(
        style={'width': '45%', 'display': 'inline-block', 'outline': 'thin lightgrey solid', 'padding': 10},
        children=[
            dcc.Graph(
                id={
                    'type': 'dynamic-graph',
                    'index': n_clicks
                },
                figure={}
            ),
            dcc.RadioItems(
                id={
                    'type': 'dynamic-choice',
                    'index': n_clicks
                },
                options=[{'label': 'Bar Chart', 'value': 'bar'},
                         {'label': 'Line Chart', 'value': 'line'},
                         {'label': 'Scatter Chart', 'value': 'scatter'},
                         {'label': 'Pie Chart', 'value': 'pie'}],
                value='bar',
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-s',
                    'index': n_clicks
                },
                options=[{'label': s, 'value': s} for s in np.sort(df['country'].unique())],
                multi=True,
                value=["United States", "China"],
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-ctg',
                    'index': n_clicks
                },
                options=[{'label': c, 'value': c} for c in ['country']],
                value='country',
                clearable=False
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-num',
                    'index': n_clicks
                },
                options=[{'label': n, 'value': n} for n in ['population']],
                value='population',
                clearable=False
            )
            
        ]
    )
    div_children.append(new_child)
    return div_children

Second Callback and Creating the Graphs:

  • The display_graphs callback returns several components that share the same index: a graph, a set of radio items, and three dropdowns.
  • The second callback uses the MATCH selector. With this selector, we’re asking Dash to:
    1. Fire the callback whenever the value property of any component with the id type 'dynamic-dpn-s', 'dynamic-dpn-ctg', 'dynamic-dpn-num', or 'dynamic-choice' changes, for example: Input({'type': 'dynamic-dpn-s', 'index': MATCH}, 'value')
    2. Update the graph with the id type 'dynamic-graph' whose index matches the index of the input that changed: Output({'type': 'dynamic-graph', 'index': MATCH}, 'figure')
  • With the MATCH selector, only a single value is passed into the callback for each Input or State.
  • Notice how important it is to design ID dictionaries that “line up” the inputs with the outputs. The MATCH contract is that Dash updates whichever output has the same dynamic ID as the input that changed. In this case, the “dynamic ID” is the value of index, and the first callback gives every component of a chart the same index.
  • In some cases, it may be important to know which dynamic component changed. You can get this by passing the component’s id as a State, for example State({'type': 'dynamic-dpn-s', 'index': MATCH}, 'id').
  • You can also use dash.callback_context to access the inputs and state and to know which input changed. Its outputs_list is particularly useful with MATCH because it tells you which dynamic component this particular invocation of the callback is responsible for updating.

The second callback renders each chart interactively. Its component IDs are dictionaries with 'type' and 'index' keys. The dynamic part of the callback is the Input: a component_id pattern plus the component_property, which is value. The callback fires whenever the value of a matching component changes, for example the dynamic-dpn-s country dropdown, and MATCH restricts it to the components that share that dropdown’s index. Changing the dropdown in the chart with index 1 therefore updates only the graph with index 1.

dff – the callback filters df into a new data frame, dff, and works on that, so the shared df is never modified.

Users often want to see the same data in different chart types. With multiple charts and dropdown options, each user can pick the countries they are interested in for every chart.

@app.callback(
    Output({'type': 'dynamic-graph', 'index': MATCH}, 'figure'),
    [Input(component_id={'type': 'dynamic-dpn-s', 'index': MATCH}, component_property='value'),
     Input(component_id={'type': 'dynamic-dpn-ctg', 'index': MATCH}, component_property='value'),
     Input(component_id={'type': 'dynamic-dpn-num', 'index': MATCH}, component_property='value'),
     Input({'type': 'dynamic-choice', 'index': MATCH}, 'value')]
)
def update_graph(s_value, ctg_value, num_value, chart_choice):
    # Nothing to plot when no country is selected (a cleared dropdown may send [] or None)
    if not s_value:
        return {}

    # dff is a filtered copy, so the shared df is never modified
    dff = df[df['country'].isin(s_value)]

    if chart_choice == 'bar':
        # One bar per country: population summed over all years
        dff = dff.groupby([ctg_value], as_index=False)[[num_value]].sum()
        return px.bar(dff, x=ctg_value, y=num_value)
    elif chart_choice == 'line':
        # One line per country over the years
        dff = dff.groupby([ctg_value, 'year'], as_index=False)[[num_value]].sum()
        return px.line(dff, x='year', y=num_value, color=ctg_value)
    elif chart_choice == 'scatter':
        # Same data as the line chart, drawn as points
        dff = dff.groupby([ctg_value, 'year'], as_index=False)[[num_value]].sum()
        return px.scatter(dff, x='year', y=num_value, color=ctg_value)
    elif chart_choice == 'pie':
        # px.pie sums the values of rows that share the same name
        return px.pie(dff, names=ctg_value, values=num_value)
    return {}
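Because update_graph only reshapes df before handing it to Plotly Express, the data preparation can be checked on its own without starting the server. The sketch below pulls that logic into a hypothetical helper, prepare_chart_data, and runs it on made-up toy rows. Bar and pie share one aggregation here: the pie callback passes unaggregated rows, but px.pie sums the values of rows with the same name, so the per-country totals should come out the same.

```python
import pandas as pd


def prepare_chart_data(df, countries, chart_choice,
                       ctg_value="country", num_value="population"):
    """Return the rows a given chart type would plot (hypothetical helper)."""
    # Treat a cleared dropdown (None or []) as "no countries selected".
    dff = df[df[ctg_value].isin(countries or [])]
    if chart_choice in ("bar", "pie"):
        # One row per country: population summed over all years.
        return dff.groupby(ctg_value, as_index=False)[[num_value]].sum()
    if chart_choice in ("line", "scatter"):
        # One row per country and year.
        return dff.groupby([ctg_value, "year"], as_index=False)[[num_value]].sum()
    raise ValueError(f"unknown chart type: {chart_choice!r}")


# Hypothetical toy rows in the same shape as population.csv
toy = pd.DataFrame({
    "country": ["China", "China", "India", "India", "United States"],
    "year": [2019, 2020, 2019, 2020, 2020],
    "population": [10, 20, 1, 2, 5],
})

bar_rows = prepare_chart_data(toy, ["China", "India"], "bar")
line_rows = prepare_chart_data(toy, ["China"], "line")
empty_rows = prepare_chart_data(toy, None, "bar")
print(bar_rows)
print(line_rows)
```

Keeping the reshaping in a plain function like this would also let the callback shrink to one call plus the chart-specific px function.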

Finally, start the app on Dash’s built-in development server:

if __name__ == '__main__':
    # Older tutorials call app.run_server(); recent Dash versions use app.run()
    app.run(debug=False)
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

Conclusion:


Congratulations! You have just learnt how to develop a web app. Dash Plotly gives data scientists the power to build web apps for interacting with data and with deep learning, artificial intelligence, and machine learning models.

In this introductory article, we’ve explored how to develop dashboard apps using Dash Plotly. Although it’s a simple application, it illustrates the core concepts of this technology: the layout, callbacks, and pattern-matching IDs. We’ve also seen how little code Plotly Express needs to draw each chart.

Dash is the original low-code framework for rapidly building data apps in Python, R, Julia, and F# (experimental).

Written on top of Plotly.js and React.js, Dash is ideal for building and deploying data apps with customized user interfaces. It’s particularly suited for anyone who works with data.

Through a couple of simple patterns, Dash abstracts away all of the technologies and protocols that are required to build a full-stack web app with interactive data visualization.

Dash is simple enough that you can bind a user interface to your code in less than 10 minutes.

Dash apps are rendered in the web browser. You can deploy your apps to VMs or Kubernetes clusters and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready.

There is a lot behind the framework. To learn more about how it is built and what motivated Dash, read their announcement letter or their post Dash is React for Python.

Dash is an open source library released under the permissive MIT license. Plotly develops Dash and also offers a platform for writing and deploying Dash apps in an enterprise environment.

Web apps are great for data visualization and give clients more flexibility to navigate and explore the data. They’re user-friendly and make the data easier to understand.
