Differences between Power BI, Tableau and Python Dash

Data visualization has gained massive popularity in recent years as the demand for data has grown. In a business setting, business intelligence tools help analyze data and monitor performance, which supports the firm’s growth and its employees’ productivity. With the world moving to digital channels over the past year, data is now considered fuel for every small, medium, or large firm.

In such a scenario, what sounds better: a spreadsheet that lists the date, time, sales, and profit, or a colorful, descriptive, interactive bar chart that explains all the details? Our vote goes to the latter.

What is a data visualization tool?

An essential part of any business strategy, data visualization is the process of collecting data and transforming it into a meaningful visualization to support decision-making. These visualizations could be bar charts, maps, or anything else that is visually appealing and interactive. Viewers can take in the information at a glance, whereas otherwise they would need to read spreadsheets or text reports to understand the data.

Among the data visualization tools most widely used by analysts across industries are Power BI, Tableau and Python Dash. All of these tools help businesses make decisions faster.

Power BI is a Data Visualization and Business Intelligence tool provided by Microsoft. It can collect data from different sources, such as Excel spreadsheets, on-premises databases and cloud databases, and convert it into meaningful reports and dashboards. Features such as Quick Insights, Q&A, embedded reports, and self-service BI have made it one of the leading BI tools. It also supports extensive data modeling, real-time analytics, and custom visual development.

Tableau helps business analysts make business decisions by making data visualization available to business users of any background. It can connect to a wide range of data sources (Excel, local/on-premises databases, cloud databases).

Tableau is one of the fastest-growing data visualization tools. Its visualizations are created as worksheets and dashboards. The beauty of Tableau is that it does not require any technical or programming knowledge to create or develop reports and dashboards.

Python Dash

Dash is a Python framework created by Plotly for building interactive web applications. With Dash, you don’t have to learn HTML, CSS and JavaScript to create interactive dashboards; you only need Python. Dash is open source, and applications built with it are viewed in the web browser.

Downloaded 600,000 times per month, Dash is the original low-code framework for rapidly building data apps in Python, R, Julia and F# (experimental).

It’s written on top of Plotly.js and React.js. Dash is ideal for building and deploying data apps with customized user interfaces. It’s particularly suited for anyone who works with data.

Through a couple of simple patterns, Dash abstracts away all of the technologies and protocols that are required to build a full-stack web app with interactive data visualization.

Dash is simple enough that you can bind a user interface to your code in less than 10 minutes.

Dash apps are rendered in the web browser. You can deploy the apps to VMs or Kubernetes clusters and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready.

There is a lot behind the framework. To learn more about how it’s built and what motivated Dash, read the announcement letter or the “Dash is React for Python” post.

Dash is an open source library released under the permissive MIT license. Plotly develops Dash and also offers a platform for writing and deploying Dash apps in an enterprise environment.

Python Dash is well suited to quick, simple representations of large datasets, which helps in analyzing and resolving issues. Power BI, on the other hand, focuses on data ingestion and on building relatively complex data models. Python is a strong choice when it comes to handling streaming data.

Power BI vs. Tableau vs. Python Dash:

Power BI | Tableau | Python Dash
Provided by Microsoft | Provided by Tableau | A Python library provided by Plotly
Available at a moderate price | More expensive than Power BI | An open-source library that is freely available for everyone to use
Needs a business/private email to open an account | Needs a business/private email to open an account | Any email address is acceptable
Uses DAX for measures and calculated columns | Uses MDX for measures and dimensions | Uses Python, a dynamic, interpreted scripting language
Connects to a limited set of data sources, but adds data source connections in monthly updates | Can connect to numerous data sources | Python has an ecosystem of modules and tools to collect data from multiple sources
Can handle large datasets using Premium capacity | Can handle large datasets | Can handle large datasets
Provides account-based subscription | Provides key-based subscription | No subscription necessary
Embedding reports is easy | Embedding reports is a real challenge | Dash is simple enough that you can bind a user interface to your code in less than 10 minutes
Integrated with Microsoft Azure, which helps in analyzing the data and understanding the patterns of the product | Has built-in machine learning capabilities, which make it suitable for ML operations on datasets | Integrated with Python, which offers multiple graphics libraries packed with different features; Python is preferred for advanced data analysis
Supports R and Python language-based visualizations | Provides fully integrated support for R and Python | Ideal for building and deploying data apps with customized user interfaces

Which one to choose: Power BI, Tableau or Python Dash?

The data analytics field has changed over time, from traditional BI practice to embedded BI and collaborative BI. Initially, data analytics was led by companies like IBM, Oracle and SAP, but this is no longer the case. It is now led by tools from Microsoft, Tableau and the Python ecosystem, thanks to features like embedded BI, collaborative BI, data blending, data binding and multi-source data connections.

Power BI, Tableau and Python Dash each have their own pros and cons. The right product can be chosen based on the touchstones below and your priorities.

Touchstones | Power BI | Tableau | Python Dash
Description | A cloud-based business intelligence platform which offers an overview of critical data | A collection of intuitive business intelligence tools used for data discovery | A Python framework created by Plotly for creating interactive web applications; it is best when it comes to handling streaming data
Visualization | Provides various visualizations | Provides a larger set of visualizations than Power BI | Provides numerous visualizations
OS Support | Only Windows | Windows and Macintosh OS | Mac OS, Windows, Linux, AWS, and others
Graphical features | Regular charts, graphs, and maps | Any category of charts, bars, and graphs | Any category of charts, bars and graphs; ideal for building and deploying data apps with customized user interfaces
Cost | Cheaper | Costly | Free
Organization | Suitable for small, medium and large organizations | Suitable for medium and large organizations | Suitable for small, medium and large organizations
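
One way to turn the touchstones and your priorities into a decision is a simple weighted score. The sketch below is only an illustration. The 1-to-3 ranks are a rough reading of the comparison above (for example, for cost: Free = 3, Cheaper = 2, Costly = 1). The touchstone names and the weights are hypothetical and should be replaced with your own priorities.

```python
# Ranks per touchstone (3 = most favourable), loosely derived from the comparison.
# These numbers are illustrative assumptions, not measurements.
RANKS = {
    "cost":              {"Power BI": 2, "Tableau": 1, "Python Dash": 3},
    "os_support":        {"Power BI": 1, "Tableau": 2, "Python Dash": 3},
    "small_org_fit":     {"Power BI": 3, "Tableau": 1, "Python Dash": 3},
    "no_code_authoring": {"Power BI": 3, "Tableau": 3, "Python Dash": 1},
}
TOOLS = ["Power BI", "Tableau", "Python Dash"]


def score(weights):
    """Return (tool, weighted total) pairs, best first."""
    totals = {
        tool: sum(weight * RANKS[name][tool] for name, weight in weights.items())
        for tool in TOOLS
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical priorities for a small team without programmers.
weights = {"cost": 2, "os_support": 1, "small_org_fit": 2, "no_code_authoring": 3}

for tool, total in score(weights):
    print(f"{tool}: {total}")
```

Changing the weights changes the ranking. A team of Python developers would likely weight `no_code_authoring` much lower, which favours Dash.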

Data Analysis Expressions (DAX) is a programming language that is used throughout Microsoft Power BI for creating calculated columns, measures, and custom tables. It is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values.

In a Tabular Model, a measure is a named DAX expression that is resolved by calculating the expression to return a value. This innocuous definition covers an incredible amount of ground. Multidimensional Expressions (MDX), by contrast, is a query language for multidimensional (OLAP) data.
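
The difference between a calculated column and a measure can be sketched in plain Python. This is only an analogy, not DAX: the tiny `sales` table and the names `add_profit_column` and `total_profit` are invented for illustration. A calculated column is evaluated once per row and stored with the table, while a measure is evaluated at query time over whatever rows the current filter leaves visible.

```python
# Invented sample rows, used only to illustrate the idea.
sales = [
    {"region": "East", "revenue": 120.0, "cost": 80.0},
    {"region": "East", "revenue": 200.0, "cost": 150.0},
    {"region": "West", "revenue": 90.0, "cost": 60.0},
]


def add_profit_column(rows):
    """Calculated-column analogy: compute and store one value per row."""
    return [{**row, "profit": row["revenue"] - row["cost"]} for row in rows]


def total_profit(rows, region=None):
    """Measure analogy: aggregate over the rows that pass the current filter."""
    visible = [row for row in rows if region is None or row["region"] == region]
    return sum(row["profit"] for row in visible)


table = add_profit_column(sales)
print(total_profit(table))          # evaluated over all regions
print(total_profit(table, "East"))  # same measure, filtered to East
```

The same `total_profit` definition yields different values depending on the filter, which is the behaviour that makes measures useful in interactive reports.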

 


Differences between Power BI and Tableau

Power BI vs. Tableau:

Power BI | Tableau
Provided by Microsoft | Provided by Tableau
Available at a moderate price | More expensive than Power BI
Needs a business/private email to open an account | Needs a business/private email to open an account
Uses DAX for measures and calculated columns | Uses MDX for measures and dimensions
Connects to a limited set of data sources, but adds data source connections in monthly updates | Can connect to numerous data sources
Can handle large datasets using Premium capacity | Can handle large datasets
Provides account-based subscription | Provides key-based subscription
Embedding reports is easy | Embedding reports is a real challenge
Integrated with Microsoft Azure, which helps in analyzing the data and understanding the patterns of the product | Has built-in machine learning capabilities, which make it suitable for ML operations on datasets
Supports R and Python language-based visualizations | Provides fully integrated support for R and Python

Which one to choose, Power BI or Tableau?

The data analytics field has changed over time, from traditional BI practice to embedded BI and collaborative BI. Initially, data analytics was led by companies like IBM, Oracle and SAP, but this is no longer the case. It is now led by companies like Microsoft and Tableau, thanks to features like embedded BI, collaborative BI, data blending and multi-source data connections.

Both Power BI and Tableau have their own pros and cons. The right product can be chosen based on the touchstones below and your priorities.

Touchstones | Power BI | Tableau
Description | A cloud-based business intelligence platform which offers an overview of critical data | A collection of intuitive business intelligence tools used for data discovery
Visualization | Provides various visualizations | Provides a larger set of visualizations than Power BI
OS Support | Only Windows | Windows and Macintosh OS
Graphical features | Regular charts, graphs, and maps | Any category of charts, bars, and graphs
Cost | Cheaper | Costly
Organization | Suitable for small, medium and large organizations | Suitable for medium and large organizations

What is Power BI? 

Power BI is a data visualization and business intelligence tool that lets us connect to one or more data sources, turn the connected raw data into impressive visuals, and share insights across an organization. It also lets us embed reports in our own application or website.

Product Suite of Power BI:

  • Power BI Desktop
    • Free to download and install.
    • Connects to a variety of on-premises and cloud sources such as Excel, CSV/text files, Azure, SharePoint, Dynamics CRM, etc.
    • Prepares data by mashing it up and building a data model with Power Query, which uses the M query language.
    • After loading data into Power BI Desktop, you can establish relationships between tables.
    • Create calculated measures, columns, and tables using Data Analysis Expressions (DAX).
    • Drag and drop interactive visuals onto pages using calculated measures and columns.
    • Publish to the Power BI web service.
  • Power BI Service
    • This is one of the ways to embed reports in a website within an organization.
    • The Power BI service is organized into sections such as Workspaces, Dashboards, Reports, and Datasets.
    • You can use your own workspace, My Workspace, to maintain personal work in the Power BI service.
    • You can pin a number of reports to a dashboard, bringing together several meaningful datasets for clear insight.
    • You can interact with your data through Q&A (natural-language queries).
  • Power BI Report Server
    • This product lets businesses host Power BI reports on an on-premises report server.
    • The server can host paginated reports, KPIs, mobile reports and Excel workbooks.
    • Shared datasets and shared data sources live in their own folders, to use as building blocks for reports.
  • Power BI Mobile
    • Power BI provides mobile apps for iOS, Android and Windows 10 mobile devices.
    • In the mobile app, you can connect to and interact with your cloud and on-premises data.
    • Managing dashboards and reports on the go with the mobile app is a convenient way to stay connected and on the same page as the organization.
  • On-Premises Gateway
    • This is a bridge that connects your on-premises data to online services like Power BI, Microsoft Flow, Logic Apps and Power Apps. A single gateway can be used with different services at the same time. For example, if you use Power BI as well as Power Apps, a single gateway can serve both, depending on the account you signed in with.
    • The on-premises data gateway implements data compression and transport encryption in all modes.
    • The on-premises data gateway is supported only on 64-bit Windows operating systems.
    • Multiple users can share and reuse a gateway in this mode.
    • For Power BI, this includes support for scheduled refresh and DirectQuery.

What is Tableau?

Tableau is a business intelligence and data visualization tool that is used to analyze data visually. Users can create and share interactive reports and dashboards with it. It offers data blending, which lets users combine multiple data sources.

Product Suite of Tableau:

  • Tableau Server
    • Tableau Server is an enterprise-wide visual analytics platform for creating interactive dashboards.
    • It is essentially an online hosting platform that holds all your Tableau workbooks, data sources and more.
    • Because it is a Tableau product, you can use Tableau’s functionality without always having to download and open workbooks in Tableau Desktop.
    • You can assign permissions to different content in an organization to determine who can access and interact with what.
    • As a Tableau Server user, you can access up-to-date content and gain quick insight without relying on static distributed content.
  • Tableau Desktop
    • This is a downloadable desktop application used for developing visualizations in the form of sheets, dashboards, and stories.
    • Useful functions of Tableau Desktop include data transformation, creating data sources, creating extracts, and publishing visualizations to Tableau Server.
    • Tableau Desktop produces files with the extensions twb and twbx.
    • It is a licensed product but comes with a two-week trial period.
    • All the work, from creating reports and charts to combining them into a dashboard, is done in Tableau Desktop.
  • Tableau Prep
    • Tableau Prep is a personal data preparation tool that lets the user cleanse, aggregate, merge or otherwise prepare data for analysis in Tableau.
    • Tableau Prep has a simple and clean user interface that looks and feels like a final form of the Tableau Desktop data source screen.
    • In Tableau Prep, data is organized in a flow pane and carries a universally unique identifier (UUID), which lets it store big datasets in a secure way.
  • Tableau Reader
    • Tableau Reader is a free desktop application that you can use to open data visualizations built in Tableau Desktop.
    • It is required for reading and interacting with Tableau packaged workbooks.
    • Tableau Reader retains the interactivity of visualizations created in Tableau Desktop but does not allow connections to data that can be refreshed.
    • It can only read Tableau data files; without Reader, you may need to share the workbook publicly or convert it to PDF format.
  • Tableau Online
    • Tableau Online is an analytics platform that is fully hosted in the cloud.
    • It lets you publish dashboards and share your discoveries with anyone.
    • It empowers your organization to ask questions of any published data source using natural language.
    • It can connect to cloud databases at any time from anywhere, and it can automatically refresh data from web apps like Google Analytics and Salesforce.
    • It empowers site admins to easily manage authentication and permissions for users, content, and data.
  • Tableau Public
    • This is a free service that lets anyone publish interactive data visualizations to the web.
    • Visualizations are created in the accompanying app, Tableau Desktop Public edition, which requires no programming skills.
    • It is for anyone who’s interested in understanding data and sharing those discoveries as a data visualization with the world.
    • Feature highlights include heat maps, transparent sheets, automatic mobile layouts, and Google Sheets.
    • Because visualizations are public, anyone can access the data and make changes by downloading the workbook, so it is not secure.
    • It has a limit of 15,000,000 rows of data per workbook.
    • It offers 10GB of storage space for your workbooks, which is a limitation on storage.
    • It supports Python through ‘TabPy’, an API that enables evaluation of Python code within a Tableau workbook.

Strengths & Weaknesses of Power BI:

Strengths:

  • Free Power BI Desktop application for authors to develop reports
  • Uses DAX expressions for data calculations
  • Free training modules available for users
  • Composite models (DirectQuery, Dual, and Import) to connect multiple dispersed data sources and create a model
  • Multiple visuals on a single page
  • Drill down and drill up in visuals, drill through pages, and toggle pages or visuals using bookmarks, the selection pane and buttons
  • Ability to connect to multiple data sources
  • Affordable: Desktop is free, and Pro (the Power BI service, for sharing and collaborating with other users in the organization) costs $9.99
  • Can integrate with Cortana, the Windows personal voice assistant
  • Integrates with Microsoft products (Azure, SharePoint, Office 365, Microsoft Dynamics, Power Apps, Microsoft Flow)
  • Dataflows in the Power BI service connect to Azure Data Lake Storage Gen2 and other online services

Weaknesses:

  • It is difficult for users who do not have knowledge of Excel
  • Clients with large datasets must opt for Premium capacity to avoid performance and timeout issues for the datasets and their users
  • The Power BI service is compatible with only a few database drivers
  • Power BI has a large set of product options, which makes it hard to tell which option is best suited for a business


Strengths & Weaknesses of Tableau:

Strengths:

  • Tableau provides many beautiful visualizations, which has put it at the top of the market among BI tools.
  • Quickly combine, shape, and clean data for analysis.
  • It provides data blending.
  • Capable of drill down and drill up in visuals, drill through pages, and filters.
  • It can handle a large amount of data.
  • Uses scripting languages such as R and Python to avoid performance issues and to handle complex table calculations.
  • Can build reports, dashboards, and stories using Tableau Desktop.

Weaknesses:

  • Tableau is expensive when compared to other tools.
  • Scheduling and notification of reports and dashboards are limited.
  • Importing custom visualizations is a bit difficult.
  • Embedding reports in other applications is complex.
  • Tableau is best suited to large organizations that can pay the licensing cost.


Benefits of Power BI

  • Microsoft is a brand. Most of us remember our school or college days, when we started learning and using Microsoft products because they are simple to understand and user-friendly. So it is no surprise that our eyes and brains are trained on Microsoft products.
  • Anyone with working experience in Excel can get up to speed with Power BI Desktop and Mobile in no time.
  • Pin visuals from Excel to the Power BI service using the Excel add-in.
  • One can build swift and reliable reports by simply dragging and dropping built-in or custom visuals, and Microsoft publishes best practices for getting optimum report performance.
  • Extensive learning resources are available, including Guided Learning.
  • Because Power BI belongs to the Microsoft family, it benefits from Single Sign-On (SSO) and tight integration with Microsoft products like Dynamics 365, Office 365, SharePoint Online, Power Apps, MS Flow, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services databases, etc.
  • Power Query offers many options for wrangling and cleaning data to shape it into a sound data model.
  • After publishing data to the Power BI web service, you can schedule refreshes without manual intervention.
  • Power BI is backed by the power of artificial intelligence and machine learning.
  • Microsoft introduced the Power Platform (Power BI to measure, Power Apps to act, and Microsoft Flow to automate).
  • Microsoft publishes a forthcoming road map for Power BI.
  • Power BI integrates with both Python and R code for building visualizations.
  • Power BI Desktop is free ($0.00), and Power BI web service (Azure) Pro costs $9.99 monthly.

Disadvantages of Power BI

Power BI Desktop is the best tool for analyzing your data when you connect using DirectQuery or live connections, but it may struggle with huge datasets if you import the data into the application, and at times it may hang or simply crash. However, the Microsoft product team is expected to address this in future monthly updates.

Benefits of Tableau

  • Tableau can connect to various sources, can effortlessly handle huge datasets, and is a very good tool for data visualization and for creating dashboards by simple drag and drop.
  • Tableau supports the Python and R languages for creating visuals.
  • Tableau was a Leader in Gartner’s report from 2012 to 2018 and has now moved to second place.

Disadvantages of Tableau

  • Tableau Creator costs $70.00 and Tableau Online $35 monthly.
  • The Tableau product team has not focused on advanced technologies and has missed integration with artificial intelligence and machine learning.
  • Once reports are pushed to Tableau Online, it does not support scheduled refresh, and one must refresh the data manually.
  • Analysts must use only the built-in visuals available in Tableau, with no option to import custom visuals from a portal. Instead, developers need to create custom visuals themselves according to the requirement.
  • Data preparation options for creating a data model in Tableau are limited. For advanced data wrangling and cleaning, one must take the help of other tools like Excel, Python, R, or Tableau Prep.
  • There is no integration with other Microsoft products like Dynamics 365, Office 365, Power Apps and Microsoft Flow, which use Single Sign-On (SSO).
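
Using the monthly prices quoted in this article (Power BI Pro at $9.99 and Tableau Creator at $70.00), a back-of-the-envelope comparison of yearly licence cost can be sketched as follows. It assumes the prices are per user per month and ignores discounts, taxes, and other licence tiers. The team size of 10 is an arbitrary example.

```python
from decimal import Decimal

# Monthly prices as quoted in the article (assumed to be per user).
MONTHLY_PRICE = {
    "Power BI Pro": Decimal("9.99"),
    "Tableau Creator": Decimal("70.00"),
}


def yearly_cost(product, users):
    """Yearly licence cost for a number of users at the quoted monthly price."""
    return MONTHLY_PRICE[product] * users * 12


users = 10  # arbitrary example team size
for product in MONTHLY_PRICE:
    print(f"{product}: ${yearly_cost(product, users):,.2f} per year for {users} users")
```

`Decimal` is used instead of floats so that currency amounts stay exact.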

Power BI and Tableau are the most popular tools in business intelligence because of features and capabilities like embedded BI, data blending, and connections to multiple data sources such as cloud and on-premises databases. They make it easy to share reports and dashboards with users. Business analysts can view reports and dashboards and take critical business decisions without even needing access to the tools themselves.

These two tools have risen to the top of the BI market because of the attractive visualizations they offer. Power BI lets users both import and create custom visuals, which is its beauty. These facts have made them the most popular BI tools in the market to date.

According to the Gartner Magic Quadrant for Analytics and Business Intelligence Platforms report, Power BI is the first choice and Tableau the second among BI tools in the present market.

The World Population Dashboard App

Pattern Matching Callbacks – Creating different charts for Data Visualization with callbacks

import dash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output, State, MATCH
import plotly.express as px
import pandas as pd
import numpy as np

# Load the population data (columns: country, year, population)
df = pd.read_csv("Documents/Data Science/population.csv")
print(df)

           country    year    population
0             China  2020.0  1.439324e+09
1             China  2019.0  1.433784e+09
2             China  2018.0  1.427648e+09
3             China  2017.0  1.421022e+09
4             China  2016.0  1.414049e+09
...             ...     ...           ...
4180  United States  1965.0  1.997337e+08
4181  United States  1960.0  1.867206e+08
4182  United States  1955.0  1.716853e+08
4183          India  1960.0  4.505477e+08
4184          India  1955.0  4.098806e+08

[4185 rows x 3 columns]

# Drop rows with null values
df = df.dropna()
print(df.head(10))

  country    year    population
0   China  2020.0  1.439324e+09
1   China  2019.0  1.433784e+09
2   China  2018.0  1.427648e+09
3   China  2017.0  1.421022e+09
4   China  2016.0  1.414049e+09
5   China  2015.0  1.406848e+09
6   China  2010.0  1.368811e+09
7   China  2005.0  1.330776e+09
8   China  2000.0  1.290551e+09
9   China  1995.0  1.240921e+09

# Create the Dash app and define its layout
app = dash.Dash(__name__)
app.layout = html.Div([
    html.H1("The World Population Dashboard with Dynamic Callbacks", style={"textAlign":"center"}),
    html.Hr(),
    html.P("Add as many charts for Data Visualization:"),
    html.Div(children=[
        html.Button('Add Chart', id='add-chart', n_clicks=0),
    ]),
    html.Div(id='container', children=[])
])
@app.callback(
    Output('container', 'children'),
    [Input('add-chart', 'n_clicks')],
    [State('container', 'children')]
)
def display_graphs(n_clicks, div_children):
    new_child = html.Div(
        style={'width': '45%', 'display': 'inline-block', 'outline': 'thin lightgrey solid', 'padding': 10},
        children=[
            dcc.Graph(
                id={
                    'type': 'dynamic-graph',
                    'index': n_clicks
                },
                figure={}
            ),
            dcc.RadioItems(
                id={
                    'type': 'dynamic-choice',
                    'index': n_clicks
                },
                options=[{'label': 'Bar Chart', 'value': 'bar'},
                         {'label': 'Line Chart', 'value': 'line'},
                         {'label': 'Scatter Chart', 'value': 'scatter'},
                         {'label': 'Pie Chart', 'value': 'pie'}],
                value='bar',
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-s',
                    'index': n_clicks
                },
                options=[{'label': s, 'value': s} for s in np.sort(df['country'].unique())],
                multi=True,
                value=["United States", "China"],
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-ctg',
                    'index': n_clicks
                },
                options=[{'label': c, 'value': c} for c in ['country']],
                value='country',
                clearable=False
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-num',
                    'index': n_clicks
                },
                options=[{'label': n, 'value': n} for n in ['population']],
                value='population',
                clearable=False
            )
            
        ]
    )
    # Append the new chart block to the charts already in the container
    div_children.append(new_child)
    return div_children

@app.callback(
    Output({'type': 'dynamic-graph', 'index': MATCH}, 'figure'),
    [Input(component_id={'type': 'dynamic-dpn-s', 'index': MATCH}, component_property='value'),
     Input(component_id={'type': 'dynamic-dpn-ctg', 'index': MATCH}, component_property='value'),
     Input(component_id={'type': 'dynamic-dpn-num', 'index': MATCH}, component_property='value'),
     Input({'type': 'dynamic-choice', 'index': MATCH}, 'value')]
)
def update_graph(s_value, ctg_value, num_value, chart_choice):
    # Log the selected countries to the server console
    print(s_value)

    # Nothing selected: return an empty figure for every chart type
    if not s_value:
        return {}

    # Keep only the rows for the selected countries
    dff = df[df['country'].isin(s_value)]

    if chart_choice == 'bar':
        # One total per category (summed over all years)
        dff = dff.groupby([ctg_value], as_index=False)[[num_value]].sum()
        fig = px.bar(dff, x=ctg_value, y=num_value)
        return fig
    elif chart_choice == 'line':
        # One value per category and year
        dff = dff.groupby([ctg_value, 'year'], as_index=False)[[num_value]].sum()
        fig = px.line(dff, x='year', y=num_value, color=ctg_value)
        return fig
    elif chart_choice == 'scatter':
        dff = dff.groupby([ctg_value, 'year'], as_index=False)[[num_value]].sum()
        fig = px.scatter(dff, x='year', y=num_value, color=ctg_value)
        return fig
    elif chart_choice == 'pie':
        fig = px.pie(dff, names=ctg_value, values=num_value)
        return fig

if __name__ == '__main__':
    app.run_server(debug=False)
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [21/Nov/2021 04:37:47] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [21/Nov/2021 04:37:47] "GET /_favicon.ico?v=2.0.0 HTTP/1.1" 200 -
127.0.0.1 - - [21/Nov/2021 04:37:47] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [21/Nov/2021 04:37:47] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [21/Nov/2021 04:37:47] "POST /_dash-update-component HTTP/1.1" 200 -
127.0.0.1 - - [21/Nov/2021 04:37:47] "GET /_dash-component-suites/dash/dcc/async-graph.js HTTP/1.1" 200 -
127.0.0.1 - - [21/Nov/2021 04:37:47] "GET /_dash-component-suites/dash/dcc/async-plotlyjs.js HTTP/1.1" 200 -
127.0.0.1 - - [21/Nov/2021 04:37:47] "GET /_dash-component-suites/dash/dcc/async-dropdown.js HTTP/1.1" 200 -
['United States', 'China']
127.0.0.1 - - [21/Nov/2021 04:37:48] "POST /_dash-update-component HTTP/1.1" 200 -
...
127.0.0.1 - - [21/Nov/2021 04:38:39] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India']
127.0.0.1 - - [21/Nov/2021 04:38:53] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria']
127.0.0.1 - - [21/Nov/2021 04:39:14] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria', 'Russia']
127.0.0.1 - - [21/Nov/2021 04:39:26] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India']
127.0.0.1 - - [21/Nov/2021 04:39:30] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria']
127.0.0.1 - - [21/Nov/2021 04:39:37] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria', 'Russia']
127.0.0.1 - - [21/Nov/2021 04:39:51] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India']
127.0.0.1 - - [21/Nov/2021 04:40:06] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria']
127.0.0.1 - - [21/Nov/2021 04:40:12] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria', 'Russia']
127.0.0.1 - - [21/Nov/2021 04:40:20] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India']
127.0.0.1 - - [21/Nov/2021 04:40:26] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria']
127.0.0.1 - - [21/Nov/2021 04:40:33] "POST /_dash-update-component HTTP/1.1" 200 -
['United States', 'China', 'India', 'Nigeria', 'Russia']

An approach to Web Scraping in Python with BeautifulSoup

There are mainly two ways to extract data from a website:

1. Use the API of the website (if it exists). For example, Facebook has the Facebook Graph API, which allows retrieval of data posted on Facebook.
2. Access the HTML of the webpage and extract useful information/data from it. This technique is called web scraping, web harvesting, or web data extraction.

This article discusses the steps involved in web scraping using Beautiful Soup, a Python library built for this purpose.

Steps involved in web scraping:

1. Send an HTTP request to the URL of the webpage you want to access. The server responds to the request by returning the HTML content of the webpage. For this task, we will use a third-party HTTP library for Python: requests.
2. Once we have accessed the HTML content, we are left with the task of parsing the data. Since most HTML data is nested, we cannot extract data simply through string processing. We need a parser that can create a nested/tree structure of the HTML data. There are many HTML parser libraries available; one of the most advanced is html5lib.
3. Now, all we need to do is navigate and search the parse tree that we created, i.e. tree traversal. For this task, we will use another third-party Python library, Beautiful Soup. It is a Python library for pulling data out of HTML and XML files.

Step 1: Installing the required third-party libraries

The easiest way to install external libraries in Python is to use pip, a package management system used to install and manage software packages written in Python. All you need to do is:

# In your CMD.exe Prompt on Anaconda
pip install requests
pip install html5lib
pip install bs4

Step 2: Accessing the HTML content from webpage

import requests

URL = "https://www.geeksforgeeks.org/data-structures/"
r = requests.get(URL)
print(r.content)  # the output is large, so you don't need to run it

Let us try to understand this piece of code.

First of all, import the requests library. Then, specify the URL of the webpage you want to scrape. Send an HTTP request to the specified URL and save the response from the server in a response object called r. Now, print r.content to get the raw HTML content of the webpage. It is of bytes type; r.text gives the same content decoded as a string.
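To make the difference between raw bytes and decoded text concrete, here is a minimal sketch that needs no network access. The byte string below is a made-up stand-in for r.content, and the variable names are illustrative rather than part of the requests API.

```python
# Hypothetical stand-in for r.content: raw bytes as a server would send them.
raw_content = "<html><head><title>Café quotes</title></head></html>".encode("utf-8")

# Stand-in for r.text: the same payload decoded into a Python string.
decoded_text = raw_content.decode("utf-8")

print(type(raw_content).__name__)   # bytes
print(type(decoded_text).__name__)  # str

# The accented character takes two bytes in UTF-8 but is one character in the string.
print(len(raw_content), len(decoded_text))
```

Beautiful Soup accepts either form, so passing r.content straight to the parser, as done in the next step, works fine.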

Step 3: Parsing the HTML content

#This will not run on online IDE
import requests
from bs4 import BeautifulSoup

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib') # If this line causes an error, run 'pip install html5lib' or install html5lib
print(soup.prettify())
<!DOCTYPE html>
<html class="no-js" dir="ltr" lang="en-US">
 <head>
  <title>
   Inspirational Quotes - Motivational Quotes - Leadership Quotes | PassItOn.com
  </title>
  <meta charset="utf-8"/>
  <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width,initial-scale=1.0" name="viewport"/>
  <meta content="The Foundation for a Better Life | Pass It On.com" name="description"/>
  <link href="/apple-touch-icon.png" rel="apple-touch-icon" sizes="180x180"/>
  <link href="/favicon-32x32.png" rel="icon" sizes="32x32" type="image/png"/>
  <link href="/favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/>
  <link href="/site.webmanifest" rel="manifest"/>
  <link color="#c8102e" href="/safari-pinned-tab.svg" rel="mask-icon"/>
  <meta content="#c8102e" name="msapplication-TileColor"/>
  <meta content="#ffffff" name="theme-color"/>
  
--------------
--------------
     
  https://cdnjs.cloudflare.com/ajax/libs/jquery/1.12.4/jquery.js
  https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.7/umd/popper.min.js
  https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js
  /assets/pofo-5ce91fa595b7ba2b9d68d1e2ef84e002.js
 </body>
</html>

A really nice thing about the BeautifulSoup library is that it is built on top of HTML parsing libraries like html5lib, lxml, html.parser, etc. So we can create a BeautifulSoup object and specify the parser library at the same time.

In the example above,

soup = BeautifulSoup(r.content, 'html5lib')

We create a BeautifulSoup object by passing two arguments:

r.content: the raw HTML content.
'html5lib': the HTML parser we want to use.

When soup.prettify() is printed, it gives a visual representation of the parse tree created from the raw HTML content.
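The same two-argument call can be tried offline on a small hand-written page. This is only a sketch: the HTML string is a made-up sample standing in for r.content, and it uses Python's built-in 'html.parser' instead of html5lib so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; stands in for r.content so no network is needed.
html = b"<html><head><title>Sample Quotes</title></head><body><p>Hello</p></body></html>"

# 'html.parser' ships with Python; swap in 'html5lib' or 'lxml' if they are installed.
soup = BeautifulSoup(html, "html.parser")

# Indented, human-readable view of the parse tree.
print(soup.prettify())

# Elements of the tree can already be reached by tag name.
title_text = soup.title.string
print(title_text)
```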

Step 4: Searching and navigating through the parse tree

Now, we would like to extract some useful data from the HTML content. The soup object contains all the data in the nested structure which could be programmatically extracted. In our example, we are scraping a webpage consisting of some quotes. So, we would like to create a program to save those quotes (and all relevant information about them).

#Python program to scrape website
#and save quotes from website
import requests
from bs4 import BeautifulSoup
import csv

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')

quotes=[] # a list to store quotes

table = soup.find('div', attrs = {'id':'all_quotes'})

for row in table.findAll('div',
						attrs = {'class':'col-6 col-lg-3 text-center margin-30px-bottom sm-margin-30px-top'}):
	quote = {}
	quote['theme'] = row.h5.text
	quote['url'] = row.a['href']
	quote['img'] = row.img['src']
	quote['lines'] = row.img['alt'].split(" #")[0]
	quote['author'] = row.img['alt'].split(" #")[1]
	quotes.append(quote)

filename = 'inspirational_quotes.csv'
with open(filename, 'w', newline='') as f:
	w = csv.DictWriter(f,['theme','url','img','lines','author'])
	w.writeheader()
	for quote in quotes:
		w.writerow(quote)

Before moving on, we recommend going through the HTML content of the webpage, which we printed using the soup.prettify() method, and trying to find a pattern or a way to navigate to the quotes.

It is noticed that all the quotes are inside a div container whose id is ‘all_quotes’. So, we find that div element (termed as table in above code) using find() method :

table = soup.find('div', attrs = {'id':'all_quotes'}) 

The first argument is the HTML tag you want to search and second argument is a dictionary type element to specify the additional attributes associated with that tag. find() method returns the first matching element. You can try to print table.prettify() to get a sense of what this piece of code does.
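As a small offline illustration of find(), the sketch below runs it against a made-up HTML snippet. The snippet, its class name "quote", and the id "no_such_id" are assumptions for the example and do not reflect the real page's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the general shape of the quotes page.
html = """
<div id="all_quotes">
  <div class="quote"><h5>Courage</h5></div>
  <div class="quote"><h5>Hope</h5></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# First argument: tag name; second: dictionary of attributes to match.
table = soup.find("div", attrs={"id": "all_quotes"})

# find() stops at the first match, even when several elements qualify.
first_quote = table.find("div", attrs={"class": "quote"})
print(first_quote.h5.text)

# When nothing matches, find() returns None rather than raising an error.
missing = soup.find("div", attrs={"id": "no_such_id"})
print(missing)
```

Because find() returns None on a miss, it is worth checking the result before calling further methods on it when scraping a page whose layout may change.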

Now, in the table element, one can notice that each quote is inside a div container identified by its class attribute (the long class string shown in the code). So, we iterate through each div container with that class. Here, we use the findAll() method, which is similar to the find() method in terms of arguments, but it returns a list of all matching elements. Each quote is now iterated using a variable called row. Now consider this piece of code:

for row in table.findAll('div', attrs = {'class': 'col-6 col-lg-3 text-center margin-30px-bottom sm-margin-30px-top'}):
    quote = {}
    quote['theme'] = row.h5.text
    quote['url'] = row.a['href']
    quote['img'] = row.img['src']
    quote['lines'] = row.img['alt'].split(" #")[0]
    quote['author'] = row.img['alt'].split(" #")[1]
    quotes.append(quote)

We create a dictionary to save all information about a quote. The nested structure can be accessed using dot notation. To access the text inside an HTML element, we use .text :

quote['theme'] = row.h5.text

We can add, remove, modify and access a tag’s attributes. This is done by treating the tag as a dictionary:

quote['url'] = row.a['href']

Lastly, all the quotes are appended to the list called quotes.
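The loop above indexes the result of split(" #") directly, which raises an IndexError if an image's alt text has no " #" separator. A more defensive variant could look like the sketch below; split_quote_alt is a hypothetical helper name and the sample strings are made up.

```python
def split_quote_alt(alt_text):
    """Split 'quote text #author' into (lines, author).

    The author is returned as an empty string when no ' #' separator is present.
    """
    # partition() always returns three parts, so a missing separator is safe.
    lines, _separator, author = alt_text.partition(" #")
    return lines.strip(), author.strip()


# Hypothetical alt texts, not taken from the real page.
print(split_quote_alt("Be kind whenever possible. #Anonymous"))
print(split_quote_alt("An alt text without an author"))
```

Note that partition() splits on the first " #" only, so any later " #" stays inside the author part, which differs slightly from taking element [1] of split(" #").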

Finally, we would like to save all our data in some CSV file.

filename = 'inspirational_quotes.csv'
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f,['theme','url','img','lines','author'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)

Here we create a CSV file called inspirational_quotes.csv and save all the quotes in it for any further use.
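To see what csv.DictWriter produces without touching the disk, the following sketch writes to an in-memory buffer and reads the result back. The two quote dictionaries are made-up sample data with the same keys as in the scraper.

```python
import csv
import io

# Hypothetical sample rows with the same keys the scraper collects.
quotes = [
    {"theme": "Courage", "url": "/q/1", "img": "/img/1.jpg", "lines": "Be brave.", "author": "Anon"},
    {"theme": "Hope", "url": "/q/2", "img": "/img/2.jpg", "lines": "Keep going, always.", "author": "Anon"},
]
fieldnames = ["theme", "url", "img", "lines", "author"]

# An in-memory text buffer stands in for the file opened with newline=''.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()       # first line: the column names
writer.writerows(quotes)   # one line per dictionary

print(buffer.getvalue())

# Reading it back shows that values containing commas survive the round trip.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue(), newline="")))
print(rows[1]["lines"])
```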

Data Visualization Using Python

In this example we’ll create several kinds of charts from population data. There’s an easy way to create visuals directly from Pandas, and we’ll see how it works in detail in this tutorial.

Install necessary Libraries

To easily create interactive visualizations, we need to install Cufflinks. This is a library that connects Pandas with Plotly, so we can create visualizations directly from Pandas (in the past you had to learn workarounds to make them work together, but now it’s simpler). First, make sure Pandas and Plotly are installed.

Install the following libraries in this order from the Conda CMD prompt:

pip install pandas
pip install plotly
pip install cufflinks

Import the following Libraries

import pandas as pd
import cufflinks as cf
from IPython.display import display,HTML
cf.set_config_file(sharing='public',theme='ggplot',offline=True)

In this case, I’m using the ‘ggplot’ theme, but feel free to choose any theme you want. Run the command cf.getThemes() to get all the themes available. To make interactive visualizations with Pandas in the following sections, we only need to use the syntax dataframe.iplot().

The data

We’ll use a population dataframe. First, download the CSV file from Kaggle.com, move the file to where your Python script is located, and then read it into a Pandas dataframe as shown below.

# read the population CSV file into a dataframe
df_population = pd.read_csv('documents/population/population.csv')
# select rows by a list of index labels
print(df_population.loc[[0,10]])
   country    year    population
0    China  2020.0  1.439324e+09
10   China  1990.0  1.176884e+09
print(df_population.head(10))
  country    year    population
0   China  2020.0  1.439324e+09
1   China  2019.0  1.433784e+09
2   China  2018.0  1.427648e+09
3   China  2017.0  1.421022e+09
4   China  2016.0  1.414049e+09
5   China  2015.0  1.406848e+09
6   China  2010.0  1.368811e+09
7   China  2005.0  1.330776e+09
8   China  2000.0  1.290551e+09
9   China  1995.0  1.240921e+09

This dataframe is almost ready for plotting. We just have to drop null values, reshape it so that each country becomes a column, and then select a few countries to test our interactive plots. The code shown below does all of this.

# dropping null values
df_population = df_population.dropna()
# reshaping the dataframe
df_population = df_population.pivot(index="year", columns="country", values="population")
# selecting 5 countries
df_population = df_population[['United States', 'India', 'China', 'Nigeria', 'Spain']]
print(df_population.head(10))
country  United States         India         China      Nigeria       Spain
year                                                                       
1955.0     171685336.0  4.098806e+08  6.122416e+08   41086100.0  29048395.0
1960.0     186720571.0  4.505477e+08  6.604081e+08   45138458.0  30402411.0
1965.0     199733676.0  4.991233e+08  7.242190e+08   50127921.0  32146263.0
1970.0     209513341.0  5.551898e+08  8.276014e+08   55982144.0  33883749.0
1975.0     219081251.0  6.231029e+08  9.262409e+08   63374298.0  35879209.0
1980.0     229476354.0  6.989528e+08  1.000089e+09   73423633.0  37698196.0
1985.0     240499825.0  7.843600e+08  1.075589e+09   83562785.0  38733876.0
1990.0     252120309.0  8.732778e+08  1.176884e+09   95212450.0  39202525.0
1995.0     265163745.0  9.639226e+08  1.240921e+09  107948335.0  39787419.0
2000.0     281710909.0  1.056576e+09  1.290551e+09  122283850.0  40824754.0
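The reshaping step is easier to follow on a tiny table. The sketch below applies the same dropna, pivot and column selection to a few made-up rows; the numbers are illustrative and not real population figures.

```python
import pandas as pd

# Hypothetical long-format data: one row per (country, year) pair.
long_df = pd.DataFrame({
    "country": ["China", "China", "India", "India", "Spain"],
    "year": [2019.0, 2020.0, 2019.0, 2020.0, None],
    "population": [100.0, 110.0, 90.0, 95.0, 40.0],
})

# Drop incomplete rows, then turn each country into its own column.
wide_df = long_df.dropna().pivot(index="year", columns="country", values="population")

# Selecting with a list of names also fixes the column order.
wide_df = wide_df[["India", "China"]]
print(wide_df)
```

The Spain row disappears because its year is missing, which is why null values are dropped before pivoting.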

Lineplot

Let’s make a lineplot to compare how much the population has grown from 1955 to 2020 for the 5 countries selected. As mentioned before, we will use the syntax df_population.iplot(kind='name_of_plot') to make plots as shown below.

df_population.iplot(kind='line',xTitle='Years', yTitle='Population',
                    title='Population (1955-2020)')

Barplot

We can make a single barplot or barplots grouped by categories. Let’s have a look.

Single Barplot

Let’s create a barplot that shows the population of each country in the year 2020. To do so, we first select the year 2020 from the index and then transpose rows and columns so that the year becomes the column and the countries become the index. We’ll name this new dataframe df_population_2020 (we’ll use this dataframe again when plotting piecharts).

df_population_2020 = df_population[df_population.index.isin([2020])]
df_population_2020 = df_population_2020.T
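Here is a small sketch of what the filter and transpose do, using a made-up two-country table in the same wide layout. The values are illustrative only.

```python
import pandas as pd

# Hypothetical wide-format data: years in the index, countries as columns.
wide_df = pd.DataFrame(
    {"China": [100.0, 110.0], "India": [90.0, 95.0]},
    index=pd.Index([2019.0, 2020.0], name="year"),
)
wide_df.columns.name = "country"

# Keep only the 2020 row, then swap rows and columns.
df_2020 = wide_df[wide_df.index.isin([2020])].T
print(df_2020)
```

After the transpose, each country is a row and the single remaining column is labelled with the year, which is the shape a one-series barplot needs.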

Now we can plot this new dataframe with .iplot(). In this case, I’m going to set the bar color to blue using the color argument.

df_population_2020.iplot(kind='bar', color='blue',
                         xTitle='Countries', yTitle='Population',
                         title='Population in 2020')

Barplot grouped by “n” variables

Now let’s see the evolution of the population at the beginning of each decade.

# filter years out
df_population_sample = df_population[df_population.index.isin([1980, 1990, 2000, 2010, 2020])]
# plotting
df_population_sample.iplot(kind='bar', xTitle='Years',
                           yTitle='Population')

Naturally, all of them increased their population throughout the years, but some did it at a faster rate.

Boxplot

Boxplots are useful when we want to see the distribution of the data. The boxplot will reveal the minimum value, first quartile (Q1), median, third quartile (Q3), and maximum value. The easiest way to see those values is by creating an interactive visualization. Let’s see the population distribution of China.

df_population['China'].iplot(kind='box', color='green', 
                                     yTitle='Population')
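The five values a boxplot summarizes can also be computed directly with Pandas. The sketch below does so for a made-up series; note that the plotting library may use a slightly different quartile method, so the hover values on the chart are not guaranteed to match these exactly.

```python
import pandas as pd

# Hypothetical values standing in for one country's population column.
population = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Minimum, Q1, median, Q3 and maximum, using Pandas' default linear interpolation.
summary = population.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
print(summary)
```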

Let’s say now we want to get the same distribution but for all the selected countries.

df_population.iplot(kind='box', xTitle='Countries',
                    yTitle='Population')

As we can see, we can also filter out any country by clicking on the legends on the right.

Histogram

A histogram represents the distribution of numerical data. Let’s see the population distribution of the USA and Nigeria.

df_population[['United States', 'Nigeria']].iplot(kind='hist',
                                                xTitle='Population')

Piechart

Let’s compare the population in the year 2020 again, but now with a piechart. To do so, we’ll use the df_population_2020 dataframe created in the “Single Barplot” section. However, to make a piechart we need “country” as a column and not as an index, so we use .reset_index() to get the column back. Then we rename the 2020 column label to the string '2020'.

# transforming data
df_population_2020 = df_population_2020.reset_index()
df_population_2020 =df_population_2020.rename(columns={2020:'2020'})
# plotting
df_population_2020.iplot(kind='pie', labels='country',
                         values='2020',
                         title='Population in 2020 (%)')
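The data preparation for the piechart can be checked on a tiny example. This sketch uses made-up numbers and also computes the shares that the pie slices represent; pie_df and shares are illustrative names.

```python
import pandas as pd

# Hypothetical transposed table: countries in the index, one column for the year.
df_2020 = pd.DataFrame(
    {2020.0: [110.0, 95.0]},
    index=pd.Index(["China", "India"], name="country"),
)

# Move the index into a regular column and give the year column a string label.
pie_df = df_2020.reset_index().rename(columns={2020: "2020"})
print(pie_df)

# Each slice of the pie is the country's share of the total.
shares = pie_df["2020"] / pie_df["2020"].sum()
print(shares.round(3))
```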

Scatterplot

Although population data is not well suited to a scatterplot (the data follows a common pattern), I’ll make this plot for the purposes of this guide. Making a scatterplot is similar to making a line plot, but we have to add the mode argument.

df_population.iplot(kind='scatter', mode='markers')

Voilà! Now you’re ready to make your own beautiful interactive visualizations with Pandas.