Differences between Power BI and Tableau

Power BI is a data visualization and business intelligence tool provided by Microsoft. It can collect data from different data sources, such as Excel spreadsheets, on-premises databases, and cloud databases, and convert it into meaningful reports and dashboards. Features such as Quick Insights, Q&A, embedded reports, and self-service BI have made it one of the leading BI tools. It is also robust and well suited to extensive modeling, real-time analytics, and custom visual development.

Tableau helps business analysts make business decisions by making data visualization available to business users of any background. It can connect to a wide range of data sources (Excel, local/on-premises databases, cloud databases).

Tableau is one of the fastest-growing data visualization tools. Its visualizations are created as worksheets and dashboards. The beauty of Tableau is that it does not require technical or programming knowledge to create or develop reports and dashboards.

Power BI vs. Tableau:

| Power BI | Tableau |
| --- | --- |
| Provided by Microsoft | Provided by Tableau (now part of Salesforce) |
| Moderately priced | More expensive than Power BI |
| Requires a business/private email to open an account | Requires a business/private email to open an account |
| Uses DAX for measures and calculated columns | Uses its own calculation language (calculated fields) for measures and dimensions |
| Connects to fewer data sources, but adds new data source connectors in monthly updates | Can connect to numerous data sources |
| Can handle large datasets using Premium capacity | Can handle large datasets |
| Account-based subscription | Key-based subscription |
| Embedding reports is easy | Embedding reports is a real challenge |
| Integrated with Microsoft Azure, which helps in analyzing data and understanding patterns | Has built-in machine learning capabilities, which make it suitable for ML operations on datasets |
| Supports R- and Python-based visualizations | Provides fully integrated support for R and Python |

Which one to choose, Power BI or Tableau?

The data analytics field has changed over time, moving from traditional BI practice to embedded BI and collaborative BI. Data analytics was initially led by companies such as IBM, Oracle, and SAP, but that is no longer the case: it is now led by companies such as Microsoft and Tableau because of features like embedded BI, collaborative BI, data blending, and connections to multiple data sources.

Both Power BI and Tableau have their own pros and cons. The right product can be chosen based on your criteria and priorities.

| Criterion | Power BI | Tableau |
| --- | --- | --- |
| Description | A cloud-based business intelligence platform that offers an overview of critical data | A collection of intuitive business intelligence tools used for data discovery |
| Visualization | Provides a variety of visualizations | Provides a larger set of visualizations than Power BI |
| OS support | Power BI Desktop runs only on Windows | Tableau Desktop runs on Windows and macOS |
| Graphical features | Regular charts, graphs, and maps | Any category of charts, bars, and graphs |
| Cost | Cheaper | More expensive |
| Organization size | Suitable for small, medium, and large organizations | Suitable for medium and large organizations |

What is Power BI? 

Power BI is a data visualization and business intelligence tool that lets you connect to one or more data sources, turn the connected raw data into impressive visuals, and share insights across an organization. It also lets you embed reports into your own application or website.

Product Suite of Power BI:

  • Power BI Desktop 
    • Free to download and install.
    • Connect to and access a variety of on-premises and cloud sources such as Excel, CSV/text files, Azure, SharePoint, Dynamics CRM, etc.
    • Prepare and mash up the data, and create a data model, using Power Query, which uses the M formula language.
    • After loading data into Power BI Desktop, establish relationships between tables.
    • Create calculated measures, columns, and tables using Data Analysis Expressions (DAX).
    • Drag and drop interactive visuals onto report pages using the calculated measures and columns.
    • Publish to the Power BI service.
  • Power BI Service
    • The Power BI service is one of the ways to share and embed reports within an organization's website.
    • The Power BI service is organized into sections such as Workspaces, Dashboards, Reports, and Datasets.
    • You can use your own workspace, My Workspace, to maintain personal work in the Power BI service.
    • You can pin visuals from a number of reports to a dashboard, bringing meaningful data together for clearer insight.
    • You can interact with your data using Q&A (natural language queries).
  • Power BI Report Server
    • Power BI Report Server allows businesses to host Power BI reports on an on-premises report server.
    • The server can also host paginated reports, KPIs, mobile reports, and Excel workbooks.
    • Shared datasets and shared data sources are kept in their own folders, to be used as building blocks for reports.

  • Power BI Mobile
    • Power BI provides mobile apps for iOS, Android, and Windows 10 mobile devices.
    • In the mobile app, you can connect to and interact with your cloud and on-premises data.
    • The mobile app makes it convenient to manage dashboards and reports on the go, so you stay connected and on the same page with your organization.
  • On-Premises Data Gateway
    • This is a bridge that connects your on-premises data to online services such as Power BI, Microsoft Flow, Logic Apps, and Power Apps; a single gateway can be used with different services at the same time.
    • For example, if you use both Power BI and Power Apps, a single gateway can serve both, depending on the account you signed in with.
    • The on-premises data gateway implements data compression and transport encryption in all modes.
    • The on-premises data gateway is supported only on 64-bit Windows operating systems.
    • In standard mode, multiple users can share and reuse a gateway.
    • For Power BI, standard mode includes support for scheduled refresh and DirectQuery.

What is Tableau?

Tableau is a business intelligence and data visualization tool used to analyze data visually. Users can create and share interactive reports and dashboards with it. It offers data blending so users can combine data from multiple data sources.

Product Suite of Tableau:

  • Tableau Server
    • Tableau Server is an enterprise-wide visual analytics platform for creating interactive dashboards.
    • It is essentially an online hosting platform that holds all your Tableau workbooks, data sources, and more.
    • Because it is a Tableau product, you can use Tableau's functionality without always having to download and open workbooks in Tableau Desktop.
    • You can assign permissions to different users and groups in an organization to determine who can access and interact with which content.
    • As a Tableau Server user, you can access up-to-date content and gain quick insights without relying on static, distributed content.
  • Tableau Desktop
    • This is a downloadable application installed on your computer and used for developing visualizations in the form of sheets, dashboards, and stories.
    • Useful functionality in Tableau Desktop includes data transformation, creating data sources, creating extracts, and publishing visualizations to Tableau Server.
    • Tableau Desktop produces files with the extensions .twb and .twbx.
    • It is a licensed product but comes with a two-week trial period.
    • Everything from creating reports and charts to combining them into a dashboard is done in Tableau Desktop.
  • Tableau Prep
    • Tableau Prep is a personal data preparation tool that lets users clean, aggregate, merge, or otherwise prepare their data for analysis in Tableau.
    • Tableau Prep has a simple, clean user interface that looks and feels like a more complete form of the Tableau Desktop data source screen.
    • In Tableau Prep, data is organized as a flow in the flow pane, with elements identified by universally unique identifiers (UUIDs), and large data sets can be handled securely.
  • Tableau Reader
    • Tableau Reader is a free desktop application that you can use to open and interact with data visualizations built in Tableau Desktop.
    • It is used to read and interact with Tableau packaged workbooks.
    • Tableau Reader retains interactivity with visualizations created in Tableau Desktop but does not allow connections to data that can be refreshed.
    • It can only read Tableau files; without Reader, you may need to share a workbook publicly or convert it to PDF.
  • Tableau Online
    • Tableau Online is an analytics platform fully hosted in the cloud.
    • You can publish dashboards to it and share your discoveries with anyone.
    • It lets people across your organization ask questions of any published data source using natural language.
    • It can connect to cloud databases at any time from anywhere, and it can automatically refresh data from web applications such as Google Analytics and Salesforce.
    • It lets site admins easily manage authentication and permissions for users, content, and data.
  • Tableau Public
    • This is a free service that lets anyone publish interactive data visualizations to the web.
    • Visualizations are created in the accompanying app, Tableau Desktop Public Edition, which requires no programming skills.
    • It is for anyone interested in understanding data and sharing those discoveries with the world as data visualizations.
    • Feature highlights include heat maps, transparent sheets, automatic mobile layouts, and Google Sheets support.
    • Because visualizations are public, anyone can access the data and modify it by downloading the workbook, so it is not secure for sensitive data.
    • It is limited to 15,000,000 rows of data per workbook.
    • It provides 10 GB of storage space for your workbooks, which limits how much you can store.
    • It supports Python through TabPy, an API that enables evaluation of Python code within a Tableau workbook.


Strengths & Weaknesses of Power BI:

Strengths:

  • Free Power BI Desktop application for authors to develop reports
  • Uses DAX expressions for data calculations
  • Free Training Modules available for users
  • Composite models (DirectQuery, Dual, and Import storage modes) to connect multiple dispersed data sources and combine them into one model
  • Multiple visuals on a single page
  • Drill-down/drill-up in visuals, drill-through pages, and toggling pages or visuals using bookmarks, the selection pane, and buttons
  • Ability to connect to multiple data sources
  • Affordable: Power BI Desktop is free, and Power BI Pro (the Power BI service for sharing and collaborating with other users in the organization) costs $9.99 per user per month
  • Integrates with Cortana, the Windows personal voice assistant
  • Integrates with other Microsoft products (Azure, SharePoint, Office 365, Microsoft Dynamics, Power Apps, Microsoft Flow)
  • Dataflows in the Power BI service can connect to Azure Data Lake Storage Gen2 and other online services

Weaknesses:

  • It can be difficult for users who have no knowledge of Excel
  • Clients with large data sets must opt for Premium capacity to avoid performance and timeout issues for their datasets and users
  • The Power BI service is compatible with only a few database drivers
  • Power BI has a large set of product options, which makes it hard to understand which option best suits a business


Strengths & Weaknesses of Tableau:

Strengths:

  • Tableau provides highly attractive visualizations, which have helped it stand out among BI tools in the market.
  • Quickly combine, shape, and clean data for analysis.
  • It provides data blending.
  • Supports drill-down/drill-up in visuals, drill-through pages, and filters.
  • It can handle large amounts of data.
  • Supports scripting languages such as R and Python for complex table calculations and to avoid performance issues.
  • Can build reports, dashboards, and stories using Tableau Desktop.

Weaknesses:

  • Tableau is expensive compared to other tools.
  • Scheduling and notification options for reports and dashboards are limited.
  • Importing custom visualizations is difficult.
  • Embedding reports in other applications is complex.
  • Tableau is best suited to large organizations that can afford the licensing cost.


Benefits of Power BI

  • Microsoft is a well-known brand. Most of us remember our school or college days, when we started learning and using Microsoft products because they are simple to understand and user-friendly. So our eyes and brains are already trained on Microsoft products.
  • Anyone with working experience in Excel can get up to speed with Power BI Desktop and Mobile in no time.
  • You can pin visuals from Excel to the Power BI service using the Excel add-in.
  • You can build fast, reliable reports simply by dragging and dropping built-in or custom visuals, and Microsoft publishes best practices for optimizing report performance.
  • A large collection of learning assets is available through Microsoft's Guided Learning.
  • Because Power BI belongs to the Microsoft family, it benefits from Single Sign-On (SSO) and tight integration with Microsoft products such as Dynamics 365, Office 365, SharePoint Online, Power Apps, MS Flow, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services databases, etc.
  • Power Query offers many options for wrangling and cleaning data to shape it into a good data model.
  • After publishing to the Power BI web service, you can schedule data refreshes without manual intervention.
  • Power BI is backed by artificial intelligence and machine learning capabilities.
  • Microsoft introduced the Power Platform (Power BI to measure, Power Apps to act, and Microsoft Flow to automate).
  • Microsoft provides a roadmap of forthcoming Power BI features.
  • Power BI supports Python and R scripts for creating visualizations.
  • Pricing: Power BI Desktop is free ($0.00), and Power BI Pro (web service, hosted on Azure) costs $9.99 per month.

Disadvantages of Power BI

Power BI Desktop works best for analyzing your data when you connect using DirectQuery or live connections. It may struggle to handle huge data sets if you import the data into the application, and at times it can hang or simply crash. Microsoft's product team may resolve this problem in future monthly updates.

Benefits of Tableau

  • Tableau can connect to various sources, handles huge data sets effortlessly, and is a very good tool for data visualization, letting you create dashboards by simple drag and drop.
  • Tableau supports Python and R for creating visuals.
  • Tableau was named a Leader in Gartner’s Magic Quadrant report from 2012 to 2018 and has now moved to second place.

Disadvantages of Tableau

Pricing: Tableau Creator costs $70.00 per month and Tableau Online $35 per month.

  • The Tableau product team has not concentrated on advanced technologies and has missed integration with artificial intelligence and machine learning.
  • Once reports are pushed to Tableau Online, scheduled refresh is not supported and the data must be refreshed manually.
  • Analysts can use only the built-in visuals available in Tableau; there is no option to import custom visuals from a portal. Instead, developers need to create custom visuals themselves according to the requirement.
  • Data preparation options for building a data model in Tableau are limited. For advanced data wrangling and cleaning, one must use other tools such as Excel, Python, R, or Tableau Prep.
  • It lacks the Single Sign-On (SSO) based integration with other Microsoft products, such as Dynamics 365, Office 365, Power Apps, and Microsoft Flow, that Power BI offers.

Power BI and Tableau are among the most popular business intelligence tools because of features and capabilities such as embedded BI, data blending, and connections to multiple data sources, including cloud and on-premises databases. They make it easy to share reports and dashboards with users, so business analysts can access reports and dashboards and make critical business decisions without having to use these tools themselves.

These two tools stand at the top of the BI market because of their attractive visualizations. Power BI's ability to import and create custom visuals is one of its strengths. These factors have made them the most popular BI tools on the market to date.

According to Gartner's Magic Quadrant for Analytics and Business Intelligence Platforms report, Power BI is the top choice among BI tools at the time of writing, followed by Tableau.

Data Analysis Expressions (DAX) is a programming language that is used throughout Microsoft Power BI for creating calculated columns, measures, and custom tables. It is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values.

In a Tabular model (the engine behind Power BI), a measure is a named DAX expression that is resolved by evaluating the expression to return a value. This innocuous definition covers an incredible amount of ground. Multidimensional Expressions (MDX), by contrast, is the query language used with multidimensional (OLAP) cubes.
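DAX itself runs only inside Power BI and other Tabular-model tools, so the snippet below is only a rough analogy in pandas, built on a made-up sales table: a calculated column is computed row by row and stored with the table, while a measure is an aggregation that is re-evaluated for whatever rows the current filters select. The names sales and total_sales are hypothetical.

```python
import pandas as pd

# Made-up sales table, used only to illustrate the idea.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quantity": [2, 3, 1, 4],
    "unit_price": [10.0, 20.0, 15.0, 5.0],
})

# Calculated-column analogue: evaluated once per row and stored with the table.
sales["line_total"] = sales["quantity"] * sales["unit_price"]


# Measure analogue: an aggregation over whichever rows are in the current "filter context".
def total_sales(rows: pd.DataFrame) -> float:
    return float(rows["line_total"].sum())


print(total_sales(sales))                             # 115.0 (no filter)
print(total_sales(sales[sales["region"] == "East"]))  # 80.0 (filtered to East)
```

The same "measure" gives different answers as the filter changes, while the stored column stays fixed; that is the distinction the DAX definition above is drawing.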


How to Create a Python Dash Plotly Dashboard App

In this tutorial, I will walk through a practical example of how to create a Python Dash Plotly app. I will create multiple data visualization charts using dynamic callbacks, also known as pattern-matching callbacks (documented on Plotly.com). I will use world population data to create the dashboard app.

Introduction:

Pattern-matching callbacks let you create different data visualization charts with callbacks, giving users much more power and control over the app. They give you the flexibility to write callbacks for sets of inputs and outputs that don’t yet exist when the app starts.

MATCH fires the callback when the property of any of the matching components changes. However, instead of passing all of the values into the callback, MATCH passes just a single value. And instead of updating a single fixed output, it updates the dynamic output that the input is “matched” with.
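The matching rule can be illustrated with a toy model in plain Python. fire_match below is a hypothetical helper, not part of Dash, and it ignores everything Dash really does (browser events, serialization, multiple inputs); it only shows what the callback receives and which output gets updated. The ids mirror the ones used later in this app.

```python
# A toy model of how MATCH pairs components by their 'index' key.
# This is NOT how Dash is implemented; it only mimics the observable behaviour.

def fire_match(values, changed_id, input_type, output_type, callback):
    """Return a new {(type, index): value} mapping after one MATCH callback fires."""
    if changed_id["type"] != input_type:
        return values  # this callback does not listen to that component type
    index = changed_id["index"]
    updated = dict(values)
    # MATCH passes one value (the changed component's), not a list of all of them...
    single_value = values[(input_type, index)]
    # ...and writes only to the output whose index matches the input's index.
    updated[(output_type, index)] = callback(single_value)
    return updated


# Two charts on the page, created by clicks 0 and 1.
page = {
    ("dynamic-choice", 0): "bar",
    ("dynamic-choice", 1): "line",
    ("dynamic-graph", 0): "bar figure",
    ("dynamic-graph", 1): None,
}

after = fire_match(page, {"type": "dynamic-choice", "index": 1},
                   "dynamic-choice", "dynamic-graph",
                   lambda choice: f"{choice} figure")
print(after[("dynamic-graph", 0)], "|", after[("dynamic-graph", 1)])
# bar figure | line figure
```

Changing the radio item of chart 1 rebuilds only graph 1; graph 0 keeps its figure.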

Install / Import the Necessary Python Libraries:

Let’s get started. I’m using an Anaconda Jupyter Notebook. If you don’t already have the following libraries installed on your computer, launch the command prompt and install them with the pip commands shown in the comments, then import them as listed below:

import dash     #pip install dash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output, ALL, State, MATCH, ALLSMALLER
import plotly.express as px   #pip install plotly==5.2.2
import pandas as pd     #pip install pandas 
import numpy as np      #pip install numpy

Get Data:

Next, we read the CSV file into a pandas DataFrame. I downloaded the file to my computer, but you can also get it from my GitHub repository (linked in the code comment).

df = pd.read_csv("Documents/Data Science/population.csv")     #https://github.com/Valnjee/datascience/blob/master/population.csv
print(df)
           country    year    population
0             China  2020.0  1.439324e+09
1             China  2019.0  1.433784e+09
2             China  2018.0  1.427648e+09
3             China  2017.0  1.421022e+09
4             China  2016.0  1.414049e+09
...             ...     ...           ...
4180  United States  1965.0  1.997337e+08
4181  United States  1960.0  1.867206e+08
4182  United States  1955.0  1.716853e+08
4183          India  1960.0  4.505477e+08
4184          India  1955.0  4.098806e+08

[4185 rows x 3 columns]

Cleanse Data:

Make sure to clean the data by dropping all the null values.

# dropping null values
df = df.dropna()
print(df.head(10))
  country    year    population
0   China  2020.0  1.439324e+09
1   China  2019.0  1.433784e+09
2   China  2018.0  1.427648e+09
3   China  2017.0  1.421022e+09
4   China  2016.0  1.414049e+09
5   China  2015.0  1.406848e+09
6   China  2010.0  1.368811e+09
7   China  2005.0  1.330776e+09
8   China  2000.0  1.290551e+09
9   China  1995.0  1.240921e+09
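The year values print as 2020.0 because pandas stores a numeric column that contains missing values as float64, which is likely what happened with the year column in this CSV. Once the nulls are dropped, the column can optionally be cast back to integers. A small sketch on made-up rows (raw and clean are illustrative names):

```python
import pandas as pd

# Made-up rows in the same layout as population.csv (country, year, population).
raw = pd.DataFrame({
    "country": ["China", "China", None, "India"],
    "year": [2020.0, 2019.0, 2018.0, None],
    "population": [1.439324e9, 1.433784e9, 1.0e9, 4.098806e8],
})

clean = raw.dropna()                 # drop every row that has any missing value
clean = clean.astype({"year": int})  # years no longer need a decimal part
print(clean)
```

Only the two complete China rows survive, and their years become 2020 and 2019.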

Form and App Layout Design:

Here we design the layout in HTML, with an “Add Chart” button. Every new chart is added to the children of the container div.

app = dash.Dash(__name__)
app.layout = html.Div([
    html.H1("The World Population Dashboard with Dynamic Callbacks", style={"textAlign":"center"}),
    html.Hr(),
    html.P("Add as many charts for Data Visualization:"),
    html.Div(children=[
        html.Button('Add Chart', id='add-chart', n_clicks=0),
    ]),
    html.Div(id='container', children=[])
])

First Callback:

Every click on the Add Chart button triggers the callback, which builds a new child containing a graph and its controls and appends it to div_children. The dcc.RadioItems component offers four chart types.

Output – updates the children of the container, which displays the charts.

State – passes in the container’s current children without triggering the callback, so the existing charts are kept.

@app.callback(
    Output('container', 'children'),
    [Input('add-chart', 'n_clicks')],
    [State('container', 'children')]
)
def display_graphs(n_clicks, div_children):
    new_child = html.Div(
        style={'width': '45%', 'display': 'inline-block', 'outline': 'thin lightgrey solid', 'padding': 10},
        children=[
            dcc.Graph(
                id={
                    'type': 'dynamic-graph',
                    'index': n_clicks
                },
                figure={}
            ),
            dcc.RadioItems(
                id={
                    'type': 'dynamic-choice',
                    'index': n_clicks
                },
                options=[{'label': 'Bar Chart', 'value': 'bar'},
                         {'label': 'Line Chart', 'value': 'line'},
                         {'label': 'Scatter Chart', 'value': 'scatter'},
                         {'label': 'Pie Chart', 'value': 'pie'}],
                value='bar',
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-s',
                    'index': n_clicks
                },
                options=[{'label': s, 'value': s} for s in np.sort(df['country'].unique())],
                multi=True,
                value=["United States", "China"],
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-ctg',
                    'index': n_clicks
                },
                options=[{'label': c, 'value': c} for c in ['country']],
                value='country',
                clearable=False
            ),
            dcc.Dropdown(
                id={
                    'type': 'dynamic-dpn-num',
                    'index': n_clicks
                },
                options=[{'label': n, 'value': n} for n in ['population']],
                value='population',
                clearable=False
            )
            
        ]
    )
    div_children.append(new_child)
    return div_children

Second Callback and Creating the Graphs:

  • In the MATCH example from the Dash documentation, a display_dropdowns callback returns two elements with the same index: a dropdown and a div.
  • The second callback in that example uses the MATCH selector. With this selector, we’re asking Dash to:
    1. Fire the callback whenever the value property of any component with the id 'type': 'dynamic-dropdown' changes: Input({'type': 'dynamic-dropdown', 'index': MATCH}, 'value')
    2. Update the component with the id 'type': 'dynamic-output' and the index that matches the same index of the input: Output({'type': 'dynamic-output', 'index': MATCH}, 'children')
    3. Pass along the id of the dropdown into the callback: State({'type': 'dynamic-dropdown', 'index': MATCH}, 'id')
  • With the MATCH selector, only a single value is passed into the callback for each Input or State.
  • It’s important to design ID dictionaries that “line up” the inputs with the outputs. The MATCH contract is that Dash will update whichever output has the same dynamic ID as the input. Here, the “dynamic ID” is the value of index, and the layout returns dropdowns and divs with identical index values. In this app, the same rule pairs each dynamic-graph with the dynamic-choice, dynamic-dpn-s, dynamic-dpn-ctg, and dynamic-dpn-num components created by the same click.
  • In some cases, it may be important to know which dynamic component changed. As above, you can access this by passing id as State in the callback.
  • You can also use dash.callback_context to access the inputs and state and to know which input changed. outputs_list is particularly useful with MATCH because it tells you which dynamic component this particular invocation of the callback is responsible for updating.

The second callback renders the chart interactively. Each component ID is a dictionary with 'type' and 'index' keys. The dynamic part of the callback is the Input: its component_id is a pattern-matching dictionary and its component_property is value. The callback fires when the value of a matching component changes, for example the country dropdown (dynamic-dpn-s), and MATCH restricts both the inputs and the output to the components that share that component’s 'index' (for example, index 1 for the chart added by the first click).

dff: always work on a copy of the data frame (dff) rather than modifying the global df, which is shared by every chart and every user session.
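Boolean indexing such as df[df['country'].isin(...)] already returns a new DataFrame, but pandas may still emit a SettingWithCopyWarning if you later add or change columns on it; calling .copy() makes the intent explicit. A minimal sketch with a made-up stand-in for the global df:

```python
import pandas as pd

# Illustrative stand-in for the global df (values are made up).
df = pd.DataFrame({"country": ["China", "India", "Spain"],
                   "population": [1.44e9, 1.38e9, 4.7e7]})

# Filter to the selected countries and take an explicit copy.
dff = df[df["country"].isin(["China", "India"])].copy()

# Adding a column now changes only dff, never the shared df.
dff["population_millions"] = dff["population"] / 1e6

print("population_millions" in df.columns)  # False: the global df is untouched
print(dff)
```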

Sometimes the user wants to see the data in different charts. With multiple charts and dropdown options, users can select the countries they are interested in.

@app.callback(
    Output({'type': 'dynamic-graph', 'index': MATCH}, 'figure'),
    [Input(component_id={'type': 'dynamic-dpn-s', 'index': MATCH}, component_property='value'),
     Input(component_id={'type': 'dynamic-dpn-ctg', 'index': MATCH}, component_property='value'),
     Input(component_id={'type': 'dynamic-dpn-num', 'index': MATCH}, component_property='value'),
     Input({'type': 'dynamic-choice', 'index': MATCH}, 'value')]
)
def update_graph(s_value, ctg_value, num_value, chart_choice):
    s_value = s_value or []  # a cleared dropdown may send None instead of a list
    print(s_value)  # log the selected countries
    # Work on a filtered copy so the global df is never modified
    dff = df[df['country'].isin(s_value)].copy()

    if chart_choice == 'bar':
        # One bar per country: the numeric column summed over all years
        dff = dff.groupby([ctg_value], as_index=False)[[num_value]].sum()
        fig = px.bar(dff, x=ctg_value, y=num_value)
        return fig
    elif chart_choice == 'line':
        if not s_value:
            return {}  # no countries selected: show an empty figure
        dff = dff.groupby([ctg_value, 'year'], as_index=False)[[num_value]].sum()
        fig = px.line(dff, x='year', y=num_value, color=ctg_value)
        return fig
    elif chart_choice == 'scatter':
        if not s_value:
            return {}  # no countries selected: show an empty figure
        dff = dff.groupby([ctg_value, 'year'], as_index=False)[[num_value]].sum()
        fig = px.scatter(dff, x='year', y=num_value, color=ctg_value)
        return fig
    elif chart_choice == 'pie':
        fig = px.pie(dff, names=ctg_value, values=num_value)
        return fig
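Because the four charts differ mainly in how dff is aggregated before it reaches Plotly Express, that shaping can be checked on its own, without starting Dash. The helper shape_for_chart below is hypothetical (not part of the original app); it mirrors the filtering and grouping in update_graph, but not its empty-figure branches, and runs on a few made-up rows:

```python
import pandas as pd


def shape_for_chart(df, countries, chart_choice, ctg_value="country", num_value="population"):
    """Hypothetical helper: the data preparation from update_graph, minus Plotly and Dash."""
    dff = df[df["country"].isin(countries)].copy()
    if chart_choice == "bar":
        # One row per country: the numeric column summed over every year.
        return dff.groupby([ctg_value], as_index=False)[[num_value]].sum()
    if chart_choice in ("line", "scatter"):
        # One row per country and year, ready for x="year", color=ctg_value.
        return dff.groupby([ctg_value, "year"], as_index=False)[[num_value]].sum()
    # Pie charts receive the filtered rows unchanged.
    return dff


# A few made-up rows in the same layout as population.csv.
sample = pd.DataFrame({
    "country": ["China", "China", "India", "Spain"],
    "year": [2019, 2020, 2020, 2020],
    "population": [10, 20, 5, 1],
})

print(shape_for_chart(sample, ["China", "India"], "bar"))
print(shape_for_chart(sample, ["China"], "line"))
```

Note that the bar chart sums population across every year in the data, so each bar is a cumulative total rather than a single year's population; filtering dff to one year first would change that.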

Finally, run the app on Dash’s built-in development server:

if __name__ == '__main__':
    app.run_server(debug=False)
Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

Conclusion:


CONGRATULATIONS! You have just learned how to develop a web app with Dash. Dash Plotly gives data scientists the power to build web apps that interact with data and with deep learning, artificial intelligence, and machine learning models.

In this introductory article, we’ve explored how to develop dashboard apps using Dash Plotly. Although it’s a trivial application, it illustrates the core concepts of this technology. Besides development, we’ve also seen how effortless it is to code in Plotly.

Dash is the original low-code framework for rapidly building data apps in Python, R, Julia, and F# (experimental).

Written on top of Plotly.js and React.js, Dash is ideal for building and deploying data apps with customized user interfaces. It’s particularly suited for anyone who works with data.

Through a couple of simple patterns, Dash abstracts away all of the technologies and protocols that are required to build a full-stack web app with interactive data visualization.

Dash is simple enough that you can bind a user interface to your code in less than 10 minutes.

Dash apps are rendered in the web browser. You can deploy your apps to VMs or Kubernetes clusters and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready.

There is a lot behind the framework. To learn more about how it is built and what motivated Dash, read their announcement letter or their post Dash is React for Python.

Dash is an open source library released under the permissive MIT license. Plotly develops Dash and also offers a commercial platform for writing and deploying Dash apps in an enterprise environment.

Web apps are great for data visualization and give clients more flexibility to navigate and explore the data. They are very user-friendly and help simplify the understanding of the data.

Data Visualization Using Python

In this example, we’ll create different data visualization charts from population data. There’s an easy way to create visuals directly from pandas, and we’ll see how it works in detail in this tutorial.

Install Necessary Libraries

To easily create interactive visualizations, we need to install Cufflinks, a library that connects pandas with Plotly so we can create visualizations directly from pandas (in the past you had to learn workarounds to make them work together, but now it’s simpler). Install the libraries in the following order from the Conda command prompt:

pip install pandas
pip install plotly
pip install cufflinks

Import the following Libraries

import pandas as pd
import cufflinks as cf
from IPython.display import display,HTML
cf.set_config_file(sharing='public',theme='ggplot',offline=True)

In this case, I’m using the ‘ggplot’ theme, but feel free to choose any theme you want. Run the command cf.getThemes() to get all the available themes. To create data visualizations with pandas in the following sections, we only need to use the syntax dataframe.iplot().

The data we’ll use is a population dataframe. First, download the CSV file from Kaggle.com, move the file to where your Python script is located, and then read it into a pandas dataframe as shown below.

# Read the population data (adjust the path to where you saved the CSV)
df_population = pd.read_csv('documents/population/population.csv')
# Select rows by passing a list of index labels
print(df_population.loc[[0,10]])
   country    year    population
0    China  2020.0  1.439324e+09
10   China  1990.0  1.176884e+09
print(df_population.head(10))
  country    year    population
0   China  2020.0  1.439324e+09
1   China  2019.0  1.433784e+09
2   China  2018.0  1.427648e+09
3   China  2017.0  1.421022e+09
4   China  2016.0  1.414049e+09
5   China  2015.0  1.406848e+09
6   China  2010.0  1.368811e+09
7   China  2005.0  1.330776e+09
8   China  2000.0  1.290551e+09
9   China  1995.0  1.240921e+09

This dataframe is almost ready for plotting; we just have to drop null values, reshape it, and then select a few countries to test our interactive plots. The code shown below does all of this.

# dropping null values
df_population = df_population.dropna()
# reshaping the dataframe
df_population = df_population.pivot(index="year", columns="country", values="population")
# selecting 5 countries
df_population = df_population[['United States', 'India', 'China', 'Nigeria', 'Spain']]
print(df_population.head(10))
country  United States         India         China      Nigeria       Spain
year                                                                       
1955.0     171685336.0  4.098806e+08  6.122416e+08   41086100.0  29048395.0
1960.0     186720571.0  4.505477e+08  6.604081e+08   45138458.0  30402411.0
1965.0     199733676.0  4.991233e+08  7.242190e+08   50127921.0  32146263.0
1970.0     209513341.0  5.551898e+08  8.276014e+08   55982144.0  33883749.0
1975.0     219081251.0  6.231029e+08  9.262409e+08   63374298.0  35879209.0
1980.0     229476354.0  6.989528e+08  1.000089e+09   73423633.0  37698196.0
1985.0     240499825.0  7.843600e+08  1.075589e+09   83562785.0  38733876.0
1990.0     252120309.0  8.732778e+08  1.176884e+09   95212450.0  39202525.0
1995.0     265163745.0  9.639226e+08  1.240921e+09  107948335.0  39787419.0
2000.0     281710909.0  1.056576e+09  1.290551e+09  122283850.0  40824754.0
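The dropna() and pivot() steps are easier to see on a tiny long-format table. The sketch below copies four values from the output above and adds one row whose population is deliberately left blank (a hypothetical missing value, not taken from the dataset) so that dropna() has something to remove; pivot() then turns the country column into one column per country.

```python
import io

import pandas as pd

# Small long-format sample: four values copied from the output above, plus a
# hypothetical row with a blank population so dropna() has something to drop.
csv_text = """country,year,population
United States,1955.0,171685336.0
United States,1960.0,186720571.0
Spain,1955.0,29048395.0
Spain,1960.0,30402411.0
Spain,1965.0,
"""

long_df = pd.read_csv(io.StringIO(csv_text))

# remove rows that contain a missing value
long_df = long_df.dropna()

# reshape: one row per year, one column per country, population in the cells
wide_df = long_df.pivot(index="year", columns="country", values="population")
print(wide_df)
```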

Lineplot

Let’s make a lineplot to compare how much the population has grown from 1955 to 2020 for the five selected countries. As mentioned before, we will use the syntax df_population.iplot(kind='name_of_plot') to make plots, as shown below.

df_population.iplot(kind='line',xTitle='Years', yTitle='Population',
                    title='Population (1955-2020)')

Barplot

We can make a single barplot or barplots grouped by categories. Let’s have a look.

Single Barplot

Let’s create a barplot that shows the population of each country in 2020. To do so, we first select the year 2020 from the index and then transpose rows and columns, so that the countries become the rows and 2020 becomes the column. We’ll name this new dataframe df_population_2020 (we’ll use it again when plotting piecharts).

df_population_2020 = df_population[df_population.index.isin([2020])]
df_population_2020 = df_population_2020.T

Now we can plot this new dataframe with .iplot(). In this case, I’m going to set the bar color to blue using the color argument.

df_population_2020.iplot(kind='bar', color='blue',
                         xTitle='Countries', yTitle='Population',
                         title='Population in 2020')
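To see what the select-and-transpose step produces without the full dataset, here is a small sketch built from the year-2000 row printed earlier (the 2020 values for all five countries aren’t shown in this guide). It selects the row with .loc and a one-element list, which keeps a DataFrame rather than a Series and is an alternative to index.isin(); since the years in this dataset are stored as floats, the label is written as 2000.0. After .T, the countries become the row labels that the bar chart uses as categories.

```python
import pandas as pd

# Wide dataframe with a single year, using the year-2000 values printed earlier
df = pd.DataFrame(
    {
        "United States": [281710909.0],
        "India": [1.056576e09],
        "China": [1.290551e09],
        "Nigeria": [122283850.0],
        "Spain": [40824754.0],
    },
    index=pd.Index([2000.0], name="year"),
)
df.columns.name = "country"

# a one-element list keeps the result a DataFrame; .T swaps rows and columns
df_2000 = df.loc[[2000.0]].T
print(df_2000)
```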

Barplot grouped by “n” variables

Now let’s see the evolution of the population at the beginning of each decade.

# keep only the first year of each decade
df_population_sample = df_population[df_population.index.isin([1980, 1990, 2000, 2010, 2020])]
# plotting
df_population_sample.iplot(kind='bar', xTitle='Years',
                           yTitle='Population')

Naturally, all of them increased their population throughout the years, but some did it at a faster rate.
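Instead of listing the decade years by hand, we can let pandas find them. A minimal sketch, assuming the year index holds whole-number years stored as floats (as in this dataset) and using China’s values from the table printed earlier: only the years where year % 10 == 0 are kept.

```python
import pandas as pd

# China's population from the table above, indexed by year (stored as floats)
china = pd.Series(
    [6.122416e08, 6.604081e08, 7.242190e08, 8.276014e08, 9.262409e08,
     1.000089e09, 1.075589e09, 1.176884e09, 1.240921e09, 1.290551e09],
    index=pd.Index([1955.0, 1960.0, 1965.0, 1970.0, 1975.0,
                    1980.0, 1985.0, 1990.0, 1995.0, 2000.0], name="year"),
    name="China",
)

# keep only the years that start a decade, without hard-coding them
decade_starts = china[china.index % 10 == 0]
print(decade_starts)
```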

Boxplot

Boxplots are useful when we want to see the distribution of the data. A boxplot reveals the minimum value, first quartile (Q1), median, third quartile (Q3), and maximum value. An interactive visualization is the easiest way to read those values, since they appear when you hover over the box. Let’s see the population distribution of China.

df_population['China'].iplot(kind='box', color='green',
                             yTitle='Population')

Let’s say now we want to get the same distribution but for all the selected countries.

df_population.iplot(kind='box', xTitle='Countries',
                    yTitle='Population')

We can also hide any country by clicking its name in the legend on the right.
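The five statistics a boxplot shows can also be computed directly with pandas, which is handy for checking what the hover labels report. The sketch below uses China’s 1955-2000 values from the table printed earlier. Note that pandas’ quantile() uses linear interpolation by default, while plotly box traces have their own quartile-method setting, so Q1 and Q3 may differ slightly from what the plot shows.

```python
import pandas as pd

# China's population, 1955-2000, copied from the table printed earlier
china = pd.Series(
    [6.122416e08, 6.604081e08, 7.242190e08, 8.276014e08, 9.262409e08,
     1.000089e09, 1.075589e09, 1.176884e09, 1.240921e09, 1.290551e09],
    name="China",
)

# the five values a boxplot is built from
summary = {
    "min": china.min(),
    "Q1": china.quantile(0.25),
    "median": china.median(),
    "Q3": china.quantile(0.75),
    "max": china.max(),
}
for label, value in summary.items():
    print(f"{label:>6}: {value:,.0f}")
```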

Histogram

A histogram represents the distribution of numerical data. Let’s see the population distribution of the USA and Nigeria.

df_population[['United States', 'Nigeria']].iplot(kind='hist',
                                                xTitle='Population')
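A histogram groups values into bins and counts how many fall into each one; iplot() picks the bins for us. To see the idea, numpy.histogram can count Nigeria’s 1955-2000 values (copied from the table printed earlier) into equal-width bins. The choice of four bins is arbitrary for this sketch and is not what the plot above necessarily uses.

```python
import numpy as np

# Nigeria's population, 1955-2000, copied from the table printed earlier
nigeria = np.array([41086100.0, 45138458.0, 50127921.0, 55982144.0, 63374298.0,
                    73423633.0, 83562785.0, 95212450.0, 107948335.0, 122283850.0])

# split [min, max] into 4 equal-width bins and count the values in each bin
counts, edges = np.histogram(nigeria, bins=4)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>14,.0f} to {right:>14,.0f}: {n}")
```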

Piechart

Let’s compare the population in 2020 again, but now with a piechart. To do so, we’ll use the df_population_2020 dataframe created in the “Single Barplot” section. However, a piechart needs “country” as a column rather than as the index, so we use .reset_index() to turn the index back into a column. Then we rename the 2020 column label to the string '2020'.

# transforming data
df_population_2020 = df_population_2020.reset_index()
df_population_2020 = df_population_2020.rename(columns={2020: '2020'})
# plotting
df_population_2020.iplot(kind='pie', labels='country',
                         values='2020',
                         title='Population in 2020 (%)')
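Each slice of the pie is a country’s share of the combined population of the five selected countries (not of the world population). A small sketch of that arithmetic, using the year-2000 values printed earlier because the 2020 values for all five countries aren’t listed in this guide:

```python
import pandas as pd

# Year-2000 populations from the table printed earlier
pop_2000 = pd.Series(
    {
        "United States": 281710909.0,
        "India": 1.056576e09,
        "China": 1.290551e09,
        "Nigeria": 122283850.0,
        "Spain": 40824754.0,
    },
    name="2000",
)

# a pie slice is each country's share of the five-country total, in percent
share_pct = pop_2000 / pop_2000.sum() * 100
print(share_pct.round(1))
```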

Scatterplot

Although population data isn’t well suited to a scatterplot (the data follows a common pattern), I’ll make one anyway for the purposes of this guide. Making a scatterplot is similar to making a lineplot, but we have to add the mode argument.

df_population.iplot(kind='scatter', mode='markers')

Voilà! Now you’re ready to make your own beautiful interactive visualizations with Pandas.

The differences between MS Excel, MS Access, MySQL, Cloud ML, and Cloud AI

Definitions of each application:

Microsoft Excel:
Microsoft Excel displays data in a grid of rows and columns, and the data is stored in cells. Excel’s formulas can be used for calculations, data analysis, and statistical analysis.

  • You can even add any charts, graphics, etc. to make it more presentable.
  • Excel locks the whole spreadsheet while it is open for editing.
  • An Excel document is referred to as a workbook and each of these workbooks must contain at least one worksheet.

Microsoft Access:
Microsoft Access is a database program. It uses unique ID numbers and editable lists of data to store details on large numbers of items, so you can use it to store large amounts of data.

  • Access is designed to let multiple users work in the same database file, with safeguards such as record-level locking to help protect the data.
  • A database created in Access is saved with an .accdb extension (versions before Access 2007 use .mdb).
  • Data is stored in tables.
  • Each field of a table can be given constraints, such as allowing only alphanumeric values or only a specific data type.
  • Like any other relational database, it works on the principles of tables, fields, and relationships. It supports different kinds of datatypes – numbers, dates, texts, etc.
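Access is normally used through its own interface, so to illustrate typed fields, unique ID numbers, and a field constraint in code, the sketch below swaps in Python’s built-in sqlite3 module as a stand-in relational database. The products table, its columns, and the sample values are hypothetical, and Access expresses the same ideas with its own field types and validation rules rather than this exact SQL.

```python
import sqlite3

# In-memory SQLite database standing in for an Access file (hypothetical table)
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,  -- unique ID number assigned to each record
        code       TEXT NOT NULL
                   CHECK (code NOT GLOB '*[^A-Za-z0-9]*'),  -- letters and digits only
        price      REAL NOT NULL CHECK (price >= 0),
        added_on   TEXT  -- a date, stored as ISO-8601 text
    )
    """
)

# a valid record: product_id is filled in automatically
conn.execute(
    "INSERT INTO products (code, price, added_on) VALUES (?, ?, ?)",
    ("AB12", 9.99, "2024-01-15"),
)

# "AB-12" contains a hyphen, so the CHECK constraint rejects the whole row
try:
    conn.execute("INSERT INTO products (code, price) VALUES (?, ?)", ("AB-12", 5.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
new_id = conn.execute("SELECT product_id FROM products").fetchone()[0]
print(rejected, row_count, new_id)
conn.close()
```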

MySQL:

MySQL is an open-source relational database management system based on SQL – Structured Query Language. The application is used for a wide range of purposes, including data warehousing, e-commerce, and logging applications.

  • Data is stored in tables.
  • Each field of a table can be given constraints, such as allowing only alphanumeric values or only a specific data type.
  • Like any other relational database, it works on the principles of tables, fields, and relationships. It supports different kinds of datatypes – numbers, dates, texts, etc.
  • MySQL is easy to use.
  • It is secure.
  • It uses a client/server architecture.
  • It is free to download.
  • It is scalable.
  • It is fast.
  • It is highly flexible.
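MySQL needs a running server, so this sketch of the tables, fields, and relationships idea uses Python’s built-in sqlite3 module as a stand-in; the SQL is plain enough to read the same way in MySQL, but it has only been written with SQLite in mind. The customers and orders tables and their rows are hypothetical. A foreign key links each order to a customer, which lets one query join the two tables and makes the database reject an order for a customer that doesn’t exist (SQLite only enforces this after PRAGMA foreign_keys = ON).

```python
import sqlite3

# SQLite stand-in for a MySQL server (hypothetical customers/orders tables)
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        total       REAL NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
    );
    INSERT INTO customers (customer_id, name) VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders (customer_id, total) VALUES (1, 20.0), (1, 15.5), (2, 8.0);
    """
)

# the relationship between the tables lets one query join and aggregate them
totals = conn.execute(
    """
    SELECT c.name, SUM(o.total) AS spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)

# an order for a customer that doesn't exist violates the foreign key
try:
    conn.execute("INSERT INTO orders (customer_id, total) VALUES (?, ?)", (99, 1.0))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
conn.close()
```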

Cloud ML:

The Cloud ML Engine is a hosted platform to run machine learning training jobs and predictions at scale. The service can also be used to deploy a model that is trained in external environments. Cloud ML Engine automates all resource provisioning and monitoring for running the jobs.

The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science. AWS, Microsoft Azure, and Google Cloud Platform offer many machine learning options that don’t require deep knowledge of AI, machine learning theory, or a team of data scientists.

  • The cloud’s pay-per-use model is good for bursty AI or machine learning workloads.
  • The cloud makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand increases.

Cloud AI:

The AI cloud, a concept enterprises are only now starting to implement, combines artificial intelligence (AI) with cloud computing. An AI cloud consists of shared infrastructure for AI use cases that supports numerous projects and AI workloads simultaneously on cloud infrastructure. Its commonly cited benefits and uses include:

  • Data Mining.
  • Agile Development.
  • Reshaping of IT Infrastructure.
  • Seamless Data Access.
  • Analytics and Prediction.
  • Cloud Security Automation.
  • Cost-effectiveness.
| MS Excel | MS Access | MySQL | Cloud ML | Cloud AI |
|---|---|---|---|---|
| Microsoft Excel is an application that uses spreadsheets to create charts, graphs, and tabular models. | Microsoft Access is a database application. It collects, sorts, and manipulates data. | MySQL is an open-source relational database management system based on SQL. | The Cloud ML Engine is a hosted platform to run machine learning training jobs and predictions at scale. | An AI cloud consists of shared infrastructure for AI use cases, supporting numerous projects and AI workloads simultaneously. |
| It is used for spreadsheets and statistical and financial calculations. Excel can perform sophisticated what-if analysis on your data, as well as statistical, engineering, and regression analysis. | It is used for storing and manipulating large amounts of information. Access does not perform what-if analysis. | It is used for a wide range of purposes, including data warehousing, e-commerce, and logging applications. | The service can also deploy a model that was trained in an external environment. Cloud ML Engine automates all resource provisioning and monitoring for running the jobs. | Enterprises use AI-driven cloud computing to be more efficient, strategic, and insight-driven. AI can automate complex and repetitive tasks to boost productivity and can perform data analysis without human intervention. IT teams can also use AI to manage and monitor core workflows. |
| Microsoft Excel is easy to learn. | Microsoft Access is quite hard to learn, and some tasks require programming knowledge. | MySQL is easy to use. | The pay-per-use model makes it easy to access more sophisticated capabilities without bringing in new advanced hardware. | Cloud AI Platform is a service that lets users easily build machine learning models that work on any type of data, of any size. |
| Storage capacity is smaller, since Excel isn’t built for storing data. | Storage capacity is larger, since Access is built mainly for storing, sorting, and manipulating data. | MySQL stores data in files on your hard disk. The maximum size of a MySQL table is 65,536 TB (the MyISAM limit; in practice, operating-system file-size limits usually apply first). | This storage service provides petabytes of capacity, with a maximum of 10 MB per cell and 100 MB per row. 1024 petabytes of data. | 1024 petabytes of data. The more RAM, the more data it can handle and the faster the processing; 16 GB of RAM or more is recommended for most deep learning tasks. |
| Excel is less flexible than Access. | Access is more flexible than Excel. | High flexibility. | High flexibility and cost-effective. | Seamless data access. High flexibility and cost-effective. |
| It works on a non-relational, flat worksheet data model. | It works on a model of multiple related tables. | MySQL is a relational database management system (RDBMS). Its logical model, with objects such as databases, tables, views, rows, and columns, offers a flexible programming environment. | Cloud ML Engine is used to train machine learning models in TensorFlow and other Python ML libraries (such as scikit-learn) without having to manage any infrastructure. | In AI, the decision tree (DT) model is used to arrive at a conclusion based on data from past decisions. |
| It locks the entire spreadsheet. | It locks data at the record level. | MySQL uses table-level locking for tables using the MyISAM, MEMORY, and MERGE storage engines, permitting only one session to update them at a time; InnoDB tables use row-level locking. | Cloud DLP (Data Loss Prevention) provides tools to classify, mask, tokenize, and transform sensitive elements to help you better manage the data that you collect, store, or use for business or analytics. | Cloud DLP (Data Loss Prevention) provides tools to classify, mask, tokenize, and transform sensitive elements to help you better manage the data that you collect, store, or use for business or analytics. |
| Excel is good for short-term solutions and small-scale projects. | Access is good for long-term solutions and large-scale projects. | MySQL is ideal for storing application data, specifically web application data. As a relational database, it is a good fit for applications that rely heavily on multi-row transactions. | The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science. | The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science. |
| Microsoft product | Microsoft product | Oracle product | Google, Amazon, Microsoft, and IBM | Google, Amazon, Microsoft, and IBM |
| | | | ML’s aim is to improve accuracy without caring about success. | The goal of AI is to increase the chances of success. |
| | | | ML is the way for a computer program to learn from experience. | AI is a computer program doing smart work. |
| | | | ML’s goal is to keep learning from data to maximize performance. | The future goal of AI is to simulate intelligence for solving highly complex problems. |
| | | | ML allows the computer to learn new things from the available information. | AI involves decision-making. |
| | | | ML looks for a single solution. | AI looks for optimal solutions. |

The difference between MS Excel and MS Access

| MS Excel | MS Access |
|---|---|
| Microsoft Excel is an application that uses spreadsheets to create charts, graphs, and tabular models. | Microsoft Access is a database application. It collects, sorts, and manipulates data. |
| It is used for spreadsheets and statistical and financial calculations. Excel can perform sophisticated what-if analysis on your data, as well as statistical, engineering, and regression analysis. | It is used for storing and manipulating large amounts of information. Access does not perform what-if analysis. |
| Microsoft Excel is easy to learn. | Microsoft Access is quite hard to learn, and some tasks require programming knowledge. |
| Storage capacity is smaller, since Excel isn’t built for storing data. | Storage capacity is larger, since Access is built mainly for storing, sorting, and manipulating data. |
| Excel is less flexible than Access. | Access is more flexible than Excel. |
| It works on a non-relational, flat worksheet data model. | It works on a model of multiple related tables. |
| It locks the entire spreadsheet. | It locks data at the record level. |
| Excel is good for short-term solutions and small-scale projects. | Access is good for long-term solutions and large-scale projects. |
| Microsoft product | Microsoft product |

Conclusion:

Microsoft Excel and Access are great tools for exploring structured and unstructured data, which has earned them a top spot among the tools used by data scientists and analysts.