Data Governance

What is Data Governance?

Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information that enables an organization to achieve its goals.

  • Establishes the processes and responsibilities that ensure the quality and security of the data used across a business or organization.
  • Covers the practice of identifying and collecting data across a business or organization.
  • Defines who can take what action upon what data, in which situations, and using what methods.
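
The last point above can be pictured as a lookup from a request (who, what action, what data, what method) to a decision. The following Python sketch is only an illustration: the role names, data set names, and the `is_allowed` helper are hypothetical, and the "in which situations" dimension is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One governance rule: who may do what to which data, and how."""
    role: str     # who (hypothetical role name)
    action: str   # what action, e.g. "read" or "update"
    dataset: str  # what data
    method: str   # using what method, e.g. "sql" or "export"


# Hypothetical policy; a real one would be derived from the governance policy.
POLICY = {
    Rule("analyst", "read", "sales", "sql"),
    Rule("data_steward", "read", "sales", "sql"),
    Rule("data_steward", "update", "sales", "sql"),
}


def is_allowed(role: str, action: str, dataset: str, method: str) -> bool:
    """Return True only if an explicit rule grants the request (deny by default)."""
    return Rule(role, action, dataset, method) in POLICY
```

With this policy, `is_allowed("analyst", "update", "sales", "sql")` is refused because no rule grants it. Denying by default is one possible design choice, not something the definition above requires.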

 

Data Governance Policy

A data governance policy is a document that formally outlines how organizational data will be managed and controlled. A few common areas covered by data governance policies are:

  • Data quality – ensuring data is correct, consistent and free of “noise” that might impede usage and analysis.
  • Data availability – ensuring that data is available and easy to consume by the business functions that require it.
  • Data usability – ensuring data is clearly structured, documented and labeled, enables easy search and retrieval, and is compatible with tools used by business users.
  • Data integrity – ensuring data retains its essential qualities even as it is stored, converted, transferred and viewed across different platforms.
  • Data security – ensuring data is classified according to its sensitivity, and defining processes for safeguarding information and preventing data loss and leakage.

Addressing all of these points requires the right combination of people skills, internal processes, and appropriate technology.
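
As a rough illustration of the data quality area, the Python sketch below counts two simple kinds of problems in a list of records: missing required values and exact duplicate rows. The field names, the sample records, and the `quality_report` helper are assumptions made for this example, not part of any particular governance tool.

```python
def quality_report(records, required_fields):
    """Count missing required values and exact duplicate rows in dict records."""
    missing = 0
    duplicates = 0
    seen = set()
    for rec in records:
        # A required field counts as missing if it is absent, None, or empty.
        missing += sum(1 for f in required_fields if rec.get(f) in (None, ""))
        # Two rows are duplicates when all of their key/value pairs match.
        key = tuple(sorted(rec.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {
        "rows": len(records),
        "missing_values": missing,
        "duplicate_rows": duplicates,
    }


# Hypothetical customer records used only to exercise the function.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "a@example.com"},
]
report = quality_report(records, ["id", "email"])
```

A report like this could feed the metrics a governance team tracks over time. Real quality rules (valid formats, reference data checks, consistency across systems) would be more extensive.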

Data Stewards

Data stewards are individual team members responsible for overseeing data and implementing policies and processes. Data stewards are typically subject matter experts who are familiar with the data used by a specific business function or department. These roles are typically filled by IT or data professionals with expertise in data domains and assets. Data stewards may also work as engineers, quality analysts, data modelers, or data architects. They ensure the fitness of data elements (both content and metadata), administer the data, and ensure compliance with regulations.

Data Governance vs Data Management

Data governance is the strategy, while data management is the set of practices used to protect the value of data. When creating a data governance strategy, you define and incorporate data management practices. Governance policies direct how technologies and solutions are used, while management leverages these solutions to carry out tasks.

Data Governance Frameworks

A data governance framework is a structure that helps an organization assign responsibilities, make decisions, and take action on enterprise data. Data governance frameworks can be classified into three types:

  • Command and control – the framework designates a few employees as data stewards, and requires them to take on data governance responsibilities.
  • Traditional – the framework designates a larger number of employees as data stewards, on a voluntary basis, with a few serving as “critical data stewards” with additional responsibilities.
  • Non-invasive – the framework recognizes people as data stewards based on their existing work and relation to the data; everyone who creates and modifies data becomes a data steward for that data.

Essential elements of a data governance framework include:

  • Funding and management support – a data governance framework is not meaningful unless it is backed by management as an official company policy.
  • User engagement – ensuring those who consume the data understand and will cooperate with data governance rules.
  • Data governance council – a formal body responsible for defining the data governance framework and helping to enact it in the organization.

While many companies create data governance frameworks independently, there are several standards which can help formulate a data governance framework, including COBIT, ISO/IEC 38500, and ISO/TC 215.

Goals of Information Governance Initiatives

Data and information governance helps organizations achieve goals such as:

  • Complying with standards like SOX, Basel I/II, HIPAA, GDPR
  • Maximizing the value of data and enabling its re-use
  • Improving data-driven decision making
  • Reducing the cost of data management

Data Governance Strategy

A data governance strategy informs the content of an organization’s data governance framework. It requires you to define, for each set of organizational data:

  • Where: Where it is physically stored
  • Who: Who has or should have access to it
  • What: Definition of important entities such as “customer”, “vendor”, “transaction”
  • How: What the current structure of the data is
  • Quality: Current and desired quality of the source data and consumable data sets
  • Goals: What we want to do with this data
  • Requirements: What needs to happen for the data to meet the goals
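
One way to capture these answers is a small record per data set. The Python sketch below is illustrative only: the `DatasetProfile` class, its field names, the 0-to-1 quality score, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetProfile:
    """Answers to the strategy questions for one set of organizational data."""
    name: str
    location: str            # Where: where the data is physically stored
    access_roles: list       # Who: roles that have or should have access
    definitions: dict        # What: definitions of important entities
    structure: str           # How: current structure of the data
    current_quality: float   # Quality: assumed score between 0 and 1
    target_quality: float    # Quality: desired score between 0 and 1
    goals: list = field(default_factory=list)         # Goals
    requirements: list = field(default_factory=list)  # Requirements

    def quality_gap(self) -> float:
        """How far current quality is below the target (never negative)."""
        return max(0.0, self.target_quality - self.current_quality)


# Hypothetical example entry.
profile = DatasetProfile(
    name="customer_master",
    location="on-premises data warehouse",
    access_roles=["sales", "support"],
    definitions={"customer": "a party with at least one completed transaction"},
    structure="relational tables",
    current_quality=0.75,
    target_quality=1.0,
    goals=["single view of the customer"],
    requirements=["deduplicate records", "document the source systems"],
)
gap = profile.quality_gap()
```

Keeping the answers in a structured form like this makes it easier to compare data sets and to see which ones are furthest from their goals.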

What is a Data Governance Policy and Why is it Important?

Data governance policies are guidelines that you can use to ensure your data and assets are used properly and managed consistently. These guidelines typically include policies related to privacy, security, access, and quality. Guidelines also cover the roles and responsibilities of those implementing policies and compliance measures.

The purpose of these policies is to ensure that organizations are able to maintain and secure high-quality data. Governance policies form the base of your larger governance strategy and enable you to clearly define how governance is carried out.

Data Governance Roles

Data governance operations are performed by a range of organizational members, including IT staff, data management professionals, business executives, and end users. There is no strict standard for who should fill data governance roles, but there are common roles that many organizations implement.

Chief Data Officer

Chief data officers are typically senior executives who oversee your governance program. This role is responsible for acting as a program advocate, working to secure staffing, funding, and approval for the project, and monitoring program progress.

Data Governance Manager and Team

Data governance managers may be covered by the chief data officer role or may be separate staff. This role is responsible for managing your data governance team and having a more direct role in the distribution and management of tasks. This person helps coordinate governance processes, leads training sessions and meetings, evaluates performance metrics, and manages internal communications.

Data Governance Committee

The data governance committee is an oversight committee that approves and directs the actions of the governance team and manager. This committee is typically composed of data owners and business executives.

They take the recommendations of the data governance professionals and ensure that processes and strategies align with business goals. This committee is also responsible for resolving disputes between business units related to data or governance.

A 4-Step Data Governance Model

Managing data governance principles effectively requires creating a business function, similar to human resources or research and development. This function needs to be well defined and should include the following process steps:

  1. Discovery—processes dedicated to determining the current state of data, which processes are dependent on data, what technical and organizational capabilities support data, and the flow of the data lifecycle. These processes derive insights about data and data use for use in definition processes. Discovery processes run simultaneously with and are used iteratively with definition processes.
  2. Definition—processes dedicated to the documentation of data definitions, relationships, and taxonomies. In these processes, insights from discovery processes are used to define standards, measurements, policies, rules, and strategies to operationalize governance.
  3. Application—processes dedicated to operationalizing and ensuring compliance with governance strategies and policies. These processes include the implementation of roles and responsibilities for governance.
  4. Measurement—processes dedicated to monitoring and measuring the value and effectiveness of governance workflows. These processes provide visibility into governance practices and ensure auditability.

Typical Data Governance Questions

  1. Can these data be trusted?
  2. Who understands these data?
  3. Who does what in terms of data governance?
  4. Where can we find the data needed for the process?
  5. Who should be able to change this data?
  6. What happens after changes are made?

 

Data Governance Maturity Model

Evaluating the maturity of your governance strategies can help you identify areas of improvement. When evaluating your practices, consider the following levels.

Level 0: Unaware

Level 0 organizations have no awareness of what data governance means and no system or set of policies defined for data. This includes a lack of policies for creating, collecting, or sharing information. No data models are outlined and no standards are established for storing or transferring data.

Action items

Strategy planners and system architects need to inform IT and business leaders about the importance and benefits of data governance and Enterprise Information Management (EIM).

Level 1: Aware

Level 1 organizations understand that they are lacking data governance solutions and processes but have few or no strategies in place. Typically, IT and business leaders understand that Enterprise Information Management (EIM) is important but have not taken action to enforce the creation of governance policies.

Action items

Planners and architects need to begin determining organization needs and developing a strategy to meet those needs.

Level 2: Reactive

Level 2 organizations understand the importance and value of data and have some policies in place to protect data. Typically, the practices used to protect data by these organizations are ineffective, incomplete, or inconsistently enforced.

Action items

Management teams need to push for consistency and standardization for the implementation of policies.

Level 3: Proactive

Level 3 organizations are actively working to apply governance, including implementing proactive measures. Data governance is a part of all organizational processes. However, there is typically no universal system for governance. Instead, information owners are responsible for management.

Action items

Organizations need to evaluate governance at the departmental level and centralize responsibilities.

Level 4: Managed

Level 4 organizations have developed and consistently implemented governance policies and standards. These organizations have categorized their data assets and can monitor data use and storage. Additionally, oversight of governance is performed by an established team with roles and responsibilities.

Action items

Teams should actively track data management tasks and perform audits to ensure that policies are applied consistently.

Level 5: Effective

Level 5 organizations have achieved reliable data governance structures. They may have individuals in their teams with data governance certifications and have established experts. These organizations can effectively leverage their data for competitive advantage and improvements in productivity.

Action items

Teams should work to maintain governance and verify compliance. Teams may also actively investigate methods for improving proactive governance, for example by researching best practices for specific governance cases such as big data governance.

Data Governance Best Practices

A data governance initiative must start with broad management support and acceptance from stakeholders who own and manage the data (called data custodians).

It is advisable to start with a small pilot project, on a set of data which is especially problematic and in need of governance, to show stakeholders and management what is involved, and demonstrate the return on investment of data governance activity.

When rolling out data governance across the organization, use templates, models and existing tools when possible in order to save time and empower organizational roles to improve quality, accessibility and integrity for their own data. Evaluate and consider using data governance tools which can help standardize processes and automate manual activities.

Most importantly, build a community of data stewards willing to take responsibility for data quality. Preferably, these should be the individuals who already create and manage data sets, and understand the value of making data usable for the entire organization.

The difference between Machine Learning (ML) and Artificial Intelligence (AI)

Cloud ML:

The Cloud ML Engine is a hosted platform to run machine learning training jobs and predictions at scale. The service can also be used to deploy a model that is trained in external environments. Cloud ML Engine automates all resource provisioning and monitoring for running the jobs.

The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science. AWS, Microsoft Azure, and Google Cloud Platform offer many machine learning options that don’t require deep knowledge of AI, machine learning theory, or a team of data scientists.

  • The cloud’s pay-per-use model is good for bursty AI or machine learning workloads.
  • The cloud makes it easy for enterprises to experiment with machine learning capabilities and scale up as projects go into production and demand increases.

Cloud AI:

The AI cloud, a concept only now starting to be implemented by enterprises, combines artificial intelligence (AI) with cloud computing. An AI cloud consists of a shared infrastructure for AI use cases, supporting numerous projects and AI workloads simultaneously, on cloud infrastructure at any given point in time.

Artificial intelligence (AI) assists in the automation of routine activities within IT infrastructure, which increases productivity. The combination of AI and cloud computing results in an extensive network capable of holding massive volumes of data while continuously learning and improving.

  • Data Mining.
  • Agile Development.
  • Reshaping of IT Infrastructure.
  • Seamless Data Access.
  • Analytics and Prediction.
  • Cloud Security Automation.
  • Cost-Effective.

| Cloud ML | Cloud AI |
| --- | --- |
| The Cloud ML Engine is a hosted platform to run machine learning training jobs and predictions at scale. | An AI cloud consists of a shared infrastructure for AI use cases, supporting numerous projects and AI workloads simultaneously, on cloud infrastructure at any given point in time. |
| The service can also be used to deploy a model that is trained in external environments. Cloud ML Engine automates all resource provisioning and monitoring for running the jobs. | Enterprises use the power of AI-driven cloud computing to be more efficient, strategic, and insight-driven. AI can automate complex and repetitive tasks to boost productivity, as well as perform data analysis without any human intervention. IT teams can also use AI to manage and monitor core workflows. |
| The pay-per-use model further makes it easy to access more sophisticated capabilities without the need to bring in new advanced hardware. | Cloud AI Platform is a service that enables users to easily build machine learning models that work on any type of data, of any size. |
| This storage service provides petabytes of capacity with a maximum unit size of 10 MB per cell and 100 MB per row. 1024 petabytes of data. | 1024 petabytes of data. The larger the RAM, the greater the amount of data it can handle, hence faster processing. 16 GB RAM and above is recommended for most deep learning tasks. |
| High flexibility and cost-effective. | Seamless data access. High flexibility and cost-effective. |
| Cloud ML Engine is used to train machine learning models in TensorFlow and other Python ML libraries (such as scikit-learn) without having to manage any infrastructure. | In artificial intelligence, the Decision Tree (DT) model is used to arrive at a conclusion based on the data from past decisions. |
| Cloud DLP (Data Loss Prevention) provides tools to classify, mask, tokenize, and transform sensitive elements to help you better manage the data that you collect, store, or use for business or analytics. | Cloud DLP (Data Loss Prevention) provides tools to classify, mask, tokenize, and transform sensitive elements to help you better manage the data that you collect, store, or use for business or analytics. |
| The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science. | The cloud makes intelligent capabilities accessible without requiring advanced skills in artificial intelligence or data science. |
| Google, Amazon, Microsoft, and IBM | Google, Amazon, Microsoft, and IBM |

| ML | AI |
| --- | --- |
| The aim of ML is to improve accuracy, without regard for success. | The goal of AI is to increase the chances of success. |
| ML is the way for a computer program to learn from experience. | AI is a computer program doing smart work. |
| The goal of ML is to keep learning from data to maximize performance. | The future goal of AI is to simulate intelligence for solving highly complex problems. |
| ML allows the computer to learn new things from the available information. | AI involves decision-making. |
| ML looks for the only solution. | AI looks for optimal solutions. |
  

ML and AI:

Even though many differences exist between ML and AI, they are closely connected. AI and ML are often viewed as the body and the brain: the body collects information and the brain processes it. In the same way, AI accumulates information while ML processes it.

Conclusion:

AI involves a computer executing a task a human could do. Machine learning involves the computer learning from its experience and making decisions based on the information. While the two approaches are different, they are often used together to achieve many goals in different industries.

Correlation Measures The Relationship Between Two Variables

What is Correlation?

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It’s a common tool for describing simple relationships without making a statement about cause and effect.
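
A common way to put a number on this is the Pearson correlation coefficient, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). The text above does not name a specific coefficient, so choosing Pearson is an assumption here. The Python sketch below computes it with the standard library only; the `pearson_r` name and the sample data are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up sample: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
```

For this sample, `r` is close to 1, which indicates a strong positive linear relationship. It says nothing about whether studying causes the higher scores.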

Why is Correlation important?

Once correlation is known it can be used to make predictions. When we know a score on one measure we can make a more accurate prediction of another measure that is highly related to it. The stronger the relationship between/among variables the more accurate the prediction.
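
To make the link to prediction concrete, the Python sketch below fits a least-squares line to two related variables and uses it to predict one from the other. Simple linear regression is one possible technique, chosen here as an assumption since the text does not specify a method. The function names and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # undefined (division by zero) if x is constant
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * x


# Made-up sample: hours studied and test scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
slope, intercept = fit_line(hours, scores)
predicted_score = predict(6, slope, intercept)
```

The stronger the correlation between the two variables, the closer the observed points lie to this line, and the smaller the typical prediction error.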
