- **Clusters** are collections of similar data
- **Clustering** is a type of unsupervised learning
- The **Correlation Coefficient** describes the strength of a relationship

## Clusters

**Clusters** are collections of data based on similarity.

Data points that lie close together in a graph can often be grouped into clusters.

In the graph below we can distinguish 3 different clusters:

## Identifying Clusters

Clusters can hold a lot of valuable information, but clusters come in all sorts of shapes, so how can we recognize them?

The two main methods are:

- Using Visualization
- Using a Clustering Algorithm

## Clustering

**Clustering** is a type of **Unsupervised Learning**.

Clustering is trying to:

- Collect similar data in groups
- Collect dissimilar data in other groups

## Clustering Methods

- Density Method
- Hierarchical Method
- Partitioning Method
- Grid-based Method

The **Density Method** treats points in a dense region as similar to each other and different from points in lower-density regions. The density method has good accuracy. It can also merge clusters.

Two common algorithms are DBSCAN and OPTICS.
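To make the density idea concrete, here is a minimal sketch of DBSCAN in plain Python. It is a simplified teaching version, not a reference implementation: the function name `dbscan`, the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense point), and the sample points are all assumptions made for this example.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # not dense enough: treat as noise for now
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point on the cluster border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is dense too: keep growing the cluster
    return labels


# Hypothetical sample data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
# → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Dense regions that touch each other end up in the same cluster, because the cluster keeps growing through every dense point it reaches. The isolated point is labeled `-1` (noise).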

The **Hierarchical Method** forms the clusters in a tree-type structure. New clusters are formed using previously formed clusters.

Two common algorithms are CURE and BIRCH.
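CURE and BIRCH are too involved for a short example, so the sketch below uses a simpler hierarchical technique instead: bottom-up (agglomerative) clustering with single linkage. It only illustrates how new clusters are formed from previously formed clusters. The function name `agglomerative` and the sample points are assumptions made for this example.

```python
from math import dist


def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    clusters = [[p] for p in points]  # start with one cluster per point

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # new cluster from two earlier ones
        del clusters[j]
    return clusters


# Hypothetical sample data.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for cluster in agglomerative(points, n_clusters=3):
    print(cluster)
```

Recording each merge in order would give the tree-type structure, with single points as leaves and the full data set as the root.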

The **Grid-based Method** formulates the data into a finite number of cells that form a grid-like structure.

Two common algorithms are CLIQUE and STING.
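The grid idea can be sketched in a few lines of Python. This is not CLIQUE or STING, only a simplified illustration of the first step: placing 2D points into a finite number of cells and picking out the cells that hold many points. The function names `grid_cells` and `dense_cells`, the parameters `cell_size` and `min_points`, and the sample points are assumptions made for this example.

```python
from collections import defaultdict
from math import floor


def grid_cells(points, cell_size):
    """Map each 2D point to the square grid cell that contains it."""
    cells = defaultdict(list)
    for x, y in points:
        cell = (floor(x / cell_size), floor(y / cell_size))
        cells[cell].append((x, y))
    return dict(cells)


def dense_cells(cells, min_points):
    """Return the cells that hold at least min_points points."""
    return {cell for cell, members in cells.items() if len(members) >= min_points}


# Hypothetical sample data.
points = [(0.1, 0.2), (0.5, 0.9), (0.8, 0.3),
          (3.2, 3.7), (3.9, 3.1),
          (7.5, 0.5)]
cells = grid_cells(points, cell_size=1.0)
print(dense_cells(cells, min_points=2))
```

After this step, the work is done on cells instead of individual points, so the cost depends on the number of cells and not on the number of data points.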

The **Partitioning Method** partitions the objects into k clusters and each partition forms one cluster.

One common algorithm is CLARANS.
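As an illustration of partitioning into k clusters, the sketch below uses k-means, a different and simpler partitioning algorithm than CLARANS. The function name `kmeans`, the naive choice of the first k points as starting centers, and the sample points are assumptions made for this example.

```python
from math import dist


def kmeans(points, k, max_iter=100):
    """Partition points into k groups around k centroids."""
    centroids = list(points[:k])  # naive start: the first k points
    groups = []
    for _ in range(max_iter):
        # Step 1: put every point in the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Step 2: move each centroid to the mean of its group.
        new_centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break  # nothing moved: the partition is stable
        centroids = new_centroids
    return centroids, groups


# Hypothetical sample data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = kmeans(points, k=2)
print(groups[0])
print(groups[1])
```

Every point belongs to exactly one group, so each of the k partitions forms one cluster. The result can depend on the starting centers, which is why real implementations usually pick them more carefully.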

## Correlation Coefficient

The **Correlation Coefficient** (r) describes the strength and direction of the linear relationship between the x and y variables on a scatterplot.

The value of r is always between -1 and +1:

| Value of r | Description | Relationship |
|---|---|---|
| -1.00 | Perfect downhill | Negative linear relationship. |
| -0.70 | Strong downhill | Negative linear relationship. |
| -0.50 | Moderate downhill | Negative linear relationship. |
| -0.30 | Weak downhill | Negative linear relationship. |
| 0 | No linear relationship. | |
| +0.30 | Weak uphill | Positive linear relationship. |
| +0.50 | Moderate uphill | Positive linear relationship. |
| +0.70 | Strong uphill | Positive linear relationship. |
| +1.00 | Perfect uphill | Positive linear relationship. |
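The value of r can be computed directly from the data. The sketch below assumes that r is the Pearson correlation coefficient, the usual measure of a linear relationship. The function name `pearson_r` and the sample values are assumptions made for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much x and y each vary on their own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical sample data.
x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))  # y rises with x on a straight line
print(pearson_r(x, [8, 6, 4, 2]))  # y falls with x on a straight line
```

The first call gives a perfect uphill relationship (r = +1) and the second a perfect downhill relationship (r = -1). The function divides by zero if all x values or all y values are the same, because then there is no variation to measure.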