# AI – Collecting Data

Up to 80% of the work in an Artificial Intelligence project is about collecting data:

• What data is required?
• What data is available?
• How to select the data?
• How to collect the data?
• How to clean the data?
• How to prepare the data?
• How to use the data?

## What is Data?

Data can be many things. For Artificial Intelligence, data must be a collection of facts.

## Intelligence Needs Data

Human intelligence needs data:

A real estate broker needs data about sold houses to estimate prices.

Artificial intelligence needs data:

A computer program also needs data to estimate prices.

## Storing Data

The most common data to collect are numbers and measurements.

Data are often stored in arrays that represent the relationships between values.

This table contains house prices versus size:
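A table like this can be held in code as two parallel arrays: one for sizes and one for prices, where the same index describes the same sold house. The sketch below is a minimal illustration that assumes Python lists; the numbers are made up, not taken from any real table, and the estimate (total price divided by total size) is a deliberately simple method chosen only to show how a program could use such data to estimate prices.

```python
# Hypothetical data, invented for illustration only.
# Index i in both lists describes the same sold house.
sizes = [50, 60, 70, 100, 120]       # house size in square meters
prices = [150, 170, 200, 290, 340]   # sale price in thousands (currency assumed)

# Pair each size with its price, like one row of the table.
houses = list(zip(sizes, prices))

# Simple estimate: total price of all sold houses divided by their total size.
price_per_m2 = sum(prices) / sum(sizes)

def estimate_price(size):
    """Estimate the price of a house of the given size (square meters)."""
    return size * price_per_m2

print(estimate_price(80))  # 230.0
```

Real projects would use more data and a better model; the point here is only that the estimate comes entirely from the collected values.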

## Quantitative vs. Qualitative

Quantitative data are numerical:

• 55 cars
• 15 meters
• 35 children

Qualitative data are descriptive:

• It is cold
• It is long
• It was fun
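One way to keep the two kinds apart in a program is to store quantitative data as a number paired with its unit, and qualitative data as plain text. The sketch below uses the example values from the lists above; the (number, unit) pair representation and the `is_quantitative` helper are assumptions made for illustration, not a standard format.

```python
# Example values from the lists above.
# Quantitative: (number, unit) pairs. Qualitative: plain descriptions.
observations = [
    (55, "cars"),
    (15, "meters"),
    (35, "children"),
    "It is cold",
    "It is long",
    "It was fun",
]

def is_quantitative(obs):
    """Hypothetical helper: an observation is quantitative if it is a (number, unit) pair."""
    return (
        isinstance(obs, tuple)
        and len(obs) == 2
        and isinstance(obs[0], (int, float))
    )

quantitative = [obs for obs in observations if is_quantitative(obs)]
qualitative = [obs for obs in observations if not is_quantitative(obs)]

print(quantitative)  # [(55, 'cars'), (15, 'meters'), (35, 'children')]
print(qualitative)   # ['It is cold', 'It is long', 'It was fun']
```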

## Census or Sampling

Census is when we collect data for every member of a group.

Sample is when we collect data for some members of a group.

If we wanted to know how many Americans smoke cigarettes, we could ask every person in the US (a census), or we could ask 10 000 people (a sample).

A census is accurate, but hard to do. A sample is less accurate, but easier to do.
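The trade-off can be simulated. The sketch below assumes a made-up population of 1 000 000 people in which about 12% smoke (an invented rate, not a real statistic). It computes the exact rate with a census, then estimates it from a random sample of 10 000 people using Python's `random.Random.sample`.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: True means the person smokes.
# The 12% rate is invented for illustration only.
POPULATION_SIZE = 1_000_000
population = [rng.random() < 0.12 for _ in range(POPULATION_SIZE)]

# Census: look at every member of the population (exact, but expensive).
census_rate = sum(population) / len(population)

# Sample: look at 10 000 randomly chosen members (cheap, but only an estimate).
sample = rng.sample(population, 10_000)
sample_rate = sum(sample) / len(sample)

print(f"census: {census_rate:.4f}")
print(f"sample: {sample_rate:.4f}")
```

The sample touches only 1% of the population, yet its rate usually lands within about one percentage point of the census value.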

## Sampling Terms

Population is the group of individuals (objects) we want to collect information about.

Census is information about every individual in a population.

Sample is information about a part of the population, chosen to represent the whole population.

## Random Samples

In order for a sample to represent a population, it must be collected randomly.

A random sample is a sample where every member of the population has an equal chance of appearing in the sample.
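The "equal chance" property can be checked empirically. The sketch below assumes a tiny hypothetical population of ten members, draws many random samples of three with `random.Random.sample` (which picks without replacement), and counts how often each member appears. Each member should show up in roughly 3 / 10 = 30% of the samples.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

population = list(range(10))  # ten hypothetical members, labelled 0..9
SAMPLE_SIZE = 3
TRIALS = 100_000

counts = Counter()
for _ in range(TRIALS):
    # Draw one random sample and record which members were picked.
    counts.update(rng.sample(population, SAMPLE_SIZE))

# Fraction of samples each member appeared in (expected: about 0.3).
for member in population:
    print(member, round(counts[member] / TRIALS, 3))
```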

## Sampling Bias

Sampling bias (a systematic error) occurs when a sample is collected in such a way that some individuals are more (or less) likely to be included in the sample than others.
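The effect of sampling bias can be shown with a small simulation. The sketch below assumes an invented population of 100 000 people where those reachable online smoke less often (8%) than those who are not (20%). A random sample from everyone lands close to the true rate, while a survey that can only reach people online underestimates it. All rates and names here are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: each person is (reachable_online, smokes).
# The rates below are invented for illustration only.
population = []
for _ in range(100_000):
    online = rng.random() < 0.5
    smokes = rng.random() < (0.08 if online else 0.20)
    population.append((online, smokes))

def smoking_rate(people):
    """Fraction of people in the list who smoke."""
    return sum(smokes for _, smokes in people) / len(people)

true_rate = smoking_rate(population)

# Unbiased: every person has the same chance of being picked.
random_sample = rng.sample(population, 2_000)

# Biased: only people reachable online can ever be picked.
online_only = [person for person in population if person[0]]
biased_sample = rng.sample(online_only, 2_000)

print(f"true rate:     {true_rate:.3f}")
print(f"random sample: {smoking_rate(random_sample):.3f}")
print(f"biased sample: {smoking_rate(biased_sample):.3f}")
```

Collecting more data does not fix this: a larger online-only sample would still miss the people who are never reachable online.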