Statistics for Machine Learning

Types of Data

In statistics, data is classified by what its values represent: labels for categories, or measurable quantities. Knowing the type of data matters because it determines which statistical methods and techniques are appropriate for analysis.

Categorical Data

Categorical data consists of qualitative variables that represent categories or groups. Their values are labels rather than quantities, so arithmetic on them is not meaningful, even when the labels are stored as numeric codes. Examples of categorical data include:

  • Gender (Male/Female)
  • Marital Status (Married/Single/Divorced)
  • Product Categories (Electronics/Clothing/Furniture)
  • Colors (Red/Blue/Green)
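
Because categorical values are labels, they are usually summarized by counting and turned into indicator columns before being passed to a model. The following is a minimal sketch using only the Python standard library; the sample values and the variable names (marital_status, one_hot) are invented for illustration, and one-hot encoding is only one of several possible encodings.

```python
from collections import Counter

# Hypothetical sample of a categorical variable (values made up for illustration).
marital_status = ["Married", "Single", "Married", "Divorced", "Single", "Married"]

# Categories are labels, so we count them rather than average them.
counts = Counter(marital_status)
most_common_label, most_common_count = counts.most_common(1)[0]

# One-hot encoding: one 0/1 indicator per category, in a fixed (sorted) order.
categories = sorted(counts)
one_hot = [
    [1 if value == category else 0 for category in categories]
    for value in marital_status
]

print(counts)
print(categories)
print(one_hot[0])
```

Each row of one_hot contains exactly one 1, marking the category that observation belongs to.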

Numerical Data

Numerical data consists of quantitative variables that represent measurable quantities with numerical values. These variables can be further classified into two types: continuous and discrete.

Continuous vs. Discrete Data

Continuous data can take any value within a range, so between any two possible values there are infinitely many others. It is measured rather than counted, with precision limited only by the measuring instrument. Examples include age, weight, height, and temperature.

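Discrete data, on the other hand, can only take specific, separate values, typically whole numbers. The set of possible values is countable but not necessarily finite; a count, for example, has no fixed upper limit. Discrete data is usually counted rather than measured. Examples include the number of children in a family and the number of cars in a parking lot.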
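
The distinction affects how each kind of variable is summarized. As a rough sketch, assuming made-up samples and an arbitrary bin width of 10 cm, a discrete variable can be tabulated value by value, while a continuous variable is usually grouped into intervals first because exact values rarely repeat.

```python
from collections import Counter

# Hypothetical samples (values invented for illustration).
children_per_family = [0, 2, 1, 3, 2, 2, 0, 1]            # discrete: counted
heights_cm = [162.4, 175.1, 168.9, 181.3, 159.7, 170.2]   # continuous: measured

# Discrete data: a frequency table over the exact values is meaningful.
child_counts = Counter(children_per_family)

# Continuous data: group values into intervals before counting.
bin_width = 10  # assumed bin width in cm
height_bins = Counter(int(h // bin_width) * bin_width for h in heights_cm)

print(sorted(child_counts.items()))
print(sorted(height_bins.items()))
```

Each key in height_bins is the lower edge of an interval, so 160 stands for heights from 160 cm up to, but not including, 170 cm.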