Introduction to Pandas Library
This section introduces Pandas, a Python library for data manipulation and analysis built around two labeled data structures. It covers those data structures, data handling and cleaning, and basic data analysis.
Pandas Data Structures
Pandas provides two primary data structures: Series and DataFrame.
Series
A Series is a one-dimensional labeled array that can hold any data type. It is similar to a column in a spreadsheet or a single column of a SQL table.
    import pandas as pd

    # Create a Series
    s = pd.Series([3, 1, 5, 2, 4])
    print(s)
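Because a Series is labeled, each value can be reached by its label as well as by its position. The following is a minimal sketch of that idea; the labels `a` through `e` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same values as above, with explicit (hypothetical) labels
s = pd.Series([3, 1, 5, 2, 4], index=["a", "b", "c", "d", "e"])

by_label = s["c"]        # look up a value by its label
by_position = s.iloc[0]  # look up a value by its integer position
subset = s[s > 2]        # a boolean mask keeps the original labels

print(by_label, by_position)
print(subset)
```

When no index is passed, as in the first example, Pandas assigns the default integer labels 0, 1, 2, and so on.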
DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different data types. It is similar to a spreadsheet or a SQL table.
    import pandas as pd

    # Create a DataFrame
    data = {
        'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Sydney'],
    }
    df = pd.DataFrame(data)
    print(df)
Data Handling and Cleaning
Pandas provides numerous functions and methods for handling and cleaning data, such as selecting and filtering data, handling missing values, and transforming data.
Selecting and Filtering Data
You can select rows or columns from a DataFrame in two main ways. Boolean indexing keeps the rows for which a condition is true, and label-based indexing with `.loc` selects rows and columns by their labels.
    import pandas as pd

    data = {
        'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Sydney'],
    }
    df = pd.DataFrame(data)

    # Select rows with Age greater than 30
    filtered_df = df[df['Age'] > 30]
    print(filtered_df)
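The example above uses boolean indexing only. Label-based indexing might look like the sketch below, which reuses the same data; using the `Name` column as the row index is an assumption made for illustration, not something the original example does.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Sydney'],
}
df = pd.DataFrame(data)

# Use the Name column as row labels (an illustrative choice)
indexed = df.set_index('Name')

# Select a single row by its label; the result is a Series
bob = indexed.loc['Bob']

# Select several rows and one column by label
ages = indexed.loc[['Alice', 'Charlie'], 'Age']

# .loc also accepts a boolean condition together with a list of columns
cities_over_30 = df.loc[df['Age'] > 30, ['Name', 'City']]

print(bob)
print(ages)
print(cities_over_30)
```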
Handling Missing Values
Pandas provides functions to handle missing values, such as `isnull()`, `fillna()`, and `dropna()`. These functions allow you to identify, replace, or remove missing values in your data.
    import pandas as pd
    import numpy as np

    data = {
        'Name': ['Alice', 'Bob', np.nan, 'David'],
        'Age': [25, 30, np.nan, 40],
        'City': ['New York', 'London', 'Paris', np.nan],
    }
    df = pd.DataFrame(data)

    # Check for missing values
    print(df.isnull())

    # Fill missing values with a specific value
    df_filled = df.fillna('Unknown')
    print(df_filled)

    # Drop rows with missing values
    df_dropped = df.dropna()
    print(df_dropped)
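Filling every column with the same string also places text in the numeric `Age` column. One possible alternative, sketched here under the assumption that the column mean is an acceptable stand-in for a missing age, is to pass `fillna()` a dictionary with one fill value per column and to restrict `dropna()` to selected columns with `subset`.

```python
import pandas as pd
import numpy as np

data = {
    'Name': ['Alice', 'Bob', np.nan, 'David'],
    'Age': [25, 30, np.nan, 40],
    'City': ['New York', 'London', 'Paris', np.nan],
}
df = pd.DataFrame(data)

# Count the missing values in each column
missing_counts = df.isnull().sum()
print(missing_counts)

# Fill each column with a value that suits its type
# (using the mean age is an illustrative assumption)
df_filled = df.fillna({
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'City': 'Unknown',
})
print(df_filled)

# Drop only the rows where Age is missing
df_age_known = df.dropna(subset=['Age'])
print(df_age_known)
```

Whether to fill or drop depends on the analysis; the mean is only one of several reasonable fill values.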
Basic Data Analysis with Pandas
Pandas provides a wide range of functions for basic data analysis, such as descriptive statistics, grouping and aggregating data, and merging datasets.
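The three operations named above could be sketched as follows. The data is illustrative: David's city is set to London so that one group has more than one member, and the `countries` lookup table is hypothetical.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'London'],
})

# Hypothetical lookup table used to demonstrate merging
countries = pd.DataFrame({
    'City': ['New York', 'London', 'Paris'],
    'Country': ['USA', 'UK', 'France'],
})

# Descriptive statistics: count, mean, std, min, quartiles, max
age_summary = people['Age'].describe()
print(age_summary)

# Grouping and aggregating: mean age per city
mean_age_by_city = people.groupby('City')['Age'].mean()
print(mean_age_by_city)

# Merging: attach the country to each person by matching on City
merged = people.merge(countries, on='City', how='left')
print(merged)
```

A left merge keeps every row of `people`, including any whose city is absent from the lookup table; those rows would get a missing `Country` value.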