Pandas Library
Introduction to Pandas
Data manipulation with Pandas
Indexing and Selection
DataFrame is a 2-dimensional labelled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input:
- Dict of 1D ndarrays, lists, dicts, or Series. For Series or dicts, the resulting index will be the union of the indexes of the various Series. For ndarrays or lists, all must be the same length; if an index is passed, it must also be the same length as the arrays.
- 2-D numpy.ndarray
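As a minimal sketch of the input kinds listed above, the following builds a DataFrame from a dict of Series, from a dict of lists, and from a 2-D array. All column names, labels, and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Dict of Series with different indexes: the result uses the union of the labels.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}
df_from_series = pd.DataFrame(d)
# Label "d" exists only in "two", so column "one" holds NaN in that row.

# Dict of equal-length lists, with an explicit index of the same length.
df_from_lists = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=["r1", "r2", "r3"],
)

# 2-D ndarray: the shape (3, 2) gives 3 rows and 2 columns.
arr = np.arange(6).reshape(3, 2)
df_from_array = pd.DataFrame(arr, columns=["p", "q"])

print(df_from_series)
print(df_from_lists)
print(df_from_array)
```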
Column selection, addition, and deletion in a DataFrame
You can treat a DataFrame semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations.
Adding a new Column
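A short sketch of dict-style column access and assignment. The frame and the column names ("one", "two", "three", "flag") are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

# Getting a column works like a dict lookup and returns a Series.
col = df["one"]

# Setting a new key adds a column; here it is computed from existing columns.
df["three"] = df["one"] * df["two"]

# A scalar value is broadcast to every row.
df["flag"] = True

print(df)
```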
Deleting a Column
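Columns can be removed with the same syntax as dict entries. The sketch below uses `del` and `pop` on a made-up frame.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# del removes the column in place.
del df["two"]

# pop removes the column and returns it as a Series.
three = df.pop("three")

print(df)
print(three)
```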
The insert function is used to insert a new column at a specific column location:
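For instance, assuming a small frame with columns "one" and "two", `DataFrame.insert` can place a hypothetical column "bar" between them:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]})

# insert(loc, column, value): loc is the zero-based position of the new column.
df.insert(1, "bar", df["one"] * 10)

print(df)
```

Note that `insert` modifies the DataFrame in place and raises an error if the column name already exists (unless `allow_duplicates=True` is passed).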
Indexing / selection
Operation | Syntax | Result |
Select column | df[col] | Series |
Select row by label | df.loc[label] | Series |
Select row by integer location | df.iloc[loc] | Series |
Slice rows | df[5:10] | DataFrame |
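The four operations in the table can be tried on a small frame. The data and labels below are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": range(12), "B": range(12, 24)},
    index=list("abcdefghijkl"),
)

col = df["A"]               # select column -> Series
row_by_label = df.loc["c"]  # select row by label -> Series
row_by_pos = df.iloc[2]     # select row by integer location -> Series
rows = df[5:10]             # slice rows -> DataFrame

print(type(col).__name__, type(row_by_label).__name__,
      type(row_by_pos).__name__, type(rows).__name__)
```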
Data alignment and arithmetic
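Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels). The result has the union of the column and row labels, with missing values where a label is absent from either operand.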
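A minimal sketch of alignment, using two made-up frames that only partly overlap in rows and columns:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"B": [10, 20], "C": [30, 40]}, index=["y", "z"])

# Values are matched by row label and column name, not by position.
total = df1 + df2

# Only column "B" at rows "y" and "z" exists in both frames;
# every other cell in the result is NaN.
print(total)
```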
DataFrame.loc:
Access a group of rows and columns by label(s).
.loc[] is primarily label based, but may also be used with a boolean array.
Single label. Note this returns the row as a Series.
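A small example of single-label selection. The frame, its labels, and its column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
    index=["cobra", "viper", "sidewinder"],
)

# A single row label returns that row as a Series indexed by column name.
row = df.loc["viper"]

print(row)
```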
List of labels.
Note using [[]] returns a DataFrame.
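Selecting with a list of labels might look like the sketch below, again on a made-up frame. Even a one-element list keeps the result two-dimensional.

```python
import pandas as pd

df = pd.DataFrame(
    {"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
    index=["cobra", "viper", "sidewinder"],
)

# A list of labels returns a DataFrame with those rows.
subset = df.loc[["viper", "sidewinder"]]

# A one-element list still returns a DataFrame, not a Series.
single = df.loc[["viper"]]

print(subset)
print(single)
```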
Slice with labels for row and single label for column:
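A sketch of a label slice combined with a single column label, on a hypothetical frame. Unlike ordinary Python slices, both endpoints of a label slice are included.

```python
import pandas as pd

df = pd.DataFrame(
    {"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
    index=["cobra", "viper", "sidewinder"],
)

# Rows "cobra" through "viper" (inclusive), column "max_speed" only.
speeds = df.loc["cobra":"viper", "max_speed"]

print(speeds)
```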
Conditional that returns a boolean Series:
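Boolean selection with `.loc` could look like this. The frame and the threshold of 6 are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"max_speed": [1, 4, 7], "shield": [2, 5, 8]},
    index=["cobra", "viper", "sidewinder"],
)

# The comparison produces a boolean Series aligned with the row index.
mask = df["shield"] > 6

# Rows where the mask is True are kept.
strong = df.loc[mask]

# The same mask can be combined with a column selection.
strong_speed = df.loc[mask, ["max_speed"]]

print(strong)
print(strong_speed)
```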
DataFrame.iloc:
.iloc[] selects by integer position. As with loc, double square brackets [[ ]] return a DataFrame, while single square brackets [ ] return a Series.
For example, selecting one row with a single integer and with a one-element list returns the same values, but the first result is a Series and the second is a DataFrame.
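The difference between the two forms can be seen in a small sketch. The frame below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 100, 1000], "b": [2, 200, 2000], "c": [3, 300, 3000]}
)

# Single brackets: the first row comes back as a Series.
as_series = df.iloc[0]

# Double brackets: the same row comes back as a one-row DataFrame.
as_frame = df.iloc[[0]]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```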
The following syntax returns a DataFrame:
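A few `.iloc` forms that return a DataFrame, sketched on a hypothetical three-row frame. The list is illustrative rather than exhaustive.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 100, 1000], "b": [2, 200, 2000], "c": [3, 300, 3000]}
)

by_list = df.iloc[[0, 1]]               # list of row positions
by_slice = df.iloc[:2]                  # slice of row positions
by_mask = df.iloc[[True, False, True]]  # boolean list, one entry per row
by_both = df.iloc[[0, 2], [1, 2]]       # lists for rows and columns
by_slices = df.iloc[1:3, 0:2]           # slices for rows and columns

for result in (by_list, by_slice, by_mask, by_both, by_slices):
    print(type(result).__name__, result.shape)
```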
The following syntax returns a Series:
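A few `.iloc` forms that return a Series, again on a made-up frame. A single integer on either axis drops that axis from the result.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 100, 1000], "b": [2, 200, 2000], "c": [3, 300, 3000]}
)

one_row = df.iloc[0]               # single row position
one_col = df.iloc[:, 0]            # all rows, single column position
part_of_col = df.iloc[0:2, 1]      # row slice, single column position

for result in (one_row, one_col, part_of_col):
    print(type(result).__name__, result.tolist())
```

Passing a single integer for both the row and the column, as in `df.iloc[0, 1]`, returns a scalar value rather than a Series.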
Handling missing data
Data aggregation and grouping
Data merging and joining