Importing a CSV File in Python

Python is a very useful and efficient language for data manipulation and analysis. But to analyse a dataset, it first has to be imported into our programming environment, and pandas is one of the Python libraries that can help an analyst do this.

In this short write-up we will look into the steps needed to import a CSV file into a DataFrame using pandas.

Dependencies:

You can install pandas on Windows in either of the following ways:

  • Using pip
  • Using Anaconda

To install pandas with pip, open the Command Prompt (type ‘cmd’ in your system’s search bar) and run the following command. (Note: We are assuming that Python is already installed on your system.)

 pip install pandas

(pip is a package installer for Python.)

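If everything goes fine, pip finishes by printing a line beginning with “Successfully installed” that lists pandas and the dependencies it pulled in.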

To install pandas with Anaconda, you first need to install Anaconda itself: go to the Products section of the Anaconda website and choose the edition you want to download. Once it is installed, you have direct access to pandas, as Anaconda automatically installs many important libraries for you.

After the installation is complete, open the Python Integrated Development Environment (IDE) you use, for example IDLE, Jupyter Notebook, PyCharm or Spyder, and start writing your code.
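A quick way to confirm that pandas is available in the environment your IDE uses is to import it and print its version. This is a minimal check; the version string you see will depend on your installation.

```python
# Confirm that pandas can be imported in the current Python environment
import pandas as pd

# __version__ holds the installed pandas version as a string
print(pd.__version__)
```

If this raises ModuleNotFoundError, pandas was most likely installed into a different Python environment from the one your IDE is running.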

Code:

>>> import pandas as pd                # Line 1
>>> url = 'filepath\\filename.csv'     # placeholder: location of your CSV file
>>> data = pd.read_csv(url)            # Line 2

In the first line, we import the pandas library and give it the alias ‘pd’, so that instead of typing ‘pandas’ every time we call it, we can use this shorter alias.

In the second line, we load our data from the CSV file by calling the read_csv() function and passing it url, the location of our CSV file. The function returns a DataFrame, which we store in the variable data.
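Since ‘filepath\\filename.csv’ is only a placeholder, here is a self-contained sketch you can run as-is. It swaps the file on disk for a small in-memory CSV wrapped in io.StringIO (read_csv accepts file-like objects as well as paths and URLs); the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """name,age,city
Asha,31,Pune
Ravi,27,Delhi
"""

# read_csv accepts a file path, a URL, or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)          # (2, 3)
print(list(data.columns))  # ['name', 'age', 'city']
print(data.head())         # first rows of the DataFrame
```

With a real file, you would pass the path instead of the StringIO object, exactly as in the three-line listing above.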

The read_csv() function has several frequently used parameters. Let us look at a few of them:

  • filepath_or_buffer: We have already seen this parameter; it is the location (file path or URL) of our CSV file, passed above as url.
  • sep: The delimiter to use. The default separator is ‘,’ but you can specify your own, for example ‘;’.
  • header: If your CSV file has a header row, the default (header='infer') suffices: the first row is interpreted as column names, just as with header=0. If your CSV file doesn’t have a header row, you have to pass header=None.
  • names: The list of column names to use in your DataFrame, typically when the file has no header row (together with header=None).
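To see how sep, header and names work together, here is a small sketch using hypothetical semicolon-separated data with no header row (again held in io.StringIO rather than a real file). It first shows what goes wrong with the default header setting, then the fix.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data with NO header row
raw = "Asha;31;Pune\nRavi;27;Delhi\n"

# Default header: the first data row is wrongly used as column names
wrong = pd.read_csv(io.StringIO(raw), sep=";")
print(list(wrong.columns))  # ['Asha', '31', 'Pune']
print(len(wrong))           # 1 (only one row left as data)

# header=None + names: every row is kept as data, columns get our names
data = pd.read_csv(
    io.StringIO(raw),
    sep=";",                        # delimiter used in this data
    header=None,                    # the first row is data, not column names
    names=["name", "age", "city"],  # column names to assign
)
print(data)
```

The first call silently loses a row and produces meaningless column names, which is why header=None matters for files without headers.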

Finally, we have successfully imported our data into Python and can start our data analysis. Data analysis is a big topic in itself, which we will cover in upcoming blogs.

We sincerely hope that this write-up adds some value to your learning journey with Python.

To learn more about the read_csv() function and its parameters, click on the link below:

Read_csv documentation

Contributors:

Brijesh Pandey

He works as Lead-Strategic Partnerships at Klaymatrix Data Labs. He is a Data Science enthusiast, FRM (Charter Awaited), ex-banker and MBA specializing in Finance.