How to Read CSV Files in Python

A CSV file is a file that contains values separated by a delimiter such as a comma. Each delimiter represents a table column value and every new line is a new row or data record. As a result, data from CSV's can be processed logically into a Python data type such as a list or dictionary so it can be used in the program.

 

In this tutorial, we will learn how to read CSV files in Python using the csv package with examples.

 

Import the csv Package

csv is a native Python package that makes it easier to process CSV data into Python data types. Before it can be used it will need to be imported at the top of your program like this:

 

import csv

 

Open and Read a CSV File with csv.reader()

To read a CSV file we can use the csv.reader() method, which will return each new line of the CSV as a list. To do this we will first have to open the CSV in read mode using the open() function before passing the file object to csv.reader().

 

In the example below we are using this sample CSV.

 

import csv

with open('sample1.csv', 'r') as file:
    reader = csv.reader(file)
    
    for l in reader:
        print(l)
['Month', ' "Average"', ' "2005"', ' "2006"', ' "2007"', ' "2008"', ' "2009"', ' "2010"', ' "2011"', ' "2012"', ' "2013"', ' "2014"', ' "2015"']
['May', '  0.1', '  0', '  0', ' 1', ' 1', ' 0', ' 0', ' 0', ' 2', ' 0', '  0', '  0  ']
['Jun', '  0.5', '  2', '  1', ' 1', ' 0', ' 0', ' 1', ' 1', ' 2', ' 2', '  0', '  1']
['Jul', '  0.7', '  5', '  1', ' 1', ' 2', ' 0', ' 1', ' 3', ' 0', ' 2', '  2', '  1']
['Aug', '  2.3', '  6', '  3', ' 2', ' 4', ' 4', ' 4', ' 7', ' 8', ' 2', '  2', '  3']
['Sep', '  3.5', '  6', '  4', ' 7', ' 4', ' 2', ' 8', ' 5', ' 2', ' 5', '  2', '  5']
['Oct', '  2.0', '  8', '  0', ' 1', ' 3', ' 2', ' 5', ' 1', ' 5', ' 2', '  3', '  0']
['Nov', '  0.5', '  3', '  0', ' 0', ' 1', ' 1', ' 0', ' 1', ' 0', ' 1', '  0', '  1']
['Dec', '  0.0', '  1', '  0', ' 1', ' 0', ' 0', ' 0', ' 0', ' 0', ' 0', '  0', '  1']

 

Read a CSV with Different Delimiters

While commas are the most popular delimiter for CSV's you will probably encounter CSV's using different delimiters such as : (colons) or | (pipes) as it is possible to use any delimiter character. To parse a different delimiter, we can specify it as an argument of the csv.reader() method using delimiter='char'.

 

import csv

with open('sample1.csv', 'r') as file:
    reader = csv.reader(file, delimiter='|')
    
    for l in reader:
        print(l)
['Month', ' "Average"', ' "2005"', ' "2006"', ' "2007"', ' "2008"']
['May', '  0.1', '  0', '  0', ' 1', ' 1']
['Jun', '  0.5', '  2', '  1', ' 1', ' 0']
['Jul', '  0.7', '  5', '  1', ' 1', ' 2']
['Aug', '  2.3', '  6', '  3', ' 2', ' 4']
['Sep', '  3.5', '  6', '  4', ' 7', ' 4']
['Oct', '  2.0', '  8', '  0', ' 1', ' 3']
['Nov', '  0.5', '  3', '  0', ' 0', ' 1']
['Dec', '  0.0', '  1', '  0', ' 1', ' 0']

 

note - for tab delimiters use \t (backslash, t).

 

Remove Initial Spaces

The example CSV we have been using has spaces after each delimiter, which remain after the file has been parsed into Python. To remove these and get values with no leading space characters, pass skipinitialspace=True as an argument of the csv.reader() method.

 

import csv

with open('sample1.csv', 'r') as file:
    reader = csv.reader(file, skipinitialspace=True)
    
    for l in reader:
        print(l)
['Month', 'Average', '2005', '2006', '2007', '2008']
['May', '0.1', '0', '0', '1', '1']
['Jun', '0.5', '2', '1', '1', '0']
['Jul', '0.7', '5', '1', '1', '2']
['Aug', '2.3', '6', '3', '2', '4']
['Sep', '3.5', '6', '4', '7', '4']
['Oct', '2.0', '8', '0', '1', '3']
['Nov', '0.5', '3', '0', '0', '1']
['Dec', '0.0', '1', '0', '1', '0']

 

Now we have nice clean values with no leading spaces.

 

Reading CSV Files with Quotes

Some CSV's have values wrapped in quotation marks. To remove them can pass quoting=csv.QUOTE_ALL as an argument of the csv.reader() method.

 

The quoting argument has three predefined values that can be used:

  • csv.QUOTE_ALL - quotes are around all values in the CSV
  • csv.QUOTE_MINIMAL - quotes wrap values containing special characters
  • csv.QUOTE_NONNUMERIC - only non-numeric values are wrapped in quotes
  • csv.QUOTE_NONE - none of the values are wrapped in quotes

 

 

How to Create Dialects

Processing CSV's that are formatted differently will mean that a csv.reader() method with different arguments will have to be created to handle each one. That is not an ideal situation. Fortunately, the csv package has a method called register_dialect, which can be used to create custom argument objects that can be passed into csv.reader() using dialect='myDialect'.

 

Let's register a dialect to handle a | (pipe) delimiter, extra initial spaces and quotes.

 

import csv

csv.register_dialect('my_dialect',
                     skipinitialspace=True,
                     quoting=csv.QUOTE_ALL,
                     delimiter='|')

with open('sample1.csv', 'r') as file:
    reader = csv.reader(file, dialect='my_dialect')
    
    for r in reader:
        print(r)
['Month, "Average", "2005", "2006", "2007", "2008"']
['May,  0.1,  0,  0, 1, 1']
['Jun,  0.5,  2,  1, 1, 0']
['Jul,  0.7,  5,  1, 1, 2']
['Aug,  2.3,  6,  3, 2, 4']
['Sep,  3.5,  6,  4, 7, 4']
['Oct,  2.0,  8,  0, 1, 3']
['Nov,  0.5,  3,  0, 0, 1']
['Dec,  0.0,  1,  0, 1, 0']

 

Convert a CSV to a Python Dictionary

The csv package can also convert a CSV file to a Python dictionary with the csv.DictReader() method. Dictionaries are a key-value data type meaning the header (first row) of the CSV will be the keys and the rest of the values will be paired the correct key.

 

Here is a table representation of a CSV that we will open and parse as a dictionary

 

MonthAverage20052006
May0.100
Jun0.521
Jul0.751
Aug2.363

 

import csv

with open('sample1.csv', 'r') as file:
    reader = csv.DictReader(file, skipinitialspace=True)
    
    for l in reader:
        print(l)
{'Month': 'May', 'Average': '0.1', '2005': '0', '2006': '0', '': ''}
{'Month': 'Jun', 'Average': '0.5', '2005': '2', '2006': '1', '': ''}
{'Month': 'Jul', 'Average': '0.7', '2005': '5', '2006': '1', '': ''}
{'Month': 'Aug', 'Average': '2.3', '2005': '6', '2006': '3', '': ''}

 

As we can see in the output above, a collection of dictionaries was created with each one representing a row. In each dictionary, the key matches the column header.

 

Deduce CSV Dialect Automatically using the Sniffer Class

If you are working with CSV's that have different formats, you can use the csv.Sniffer() class to automatically deduce its format. It will look at the file and detect what it thinks are the delimiters and whether or not the CSV has a header row.

 

import csv

with open('sample1.csv', 'r') as file:
    sample = file.read(100)
    has_header = csv.Sniffer().has_header(sample)
    print(has_header)

    deduced = csv.Sniffer().sniff(sample)

with open('sample1.csv', 'r') as file:
    reader = csv.reader(file, deduced)

    for r in reader:
        print(r)

 

In the example above we are reading the first 100 characters of the CSV before supplying it to the Sniffer() class. The next step is checking if there is a header row using the Sniffer().has_header() method. Finally, the delimiters are determined using the Sniffer().sniff() method before opening and reading the file, supply the result from the sniffer as an argument of the csv.reader() method.

 

Detecting the format of a CSV is never going to be 100% accurate but the Sniffer() class is pretty accurate.

 

If Sniffer() can't determine what the delimiter should be an error will be thrown. With that in mind, it may be worth putting it in a try except block and either inform the user that the format of the CSV needs to be fixed, or attempt a manual conversion.

 

Conclusion

You now know how to read CSV files in Python and convert them into lists or dictionaries.

csv read file