Reading Values from Excel Sheets in Python and Writing to DataFrames: A Step-by-Step Guide


As a technical blogger, I’ve encountered numerous questions on Stack Overflow about moving data between Excel sheets and pandas DataFrames. In this article, we’ll create a sample Excel file, read its values with Python, build DataFrames from values in code, and merge two DataFrames on a shared key.

Prerequisites


To follow along with this tutorial, you’ll need:

  • Python 3.x installed on your system
  • The pandas library for data manipulation
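  • The openpyxl library for reading and writing Excel (.xlsx) files (pandas also uses it as its engine for .xlsx files)
  • The numpy library for numerical computations (pip installs it automatically as a pandas dependency)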

You can install the required libraries using pip:

pip install pandas openpyxl numpy

Reading Values from an Excel Sheet


We’ll start by creating a sample Excel file with openpyxl, a popular Python library for reading and writing Excel files, and then read its values back.

Creating an Excel File

First, let’s create an Excel file with the desired structure:

import openpyxl

# Create a new workbook
wb = openpyxl.Workbook()

# Select the first sheet
sheet = wb.active

# Write the header row
sheet['A1'] = 'Date'
sheet['B1'] = 'Person A'
sheet['C1'] = 'Person B'
sheet['D1'] = 'Person C'
sheet['E1'] = 'Person D'

# Write nine data rows (spreadsheet rows 2 to 10)
for i in range(2, 11):
    date = f'01/{i-1:02d}/2017'  # Example dates: 01/01/2017 to 01/09/2017
    person_a = f'A{i}'
    person_b = f'B{i}'
    person_c = f'C{i}'
    person_d = f'D{i}'

    # append() writes the list to the next empty row, one value per column
    sheet.append([date, person_a, person_b, person_c, person_d])

# Save the workbook so it can be read back later
wb.save('example.xlsx')

This code creates a workbook with five columns (Date, Person A, Person B, Person C, and Person D), writes a header row and nine data rows, and saves it as example.xlsx. The dates are stored as plain text strings.
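openpyxl can also read values back directly, without pandas. The sketch below is self-contained: it writes a small throwaway workbook to a temporary folder (the file name openpyxl_demo.xlsx is an arbitrary, hypothetical choice), reopens it with openpyxl.load_workbook, reads every row with iter_rows(values_only=True), and reads one cell by its coordinate.

```python
import os
import tempfile

import openpyxl

# Write a small throwaway workbook to a temporary folder so the sketch runs on its own
# ("openpyxl_demo.xlsx" is an arbitrary, hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), 'openpyxl_demo.xlsx')
wb = openpyxl.Workbook()
ws = wb.active
ws.append(['Date', 'Person A', 'Person B'])
ws.append(['01/01/2017', 'A2', 'B2'])
ws.append(['01/02/2017', 'A3', 'B3'])
wb.save(path)

# Reopen the saved file and select its active sheet
wb = openpyxl.load_workbook(path)
ws = wb.active

# values_only=True yields tuples of cell values instead of Cell objects
rows = list(ws.iter_rows(values_only=True))
header, data = rows[0], rows[1:]
print(header)   # ('Date', 'Person A', 'Person B')
print(data[0])  # ('01/01/2017', 'A2', 'B2')

# A single cell can be read by its coordinate
print(ws['B2'].value)  # A2
```

This cell-by-cell access is handy when you need to inspect a workbook’s layout before deciding how to load it into pandas.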

Reading the Sheet with pandas

To read values from an Excel sheet, you can use the read_excel function from pandas:

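import pandas as pd

# Read the first sheet; header=0 (the default) takes the column names from the first row
df = pd.read_excel('example.xlsx', header=0)

# Print the DataFrame
print(df)

The header parameter is the zero-based index of the row that holds the column names. With header=0, the first row (Date, Person A, and so on) becomes the column labels and every row below it becomes data. Passing header=1 instead would treat the first data row as the header and discard the real header row.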
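To see the zero-based header index in action, the sketch below writes a tiny sample sheet to an in-memory .xlsx file (the values are illustrative) and reads it back twice, once with header=0 and once with header=1. Passing engine='openpyxl' explicitly is a choice made here for predictability, not a requirement.

```python
import io

import pandas as pd

# A tiny sample sheet, written to an in-memory .xlsx file (values are illustrative)
source = pd.DataFrame({
    'Date': ['01/01/2017', '01/02/2017', '01/03/2017'],
    'Person A': ['A2', 'A3', 'A4'],
})
buffer = io.BytesIO()
source.to_excel(buffer, index=False, engine='openpyxl')
xlsx_bytes = buffer.getvalue()

# header=0: the first spreadsheet row supplies the column names
df0 = pd.read_excel(io.BytesIO(xlsx_bytes), header=0, engine='openpyxl')

# header=1: the second row becomes the header; the row above it is dropped
df1 = pd.read_excel(io.BytesIO(xlsx_bytes), header=1, engine='openpyxl')

print(list(df0.columns))  # ['Date', 'Person A']
print(list(df1.columns))  # ['01/01/2017', 'A2']
print(len(df0), len(df1))  # 3 2
```

If your real data starts a few rows down (for example, below a title row), header=1 or higher is exactly what you want; for a sheet whose first row is the header, keep the default.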

Writing Values to a DataFrame


pd.read_excel already returns a DataFrame, but you will often need to build one from values produced in your own code. Let’s look at how to do that.

Creating a DataFrame

First, let’s create an empty DataFrame:

import pandas as pd

# Create a new DataFrame
df = pd.DataFrame(columns=['Person', 'Count'])

This DataFrame has two columns: Person and Count. We’ll fill it with data later.

Writing Values to the DataFrame

To add rows to the DataFrame, you can build each row as a one-row DataFrame and append it with pd.concat:

import pandas as pd

# Create a new DataFrame
df = pd.DataFrame(columns=['Person', 'Count'])

# Write values to the DataFrame
for i in range(2, 11):
    person = f'A{i}'
    count = i
    
    row = pd.DataFrame([[person, count]], columns=['Person', 'Count'])
    
    df = pd.concat([df, row], ignore_index=True)

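Each iteration builds a one-row DataFrame and concatenates it onto df with pd.concat; ignore_index=True renumbers the rows 0, 1, 2 and so on, instead of keeping each one-row frame’s index of 0. Because pd.concat returns a new DataFrame and copies the existing rows every time, this pattern gets slower as the DataFrame grows.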
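For more than a handful of rows, a common alternative is to collect the rows in a plain Python list and construct the DataFrame once at the end. A minimal sketch using the same sample values (the name records is just illustrative):

```python
import pandas as pd

# Collect plain Python records first; appending to a list is cheap
records = []
for i in range(2, 11):
    records.append({'Person': f'A{i}', 'Count': i})

# Build the DataFrame once, after the loop
df = pd.DataFrame(records, columns=['Person', 'Count'])

print(df.head(3))
print(df['Count'].dtype)
```

pandas infers the column types in one pass over the finished list, so Count comes out as an integer column and no intermediate DataFrames are created.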

Merging DataFrames


Now that we’ve written values to a DataFrame, let’s merge it with another DataFrame.

Creating Another DataFrame

First, let’s create a second, empty DataFrame:

import pandas as pd

# Create a new DataFrame
df2 = pd.DataFrame(columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])

This DataFrame has five columns: Date, Person A, Person B, Person C, and Person D.

Writing Values to the Second DataFrame

To write values to the second DataFrame, you can use the same approach as before:

import pandas as pd

# Create the second DataFrame
df2 = pd.DataFrame(columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])

# Write values to the second DataFrame
for i in range(2, 11):
    date = f'01/{i-1:02d}/2017'  # Example dates: 01/01/2017 to 01/09/2017
    person_a = f'A{i}'
    person_b = f'B{i}'
    person_c = f'C{i}'
    person_d = f'D{i}'

    row = pd.DataFrame([[date, person_a, person_b, person_c, person_d]], columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])

    df2 = pd.concat([df2, row], ignore_index=True)

This follows the same pattern as the first DataFrame: one single-row DataFrame per iteration, appended with pd.concat.
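The Date values here are plain strings. If you plan to sort, filter, or merge on dates, converting them with pd.to_datetime first is usually safer. A minimal sketch, assuming the MM/DD/YYYY layout used in this article; the third value is a deliberately impossible date (February 31):

```python
import pandas as pd

# Date strings in MM/DD/YYYY form; the last one is impossible on purpose
dates = pd.Series(['01/01/2017', '01/02/2017', '02/31/2017'])

# An explicit format avoids month-first vs. day-first ambiguity;
# errors='coerce' turns unparseable dates into NaT instead of raising
parsed = pd.to_datetime(dates, format='%m/%d/%Y', errors='coerce')

print(parsed)
```

Checking parsed.isna() afterwards is a quick way to find rows whose dates could not be parsed.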

Merging the Two DataFrames

To merge the two DataFrames, you can use the merge function:

import pandas as pd

# Create the second DataFrame
df2 = pd.DataFrame(columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])

# Write values to the second DataFrame
for i in range(2, 11):
    date = f'01/{i-1:02d}/2017'  # Example dates: 01/01/2017 to 01/09/2017
    person_a = f'A{i}'
    person_b = f'B{i}'
    person_c = f'C{i}'
    person_d = f'D{i}'

    row = pd.DataFrame([[date, person_a, person_b, person_c, person_d]], columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])

    df2 = pd.concat([df2, row], ignore_index=True)

# Create the first DataFrame
df = pd.DataFrame(columns=['Person', 'Count'])

# Write values to the first DataFrame
for i in range(2, 11):
    person = f'A{i}'
    count = i

    row = pd.DataFrame([[person, count]], columns=['Person', 'Count'])

    df = pd.concat([df, row], ignore_index=True)

# The frames share no column name, so match df['Person'] against df2['Person A']
merged_df = pd.merge(df, df2, left_on='Person', right_on='Person A')

# Print the merged DataFrame
print(merged_df)

Because df and df2 have no column name in common, we pass left_on='Person' and right_on='Person A' to tell merge which columns hold the matching keys; on='Date' would fail with a KeyError because df has no Date column. By default, merge performs an inner join, so only rows whose key appears in both DataFrames are kept (here A2 through A10), and the result contains both key columns, Person and Person A.
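The how parameter of pd.merge controls which keys survive. The sketch below uses two small, hypothetical frames (counts and people) to compare the default inner join with a left join plus indicator=True, which adds a _merge column labelling where each row’s key was found. This makes rows that failed to match easy to spot.

```python
import pandas as pd

# Small hypothetical frames: 'A4' has no counterpart in people
counts = pd.DataFrame({'Person': ['A2', 'A3', 'A4'], 'Count': [2, 3, 4]})
people = pd.DataFrame({'Person A': ['A2', 'A3'], 'Date': ['01/01/2017', '01/02/2017']})

# Default inner join: only keys present in both frames survive
inner = pd.merge(counts, people, left_on='Person', right_on='Person A')

# Left join keeps every row of counts; indicator=True adds a _merge column
left = pd.merge(counts, people, left_on='Person', right_on='Person A',
                how='left', indicator=True)

print(len(inner))               # 2
print(left['_merge'].tolist())  # ['both', 'both', 'left_only']
```

In the left join, the unmatched A4 row keeps its Count while its Date and Person A values are missing (NaN).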

Conclusion


In this article, we’ve explored how to create an Excel file with openpyxl, read its values into a DataFrame with pd.read_excel, build DataFrames from values in code, and merge two DataFrames on matching key columns.

With these steps, you can move data between Excel sheets and pandas DataFrames in Python. In practice, pandas handles most of the data manipulation, while openpyxl is useful when you need cell-level control over a workbook; pandas can also use openpyxl as its engine for reading and writing .xlsx files.


Last modified on 2023-09-26