Reading Values from Excel Sheets in Python and Writing to DataFrames
====================================================================
As a technical blogger, I’ve encountered numerous questions on Stack Overflow about moving data between Excel sheets and pandas DataFrames. In this article, we’ll read values from Excel sheets with Python and load those values into DataFrames.
Prerequisites
To follow along with this tutorial, you’ll need:
- Python 3.x installed on your system
- The pandas library for data manipulation
- The openpyxl library for reading Excel files
- The numpy library for numerical computations (optional)
You can install the required libraries using pip:
pip install pandas openpyxl numpy
Reading Values from an Excel Sheet
We’ll start by building a sample workbook with openpyxl, a popular Python library for working with Excel files, and then read it into pandas. pandas uses openpyxl as its engine for .xlsx files.
Creating an Excel File
First, let’s create an Excel file with the desired structure:
import openpyxl
# Create a new workbook
wb = openpyxl.Workbook()
# Select the first sheet
sheet = wb.active
# Write the header row
sheet['A1'] = 'Date'
sheet['B1'] = 'Person A'
sheet['C1'] = 'Person B'
sheet['D1'] = 'Person C'
sheet['E1'] = 'Person D'
# Append one data row per iteration (sheet rows 2 through 10)
for i in range(2, 11):
    date = f'{i-1:02d}/31/2017'  # Placeholder date strings, not validated
    person_a = f'A{i}'
    person_b = f'B{i}'
    person_c = f'C{i}'
    person_d = f'D{i}'
    sheet.append([date, person_a, person_b, person_c, person_d])
# Save the workbook so it can be read back later
wb.save('example.xlsx')
This code creates an Excel file with five columns (Date, Person A, Person B, Person C, and Person D) and writes data to the sheet.
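As a quick check that the rows land where expected, the workbook can be reopened with openpyxl and its cell values listed. This is a minimal sketch: the temporary path is an assumption made so the example does not overwrite anything, and the date strings are the same unvalidated placeholders used above.

```python
import os
import tempfile

import openpyxl

# Build a small workbook: one header row plus nine data rows.
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(["Date", "Person A", "Person B", "Person C", "Person D"])
for i in range(2, 11):
    # Placeholder date strings; they are stored as text, not as dates.
    sheet.append([f"{i-1:02d}/31/2017", f"A{i}", f"B{i}", f"C{i}", f"D{i}"])

# Save to a temporary location (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
wb.save(path)

# Reopen the file and collect each row as a tuple of cell values.
loaded = openpyxl.load_workbook(path)
rows = list(loaded.active.iter_rows(values_only=True))
print(rows[0])    # the header row
print(len(rows))  # header row plus data rows
```

Using sheet.append keeps the row layout implicit: each call writes to the next free row, so the loop never has to compute cell addresses.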
Reading the File with pandas
To read values from an Excel sheet, you can use the read_excel function from pandas:
import pandas as pd
# Read the Excel file, using the first row (index 0) as the column headers
df = pd.read_excel('example.xlsx', header=0)
# Print the DataFrame
print(df)
In this example, header=0 tells pandas to use the first row of the sheet as the column names. The header parameter is zero-indexed, so header=1 would treat the first data row as the header instead. The remaining rows become the rows of the DataFrame.
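To make the effect of the header parameter concrete, the sketch below writes a small workbook and reads it back twice: once in full and once restricted to two columns with usecols. The temporary path and the choice of columns are assumptions made for illustration.

```python
import os
import tempfile

import openpyxl
import pandas as pd

# Write a small workbook to read from (temporary path is an assumption).
wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(["Date", "Person A", "Person B", "Person C", "Person D"])
for i in range(2, 11):
    sheet.append([f"{i-1:02d}/31/2017", f"A{i}", f"B{i}", f"C{i}", f"D{i}"])
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
wb.save(path)

# header=0 uses the first sheet row as column names.
df = pd.read_excel(path, header=0, engine="openpyxl")
print(df.columns.tolist())
print(df.shape)

# usecols limits the read to the named columns.
subset = pd.read_excel(path, header=0, usecols=["Date", "Person A"], engine="openpyxl")
print(subset.shape)
```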
Writing Values to a DataFrame
Now that we’ve read values from an Excel sheet, let’s write those values to a DataFrame.
Creating a DataFrame
First, let’s create an empty DataFrame:
import pandas as pd
# Create a new DataFrame
df = pd.DataFrame(columns=['Person', 'Count'])
This DataFrame has two columns: Person and Count. We’ll fill it with data later.
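It can help to confirm what an empty DataFrame looks like before filling it. The sketch below inspects the empty frame and, as a hypothetical variant not used elsewhere in this article, declares column dtypes up front through empty Series.

```python
import pandas as pd

# An empty frame with named columns and no rows.
df = pd.DataFrame(columns=["Person", "Count"])
print(df.empty)
print(df.shape)

# Hypothetical variant: fix the dtype of each column in advance.
typed = pd.DataFrame({
    "Person": pd.Series(dtype="object"),
    "Count": pd.Series(dtype="int64"),
})
print(typed.dtypes)
```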
Writing Values to the DataFrame
To write values to the DataFrame, you can build a one-row DataFrame for each record and append it with pd.concat:
import pandas as pd
# Create a new DataFrame
df = pd.DataFrame(columns=['Person', 'Count'])
# Write values to the DataFrame, one row per iteration
for i in range(2, 11):
    person = f'A{i}'
    count = i
    row = pd.DataFrame([[person, count]], columns=['Person', 'Count'])
    df = pd.concat([df, row], ignore_index=True)
In this example, we’re creating a new DataFrame for each row and concatenating it to the original DataFrame using pd.concat.
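Concatenating inside a loop copies the growing DataFrame on every iteration. A common alternative, sketched here with the same Person and Count values, is to collect plain Python records first and build the DataFrame once.

```python
import pandas as pd

# Gather the rows as dictionaries, one per record.
records = [{"Person": f"A{i}", "Count": i} for i in range(2, 11)]

# Build the DataFrame in a single step from the collected records.
df = pd.DataFrame(records, columns=["Person", "Count"])
print(df)
```

Because the values arrive as Python integers, the Count column is created with an integer dtype rather than the generic object dtype.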
Merging DataFrames
Now that we’ve written values to a DataFrame, let’s merge it with another DataFrame.
Creating Another DataFrame
First, let’s create an empty DataFrame:
import pandas as pd
# Create a new DataFrame
df2 = pd.DataFrame(columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])
This DataFrame has five columns: Date, Person A, Person B, Person C, and Person D.
Writing Values to the Second DataFrame
To write values to the second DataFrame, you can use the same approach as before:
import pandas as pd
# Create a new DataFrame
df2 = pd.DataFrame(columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])
# Write values to the second DataFrame (same rows as the Excel sheet)
for i in range(2, 11):
    date = f'{i-1:02d}/31/2017'  # Placeholder date strings, not validated
    person_a = f'A{i}'
    person_b = f'B{i}'
    person_c = f'C{i}'
    person_d = f'D{i}'
    row = pd.DataFrame([[date, person_a, person_b, person_c, person_d]], columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])
    df2 = pd.concat([df2, row], ignore_index=True)
In this example, we’re creating a new DataFrame for each row and concatenating it to the original DataFrame using pd.concat.
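The same five-column frame can also be built in one step from a list of rows. The sketch below does that and then tries to parse the Date strings, which shows that the placeholder values are not all real calendar dates. The range mirrors the loop that filled the Excel sheet, and the month/day/year format is an assumption about how the strings are meant to be read.

```python
import pandas as pd

columns = ["Date", "Person A", "Person B", "Person C", "Person D"]

# Collect plain Python rows first, then build the DataFrame once.
rows = [
    [f"{i-1:02d}/31/2017", f"A{i}", f"B{i}", f"C{i}", f"D{i}"]
    for i in range(2, 11)
]
df2 = pd.DataFrame(rows, columns=columns)

# Parse the strings as month/day/year (an assumed format).
# errors="coerce" turns strings that are not valid dates into NaT.
parsed = pd.to_datetime(df2["Date"], format="%m/%d/%Y", errors="coerce")
print(df2.head())
print(parsed.isna().sum())  # number of strings that failed to parse
```

Months without a 31st day produce NaT here, so real data would need genuine dates before any date-based filtering or merging.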
Merging DataFrames
To merge the two DataFrames, you can use the merge function:
import pandas as pd
# Create the second DataFrame
df2 = pd.DataFrame(columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])
# Write values to the second DataFrame
for i in range(2, 11):
    date = f'{i-1:02d}/31/2017'  # Placeholder date strings, not validated
    person_a = f'A{i}'
    person_b = f'B{i}'
    person_c = f'C{i}'
    person_d = f'D{i}'
    row = pd.DataFrame([[date, person_a, person_b, person_c, person_d]], columns=['Date', 'Person A', 'Person B', 'Person C', 'Person D'])
    df2 = pd.concat([df2, row], ignore_index=True)
# Create the first DataFrame
df = pd.DataFrame(columns=['Person', 'Count'])
# Write values to the first DataFrame
for i in range(2, 11):
    person = f'A{i}'
    count = i
    row = pd.DataFrame([[person, count]], columns=['Person', 'Count'])
    df = pd.concat([df, row], ignore_index=True)
# Merge the two DataFrames. df has no 'Date' column, so match
# df['Person'] against df2['Person A'] instead.
merged_df = pd.merge(df, df2, left_on='Person', right_on='Person A')
# Print the merged DataFrame
print(merged_df)
In this example, the two DataFrames share no column name, so merging with on='Date' would raise a KeyError. Instead, we pair the Person column of df with the Person A column of df2 using the left_on and right_on parameters, then print the resulting DataFrame.
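When the key columns have different names, or when some keys have no partner, the how and indicator parameters of pd.merge make the outcome easier to inspect. The following sketch uses small hand-written frames; the A99 row is a hypothetical unmatched record added only to show the difference between an inner and a left merge.

```python
import pandas as pd

# Left frame: includes one person (A99) with no match on the right.
counts = pd.DataFrame({"Person": ["A2", "A3", "A99"], "Count": [2, 3, 99]})

# Right frame: key column has a different name ("Person A").
sheet_rows = pd.DataFrame({
    "Date": ["01/31/2017", "02/31/2017", "03/31/2017"],
    "Person A": ["A2", "A3", "A4"],
    "Person B": ["B2", "B3", "B4"],
})

# Inner merge (the default): keeps only keys present in both frames.
inner = pd.merge(counts, sheet_rows, left_on="Person", right_on="Person A")

# Left merge: keeps every row of counts; indicator=True adds a _merge
# column saying whether each row matched.
left = pd.merge(
    counts, sheet_rows,
    left_on="Person", right_on="Person A",
    how="left", indicator=True,
)
print(inner)
print(left)
```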
Conclusion
In this article, we’ve explored how to read values from an Excel sheet and write those values to a DataFrame. We’ve also discussed how to merge two DataFrames on matching key columns.
By following these steps, you can move data between Excel sheets and pandas DataFrames using Python. pandas handles the data manipulation, while openpyxl serves as the engine for reading and writing .xlsx files.
Last modified on 2023-09-26