Basics of Openpyxl and Ingestion of data into SQLite database

Priya Srinivasan

In this blog post, we discuss the basics of the Python library openpyxl and how to ingest data from an Excel spreadsheet into a database such as SQLite.

Openpyxl is a Python library for Excel spreadsheet management. It enables the reading and writing of Excel files and provides a variety of tools for manipulating spreadsheet data. This post discusses the fundamentals of openpyxl, including how to install it, open and read Excel files, and write data to Excel files.

Installing openpyxl

openpyxl can be installed with pip, the Python package manager, by running the following command in the terminal:

pip install openpyxl

This will download and install the latest version of openpyxl.

Loading an Excel workbook

To load an Excel workbook with openpyxl, first import the load_workbook function and then call it with the path to the file.

from openpyxl import load_workbook

workbook = load_workbook(filename='file.xlsx')

Once the workbook is loaded, its worksheets can be accessed, and values can be read from or written to them.

Loading a worksheet

sheet = workbook.worksheets[0]

The first worksheet in the workbook is now available as sheet, and values can be read from it.
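Worksheets can also be looked up by title instead of by position. The snippet below is a minimal sketch of that; it builds a workbook in memory so it runs without a file on disk, and the sheet titles "Summary" and "Details" are made up for illustration.

```python
from openpyxl import Workbook

# Build a small workbook in memory (hypothetical sheet titles).
workbook = Workbook()
workbook.active.title = "Summary"
workbook.create_sheet("Details")

sheet_names = workbook.sheetnames   # list of sheet titles, in workbook order
details = workbook["Details"]       # look up a worksheet by its title
first = workbook.worksheets[0]      # positional access, as shown earlier

print(sheet_names)
```

Looking a sheet up by title tends to be more robust than relying on its position, since sheets can be reordered in Excel.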

Reading values from an Excel sheet

cell_value = sheet['A1'].value

The variable cell_value contains the value present in the first row and first column of the sheet.
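Besides the 'A1' style of addressing, a cell can be reached by numeric row and column with the worksheet's cell() method, which is convenient inside loops. The following sketch fills two cells with made-up values first so that it is self-contained.

```python
from openpyxl import Workbook

# In-memory workbook with two illustrative values.
workbook = Workbook()
sheet = workbook.active
sheet["A1"] = "name"
sheet["B1"] = 42

by_key = sheet["A1"].value                    # address the cell by its coordinate
by_index = sheet.cell(row=1, column=2).value  # row and column numbers start at 1
empty = sheet["C5"].value                     # a cell that was never written holds None

print(by_key, by_index, empty)
```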

Writing values to an Excel sheet

from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

To write data to an Excel sheet, we first create a new workbook and take its active sheet, as above. A workbook created this way contains one worksheet, which workbook.active returns.

sheet['A1'] = 'Hello World!'

In the above example, the string ‘Hello World!’ is written to cell A1, the first cell of the worksheet.

Once the data is written, the workbook can be saved as follows:

workbook.save(filename='file2.xlsx')
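When writing more than a few cells, it is often easier to add a whole row at a time with the worksheet's append() method. This is a small sketch with made-up sample rows; it stops short of saving so that it leaves no file behind.

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active

# Each call to append() writes one row below the last row used (sample data).
sheet.append(["name", "city", "age"])
sheet.append(["Asha", "Chennai", 31])
sheet.append(["Ravi", "Pune", 28])

rows_written = sheet.max_row   # number of the last row that holds data
city = sheet["B2"].value       # second column of the second row

# workbook.save(filename='file2.xlsx') would persist the result to disk.
print(rows_written, city)
```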

Iterating through rows and columns

To iterate through rows and columns in an Excel file using openpyxl, one can use the iter_rows() and iter_cols() methods on a worksheet object.

from openpyxl import load_workbook

workbook = load_workbook(filename='file.xlsx')
sheet = workbook.active

for row in sheet.iter_rows(min_row=1, max_row=3, min_col=1, max_col=3):
    for cell in row:
        print(cell.value)

In the above example, we’re iterating through the first three rows and the first three columns of the worksheet. The iter_rows() method returns a generator that yields one tuple of cells per row, and the iter_cols() method does the same per column.
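Both methods accept a values_only=True argument, in which case they yield the cell values directly rather than cell objects. The sketch below contrasts row-wise and column-wise iteration on a small in-memory sheet with illustrative numbers.

```python
from openpyxl import Workbook

workbook = Workbook()
sheet = workbook.active
sheet.append([1, 2, 3])
sheet.append([4, 5, 6])

# values_only=True yields plain values instead of cell objects.
rows = list(sheet.iter_rows(values_only=True))     # one tuple per row
columns = list(sheet.iter_cols(values_only=True))  # one tuple per column

print(rows)
print(columns)
```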

Ingestion of Excel sheet data into a SQLite database

Python’s sqlite3 module and openpyxl package can be used to import data from an Excel file into a SQLite database.

import sqlite3
from openpyxl import load_workbook

# Load the Excel file
workbook = load_workbook(filename='file.xlsx')
sheet = workbook.active

# Connect to the SQLite database
conn = sqlite3.connect('example.db')
cursor = conn.cursor()

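# Create a table for the data (skipped if the table already exists,
# so that running the script again does not raise an error)
cursor.execute('CREATE TABLE IF NOT EXISTS data (col1 TEXT, col2 TEXT, col3 TEXT)')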

# Insert the data into the table
for row in sheet.iter_rows(min_row=2):
    col1 = row[0].value
    col2 = row[1].value
    col3 = row[2].value
    cursor.execute('INSERT INTO data VALUES (?, ?, ?)', (col1, col2, col3))

# Commit the changes and close the database connection
conn.commit()
conn.close()

In this example, we first load the file.xlsx Excel file and access its active worksheet. We then connect to a SQLite database named example.db, which sqlite3.connect creates if it does not already exist, and create a table named data to hold the rows.

Next, we iterate through the rows of the worksheet, starting from the second row (min_row=2), which skips a header row if the sheet has one. For each row we extract the values of the first three columns and insert them into the data table using a parameterized SQL query. Finally, the changes are committed and the connection is closed.
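The same ingestion can be written more compactly by combining iter_rows(values_only=True) with executemany(). The following is a minimal sketch under a few assumptions rather than a drop-in replacement: it uses an in-memory workbook with made-up rows and an in-memory database (':memory:') so that it runs without touching any files, and it reads the rows back at the end to check the result.

```python
import sqlite3
from openpyxl import Workbook

# Stand-in for file.xlsx: a header row followed by two sample data rows.
workbook = Workbook()
sheet = workbook.active
sheet.append(["col1", "col2", "col3"])
sheet.append(["a", "b", "c"])
sheet.append(["d", "e", "f"])

# ':memory:' keeps the database in RAM; use a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT, col2 TEXT, col3 TEXT)")

# Skip the header row and hand the value tuples straight to executemany().
rows = sheet.iter_rows(min_row=2, max_col=3, values_only=True)
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
conn.commit()

# Read the rows back to confirm what was stored.
stored = conn.execute(
    "SELECT col1, col2, col3 FROM data ORDER BY rowid"
).fetchall()
conn.close()

print(stored)
```

Passing max_col=3 keeps every tuple at exactly three values, matching the three placeholders in the INSERT statement even if the sheet has extra columns.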