Python Using Regex on Entire Excel File
Following up on my previous post that used VBA in Excel to pull out Regex matches from a spreadsheet, I wanted to demonstrate that Python can accomplish basically the same goal. Python needs a little help to work with Excel files in the form of openpyxl and pandas modules that will need to be installed for this to work.
Here are the steps I took to pull out all the numbers with 9 to 14 characters from an Excel file and put them in the clipboard:
from tkinter import Tk
from tkinter.filedialog import askopenfilename
def get_excel_path():
Tk().withdraw()
return askopenfilename(filetypes = [('Excel Files','*.csv;*.xlsx;*.xlsm;*.xltx;*.xltm')],
title = 'Select your Excel File and Click Open')
import os
import pandas as pd
from openpyxl import load_workbook
def upload_spreadsheet(path, active_sheet_only = True):
'''Returns pandas dataframe. Returns empty dataframe if fails'''
#check if file exists
if not os.path.isfile(path):
return pd.DataFrame()
#check file type
file_type = os.path.splitext(path)[1]
if file_type == '.csv':
df = pd.read_csv(path)
elif file_type in ['.xlsx','.xlsm','.xltx','.xltm']:
wb = load_workbook(path)
#convert sheets to pandas dataframe
if active_sheet_only:
df = pd.DataFrame(wb.active.values)
else:
#combine all sheets into one dataframe
frames = [pd.DataFrame(sheet.values) for sheet in wb.worksheets]
df = pd.concat(frames)
wb.close()
else:
return pd.Dataframe()
return df
def combine_dataframe(dataframe):
'''Converts pandas dataframe into one list'''
return [cell for cell in [dataframe[i] for i in dataframe]]
import re
def text_search(text, regex, dedupe=True):
re_matches = re.findall(regex, str(text))
if re_matches:
#remove all but first match if tuples
re_matches = [y[0] if type(y) == tuple else y for y in re_matches]
if dedupe:
return list(set(re_matches))
return list(re_matches)
return []
def copy_text(text):
subprocess.run(['clip.exe'], input = text.strip().encode('ascii', 'ignore'), check = True)
def main():
file_path = get_excel_path()
if file_path == '':return
df = upload_spreadsheet(file_path, active_sheet_only = False)
#convert to a list then join into string
text = ' '.join([str(x) for x in combine_dataframe(df)])
pattern = r'([0-9]{9,14})'
matches = text_search(text, pattern)
copy_text('\n'.join(matches))
print(len(matches), 'matches added to your clipboard.')
if __name__ == '__main__':
main()
The full Python code can be found on my github:
https://github.com/eddiethedean/Excel-Python
Here are the steps I took to pull out all the numbers with 9 to 14 characters from an Excel file and put them in the clipboard:
Easy File Explorer Popup
We need an easy way of telling Python where our spreadsheet is saved at. With tkinter we can have a windows file explorer popup so the user can browse to their excel file they want to pull data out of.from tkinter import Tk
from tkinter.filedialog import askopenfilename
def get_excel_path():
Tk().withdraw()
return askopenfilename(filetypes = [('Excel Files','*.csv;*.xlsx;*.xlsm;*.xltx;*.xltm')],
title = 'Select your Excel File and Click Open')
Convert Excel File Into Pandas Data-frame
The pandas module does double duty in allowing us to upload csv files and ensure our outputs are useful as a data-frame object. The openpyxl module gives us the ability to load Excel specific files and also outputs them as pandas data-frame objects. We just have to check the path given to know the file type in order to make sure we use the right function, pandas.read_csv or openpyxl.load_workbook.import os
import pandas as pd
from openpyxl import load_workbook
def upload_spreadsheet(path, active_sheet_only = True):
'''Returns pandas dataframe. Returns empty dataframe if fails'''
#check if file exists
if not os.path.isfile(path):
return pd.DataFrame()
#check file type
file_type = os.path.splitext(path)[1]
if file_type == '.csv':
df = pd.read_csv(path)
elif file_type in ['.xlsx','.xlsm','.xltx','.xltm']:
wb = load_workbook(path)
#convert sheets to pandas dataframe
if active_sheet_only:
df = pd.DataFrame(wb.active.values)
else:
#combine all sheets into one dataframe
frames = [pd.DataFrame(sheet.values) for sheet in wb.worksheets]
df = pd.concat(frames)
wb.close()
else:
return pd.Dataframe()
return df
Combine Our Data-frame Into String
Once we have our data-frame we can easily combine it into one string with a list comprehension so we can run our Regex against it just one time.def combine_dataframe(dataframe):
'''Converts pandas dataframe into one list'''
return [cell for cell in [dataframe[i] for i in dataframe]]
Pull Out Regex Matches from String
With re.findall, the only complication is when it returns a list of tuples sometimes, depending on the regex you use. We have to check the list items for type tuple and pull out just the matches (the first items in the tuples) in order to get a usable list of matches to return.import re
def text_search(text, regex, dedupe=True):
re_matches = re.findall(regex, str(text))
if re_matches:
#remove all but first match if tuples
re_matches = [y[0] if type(y) == tuple else y for y in re_matches]
if dedupe:
return list(set(re_matches))
return list(re_matches)
return []
Copy String to Clipboard
import subprocessdef copy_text(text):
subprocess.run(['clip.exe'], input = text.strip().encode('ascii', 'ignore'), check = True)
Put It All Together
Now we have all the functions we need to upload our Excel spreadsheet and pull out all the numbers with 9 to 14 characters.def main():
file_path = get_excel_path()
if file_path == '':return
df = upload_spreadsheet(file_path, active_sheet_only = False)
#convert to a list then join into string
text = ' '.join([str(x) for x in combine_dataframe(df)])
pattern = r'([0-9]{9,14})'
matches = text_search(text, pattern)
copy_text('\n'.join(matches))
print(len(matches), 'matches added to your clipboard.')
if __name__ == '__main__':
main()
The full Python code can be found on my github:
https://github.com/eddiethedean/Excel-Python
Excellent post. I learned a lot from this blog and I suggest my friends to visit your blog to learn new concept about technology.
ReplyDeleteccna course in Chennai
Python Training in Chennai
R Programming Training in Chennai
AWS Training in Chennai
DevOps Training in Chennai
Angularjs Training in Chennai
RPA Training in Chennai
Blue Prism Training in Chennai
Data Science Course in Chennai
Regular expressions (regex) aren't ideal for processing entire Excel files. They're better suited for manipulating specific text within the file. However, Python offers libraries for working with Excel data effectively. Here's how to approach this:
DeleteLibraries:
python projects for engineering students
openpyxl: This library allows you to read and write data in Excel files (xlsx format).
pandas: This library provides powerful data manipulation tools for analyzing and processing data from Excel files.
Approach:
Read the Excel file: Use openpyxl.load_workbook() to open the file.
Iterate through sheets (optional):
If you need to process data from all sheets, use a loop to iterate through each sheet in the workbook (workbook.sheetnames).
Process data:
Big Data Projects For Final Year Students
Use pandas' pd.read_excel() function to read the data into a pandas DataFrame. This allows you to use various methods like filtering, grouping, and applying regex operations on specific columns.
Machine Learning Projects for Final Year
Thank you for providing the creativity post with comprehensive content. This is very helpful for gain my skill knowledge, Kindly updating...
ReplyDeleteLinux Training in Chennai
Learn Linux
Spark Training in Chennai
Oracle Training in Chennai
Unix Training in Chennai
Oracle DBA Training in Chennai
Tableau Training in Chennai
Power BI Training in Chennai
Embedded System Course Chennai
Great job for publishing such a nice article. Your article isn’t only useful but it is additionally really informative. Read more info about Hni clients database. Thank you because you have been willing to share information with us.
ReplyDeleteThe best casino games for a mobile device - Aprcasino
ReplyDeleteBest online 카지노 사이트 유니88 casino games for a mobile device – Discover a list of online casino games, play slots, and more and try them for free!
I really appreciate your efforts for writing this blog.It was reaaly interesting and helpful.
ReplyDeletePython Classes in Nagpur