https://github.com/dipto9999/excel_spreadsheet_organizer
Using Python open-source libraries, organize and modify information from excel and xml files for better readability.
https://github.com/dipto9999/excel_spreadsheet_organizer
excel jupyter-notebook openpyxl pandas python xml
Last synced: about 2 months ago
JSON representation
Using Python open-source libraries, organize and modify information from excel and xml files for better readability.
- Host: GitHub
- URL: https://github.com/dipto9999/excel_spreadsheet_organizer
- Owner: Dipto9999
- Created: 2021-08-21T20:46:21.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2021-08-21T20:46:24.000Z (almost 5 years ago)
- Last Synced: 2025-06-19T10:42:02.528Z (about 1 year ago)
- Topics: excel, jupyter-notebook, openpyxl, pandas, python, xml
- Language: Jupyter Notebook
- Homepage:
- Size: 25.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Excel Spreadsheet Organizer
## Contents
* [Overview](#Overview)
* [Solution Details](#Solution-Details)
## Overview
This was a short task completed for a confidential corporate organizer where information from an Excel Spreadsheet was
modified and combined with information extracted from an XML file to create a table in another Excel Spreadsheet. This was done through the use of open-source libraries in Python and run in Jupyter Notebook. This allowed me to further explore the Pandas Library and become more familiar working with DataFrames.
Note : All sensitive information has been modified and replaced.
### Solution Details
A few custom functions were built to shorten strings with a common prefixes, as shown here :
```python
# This function shortens an ID from the Excel file.
def shorten_id(original_column) :
length_to_cut = len('Delete Account ')
length_total = len(original_column)
truncated_column = original_column[length_to_cut: length_total]
return truncated_column
```
```python
# This function shortens the role information from the XML file.
def shorten_role(original_column) :
length_to_cut = len('Role=')
length_total = len(original_column)
truncated_column = original_column[length_to_cut: length_total]
return truncated_column
```
The XML file was iterated through using ElementTree Library built-in functions, as shown here :
```python
# Iterate through XML file to filter and organize relevant information.
for account in root.iter('account') :
id = account.get('id', default = None)
for i in range(id_series.size) :
# If ID is found in the ID series, execute code.
if (id == id_series[i]) :
# Add this unordered ID to the list.
xml_ids.append(id)
# Assign role column to have the role information.
for attribute in account.iter('attribute') :
if attribute.get('name') == 'Role' :
attributeValueRef_id = str()
for attributeValueRef in attribute.iter('attributeValueRef') :
# Remove the 'Role=' using the custom function before acquiring the information.
attributeValueRef_id += shorten_role(attributeValueRef.get('id')) + ' \n'
# Append each string to the list if the ID is found in the ID series.
role_column.append(attributeValueRef_id)
```
The ID and role columns were matched by iterating through the respective DataFrame and List.
```python
# Organize role column to match the ordered ID series.
for organized_index in range(id_series.size) :
# Add an empty string to end of list to account for IDs with blank roles.
organized_role_column.append('')
for unorganized_index in range(len(xml_ids)) :
if (id_series[organized_index] == xml_ids[unorganized_index]) :
# Replace empty string for IDs with roles in the XML file.
organized_role_column.insert(organized_index, (role_column[unorganized_index]))
```
There is also a Python Script written with additional comments to further understand the procedure of developing this organizer.
Note : This Exploration Took a Weekend to Complete, Spanning Approximately 10 Hours Altogether.