https://github.com/dipto9999/excel_spreadsheet_organizer

Using Python open-source libraries, organize and modify information from excel and xml files for better readability.

Host: GitHub
URL: https://github.com/dipto9999/excel_spreadsheet_organizer
Owner: Dipto9999
Created: 2021-08-21T20:46:21.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2021-08-21T20:46:24.000Z (almost 5 years ago)
Last Synced: 2025-06-19T10:42:02.528Z (about 1 year ago)
Topics: excel, jupyter-notebook, openpyxl, pandas, python, xml
Language: Jupyter Notebook
Homepage:
Size: 25.4 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Excel Spreadsheet Organizer

## Contents

* [Overview](#Overview)

    * [Solution Details](#Solution-Details)

## Overview

This was a short task completed for a confidential corporate organizer, in which information from an Excel spreadsheet was modified and combined with information extracted from an XML file to create a table in another Excel spreadsheet. The work was done with open-source Python libraries and run in a Jupyter Notebook. It allowed me to further explore the pandas library and become more familiar with DataFrames.

Note : All sensitive information has been modified and replaced.

### Solution Details

A few custom functions were built to shorten strings that share a common prefix, as shown here :

```python
# This function shortens an ID from the Excel file.
def shorten_id(original_column):
    """Drop the leading 'Delete Account ' text from an ID string."""
    length_to_cut = len('Delete Account ')
    # Slicing from length_to_cut keeps everything after the prefix.
    truncated_column = original_column[length_to_cut:]
    return truncated_column
```

```python
# This function shortens the role information from the XML file.
def shorten_role(original_column):
    """Drop the leading 'Role=' text from a role string."""
    length_to_cut = len('Role=')
    # Slicing from length_to_cut keeps everything after the prefix.
    truncated_column = original_column[length_to_cut:]
    return truncated_column
```
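Both helpers slice off a fixed number of characters, so a value that does not actually start with the expected prefix would still lose its leading characters. A minimal sketch of a single, more defensive helper is shown below. `strip_prefix` is a hypothetical name that is not part of the original notebook, and the sample values are made up for illustration.

```python
def strip_prefix(value, prefix):
    """Return value without prefix, or value unchanged if the prefix is absent."""
    if value.startswith(prefix):
        return value[len(prefix):]
    return value


# Hypothetical sample values, mirroring the two prefixes used above.
print(strip_prefix('Delete Account 12345', 'Delete Account '))  # 12345
print(strip_prefix('Role=Administrator', 'Role='))              # Administrator
print(strip_prefix('12345', 'Delete Account '))                 # 12345 (unchanged)
```

On Python 3.9 or later, the built-in `str.removeprefix` method behaves the same way and could replace this helper.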

The XML file was traversed with the `iter` and `get` methods of Python's built-in ElementTree library, as shown here :

```python
# Iterate through the XML file to filter and organize relevant information.
# `root`, `id_series`, `xml_ids` and `role_column` are assumed to be defined earlier.
for account in root.iter('account'):
    account_id = account.get('id')
    for i in range(id_series.size):
        # If the ID is found in the ID series, execute the code below.
        if account_id == id_series[i]:
            # Add this unordered ID to the list.
            xml_ids.append(account_id)
            # Collect the role information for this account.
            for attribute in account.iter('attribute'):
                if attribute.get('name') == 'Role':
                    attributeValueRef_id = ''
                    for attributeValueRef in attribute.iter('attributeValueRef'):
                        # Remove the 'Role=' prefix using the custom function.
                        attributeValueRef_id += shorten_role(attributeValueRef.get('id')) + ' \n'
                    # Append the combined role string for this matched ID.
                    role_column.append(attributeValueRef_id)
```
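The nested loops above compare every account against every ID in the series. As a minimal sketch of the same idea, the example below parses a small, made-up XML snippet and collects roles into a dictionary keyed by account ID, using a set for the membership test. The element names (`account`, `attribute`, `attributeValueRef`) follow the loop above, but the sample data, `wanted_ids`, and `roles_by_id` are assumptions for illustration, not the original file or variables.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, shaped after the element names used in the loop above.
SAMPLE_XML = """
<accounts>
    <account id="A1">
        <attribute name="Role">
            <attributeValueRef id="Role=Admin"/>
            <attributeValueRef id="Role=Auditor"/>
        </attribute>
    </account>
    <account id="B2">
        <attribute name="Department">
            <attributeValueRef id="Dept=Finance"/>
        </attribute>
    </account>
    <account id="C3">
        <attribute name="Role">
            <attributeValueRef id="Role=Viewer"/>
        </attribute>
    </account>
</accounts>
"""

root = ET.fromstring(SAMPLE_XML)

# Stand-in for the IDs taken from the Excel file.
wanted_ids = {'A1', 'B2'}

roles_by_id = {}
for account in root.iter('account'):
    account_id = account.get('id')
    if account_id not in wanted_ids:
        continue
    roles = []
    for attribute in account.iter('attribute'):
        if attribute.get('name') != 'Role':
            continue
        for ref in attribute.iter('attributeValueRef'):
            # Drop the 'Role=' prefix, as shorten_role does.
            roles.append(ref.get('id')[len('Role='):])
    # Roles are joined with newlines here (the loop above appends ' \n' instead).
    # Accounts without a 'Role' attribute get an empty string.
    roles_by_id[account_id] = '\n'.join(roles)

print(roles_by_id)  # {'A1': 'Admin\nAuditor', 'B2': ''}
```

Keying the roles by ID means every matched account has exactly one entry, including accounts that have no `Role` attribute at all.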

The ID and role columns were then matched by iterating through the ordered ID series from the Excel file and the list of IDs found in the XML file.

```python
# Organize the role column to match the ordered ID series.
for organized_index in range(id_series.size):
    # Add an empty string to the end of the list to account for IDs with blank roles.
    organized_role_column.append('')
    for unorganized_index in range(len(xml_ids)):
        if id_series[organized_index] == xml_ids[unorganized_index]:
            # Replace the empty string for IDs with roles in the XML file.
            # Assignment (rather than insert) keeps one entry per ID.
            organized_role_column[organized_index] = role_column[unorganized_index]
```
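The matching step can also be expressed as a dictionary lookup, which avoids the nested loops. This is a sketch under assumptions: `ordered_ids`, `xml_ids`, and `role_column` below are small made-up lists standing in for the notebook's data, and `role_by_id` is a hypothetical helper name.

```python
# Hypothetical stand-ins for the notebook's data.
ordered_ids = ['A1', 'B2', 'C3', 'D4']   # order taken from the Excel file
xml_ids = ['C3', 'A1']                   # order found in the XML file
role_column = ['Viewer \n', 'Admin \nAuditor \n']

# Map each ID found in the XML file to its role string.
role_by_id = dict(zip(xml_ids, role_column))

# Build the role column in Excel order; IDs without roles get an empty string.
organized_role_column = [role_by_id.get(account_id, '') for account_id in ordered_ids]

print(organized_role_column)  # ['Admin \nAuditor \n', '', 'Viewer \n', '']
```

If the IDs are held in a pandas Series, `id_series.map(role_by_id).fillna('')` should produce an equivalent column, since `Series.map` accepts a dictionary and leaves unmatched keys as missing values.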

A Python script with additional comments is also included to further explain how this organizer was developed.

Note : This exploration took a weekend to complete, spanning approximately 10 hours altogether.
