Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/isabelsw/cfgdataanalysis

This was an assignment for the CFG course "Introduction to Python & Apps". It entailed importing and working with data from a pre-existing Excel file and then exporting all the results to a new Excel workbook. See README for details.
https://github.com/isabelsw/cfgdataanalysis

excel matplotlib openpyxl pandas pycharm-ide python3 seaport

Last synced: 8 days ago
JSON representation

Host: GitHub
URL: https://github.com/isabelsw/cfgdataanalysis
Owner: isabelsw
License: mit
Created: 2024-05-21T17:26:56.000Z (9 months ago)
Default Branch: master
Last Pushed: 2024-05-22T10:30:56.000Z (9 months ago)
Last Synced: 2024-05-22T19:48:19.885Z (9 months ago)
Topics: excel, matplotlib, openpyxl, pandas, pycharm-ide, python3, seaport
Language: Python
Homepage:
Size: 46.9 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

BASIC DATA ANALYSIS

Go Straight to the Docs »

Report a bug
·
Request a feature

Table of Contents

About The Project

Built With

Features

Setup

Roadmap

Contributing

Licence

Contact

Acknowledgements

## About The Project

This is a solution to the assignment "Spreadsheet Analysis" as outlined in the course "Introduction to Python & Apps" by Code First Girls Ltd. It entails basic analysis of fictional sales data using both Excel and Python. See Features for an overview of the methods and libraries/modules used.

(back to top)

## Built With

| Python | PyCharm | pandas | Matplotlib | Seaborn | Microsoft Excel 365 |
|----------|----------|----------|----------|----------|----------|
| Python | pycharm | Pandas | mpl | | |

## Features

* Use different libraries/modules for simple analysis of data from an Excel workbook:
- pandas
- openpyxl
- seaborn
- matplotlib
* Find the total sum of values and the minimum and maximum values
* Use pct_change for working out periodic changes as a percentage
* Create a simple lineplot for showing periodic changes as a percentage
* Import an Excel file and then write results to a new Excel file
* Assign data to different spreadsheets upon writing to the new Excel file

(back to top)

## Setup

### Prerequisites

* Install Python and/or Install PyCharm by Jet Brains (you can opt for the free and open-source **community edition**). Make sure you are downloading a Python 3 version.
* Get a subscription for Microsoft 365 and download the latest verstion of MS Excel.
* See Pycharm Guide for creating a new Py project.
* Download the file sales.csv via the repository's main page.

* PIP package

1) See if PIP is already installed on your system. Write in either Windows OS shell or macOS terminal:

### Install
```sh
pip help
```

```sh
pip3 help
```

```sh
python3 -m pip help
```

3) If it is not installed, write in shell or terminal:

```sh
python3 get-pip.py
```

N.B. Make sure you have added pip.exe to the Environment Variables (Path).

* pandas library; openpyxl module; seaborn library; matplotlib

There are 2 ways, EITHER:

- Write in PyCharm IDE install [library name]. E.g.,

install pandas

- OR Write in shell or terminal pip install [library name]. For macOS terminal, if that instruction does not work, you can write pip3 install [library name]. E.g.,

```sh
pip install pandas
```

(back to top)

## Roadmap

If you want to have a go yourself, I have provided the **guide** below. You will create **3 separate Py files** that reflect each stage of the process.
- N.B. From this point on, **Python** will be abbreviated to **Py**.

PY FILE 1

* Create a new **.py** file in your project folder in PyCharm (if you have not already done so). Name the file.
* Import **openpyxl and math**.
- N.B. It is good practice to import your libraries/modules at the top; however, I have imported pandas as pd later in the code so that it is shown within its immediate context.
* Import **sales.csv** file (sales.csv); otherwise, create your own spreadsheet with similar quantitative data.

![PyCharm_table_sales_csv](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/a3423aa4-092d-4f71-83fd-64f12b2c9c52)

* **Create a function** named read_data using **def** to retrieve the sales and months from the spreadsheet.
* Use the **def method** to create a function that makes a Py dictionary from the data for monthly sales (from sales.csv).
* Collect **all the sales** from each month into a single list using the **list() function**.
* Get the **sales total** (across all months) using the **sum() function**.
* Find the **largest and smallest values** in the sales data using the **min() and max() functions**.
* Find the **months with the lowest/highest number** of sales by using the **dictionary get() method** and writing the **keys as max_value and min_value**.
* Find the **average** value of sales using the **lens() function** (returns the number of values for sales). Then continue the formula by writing **total / number_of_sales** (total divided by the number of values returned by the lens function).
* Using **math.ceil() method** (from the math module), **round the number** of average sales that was calculated in the previous step. You do not have to do this, but it ensures better readability of the table that we will be creating later.
* **Import pandas as pd** to read sales.csv and create a dataframe. Use **pct_change() method** to retrieve sales data and calculate the monthly change in sales from one month to the next as a percentage. **Multiply pct_change by 100** (* 100) to complete the formula. NaN will appear in the table for Jan if you do not use the **fillna() function**; this is because there is no value for the previous month (in the sales data) with which to calculate the change in sales for Jan. Fill none with the value 0 to make it clear that there is no monthly-change data for this month.
* Create a **pandas dataframe** for the output. Then use **ExcelWriter** and **df.to_excel** to write the results (**df**) to a new workbook and spreadsheet. Write the parameter **index=False** to remove the surplus column.
* This step is **optional**: you can load the existing workbook and **change its formatting** so that the size of the cells fit the values better. I have used **from openpyxl import load_workbook** to load the workbook and **from openpyxl.utils import get_column_letter** for this task.
* **Save and close** the workbook.
* **Print** the results with the **string format() method** so that they appear in the console (you will need these values for writing to the new Excel sheet; see below). **/n** in the print function adds a line break. Then write **run()** to complete def run().

![PyCharm_Console_Output_basicdataanalysis](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/c9bf5604-2665-41ea-b7d2-1185cf8ecf72)

PY FILE 2

* Import **pandas as pd**.
* Read the Excel file (created via the previous steps) into a **pandas dataframe** object with the **pd.read_excel function**.
* With your results from PY FILE 1, add the new data as a **dictionary of lists** and create a **dataframe** for it with the **pd.DataFrame function**.
* Write the **dataframe** to a new Excel spreadsheet using **pd.Excelwriter**. Name your sheet in the parameters and write **index=false** to remove the generated extra column.

PY FILE 3

* Import: **seaborn as sns**; **pandas as pd**; **matplotlib as plt**.
* Specify columns from which to retrieve the data. Read this data into a pandas dataframe via the **pd.read_excel function**.
* **Print** this data.

![PyCharm_Console_Output_graph](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/c1df8b1f-f280-4173-9684-e783943ec1cd)

![Figure_1_cfgspreadsheetanalysis](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/58ed36fb-f665-466a-b0ef-9b71ad74e4b0)

* Create a simple lineplot with the **sns.lineplot() function**.
* Show this lineplot by writing **plt.show()**. You can then save this lineplot to a local folder.

See the [open issues](https://github.com/isabelsw/cfgdataanalysis/issues) for a full list of proposed features and known issues.

(back to top)

## Contributing

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks!

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

(back to top)

## Licence

Distributed under the MIT Licence. See "LICENSE.txt" for more information.

(back to top)

## Contact

My profile - isabelsw

Project link - [https://github.com/isabelsw/cfgdataanalysis](https://github.com/isabelsw/cfgdataanalysis)

(back to top)

## Acknowledgements

* The project brief was created by Code First Girls Ltd. A special thanks goes to my instructors Vanny and Andrew on the "Introduction to Python & Apps" course.
* README template was created by Othneil Drew.
* GIF courtesy of GIPHY.

(back to top)

## Copyright Notices

* PyCharm logo: Copyright © 2000-2024 JetBrains s.r.o. JetBrains and the JetBrains logo are registered trademarks of JetBrains s.r.o.
* Python logo: Copyright © 2001-2024 Python Software Foundation.
* pandas logo: Copyright © 2008 AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team.
* Seaport logo: Copyright © Matthias Bussonnier and Seaport.
* matplotlib logo: Copyright © 2002–2012 John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team; 2012–2024 The Matplotlib development team.

(back to top)