https://github.com/isabelsw/cfgdataanalysis
This was an assignment for the CFG course "Introduction to Python & Apps". It entailed importing and working with data from a pre-existing Excel file and then exporting all the results to a new Excel workbook. See README for details.
excel matplotlib openpyxl pandas pycharm-ide python3 seaport
- Host: GitHub
- URL: https://github.com/isabelsw/cfgdataanalysis
- Owner: isabelsw
- License: mit
- Created: 2024-05-21T17:26:56.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2024-05-22T10:30:56.000Z (7 months ago)
- Last Synced: 2024-05-22T19:48:19.885Z (7 months ago)
- Topics: excel, matplotlib, openpyxl, pandas, pycharm-ide, python3, seaport
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Table of Contents
- About The Project
- Built With
- Features
- Setup
- Roadmap
- Contributing
- Licence
- Contact
- Acknowledgements
- Copyright Notices
## About The Project
This is a solution to the assignment "Spreadsheet Analysis" as outlined in the course "Introduction to Python & Apps" by Code First Girls Ltd. It entails basic analysis of fictional sales data using both Excel and Python. See Features for an overview of the methods and libraries/modules used.
## Built With
* Python
* PyCharm
* pandas
* Matplotlib
* Seaborn
* Microsoft Excel 365
## Features
* Use different libraries/modules for simple analysis of data from an Excel workbook:
- pandas
- openpyxl
- seaborn
- matplotlib
* Find the total sum of values and the minimum and maximum values
* Use pct_change for working out periodic changes as a percentage
* Create a simple lineplot for showing periodic changes as a percentage
* Import an Excel file and then write results to a new Excel file
* Assign data to different spreadsheets upon writing to the new Excel file
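
As a rough sketch of the last two features, the snippet below writes two dataframes to separate sheets of one workbook and then reads the workbook back in. The data, the sheet names `Sales` and `Summary`, and the file name `results.xlsx` are all made up for illustration; the project's own files may be organised differently.

```python
import os
import tempfile

import pandas as pd
from openpyxl.utils import get_column_letter

# Made-up data standing in for the analysis results.
sales = pd.DataFrame({"month": ["jan", "feb", "mar"], "sales": [100, 150, 90]})
summary = pd.DataFrame({"statistic": ["total", "max", "min"],
                        "value": [340, 150, 90]})

# A temporary folder keeps the sketch from overwriting real files.
out_path = os.path.join(tempfile.mkdtemp(), "results.xlsx")

with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
    # index=False drops the extra column of row numbers.
    sales.to_excel(writer, sheet_name="Sales", index=False)
    summary.to_excel(writer, sheet_name="Summary", index=False)

    # Optional: widen each column to fit its longest value.
    for name, frame in {"Sales": sales, "Summary": summary}.items():
        sheet = writer.sheets[name]
        for i, column in enumerate(frame.columns, start=1):
            longest = max(len(str(column)),
                          int(frame[column].astype(str).str.len().max()))
            sheet.column_dimensions[get_column_letter(i)].width = longest + 2

# sheet_name=None reads every sheet into a {name: dataframe} dictionary.
sheets = pd.read_excel(out_path, sheet_name=None)
print(list(sheets))
```

The workbook is saved when the `with` block ends, so no separate save or close call is needed in this version.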
## Setup
### Prerequisites
* Install Python and/or PyCharm by JetBrains (you can opt for the free and open-source **community edition**). Make sure you download a Python 3 version.
* Get a subscription for Microsoft 365 and download the latest version of MS Excel.
* See the PyCharm guide for creating a new Python project.
* Download the file sales.csv via the repository's main page.
### Install
* pip package
1) Check whether pip is already installed on your system. In either the Windows shell or the macOS terminal, enter one of the following:
```sh
pip help
```
```sh
pip3 help
```
```sh
python3 -m pip help
```
3) If it is not installed, download get-pip.py from https://bootstrap.pypa.io/get-pip.py and run it in the shell or terminal:
```sh
python3 get-pip.py
```
N.B. Make sure you have added pip.exe to the Environment Variables (Path).
* pandas library; openpyxl module; seaborn library; matplotlib library

There are two ways to install them. EITHER:
- In the PyCharm IDE, install [library name], e.g., install pandas.
- OR, in the shell or terminal, write pip install [library name]. In the macOS terminal, if that command does not work, write pip3 install [library name] instead. E.g.,
```sh
pip install pandas
```
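
To confirm that the installs worked, a small check along these lines can be run from PyCharm or the terminal. It is only a sketch: the helper name `missing_packages` is made up for illustration, and the list assumes the four packages named above.

```python
import importlib.util

# Import names of the packages listed above.
REQUIRED = ["pandas", "openpyxl", "seaborn", "matplotlib"]


def missing_packages(names):
    """Return the names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported as missing even though pip installed it, pip and PyCharm may be using different Python interpreters.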
If you want to have a go yourself, I have provided the **guide** below. You will create **3 separate Py files** that reflect each stage of the process.
- N.B. From this point on, **Python** will be abbreviated to **Py**.

PY FILE 1
* Create a new **.py** file in your project folder in PyCharm (if you have not already done so). Name the file.
* Import **openpyxl and math**.
- N.B. It is good practice to import your libraries/modules at the top; however, I have imported pandas as pd later in the code so that it is shown within its immediate context.
* Import the **sales.csv** file; otherwise, create your own spreadsheet with similar quantitative data.
![PyCharm_table_sales_csv](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/a3423aa4-092d-4f71-83fd-64f12b2c9c52)
* **Create a function** named read_data using **def** to retrieve the sales and months from the spreadsheet.
* Use **def** again to create a function that makes a Py dictionary from the data for monthly sales (from sales.csv).
* Collect **all the sales** from each month into a single list using the **list() function**.
* Get the **sales total** (across all months) using the **sum() function**.
* Find the **largest and smallest values** in the sales data using the **min() and max() functions**.
* Find the **months with the lowest/highest number** of sales by using the **dictionary get() method** and writing the **keys as max_value and min_value**.
* Find the **average** value of sales using the **len() function**, which returns the number of sales values. Then complete the formula by writing **total / number_of_sales** (the total divided by the number of values returned by len()).
* Using the **math.ceil() function** (from the math module), **round up** the average calculated in the previous step to the nearest whole number. You do not have to do this, but it makes the table that we will create later easier to read.
* **Import pandas as pd** to read sales.csv and create a dataframe. Use the **pct_change() method** on the sales data to calculate the change in sales from one month to the next. pct_change() returns a fraction, so **multiply the result by 100** (* 100) to express it as a percentage. NaN will appear in the table for Jan unless you use the **fillna() method**; this is because there is no previous month in the sales data from which to calculate the change for Jan. Fill NaN with the value 0 to make it clear that there is no monthly-change data for this month.
* Create a **pandas dataframe** for the output. Then use **ExcelWriter** and **df.to_excel** to write the results (**df**) to a new workbook and spreadsheet. Write the parameter **index=False** to remove the surplus column.
* This step is **optional**: you can load the existing workbook and **change its formatting** so that the cell sizes fit the values better. I have used **from openpyxl import load_workbook** to load the workbook and **from openpyxl.utils import get_column_letter** for this task.
* **Save and close** the workbook.
* **Print** the results with the **string format() method** so that they appear in the console (you will need these values for writing to the new Excel sheet; see below). **\n** inside the printed string adds a line break. Then call **run()** at the end of the file to execute the function defined with def run().
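
The non-pandas steps above can be sketched with the standard library alone. This is an illustration, not the repository's code: the sample figures are made up, the column names `month` and `sales` are assumptions about sales.csv, and passing `get` as the `key` argument is just one common way of using the dictionary get() method to find the matching months.

```python
import csv
import io
import math

# Made-up stand-in for sales.csv; the column names "month" and "sales"
# are assumptions, so adjust them to match the real file.
SAMPLE_CSV = """month,sales
jan,100
feb,150
mar,90
apr,110
"""


def read_data(csv_file):
    """Return a {month: sales} dictionary built from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {row["month"]: int(row["sales"]) for row in reader}


def summarise(sales_by_month):
    """Compute the total, extremes and rounded-up average of the sales."""
    sales = list(sales_by_month.values())
    total = sum(sales)
    return {
        "total": total,
        "max_value": max(sales),
        "min_value": min(sales),
        # Passing dict.get as the key makes max()/min() compare the
        # sales figures while returning the matching month.
        "max_month": max(sales_by_month, key=sales_by_month.get),
        "min_month": min(sales_by_month, key=sales_by_month.get),
        "average": math.ceil(total / len(sales)),
    }


def run():
    # For the real file, use: with open("sales.csv", newline="") as f: ...
    data = read_data(io.StringIO(SAMPLE_CSV))
    results = summarise(data)
    print("Total sales: {}\nAverage sales: {}".format(
        results["total"], results["average"]))
    return results


run()
```

With the sample figures, the average is 112.5, which math.ceil() rounds up to 113.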
![PyCharm_Console_Output_basicdataanalysis](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/c9bf5604-2665-41ea-b7d2-1185cf8ecf72)
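
The monthly-change step from PY FILE 1 can be sketched as follows. The figures and the column names `month`, `sales` and `monthly_change_pct` are made up for illustration; in the guide the dataframe would be read from sales.csv instead.

```python
import pandas as pd

# Made-up figures; in the guide the dataframe would come from
# pd.read_csv("sales.csv"). Column names are assumptions.
df = pd.DataFrame({
    "month": ["jan", "feb", "mar", "apr"],
    "sales": [100, 150, 90, 110],
})

# pct_change() returns a fraction (0.5 for +50%), so multiply by 100.
# The first row has no previous month and comes back as NaN,
# which fillna(0) replaces with 0.
df["monthly_change_pct"] = (df["sales"].pct_change() * 100).fillna(0).round(2)

print(df)
```

Rounding to two decimal places is optional and is only there to keep the printed table compact.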
PY FILE 2
* Import **pandas as pd**.
* Read the Excel file (created via the previous steps) into a **pandas dataframe** object with the **pd.read_excel function**.
* With your results from PY FILE 1, add the new data as a **dictionary of lists** and create a **dataframe** for it with the **pd.DataFrame function**.
* Write the **dataframe** to a new Excel spreadsheet using **pd.ExcelWriter**. Name your sheet in the parameters and write **index=False** to remove the generated extra column.

PY FILE 3
* Import: **seaborn as sns**; **pandas as pd**; **matplotlib.pyplot as plt**.
* Specify columns from which to retrieve the data. Read this data into a pandas dataframe via the **pd.read_excel function**.
* **Print** this data.
![PyCharm_Console_Output_graph](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/c1df8b1f-f280-4173-9684-e783943ec1cd)
![Figure_1_cfgspreadsheetanalysis](https://github.com/isabelsw/cfgdataanalysis/assets/170036120/58ed36fb-f665-466a-b0ef-9b71ad74e4b0)
* Create a simple lineplot with the **sns.lineplot() function**.
* Show this lineplot by writing **plt.show()**. You can then save this lineplot to a local folder.

## Roadmap
See the [open issues](https://github.com/isabelsw/cfgdataanalysis/issues) for a full list of proposed features and known issues.

## Contributing
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks!

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Licence
Distributed under the MIT Licence. See the LICENSE file for more information.

## Contact
My profile - isabelsw
Project link - [https://github.com/isabelsw/cfgdataanalysis](https://github.com/isabelsw/cfgdataanalysis)
## Acknowledgements
* The project brief was created by Code First Girls Ltd. A special thanks goes to my instructors Vanny and Andrew on the "Introduction to Python & Apps" course.
* README template was created by Othneil Drew.
* GIF courtesy of GIPHY.
## Copyright Notices
* PyCharm logo: Copyright © 2000-2024 JetBrains s.r.o. JetBrains and the JetBrains logo are registered trademarks of JetBrains s.r.o.
* Python logo: Copyright © 2001-2024 Python Software Foundation.
* pandas logo: Copyright © 2008 AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team.
* seaborn logo: Copyright © Matthias Bussonnier and seaborn.
* matplotlib logo: Copyright © 2002–2012 John Hunter, Darren Dale, Eric Firing, Michael Droettboom and the Matplotlib development team; 2012–2024 The Matplotlib development team.