Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/abeertechcamus/documentdata
The dataset was cleaned and queried using Python inside Jupyter Notebook and visualized using Power BI. Document Data Analysis Projects
dax jupyter-notebook numpy pandas powerbi python
Last synced: 17 days ago
- Host: GitHub
- URL: https://github.com/abeertechcamus/documentdata
- Owner: Abeertechcamus
- Created: 2024-10-20T10:56:55.000Z (28 days ago)
- Default Branch: main
- Last Pushed: 2024-10-29T18:40:28.000Z (19 days ago)
- Last Synced: 2024-10-29T18:45:59.495Z (19 days ago)
- Topics: dax, jupyter-notebook, numpy, pandas, powerbi, python
- Homepage:
- Size: 1.34 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# DocumentData
This dashboard was built using the [Orders dataset](Orders.csv).
**Data:**
The file appears to contain 20,008 rows and 19 columns.

**Data Cleaning:**
Python (pandas)

**Data Visualization:**
Power BI

## Overview
Here’s an overview of the data structure:
- Row 3 (index 2) has the actual header labels.
- Columns contain various details such as order date, country, city, product category, quantity, unit price, discount, and status.
- Issues include extra headers, missing values, and a lack of consistent column names.

## Clean the data
I’ll clean the data by setting the correct headers, removing empty rows, and renaming columns for clarity. The raw file has headers spread across multiple rows, and many columns are labeled "Unnamed".
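Before settling on a `skiprows` value, it can help to look at the raw rows and find where the header actually sits. The sketch below is only an illustration: the inline CSV text is a made-up stand-in for the top of `Orders.csv`, and the rule "the header is the first row with no missing cells" is an assumption that may not hold for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the top of Orders.csv: title rows, then the real header.
raw_text = (
    "Orders report,,\n"
    ",,\n"
    "Order Date,City,Quantity\n"
    "2024-01-05,cairo,3\n"
    "2024-01-06,giza,1\n"
)

# header=None keeps every row as data; skip_blank_lines=False keeps
# row positions aligned with line positions in the file.
raw = pd.read_csv(io.StringIO(raw_text), header=None, skip_blank_lines=False)

# Assumption: the header is the first row in which no cell is missing.
header_idx = int(raw.notna().all(axis=1).idxmax())

# Re-read, skipping every line above the header row.
df_demo = pd.read_csv(io.StringIO(raw_text), skiprows=header_idx)
print(header_idx, list(df_demo.columns))
```

On the real file, the same check would be run against `Orders.csv` instead of the inline text, and the result compared with the `skiprows` value used when loading.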
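### Correct headers
```
import pandas as pd

# Skip the first 4 lines of the file so the header labels become the column names
df = pd.read_csv(r'Orders.csv', skiprows=4)
df
```

### Drop any completely empty rows
```
# how='all' drops a row only when every cell in it is missing
df.dropna(how='all', inplace=True)
```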
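To see what `how='all'` does compared with the default, here is a small sketch on an invented three-row frame (the values and the `Unnamed: 2` column are hypothetical, not taken from `Orders.csv`). It also shows that the same option along `axis=1` can remove columns that are entirely empty, which may be one way to deal with the "Unnamed" columns mentioned earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one fully empty row, one partly filled row,
# and one column that is empty everywhere.
demo = pd.DataFrame({
    "City": ["cairo", np.nan, "giza"],
    "Quantity": [3, np.nan, np.nan],
    "Unnamed: 2": [np.nan, np.nan, np.nan],
})

# how='all' removes only rows where every cell is missing.
rows_kept = demo.dropna(how="all")

# The same option along axis=1 removes columns that are entirely empty.
cleaned = rows_kept.dropna(axis=1, how="all")

print(cleaned)
```

The default, `how='any'`, would be much more aggressive here: it drops every row that has at least one missing cell.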
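### Convert City names to title case where applicable
```
# Title-case each city name, e.g. "new york" becomes "New York"
df['City'] = df['City'].str.title()
```

### Remove "Tel:" from phone numbers and strip extra spaces
```
# Remove the literal "Tel:" prefix, then trim leading and trailing spaces
df['Phone Number'] = df['Phone Number'].str.replace('Tel:', '', regex=False).str.strip()
```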
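The two text clean-ups above can be tried on a tiny frame first. This is a minimal sketch with invented sample values; the real `City` and `Phone Number` columns may contain other variants that need extra handling.

```python
import pandas as pd

# Hypothetical sample values, including a missing entry in each column.
demo = pd.DataFrame({
    "City": ["new york", "CAIRO", None],
    "Phone Number": ["Tel: 555-0100", " 555-0101 ", None],
})

# .str methods leave missing values as missing instead of raising an error.
demo["City"] = demo["City"].str.title()

# regex=False treats 'Tel:' as literal text; strip() trims leftover spaces.
demo["Phone Number"] = (
    demo["Phone Number"].str.replace("Tel:", "", regex=False).str.strip()
)

print(demo)
```

Passing `regex=False` keeps the behavior the same across pandas versions, since the default for `regex` in `str.replace` changed in pandas 2.0.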
### Display a summary of the cleaned data
```
df.info()
df.head()
```

To view the dashboard, open this link: [employee dashboard](employee_dashboard.pdf).