https://github.com/ananyachibber21/excel-for-data-science-and-analysis
Excel For Data Science and Analysis
- Host: GitHub
- URL: https://github.com/ananyachibber21/excel-for-data-science-and-analysis
- Owner: ananyachibber21
- Created: 2022-02-19T13:21:20.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-03-11T01:34:47.000Z (almost 4 years ago)
# Excel For Data Science and Analysis
Excel is used in many day-to-day data tasks: collecting, storing, cleaning, and analysing data. A working knowledge of it is a basic expectation for anyone specialising in Data Science. It is simple to use and makes it easy to store results and produce basic statistics for a given dataset.
## Power Queries
Power Query is a business intelligence tool available in Excel that lets you import data from many different sources and then clean, transform, and reshape it as needed. You set up a query once and then reuse it with a simple refresh.
***Step 1-*** *Import the files from a folder on your PC (data can also be imported from the web or a database).*

***Step 2-*** *Browse to the folder where the files are stored.*

***Step 3-*** *Click on OK after selecting the folder path to the CSV files.*

***Step 4-*** *A window appears showing the data. Click Combine & Edit to combine all the files in that folder.*

*A new window appears with all the data of the combined CSV files.*
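For readers who want to see the idea outside Excel, the folder import in Steps 1 to 4 can be approximated in Python. This is a minimal sketch, not what Power Query runs internally: the function name `combine_csv_folder` is hypothetical, and it assumes every file in the folder is a UTF-8 CSV with a header row.

```python
import csv
from pathlib import Path


def combine_csv_folder(folder):
    """Read every *.csv file in `folder` and return all rows as one list of dicts."""
    combined = []
    # Sort the paths so the combined order is predictable.
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Keep track of which file each row came from
                # (the column name "Source.Name" is an illustrative choice).
                row["Source.Name"] = path.name
                combined.append(row)
    return combined
```

Calling `combine_csv_folder("path/to/folder")` again after new files are added picks them up as well, which mirrors refreshing the query.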

### Data Cleaning
Operations that tidy the imported data, such as removing, merging, and splitting columns or changing data types, are known as data cleaning.
***Step 5-*** *Operations can now be performed on the data. To remove a column, select it by its header, right-click, and choose Remove.*
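As a rough Python analogue of removing a column, the sketch below drops one key from every row. The helper `remove_column` and the `Region` column are hypothetical and used only for illustration.

```python
def remove_column(rows, column):
    """Return new rows without `column`; the input rows are left untouched."""
    return [
        {key: value for key, value in row.items() if key != column}
        for row in rows
    ]


# Hypothetical sample data: each dict stands for one row of the table.
rows = [
    {"Item": "Pen", "Region": "East", "UnitCost": "2"},
    {"Item": "Desk", "Region": "West", "UnitCost": "125"},
]
cleaned = remove_column(rows, "Region")
print(cleaned)
```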

***Step 6-*** *To merge columns, select the two columns, right-click, and choose Merge Columns.*

*Choose a separator. It can be any character you want to use as a delimiter between the two columns. If the separator you need is not listed, select "Custom" from the drop-down menu and enter the separator of your choice. Enter a name under "New column name" and press OK.*

*The two merged columns look something like this.*
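The merge step can be sketched in Python as joining two values with the chosen separator. The helper `merge_columns` and the sample column names are assumptions made for this example; where the merged column ends up in the table is not modelled here.

```python
def merge_columns(rows, first, second, separator, new_name):
    """Replace columns `first` and `second` with one column joined by `separator`."""
    merged = []
    for row in rows:
        # Copy every column except the two being merged.
        new_row = {k: v for k, v in row.items() if k not in (first, second)}
        new_row[new_name] = f"{row[first]}{separator}{row[second]}"
        merged.append(new_row)
    return merged


# Hypothetical sample data.
people = [{"First": "Ada", "Last": "Lovelace", "City": "London"}]
merged = merge_columns(people, "First", "Last", " ", "FullName")
print(merged)
```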

***Step 7-*** *One column can be split into two. Select the column, then on the Home tab select "Split Column" and choose "By Delimiter". This splits the column at the chosen delimiter.*

*This operation gives the following result.*

***Step 8-*** *The data type of a column can also be changed. Select the column, right-click, and choose the new data type under "Change Type".*

***Step 9-*** *Once the data cleaning is finished, go to the Home tab and click Close & Load.*

*A new Excel sheet containing the cleaned data appears. Save this file for later use. When more files are later added to the same folder, refreshing the query applies the same cleaning steps to them automatically.*
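The split-column and change-type steps above, and the idea of re-running the same cleaning on new data, can be sketched in Python as follows. All function names and the sample columns (`FullName`, `UnitCost`) are hypothetical. The sketch assumes the split happens at the first occurrence of the delimiter.

```python
def split_column(rows, column, delimiter, new_names):
    """Split `column` at the first `delimiter` into the two columns in `new_names`."""
    result = []
    for row in rows:
        left, _, right = row[column].partition(delimiter)
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[new_names[0]] = left
        new_row[new_names[1]] = right
        result.append(new_row)
    return result


def change_type(rows, column, convert):
    """Return new rows with `column` converted by `convert` (for example int or float)."""
    return [{**row, column: convert(row[column])} for row in rows]


def clean(rows):
    """Apply the same steps in the same order, like refreshing a saved query."""
    rows = split_column(rows, "FullName", " ", ("First", "Last"))
    return change_type(rows, "UnitCost", float)


# Hypothetical raw data, with every value still stored as text.
raw = [{"FullName": "Ada Lovelace", "UnitCost": "12.50"}]
cleaned = clean(raw)
print(cleaned)
```

Because `clean` is an ordinary function, it can be called again whenever the input changes, which is the same idea as refreshing the query after adding files to the folder.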

## Formulas
Apart from Power Query, Excel offers a wide range of formulas for working with the data in a workbook. Go to the Formulas tab and click "Insert Function".

1. CONCATENATE
Combines the values of several cells into one.
Formula: =CONCATENATE(TEXT1,[TEXT2],...)

2. VLOOKUP
The formula allows you to look up data that is arranged in vertical columns.
Formula: =VLOOKUP(LOOKUP_VALUE,TABLE_ARRAY, COL_INDEX_NUM, [RANGE_LOOKUP])

3. LEN
To get the number of characters in a given cell.
Formula: =LEN(SELECT CELL)

4. SUMIF
Adds up the values in cells that meet a given criterion.
Formula: =SUMIF(RANGE,CRITERIA,[sum_range])

5. DAYS/NETWORKDAYS
DAYS returns the number of days between two calendar dates.
Formula: =DAYS(END_DATE, START_DATE)

NETWORKDAYS returns the number of working days between two dates, excluding weekends and any listed holidays.
Formula: =NETWORKDAYS(START_DATE, END_DATE, [HOLIDAYS])

6. SUBSTITUTE
Replaces every occurrence of a piece of text within a cell with new text.
Formula: =SUBSTITUTE(A1,"p","s")

7. MINIF/MAXIF
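Minimum of a set of values that match a criterion.
Formula: =MIN(IF(RANGE1=CRITERIA1,RANGE2))

Maximum of a set of values that match a criterion.
Formula: =MAX(IF(RANGE1=CRITERIA1,RANGE2))

In Excel versions without dynamic arrays, enter these as array formulas with Ctrl+Shift+Enter. Excel 2019 and Microsoft 365 also provide MINIFS and MAXIFS for the same purpose.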

8. COUNTIFS
Counts how many rows meet one or more criteria.
Formula: =COUNTIFS(RANGE1,CRITERIA1,[RANGE2,CRITERIA2],...)

9. LEFT/RIGHT
LEFT returns the first “x” characters of the cell.
Formula: =LEFT(SELECT CELL,NUMBER)

RIGHT returns the last “x” characters of the cell.
Formula: =RIGHT(SELECT CELL,NUMBER)
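To make the behaviour of the formulas above concrete, here is a small Python sketch of what several of them compute. It is an illustration only: the helper names, the sample table, and the `Region` column are hypothetical, matching is exact and case-sensitive (Excel compares text case-insensitively), and `networkdays` assumes the start date is not after the end date.

```python
from datetime import date, timedelta

# Hypothetical sample table: each dict stands for one worksheet row.
rows = [
    {"Item": "Pen", "Region": "East", "UnitCost": 2.0},
    {"Item": "Desk", "Region": "West", "UnitCost": 125.0},
    {"Item": "Pen", "Region": "West", "UnitCost": 3.0},
    {"Item": "Binder", "Region": "East", "UnitCost": 9.0},
]


def vlookup(rows, lookup_value, key_column, result_column):
    """Exact-match lookup (like RANGE_LOOKUP = FALSE); the first match wins."""
    for row in rows:
        if row[key_column] == lookup_value:
            return row[result_column]
    return None  # Excel would show #N/A instead


def matching(rows, **criteria):
    """Rows whose columns equal every given criterion."""
    return [r for r in rows if all(r[c] == v for c, v in criteria.items())]


def sumif(rows, sum_column, **criteria):
    return sum(r[sum_column] for r in matching(rows, **criteria))


def countifs(rows, **criteria):
    return len(matching(rows, **criteria))


def min_if(rows, value_column, **criteria):
    return min(r[value_column] for r in matching(rows, **criteria))


def max_if(rows, value_column, **criteria):
    return max(r[value_column] for r in matching(rows, **criteria))


def networkdays(start, end, holidays=()):
    """Count Monday-to-Friday dates from start to end inclusive, skipping holidays."""
    skipped = set(holidays)
    count = 0
    current = start
    while current <= end:
        if current.weekday() < 5 and current not in skipped:
            count += 1
        current += timedelta(days=1)
    return count


print(vlookup(rows, "Desk", "Item", "UnitCost"))          # 125.0
print(sumif(rows, "UnitCost", Item="Pen"))                # 5.0
print(countifs(rows, Item="Pen", Region="West"))          # 1
print(min_if(rows, "UnitCost", Item="Pen"))               # 2.0
print(max_if(rows, "UnitCost", Item="Pen"))               # 3.0
print(networkdays(date(2024, 1, 1), date(2024, 1, 14)))   # 10
print((date(2024, 1, 14) - date(2024, 1, 1)).days)        # 13, like DAYS
print("apple".replace("p", "s"))                          # assle, like SUBSTITUTE
print("Excel"[:2], "Excel"[-3:], len("Excel"))            # Ex cel 5, like LEFT, RIGHT, LEN
```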

## Pivot Tables
### Creating Pivot Table
Select the table, or the rows and columns you want to summarise. Go to the Insert tab and click PivotTable.

The range of the table is filled in automatically. Choose whether to place the Pivot Table in a new worksheet or an existing one, then press OK.

The pane on the far right of the new worksheet lists all the column names of the table. At the bottom are four areas (FILTERS, COLUMNS, ROWS, and VALUES) for arranging the Pivot Table as you need.

Dragging UnitCost to the VALUES area gives the sum of all unit costs in the table.

Dragging the item field to ROWS lists each item in its own row. With the item field in ROWS and UnitCost in VALUES, the table shows each item alongside the sum of its unit costs, plus the grand total.
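The ROWS plus VALUES arrangement described above amounts to a group-and-sum. The Python sketch below shows that idea on a hypothetical sample table. The helper `pivot_sum` and the sample values are assumptions, not data from this repository.

```python
from collections import defaultdict

# Hypothetical sample table: each dict stands for one worksheet row.
rows = [
    {"Item": "Pen", "UnitCost": 2.0},
    {"Item": "Desk", "UnitCost": 125.0},
    {"Item": "Pen", "UnitCost": 3.0},
    {"Item": "Binder", "UnitCost": 9.0},
]


def pivot_sum(rows, row_field, value_field):
    """Sum `value_field` per distinct `row_field` value and add a grand total."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[row_field]] += row[value_field]
    table = dict(totals)
    table["Grand Total"] = sum(totals.values())
    return table


pivot = pivot_sum(rows, "Item", "UnitCost")
print(pivot)  # {'Pen': 5.0, 'Desk': 125.0, 'Binder': 9.0, 'Grand Total': 139.0}
```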

The Pivot Table for the 5 fields looks like this.

A chart can be created from this new worksheet. Go to the Insert tab and click Recommended Charts, then choose the type of chart you want.


A filter can be added by dragging a field to the FILTERS area.
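In terms of the underlying data, a Pivot Table filter restricts which rows are summarised. A minimal Python sketch of that idea follows; the helper `filtered_sum`, the `Region` field, and the sample values are hypothetical.

```python
# Hypothetical sample table: each dict stands for one worksheet row.
rows = [
    {"Item": "Pen", "Region": "East", "UnitCost": 2.0},
    {"Item": "Desk", "Region": "West", "UnitCost": 125.0},
    {"Item": "Pen", "Region": "West", "UnitCost": 3.0},
    {"Item": "Binder", "Region": "East", "UnitCost": 9.0},
]


def filtered_sum(rows, filter_field, filter_value, row_field, value_field):
    """Sum `value_field` per `row_field`, using only rows that pass the filter."""
    totals = {}
    for row in rows:
        if row[filter_field] != filter_value:
            continue  # this row is hidden by the filter
        key = row[row_field]
        totals[key] = totals.get(key, 0) + row[value_field]
    return totals


east = filtered_sum(rows, "Region", "East", "Item", "UnitCost")
print(east)  # {'Pen': 2.0, 'Binder': 9.0}
```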


To sort, select the column to be sorted, right-click, and choose Sort.

A Pivot Table can also be created from a selected subset of values instead of the entire table.
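Sorting a summarised column corresponds to ordering the totals by value or by label. The short Python sketch below illustrates both, using hypothetical totals.

```python
# Hypothetical per-item totals, as a Pivot Table might show them.
totals = {"Pen": 5.0, "Desk": 125.0, "Binder": 9.0}

# Largest total first.
largest_first = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
print(largest_first)  # [('Desk', 125.0), ('Binder', 9.0), ('Pen', 5.0)]

# Alphabetical by item name, A to Z.
by_name = sorted(totals.items())
print(by_name)  # [('Binder', 9.0), ('Desk', 125.0), ('Pen', 5.0)]
```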

