https://github.com/abhinav330/instagram-influencers-analysis
This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.
- Host: GitHub
- URL: https://github.com/abhinav330/instagram-influencers-analysis
- Owner: Abhinav330
- Created: 2024-08-25T22:37:51.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-31T02:30:06.000Z (almost 2 years ago)
- Last Synced: 2025-03-03T08:16:37.025Z (over 1 year ago)
- Topics: data, data-science, data-visualization, exploratory-data-analysis, exploratory-data-visualizations, influncer-products, instagram, scikit-learn, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 926 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Instagram Profiles Data Analysis
## Introduction
This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.
## Data Loading and Basic Inspection
- The notebook loads the CSV file `Instagram Profiles - Github Hashtag - instagram_profile.csv` into a pandas DataFrame named `df`.
- It then previews the first rows with `df.head()` and summarizes column types and non-null counts with `df.info()`.
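
The loading and inspection step can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: a small in-memory CSV stands in for the real file so the snippet runs on its own, and the `account` column and all values are invented for the example.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV so the sketch runs without the file.
# 'followers', 'following' and 'posts_count' are named in this README;
# 'account' and every value below are invented.
SAMPLE_CSV = io.StringIO(
    "account,followers,following,posts_count\n"
    "alpha,1200,300,45\n"
    "beta,,150,2100\n"
)

# With the real dataset the call would instead be:
# df = pd.read_csv("Instagram Profiles - Github Hashtag - instagram_profile.csv")
df = pd.read_csv(SAMPLE_CSV)

print(df.head())  # first rows of the DataFrame (up to five)
df.info()         # column dtypes and non-null counts
```

In the sample, the empty `followers` field on the second row is read as a missing value, which is the kind of gap `df.info()` makes visible through its non-null counts.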
## Data Visualization Class
- The notebook defines a Python class, `visual_preprocess`, that encapsulates the data visualization and preprocessing functions.
## Data Exploration and Preprocessing
- The class contains various methods for exploring and preprocessing the data:
  - `_row_col`: Helper function to calculate the number of rows and columns in the DataFrame.
  - `disp_tot_row_col`: Displays the total row and column count.
  - `missingv`: Visualizes missing values using a heatmap.
  - `_null_calculator`: Helper function to calculate the percentage of null values in columns.
  - `null_percentage`: Calculates and displays columns with a specified percentage of null values.
  - `get_col_empty`: Returns columns with null values above a specified threshold.
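
To make the null-percentage helpers concrete, here is a hedged sketch of how such methods might look. `NullReport` is a hypothetical, simplified stand-in for `visual_preprocess`; the method bodies are assumptions, not the notebook's implementation, and the sample values are invented.

```python
import pandas as pd


class NullReport:
    """Hypothetical, simplified stand-in for the notebook's `visual_preprocess`."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def _row_col(self) -> tuple:
        # DataFrame.shape is (number of rows, number of columns).
        return self.df.shape

    def _null_calculator(self) -> pd.Series:
        # isna().mean() is the fraction of missing values per column;
        # multiplying by 100 turns it into a percentage.
        return self.df.isna().mean() * 100

    def get_col_empty(self, threshold: float) -> list:
        # Assumption: "above a threshold" means strictly greater than it.
        pct = self._null_calculator()
        return pct[pct > threshold].index.tolist()


# Invented sample data using column names mentioned in this README.
sample = pd.DataFrame(
    {
        "highlights_count": [1, None, None, None],  # 75% null
        "followers": [10, 20, 30, 40],              # 0% null
        "following": [None, None, 3, 4],            # 50% null
    }
)
report = NullReport(sample)
print(report._row_col())
print(report.get_col_empty(50))
```

Because the comparison is strict, a column that is exactly 50% null (`following` here) is not returned for a threshold of 50.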
## Data Cleaning
- Columns in which more than 50% of the values are null are dropped from the DataFrame.
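
One way to express this cleaning step in pandas is shown below. It is a sketch under the assumption that the 50% cutoff is applied per column with a strict comparison; the sample values are invented.

```python
import pandas as pd

# Invented sample data; the column names come from this README.
df = pd.DataFrame(
    {
        "followers": [10, 20, 30, 40],              # 0% null
        "highlights_count": [None, None, None, 1],  # 75% null
        "following": [1, None, 3, 4],               # 25% null
    }
)

# Percentage of missing values in each column.
null_pct = df.isna().mean() * 100

# Keep only the columns where at most 50% of the values are null.
cleaned = df.loc[:, null_pct <= 50]

print(list(cleaned.columns))
```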
## Data Visualization
- Various data visualizations are created using Seaborn and Matplotlib, including:
  - Distribution of 'posts_count' using a histogram.
  - Filtering and exploration of records with 'posts_count' greater than 2000.
  - Scatterplots of various features ('followers', 'following', 'highlights_count', etc.) with respect to different account types and privacy settings.
  - Bar plots showing relationships between 'is_business_account' and 'is_professional_account' with 'followers' and 'following'.
  - Additional scatterplots exploring features related to 'following' and 'followers'.
## Hashtag Analysis
- The script defines a function (`hashtag_freq`) to extract and analyze hashtags from the 'post_hashtags' column.
- The function counts the frequency of hashtags and displays the top 10 most frequently used hashtags in the dataset.
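
A possible shape for such a function is sketched below. The storage format of 'post_hashtags' is not described here, so this version assumes each cell is a string containing `#`-prefixed tags, skips missing cells, and lowercases tags before counting; all three are assumptions, and the sample data is invented.

```python
import re
from collections import Counter

# Matches a '#' followed by word characters and captures the tag without the '#'.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_freq(cells, top_n=10):
    """Return the `top_n` most frequent hashtags as (tag, count) pairs.

    `cells` is any iterable of cell values, for example df["post_hashtags"].
    """
    counts = Counter()
    for cell in cells:
        if not isinstance(cell, str):
            continue  # skip missing values such as None or NaN
        # Assumption: tags are compared case-insensitively.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(cell))
    return counts.most_common(top_n)


# Invented sample standing in for the 'post_hashtags' column.
sample = ["#github #python", "#GitHub #opensource", None, "#python #github"]
print(hashtag_freq(sample))
```

Accepting any iterable keeps the function usable on a plain list as well as on a pandas Series.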