https://github.com/abhinav330/instagram-influencers-analysis

This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.
data data-science data-visualization exploratory-data-analysis exploratory-data-visualizations influncer-products instagram scikit-learn sklearn

Host: GitHub
URL: https://github.com/abhinav330/instagram-influencers-analysis
Owner: Abhinav330
Created: 2024-08-25T22:37:51.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-08-31T02:30:06.000Z (almost 2 years ago)
Last Synced: 2025-03-03T08:16:37.025Z (over 1 year ago)
Topics: data, data-science, data-visualization, exploratory-data-analysis, exploratory-data-visualizations, influncer-products, instagram, scikit-learn, sklearn
Language: Jupyter Notebook
Homepage:
Size: 926 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

[![Codacy Badge](https://app.codacy.com/project/badge/Grade/709f87abe4a24b56842715d13d55dfc1)](https://app.codacy.com/gh/Abhinav330/Instagram-Influncers-Analysis/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)

![GitHub Pipenv locked dependency version](https://img.shields.io/github/pipenv/locked/dependency-version/Abhinav330/Instagram-Influncers-Analysis/matplotlib?color=gold)

![GitHub Pipenv locked dependency version](https://img.shields.io/github/pipenv/locked/dependency-version/Abhinav330/Instagram-Influncers-Analysis/numpy?color=gold)

![GitHub Pipenv locked dependency version](https://img.shields.io/github/pipenv/locked/dependency-version/Abhinav330/Instagram-Influncers-Analysis/pandas?color=yellow)

![GitHub Pipenv locked dependency version](https://img.shields.io/github/pipenv/locked/dependency-version/Abhinav330/Instagram-Influncers-Analysis/scikit-learn?color=silver)

![GitHub Pipenv locked dependency version](https://img.shields.io/github/pipenv/locked/dependency-version/Abhinav330/Instagram-Influncers-Analysis/scipy?color=beige)

![GitHub Pipenv locked dependency version](https://img.shields.io/github/pipenv/locked/dependency-version/Abhinav330/Instagram-Influncers-Analysis/seaborn?color=gold)

![GitHub Pipenv locked Python version](https://img.shields.io/github/pipenv/locked/python-version/Abhinav330/Instagram-Influncers-Analysis?color=dark%20green)

![GitHub repo size](https://img.shields.io/github/repo-size/Abhinav330/Instagram-Influncers-Analysis)

# Instagram Profiles Data Analysis

## Introduction

This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.

## Data Loading and Basic Inspection

- The script loads a dataset from a CSV file named 'Instagram Profiles - Github Hashtag - instagram_profile.csv' into a Pandas DataFrame named `df`.

- It displays the first few rows with `df.head()` and summarizes the column types and non-null counts with `df.info()`.
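
The loading step above can be sketched roughly as follows. This is a minimal sketch, assuming the CSV sits in the notebook's working directory; the `load_profiles` and `inspect` helpers are hypothetical names used for illustration and are not taken from the notebook.

```python
import pandas as pd

# File name as given in this README; the location is an assumption.
CSV_PATH = "Instagram Profiles - Github Hashtag - instagram_profile.csv"


def load_profiles(path: str = CSV_PATH) -> pd.DataFrame:
    """Read the profiles CSV into a DataFrame (hypothetical helper)."""
    return pd.read_csv(path)


def inspect(df: pd.DataFrame) -> None:
    """Print the first rows and a summary of the columns."""
    print(df.head())  # first five rows
    df.info()         # dtypes and non-null counts, printed to stdout
```

In a notebook cell this would be used as `df = load_profiles()` followed by `inspect(df)`.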

## Data Visualization Class

- The script defines a Python class named `visual_preprocess` to encapsulate data visualization and preprocessing functions.

## Data Exploration and Preprocessing

- The class contains various methods for exploring and preprocessing the data:

  - `_row_col`: Helper function to calculate the number of rows and columns in the DataFrame.

  - `disp_tot_row_col`: Displays the total row and column count.

  - `missingv`: Visualizes missing values using a heatmap.

  - `_null_calculator`: Helper function to calculate the percentage of null values in columns.

  - `null_percentage`: Calculates and displays the columns whose share of null values reaches a specified percentage.

  - `get_col_empty`: Returns columns with null values above a specified threshold.
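
One possible shape for these methods is sketched below. The class and method names come from the list above, but the method bodies, the parameter names, and the default thresholds are assumptions made for illustration; the notebook's own code may differ. The plotting method `missingv` is only indicated in a comment so that the sketch depends on pandas alone.

```python
import pandas as pd


class visual_preprocess:
    """Illustrative sketch; bodies and signatures are assumptions."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def _row_col(self):
        """Return (rows, columns) of the DataFrame."""
        return self.df.shape

    def disp_tot_row_col(self):
        rows, cols = self._row_col()
        print(f"Total rows: {rows}, total columns: {cols}")

    # missingv would typically draw something like
    # seaborn.heatmap(self.df.isnull(), cbar=False); omitted here.

    def _null_calculator(self) -> pd.Series:
        """Percentage (0-100) of null values in each column."""
        return self.df.isnull().mean() * 100

    def null_percentage(self, threshold: float = 0.0) -> pd.Series:
        """Print and return columns whose null percentage exceeds threshold."""
        pct = self._null_calculator()
        selected = pct[pct > threshold].sort_values(ascending=False)
        print(selected)
        return selected

    def get_col_empty(self, threshold: float = 50.0) -> list:
        """Names of columns whose null percentage exceeds threshold."""
        pct = self._null_calculator()
        return pct[pct > threshold].index.tolist()
```

Keeping the percentage calculation in one private helper means the display method and the column-selection method cannot drift apart in how they define "percentage of nulls".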

## Data Cleaning

- Columns with a high percentage of null values (above 50%) are dropped from the DataFrame.
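
A minimal sketch of this cleaning step, assuming "above 50%" means strictly greater than 50; the function name `drop_sparse_columns` is hypothetical.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 50.0) -> pd.DataFrame:
    """Return a copy of df without columns whose null share exceeds threshold (%)."""
    null_pct = df.isnull().mean() * 100          # percentage of nulls per column
    to_drop = null_pct[null_pct > threshold].index
    return df.drop(columns=to_drop)
```

The function returns a new DataFrame rather than modifying `df` in place, so the original data stays available for comparison.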

## Data Visualization

- Various data visualizations are created using Seaborn and Matplotlib, including:

  - Distribution of 'posts_count' using a histogram.

  - Filtering and exploration of records with 'posts_count' greater than 2000.

  - Scatterplots of various features ('followers', 'following', 'highlights_count', etc.) with respect to different account types and privacy settings.

  - Bar plots of 'followers' and 'following' grouped by 'is_business_account' and 'is_professional_account'.

  - Additional scatterplots exploring features related to 'following' and 'followers'.

## Hashtag Analysis

- The script defines a function (`hashtag_freq`) to extract and analyze hashtags from the 'post_hashtags' column.

- The function counts the frequency of hashtags and displays the top 10 most frequently used hashtags in the dataset.
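
A hedged sketch of such a function is shown below. The storage format of 'post_hashtags' is not described here, so the sketch assumes each cell is text containing tags that start with `#`; the regular expression, the lowercasing, and the signature are assumptions rather than the notebook's actual implementation.

```python
import re
from collections import Counter

import pandas as pd

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#\w+")


def hashtag_freq(df: pd.DataFrame, column: str = "post_hashtags", top_n: int = 10):
    """Return the top_n (hashtag, count) pairs found in the given column."""
    counts = Counter()
    for cell in df[column].dropna():             # skip rows without hashtags
        tags = HASHTAG_RE.findall(str(cell))
        counts.update(tag.lower() for tag in tags)  # treat #GitHub and #github alike
    return counts.most_common(top_n)
```

Calling `hashtag_freq(df)` returns a list of tuples, which can be printed directly or passed to `pd.DataFrame(..., columns=["hashtag", "count"])` for display.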
