https://github.com/lohiyah/dataset_summariser
A Python tool for generating concise summaries of datasets using Pandas and NumPy for analysis, and Matplotlib and Seaborn for insightful visualizations. Ideal for quick exploratory data analysis (EDA).
- Host: GitHub
- URL: https://github.com/lohiyah/dataset_summariser
- Owner: LohiyaH
- License: apache-2.0
- Created: 2024-11-06T13:32:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-06T13:36:14.000Z (over 1 year ago)
- Last Synced: 2025-03-25T11:15:13.995Z (11 months ago)
- Topics: matplotlib, numpy, pandas, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 12.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news.xlsx
- License: LICENSE
README
# Dataset Summarizer
This repository provides a Python-based summarizer tool to load, analyze, and visualize datasets using libraries such as `pandas`, `numpy`, `matplotlib`, `seaborn`, `tensorflow`, `re`, and `pickle`. It gives a quick statistical summary and visual insight into your data, making patterns and distributions easier to explore.
## Table of Contents
- [Description](#description)
- [Features](#features)
- [Libraries Used](#libraries-used)
- [Installation](#installation)
- [Usage](#usage)
- [License](#license)
## Description
The Dataset Summarizer tool helps you quickly load a dataset, generate descriptive statistics, and visualize key patterns within the data. It covers summarizing each column's basic statistics, handling missing values, and generating histograms, box plots, and heatmaps to aid in data understanding.
## Features
- Load and display the first few rows of a dataset.
- Generate column-wise summary statistics (mean, median, mode, standard deviation).
- Handle missing values in the dataset.
- Visualize data distributions and relationships with plots like histograms, box plots, and heatmaps.
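The first three features can be sketched with `pandas` roughly as follows. This is a minimal illustration, not the repository's actual code: the inline CSV data, the helper names `summarise` and `fill_missing`, and the median/`"unknown"` fill strategy are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real dataset file.
CSV_TEXT = """age,income,city
25,50000,Pune
32,,Delhi
47,82000,Pune
32,61000,
"""


def summarise(df: pd.DataFrame) -> pd.DataFrame:
    """Return mean, median, mode, std, and missing count for numeric columns."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        # DataFrame.mode() can return several rows; keep the first mode per column.
        "mode": numeric.mode().iloc[0],
        "std": numeric.std(),
        "missing": numeric.isna().sum(),
    })


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed strategy: median for numeric gaps, 'unknown' for everything else."""
    filled = df.copy()
    for col in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[col]):
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            filled[col] = filled[col].fillna("unknown")
    return filled


df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.head())        # first few rows
summary = summarise(df)
print(summary)          # column-wise statistics
clean = fill_missing(df)
print(clean)            # dataset with gaps filled
```

For a real dataset, `pd.read_csv` would take a file path instead of the in-memory buffer. Whether to fill or drop missing values depends on the data, so treat the fill rule here as one option among several.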
## Libraries Used
The following Python libraries were used in the project:
1. **pandas**
- A data manipulation library used to load, transform, and summarize datasets.
- Essential functions like `.head()`, `.describe()`, and `.info()` help to understand data structure and content.
2. **numpy**
- A library for numerical operations, supporting statistical functions such as `mean`, `median`, and `std`.
- Provides efficient ways to handle missing data and large numerical computations.
3. **matplotlib**
- A foundational library for creating static and interactive visualizations.
- Used to generate simple plots, like histograms and box plots, which help visualize data distribution.
4. **seaborn**
- A high-level library built on `matplotlib` for creating attractive and informative statistical plots.
- Used for advanced visualization types, such as pair plots and heatmaps, that facilitate data exploration and pattern discovery.
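As a rough sketch of how these libraries combine, the snippet below draws a histogram and a box plot with `matplotlib` and a correlation heatmap with `seaborn`. The synthetic columns (`height`, `weight`, `bmi`) and the output filename `summary_plots.png` are hypothetical and chosen only for illustration; the repository's notebook may plot different things.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for this example), seeded for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": rng.normal(170, 10, 200),
    "weight": rng.normal(70, 12, 200),
})
df["bmi"] = df["weight"] / (df["height"] / 100) ** 2

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram: distribution of a single numeric column.
axes[0].hist(df["height"], bins=20)
axes[0].set_title("Histogram of height")

# Box plot: median, quartiles, and outliers of a single column.
axes[1].boxplot(df["weight"])
axes[1].set_title("Box plot of weight")

# Heatmap: pairwise Pearson correlations between numeric columns.
corr = df.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=axes[2])
axes[2].set_title("Correlation heatmap")

fig.tight_layout()
fig.savefig("summary_plots.png")  # hypothetical output path
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped and `plt.show()` used in place of `fig.savefig`.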
## Installation
To install the necessary libraries, run:
```bash
pip install pandas numpy matplotlib seaborn tensorflow
```
`pickle`, `re`, and `time` are part of the Python standard library and do not need to be installed.
# transformer-abstractive-summarization
Abstractive Text Summarization using Transformer
- Implementation of the state-of-the-art Transformer model from "Attention Is All You Need", Vaswani et al.
https://arxiv.org/abs/1706.03762
- Inshorts Dataset: https://www.kaggle.com/shashichander009/inshorts-news-data