https://github.com/lohiyah/dataset_summariser

A Python tool for generating concise summaries of datasets using Pandas and NumPy for analysis, and Matplotlib and Seaborn for insightful visualizations. Ideal for quick exploratory data analysis (EDA).

matplotlib numpy pandas seaborn


Host: GitHub
URL: https://github.com/lohiyah/dataset_summariser
Owner: LohiyaH
License: apache-2.0
Created: 2024-11-06T13:32:05.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-06T13:36:14.000Z (over 1 year ago)
Last Synced: 2025-03-25T11:15:13.995Z (11 months ago)
Topics: matplotlib, numpy, pandas, seaborn
Language: Jupyter Notebook
Homepage:
Size: 12.3 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: news.xlsx
- License: LICENSE

README

# Dataset Summarizer

This repository provides a Python-based summarizer tool to load, analyze, and visualize datasets using libraries such as `pandas`, `numpy`, `matplotlib`, `seaborn`, `tensorflow`, `re`, and `pickle`. It gives a quick statistical summary and visual insight into your data, making patterns and distributions easier to explore.

## Table of Contents
- [Description](#description)
- [Features](#features)
- [Libraries Used](#libraries-used)
- [Installation](#installation)
- [Usage](#usage)
- [License](#license)

## Description

The Dataset Summarizer tool helps you quickly load a dataset, generate descriptive statistics, and visualize key patterns within the data. It summarizes each column's basic statistics, handles missing values, and generates histograms, box plots, and heatmaps to aid data understanding.

## Features

- Load and display the first few rows of a dataset.
- Generate column-wise summary statistics (mean, median, mode, standard deviation).
- Handle missing values in the dataset.
- Visualize data distributions and relationships with plots like histograms, box plots, and heatmaps.
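The first three features can be sketched with `pandas` alone. The snippet below is a minimal illustration, not the repository's actual code: the `summarize` helper, the sample data, and the choice to fill gaps with the column median are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per numeric column with basic statistics (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": numeric.mean(),
        "median": numeric.median(),
        "mode": numeric.mode().iloc[0],   # first mode if a column has several
        "std": numeric.std(),             # sample standard deviation (ddof=1)
        "missing": numeric.isna().sum(),  # count of NaN values per column
    })


# Small made-up dataset with a few missing values
df = pd.DataFrame({
    "age": [25, 30, 30, np.nan, 45],
    "income": [40000.0, 52000.0, 52000.0, 61000.0, np.nan],
    "city": ["Pune", "Delhi", None, "Delhi", "Mumbai"],
})

print(df.head())   # first few rows
summary = summarize(df)
print(summary)

# One possible way to handle missing values: fill numeric gaps with the column median.
# Non-numeric columns such as "city" are left untouched.
filled = df.fillna(df.median(numeric_only=True))
print(filled)
```

Median imputation is only one option; dropping rows with `dropna()` or filling with the mean may suit other datasets better.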

## Libraries Used

The following Python libraries were used in the project:

1. **pandas**
   - A data manipulation library used to load, transform, and summarize datasets.
   - Essential functions like `.head()`, `.describe()`, and `.info()` help to understand data structure and content.

2. **numpy**
   - A library for numerical operations, supporting statistical functions such as `mean`, `median`, and `std`.
   - Provides efficient ways to handle missing data and large numerical computations.

3. **matplotlib**
   - A foundational library for creating static and interactive visualizations.
   - Used to generate simple plots, like histograms and box plots, which help visualize data distribution.

4. **seaborn**
   - A high-level library built on `matplotlib` for creating attractive and informative statistical plots.
   - Used for advanced visualization types, such as pair plots and heatmaps, that facilitate data exploration and pattern discovery.
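As a rough illustration of how the plotting libraries listed above fit together, the sketch below draws a histogram and a box plot with `matplotlib` and a correlation heatmap with `seaborn`. The column names, the randomly generated data, and the output filename `summary_plots.png` are assumptions for this example and do not come from the repository.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up numeric data for illustration
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": rng.normal(170, 10, 200),
    "weight": rng.normal(70, 12, 200),
})
df["bmi"] = df["weight"] / (df["height"] / 100) ** 2

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram: distribution of a single column
axes[0].hist(df["height"], bins=20)
axes[0].set_title("Histogram of height")

# Box plot: median, quartiles, and outliers of a single column
axes[1].boxplot(df["weight"])
axes[1].set_title("Box plot of weight")

# Heatmap: pairwise correlations between numeric columns
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", ax=axes[2])
axes[2].set_title("Correlation heatmap")

fig.tight_layout()
fig.savefig("summary_plots.png")
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped and `plt.show()` used in place of `fig.savefig(...)`.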

## Installation

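To install the necessary third-party libraries, run:
```bash
pip install pandas numpy matplotlib seaborn tensorflow
```

`pickle`, `re`, and `time` are part of the Python standard library and do not need to be installed with `pip`.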

# transformer-abstractive-summarization
Abstractive Text Summarization using Transformer

- Implementation of the state-of-the-art Transformer model from "Attention Is All You Need", Vaswani et al.: https://arxiv.org/abs/1706.03762

- Inshorts Dataset: https://www.kaggle.com/shashichander009/inshorts-news-data
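
The core operation of the Transformer in the cited paper is scaled dot-product attention, softmax(QKᵀ / √d_k) V. The NumPy sketch below illustrates that formula only; it is not the repository's TensorFlow implementation, and the function name, array shapes, and random inputs are assumptions made for this example.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    """Compute softmax(q @ k.T / sqrt(d_k)) @ v and return (output, weights)."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    # Subtract the row-wise maximum before exponentiating for numerical stability
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row now sums to 1
    return weights @ v, weights


# Made-up shapes: 4 query positions, 6 key/value positions
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))    # queries, d_k = 8
k = rng.normal(size=(6, 8))    # keys, d_k = 8
v = rng.normal(size=(6, 16))   # values, d_v = 16

output, weights = scaled_dot_product_attention(q, k, v)
print(output.shape)   # one d_v-sized vector per query position
print(weights.shape)  # one weight per (query, key) pair
```

A full Transformer additionally uses multiple attention heads, masking, positional encodings, and feed-forward layers, none of which are shown here.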
