Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nisaaragharia/mass_summarization
Large Scale Dataset Cleaning (Summarization and Information Extraction) Using LLAMA2 70B
https://github.com/nisaaragharia/mass_summarization
data-science dataset dataset-generation llama2 llms summarization
Last synced: about 1 month ago
JSON representation
Large Scale Dataset Cleaning (Summarization and Information Extraction) Using LLAMA2 70B
- Host: GitHub
- URL: https://github.com/nisaaragharia/mass_summarization
- Owner: NisaarAgharia
- Created: 2024-03-31T15:42:44.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-04-07T04:47:03.000Z (9 months ago)
- Last Synced: 2024-10-18T22:08:02.417Z (3 months ago)
- Topics: data-science, dataset, dataset-generation, llama2, llms, summarization
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/datasets/nisaar/LLAMA2_Legal_Dataset_4.4k_Instructions
- Size: 60.5 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Mass Summarization
This repository is dedicated to the task of mass summarization using large language models (LLMs), particularly focusing on leveraging the capabilities of llama models. The primary goal is to facilitate large-scale dataset cleaning and summarization to enhance data understanding and usability.
## Overview
The `Mass_Summarization` repository contains notebooks that demonstrate the use of llama2 models for summarizing large datasets. These notebooks are designed to be used with Google Colab and Runpod, making it easy for users to run and modify the code according to their needs.
![mass_summary](https://github.com/NisaarAgharia/Mass_Summarization/assets/22457544/f5af3a91-a5b3-4006-8316-e554c8ff98d6)## Notebooks
- **LLAMA2_13B_Summarizer.ipynb**: Demonstrates how to perform mass summarization using the llama2 13B model on Google Colab.
- **LLAMA_2_70B_Information_Extractor.ipynb**: Shows the use of the llama2 70B model for information extraction on Runpod.