https://github.com/aadisrivastava05/pravachakai-hindi-article-generation

This project is a Hindi Article Generator, developed using natural language processing (NLP) techniques to create contextually relevant and coherent articles in Hindi. The model was trained on a custom dataset collected via web scraping. The model has been fine-tuned on a dataset of Hindi news headlines and articles.

fine-tuning huggingface-transformers llama3 nlp unsloth

# Hindi Article Generator 📰
This project is a Hindi Article Generator that uses natural language processing (NLP) techniques to create coherent, contextually relevant articles in Hindi. The model was fine-tuned on a custom dataset of Hindi news headlines and articles collected via web scraping, with the aim of generating fluent articles for use cases such as news, blogs, and creative content.

## Direct link to the Kaggle Notebook with outputs:
https://www.kaggle.com/code/aadisrivastava/hindi-article-generator

(I highly recommend checking it out, as it contains all the cells along with their outputs.)

# Dataset
The model is trained on a dataset that I created by scraping the BBC Hindi website. The dataset covers a wide variety of articles, which helps the generator produce diverse and contextually accurate content.
## Link to the dataset repo containing the dataset and the scraping code:
https://github.com/AadiSrivastava05/BBC-Hindi-News-Dataset-with-web-scraping-script
## The dataset is also available on Kaggle, where you can use it directly in your code without downloading it:
https://www.kaggle.com/datasets/aadisrivastava/bbc-hindi-news-articles-dataset-detailed
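As a rough illustration of using the dataset directly in a Kaggle notebook, the sketch below reads it from the `/kaggle/input/` mount point. The directory name is taken from the dataset URL above; that the data is stored as CSV files is an assumption, and the helper `load_rows` is hypothetical. Column names are printed rather than assumed.

```python
import csv
from pathlib import Path

# Kaggle mounts attached datasets read-only under /kaggle/input/<dataset-slug>/.
# The slug is taken from the dataset URL; the CSV layout inside is an assumption.
DATASET_DIR = Path("/kaggle/input/bbc-hindi-news-articles-dataset-detailed")

# Full articles can exceed the csv module's default per-field size limit.
csv.field_size_limit(10_000_000)


def load_rows(dataset_dir):
    """Read every CSV file found under dataset_dir into a list of dicts."""
    rows = []
    for csv_path in sorted(Path(dataset_dir).rglob("*.csv")):
        # utf-8-sig decodes UTF-8 Hindi text and drops a byte-order mark if present.
        with open(csv_path, newline="", encoding="utf-8-sig") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


if __name__ == "__main__":
    if DATASET_DIR.exists():
        rows = load_rows(DATASET_DIR)
        print(f"Loaded {len(rows)} rows")
        if rows:
            # Inspect the real column names instead of guessing them.
            print("Columns:", list(rows[0].keys()))
    else:
        print("Dataset directory not found; attach the dataset to the notebook first.")
```

Outside Kaggle, the same function can be pointed at a local copy of the dataset folder.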

# 📌 Features
* Generates fluent and contextually accurate Hindi articles.
* Can be used for content creation on media platforms, in blogs, and in creative writing.
* Built with NLP techniques by fine-tuning **Llama 3** for the Hindi language.
* The same dataset and approach can be used for other tasks, including:
1) Generating a headline for an article
2) Classification of an article into different categories
3) Classification of an article headline into different categories
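
To make the reuse idea concrete, here is a minimal sketch of turning one scraped record into a prompt/completion pair for each task. This is not the notebook's actual code: the instruction wording, the prompt layout, the `build_example` helper, and the field names `headline`, `article`, and `category` are all assumptions for illustration, so adapt them to the real columns in the dataset.

```python
# Hypothetical templates: (instruction, input field, target field).
# The wording and the field names are assumptions, not taken from the project.
TASK_TEMPLATES = {
    "article": ("Write a Hindi news article for this headline.", "headline", "article"),
    "headline": ("Write a Hindi headline for this article.", "article", "headline"),
    "classify_article": ("Classify this Hindi article into a category.", "article", "category"),
    "classify_headline": ("Classify this Hindi headline into a category.", "headline", "category"),
}


def build_example(record, task):
    """Format one dataset record as a prompt/completion pair for the given task."""
    instruction, input_key, target_key = TASK_TEMPLATES[task]
    prompt = (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{record[input_key]}\n\n"
        "### Response:\n"
    )
    return {"prompt": prompt, "completion": record[target_key]}


if __name__ == "__main__":
    # Placeholder record; real values would come from the scraped dataset.
    record = {"headline": "<headline text>", "article": "<article text>", "category": "<category>"}
    for task in TASK_TEMPLATES:
        example = build_example(record, task)
        print(task, "->", repr(example["completion"]))
```

Only the template changes between tasks, so the same records can feed each fine-tuning run.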