Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/deepmancer/advanced-recommender-system
Advanced information retrieval system that combines advanced indexing, machine learning, and personalized search to enhance academic research and document discovery.
- Host: GitHub
- URL: https://github.com/deepmancer/advanced-recommender-system
- Owner: deepmancer
- License: mit
- Created: 2023-08-18T11:22:10.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-16T11:00:39.000Z (3 months ago)
- Last Synced: 2024-10-11T20:02:56.820Z (28 days ago)
- Topics: bigram-model, collaborative-filtering, crawling-python, fine-tuning, information-retrieval, language-model, natural-language-processing, nlp, positional-indexing, pytorch, recommender-system, selenium, spelling-correction, tokenization, transformers, vectorization
- Language: Jupyter Notebook
- Homepage:
- Size: 1.85 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Advanced Recommender System
Welcome to the **Advanced Recommender System**!
---
## Project Overview
Our goal is to create a comprehensive platform that excels in retrieving, classifying, ranking, and recommending documents tailored to user preferences. The project pipeline follows these key stages:
1. **Data Collection & Preprocessing**:
   - **Data Collection**: Gather academic paper data.
   - **Preprocessing**: Prepare data for indexing.
2. **Indexing & Retrieval Infrastructure**:
   - **Indexing**: Develop an indexing system.
   - **Spell Correction**: Integrate spell correction mechanisms.
   - **Vector Space Models**: Apply models for accurate search ranking.
3. **Machine Learning & Clustering**:
   - **Machine Learning**: Implement classification algorithms.
   - **Clustering**: Organize documents into clusters.
4. **Web Crawling & Personalized Search**:
   - **Web Crawling**: Collect additional data from the web.
   - **Personalized Search**: Develop advanced search and recommendation features.
5. **Evaluation & Optimization**:
   - **Evaluation**: Assess system performance using metrics.
   - **Optimization**: Refine and improve system effectiveness.

---
## Phase 1: Data Acquisition and Indexing Infrastructure
Phase 1 focuses on laying the foundation for a robust information retrieval system by establishing an efficient data processing and indexing infrastructure.
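The positional index is the core of this infrastructure. The following is a minimal sketch, not the project's code: it assumes documents are already tokenized (the project's tokenizer is not shown), and `build_positional_index` and `phrase_query` are hypothetical names. Each term maps to the positions where it occurs in each document, which is what makes exact phrase matching possible.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]}; `docs` maps doc_id -> list of tokens."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term][doc_id].append(position)
    return index


def phrase_query(index, phrase):
    """Return sorted ids of documents in which the tokens of `phrase` occur consecutively."""
    if not phrase:
        return []
    postings = [index.get(term, {}) for term in phrase]
    # A document can only match if it contains every term of the phrase.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)
    matches = []
    for doc_id in sorted(candidates):
        # Keep start positions s such that the i-th phrase term occurs at s + i.
        starts = set(postings[0][doc_id])
        for offset, posting in enumerate(postings[1:], start=1):
            starts &= {pos - offset for pos in posting[doc_id]}
        if starts:
            matches.append(doc_id)
    return matches


docs = {
    1: "deep learning for protein folding".split(),
    2: "protein structure via deep networks".split(),
    3: "learning deep representations".split(),
}
index = build_positional_index(docs)
print(phrase_query(index, ["deep", "learning"]))  # [1]
print(phrase_query(index, ["protein"]))           # [1, 2]
```

Recording positions makes the index larger than a document-level inverted index, but phrase and proximity queries can then be answered from the index alone, without rescanning the documents.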
### Datasets
- **Dataset**: Scientific articles from [Semantic Scholar](https://www.semanticscholar.org/).
- **Dataset Category**: Artificial Intelligence & Bioinformatics

### Key Components
- **Data Preprocessing & Preparation**: Structure academic papers for efficient retrieval.
- **Positional Index Construction**: Build a positional index that records where each term occurs in each document, enabling exact phrase and proximity queries.
- **Spell Correction Integration**: Integrate a bigram-based spell correction system.
- **Vector Space Modeling**: Implement vector space models for effective document ranking:
  - **`ltn-lnn`**: SMART weighting with log-scaled term frequency on both vectors, idf (`t`) on the first vector only, and no length normalization (`n`).
  - **`ltc-lnc`**: The same log-tf and idf scheme, plus cosine length normalization (`c`) on both vectors.
  - **`Okapi BM25`**: Probabilistic ranking function with term-frequency saturation and document-length normalization.
- **Evaluation Metrics**: Assess system performance with metrics such as MRR, Precision, Recall, F1 Score, MAP, and NDCG.

---
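The Phase 1 ranking models can be illustrated with a short pure-Python sketch. It is an illustration under stated assumptions rather than the project's implementation: documents are token lists, `ltc-lnc` is read as `ltc` for the query and `lnc` for documents, the SMART weights use base-10 logarithms, and BM25 uses the common defaults `k1=1.2` and `b=0.75` with a non-negative idf variant. All function names are hypothetical.

```python
import math
from collections import Counter


def log_tf(tf):
    """SMART 'l': 1 + log10(tf) for tf > 0, else 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def cosine_normalize(vec):
    """SMART 'c': scale a weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def ltc_lnc_scores(query, docs):
    """Cosine score with the query weighted ltc and each document weighted lnc (assumed order)."""
    n = len(docs)
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    # SMART 't': idf = log10(N / df); terms absent from the corpus get weight 0.
    q_vec = cosine_normalize({
        t: log_tf(tf) * (math.log10(n / df[t]) if df[t] else 0.0)
        for t, tf in Counter(query).items()
    })
    scores = {}
    for doc_id, tokens in docs.items():
        d_vec = cosine_normalize({t: log_tf(tf) for t, tf in Counter(tokens).items()})
        scores[doc_id] = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
    return scores


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 with the non-negative idf log(1 + (N - df + 0.5) / (df + 0.5))."""
    n = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in set(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates term frequency; b controls document-length normalization.
            norm = k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = score
    return scores


docs = {
    "a": ["protein", "folding", "protein"],
    "b": ["graph", "neural", "network"],
    "c": ["protein", "graph"],
}
query = ["protein", "folding"]
for scorer in (ltc_lnc_scores, bm25_scores):
    scores = scorer(query, docs)
    print(scorer.__name__, sorted(scores, key=scores.get, reverse=True))
```

Dropping both `cosine_normalize` calls from `ltc_lnc_scores` gives `ltn-lnn`, which leaves document length unnormalized.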
## Phase 2: Machine Learning and Clustering for Document Retrieval
In Phase 2, we enhance retrieval capabilities through machine learning techniques, improving classification and clustering to refine the search system.
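For the Naive Bayes component, a minimal multinomial Naive Bayes with Laplace smoothing is sketched below. The class name, the toy training data, and the choice of the multinomial variant are assumptions for illustration. The sketch also assumes one label per document; if papers carry several categories, a multi-label setup (for example, one binary classifier per category) would be needed instead.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token lists with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(docs)) for c in self.classes}
        term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            term_counts[label].update(doc)
        # P(t | c) = (count(t, c) + alpha) / (tokens in c + alpha * |V|), stored as logs.
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(term_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + self.alpha) / total) for t in self.vocab
            }
        return self

    def predict(self, doc):
        """Return the class maximizing log P(c) + sum of log P(t | c); unseen tokens are ignored."""
        def log_posterior(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in doc if t in self.vocab
            )
        return max(self.classes, key=log_posterior)


train_docs = [
    ["protein", "gene", "sequence"],
    ["gene", "expression"],
    ["neural", "network", "training"],
    ["network", "gradient"],
]
train_labels = ["bio", "bio", "ai", "ai"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["gene", "protein"]))     # bio
print(model.predict(["neural", "gradient"]))  # ai
```

Working in log space avoids floating-point underflow when multiplying many small per-term probabilities.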
### Key Components
- **Dataset**: The arXiv paper abstracts dataset from [Kaggle](https://www.kaggle.com/datasets/spsayakpaul/arxiv-paper-abstracts?resource=download).
- **Naive Bayes Classification**: Implement a Naive Bayes classifier for document categorization.
- **Neural Network Classifier**: Develop a neural network classifier for improved accuracy.
- **Large Language Models**: Fine-tune a pre-trained language model for advanced classification.
- **Hierarchical Clustering**: Apply hierarchical clustering to organize documents.

---
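The hierarchical clustering component of Phase 2 can be illustrated with a small agglomerative sketch. The project does not specify its linkage criterion or distance measure, so average linkage over cosine distance is an assumption here, as are the function names and the toy vectors. The naive merge loop only suits small inputs; for real corpora, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` is the usual choice.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0


def agglomerative_clustering(vectors, n_clusters, distance=cosine_distance):
    """Average-linkage agglomerative clustering; returns clusters as lists of row indices."""
    clusters = [[i] for i in range(len(vectors))]
    # Pairwise point distances are computed once and reused by the linkage function.
    dist = [[distance(u, v) for v in vectors] for u in vectors]

    def average_link(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average pairwise distance.
        _, x, y = min(
            (average_link(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical 2-D document vectors (e.g., projected TF-IDF weights).
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(agglomerative_clustering(vectors, n_clusters=2))  # [[0, 1], [2, 3]]
```

Stopping at `n_clusters` cuts the merge hierarchy at a fixed number of clusters; cutting at a distance threshold is a common alternative.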
## Phase 3: Web Crawling, Link Analysis, and Personalized Search
Phase 3 centers on expanding the system's capabilities with web crawling, link analysis, and advanced personalization features.
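The link analysis step can be sketched with PageRank and HITS over a citation graph. This is illustrative only: it assumes the crawler produces a mapping from each paper to the papers it cites, the damping factor 0.85 and the iteration limits are conventional defaults rather than project settings, and the function names are hypothetical.

```python
import math


def _nodes(graph):
    """All nodes that appear as a source or a target in the adjacency mapping."""
    return set(graph) | {v for targets in graph.values() for v in targets}


def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank; dangling nodes spread their rank uniformly."""
    nodes = _nodes(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def hits(graph, iterations=100):
    """HITS hub and authority scores, each L2-normalized after every update."""
    nodes = _nodes(graph)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the nodes linking to v.
        auth = {v: 0.0 for v in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub score of v: sum of authority scores of the nodes v links to.
        hub = {v: sum(auth[t] for t in graph.get(v, ())) for v in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return hub, auth


# Hypothetical citation graph: each paper maps to the papers it cites.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
ranks = pagerank(citations)
hubs, authorities = hits(citations)
print(sorted(ranks, key=ranks.get, reverse=True)[:2])  # ['D', 'C']
print(max(authorities, key=authorities.get))           # C
```

In this toy graph, `D` outranks `C` under PageRank even though `C` has more citations, because `D`'s only citation comes from the well-cited `C`; HITS, by contrast, rates `C` as the strongest authority.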
### Key Components
- **Web Crawling**: Deploy a web crawler to gather academic articles and related data.
- **Link Analysis**: Use the PageRank and HITS algorithms to determine article importance.
- **Content-Based Recommendation**: Recommend articles based on content similarity.
- **Collaborative Filtering**: Recommend articles based on the preferences of similar users.
- **Evaluation of Recommender Systems**: Measure recommendation performance with metrics such as nDCG.

### Final Product
Upon completion of Phase 3, the Advanced Recommender System will be a comprehensive tool that excels in retrieving, organizing, ranking, and recommending academic papers tailored to users' research needs.
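Collaborative filtering is what lets the final product suggest papers a user has not seen yet. The sketch below shows one common variant, user-based filtering with cosine similarity; the README does not say which variant the project uses, and the ratings format, function names, and parameters (`k`, `top_n`) are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating dicts; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(ratings, user, k=2, top_n=3):
    """User-based CF: rank unseen items by the similarity-weighted ratings of the k nearest users."""
    neighbors = sorted(
        ((cosine_similarity(ratings[user], r), u) for u, r in ratings.items() if u != user),
        reverse=True,
    )[:k]
    weighted, weights = {}, {}
    for sim, neighbor in neighbors:
        if sim <= 0:
            continue  # Skip neighbors who share no rated items with the user.
        for item, rating in ratings[neighbor].items():
            if item not in ratings[user]:
                weighted[item] = weighted.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Hypothetical explicit ratings (1-5) of papers by users.
ratings = {
    "alice": {"p1": 5, "p2": 4},
    "bob": {"p1": 5, "p2": 4, "p3": 5},
    "carol": {"p1": 1, "p4": 3},
    "dave": {"p5": 5},
}
print(recommend(ratings, "alice"))  # ['p3', 'p4']
```

An item-based variant, which compares papers by the users who rated them, follows the same structure with the roles of users and items swapped.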
---
## License
This project is licensed under the MIT License. For detailed information, please refer to the [LICENSE](LICENSE) file.
---
Thank you for your interest in the **Advanced Recommender System**! We hope this project serves as a valuable and engaging tool for your research and information retrieval needs. Happy exploring!