{"id":15783637,"url":"https://github.com/deepmancer/advanced-recommender-system","last_synced_at":"2025-04-01T16:30:42.414Z","repository":{"id":189135747,"uuid":"680112920","full_name":"deepmancer/advanced-recommender-system","owner":"deepmancer","description":"Advance information retrieval system that combines advanced indexing, machine learning, and personalized search to enhance academic research and document discovery.","archived":false,"fork":false,"pushed_at":"2024-08-16T11:00:39.000Z","size":1936,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-11T20:02:56.820Z","etag":null,"topics":["bigram-model","collaborative-filtering","crawling-python","fine-tuning","information-retrieval","language-model","natural-language-processing","nlp","positional-indexing","pytorch","recommender-system","selenium","spelling-correction","tokenization","transformers","vectorization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deepmancer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-18T11:22:10.000Z","updated_at":"2024-08-26T08:52:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"e8c24e24-30c1-43da-a6d7-35b9f722e725","html_url":"https://github.com/deepmancer/advanced-recommender-system","commit_stats":{"total_commits":36,"total_committers":3,"mean_commits":12.0,"dds":0.5277777777777778,"last_synced_commit":"be7b6bdbd3ea54819150849e3c634e46822c3a1c"},"previous_names":["alirezahr79/information-retrieval","alirezaheidari-cs/information-retrieval","deepmancer/information-retrieval","deepmancer/modern-information-retrieval","deepmancer/advanced-recommender-system"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepmancer%2Fadvanced-recommender-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepmancer%2Fadvanced-recommender-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepmancer%2Fadvanced-recommender-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deepmancer%2Fadvanced-recommender-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deepmancer","download_url":"https://codeload.github.com/deepmancer/advanced-recommender-system/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246620261,"
owners_count":20806731,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigram-model","collaborative-filtering","crawling-python","fine-tuning","information-retrieval","language-model","natural-language-processing","nlp","positional-indexing","pytorch","recommender-system","selenium","spelling-correction","tokenization","transformers","vectorization"],"created_at":"2024-10-04T20:00:23.764Z","updated_at":"2025-04-01T16:30:42.408Z","avatar_url":"https://github.com/deepmancer.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📚 Advanced Recommender System\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/PyTorch-%23EE4C2C.svg?style=for-the-badge\u0026logo=PyTorch\u0026logoColor=white\" alt=\"PyTorch\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Hugging%20Face-FFD21E?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000\" alt=\"Hugging Face Transformers\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Python-3670A0?style=for-the-badge\u0026logo=python\u0026logoColor=ffdd54\" alt=\"Python\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/scikit--learn-%23F7931E.svg?style=for-the-badge\u0026logo=scikit-learn\u0026logoColor=white\" alt=\"scikit-learn\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Jupyter-F37626.svg?\u0026style=for-the-badge\u0026logo=Jupyter\u0026logoColor=white\" alt=\"Jupyter Notebook\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-blue.svg?style=for-the-badge\" alt=\"License\"\u003e\n\u003c/p\u003e\n\n\u003e Welcome to 
the Advanced Recommender System project!\n\nThe **Advanced Recommender System** is a comprehensive platform designed to streamline the process of retrieving, classifying, ranking, and recommending academic documents tailored to user preferences. Whether you're conducting research or exploring literature, this system aims to enhance your workflow with cutting-edge methodologies.\n\n---\n\n| **Source Code** | **Website** |\n|:-----------------|:------------|\n| \u003ca href=\"https://github.com/deepmancer/advanced-recommender-system\" target=\"_blank\"\u003egithub.com/deepmancer/advanced-recommender-system\u003c/a\u003e | \u003ca href=\"https://deepmancer.github.io/advanced-recommender-system/\" target=\"_blank\"\u003edeepmancer.github.io/advanced-recommender-system\u003c/a\u003e |\n\n---\n\n## ✨ Key Features:\n- End-to-end pipeline for academic document retrieval and recommendation.\n- Integration of machine learning and deep learning models, clustering techniques, and advanced search algorithms.\n- Personalized search and recommendation for tailored user experiences.\n\n---\n\n## 🔍 Overview\n\nThe project pipeline is divided into three core phases:\n\n1. **📥 Data Collection \u0026 Indexing Infrastructure**:\n   - Collect and preprocess data for efficient retrieval.\n   - Build robust indexing and retrieval systems with spell correction and vector space models.\n\n2. **🧠 Machine Learning \u0026 Clustering**:\n   - Leverage classification algorithms and clustering techniques to improve document categorization and organization.\n\n3. **🌐 Web Crawling \u0026 Personalized Recommendations**:\n   - Enhance the system by incorporating web crawling, link analysis, and personalized recommendation engines.\n\n---\n\n## 🛠️ Workflow Phases\n\n### 📂 Phase 1: Data Acquisition and Indexing Infrastructure\n\nIn this phase, we establish a strong foundation for data processing and retrieval. 
\n\n**Datasets**:\n- Source: [Semantic Scholar](https://www.semanticscholar.org/)\n- Focus: Artificial Intelligence \u0026 Bioinformatics\n\n**Key Components**:\n- 🛠️ **Data Preprocessing**: Structuring academic papers for indexing.\n- 📍 **Positional Index Construction**: Building a positional index that records where each term occurs in each document, which supports phrase and proximity queries.\n- ✏️ **Spell Correction**: A bigram-based system that corrects typos in queries.\n- 📐 **Vector Space Modeling**:\n  - `ltn-lnn`: Log-scaled term frequency with idf weighting (`ltn`) paired with log-scaled term frequency alone (`lnn`), without length normalization.\n  - `ltc-lnc`: The same term weights with cosine length normalization applied.\n  - `Okapi BM25`: Probabilistic ranking function with term-frequency saturation and document-length normalization.\n- 📈 **Evaluation Metrics**: MRR, Precision, Recall, F1 Score, MAP, and nDCG are used to measure retrieval quality.\n\n---\n\n### 🧬 Phase 2: Machine Learning and Clustering for Document Retrieval\n\nThis phase enhances search capabilities with classification and clustering techniques.\n\n**Datasets**:\n- Source: [Kaggle ArXiv Abstracts](https://www.kaggle.com/datasets/spsayakpaul/arxiv-paper-abstracts?resource=download)\n\n**Key Components**:\n- 🗂️ **Naive Bayes Classification**: A baseline classifier for document categories.\n- 🤖 **Neural Network Classifier**: A neural model aimed at higher classification accuracy than the baseline.\n- 🔍 **Large Language Models**: Fine-tuned models for advanced categorization.\n- 🗂️ **Hierarchical Clustering**: Organizing documents into meaningful groups.\n\n---\n\n### 🕸️ Phase 3: Web Crawling, Link Analysis, and Personalized Search\n\nThe final phase focuses on enriching data and delivering personalized recommendations. 
\n\n**Key Components**:\n- 🕷️ **Web Crawling**: Gathering additional data from academic sources.\n- 🔗 **Link Analysis**:\n  - **PageRank**: Measure document importance from the link graph.\n  - **HITS**: Identify hubs and authorities in document networks.\n- 🧠 **Recommendation Engines**:\n  - **Content-Based Filtering**: Recommend articles whose content is similar to articles the user has already shown interest in.\n  - **Collaborative Filtering**: Suggest articles based on the preferences of users with similar tastes.\n- 📈 **Evaluation Metrics**: Metrics like nDCG assess recommendation quality.\n\n### 🌟 Final Deliverable\n\nA powerful and user-friendly recommender system capable of retrieving, organizing, ranking, and recommending academic articles.\n\n---\n\n## 📝 License\n\nThis project is licensed under the [MIT License](LICENSE). Feel free to use, modify, and distribute the code while adhering to the terms of the license.\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions from the community! Here's how you can help:\n1. **Star the repository** ⭐ to show your support.\n2. **Fork the repository** 🍴 and implement new features or fixes.\n3. Submit a **pull request** 🔄 with your contributions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepmancer%2Fadvanced-recommender-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeepmancer%2Fadvanced-recommender-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeepmancer%2Fadvanced-recommender-system/lists"}