{"id":13935802,"url":"https://github.com/ibrahimsharaf/doc2vec","last_synced_at":"2026-03-09T17:37:43.567Z","repository":{"id":91014623,"uuid":"84238935","full_name":"ibrahimsharaf/doc2vec","owner":"ibrahimsharaf","description":":notebook: Long(er) text representation and classification using Doc2Vec embeddings","archived":false,"fork":false,"pushed_at":"2024-06-17T22:49:39.000Z","size":13291,"stargazers_count":108,"open_issues_count":8,"forks_count":42,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-09-02T23:33:32.926Z","etag":null,"topics":["doc2vec","gensim","nlp-machine-learning","scikit-learn","sentiment-analysis","text-classification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ibrahimsharaf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-07T19:41:34.000Z","updated_at":"2025-08-07T12:21:50.000Z","dependencies_parsed_at":"2024-04-27T23:38:22.201Z","dependency_job_id":"7394718c-2fbe-4af7-b763-a8f465605c19","html_url":"https://github.com/ibrahimsharaf/doc2vec","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ibrahimsharaf/doc2vec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibrahimsharaf%2Fdoc2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibrahimsharaf%2Fdoc2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibrahimsharaf%2Fdoc2vec/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/ibrahimsharaf%2Fdoc2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ibrahimsharaf","download_url":"https://codeload.github.com/ibrahimsharaf/doc2vec/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ibrahimsharaf%2Fdoc2vec/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30304784,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-09T17:35:44.120Z","status":"ssl_error","status_checked_at":"2026-03-09T17:35:43.707Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["doc2vec","gensim","nlp-machine-learning","scikit-learn","sentiment-analysis","text-classification"],"created_at":"2024-08-07T23:02:06.399Z","updated_at":"2026-03-09T17:37:43.551Z","avatar_url":"https://github.com/ibrahimsharaf.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Doc2Vec Text Classification [![Build Status](https://travis-ci.org/ibrahimsharaf/doc2vec.svg?branch=master)](https://travis-ci.org/ibrahimsharaf/doc2vec)\n\nText classification model which uses gensim Doc2Vec for generating paragraph embeddings and scikit-learn Logistic Regression for classification.\n\n\n### Dataset\n\n25,000 IMDB movie reviews, specially 
selected for sentiment analysis. The sentiment of reviews is binary (1 for positive, 0 for negative).\n\nThis source dataset was collected in association with the following publication:\n\n```Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). \"Learning Word Vectors for Sentiment Analysis.\" The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).```\n\n### Usage\n- Install the required tools \n\n    ```pip install -r requirements.txt```\n- Run the script \n    \n     ```python text_classifier.py```\n\n### References\n- Kaggle – Bag of Words Meets Bags of Popcorn (https://www.kaggle.com/c/word2vec-nlp-tutorial)\n- Gensim – Deep learning with paragraph2vec (https://radimrehurek.com/gensim/models/doc2vec.html)\n- Quoc Le and Tomas Mikolov. Distributed Representations of Sentences and Documents (https://arxiv.org/pdf/1405.4053v2.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibrahimsharaf%2Fdoc2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fibrahimsharaf%2Fdoc2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibrahimsharaf%2Fdoc2vec/lists"}