{"id":20228202,"url":"https://github.com/mmourafiq/philo2vec","last_synced_at":"2025-04-10T17:26:07.424Z","repository":{"id":146815874,"uuid":"64703555","full_name":"mmourafiq/philo2vec","owner":"mmourafiq","description":"An implementation of word2vec applied to [stanford philosophy encyclopedia](http://plato.stanford.edu/)","archived":false,"fork":false,"pushed_at":"2016-08-12T17:48:03.000Z","size":36079,"stargazers_count":35,"open_issues_count":1,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-24T15:04:38.319Z","etag":null,"topics":["crawled-data","deep-learning","embeddings","negative-samples","philosophy-encyclopedia","skips","tensorflow","vector-representations","word2vec"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mmourafiq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-08-01T21:41:00.000Z","updated_at":"2024-01-04T16:06:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"913f51a5-a2fb-4744-9dec-8eaa6f02351d","html_url":"https://github.com/mmourafiq/philo2vec","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mmourafiq%2Fphilo2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mmourafiq%2Fphilo2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mmourafiq%2Fphilo2vec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/mmourafiq%2Fphilo2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mmourafiq","download_url":"https://codeload.github.com/mmourafiq/philo2vec/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248261968,"owners_count":21074229,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawled-data","deep-learning","embeddings","negative-samples","philosophy-encyclopedia","skips","tensorflow","vector-representations","word2vec"],"created_at":"2024-11-14T07:29:05.767Z","updated_at":"2025-04-10T17:26:07.418Z","avatar_url":"https://github.com/mmourafiq.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# philo2vec\n\nA TensorFlow implementation of word2vec applied to the [Stanford Encyclopedia of Philosophy](http://plato.stanford.edu/). The implementation supports both `cbow` and `skip gram`.\n\nFor more background, please have a look at these papers:\n \n * [Distributed Representations of Words and Phrases and their Compositionality](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)\n * [word2vec Parameter Learning Explained](http://www-personal.umich.edu/~ronxin/pdf/w2vexp.pdf)\n * [Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method](http://arxiv.org/pdf/1402.3722v1.pdf)\n\nAfter training, the model returns some interesting results; see the [interesting results section](https://github.com/mouradmourafiq/philo2vec#some-interesting-results).\n\nEvaluating `hume 
- empiricist + rationalist`:\n\n```\ndescartes\nmalebranche\nspinoza\nhobbes\nherder\n```\n\n\u003cimg width=\"1000\" alt=\"screen shot 2016-08-12 at 19 19 22\" src=\"https://cloud.githubusercontent.com/assets/1261626/17630893/d81cc96e-60c1-11e6-9826-947f9de43db2.png\"\u003e\n\n\n### Some interesting results\n\n#### Similarities\n\nSimilar words to `death`:\n \n```\nuntimely\nravages\ngrief\ntorment\n```\n\nSimilar words to `god`: \n\n```\ndivine\nDe Providentia\nchrist\nHesiod\n```\n\nSimilar words to `love`: \n\n```\nfriendship\naffection\nchrist\nreverence\n```\n\nSimilar words to `life`:\n\n```\ncareer\nlive\nlifetime\ncommunity\nsociety\n```\n\nSimilar words to `brain`:\n\n```\nneurological\nsenile\nnerve\nnervous\n```\n\n#### Operations\n \nEvaluating `hume - empiricist + rationalist`:\n\n```\ndescartes\nmalebranche\nspinoza\nhobbes\nherder\n```\n\nEvaluating `ethics - rational`:\n\n```\nhiroshima\n```\n\nEvaluating `ethic - reason`:\n\n```\ninegalitarian\nanti-naturalist\naustere\n```\n\nEvaluating `moral - rational`:\n\n```\ncommonsense\n```\n\nEvaluating `life - death + love`:\n\n```\nself-positing\nfriendship\ncare\nharmony\n```\n\nEvaluating `death + choice`:\n\n```\nregret\nagony\nmisfortune\nimpending\n```\n\nEvaluating `god + human`:\n\n```\ndivine\ninviolable\nyahweh\ngod-like\nman\n```\n\nEvaluating `god + religion`:\n\n```\namida\ntorah\nscripture\nbuddha\nsokushinbutsu\n```\n\nEvaluating `politic + moral`:\n\n```\nrights-oriented\nnormative\nethics\nintegrity\n```\n\n\n### The repo contains:\n\n  * an object to crawl data from the philosophy encyclopedia; [PlatoData](https://github.com/mouradmourafiq/philo2vec/blob/master/data.py)\n  * an object to build the vocabulary based on the crawled data; [VocabBuilder](https://github.com/mouradmourafiq/philo2vec/blob/master/preprocessors.py)\n  * the model that computes the continuous distributed representations of words; 
[Philo2Vec](https://github.com/mouradmourafiq/philo2vec/blob/master/models.py)\n\n\n### Installation\n\nThe dependencies used for this module can be easily installed with pip:\n\n```\n\u003e pip install -r requirements.txt\n```\n\n### The params for the VocabBuilder:\n\n  * **min_frequency**: the minimum frequency of the words to be used in the model.\n  * **size**: the size of the data; the model then uses the top `size` most frequent words.\n\n\n### The hyperparams of the model:\n   \n  * **optimizer**: an instance of tensorflow `Optimizer`, such as `GradientDescentOptimizer`, `AdagradOptimizer`, or `MomentumOptimizer`.\n  * **model**: the model to use to create the vectorized representation, possible values: `CBOW`, `SKIP_GRAM`.\n  * **loss_fct**: the loss function used to calculate the error, possible values: `SOFTMAX`, `NCE`.\n  * **embedding_size**: dimensionality of word embeddings.\n  * **neg_sample_size**: number of negative samples for each positive sample.\n  * **num_skips**: number of skips for a `SKIP_GRAM` model.\n  * **context_window**: window size; this window is used to create the context for calculating the vector representations `[ window target window ]`.\n\n\n### Quick usage:\n\n```python\nparams = {\n    'model': Philo2Vec.CBOW,\n    'loss_fct': Philo2Vec.NCE,\n    'context_window': 5,\n}\nx_train = get_data()\nvalidation_words = ['kant', 'descartes', 'human', 'natural']\nx_validation = [StemmingLookup.stem(w) for w in validation_words]\nvb = VocabBuilder(x_train, min_frequency=5)\npv = Philo2Vec(vb, **params)\npv.fit(epochs=30, validation_data=x_validation)\n```\n\n```python\nparams = {\n    'model': Philo2Vec.SKIP_GRAM,\n    'loss_fct': Philo2Vec.SOFTMAX,\n    'context_window': 2,\n    'num_skips': 4,\n    'neg_sample_size': 2,\n}\nx_train = get_data()\nvalidation_words = ['kant', 'descartes', 'human', 'natural']\nx_validation = [StemmingLookup.stem(w) for w in validation_words]\nvb = VocabBuilder(x_train, min_frequency=5)\npv = Philo2Vec(vb, 
**params)\npv.fit(epochs=30, validation_data=x_validation)\n```\n\n\n### About stemming\n\nSince the words are stemmed as part of the preprocessing, some operations are sometimes necessary to convert between a word and its stem:\n\n```python\nStemmingLookup.stem('religious')  # returns \"religi\"\n\nStemmingLookup.original_form('religi')  # returns \"religion\"\n```\n\n\n### Getting similarities\n\n```python\npv.get_similar_words(['rationalist', 'empirist'])\n```\n\n### Evaluating operations\n\n```python\npv.evaluate_operation('moral - rational')\n```\n\n### Plotting vectorized words\n\n```python\npv.plot(['hume', 'empiricist', 'descart', 'rationalist'])\n```\n\n### Training details\n\n#### skip_gram:\n \n\u003cimg width=\"873\" alt=\"skip_gram_loss\" src=\"https://cloud.githubusercontent.com/assets/1261626/17628496/d19a0d42-60b5-11e6-8cbc-20f1aac3becc.png\"\u003e\n\n\u003cimg width=\"874\" alt=\"skip_gram_embeddings\" src=\"https://cloud.githubusercontent.com/assets/1261626/17628497/d19a811e-60b5-11e6-8e7c-733309b5249d.png\"\u003e\n\n\u003cimg width=\"878\" alt=\"skip_gram_w\" src=\"https://cloud.githubusercontent.com/assets/1261626/17628499/d1a1d00e-60b5-11e6-8638-8f68c288205b.png\"\u003e\n\n\u003cimg width=\"862\" alt=\"skip_gram_b\" src=\"https://cloud.githubusercontent.com/assets/1261626/17628498/d19b6778-60b5-11e6-8ed1-e45b0566d8c7.png\"\u003e\n\n#### cbow:\n\n\n\u003cimg width=\"867\" alt=\"cbow_loss\" src=\"https://cloud.githubusercontent.com/assets/1261626/17630043/e41aae9c-60bd-11e6-9289-92d5dc58e55f.png\"\u003e\n\n\u003cimg width=\"885\" alt=\"cbow_embedding\" src=\"https://cloud.githubusercontent.com/assets/1261626/17630045/e41e2a90-60bd-11e6-8b49-891c3ba8ebf7.png\"\u003e\n\n\u003cimg width=\"869\" alt=\"cbow_w\" src=\"https://cloud.githubusercontent.com/assets/1261626/17630044/e41b5e6e-60bd-11e6-9eb7-55dd1dbfda48.png\"\u003e\n\n\u003cimg width=\"856\" alt=\"cbow_b\" 
src=\"https://cloud.githubusercontent.com/assets/1261626/17630042/e417f292-60bd-11e6-92e7-a8e9a5ddfb32.png\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmmourafiq%2Fphilo2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmmourafiq%2Fphilo2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmmourafiq%2Fphilo2vec/lists"}