{"id":13416299,"url":"https://github.com/cemoody/lda2vec","last_synced_at":"2025-05-15T03:07:29.278Z","repository":{"id":40636949,"uuid":"48560572","full_name":"cemoody/lda2vec","owner":"cemoody","description":null,"archived":false,"fork":false,"pushed_at":"2021-11-16T03:32:50.000Z","size":14096,"stargazers_count":3159,"open_issues_count":64,"forks_count":627,"subscribers_count":117,"default_branch":"master","last_synced_at":"2025-04-14T03:09:31.827Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cemoody.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-12-25T00:40:49.000Z","updated_at":"2025-04-11T12:21:57.000Z","dependencies_parsed_at":"2022-08-02T16:00:49.745Z","dependency_job_id":null,"html_url":"https://github.com/cemoody/lda2vec","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cemoody%2Flda2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cemoody%2Flda2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cemoody%2Flda2vec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cemoody%2Flda2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cemoody","download_url":"https://codeload.github.com/cemoody/lda2vec/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254264766,"owners_count":22041793,"icon_url":"https://github.com/gi
thub.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T21:00:56.660Z","updated_at":"2025-05-15T03:07:24.258Z","avatar_url":"https://github.com/cemoody.png","language":"Python","funding_links":[],"categories":["Python","APIs and Libraries"],"sub_categories":["Knowledge Graphs"],"readme":"lda2vec: Tools for interpreting natural language\n=================================================\n\n.. image:: http://img.shields.io/badge/license-MIT-blue.svg?style=flat\n    :target: https://github.com/cemoody/lda2vec/blob/master/LICENSE\n\n.. image:: https://readthedocs.org/projects/lda2vec/badge/?version=latest\n    :target: http://lda2vec.readthedocs.org/en/latest/?badge=latest\n\n.. image:: https://travis-ci.org/cemoody/lda2vec.svg?branch=master\n    :target: https://travis-ci.org/cemoody/lda2vec\n\n.. image:: https://img.shields.io/badge/coverage-93%25-green.svg\n    :target: https://travis-ci.org/cemoody/lda2vec\n\n.. image:: https://img.shields.io/twitter/follow/chrisemoody.svg?style=social\n    :target: https://twitter.com/intent/follow?screen_name=chrisemoody\n\n.. image:: lda2vec_network_publish_text.gif\n\n\nThe lda2vec model tries to mix the best parts of word2vec and LDA\ninto a single framework. word2vec captures powerful relationships \nbetween words, but the resulting vectors are largely uninterpretable\nand don't represent documents. LDA on the other hand is quite\ninterpretable by humans, but doesn't model local word relationships\nlike word2vec. 
We build a model that learns both word and document\ntopics, makes them interpretable, builds topics over clients, times,\nand documents, and allows those topics to be supervised.\n\n*Warning*: this code is a big series of experiments. It's research software,\nand we've tried to make it simple to modify lda2vec and to play around with\nyour own custom topic models. However, it's still research software.\nI wouldn't run this in production or on Windows, and I'd only use it after you've\ndecided both word2vec and LDA are inadequate and you'd like to tinker with your\nown cool models :) That said, I don't want to discourage experimentation:\nthere's some limited documentation, a modicum of unit tests, and some\ninteractive examples to get you started.\n\n\nResources\n---------\nSee the research paper `Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec \u003chttp://arxiv.org/abs/1605.02019\u003e`_\n\nSee this `Jupyter Notebook \u003chttp://nbviewer.jupyter.org/github/cemoody/lda2vec/blob/master/examples/twenty_newsgroups/lda2vec/lda2vec.ipynb\u003e`_\nfor an end-to-end demonstration.\n\nSee this `slide deck \u003chttp://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec-57135994\u003e`_\nor this `YouTube video \u003chttps://www.youtube.com/watch?v=eHcBeVnAiD4\u003e`_\nfor a presentation focused on the benefits of word2vec, LDA, and lda2vec.\n\nSee the `API reference docs \u003chttps://lda2vec.readthedocs.org/en/latest/\u003e`_\n\n\nAbout\n-----\n\n.. image:: images/img00_word2vec.png\n\nWord2vec tries to model word-to-word relationships.\n\n.. image:: images/img01_lda.png\n\nLDA models document-to-word relationships.\n\n.. image:: images/img02_lda_topics.png\n\nLDA yields topics over each document.\n\n.. image:: images/img03_lda2vec_topics01.png\n\nlda2vec yields topics not just over documents, but also over regions.\n\n.. image:: images/img04_lda2vec_topics02.png\n\nlda2vec also yields topics over clients.\n\n.. image:: images/img05_lda2vec_topics03_supervised.png\n\nWith lda2vec, the topics can be 'supervised' and forced to predict another target.\n\nlda2vec also includes more contexts and features than LDA. LDA dictates that\nwords are generated by a document vector, but we might have all kinds of\n'side-information' that should influence our topics. For example, a single\nclient comment is about a particular item ID, written at a particular time\nand in a particular region. In this case, lda2vec gives you topics over all\nitems (separating jeans from shirts, for example), times (winter versus summer),\nregions (desert versus coastal), and clients (sporty versus professional attire).\n\nUltimately, the topics are interpreted using the excellent pyLDAvis library:\n\n.. image:: images/img06_pyldavis.gif\n\n\nRequirements\n------------\n\nMinimum requirements:\n\n- Python 2.7+\n- NumPy 1.10+\n- Chainer 1.5.1+\n- spaCy 0.99+\n\n\nRequirements for some features:\n\n- CUDA support\n- Testing utilities: py.test\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcemoody%2Flda2vec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcemoody%2Flda2vec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcemoody%2Flda2vec/lists"}