{"id":13696373,"url":"https://github.com/primaryobjects/lda","last_synced_at":"2025-04-04T05:09:25.780Z","repository":{"id":17455119,"uuid":"20229044","full_name":"primaryobjects/lda","owner":"primaryobjects","description":"LDA topic modeling for node.js","archived":false,"fork":false,"pushed_at":"2024-08-20T21:54:11.000Z","size":46,"stargazers_count":296,"open_issues_count":2,"forks_count":49,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-28T04:14:24.762Z","etag":null,"topics":["ai","artificial-intelligence","javascript","keywords","language","lda","machine-learning","natural-language-processing","nlp","node","node-js","nodejs","topic-modeling","topics"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/primaryobjects.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["primaryobjects"]}},"created_at":"2014-05-27T17:35:48.000Z","updated_at":"2025-03-22T20:10:43.000Z","dependencies_parsed_at":"2024-06-18T16:51:59.553Z","dependency_job_id":"b3606e75-7789-4433-92f3-fec92b44063a","html_url":"https://github.com/primaryobjects/lda","commit_stats":{"total_commits":29,"total_committers":6,"mean_commits":4.833333333333333,"dds":0.4137931034482759,"last_synced_commit":"9c4cb2ba0bd8f84ee7d6c934a7dd8177f630a46e"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primaryobjects%2Flda","tags_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/primaryobjects%2Flda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primaryobjects%2Flda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primaryobjects%2Flda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/primaryobjects","download_url":"https://codeload.github.com/primaryobjects/lda/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247123107,"owners_count":20887261,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artificial-intelligence","javascript","keywords","language","lda","machine-learning","natural-language-processing","nlp","node","node-js","nodejs","topic-modeling","topics"],"created_at":"2024-08-02T18:00:38.971Z","updated_at":"2025-04-04T05:09:25.758Z","avatar_url":"https://github.com/primaryobjects.png","language":"JavaScript","funding_links":["https://github.com/sponsors/primaryobjects"],"categories":["Models","Javascript","JavaScript","[](https://github.com/josephmisiti/awesome-machine-learning/blob/master/README.md#javascript)Javascript","📦 Legacy \u0026 Inactive Projects"],"sub_categories":["Latent Dirichlet Allocation (LDA) [:page_facing_up:](https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf)","Tools","[Tools](#tools-1)","Speech Recognition"],"readme":"LDA\n--------\n\nLatent Dirichlet allocation (LDA) topic modeling in JavaScript for Node.js.\nLDA is a machine learning algorithm that extracts topics and their related keywords from a collection of 
documents.\n\nIn LDA, a document may contain several different topics, each with its own related terms. The algorithm uses a probabilistic model to detect the specified number of topics and extract their related keywords. For example, a document may contain topics that could be classified as beach-related and weather-related. The beach topic may contain related words, such as sand, ocean, and water. Similarly, the weather topic may contain related words, such as sun, temperature, and clouds.\n\nSee http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation\n\n```bash\n$ npm install lda\n```\n\n## Usage\n```javascript\nvar lda = require('lda');\n\n// Example document.\nvar text = 'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.';\n\n// Extract sentences.\nvar documents = text.match( /[^\\.!\\?]+[\\.!\\?]+/g );\n\n// Run LDA to get terms for 2 topics (5 terms each).\nvar result = lda(documents, 2, 5);\n```\n\nThe above example produces the following result with two topics (topic 1 is \"cat-related\", topic 2 is \"dog-related\"):\n```\nTopic 1\ncats (0.21%)\ndogs (0.19%)\nsmall (0.1%)\nmice (0.1%)\nchase (0.1%)\n\nTopic 2\ndogs (0.21%)\ncats (0.19%)\nbig (0.11%)\neat (0.1%)\nbones (0.1%)\n```\n\n## Output\n\nLDA returns an array of topics, each containing an array of terms. 
The result has the following format:\n\n```\n[ [ { term: 'dogs', probability: 0.2 },\n    { term: 'cats', probability: 0.2 },\n    { term: 'small', probability: 0.1 },\n    { term: 'mice', probability: 0.1 },\n    { term: 'chase', probability: 0.1 } ],\n  [ { term: 'dogs', probability: 0.2 },\n    { term: 'cats', probability: 0.2 },\n    { term: 'bones', probability: 0.11 },\n    { term: 'eat', probability: 0.1 },\n    { term: 'big', probability: 0.099 } ] ]\n```\n\nThe result can be traversed as follows:\n\n```javascript\nvar result = lda(documents, 2, 5);\n\n// For each topic.\nfor (var i in result) {\n\tvar row = result[i];\n\t// for...in yields string keys, so parse the index with an explicit radix.\n\tconsole.log('Topic ' + (parseInt(i, 10) + 1));\n\t\n\t// For each term.\n\tfor (var j in row) {\n\t\tvar term = row[j];\n\t\tconsole.log(term.term + ' (' + term.probability + '%)');\n\t}\n\t\n\tconsole.log('');\n}\n```\n\n## Additional Languages\n\nLDA uses [stop-words](https://en.wikipedia.org/wiki/Stop_words) to ignore common terms in the text (for example: this, that, it, we). By default, the English stop-words list is used. To use additional languages, you can specify an array of language ids, as follows:\n\n```javascript\n// Use English (this is the default).\nresult = lda(documents, 2, 5, ['en']);\n\n// Use German.\nresult = lda(documents, 2, 5, ['de']);\n\n// Use English + German.\nresult = lda(documents, 2, 5, ['en', 'de']);\n```\n\nTo add a new language-specific stop-words list, create a file /lda/lib/stopwords_XX.js where XX is the id for the language. For example, a French stop-words list could be named \"stopwords_fr.js\". The contents of the file should follow the format of an [existing](https://github.com/primaryobjects/lda/blob/master/lib/stopwords_en.js) stop-words list. The format is as follows:\n\n```javascript\nexports.stop_words = [\n    'cette',\n    'que',\n    'une',\n    'il'\n];\n```\n\n## Setting a Random Seed\n\nA specific random seed can be used to compute the same terms and probabilities during subsequent runs. 
You can specify the random seed as follows:\n\n```javascript\n// Use the random seed 123.\nresult = lda(documents, 2, 5, null, null, null, 123);\n```\n\n## Author\n\nKory Becker\nhttp://www.primaryobjects.com\n\nBased on the original JavaScript implementation\nhttps://github.com/awaisathar/lda.js\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprimaryobjects%2Flda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprimaryobjects%2Flda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprimaryobjects%2Flda/lists"}