{"id":19197114,"url":"https://github.com/dmlen/markovp","last_synced_at":"2025-02-23T04:45:35.813Z","repository":{"id":252962440,"uuid":"841889298","full_name":"DMLen/MarkovP","owner":"DMLen","description":"A basic Markov chain generator written in Python.","archived":false,"fork":false,"pushed_at":"2025-02-02T07:09:58.000Z","size":582,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-02T08:19:15.819Z","etag":null,"topics":["markov-chain","text-generation","text-generator"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DMLen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-13T08:49:12.000Z","updated_at":"2025-02-02T07:10:02.000Z","dependencies_parsed_at":"2024-08-13T17:59:15.036Z","dependency_job_id":"913c4fd3-cd2f-4312-9e42-7edd78a75750","html_url":"https://github.com/DMLen/MarkovP","commit_stats":null,"previous_names":["dmlen/markovp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DMLen%2FMarkovP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DMLen%2FMarkovP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DMLen%2FMarkovP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DMLen%2FMarkovP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DMLen","download_url":"https://codeload.github.co
m/DMLen/MarkovP/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240271531,"owners_count":19774859,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["markov-chain","text-generation","text-generator"],"created_at":"2024-11-09T12:15:35.610Z","updated_at":"2025-02-23T04:45:35.788Z","avatar_url":"https://github.com/DMLen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Markov-P\n*A basic Markov chain generator written in Python.*\n\nThis is a program that analyses the contents of a given corpus (any large body of text) and attempts to randomly generate text that is similar. The process behind this is known as a [Markov process](https://en.wikipedia.org/wiki/Markov_chain).\n\n## How to Run?\nSimply run *markovp.py* and follow the on-screen prompts.\nA plaintext file *sherlock.txt* has been included in this repo as a sample testing file [(Source)](https://sherlock-holm.es/stories/plain-text/advs.txt). \nAfter execution, a new output file is created with the name you provided. This contains the program output.\n\n## How does it work?\nWhen it comes to text, Markov chains generate the next word in the sentence based *only* on which words tend to follow the current word. This is based on an analysis of some source text. 
As described on Wikipedia, \"What happens next depends only on the state of affairs now\".\n\nWhen this program is pointed at a plaintext file (the corpus), it iterates over every single word in the corpus. Each word is an object with an attached list, and every time the word is found in the text, the word after it is saved onto the original word's list. Thus we end up with a list of every unique word, and each unique word has its own list of words that occur after it. Duplicates are allowed in these lists, which we use to lazily represent the statistical side of it (more occurrences of a following word means it is represented more times in the associated list, meaning a higher chance it is chosen during generation).\n\nTo analyse large texts quickly, a hash table with quadratic probing is used to store word information.\n\nThis corpus data is then used for generation. A random word from the lexicon is chosen as the starter word (\"seed\") for our generated text. From the list of words following it, we pick one at random and add it to our output. This newest word then becomes the current word, and from the list of words following it, we again pick one at random. This repeats until the desired output length (default: 500 words) has been reached.\n\nThe result is a completely artificial, randomly-generated text that follows the statistical tendencies of the source text. It may appear to be grammatically correct, but it will largely be lackluster and incoherent (this isn't GPT, after all!). It can be something mildly amusing to toy with. Try it for yourself!\n\nSherlock Holmes remains the intellectual property of The Conan Doyle Estate Ltd.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmlen%2Fmarkovp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmlen%2Fmarkovp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmlen%2Fmarkovp/lists"}