{"id":16814544,"url":"https://github.com/findus23/se-simulator","last_synced_at":"2025-03-17T03:31:38.795Z","repository":{"id":76007646,"uuid":"124686195","full_name":"Findus23/se-simulator","owner":"Findus23","description":"Generating fun Stack Exchange questions using Markov chains","archived":false,"fork":false,"pushed_at":"2024-05-04T23:14:10.000Z","size":292,"stargazers_count":25,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-02-27T17:35:26.511Z","etag":null,"topics":["markov-chain","stackexchange","stackoverflow"],"latest_commit_sha":null,"homepage":"https://se-simulator.lw1.at/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Findus23.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-10T18:23:52.000Z","updated_at":"2024-05-04T23:14:13.000Z","dependencies_parsed_at":null,"dependency_job_id":"8c15f9eb-f67c-4b1d-ab1b-c10bc2cf7728","html_url":"https://github.com/Findus23/se-simulator","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Findus23%2Fse-simulator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Findus23%2Fse-simulator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Findus23%2Fse-simulator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Findus23%2Fse-simulator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Findus23","download_url":"https://codeload.github.com/Findus23/se-simulator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243841207,"owners_count":20356443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["markov-chain","stackexchange","stackoverflow"],"created_at":"2024-10-13T10:29:44.759Z","updated_at":"2025-03-17T03:31:38.484Z","avatar_url":"https://github.com/Findus23.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# se-simulator\nGenerating fun Stack Exchange questions using Markov chains\n\n### [try it out](http://se-simulator.lw1.at/)\n\n### Requirements\n\n- Python 3.5+ (only tested with Python 3.6)\n- 7z\n\nOn Debian and similar distributions, install 7z with:\n\n```bash\nsudo apt-get install p7zip-full\n```\n\n### Setup\n\n- clone the repository together with its submodules\n\n```bash\ngit clone https://github.com/Findus23/se-simulator\ncd se-simulator\ngit submodule init\ngit submodule update\n```\n\n- `pip install -r requirements.txt`\n- create a MySQL database called `se-simulator`\n- rename `config.sample.py` to `config.py`, fill in the database details and set a `secret_key`\n- run `create.py`, which creates the database tables and fetches the list of SE sites\n- run `apply_colors.py` (this should finish very quickly)\n- create folders called `chains`, `download` and `raw` (or symlinks to a location with more free disk space)\n- [download](https://archive.org/details/stackexchange) the `.7z` files for the sites you want to generate (it's recommended to start with a file under 100MB)\n    - if the `.7z` file's name differs from the site's current name, rename it to match\n- run `consume.py`\n    - it should check the hash, move the file to `raw/`, unpack it and extract the needed content from the `.xml` files into new `.jsonl` files. It also records the file in the db, so it won't be imported again.\n- now the most important step: run `todb.py`\n    - this generates the Markov chains and saves them (or reuses the existing ones on subsequent runs)\n    - afterwards, 100 questions are added to the db, along with corresponding answers, titles and usernames\n- run `shuffle.py`\n    - this assigns every question an integer and saves the maximum to `count.txt`; I haven't found a more performant way to get a random question\n- run `server.py`\n    - this starts the Flask server on `http://127.0.0.1:5000/`\n    - if I didn't miss an important step, the site should be working fine now.\n\n### Other files\n\n- `app.py`: needed for Flask\n- `basemodel.py` and `models.py`: [peewee](https://github.com/coleifer/peewee/) ORM models\n- `extra_data.py`: manually collected colors of every site with a custom theme\n- `markov.py`: extends the great [markovify library](https://github.com/jsvine/markovify/) for my use case\n- `parsexml.py`: reads the Stack Exchange dump XML files using no more than 40MB of RAM\n- `text_generator.py`: everything that creates the content and handles the Markov chains\n- `updater.py`: checks for newer dump files (probably not working anymore)\n- `utils.py`: everything else\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffindus23%2Fse-simulator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffindus23%2Fse-simulator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffindus23%2Fse-simulator/lists"}