{"id":32525510,"url":"https://github.com/dasantonym/node-cesspoll","last_synced_at":"2025-10-28T09:50:13.725Z","repository":{"id":22363606,"uuid":"25699806","full_name":"dasantonym/node-cesspoll","owner":"dasantonym","description":":poop: Turd Miner Node Module","archived":false,"fork":false,"pushed_at":"2014-10-29T16:45:20.000Z","size":236,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-03-15T16:11:15.248Z","etag":null,"topics":["crawler","news","poopetry","potty-humour"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dasantonym.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-10-24T17:29:59.000Z","updated_at":"2024-03-15T16:11:15.248Z","dependencies_parsed_at":"2022-08-20T10:40:32.362Z","dependency_job_id":null,"html_url":"https://github.com/dasantonym/node-cesspoll","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dasantonym/node-cesspoll","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasantonym%2Fnode-cesspoll","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasantonym%2Fnode-cesspoll/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasantonym%2Fnode-cesspoll/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasantonym%2Fnode-cesspoll/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dasantonym","download_url":"https://codeload.github.com/dasantonym/node-cesspoll/t
ar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dasantonym%2Fnode-cesspoll/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281418064,"owners_count":26497723,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-28T02:00:06.022Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","news","poopetry","potty-humour"],"created_at":"2025-10-28T09:50:12.120Z","updated_at":"2025-10-28T09:50:13.720Z","avatar_url":"https://github.com/dasantonym.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cesspoll #\n\n\n## About ##\n\nNode Module to retrieve and save reader comments from major german news sources.\n\nUses jsdom to extract posts from the website's homepages and then stores the news article and the comments to the article.\n\nThe resulting mongodb entries can then be further indexed and analysed.\n\n\n## Sources ##\n\nCurrent news sources are:\n\n* [Spiegel Online](http://www.spiegel.de/)\n* [taz](http://www.taz.de/)\n\n\n## Install ##\n\nYou need nodejs, redis and mongodb.\n\nInstall with\n\n```\nnpm install git://github.com/dasantonym/node-cesspoll.git\n```\n\nTo run it go to ``example/``, copy ``config.default.js`` to ``config.js`` and run\n\n```\nnode app.js\n```\n\n\n## Analysis ##\n\nAs an optional basic form of analysis 
the comments are broken up into basic fragments, whitespace is removed and then the example from the [Hyphen](http://sourceforge.net/projects/hunspell/files/Hyphen/) library together with a [hyphenation dictionary](https://www.openoffice.org/lingucomponent/download_dictionary.html) is used to extract syllables (see config file). The analysis results are then stored in mongodb and are continuously analysed while the index is updated.\n\n\n## Notes from the author ##\n\nThis is a quick and dirty crawler for a specific art installation, so it is not meant to be a fully optimized super-fancy news crawler or something.\n\nIt is not very performant: it currently re-downloads pages it has already crawled, and it only pulls new articles from the front page plus comments for articles it has already pulled.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdasantonym%2Fnode-cesspoll","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdasantonym%2Fnode-cesspoll","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdasantonym%2Fnode-cesspoll/lists"}