{"id":31796942,"url":"https://github.com/yang/web-reader","last_synced_at":"2025-10-10T20:54:04.733Z","repository":{"id":27007505,"uuid":"30471620","full_name":"yang/web-reader","owner":"yang","description":"Listen to web page articles read aloud as a podcast! This web app converts URLs you submit (via bookmarklet/app) into an MP3 audio feed using text-to-speech","archived":false,"fork":false,"pushed_at":"2025-06-08T00:09:45.000Z","size":80,"stargazers_count":9,"open_issues_count":8,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-06-08T01:25:37.425Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-02-07T21:54:30.000Z","updated_at":"2025-06-08T00:09:48.000Z","dependencies_parsed_at":"2025-01-13T00:23:56.781Z","dependency_job_id":"a8e2aed9-7001-47fd-b793-22b85ce40190","html_url":"https://github.com/yang/web-reader","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/yang/web-reader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yang%2Fweb-reader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yang%2Fweb-reader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yang%2Fweb-reader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yang%2Fweb-reader/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yang","download_url":"https://codeload.github.com/yang/web-reader/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yang%2Fweb-reader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279005235,"owners_count":26083864,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-10T20:53:23.108Z","updated_at":"2025-10-10T20:54:04.728Z","avatar_url":"https://github.com/yang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- @format --\u003e\n\n# AudioLizard\n\nThis is a REST API server that speech-synthesizes articles (stripping boilerplate, doing sentence segmentation, etc.) into a podcast feed of MP3s which you can consume with your favorite podcasting app. 
You can feed in articles via bookmarklet or [PWA].\n\nThe PWA is available at \u003chttps://github.com/yang/audiolizard-pwa\u003e.\n\n## Installation\n\n### Ubuntu 18.04\n\nInstall prerequisites on Ubuntu:\n\n    sudo apt-get install \\\n      postgresql-10 \\\n      ffmpeg \\\n      ubuntu-restricted-extras \\\n      postgresql-server-dev-10 \\\n      build-essential \\\n      python-dev \\\n      postfix \\\n      openjdk-11-jdk # Needed for boilerpipe\n    sudo snap install google-cloud-sdk  --classic\n\nRecommended: set up postfix to send via gmail:\n\n    https://www.howtoforge.com/tutorial/configure-postfix-to-use-gmail-as-a-mail-relay/\n\n### OS X\n\nInstall prerequisites on OS X:\n\n    sudo port install postgresql95-server postgresql95 openjdk10\n\n### All platforms\n\nIn this source dir, install the application (e.g. into a virtualenv):\n\n    pip install --process-dependency-links -e .\n\nDownload nltk data:\n\n    python -c 'import nltk; nltk.download()'\n\nCreate the necessary postgresql user and DB (and store the password in your [pgpass file]):\n\n    sudo -u postgres createuser -P webreader\n    sudo -u postgres createdb -O webreader webreader\n    touch ~/.pgpass\n    chmod 600 ~/.pgpass\n\n    cat \u003e\u003e ~/.pgpass\n    localhost:5432:webreader:webreader:PASSWORD\n    ^D\n\n    psql -h localhost -U webreader webreader\n    create table articles (id serial primary key, url text, created timestamp not null, title text, body text, converted timestamp);\n    ^D\n\n### Google Cloud Setup\n\nFollow https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries.\n\nConfigure the `GOOGLE_APPLICATION_CREDENTIALS` env var as described, pointing to your json credentials.\nYou may want to put this in a `.mise.local.toml`.\n\nMake sure the `gcloud` command works!\nAnd try it out with:\n\n    web-reader convert http://example.com out.mp3\n\n## Usage\n\n### Converting Single Document\n\nRun:\n\n    web-reader convert URL OUTMP3\n\nFor example:\n\n 
   web-reader convert https://medium.com/@AnandWrites/209ffc24ab90 aspen.mp3\n\nYou can also select a local plain-text file:\n\n    web-reader convert-file ~/Documents/article.txt article.mp3\n\n### Basic App Server\n\nOne-time: run `web-reader init` to set up the DB and MP3 dir.\n\nRun the web server with just `web-reader webserver`.\n\nRun the converter daemon with `web-reader converter`.\n\nTo set up Google TTS API auth, run with the appropriate environment, e.g.:\n\n    GOOGLE_APPLICATION_CREDENTIALS=... web-reader converter\n\nTry submitting a web page with \u003chttp://localhost:5000/api/v1/enqueue?url=SOMEURL\u003e.\n\nFor normal on-going use, you can use a handy bookmarklet for one-click submission of your current page. It tries to extract the main body content by default, but you can also just have some text on the page already selected when you press the bookmarklet to process just that selection:\n\n    javascript:var r=new XMLHttpRequest();try{r.open('POST','http://localhost:5000/api/v1/enqueue',false);r.setRequestHeader(\"Content-Type\", \"application/json;charset=UTF-8\");r.send(JSON.stringify({url:document.location.href,body:window.getSelection?window.getSelection().toString():document.selection.createRange().text}));alert('done');}catch(e){alert('failed');}\n\nFinally, you can reap the fruits by subscribing to the output Podcast RSS feed with \u003chttp://localhost:5000/feed\u003e.\n\nIn all of the above, you should replace `localhost:5000` with whatever final server you're hosting on.\n\n### Extended App Server Setup\n\nTo make your app server run at system startup, you can use something like the following Upstart scripts:\n\nhttps://github.com/yang/personal-cm/tree/master/roles/webreader/etc/init\n\nTo make your web server accessible outside your firewall, [localtunnel] is a quick solution.\n\n[localtunnel]: http://localtunnel.me/\n\n### Resubmitting Old Failed Articles\n\nYou can use the `reconvert` sub-command to force certain article IDs to 
get\nre-processed. This will just produce new MP3s in place. No new feed entries\nare created. Text won't be re-extracted; only the audio synthesis is rerun.\n\nYou can also use the `resubmit` sub-command to retry submitting old articles that\nfailed. This will create new feed entries (and won't affect old entries).\nFor instance,\n\n    web-reader resubmit http://localhost:5000 -d 2015-11-01 -o oldest -n 20\n\nwill resubmit to the AudioLizard web server running on port 5000 all the URLs\nthat were _only ever_ failures (i.e., it ignores failures that got converted\nsuccessfully another time), limited to a batch of 20 (starting with the oldest\nfirst).\n\nThis logic only considers distinct URLs as candidates for resubmission, which\nis usually the correct behavior.\n\n## How It Works\n\n1. Enqueue the web article URL into a [PQ] queue.\n2. Fetch the page.\n3. Extract the main body content (remove boilerplate) using trafilatura.\n4. Clean up the text with [ftfy].\n5. Split up text into sentences (the TTS API only accepts small chunks of text) using [nltk] sentence segmentation.\n6. Submit each sentence to the Google Cloud TTS API to get back many MP3s.\n7. Combine the MP3s (with 500 ms of silence in between sentences) using [pydub].\n8. Generate the podcast feed with [feedgen].\n\n[pq]: https://github.com/malthe/pq/\n[ftfy]: https://github.com/LuminosoInsight/python-ftfy\n[nltk]: http://www.nltk.org/\n[pydub]: http://pydub.com/\n[feedgen]: https://github.com/lkiesow/python-feedgen\n[pgdg]: https://wiki.postgresql.org/wiki/Apt\n[pgpass file]: http://www.postgresql.org/docs/9.3/static/libpq-pgpass.html\n[pwa]: https://github.com/yang/audiolizard-pwa\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyang%2Fweb-reader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyang%2Fweb-reader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyang%2Fweb-reader/lists"}