{"id":15009307,"url":"https://github.com/boneflame/gpipe43","last_synced_at":"2025-04-09T17:23:46.277Z","repository":{"id":184005036,"uuid":"107316432","full_name":"Boneflame/gpipe43","owner":"Boneflame","description":"A full text RSS generator which can hosted on google app engine","archived":false,"fork":false,"pushed_at":"2018-11-25T17:28:28.000Z","size":1875,"stargazers_count":29,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-23T19:23:02.483Z","etag":null,"topics":["chardet","google-appengine","google-cloud","google-cloud-platform","google-cloud-storage","lxml","python","python27","regex","rss","rss-generator","urllib2","webapp2","webapp2-framework","xpath"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Boneflame.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-10-17T19:47:16.000Z","updated_at":"2025-02-26T21:51:55.000Z","dependencies_parsed_at":null,"dependency_job_id":"75989418-57c0-487c-a70b-061cdf6b4580","html_url":"https://github.com/Boneflame/gpipe43","commit_stats":null,"previous_names":["boneflame/gpipe43"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Boneflame%2Fgpipe43","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Boneflame%2Fgpipe43/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Boneflame%2Fgpipe43/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Boneflame%2Fgpipe43/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Boneflame","download_url":"https://codeload.github.com/Boneflame/gpipe43/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248075451,"owners_count":21043589,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chardet","google-appengine","google-cloud","google-cloud-platform","google-cloud-storage","lxml","python","python27","regex","rss","rss-generator","urllib2","webapp2","webapp2-framework","xpath"],"created_at":"2024-09-24T19:24:21.393Z","updated_at":"2025-04-09T17:23:46.256Z","avatar_url":"https://github.com/Boneflame.png","language":"Python","readme":"gpipe43 is a full text RSS generator which can be hosted on Google App Engine. 
Use regular expressions to find and format the full text of an article, or any other content you want.\u003cbr\u003e\nInspired by Yahoo Pipes and Feed43.\u003cbr\u003e\nYahoo Pipes RIP.\n\nFeature\n===\n* Supports multi-page articles.\n* Displays all images in an article's gallery.\n* Can append an article's comments.\n\nPrepare\n====\n* [Create a new Cloud Platform project and App Engine application](https://cloud.google.com/appengine/docs/standard/python/quickstart)\n* [Create a bucket in Google Cloud Storage](https://cloud.google.com/storage/docs/quickstart-console)\n* [Install Google Cloud SDK Python](https://cloud.google.com/sdk/docs/)\n\nSimple quickstart\n====\n### Edit /main/user_agents.py\n* Add your User-Agent (UA) strings.\n### Edit config.py\n* `prjname`: Name of your project on App Engine.\n* `bucket_name`: Name of the Cloud Storage bucket.\n* `subdir4bg`: Path the crawler runs under: http://[prjname].appspot.com/[subdir4bg]/[rssname]\n* `subdir4rss`: Path your RSS feed is served from: http://[prjname].appspot.com/[subdir4rss]/[rssname]\n### Edit example.py, replacing 'example' with your own RSS name\n* `rssname`: Name of the RSS feed.\n* `siteurl`: The website or RSS feed that you want to generate a full-text RSS feed from.\n* `reg4site`: Regex that finds article URLs. Leave blank if siteurl is a feed.\n* `reg4title`: Regex for the article title. Leave blank if siteurl is a feed.\n* `reg4pubdate`: Regex for the article's publication date. Leave blank if siteurl is a feed. The pubdate format must contain '%Y-%m-%d'; otherwise leave blank.\n* `reg4text`: Regex for the main body of an article.\n* `reg4comment`: Regex for comments. Optional; can be left blank. You can also use this regex to find all the images of a gallery in the article.\n* `reg4nextpage`: Regex for the article's next page, if it spans more than one page.\n* `Anzahl`: How many articles will be generated. If there is more than one siteurl, this limit applies to EACH siteurl rather than to all article URLs from all siteurls combined. 0 = no limit.\u003cbr\u003e\u003cbr\u003e\n* `*encoding`: Optional. 
chardet usually detects the right encoding, but sometimes it does not (for example, it recognizes gb18030 as gb2312), so I use the 'replace' option of the decode method to avoid illegal characters, which leaves replacement characters in the generated feed. To avoid that, you can specify the website's encoding. It only affects the main text. \n* `rssgen.ausfuehren('use_urllib/use_urlfetch', 'st/mt', siteurl, reg4site, reg4title, reg4pubdate, reg4text, reg4comment, reg4nextpage, Anzahl)`: Generate an RSS feed from a website.\n* `feed_fulltext.ausfuehren('use_urllib/use_urlfetch', siteurl, reg4nextpage, reg4text, reg4comment, Anzahl, rssname)`: Use this to generate full text from an RSS feed.\n\t* `use_urllib`: Use urllib2, with UA\n\t* `use_urlfetch`: Use urlfetch, no UA\n\t* `mt`: Multi-threaded\n\t* `st`: Single-threaded\n\n\n### Edit feed_list.py\n* Replace 'example' with your own RSS name.\n\n### app.yaml, cron.yaml\n* Replace subdir4bg, subdir4rss, and example with your own values.\u003cbr\u003e\nSee the official guides: [app.yaml Reference](https://cloud.google.com/appengine/docs/standard/python/config/appref), [Scheduling Tasks With Cron for Python](https://cloud.google.com/appengine/docs/standard/python/config/cron)\n\n### Optional\n* Edit ./main/Vorlage.xml and Vorlage_Error.xml to fill in the 'generator', 'webMaster' and 'copyright' elements.\n* If you only want to reformat an existing feed, see example\\_02.py, then add its url and script to app.yaml. 
You do not need to add it to feed\\_list.py or cron.yaml, because the feed is not saved in Cloud Storage.\n\nTest\n====\n    dev_appserver.py [PATH_TO_YOUR_APP]/app.yaml\nStart the crawler: http://localhost:8080/[subdir4bg]/[rssname]\u003cbr\u003e\nWhen it finishes, check your RSS feed here: http://localhost:8080/[subdir4rss]/[rssname]\n\nSee official guide: [Using the Local Development Server](https://cloud.google.com/appengine/docs/standard/python/tools/using-local-server)\n\nUpload to App Engine\n====\n* cd to the directory of your project\n\u003egcloud config set project PROJECT_NAME\u003cbr\u003e\n\u003egcloud app deploy app.yaml cron.yaml --version=VERSION_NUMBER\u003cbr\u003e\n\nSee official guide: [Deploying a Python App](https://cloud.google.com/appengine/docs/standard/python/tools/uploadinganapp)\n\n\nExamples\n====\n* [Autoblog](http://misaka19003.appspot.com/feed/autoblog)\n* [Auto Motor und Sport](http://misaka19003.appspot.com/feed/ams)\n* [Motor1](http://misaka19003.appspot.com/feed/motor1)\n* [OMG!Ubuntu](http://misaka19002.appspot.com/feed/omgubuntu)\n* [Engadget中文版](http://misaka19002.appspot.com/feed/engadgetcn)\n* [游民星空|单机游戏](http://misaka19002.appspot.com/feed/gamersky_pcgame)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboneflame%2Fgpipe43","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboneflame%2Fgpipe43","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboneflame%2Fgpipe43/lists"}