{"id":13660742,"url":"https://github.com/mylk/ojah","last_synced_at":"2025-04-24T23:30:40.637Z","repository":{"id":52285597,"uuid":"90907406","full_name":"mylk/ojah","owner":"mylk","description":"An aggregator of positive news","archived":false,"fork":false,"pushed_at":"2021-05-01T19:27:16.000Z","size":211,"stargazers_count":10,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-10T15:44:10.208Z","etag":null,"topics":["python3","rss","rss-aggregator","sentiment-analysis"],"latest_commit_sha":null,"homepage":"http://ojah.io/rss","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mylk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-10T20:53:31.000Z","updated_at":"2023-07-08T12:38:09.000Z","dependencies_parsed_at":"2022-09-25T05:25:21.456Z","dependency_job_id":null,"html_url":"https://github.com/mylk/ojah","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mylk%2Fojah","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mylk%2Fojah/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mylk%2Fojah/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mylk%2Fojah/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mylk","download_url":"https://codeload.github.com/mylk/ojah/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250727441,"owners_count":21477316,"icon_url
":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python3","rss","rss-aggregator","sentiment-analysis"],"created_at":"2024-08-02T05:01:25.254Z","updated_at":"2025-04-24T23:30:40.183Z","avatar_url":"https://github.com/mylk.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Ojah!\n\n`Ojah!` is a news aggregator that filters out the news to give you only those that are positive!\n\n## How it works\n\nA list of RSS feeds is provided to the application. The application crawls the feeds every now and then, stores the news\nand then scores them by performing sentiment analysis on their title. Finally, you have to visit the web page of `Ojah!`\nor subscribe to the RSS feed to get your positive news!\n\n\n## Technical information\n\n`Ojah!` is written in python3 on top of the `Django` web framework. For sentiment analysis, the `TextBlob`\nmodule is being used.\n\n### The components\n\nCurrently the components are three:\n\n- web app\n- crawler\n- classifier\n\nEach one is placed on a separate Docker container, in case more than one instance is required for any of them.\n\n### Sentiment analysis\n\nThe sentiment analysis classifier currently returns either the `neg` or the `pos` value for negative or positive\nresult respectively. News that are scored with `pos` are served by `Ojah!`.\n\n### Used classifier\n\nUnfortunatelly, the default classifier which I initially used, was not that accurate. 
I ran a comparison for a short\nperiod of time to find the most accurate among:\n\n- TextBlob \"default\" sentiment polarity classifier\n- TextBlob \"NaiveBayesClassifier\" classifier\n- vaderSentiment \"SentimentIntensityAnalyzer\" classifier\n\nThe most accurate was \"NaiveBayesClassifier\", which is the one I kept.\n\n### Corpora\n\nI initially used the Twitter corpora provided by the `nltk` module. Later I switched to using only the corpora produced\nby `Ojah!` to improve the accuracy of the classification.\n\nCustom corpora can be added using the administration dashboard of `Ojah!`, where you can change the classification\nof a news item and use it as a corpus, so that `Ojah!` learns from its mistakes.\n\n### Used database\n\nPreviously an embedded database (SQLite3) was used, but so that any of the components\n(app, crawler, classifier) can be scaled, a migration to MySQL was performed.\n\n## Use the application\n\nOf course you can clone it and then use, distribute, or hack it.\n\nYou can either run it on your host or use Docker to run it in containers (the Docker container recipes are included).\nIn either case, you first have to clone this repository.\n\n### Host installation\n\nSo, you are a traditional type of guy. 
For this installation you should:\n\n- Install the dependencies of the project:\n\n```\nmake deps_app\nmake deps_corpora\nmake deps_crawler\nmake deps_classifier\n```\n\n- Set up the database, the initial data and the static files:\n\n```\nmake init\n```\n\n- Start the application:\n\n```\n./manage.py runserver\n```\n\n- Trigger the crawling of the news items:\n\n```\n./manage.py crawl\n```\n\n- Trigger the classification of the news items:\n\n```\n./manage.py classify\n```\n\n- Re-queue for classification the news items that are missing a score:\n\n```\n./manage.py classify_requeue\n```\n\n- Pre-calculate the stats shown on the about page:\n\n```\n./manage.py stats_calculate\n```\n\n- Re-queue for classification the news items that have been previously scored as negative:\n\n```\n./manage.py train_self\n```\n\n### Run in containers\n\nSo you love containers like me. Things are really simple here, you can have everything running\nwith a couple of commands:\n\n- Build the images (the first build will take a few minutes):\n\n```\ndocker-compose -f docker-compose.yml -f docker-compose.prod.yml build\n```\n\n- Start the containers:\n\n```\ndocker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d\n```\n\n### Last but not least\n\nRegardless of the environment in which you run the application (your host or containers), the application can\nbe visited at `http://127.0.0.1:8000`.\n\n- Add the RSS feeds you want `Ojah!` to crawl using the administration dashboard at:\n\n```\nhttp://127.0.0.1:8000/admin\n```\n\nThe default username is `ojah` and the password is `ojah` too.\n\n- Point your favorite RSS reader at:\n\n```\nhttp://127.0.0.1:8000/rss\n```\n\n- There is also a web interface (a clone of [Hacker News](https://news.ycombinator.com/)) to see the news from your web browser:\n\n```\nhttp://127.0.0.1:8000/web\n```\n\nThe administration dashboard has a few more cool things, like statistics and a simple graph for classification accuracy.\n\n## Hack it!\n\nIf you are hacking using the host 
environment (instead of containers), you will need to install the development\nenvironment dependencies too:\n\n```\nmake deps_dev\n```\n\nIf you are hacking using the containers, replace \"docker-compose.prod.yml\" with \"docker-compose.dev.yml\"\nin the above commands to include debugging, testing and linting tools.\n\nAlso, regardless of the environment in which you run the application (your host or containers), you can run the tests with:\n\n\n```\nmake test\n```\n\nThis will run the tests in a separate \"test\" container.\n\n\nYou can also run a linting process:\n\n```\nmake analyze\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmylk%2Fojah","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmylk%2Fojah","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmylk%2Fojah/lists"}