{"id":13961457,"url":"https://github.com/tzano/wren","last_synced_at":"2025-07-21T06:30:58.651Z","repository":{"id":94238288,"uuid":"126643515","full_name":"tzano/wren","owner":"tzano","description":"Wren enables users to discover and explore daily news stories 🗞️📻 📺","archived":false,"fork":false,"pushed_at":"2018-06-06T23:10:53.000Z","size":1463,"stargazers_count":260,"open_issues_count":0,"forks_count":19,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-11-28T07:34:31.335Z","etag":null,"topics":["chatbots","conversational-interface","conversational-journalism","news-hack","nlu"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tzano.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-03-24T21:57:17.000Z","updated_at":"2024-08-22T08:44:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"fa675cc0-491b-4b11-9880-81b8670774c0","html_url":"https://github.com/tzano/wren","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tzano/wren","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzano%2Fwren","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzano%2Fwren/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzano%2Fwren/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzano%2Fwren/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tzano","download_url":"https://codeload.github.com/tzano/
wren/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tzano%2Fwren/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266253498,"owners_count":23900051,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbots","conversational-interface","conversational-journalism","news-hack","nlu"],"created_at":"2024-08-08T17:01:09.582Z","updated_at":"2025-07-21T06:30:57.650Z","avatar_url":"https://github.com/tzano.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Wren\n\n![ScreenShot](/docs/images/wren.jpg)\n\nWren is a tool that enables users to monitor, discover and explore daily news stories.\n\nAs some readers (including myself) want to read the news beyond their cozy filter bubble, Wren has been designed as an experiment to explore, read and listen to news stories through conversational interfaces (Slack,.. etc). \n\n\n## What is Wren ? \n\nWren is a system that allows users to automatically parse news content from myriad sources, use NLP technology to enrich news discovery, and store data to easily source and search the enriched news data through conversational interfaces. The system consists of three layers: \n\n(1) **Data Ingestion:** News aggregation layer to monitor, and ingest data from a list of international news sources. The tool can monitor RSS feeds, parse, acquire, and store the content of several online news sources. \n\n(2) **Data Discovery:** A Content Analysis layer to process and analyze news articles. 
This modular layer can extract entities, concepts, keywords, and taxonomies, perform sentiment analysis on news content, and monitor news popularity on social media. \n\n(3) **Dialog:** A conversational system, trained using [RASA](https://github.com/RasaHQ/rasa_nlu) to query the enriched news content.\n\nThe tool can also help data journalists find stories easily.\n\nThe general view of the system is depicted in Figure 1.\n\n![ScreenShot](/docs/images/wren_news_analytics.png)\n**Figure 1.** Wren Technical Architecture \n\nThe process starts by listening to RSS feeds to collect recently published content, both raw text and metadata (e.g. date, time, title, news source …). The sources are listed in a [YAML file](wren/config/rss_feeds.yml). There are three main media types: Article, Podcast, and Video. Once the data is collected, content is processed and analyzed through the discovery services based on the media type (text, audio, video). We use [Kafka, as it’s a scalable, fault-tolerant, publish-subscribe messaging system](https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/) at this stage to ease data processing \u0026 ingestion. The data is then stored in a database; we use MongoDB, a document-based database. Finally, to facilitate the consumption of our services, and make it possible to integrate them in a comprehensive way with other platforms, the enriched news media data are exposed through a conversational layer, powered by RASA. \n\nIn order to build a conversational system, we designed all the intents that users can use to query the data. A survey was designed to explore how smart assistants are used to consume news. Then, a list of intents was determined to reflect people's choices. The full list of intents can be accessed through this [page](/docs/news_assistant.md). 
We used [Fountain](https://github.com/tzano/fountain), a natural language data augmentation tool, to generate more than [20,000 samples](/wren/data/wren_training_dataset.json). If you want to build upon the project, you can use the same template to expand it and create more intents that meet your requirements. The file is accessible [here](/wren/data/wren_training_gen_fountain.yaml). \n\nThe idea is to cover the main and most reliable news sources around the world. We started with a dozen sources, and we are expanding the list to cover more. This list of sources does not claim to be a representative sample of all news sources. Recently, Facebook released a list of 1,000 [RSS feeds](https://fbnewsroomus.files.wordpress.com/2016/05/rss-urls.pdf) that it says it uses to crawl for interesting news stories. The goal is to use a similar list to enrich our list of news organizations.\n\n\n### Functionalities \n- Collecting News Media Content (Articles, Podcasts, Videos) from [different news organizations](wren/config/rss_feeds.yml)\n- Extracting entities, concepts, keywords \u0026 taxonomies from News Media Content.\n- Storing Data in one centralized DB\n- We crowdsourced and trained [the model](/docs/news_assistant.md) with more than 20,000 queries. This [data](/wren/data/wren_training_dataset.json) is freely available for other developers to use. \n- Built a Slackbot to query News Media Content through a conversational interface, served by the model.\n- Added support to read articles by converting Article Content (Text) to Audio. \n- Find \u0026 Listen to podcasts.\n- Find \u0026 Watch videos.  
\n\n\n### Project Structure \n- **core** includes the main classes: `Article`, `Podcast`, `Video`, `Media Organization`\n- **data_ingestion** contains classes to collect data from [different news organizations](wren/config/rss_feeds.yml).\n- **data_discovery** contains classes to enrich the data using NLP services.\n- **conversations** contains implementations to connect to external messaging platforms \n- **nlu** contains the NLU parser that translates questions to queries using the wren model\n- **actions** contains classes that handle queries such as querying the news db, sharing content, reading news, listening to podcasts, etc.\n- **data** includes the training dataset and the [Fountain](http://github.com/tzano/Fountain) template if you need to generate and enrich the training dataset\n- **models** includes models that have been trained using RASA.\n- **config** includes all the configuration files. \n- **connectors** contains interfaces and implementations to connect to database engines.\n\n- **docker** includes `docker-compose` along with other `Dockerfiles` to run our services on Docker.\n\n\n\n### Getting Started\n\nYou can find all the commands to run `Wren` in `Makefile`. 
There is a `docker-compose` file to launch all the services.\n\n- Build\n```bash\n$ build\n```\n\nAlternatively, you can navigate to the `docker` folder, build the images with `docker-compose build`, and launch the services with `docker-compose up -d` \n\n- Check the existing containers \n```bash\n$ docker-compose ps\n```\n\n- Connect to the container\n```bash\n$ docker exec -it wren_app sh\n```\n\n- Inspect servers in the network\n```bash\n$ docker inspect docker_wren_network\n```\n\n- Train your NLU model; you will find it under the `./model/` folder\n```bash\npython -m rasa_nlu.train -c config/nlu_config.json\n```\n\n- Test Rasa Server\n```bash\ncurl 'http://\u003cSERVER\u003e:\u003cPORT\u003e/status'\n```\n\n\n### Configuration Files \nIn order to better manage the project, we use five main configuration files:\n\n- MongoDB configuration file `db.yml`\n- RSS Feeds configuration file `rss_feeds.yml`\n- Supported discovery services configuration file `services.yml`\n- Social Services keys file `keys.yml`\n- Messaging Platforms `messaging_platforms.yml`\n\n\n### Services \n\nSome of the third-party services use libraries that can be downloaded and installed locally. We use the `resources` folder to store these materials.\n\n* SentStrength: \nYou need to download `SentStrength.jar` and `SentStrength_data` dictionaries from the SentStrength website. 
\n \n* Stanford NER: \nYou need to download a 151M zipped file (mainly consisting of classifier data objects) from the [Stanford Named Entity Recognizer (NER) website](https://nlp.stanford.edu/software/CRF-NER.shtml)\n\n\n## Tests\n\n```sh\npython -m wren.tests.test_news_scrapers\npython -m wren.tests.test_content_parser\npython -m wren.tests.test_content_summarizer\npython -m wren.tests.test_sentiment_analyzer\npython -m wren.tests.test_social_popularity\npython -m wren.tests.test_transcriber\n```\n\n## Demo\n\n![ScreenShot](/docs/images/wren_slack.png)\n**Figure 2.** Wren Slack Chatbot\n\n## References\n- [Why messaging is the future of the news brand](https://splinternews.com/why-messaging-is-the-future-of-the-news-brand-1793854684) by Felix Salmon\n\n## Support\nIf you are having issues, please let us know or submit a pull request.\n\n## Acknowledgement\nThe project uses RSS feeds and APIs to get news content. The content is not redistributed and is used only for a non-profit research project. \n\n## License\nThe project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftzano%2Fwren","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftzano%2Fwren","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftzano%2Fwren/lists"}