{"id":18300309,"url":"https://github.com/docnow/dnflow","last_synced_at":"2025-04-05T13:36:04.143Z","repository":{"id":151429008,"uuid":"55012102","full_name":"DocNow/dnflow","owner":"DocNow","description":"A design prototype for DocNow to learn with","archived":false,"fork":false,"pushed_at":"2017-04-08T01:27:04.000Z","size":1023,"stargazers_count":14,"open_issues_count":4,"forks_count":6,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-21T05:32:46.686Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DocNow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-03-29T21:35:42.000Z","updated_at":"2019-10-30T08:30:35.000Z","dependencies_parsed_at":"2023-07-14T16:04:00.149Z","dependency_job_id":null,"html_url":"https://github.com/DocNow/dnflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2Fdnflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2Fdnflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2Fdnflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DocNow%2Fdnflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DocNow","download_url":"https://codeload.github.com/DocNow/dnflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com"
,"kind":"github","repositories_count":247342706,"owners_count":20923642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-05T15:11:59.837Z","updated_at":"2025-04-05T13:36:04.131Z","avatar_url":"https://github.com/DocNow.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dnflow\n\nAn early experiment in automating a series of actions with Twitter\ndata for docnow. If you want to install dnflow and don't want to manually \nset things up yourself give our \n[Ansible playbook](https://github.com/docnow/dnflow-ansible) a try.\n\nUses [Luigi](http://luigi.readthedocs.org/) for workflow automation.\n\n\n## running it for yourself\n\nFirst create your dnflow configuration file, and add your Twitter application\nkeys to it:\n\n    cp dnflow.cfg.template dnflow.cfg\n\nIf you are running on a non-standard HTTP port, such as the flask default,\n`localhost:5000`, be sure to include the port number in the value of\n`HOSTNAME`, e.g.:\n\n    HOSTNAME = 'localhost:5000'\n\nThe current `summarize.py` is set up to collect a handful of tweets\nbased on a search, then execute a series of counts against it.  
This\nwill result in one data file (the source tweets) and several count\nfiles (with the same name under `data/` but with extensions like\n`-urls`, `-hashtags` added on).\n\nAssuming you either have an activated virtualenv or similar sandbox,\ninstall the requirements first:\n```\n% pip install -r requirements\n```\n\nStart the `luigid` central scheduler, best done in another terminal:\n```\n% luigid\n```\n\nTo test the workflow, run the following to kick it off (substituting a\nsearch term of interest):\n```\n% python -m luigi --module summarize RunFlow --term lahoreblast\n```\n\nIt may take a moment to execute the search, which will require repeated\ncalls to the Twitter API.  As soon as it completes, you should have all\nthe mentioned files in your `data/` directory.  The naming scheme isn't\nwell thought out.  This is only a test.\n\nWhile you're at it, take a look at the web UI for luigi's scheduler at:\n\n    http://localhost:8082/\n\n(Assuming you didn't change the port when you started luigid.)\n\n\n## adding the flask UI\n\n`ui.py` contains a simple web app that allows a search to be specified\nthrough the web, queueing workflows to execute in the background, and\nshowing workflow process status and links to completed summaries as well.\nRunning the web UI takes a few more steps.\n\n * Install and run [Redis](http://redis.io/)\n\nRedis can be run without configuration changes, best done in another\nterminal:\n\n```\n% redis-server\n```\n\n * Start a [Redis Queue](http://python-rq.org/) worker\n\nRQ requires a running instance of Redis and one or more workers, also\nbest done in another terminal.\n\n```\n% rq worker\n```\n\n * Create the flask UI backend\n\nA simple SQLite3 database tracks the searches you will create and their\nworkflow status.  
Within your dnflow virtual environment:\n\n```\n% sqlite3 db.sqlite3 \u003c schema.sql\n```\n\n * Start the [flask](http://flask.pocoo.org/) UI\n\nThe flask UI shows a list of existing searches, lets you add new ones,\nand links to completed search summaries.  Again, within your dnflow\nvirtual environment, and probably in yet another terminal window:\n\n```\n% python ui.py\n```\n\n\n### The flow, for now\n\nThe luigi workflow is not automated; it needs to be invoked explicitly.\nThe web UI is the wrong place to invoke the workflow because the\nworkflow can run for a long time, yet the UI needs to remain\nresponsive.  For these reasons, the process is separated out with\nthe queue.\n\nWhen a search is added, dnflow adds a job to the queue by defining\na Python subprocess to call the luigi workflow from the command line.\nRQ enqueues this task for later processing.  If one or more RQ\nworkers are available, the job is assigned and begins.  Because\ndnflow's enqueueing of the job is very fast, it can return an updated\nview promptly.\n\nThe luigi workflow takes as long as it needs, generating static files\nin a distinct directory for each requested search.\n\nIntegration between the web UI and workflows occurs in the UI's\nSQLite database, where search terms are stored with a job id.  When\nthe workflow is assigned to an RQ worker, that search record is\nupdated through an HTTP PUT to the web app at the URL `/job`, with\na reference to the job id and its output directory.  Each individual\ntask within the workflow further updates this same URL with additional\nPUTs upon task start, success, or failure.  This is handled using\n[Luigi's event\nmodel](http://luigi.readthedocs.io/en/stable/api/luigi.event.html); these\nHTTP callbacks/hooks to the UI keep the integration between the two\npieces of the environment simple.  
During a workflow, the most recent\ntask status will be recorded in the database, and is available for\ndisplay in the UI.\n\nWith these pieces in place, several requests for new searches can\nbe added rapidly within the UI.  Each search will be run by the\nnext available RQ worker process, so if only one process is available,\nthey will execute in succession, but with more than one worker running,\nmultiple workflows can run in parallel.  The main limitation here\nis the rate limit on Twitter's API.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocnow%2Fdnflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdocnow%2Fdnflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocnow%2Fdnflow/lists"}