{"id":13740581,"url":"https://github.com/d2iq-archive/RENDLER","last_synced_at":"2025-05-08T20:31:55.126Z","repository":{"id":16918887,"uuid":"19680225","full_name":"d2iq-archive/RENDLER","owner":"d2iq-archive","description":"A rendering web crawler for Apache Mesos.","archived":true,"fork":false,"pushed_at":"2019-08-19T18:56:55.000Z","size":2523,"stargazers_count":244,"open_issues_count":15,"forks_count":76,"subscribers_count":175,"default_branch":"master","last_synced_at":"2025-05-01T06:04:26.787Z","etag":null,"topics":["dcos","dcos-orchestration-guild"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/d2iq-archive.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-05-11T23:32:16.000Z","updated_at":"2025-02-22T16:22:18.000Z","dependencies_parsed_at":"2022-07-17T09:46:17.578Z","dependency_job_id":null,"html_url":"https://github.com/d2iq-archive/RENDLER","commit_stats":null,"previous_names":["d2iq-archive/rendler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2iq-archive%2FRENDLER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2iq-archive%2FRENDLER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2iq-archive%2FRENDLER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d2iq-archive%2FRENDLER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/d2iq-archive","download_url":"https://codeload.github.com/d2iq-archive/RENDLER/tar.gz/refs/heads/master","host":{"name":"G
itHub","url":"https://github.com","kind":"github","repositories_count":253144874,"owners_count":21861139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dcos","dcos-orchestration-guild"],"created_at":"2024-08-03T04:00:50.118Z","updated_at":"2025-05-08T20:31:54.808Z","avatar_url":"https://github.com/d2iq-archive.png","language":"Python","funding_links":[],"categories":["Frameworks","Python"],"sub_categories":["Experimental/Example/Unsorted"],"readme":"RENDLER :interrobang:\n=====================\n\nA rendering web-crawler framework for [Apache Mesos](http://mesos.apache.org/).\n\n![YES RENDLER](./riddler.jpg?raw=true \"RENDLER\") \n\nSee the [accompanying slides](http://mesosphere.github.io/oscon-mesos-2014/#/) for more context.\n\nRENDLER consists of three main components:\n\n- `CrawlExecutor` extends `mesos.Executor`\n- `RenderExecutor` extends `mesos.Executor`\n- `RenderingCrawler` extends `mesos.Scheduler` and launches tasks with the executors\n\n## Quick Start with Vagrant\n\n### Requirements\n\n- [VirtualBox](http://www.virtualbox.org/) 4.1.18+\n- [Vagrant](http://www.vagrantup.com/) 1.3+\n- [git](http://git-scm.com/downloads) (command line tool)\n\n### Start the `mesos-demo` VM\n\n```bash\n$ wget http://downloads.mesosphere.io/demo/mesos.box -O /tmp/mesos.box\n$ vagrant box add --name mesos-demo /tmp/mesos.box\n$ git clone https://github.com/mesosphere/RENDLER.git\n$ cd RENDLER\n$ vagrant up\n```\n\nNow that the VM is running, you can view the Mesos Web UI here:\n[http://10.141.141.10:5050](http://10.141.141.10:5050)\n\nYou can see that 1 slave is registered and 
you've got some idle CPUs and Memory. So let's start the Rendler!\n\n### Run RENDLER in the `mesos-demo` VM\n\nCheck implementations of the RENDLER scheduler in the `python`, `go`,\n`scala`, and `cpp` directories. Run instructions are here:\n\n- [Python RENDLER framework](python/README.md)\n- [Go RENDLER framework](go/README.md)\n- [Scala RENDLER framework](scala/README.md)\n- [C++ RENDLER framework](cpp/README.md)\n\nFeel free to contribute your own!\n\n### Generating a pdf of your render graph output\nWith [GraphViz](http://www.graphviz.org) (`which dot`) installed:\n\n```bash\nvagrant@mesos:hostfiles $ bin/make-pdf\nGenerating '/home/vagrant/hostfiles/result.pdf'\n```\n\nOpen `result.pdf` in your favorite viewer to see the rendered result!\n\n**Sample Output**\n\n![Sample Crawl Crawl](http://downloads.mesosphere.io/demo/sample_output.png)\n\n### Shutting down the `mesos-demo` VM\n\n```bash\n# Exit out of the VM\nvagrant@mesos:hostfiles $ exit\n# Stop the VM\n$ vagrant halt\n# To delete all traces of the vagrant machine\n$ vagrant destroy\n```\n\n## Rendler Architecture\n\n### Crawl Executor\n\n- Interprets incoming tasks' `task.data` field as a URL\n- Fetches the resource, extracts links from the document\n- Sends a framework message to the scheduler containing the crawl result.\n\n### Render Executor\n\n- Interprets incoming tasks' `task.data` field as a URL\n- Fetches the resource, saves a png image to a location accessible to the scheduler.\n- Sends a framework message to the scheduler containing the render result.\n\n### Intermediate Data Structures\n\nWe define some common data types to facilitate communication between the scheduler\nand the executors.  
Their default representation is JSON.\n\n```python\nresults.CrawlResult(\n    \"1234\",                                 # taskId\n    \"http://foo.co\",                        # url\n    [\"http://foo.co/a\", \"http://foo.co/b\"]  # links\n)\n```\n\n```python\nresults.RenderResult(\n    \"1234\",                                 # taskId\n    \"http://foo.co\",                        # url\n    \"http://dl.mega.corp/foo.png\"           # imageUrl\n)\n```\n\n### Rendler Scheduler\n\n#### Data Structures\n\n- `crawlQueue`: list of urls\n- `renderQueue`: list of urls\n- `processedURLs`: set of urls\n- `crawlResults`: list of url tuples\n- `renderResults`: map of urls to imageUrls\n\n#### Scheduler Behavior\n\nThe scheduler accepts one URL as a command-line parameter to seed the render\nand crawl queues.\n\n1. For each URL, create a task in both the render queue and the crawl queue.\n\n1. Upon receipt of a crawl result, add an element to the crawl results\n   adjacency list.  Append to the render and crawl queues each URL that is\n   _not_ present in the set of processed URLs.  Add these enqueued urls to\n   the set of processed URLs.\n\n1. Upon receipt of a render result, add an element to the render results map.\n\n1. The crawl and render queues are drained in FCFS order at a rate determined\n   by the resource offer stream.  When the queues are empty, the scheduler\n   declines resource offers to make them available to other frameworks running\n   on the cluster.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd2iq-archive%2FRENDLER","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fd2iq-archive%2FRENDLER","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fd2iq-archive%2FRENDLER/lists"}