{"id":13797433,"url":"https://github.com/bwhite/hadoopy","last_synced_at":"2026-01-10T04:55:36.130Z","repository":{"id":696141,"uuid":"340897","full_name":"bwhite/hadoopy","owner":"bwhite","description":"Python MapReduce library written in Cython. Visit us in #hadoopy on freenode.  See the link below for documentation and tutorials.","archived":false,"fork":false,"pushed_at":"2016-01-08T21:07:56.000Z","size":4015,"stargazers_count":243,"open_issues_count":57,"forks_count":59,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-05-08T11:17:27.780Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"wendal/goweixin","license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bwhite.png","metadata":{"files":{"readme":"README","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2009-10-18T01:25:29.000Z","updated_at":"2024-12-13T16:21:37.000Z","dependencies_parsed_at":"2022-07-05T09:41:33.256Z","dependency_job_id":null,"html_url":"https://github.com/bwhite/hadoopy","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bwhite%2Fhadoopy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bwhite%2Fhadoopy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bwhite%2Fhadoopy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bwhite%2Fhadoopy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bwhite","download_url":"https://codeload.github.com/bwhite/hadoopy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://g
ithub.com","kind":"github","repositories_count":253876745,"owners_count":21977629,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-04T00:00:19.371Z","updated_at":"2026-01-10T04:55:36.057Z","avatar_url":"https://github.com/bwhite.png","language":"C","readme":"Brandyn White \u003cbwhite@dappervision.com\u003e\nAndrew Miller \u003camiller@dappervision.com\u003e\n\nSource  https://github.com/bwhite/hadoopy/\nIssues  https://github.com/bwhite/hadoopy/issues\nDocs    http://bwhite.github.com/hadoopy/\n\nIRC: #hadoopy @ freenode.net\n\nRequirements\npython development headers (python-dev), build tools (build-essential)\n\nOptional\ncython (\u003e=.13) (without this it falls back to the pregenerated .c files)\n\nFeatures\n- oozie support\n- Automated job parallelization 'auto-oozie' available in the hadoopy_flow project (maintained out of branch)\n- typedbytes support (very fast)\n- Local execution of unmodified MapReduce job with launch_local\n- Read/write sequence files of TypedBytes directly to HDFS from python (readtb, writetb)\n- Works on OS X\n- Allows printing to stdout and stderr in Hadoop tasks without causing problems (uses the 'pipe hopping' technique, both are available in the task's stderr)\n- critical path is in Cython\n- works on clusters without any extra installation, Python, or any Python libraries (uses Pyinstaller that is included in this source tree)\n- Simple HDFS access (readtb and ls) inside Python, even inside running jobs\n- Unit test interface\n- Reporting using status and counters (and print statements! 
no need to be scared of them in Hadoopy)\n- Supports design patterns in the Lin/Dyer book (http://www.umiacs.umd.edu/~jimmylin/book.html)\n\nLimitations\n- Hadoop Local currently unsupported due to a bug in Hadoop's handling of the distributed cache in this mode.  Use pseudo-distributed instead for now.  (https://github.com/bwhite/hadoopy/issues/40)\n\nUsed in\n- A Case for Query by Image and Text Content: Searching Computer Help using Screenshots and Keywords (to appear in WWW'11)\n- Web-Scale Computer Vision using MapReduce for Multimedia Data Mining (at KDD'10)\n- Vitrieve: Visual Search engine\n- Picarus: Hadoop computer vision toolbox\n\nUbuntu Install (others are similar)\nsudo apt-get install python-dev build-essential\nsudo python setup.py install","funding_links":[],"categories":["Hadoop"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbwhite%2Fhadoopy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbwhite%2Fhadoopy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbwhite%2Fhadoopy/lists"}