{"id":18552466,"url":"https://github.com/andrejewski/hoo","last_synced_at":"2025-05-15T11:13:14.743Z","repository":{"id":24577719,"uuid":"27985465","full_name":"andrejewski/hoo","owner":"andrejewski","description":"command-line contact information scrapper","archived":false,"fork":false,"pushed_at":"2015-01-02T21:36:24.000Z","size":152,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-15T11:13:06.225Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"isc","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrejewski.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-12-14T06:14:01.000Z","updated_at":"2024-08-12T11:48:54.000Z","dependencies_parsed_at":"2022-08-23T00:11:13.254Z","dependency_job_id":null,"html_url":"https://github.com/andrejewski/hoo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhoo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhoo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhoo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrejewski%2Fhoo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrejewski","download_url":"https://codeload.github.com/andrejewski/hoo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254328389,"owners_count
":22052633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T21:14:19.166Z","updated_at":"2025-05-15T11:13:14.709Z","avatar_url":"https://github.com/andrejewski.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"Hoo\n===\n\nA contact information scrapping tool for programmatic and command-line use. Hoo will scrape webpages looking for personal websites, email addresses, Twitter handles, and Github usernames and returns completed user profiles in JSON or CSV.\n\n```bash\nnpm install -g hoo\n```\n\n## Command-line Usage\n\nThis is a tool for quick contact information. Just provide a Twitter handle `@compooter` or Github username `^andrejewski` or even just a plain website url `chrisandrejewski.com`, and Hoo will try figure out the remaining details.\n\n```bash\n# these all do the same thing\nhoo @compooter\nhoo ^andrejewski\nhoo chrisandrejewski.com\n```\n\n```js\n{ fullname: 'Chris Andrejewski',\n  github: [ 'andrejewski' ],\n  url: [ 'http://chrisandrejewski.com' ],\n  email: [ 'christopher.andrejewski@gmail.com' ],\n  twitter: [ 'compooter' ] }\n```\n\nHoo works fine with multiple names, although too many will take longer.\n\n```bash\nhoo @compooter ^tj @iamdevloper\n```\n\n### Output as JSON or CSV\n\nBy default, all output is in JSON. 
Passing the `--csv` flag will change all output to CSV.\n\n```bash\nhoo @compooter --csv\nhoo @compooter -c\n```\n\n```\nfullname,twitter,email,url,github\nChris Andrejewski,compooter,christopher.andrejewski@gmail.com,http://chrisandrejewski.com,andrejewski\n```\n\n### Writing to a file\n\nPass `--output \u003cfilename\u003e` and Hoo will save the output to a file instead. This also works as you would expect with the CSV flag.\n\n```bash\nhoo @compooter ^tj --output output.json\nhoo @compooter ^tj -o output.json\n```\n\nFor JSON, the results array is stored under the \"people\" key.\n\n```js\n{\n  \"people\": [\n    {\n      \"fullname\": \"Chris Andrejewski\",\n      \"twitter\": [\n        \"compooter\"\n      ],\n      \"url\": [\n        \"http://chrisandrejewski.com\"\n      ],\n      \"email\": [\n        \"christopher.andrejewski@gmail.com\"\n      ],\n      \"github\": [\n        \"andrejewski\"\n      ]\n    },\n    {\n      \"fullname\": \"TJ Holowaychuk\",\n      \"github\": [\n        \"tj\"\n      ],\n      \"url\": [\n        \"http://tjholowaychuk.com\"\n      ],\n      \"email\": [\n        \"tj@vision-media.ca\"\n      ]\n    }\n  ]\n}\n```\n\n### More options\n\nSee `hoo --help` for more options, including colored output, debugging activity, and selecting only certain fields.\n\n## Programmatic Usage\n\nHoo is designed to be entirely configurable. The command-line interface uses some default scrappers, but an instance of the `Hoo` class initially has none. Scrappers are added just as you would add Express/Connect middleware.\n\n```js\nvar Hoo = require('hoo');\nvar hoo = new Hoo()\n\t.use(Hoo.TwitterScrapper)\n\t.use(Hoo.GithubScrapper)\n\t.use(Hoo.DefaultScrapper);\n\nvar names = ['@compooter', '^tj'];\nhoo.run(names, function(error, records) {\n\t// do something awesome\n});\n```\n\n## Scrappers\n\nHoo includes Email (Default), Twitter, and Github web scrappers, but that doesn't mean new ones cannot be made. 
In fact, that is why they all extend the same base `Scrapper` class. Building a new scrapper is easy.\n\n```js\nvar Scrapper = require('hoo').Scrapper;\n\nclass MyScrapper extends Scrapper {\n\tconstructor(options) {\n\t\t/* options passed to new Hoo() are passed to each Scrapper added to it */\n\t\tsuper(options); // a derived class constructor must call super()\n\t}\n\n\texpandArg(arg) {\n\t\t/* this allows the twitter/github scrappers to expand usernames to urls */\n\t\treturn arg;\n\t}\n\n\tprocessWebpage(webpage, record, next) {\n\t\t/*\n\t\t\tTake any webpage and extract contact information to put on the record,\n\t\t\tfind new webpage urls to visit,\n\t\t\tand call next when done.\n\t\t*/\n\t\t/*\n\t\t\tTreat `webpage` like a jQuery object:\n\t\t\t\tvar $ = webpage; $('#myElement').text();\n\t\t\t(See https://github.com/cheeriojs/cheerio)\n\t\t*/\n\t\tnext(null /* or an error */, [] /* optional urls */);\n\t}\n}\n```\n\nNote that while ES6 classes are used, you do not need to extend the Scrapper class for your own scrapper. Just be sure to implement the same methods on your own class or prototype.\n\n## Contributing\n\nIf you like Hoo enough to contribute, sweet. As the markup of scrapped webpages changes, Hoo will need to be updated to match, so open an issue or pull request if a scrapper is broken. If you have a scrapper you would like to add to Hoo, send a pull request. Any other issues are welcome too.\n\n```bash\nnpm install # dependencies\nnpm run build # to build\nnpm run pre-publish # to pre-publish for pull requests\n```\n\nFollow me on [Twitter](https://twitter.com/compooter) for updates or just for the lolz and please check out my other [repositories](https://github.com/andrejewski) if I have earned it. I thank you for reading.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrejewski%2Fhoo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrejewski%2Fhoo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrejewski%2Fhoo/lists"}