{"id":13415502,"url":"https://github.com/rchipka/node-osmosis","last_synced_at":"2025-05-14T05:10:29.181Z","repository":{"id":26978384,"uuid":"30442018","full_name":"rchipka/node-osmosis","owner":"rchipka","description":"Web scraper for NodeJS","archived":false,"fork":false,"pushed_at":"2023-12-13T04:18:37.000Z","size":856,"stargazers_count":4113,"open_issues_count":115,"forks_count":247,"subscribers_count":73,"default_branch":"master","last_synced_at":"2025-05-09T11:04:42.559Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rchipka.png","metadata":{"files":{"readme":"Readme.md","changelog":"Changes.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2015-02-07T02:17:26.000Z","updated_at":"2025-04-28T04:22:29.000Z","dependencies_parsed_at":"2023-01-14T05:43:59.138Z","dependency_job_id":"6be96b36-4048-4310-811c-a1a8229cfa72","html_url":"https://github.com/rchipka/node-osmosis","commit_stats":{"total_commits":130,"total_committers":11,"mean_commits":"11.818181818181818","dds":"0.20769230769230773","last_synced_commit":"baed7239fc5c22ea8d00a5d2dc45f97b2d64b5c5"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rchipka%2Fnode-osmosis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rchipka%2Fnode-osmosis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rchipka%2Fnode-osmosis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rchipka%2Fnode-osmosis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rchipka","download_url":"https://codeload.github.com/rchipka/node-osmosis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254076848,"owners_count":22010611,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T21:00:49.792Z","updated_at":"2025-05-14T05:10:29.132Z","avatar_url":"https://github.com/rchipka.png","language":"JavaScript","readme":"# Osmosis\n\nHTML/XML parser and web scraper for NodeJS.\n\n[![NPM](https://nodei.co/npm/osmosis.png)](https://www.npmjs.com/package/osmosis)\n\n[![Build Status](https://travis-ci.org/rchipka/node-osmosis.svg)](https://travis-ci.org/rchipka/node-osmosis)\n\n![Downloads](https://img.shields.io/npm/dm/osmosis.svg)\n\n## Features\n\n- Uses native libxml C bindings\n- Clean promise-like interface\n- Supports CSS 3.0 and XPath 1.0 selector hybrids\n- [Sizzle selectors](https://github.com/jquery/sizzle/wiki#other-selectors-and-conventions),\n  [Slick selectors](http://mootools.net/core/docs/1.6.0/Slick/Slick), and\n  [more](https://github.com/rchipka/node-osmosis/blob/master/docs/Selectors.md)\n- No large dependencies like jQuery, cheerio, or jsdom\n- Compose deep and complex data structures\n\n- HTML parser features\n    - Fast parsing\n    - Very fast searching\n    - Small memory footprint\n\n- HTML DOM features\n    - Load and search ajax content\n    - DOM interaction and events\n    - Execute embedded and remote scripts\n    - Execute code in the DOM\n\n- HTTP request features\n    - Logs urls, redirects, and errors\n    - Cookie jar and custom cookies/headers/user agent\n    - Login/form submission, session cookies, and basic auth\n    - Single proxy or multiple proxies and handles proxy failure\n    - Retries and redirect limits\n\n## Example\n\n```javascript\nvar osmosis = require('osmosis');\n\nosmosis\n.get('www.craigslist.org/about/sites')\n.find('h1 + div a')\n.set('location')\n.follow('@href')\n.find('header + div + div li \u003e a')\n.set('category')\n.follow('@href')\n.paginate('.totallink + a.button.next:first')\n.find('p \u003e a')\n.follow('@href')\n.set({\n    'title':        'section \u003e h2',\n    'description':  '#postingbody',\n    'subcategory':  'div.breadbox \u003e span[4]',\n    'date':         'time@datetime',\n    'latitude':     '#map@data-latitude',\n    'longitude':    '#map@data-longitude',\n    'images':       ['img@src']\n})\n.data(function(listing) {\n    // do something with listing data\n})\n.log(console.log)\n.error(console.log)\n.debug(console.log)\n```\n\n## Documentation\n\nFor documentation and examples check out [https://rchipka.github.io/node-osmosis/global.html](https://rchipka.github.io/node-osmosis/global.html)\n\n## Dependencies\n\n- [libxmljs-dom](https://github.com/rchipka/node-libxmljs-dom) - DOM wrapper for [libxmljs](https://github.com/libxmljs/libxmljs) C bindings\n- [needle](https://github.com/tomas/needle) - Lightweight HTTP wrapper\n\n## Donate\n\nPlease consider a donation if you depend on web scraping and Osmosis makes your job a bit easier.\nYour contribution allows me to spend more time making this the best web scraper for Node.\n\n[![Donate](https://www.paypalobjects.com/en_US/i/btn/btn_donate_LG.gif)](https://www.paypal.com/cgi-bin/webscr?item_name=node-osmosis\u0026cmd=_donations\u0026business=NAXMWBMWKUWUU)\n","funding_links":["https://www.paypal.com/cgi-bin/webscr?item_name=node-osmosis\u0026cmd=_donations\u0026business=NAXMWBMWKUWUU"],"categories":["JavaScript","All","Repository","Uncategorized"],"sub_categories":["Crawler","Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frchipka%2Fnode-osmosis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frchipka%2Fnode-osmosis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frchipka%2Fnode-osmosis/lists"}