{"id":13548488,"url":"https://github.com/kormanowsky/jextract","last_synced_at":"2025-04-12T13:12:54.610Z","repository":{"id":57280609,"uuid":"113070673","full_name":"kormanowsky/jextract","owner":"kormanowsky","description":"Allows extracting data from DOM","archived":false,"fork":false,"pushed_at":"2020-07-18T13:09:19.000Z","size":143,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-12T13:12:48.928Z","etag":null,"topics":["css","css-selector","dom","extract-data","html","javascript","jextract","jquery","js","selector"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kormanowsky.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-04T17:04:15.000Z","updated_at":"2025-01-18T19:14:13.000Z","dependencies_parsed_at":"2022-09-02T21:41:25.302Z","dependency_job_id":null,"html_url":"https://github.com/kormanowsky/jextract","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kormanowsky%2Fjextract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kormanowsky%2Fjextract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kormanowsky%2Fjextract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kormanowsky%2Fjextract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kormanowsky","download_url":"https://codeload.github.com/kormanowsky/jex
tract/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248571865,"owners_count":21126522,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["css","css-selector","dom","extract-data","html","javascript","jextract","jquery","js","selector"],"created_at":"2024-08-01T12:01:11.051Z","updated_at":"2025-04-12T13:12:54.581Z","avatar_url":"https://github.com/kormanowsky.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"# jExtract\n## What is this function for?\nIt makes possible extracting data from DOM. \nMay be useful when you are working with data from websites that don't have any data APIs. In that case you can use this function to read data directly from DOM (or HTML string).\n### Live Demo: https://jsfiddle.net/475d24ts/\n### Live Node.js Demo: https://runkit.com/embed/roh6q5x6p919\n### Warning! Internet Explorer is NOT supported since v1.0.0. If you need that browser to be supported, please use v0.0.7 (no longer maintained).\n## Installation \n### Browser \nJust download jextract.min.js and include it in your page. \n### Node.js\n```bash \nnpm install jextract \n```\n## Basic usage\n0. 
Create a HTML document and include jExtract\n```html\n\u003ch1 id=\"page-title\"\u003eHello, world!\u003c/h1\u003e\n\u003cp id=\"page-content\"\u003eLorem ipsum dolor sit amet.\u003c/p\u003e\n\u003c!-- Your HTML continues here --\u003e\n\u003cscript src=\"/path/to/jextract.min.js\"\u003e\u003c/script\u003e\n\u003cscript\u003e\n    //Your JavaScript goes here\n\u003c/script\u003e\n```\nor \n```javascript \n// Import jExtract \nconst jExtract = require(\"jextract\")\n```\n1. Create a data structure containing CSS selectors of elements from which you need to extract data: \n```javascript\nvar structure = {\n    title: \"#page-title\", \n    content: \"#page-content\"\n};\n```\n2. Just pass your structure as a parameter to jExtract function and call .fromDocument(): \n```javascript\nvar data = jExtract(structure).fromDocument();\n```\n`data` will be:\n```javascript \n{\n    title: \"Hello, world!\",\n    content: \"Lorem ipsum dolor sit amet.\"\n}\n```\n3. Now you can do anything you want with extracted data. \n## Extended usage\n### jExtract has the following ways of usage: \n```javascript \n// 1. Extract data from specified root (root may be an element, a CSS selector or a HTML string)\njExtract(structure).using(options).from(root); // .using() is optional\n// 2. Extract data from the whole document (only in browser)\njExtract(structure).using(options).fromDocument(); // Does not work in Node.js\n```\n### Methods\n|Method|Description|\n|--|--|\n|.fromDocument (Required)|Extracts the data from the whole document. 
Works only in the browser.|\n|.from(root) (Required)|Extracts the data from the given `root` (`root` may be an element, a jQuery instance, a CSS selector, a HTML string)|\n|.using(options) (Optional)|Allows passing `options` object to jExtract|\n#### Possible options \n|Name|Description|Possible Values|Default Value|\n|-----|-----------|---------------|-------------|\n|json|Should the output be in JSON format?|`true/false`|`false`|\n\n### JSON as input\nYou can pass structure as JSON.\n```javascript\nvar data = jExtract(\"JSON here\").fromDocument(); \n```\n### Substructures\nYou can add substructures into your main structure. \n```javascript\nvar struct = {\n    key1: 'selector1',\n    key2: {\n        subkey1: 'selector2',\n        subkey2: 'selector3'\n    }\n}, data = jExtract(struct).fromDocument();\n```\n### Options per key\nBy default, jExtract returns the text of matched element(s). But you can change this behavior by passing more than one argument in your structure keys (`key: [selector, dataGettingMethod, filterMethod, options]` instead of `key: selector`).\n#### Data getting method\nIt's a function that returns the data extracted from an element. \nDefault: `text()`.\nBefore v0.0.4, jExtract used its own element object that was based on jQuery. \nFrom v0.0.4 through v0.0.6, jExtract used a plain jQuery object without any additions/deletions, so you were able to call any jQuery object methods while extracting data with jExtract.\nThere are a few ways to pass a data getting method to jExtract: \n1. a string that is a jQuery object method (e.g.: `width`);\n2. an array in which the first element is a jQuery object method, and the others are parameters for this method (e.g.: `['attr', 'href']`);\n3. 
your own function that receives three parameters: `element`, `index`, `elements`: \n```javascript\nvar struct = {\n    key1: ['div', function(element, index, elements){\n        //in element -\u003e one div (current in the loop)\n        //in index -\u003e index of this div\n        //in elements -\u003e all matched elements\n    }]\n}\n```\n\nSince v0.0.7, jExtract uses its own object again, but its behavior was changed. Here's what jExtract does while extracting data: \n1. It looks for the method in its own object.\n2. If the method was not found, jExtract checks if jQuery was loaded and then looks for the method in a jQuery object created from jExtract's own object.\n3. If nothing was found again, it calls `console.error()` and stops extracting data from this structure key.\n\n#### Filter method\n##### Note: filter method is called only if data getting method returns a string.\nIt's a function that filters extracted data.\nDefault: `jExtractText.get()`\njExtractText is a class that exists only inside the jExtract function, so you can't access it outside of it. In this class I collect useful methods for working with strings. Currently supported: \n1. `jExtractText.get(trim)`: if `trim` is true, trims the text that is stored in jExtractText and returns it. Defaults: `trim = true`.\n2. `jExtractText.match(regexp, index)`: tries to match the text with a given regular expression and returns the match with index = `index` if it is given, or the full match if it is not. Defaults: no defaults.\n3. `jExtractText.toInt(leaveNaN)`: tries to parseInt() the text. If `leaveNaN` is true, returns NaN if it appears. If `leaveNaN` is false, returns 0 instead of NaN. Defaults: `leaveNaN = false`.\n4. `jExtractText.toFloat(leaveNaN)`: tries to parseFloat() the text. If `leaveNaN` is true, returns NaN if it appears. If `leaveNaN` is false, returns 0 instead of NaN. 
Defaults: `leaveNaN = false`.\nIf you want to use one of these methods, just pass a name and optionally arguments as a third parameter in your value: \n```javascript\nvar struct = {\n    key1: [selector, dataGettingMethod, [name, ...arguments]]\n}\n```\nIf you want to use your own method, pass it as the third parameter. Your method will receive 2 parameters: `value` and `index`.\n```javascript\nvar struct = {\n    key1: ['div', 'text', function(value, index){\n        // Value will be a number/boolean/etc., that was returned by dataGettingMethod.\n        // Index is an ordinal number of element (starting with 0)\n        return value;\n    }]\n}\n```\n\nSince v0.0.7, jExtract's behavior with filter methods is the following: \n1. It looks for the method in its own jExtractText object.\n2. If nothing was found, it looks for the method in a String object created from its own object.\n3. If nothing was found again, it throws an error and stops extracting data from this structure key.\n\n#### Possible options per key\n|Name|Description|Possible Values|Default Value|\n|-----|-----------|---------------|-------------|\n|keepArray|What to do with a single value? jExtract generates an array of values during its loop. If there are fewer than two elements in the resulting array, the result will contain only the first element of this array. If you don't want to lose the array in the result, set this to `true`|`true/false`|`false`|\n\n### Parent elements\n\nBy default, jExtract searches for elements in the `\u003chtml\u003e` tag. You can call .from() instead of .fromDocument() to change this:\n```javascript\nvar data = jExtract(structure).from($(\"#someElement\"));\n```\nYou can also create a substructure that will be applied to each matched element. 
\nThink of the following HTML: \n```html\n\u003cdiv class=\"user\"\u003e\n  \u003cdiv class=\"uname\"\u003eUser 1\u003c/div\u003e\n  \u003cdiv class=\"uid\"\u003e1\u003c/div\u003e\n  \u003cdiv class=\"uemail\"\u003euser1example.com\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"user\"\u003e\n  \u003cdiv class=\"uname\"\u003eUser 2\u003c/div\u003e\n  \u003cdiv class=\"uid\"\u003e2\u003c/div\u003e\n  \u003cdiv class=\"uemail\"\u003euser2example.com\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"user\"\u003e\n  \u003cdiv class=\"uname\"\u003eUser 3\u003c/div\u003e\n  \u003cdiv class=\"uid\"\u003e3\u003c/div\u003e\n  \u003cdiv class=\"uemail\"\u003euser3example.com\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"user\"\u003e\n  \u003cdiv class=\"uname\"\u003eUser 4\u003c/div\u003e\n  \u003cdiv class=\"uid\"\u003e4\u003c/div\u003e\n  \u003cdiv class=\"uemail\"\u003euser4example.com\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv class=\"user\"\u003e\n  \u003cdiv class=\"uname\"\u003eUser 5\u003c/div\u003e\n  \u003cdiv class=\"uid\"\u003e5\u003c/div\u003e\n  \u003cdiv class=\"uemail\"\u003euser5example.com\u003c/div\u003e\n\u003c/div\u003e\n```\nYou can create an array of all users with each element filled with each user data. 
In your key pass two parameters: selector and a structure that you want to be applied to this selector.\n```javascript\njExtract({\n    user: ['.user', {\n        name: '.uname',\n        id: ['.uid', [], 'toInt'],\n        email: '.uemail'\n      }]\n}).fromDocument();\n```\nYou will get an array: \n```javascript\n[\n    {name: \"User 1\", id: 1, email: \"user1example.com\"},\n    {name: \"User 2\", id: 2, email: \"user2example.com\"},\n    {name: \"User 3\", id: 3, email: \"user3example.com\"},\n    {name: \"User 4\", id: 4, email: \"user4example.com\"},\n    {name: \"User 5\", id: 5, email: \"user5example.com\"}\n]\n```\n\nSince v0.0.6, it's possible to pass a HTML string as parent element.\n\nSince v0.0.7, it's possible to pass a selector as parent element.\n\n### Referring to current element\n\nIn your values you can refer to current element using `\".\"` as a selector.\n\n### Extending jExtract (since v0.0.7)\n\nYou can extend jExtract's Element and Text objects. \n- To extend jExtract's element:\n```javascript\n//1. Register your method\njExtract.extendElement({ \n    methodName: function(elementInstance){\n        // do something with Element instance \n    }, \n    otherMethodName: ...\n    // etc \n});\n//2. Use your method\nvar data = jExtract({\n    key: ['selector', 'methodName']\n});\n```\n- To extend jExtract's text object:\n```javascript\n//1. Register your method\njExtract.extendText({ \n    methodName: function(textInstance){\n        // do something with Text instance \n    }, \n    otherMethodName: ...\n    // etc \n});\n//2. Use your method\nvar data = jExtract({\n    key: ['selector', false, 'methodName']\n});\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkormanowsky%2Fjextract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkormanowsky%2Fjextract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkormanowsky%2Fjextract/lists"}