{"id":16417574,"url":"https://github.com/dilaouid/FandomScraper","last_synced_at":"2025-10-26T20:31:07.015Z","repository":{"id":169499303,"uuid":"645024364","full_name":"dilaouid/FandomScraper","owner":"dilaouid","description":"📓 [TS] A NodeJS package to scrap fandoms wikis characters page. Only scraps the characters info section and the list of all repertoried characters.","archived":false,"fork":false,"pushed_at":"2025-02-09T16:09:48.000Z","size":459,"stargazers_count":31,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-09T16:33:47.757Z","etag":null,"topics":["api","fandom","nodejs","scraper","typescript","wiki"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dilaouid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-24T18:53:24.000Z","updated_at":"2025-02-09T16:09:51.000Z","dependencies_parsed_at":"2025-02-01T16:36:17.834Z","dependency_job_id":null,"html_url":"https://github.com/dilaouid/FandomScraper","commit_stats":{"total_commits":139,"total_committers":1,"mean_commits":139.0,"dds":0.0,"last_synced_commit":"2ae3b2547a8bb7b097c55de3bfdd9c2011c366d4"},"previous_names":["dilaouid/fandomscraper"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dilaouid%2FFandomScraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dilaouid%2FFandomScraper/tags","releases_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/dilaouid%2FFandomScraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dilaouid%2FFandomScraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dilaouid","download_url":"https://codeload.github.com/dilaouid/FandomScraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237985788,"owners_count":19397799,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","fandom","nodejs","scraper","typescript","wiki"],"created_at":"2024-10-11T07:11:35.913Z","updated_at":"2025-10-26T20:31:07.009Z","avatar_url":"https://github.com/dilaouid.png","language":"TypeScript","readme":"# FandomScraper\n![logo](https://github.com/dilaouid/FandomScraper/blob/media/logo.png?raw=true)\n\n*First of all, this project wouldn't exist without the hard work of the wiki maintainers. It involves a lot of effort, and this project does not aim to devalue or exploit their work.*\n\n## Introduction\n\n**FandomScraper** is a TypeScript library for **Node.js** environments, designed to simplify and accelerate the process of scraping character lists and information from various Fandom wikis. With this library, you can effortlessly retrieve data from any Fandom wiki, making it quick and easy to access information about characters. 
FandomScraper is highly scalable, allowing for seamless integration with a growing number of wikis.\n\n## How to use\n\n### Installation\n\nYou can install the FandomScraper library using either npm or Yarn.\n#### npm\n```shell\nnpm install fandomscraper\n```\n#### yarn\n```shell\nyarn add fandomscraper\n```\n\n### Usage\n\n#### FandomScraper\n1. Once installed, you can import the `FandomScraper` class into your project as follows:\n```js\nimport { FandomScraper } from 'fandomscraper';\n```\n\nMake sure to adjust the import statement according to your project's setup.\n\n2. Create an instance of `FandomScraper` by providing the wiki name and language in the constructor:\n\n```js\nconst scraper = new FandomScraper('shiki', { lang: 'en' });\n```\n\n- The first argument is the name of the wiki you want to scrape. First, check that the wiki is available. To get the list of currently available wikis, use the following method:\n\n```js\nconst wikis = scraper.getAvailableWikis();\n```\n- The `lang` property represents the language of the wiki (e.g., 'fr' for French or 'en' for English).\n\nNote that you can still override the characters page URL with the `setCharactersPage(url)` method:\n```js\nscraper.setCharactersPage('https://shiki.fandom.com/wiki/Category:Characters');\n```\n\n3. Use the available methods to retrieve the desired information. Here are some examples:\n---\n\n-  **Get all characters of the current wiki:**\n\n```js\nconst allCharacters = await scraper\n\t.findAll({ base64: false, withId: true, recursive: true })\n\t.limit(100)\n\t.offset(5)\n\t.attr('age kanji status episode images affiliation occupations')\n\t.attrToArray('affiliation occupations')\n\t.ignore(['muroi'])\n\t.exec();\n```\n\nThis method allows you to retrieve all characters from the wiki.\n\n- The `findAll()` method takes an options object as an argument, which can be used to customize the query. 
It supports the following options:\n\n-  `base64`: A boolean value that determines whether to return character images in base64 format.\n\n-  `withId`: A boolean value that indicates whether to include the character's ID (corresponding to the wiki's `pageId` value).\n\n-  `recursive`: A boolean value that specifies whether to retrieve additional information from the character's infobox along with their name, URL, and optional ID.\n\n- The `limit()` method sets the maximum number of characters to return. In the example above, it's `.limit(100)`.\n\n- The `offset()` method sets the starting point for retrieving characters. In the example above, it's `.offset(5)`.\n\n- The `attr()` method specifies which properties (keys) from the data source schema should be returned in the query result. It takes a string as an argument, where each property is separated by a space. In the example above, it's `.attr('age kanji status episode images affiliation occupations')` (used only if `recursive` is set to `true`).\n\n- The `attrToArray()` method converts the values of specific attributes from strings to arrays. This is useful when you want certain attributes to be returned as arrays instead of strings, allowing you to retrieve multiple values for those attributes if they exist in the data source (used only if `recursive` is set to `true`).\n\n- The `ignore()` method excludes characters whose names contain any of the specified substrings. It takes an array of strings as an argument. 
In the example above, it's `.ignore(['muroi'])`.\n\n- The `exec()` method must be called at the end of the query chain to execute the query and retrieve the result.\n\nMake sure to adjust the options, methods, and property names according to your specific use case and schema.\n\n---\n\n-  **Get a character by name:**\n\n```js\nconst character = await scraper\n\t.findByName('toshio ozaki', { base64: false, withId: true })\n\t.attr('age kanji status episode images affiliation occupations')\n\t.attrToArray('affiliation occupations')\n\t.exec();\n```\nThis method allows you to retrieve a character from the wiki based on their name.\n-   The `findByName()` method takes two arguments:\n    \n    -   The first argument is the name of the character you want to find. In the example above, it's `'toshio ozaki'`.\n    -   The second argument is an options object that can be used to customize the query. It supports the following options:\n        -   `base64`: A boolean value that determines whether to return character images in base64 format.\n        -   `withId`: A boolean value that indicates whether to include the character's ID (corresponding to the wiki's `pageId` value).\n-   The `attr()` method specifies which properties (keys) from the data source schema should be returned in the query result. It takes a string as an argument, where each property is separated by a space. In the example above, it's `.attr('age kanji status episode images affiliation occupations')`.\n    \n- The `attrToArray()` method converts the values of specific attributes from strings to arrays. 
This method is useful when you want the values of certain attributes to be returned as arrays instead of strings, allowing you to retrieve multiple values for those attributes if they exist in the data source.\n\n-   The `exec()` method must be called at the end of the query chain to execute the query and retrieve the result.\n\nMake sure to adjust the options, methods, and property names according to your specific use case and schema.\n\n---\n\n-  **Get a character by ID:**\n\n```js\nconst characterById = await scraper\n\t.findById(24013, { base64: false })\n\t.attr('age kanji status episode images affiliation occupations')\n\t.attrToArray('affiliation occupations')\n\t.exec();\n```\nThis method allows you to retrieve a character from the wiki based on their ID.\n\n-   The `findById()` method takes two arguments:\n    \n    -   The first argument is the ID of the character you want to find. In the example above, it's `24013`.\n    -   The second argument is an options object that can be used to customize the query. It supports the following option:\n        -   `base64`: A boolean value that determines whether to return character images in base64 format.\n-   The `attr()` method specifies which properties (keys) from the data source schema should be returned in the query result. It takes a string as an argument, where each property is separated by a space. In the example above, it's `.attr('age kanji status episode images affiliation occupations')`.\n    \n-   The `exec()` method must be called at the end of the query chain to execute the query and retrieve the result.\n\nMake sure to adjust the options, methods, and property names according to your specific use case and schema.\n\n---\n\n-  **Get the metadata of the wiki:**\n\n```js\nconst metadata = await scraper.getMetadata({ withCount: true });\n```\n\nThis method returns an object with global information about the scraped wiki. 
The returned object follows this interface:\n```ts\ninterface IMetaData {\n    // the name of the wiki\n    name: string;\n\n    // the language of the wiki\n    language: 'en' | 'fr';\n\n    // the available attributes of the wiki\n    attributes: string[];\n\n    // the number of characters in the wiki\n    count?: number;\n\n    // the available languages of the wiki\n    availableLanguages: string[];\n};\n```\n\n---\n\n-  **Get the total count of characters in the wiki:**\n\n```js\nconst characterCount = await scraper.count();\n```\n\nThis method returns the total number of characters in the specified wiki.\n\n---\n\n4. Handle the retrieved character data based on the `IData` interface:\n\n```ts\ninterface IData {\n\tid?: number;\n\tname: string;\n\turl: string;\n\tdata?: IDataset;\n}\n\ninterface IDataset {\n\t// Standard fields (built-in)\n\tname?: TDataset; // name of the character\n\tkanji?: string; // kanji name of the character\n\tquote?: string | string[]; // quote of the character\n\tromaji?: string; // romaji name of the character\n\tstatus?: string; // status of the character (dead, alive, etc.)\n\tspecies?: TDataset; // race\n\tgender?: string; // gender of the character\n\timages?: string[]; // array of image urls\n\tepisode?: TDataset; // array of episode names where the character first appeared\n\tmanga?: string; // manga chapter where the character first appeared\n\tage?: TDataset; // age of the character\n\tbirthday?: string; // birthday of the character\n\tbloodType?: string; // blood type of the character\n\tzodiac?: string; // zodiac sign of the character\n\thairColor?: string; // hair color of the character\n\teyeColor?: string; // eye color of the character\n\theight?: TDataset; // height of the character\n\tweight?: TDataset; // weight of the character\n\trelatives?: TDataset; // array of relatives of the character\n\taffiliation?: string; // affiliation of the character\n\toccupations?: TDataset; // array of occupations of the 
character\n\tnationality?: string; // nationality of the character\n\tseiyu?: TDataset; // seiyu of the character\n\tvoiceActor?: TDataset; // voice actor of the character\n\t\n\t// Custom fields support - Add any field your wiki needs\n\t[key: string]: TDataset | string | string[] | undefined;\n}\n\n```\n- The `IData` interface represents the structure of a character object.\n- The `IDataset` interface defines the structure of the character's data.\n\nFeel free to customize the options and explore the capabilities of FandomScraper to efficiently retrieve character information from various Fandom wikis.\n\nRemember to handle any errors that may occur and adjust the method names and options according to your specific use case.\n\n#### FandomPersonalScraper\n\n**FandomPersonalScraper** is a child class of **FandomScraper** that allows you to specify your own schema for scraping a wiki. It has the same methods as the parent class, but instead of relying on an existing schema, you define the schema in its constructor.\n\nTo use FandomPersonalScraper, follow these steps:\n\n1.  Import the `FandomPersonalScraper` class into your project:\n```js\nimport { FandomPersonalScraper } from 'fandomscraper';\n```\nMake sure to adjust the import statement according to your project's setup.\n\n2.  
Create an instance of `FandomPersonalScraper` by providing the scraper schema in the constructor:\n\n```js\nconst personalScraper = new FandomPersonalScraper({\n\turl: 'https://mywikia.fandom.com/wiki/Category:Characters',\n\tpageFormat: 'classic',\n\tdataSource: {\n\t\tname: 'Name',\n\t\tage: 'Age',\n\t\tkanji: 'Kanji',\n\t\tromaji: 'Romaji',\n\t\tstatus: 'Status',\n\t\tgender: 'Gender',\n\t\tspecies: 'Kind',\n\t\timages: {\n\t\t\tidentifier: '.mw-parser-output table img',\n\t\t\tget: function(page) {\n\t\t\t\treturn page.querySelectorAll(this.identifier);\n\t\t\t}\n\t\t},\n\t\tquotes: {\n\t\t\tidentifier: '.quote',\n\t\t\tget: function(page) {\n\t\t\t\treturn page.querySelectorAll(this.identifier);\n\t\t\t}\n\t\t},\n\t\tepisode: 'First appearance',\n\t\taffiliation: 'Affiliations'\n\t}\n});\n```\nThe constructor of `FandomPersonalScraper` expects an object that adheres to the `ISchema` interface.\n\n```ts\ntype TPageFormats = 'classic' | 'table-1' | 'table-2' | 'table-3';\n\ninterface IImage {\n\tidentifier: string;\n\tget: Function;\n};\n\n// Interface describing where to scrape the page to get the character data (data-source)\ninterface IDataSource {\n\t// Standard fields (built-in) - optional, use what you need\n\tname?: string;\n\tkanji?: string;\n\tquote?: string | IQuote;\n\tromaji?: string;\n\tstatus?: string;\n\tspecies?: string;\n\tgender?: string;\n\timages?: IImage;\n\tepisode?: string;\n\tmanga?: string;\n\tage?: string;\n\taffiliation?: string;\n\thairColor?: string;\n\teyeColor?: string;\n\toccupations?: string;\n\tseiyu?: string;\n\tvoiceActor?: string;\n\trelatives?: string;\n\tbirthday?: string;\n\tzodiac?: string;\n\theight?: string;\n\tweight?: string;\n\tnationality?: string;\n\tbloodType?: string;\n\t\n\t// Custom fields support - Add ANY field your wiki has\n\t[key: string]: string | IImage | IQuote | undefined;\n};\n\ninterface ISchema {\n\t// the url of the wiki characters list to scrape (ex: 
'https://dragonball.fandom.com/wiki/Characters')\n\turl: string;\n\n\t// the format of the characters list page (ex: 'classic')\n\tpageFormat: TPageFormats;\n\n\t// the data-source of the wiki (ex: DragonBallFRDataSource) which will be used to scrape the wiki\n\tdataSource: IDataSource;\n};\n```\n-   `url`: The URL of the wiki's characters list page, for example: `'https://dragonball.fandom.com/wiki/Characters'`.\n\n-   `pageFormat`: The format of the characters list page, which can be `'classic'`, `'table-1'`, `'table-2'`, or `'table-3'`, depending on how the characters list page is structured.\n\n-   `dataSource`: An object specifying the data sources for scraping character pages. It defines properties like `name`, `age`, `kanji`, etc. **You can use the standard fields provided OR add any custom field your wiki has** (e.g., `likes`, `dislikes`, `masters`, `class`, `power_level`, etc.). Each property corresponds to a piece of information about the character. If an element on the character page has a `data-source` attribute, the value of that attribute is used as the property value. Otherwise, the value is taken from the adjacent cell in the table.\n\t-   `images`: An object specifying the data source for scraping character images. It follows the `IImage` interface, which has two properties:\n\t    -   `identifier`: A string that identifies the HTML element(s) containing the images. This can be a CSS selector, XPath, or any other valid selector format.\n\t    -   `get`: A function that takes the `page` document as an argument and returns the selected image elements. 
This function is responsible for extracting and returning all the image elements that match the specified identifier.\n\nHere's an example of how to define the `images` property in the `dataSource` object:\n```ts\nimages: {\n    identifier: '.mw-parser-output table img',\n    get: function(page: Document) {\n        return page.querySelectorAll(this.identifier);\n    },\n}\n```\n\n(And it works the same way for the `quotes` property.)\n\nIn this example, the `identifier` uses a CSS selector to select all the image elements within a specific table on the character page. The `get` function receives the `page` document and uses the `querySelectorAll` method to retrieve and return all the selected image elements.\n\nThis allows you to customize the image scraping process based on the specific structure and location of images on your wiki's character pages.\n\nMake sure to provide the appropriate values for your specific wiki; this lets you create a customized scraper that fits the structure and data sources of your wiki.\n\n---\n\n### Key Features\n\n- **Unlimited Custom Fields**: Not limited to predefined fields - add ANY field your wiki has (likes, dislikes, powers, relationships, etc.)\n- **Rapid and Simple Retrieval**: Fast and straightforward approach to fetching information from any Fandom wiki\n- **Scalability**: Effortless and speedy addition of new wikis with `FandomPersonalScraper`\n- **Database Integration**: The `withId` option provides unique IDs for database storage\n\nFeel free to explore FandomScraper and leverage its capabilities for efficiently gathering information from various Fandom 
wikis.\n","funding_links":[],"categories":["TypeScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdilaouid%2FFandomScraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdilaouid%2FFandomScraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdilaouid%2FFandomScraper/lists"}