Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arshadkazmi42/scraplink
Scraplink library for scraping links and image URLs from a webpage
- Host: GitHub
- URL: https://github.com/arshadkazmi42/scraplink
- Owner: arshadkazmi42
- License: mit
- Created: 2019-06-15T18:18:40.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2022-08-06T12:35:17.000Z (over 2 years ago)
- Last Synced: 2024-10-07T00:07:50.875Z (3 months ago)
- Topics: crawler, mongdb, nodejs, scraplink, url, web
- Language: JavaScript
- Homepage:
- Size: 177 KB
- Stars: 3
- Watchers: 1
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# scraplink
[![Build](https://github.com/arshadkazmi42/scraplink/actions/workflows/nodejs.yml/badge.svg)](https://github.com/arshadkazmi42/scraplink/actions/workflows/nodejs.yml)
[![NPM Version](https://img.shields.io/npm/v/scraplink.svg)](https://www.npmjs.com/package/scraplink)
[![NPM Downloads](https://img.shields.io/npm/dt/scraplink.svg)](https://www.npmjs.com/package/scraplink)
[![Github Repo Size](https://img.shields.io/github/repo-size/arshadkazmi42/scraplink.svg)](https://github.com/arshadkazmi42/scraplink)
[![LICENSE](https://img.shields.io/npm/l/scraplink.svg)](https://github.com/arshadkazmi42/scraplink/blob/master/LICENSE)
[![Contributors](https://img.shields.io/github/contributors/arshadkazmi42/scraplink.svg)](https://github.com/arshadkazmi42/scraplink/graphs/contributors)
[![Commit](https://img.shields.io/github/last-commit/arshadkazmi42/scraplink.svg)](https://github.com/arshadkazmi42/scraplink/commits/master)

Scraplink library for scraping links and asset URLs from a webpage

## Install
```
npm install scraplink
```

## Usage

```javascript
const { Scrapper } = require('scraplink');

(async () => {
  // Scrape the page and collect its asset URLs and links
  const { assets, links } = await Scrapper('http://kaspat.com');
  console.log(assets);
  console.log(links);
})();

// Asset URLs
// 'http://www.theie6countdown.com/images/upgrade.jpg',
// 'http://kaspat.com/images/img1.jpg',
// 'http://kaspat.com/images/img2.jpg',
// 'http://kaspat.com/images/img3.jpg',
// 'http://kaspat.com/images/img4.jpg',
// 'http://kaspat.com/images/page1_img1.jpg',
// 'http://kaspat.com/images/icon1.jpg',
// 'http://kaspat.com/images/icon2.jpg',
// 'http://kaspat.com/images/icon3.jpg',
// 'http://kaspat.com/images/icon4.jpg',
// 'http://www.e-zeeinternet.com/count.php?page=986859&style=odometer&nbdigits=8&reloads=1'

// Links
// 'http://www.microsoft.com/windows/internet-explorer/default.aspx?ocid=ie6_countdown_bannercode',
// 'http://kaspat.com/index.php',
// 'http://kaspat.com/index.php',
// 'http://kaspat.com/News.php',
// 'http://kaspat.com/Services.php',
// 'http://kaspat.com/Kaspat.php',
// 'http://kaspat.com/Clients.php',
```

## API

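The `formatRelativeUrls` entry in this section is described as turning relative URLs into absolute ones. As a minimal sketch of that idea, and not scraplink's actual implementation, the snippet below resolves each URL against a root URL with Node's built-in `URL` class. The `toAbsoluteUrls` name and the sample paths are assumptions made for illustration.

```javascript
// Hypothetical helper, NOT part of scraplink: resolves every entry in `urls`
// against `rootUrl` using the WHATWG URL class that ships with Node.js.
function toAbsoluteUrls(rootUrl, urls) {
  return urls.map((url) => new URL(url, rootUrl).href);
}

// Sample inputs are illustrative: a relative path, a root-relative path,
// and a URL that is already absolute.
const resolved = toAbsoluteUrls('http://kaspat.com', [
  'images/img1.jpg',
  '/News.php',
  'http://www.theie6countdown.com/images/upgrade.jpg',
]);

console.log(resolved);
```

URLs that are already absolute pass through unchanged, which is why assets hosted on other domains would keep their original host.
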
- `Scrapper`
  - Takes a URL as input and scrapes asset URLs and links from the page
- `Parse`
  - Parse exposes two functions, as defined below
  - `assets`
    - Fetches all the assets from the HTML data
  - `links`
    - Fetches all the links from the HTML data
- `ScrapperUtil`
  - `formatRelativeUrls`
    - Formats relative URLs to absolute (takes rootUrl and array urls as input)

## Contributing

Interested in contributing to this project?
You can log any issues or suggestions related to this library [here](https://github.com/arshadkazmi42/scraplink/issues/new).

Read our contributing [guide](CONTRIBUTING.md) to get started with contributing to the codebase.