https://github.com/yureien/text-fragment-scraper

Scrape highlighted text using text fragments

Host: GitHub
URL: https://github.com/yureien/text-fragment-scraper
Owner: Yureien
Created: 2022-07-18T01:25:47.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2022-10-09T20:25:34.000Z (over 2 years ago)
Last Synced: 2025-02-08T01:57:02.982Z (3 months ago)
Topics: hacktoberfest, puppeteer, scraper, text-fragment, text-fragment-url, text-fragments
Language: TypeScript
Homepage: https://www.npmjs.com/package/text-fragment-scraper
Size: 46.9 KB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Text Fragment Scraper

Obtains the entire highlighted text from URLs that contain text fragments and returns it as an array of strings.

If the highlighted text can be extracted directly from the URL without opening the website, it does that. Otherwise, it scrapes the website text to extract the entire highlighted text.
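
To illustrate the first case, here is a minimal sketch of how the text could be read straight from the URL. It is not the package's actual implementation: `extractDirectFragments` is a hypothetical helper, and it assumes the standard directive form `text=[prefix-,]textStart[,textEnd][,-suffix]`, where a directive with only `textStart` already contains the full highlighted text.

```js
// Hypothetical helper for illustration; not part of the package's documented API.
// Returns an array of highlighted strings when every text directive in the URL
// holds its full text, or null when at least one directive needs the page itself.
function extractDirectFragments(url) {
  const marker = ":~:";
  const markerIndex = url.indexOf(marker);
  if (markerIndex === -1) return []; // no fragment directive at all

  const directives = url.slice(markerIndex + marker.length).split("&");
  const results = [];
  for (const directive of directives) {
    if (!directive.startsWith("text=")) continue; // skip non-text directives

    // Literal commas only appear as separators; commas inside the text are
    // percent-encoded, so splitting on "," is safe.
    const parts = directive.slice("text=".length).split(",");

    // More than one part means a prefix, textEnd, or suffix is present,
    // so the URL alone does not hold the whole highlighted text.
    if (parts.length !== 1) return null;

    results.push(decodeURIComponent(parts[0]));
  }
  return results;
}
```

A `null` result here signals that the page would have to be opened, which is where a scraper such as Puppeteer comes in.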

Uses [Puppeteer](https://www.npmjs.com/package/puppeteer) to scrape the website.

## Example

```js
scrapeURL("https://web.dev/text-fragments/#:~:text=Text%20Fragments%20let%20you%20specify%20a%20text%20snippet%20in%20the%20URL%20fragment");

// Returns the following
[ 'Text Fragments let you specify a text snippet in the URL fragment' ]
// In the above case, it does not scrape the site since the full text is present in the URL itself.

scrapeURL("https://web.dev/text-fragments/#:~:text=The%20fact%20though,Text%20Fragments%20solve");

// Returns the following
[
  'The fact though that I had to open the Developer Tools to find the id of an element speaks volumes about the probability this particular section of the page was meant to be linked to by the author of the blog post.What if I want to link to something without an id? Say I want to link to the ECMAScript Modules in Web Workers heading. As you can see in the screenshot below, the <h2> in question does not have an id attribute, meaning there is no way I can link to this heading. This is the problem that Text Fragments solve'
]
// In this case though, it actually scrapes the entire site for this text.
```
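
The second example uses a `textStart,textEnd` pair, so the full text has to be recovered from the page. The sketch below shows one way that lookup could work once the page text is available as a string. `resolveRange` is a hypothetical helper, not the package's implementation, and it deliberately simplifies: matching is case-sensitive and it ignores prefix/suffix context and word boundaries.

```js
// Hypothetical helper for illustration; not part of the package's documented API.
// Finds the first occurrence of textStart, then the first textEnd after it,
// and returns everything from the start of textStart to the end of textEnd.
function resolveRange(pageText, textStart, textEnd) {
  const startIndex = pageText.indexOf(textStart);
  if (startIndex === -1) return null; // textStart not on the page

  // Search for textEnd only after the end of textStart.
  const endIndex = pageText.indexOf(textEnd, startIndex + textStart.length);
  if (endIndex === -1) return null; // textEnd not found after textStart

  return pageText.slice(startIndex, endIndex + textEnd.length);
}
```

For instance, calling `resolveRange(pageText, "The fact though", "Text Fragments solve")` on the article text would return the whole passage between and including those two phrases.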
