https://github.com/yureien/text-fragment-scraper
Scrape highlighted text using text fragments
hacktoberfest puppeteer scraper text-fragment text-fragment-url text-fragments
- Host: GitHub
- URL: https://github.com/yureien/text-fragment-scraper
- Owner: Yureien
- Created: 2022-07-18T01:25:47.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-10-09T20:25:34.000Z (about 2 years ago)
- Last Synced: 2024-10-10T20:02:55.384Z (3 months ago)
- Topics: hacktoberfest, puppeteer, scraper, text-fragment, text-fragment-url, text-fragments
- Language: TypeScript
- Homepage: https://www.npmjs.com/package/text-fragment-scraper
- Size: 46.9 KB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Text Fragment Scraper
Obtains the entire highlighted text from URLs that use text fragments and returns it as an array of strings.
If the highlighted text can be read directly from the URL (for example, when the fragment spells out the whole text rather than a start and end term), it is extracted without opening the website. Otherwise, the website is scraped to recover the entire highlighted text.
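The URL-only path can be illustrated with a minimal sketch that parses the `text=` directives after the fragment directive delimiter `:~:`, following the text fragment syntax `text=[prefix-,]textStart[,textEnd][,-suffix]`. It assumes that a directive containing only `textStart` can be answered from the URL alone, while a `textStart,textEnd` range needs the page. This matches the two cases in the README example. `parseTextDirectives` and `extractFromUrl` are hypothetical names, not part of the package's API.

```javascript
// Hypothetical helpers, not part of the text-fragment-scraper API.
// Parses every `text=` directive in a URL's fragment directive (`#:~:...`).
// Syntax: text=[prefix-,]textStart[,textEnd][,-suffix]
function parseTextDirectives(url) {
  const hash = new URL(url).hash;
  const start = hash.indexOf(":~:");
  if (start === -1) return [];
  return hash
    .slice(start + 3)
    .split("&")
    .filter((directive) => directive.startsWith("text="))
    .map((directive) => {
      const parts = directive.slice("text=".length).split(",");
      let prefix = null;
      let suffix = null;
      // A trailing "-" marks a prefix and a leading "-" marks a suffix.
      // Per the spec, literal dashes inside the text must be percent-encoded.
      if (parts.length > 1 && parts[0].endsWith("-")) {
        prefix = decodeURIComponent(parts.shift().slice(0, -1));
      }
      if (parts.length > 1 && parts[parts.length - 1].startsWith("-")) {
        suffix = decodeURIComponent(parts.pop().slice(1));
      }
      const [textStart, textEnd] = parts.map((p) => decodeURIComponent(p));
      return { prefix, textStart, textEnd: textEnd ?? null, suffix };
    });
}

// Returns the highlighted strings when every directive is a single start term,
// or null when there is no directive or at least one is a start,end range
// (those need the page itself).
function extractFromUrl(url) {
  const directives = parseTextDirectives(url);
  if (directives.length === 0) return null;
  if (directives.some((d) => d.textEnd !== null)) return null;
  return directives.map((d) => d.textStart);
}
```

Returning `null` instead of throwing lets a caller fall back to scraping whenever the URL alone is not enough.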
Uses [Puppeteer](https://www.npmjs.com/package/puppeteer) to scrape the website.
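For the scraping path, here is a rough sketch of one way to do it with Puppeteer: load the page, read its rendered text, and cut out the span from `textStart` through `textEnd`. `extractRange` and `scrapeRange` are hypothetical helpers, not taken from the package's source. The matching is deliberately simplified to an exact, case-sensitive substring search, and reading `document.body.innerText` is an assumption whose whitespace may not match the package's output.

```javascript
// Hypothetical sketch, not taken from the text-fragment-scraper source.

// Returns the page text from the first occurrence of textStart through the
// first occurrence of textEnd after it, or null if either term is missing.
// Simplification: exact substring search. Browsers also match case-insensitively,
// respect word boundaries, and use prefix/suffix context.
function extractRange(pageText, textStart, textEnd) {
  const from = pageText.indexOf(textStart);
  if (from === -1) return null;
  const end = pageText.indexOf(textEnd, from + textStart.length);
  if (end === -1) return null;
  return pageText.slice(from, end + textEnd.length);
}

// Puppeteer is imported lazily, so extractRange works without it installed.
async function scrapeRange(url, textStart, textEnd) {
  const { default: puppeteer } = await import("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    // innerText approximates the rendered text of the page.
    const pageText = await page.evaluate(() => document.body.innerText);
    return extractRange(pageText, textStart, textEnd);
  } finally {
    await browser.close();
  }
}
```

Keeping the matching in a pure function separates it from the browser work, so it can be checked without launching Chromium.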
## Example
```js
// Assumes scrapeURL is a named export of text-fragment-scraper. Top-level
// `await` needs an ES module and is harmless if scrapeURL is synchronous.
import { scrapeURL } from "text-fragment-scraper";

// The text is present in the URL itself, so the site is not scraped.
const fromUrl = await scrapeURL("https://web.dev/text-fragments/#:~:text=Text%20Fragments%20let%20you%20specify%20a%20text%20snippet%20in%20the%20URL%20fragment");
// Returns the following:
// [ 'Text Fragments let you specify a text snippet in the URL fragment' ]

// In this case, the entire site is scraped to recover the highlighted text.
const fromPage = await scrapeURL("https://web.dev/text-fragments/#:~:text=The%20fact%20though,Text%20Fragments%20solve");
// Returns the following:
// [
//   'The fact though that I had to open the Developer Tools to find the id of an element speaks volumes about the probability this particular section of the page was meant to be linked to by the author of the blog post.What if I want to link to something without an id? Say I want to link to the ECMAScript Modules in Web Workers heading. As you can see in the screenshot below, thein question does not have an id attribute, meaning there is no way I can link to this heading. This is the problem that Text Fragments solve'
// ]
```