Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/danp/scraperlite
Scrape text and HTML based on CSS selectors and save contents to a SQLite database.
- Host: GitHub
- URL: https://github.com/danp/scraperlite
- Owner: danp
- License: bsd-3-clause
- Created: 2022-02-20T18:10:56.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-02-20T21:11:01.000Z (almost 3 years ago)
- Last Synced: 2024-11-30T16:50:48.605Z (about 1 month ago)
- Topics: golang, scraping, sqlite
- Language: Go
- Homepage:
- Size: 10.7 KB
- Stars: 14
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# scraperlite
Scrape text and HTML based on CSS selectors and save contents to a SQLite database.
Repeated runs save changed content and the observation timestamp.
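One plausible reading of this behaviour is that every run records an observation timestamp, while the content itself is stored again only when it differs from what has been seen before. The sketch below illustrates that idea in memory, using only the Go standard library. It is not scraperlite's actual implementation: `Store`, `Observation`, `NewStore`, and `Record` are hypothetical names, and the two slices only loosely mirror the `contents` and `observations` tables that appear in the query in the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// Observation links a point in time to a stored content row.
// Hypothetical type, loosely modelled on an observations(t, content_id) table.
type Observation struct {
	T         time.Time
	ContentID int
}

// Store is an in-memory stand-in for a database (assumption: the real tool
// uses SQLite tables instead of slices and a map).
type Store struct {
	contents     []string       // position+1 plays the role of contents.id
	idByHash     map[string]int // content hash -> contents id
	observations []Observation
}

func NewStore() *Store {
	return &Store{idByHash: make(map[string]int)}
}

// Record saves one scrape result. The fields are serialized to JSON
// (encoding/json sorts map keys, so equal maps serialize identically),
// hashed, and stored only if that hash has not been seen before.
// An observation is appended on every call.
func (s *Store) Record(t time.Time, fields map[string]string) (int, error) {
	b, err := json.Marshal(fields)
	if err != nil {
		return 0, err
	}
	sum := sha256.Sum256(b)
	key := hex.EncodeToString(sum[:])

	id, seen := s.idByHash[key]
	if !seen {
		s.contents = append(s.contents, string(b))
		id = len(s.contents)
		s.idByHash[key] = id
	}
	s.observations = append(s.observations, Observation{T: t, ContentID: id})
	return id, nil
}

func main() {
	s := NewStore()
	t0 := time.Date(2022, 2, 20, 14, 0, 0, 0, time.UTC)

	// Placeholder values; a real run would hold scraped text or HTML.
	a, _ := s.Record(t0, map[string]string{"whyGo.html": "v1"})
	b, _ := s.Record(t0.Add(time.Hour), map[string]string{"whyGo.html": "v1"})
	c, _ := s.Record(t0.Add(2*time.Hour), map[string]string{"whyGo.html": "v2"})

	fmt.Println(a, b, c, len(s.contents), len(s.observations))
}
```

In this sketch, three runs produce three observations but only two stored contents, because the second run saw unchanged content and reuses the first content id.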
## Example
``` shell
scraperlite https://go.dev \
whyGo.html 'body > header > div > nav > div > ul > li:nth-child(1)' \
firstEventWhenWhere.txt '#event_slide0 > div.GoCarousel-eventBody > div > div.GoCarousel-eventDate'
```
In a sqlite3 shell:
``` shell
sqlite> select t, json_extract(content, '$.firstEventWhenWhere.txt') as when_where,
substr(json_extract(content, '$.whyGo.html'), 1, 20) || '...' as why_go_html
from observations join contents on (id=content_id)
order by t;
+----------------------------------+-------------------------------+-------------------------+
| t | when_where | why_go_html |
+----------------------------------+-------------------------------+-------------------------+
| 2022-02-20 14:19:34.115801-04:00 | Feb 21, 2022 | Graz, Austria |