https://github.com/danp/scraperlite
Scrape text and HTML based on CSS selectors and save contents to a SQLite database.
- Host: GitHub
- URL: https://github.com/danp/scraperlite
- Owner: danp
- License: bsd-3-clause
- Created: 2022-02-20T18:10:56.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-01-26T18:03:50.000Z (6 months ago)
- Last Synced: 2025-06-30T00:46:18.346Z (14 days ago)
- Topics: golang, scraping, sqlite
- Language: Go
- Homepage:
- Size: 22.5 KB
- Stars: 14
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# scraperlite
Scrape text and HTML based on CSS selectors and save contents to a SQLite database.
Repeated runs save changed content and the observation timestamp.
## Example
``` shell
scraperlite https://go.dev \
popularCLIPackages.html '#main-content > section.WhyGo > div > ul > li:nth-child(2) > div.WhyGo-reasonFooter > div.WhyGo-reasonPackages > ul' \
whyWebDevelopment.txt '#main-content > section.WhyGo > div > ul > li:nth-child(3) > div.WhyGo-reasonDetails > div.WhyGo-reasonText > p'
```
In a sqlite3 shell:
``` shell
sqlite> select t, substr(content->'popularCLIPackages'->>'html', 1, 20) || '...' as popular_packages_html,
content->'whyWebDevelopment'->>'txt' as why_web_development
from observations join contents on (contents.id=content_id)
order by t;
+----------------------------------+-------------------------+-----------------------------------------------------------+
| t | popular_packages_html | why_web_development |
+----------------------------------+-------------------------+-----------------------------------------------------------+
| 2025-01-05T18:59:27.496327-04:00 |