Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/orvn/sidmine
A series of mining scripts for scraping metadata from URLs right in the Shell
- Host: GitHub
- URL: https://github.com/orvn/sidmine
- Owner: orvn
- License: apache-2.0
- Created: 2018-09-05T23:16:16.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-03-11T04:59:17.000Z (almost 3 years ago)
- Last Synced: 2024-12-09T16:13:44.926Z (28 days ago)
- Language: Shell
- Size: 12.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# sidmine
## WIP pending updates
A series of mining scripts for scraping metadata from URLs right in the Shell.
Currently this shell script is tested in the Bourne Shell. It uses `gawk`, `curl`, and Unix regular expressions.
It runs on bash 3.2 and above. Run it as a shell script: `./sidemine.sh`
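
As a rough illustration of the `curl` plus awk plus regex approach described above, here is a minimal sketch that pulls the `<title>` out of a page. It is not taken from the sidmine source: the function name `extract_title` is hypothetical, and it assumes the page has a single, well-formed `<title>` element. Regex-based HTML matching like this is fragile and is only meant to show the general shape of the pipeline.

```shell
# Hypothetical helper (not part of sidmine): print the text of the first
# <title> element found in HTML read from stdin.
extract_title() {
  # Join all lines so a title that spans several lines still matches.
  tr '\n' ' ' | awk '
    {
      lower = tolower($0)
      # Find the opening tag, allowing attributes such as <title id="x">.
      if (!match(lower, /<title[^>]*>/)) exit 1
      rest = substr($0, RSTART + RLENGTH)
      close_at = index(tolower(rest), "</title")
      if (close_at == 0) exit 1
      title = substr(rest, 1, close_at - 1)
      # Trim surrounding whitespace before printing.
      gsub(/^[ \t]+|[ \t]+$/, "", title)
      print title
    }'
}

# Example use against a live page (requires network access):
#   curl -fsSL "https://example.com" | extract_title
```

Keeping the fetch (`curl`) separate from the extraction step means the extraction can be tried against saved HTML without touching the network.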
### Roadmap
- [ ] Better error handling
- [ ] Option to deal with multiple matches per page
- [ ] Parse through an array of URLs from an input source
- [ ] Option to extract attributes
- [ ] Reference an external file like a sitemap
- [ ] Reference a series of nested sitemaps
- [ ] Improve memory use and performance by refactoring without pipes
- [ ] Accept content with any character encoding
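
Several of the roadmap items above (multiple matches per page, attribute extraction, and reading URLs from an input source) could be approached along the following lines. This is only a sketch under stated assumptions, not the project's planned design: `extract_attr`, `each_url`, and `fetch_hrefs` are hypothetical names, attribute values are assumed to be double-quoted, and attribute names are matched case-sensitively.

```shell
# Hypothetical helper: print every value of the attribute named in $1
# (for example href or content) found in HTML read from stdin, one per line.
extract_attr() {
  tr '\n' ' ' | awk -v attr="$1" '
    {
      rest = $0
      pattern = "[ \t]" attr "=\"[^\"]*\""
      # Loop so that every match on the page is reported, not only the first.
      while (match(rest, pattern)) {
        hit = substr(rest, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", hit)   # drop everything up to the opening quote
        sub(/"$/, "", hit)        # drop the closing quote
        print hit
        rest = substr(rest, RSTART + RLENGTH)
      }
    }'
}

# Hypothetical helper: read URLs from stdin, one per line, skipping blank
# lines and lines starting with #, and run the given command on each URL.
each_url() {
  local url
  while IFS= read -r url || [ -n "$url" ]; do
    case "$url" in
      ''|'#'*) continue ;;
    esac
    "$@" "$url"
  done
}

# Hypothetical glue: fetch one page and list its href values.
fetch_hrefs() {
  curl -fsSL "$1" | extract_attr href
}

# Example use (requires network access and a urls.txt file):
#   each_url fetch_hrefs < urls.txt
```

Because `each_url` takes the per-URL command as an argument, the same loop could later be fed from a sitemap parser instead of a plain text file.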