https://github.com/chifisource/toolipscrawl.jl
toolips-based web-crawlers for julia
https://github.com/chifisource/toolipscrawl.jl
Last synced: 8 months ago
JSON representation
toolips-based web-crawlers for julia
- Host: GitHub
- URL: https://github.com/chifisource/toolipscrawl.jl
- Owner: ChifiSource
- License: mit
- Created: 2023-03-03T01:08:32.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-06-22T20:29:33.000Z (about 1 year ago)
- Last Synced: 2025-09-30T23:47:52.283Z (9 months ago)
- Language: Julia
- Homepage:
- Size: 62.5 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[documentation](https://chifidocs.com/toolips/ToolipsCrawl.jl)
toolips crawl provides web-crawling for all!
This package builds a web-scraping and web-crawling library atop the [toolips](https://github.com/ChifiSource/Toolips.jl) web-development framework. This package prominently features high-level syntax atop the `Toolips` `Component` structure.
- [get started](#get-started)
- [installation](#installation)
- [documentation](#documentation)
- [overview](#overview)
- [contributing guidelines](#contributing)
### get started
- To get started with `ToolipsCrawl`, you will need [julia.](https://julialang.org).
###### installation
After installing Julia, `ToolipsCrawl` may be installed with `Pkg`
```julia
using Pkg; Pkg.add("ToolipsCrawl")
```
Alternatively, `Unstable` may be added for the latest (sometimes broken) changes.
```julia
using Pkg; Pkg.add(name = "ToolipsCrawl", rev = "Unstable")
```
##### documentation
Documentation for `ToolipsCrawl` is available on [chifidocs](https://chifidocs.com/toolips/ToolipsCrawl)
## overview
`ToolipsCrawl` usage centers around the `Crawler` type. This constructor is never called directly in conventional usage of the package, **instead** we use the high-level methods for `scrape` and `crawl`.
- `scrape(f::Function, address::String)` -> `::Crawler`
- `scrape(f::Function, address::String, components::String ...)` -> `::Crawler`
- `crawl(f::Function, address::String)` -> `::Crawler`
- `crawl(f::Function, addresses::String ...)` -> `::Crawler`
As of right now, there are two main functions for grabbing components...
- `get_by_name(crawler::Crawler, name::String)` and `get_by_tag(crawler::Crawler, tag::String)`.
These *getters* are used on the `Crawler` within a scraping function provided to `crawl` or `scrape`.
```julia
using ToolipsCrawl
rows = []
scrape("https://github.com/ChifiSource") do c::Crawler
current_rows = get_by_tag(c, "td")
for row::ToolipsCrawl.Component{:td} in current_rows
push!(rows, row[:text])
end
end
```
```julia
using ToolipsCrawl
titles = []
crawl("https://chifidocs.com") do crawler::Crawler
title_comps = get_by_tag(crawler, "title")
if length(title_comps) > 0
@info "scraped title from " * crawler.address
push!(titles, title_comps[1][:text])
end
end
```
### contributing
`chifi` tries to be quite leniant in accepting pull requests, but following these guidelines will help speed up our processes and make merging your pull-request easier. Please consider the following guidelines:
- Ensure the issue or the upgrade is applicable to the current version of the project on the `Unstable` branch.
- **please pull-request to `Unstable`**
- Open a unique issue for each issues, please do not group multiple problems into a single issue.