https://github.com/lilithhafner/extractlinks.jl
Extract links from a webpage
- Host: GitHub
- URL: https://github.com/lilithhafner/extractlinks.jl
- Owner: LilithHafner
- License: gpl-3.0
- Created: 2024-01-04T18:12:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-06T17:42:54.000Z (about 1 year ago)
- Last Synced: 2025-01-20T02:06:55.904Z (4 months ago)
- Language: HTML
- Size: 195 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ExtractLinks
[Build Status](https://github.com/LilithHafner/ExtractLinks.jl/actions/workflows/CI.yml?query=branch%3Amain)
[Coverage](https://codecov.io/gh/LilithHafner/ExtractLinks.jl)
[PkgEval](https://JuliaCI.github.io/NanosoldierReports/pkgeval_badges/E/ExtractLinks.html)
[Aqua QA](https://github.com/JuliaTesting/Aqua.jl)

Extract links from a web page
```
julia> extract_links("https://julialang.org")
210-element Vector{String}:
"https://julialang.org/libs/bootstrap/bootstrap.min.css"
"https://julialang.org/css/app.css"
"https://julialang.org/css/franklin.css"
⋮
"https://julialang.org/libs/bootstrap/bootstrap.min.js"
"https://www.youtube.com/iframe_api"
```

That's all.
## Features
- Resolves relative links
- Heuristics to determine what is and is not a link
- Handles malformed web pages gracefully
- Low compile and precompile times
- Option to skip the web request by providing the body of the page as a keyword argument
- Filters out duplicates
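
To make a few of these features concrete, here is a minimal sketch in plain Julia (no dependencies) of resolving relative links, filtering duplicates, and working from a supplied page body instead of a web request. It is not the package's implementation: the names `resolve_link` and `sketch_extract_links` are hypothetical, the keyword name `body` is an assumption (the README does not name the keyword), and the resolution logic covers only a few common cases (no `../`, query, or fragment handling).

```julia
# Illustrative sketch only; NOT ExtractLinks.jl's actual implementation.

# Resolve `link` against the absolute URL `base` (simplified: a few common cases).
function resolve_link(base::AbstractString, link::AbstractString)
    # A link that already starts with a scheme (https:, mailto:, ...) is kept as-is.
    occursin(r"^[A-Za-z][A-Za-z0-9+.-]*:", link) && return String(link)
    m = match(r"^([A-Za-z][A-Za-z0-9+.-]*:)//([^/?#]*)([^?#]*)", base)
    m === nothing && throw(ArgumentError("base must be an absolute URL: $base"))
    scheme, host, path = m.captures
    startswith(link, "//") && return string(scheme, link)             # protocol-relative
    startswith(link, "/") && return string(scheme, "//", host, link)  # root-relative
    # Otherwise resolve relative to the directory part of the base path.
    dir = path[1:something(findlast(==('/'), path), 0)]
    isempty(dir) && (dir = "/")
    return string(scheme, "//", host, dir, link)
end

# Collect double-quoted href/src values from `body`, resolve them, drop duplicates.
# The keyword name `body` is an assumption made for this sketch.
function sketch_extract_links(base::AbstractString; body::AbstractString)
    raw = [m.captures[1] for m in eachmatch(r"(?:href|src)\s*=\s*\"([^\"]*)\""i, body)]
    return unique!(String[resolve_link(base, l) for l in raw])
end

html = """<a href="/downloads/">a</a> <img src="logo.png"> <a href="/downloads/">b</a>"""
sketch_extract_links("https://julialang.org/blog/index.html"; body = html)
# returns ["https://julialang.org/downloads/", "https://julialang.org/blog/logo.png"]
```

A regex scan like this is only a stand-in for the package's heuristics; it is used here to keep the sketch self-contained.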