https://github.com/eteubert/web_inspector
Elixir web inspector to unfurl URLs
- Host: GitHub
- URL: https://github.com/eteubert/web_inspector
- Owner: eteubert
- Created: 2019-01-21T11:35:40.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-10-29T09:00:00.000Z (over 1 year ago)
- Last Synced: 2025-10-31T09:37:23.314Z (7 months ago)
- Topics: elixir
- Language: Elixir
- Homepage:
- Size: 190 KB
- Stars: 2
- Watchers: 0
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
README
# WebInspector
Elixir web inspector to unfurl URLs.
## Installation
The package can be installed by adding `web_inspector` to your list of dependencies in `mix.exs`:
```elixir
def deps do
  [
    {:web_inspector, git: "https://github.com/eteubert/web_inspector.git"}
  ]
end
```
## Usage
```elixir
WebInspector.unfurl("https://podlove.org")
{:ok,
 %{
   description: nil,
   embed: nil,
   icon: %{
     height: "32",
     type: "icon",
     url: "https://podlove.org/files/2014/06/cropped-podlove-avatar-bkd-1024-32x32.png",
     width: "32"
   },
   locations: ["https://podlove.org"],
   original_url: "https://podlove.org",
   providers: %{
     misc: %{
       "canonical_url" => nil,
       "icons" => [
         %{
           height: "32",
           type: "icon",
           url: "https://podlove.org/files/2014/06/cropped-podlove-avatar-bkd-1024-32x32.png",
           width: "32"
         },
         %{
           height: "192",
           type: "icon",
           url: "https://podlove.org/files/2014/06/cropped-podlove-avatar-bkd-1024-192x192.png",
           width: "192"
         }
       ],
       "title" => "Podlove | Personal Media Development"
     },
     oembed: %{},
     open_graph: %{},
     twitter: %{}
   },
   site_name: "podlove.org",
   site_url: "https://podlove.org",
   title: "Podlove | Personal Media Development",
   url: "https://podlove.org"
 }}
```
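The result above is a tagged tuple, so it can be consumed with pattern matching. The following is a minimal sketch: `UnfurlSummary` is a hypothetical helper and not part of `web_inspector`, and the `{:error, reason}` shape for failures is an assumption, since the README only documents the `{:ok, map}` case.

```elixir
defmodule UnfurlSummary do
  # Hypothetical helper, not part of web_inspector.
  # Turns an unfurl result into a one-line summary string.

  # Success case: fall back to site_name when the page has no title.
  def summarize({:ok, %{url: url} = data}) do
    title = data[:title] || data[:site_name] || "(untitled)"
    "#{title} <#{url}>"
  end

  # Assumption: failures arrive as {:error, reason}.
  # The README does not document the error shape.
  def summarize({:error, reason}) do
    "unfurl failed: #{inspect(reason)}"
  end
end

# Usage with the library (requires the dependency and network access):
# "https://podlove.org" |> WebInspector.unfurl() |> UnfurlSummary.summarize()
```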
## Config
### Puppeteer
You can set up your own Puppeteer server if you need full control over the page visit and the extracted data. It should return data in the following format:
```js
{
  site_name,
  title,
  description,
  image,
  url
}
```
The other extractors still run, but data extracted by Puppeteer takes precedence.
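One way to picture this precedence rule is a map merge in which Puppeteer values override whatever the other extractors found. This is only an illustration under stated assumptions: `PrecedenceSketch` is a hypothetical module, the atom keys mirror the fields listed above, and skipping `nil` values is a guess. The library's actual merge logic may differ.

```elixir
defmodule PrecedenceSketch do
  # Hypothetical illustration; web_inspector's real merge logic may differ.
  @fields [:site_name, :title, :description, :image, :url]

  # Values from `puppeteer` win, unless they are nil or missing (assumption).
  def merge(extracted, puppeteer) do
    overrides =
      puppeteer
      |> Map.take(@fields)
      |> Enum.reject(fn {_key, value} -> is_nil(value) end)
      |> Map.new()

    Map.merge(extracted, overrides)
  end
end
```

Here a title found by Puppeteer would replace the one taken from the HTML, while a description that Puppeteer did not return would be kept from the other extractors.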
Enable Puppeteer and define its hostname in the config:
```elixir
config :web_inspector, puppeteer_enabled: true
config :web_inspector, puppeteer_host: "localhost:5000"
```
### Scraping Ant
As an alternative to plain HTTP requests, there is a [Scraping Ant](https://scrapingant.com/) adapter that gives more reliable results if you run into bot detection issues. Put your API key into the environment variable `SCRAPINGANT_API_KEY`.
For example, create a `.env` file with the content:
```
SCRAPINGANT_API_KEY=abcde...789
```
It will be loaded automatically. Alternatively, set the environment variable in whatever way suits your setup.
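If you set the variable yourself, a quick check at startup can catch a missing key early. The sketch below uses only `System.fetch_env/1` from the Elixir standard library; `ScrapingAntEnv` is a hypothetical module and not part of `web_inspector`.

```elixir
defmodule ScrapingAntEnv do
  # Hypothetical startup check, not part of web_inspector.
  @var "SCRAPINGANT_API_KEY"

  # Returns :ok when the variable is set to a non-empty value,
  # and an error tuple otherwise.
  def check do
    case System.fetch_env(@var) do
      {:ok, key} when key != "" -> :ok
      _ -> {:error, "#{@var} is not set"}
    end
  end
end
```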