https://github.com/jancurn/actor-metadata-extractor

An Apify actor that crawls a list of web pages and extracts various metadata from them.

Host: GitHub
URL: https://github.com/jancurn/actor-metadata-extractor
Owner: jancurn
Created: 2020-02-19T16:53:40.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-12-12T02:15:33.000Z (over 3 years ago)
Last Synced: 2025-04-08T04:52:04.631Z (about 1 year ago)
Topics: actor
Language: JavaScript
Homepage: https://apify.com/jancurn/probe-page-resources
Size: 304 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 10
Metadata Files:
- Readme: README.md

README

# Metadata extractor

The actor takes the URL of a web page as input,
loads the HTML using a raw HTTP request, and then extracts metadata from the HTML.
The result is stored as a JSON file in the default key-value store associated with
the actor run, under the `OUTPUT` key.

For example, for `https://www.apify.com`, the JSON result looks as follows:

```
{
  "url": "https://www.apify.com/",
  "title": "Web Scraping, Data Extraction and Automation · Apify",
  "meta": {
    "X-UA-Compatible": "IE=edge,chrome=1",
    "viewport": "width=device-width,minimum-scale=1,initial-scale=1",
    "copyright": "Copyright© 2019 Apify Technologies s.r.o. All rights reserved.",
    "keywords": "web scraper, web crawler, scraping, data extraction, API",
    "robots": "index,follow",
    "referrer": "origin",
    "googlebot": "index,follow",
    "description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
    "twitter:card": "summary_large_image",
    "twitter:creator": "@apify",
    "fb:app_id": "1636933253245869",
    "og:url": "https://apify.com/",
    "og:type": "website",
    "og:title": "Web Scraping, Data Extraction and Automation · Apify",
    "og:description": "Apify extracts data from websites, crawls lists of URLs and automates workflows on the web. Turn any website into an API in a few minutes!",
    "og:image": "https://apify.com/img/og-image.png",
    "og:image:alt": "Apify",
    "og:image:width": "1200",
    "og:image:height": "630",
    "og:locale": "en_IE",
    "og:site_name": "Apify",
    "next-head-count": "19"
  }
}
```
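
The extraction step that produces a result of this shape could be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the function names `parseAttributes` and `extractMetadata` are hypothetical, and the sketch assumes that each `<meta>` tag is keyed by its `name`, `property`, or `http-equiv` attribute, which matches the keys in the sample output but is not confirmed by the source. It uses regular expressions only to stay dependency-free; a real HTML parser would be more robust.

```javascript
// Hypothetical helper (not from the actor's source): collects the
// attributes of a single HTML tag into a plain object.
function parseAttributes(tag) {
    const attrs = {};
    // Matches name="value", name='value', or name=value.
    const attrRegex = /([^\s"'<>\/=]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
    let match;
    while ((match = attrRegex.exec(tag)) !== null) {
        const value = match[2] ?? match[3] ?? match[4] ?? '';
        // Attribute names are case-insensitive in HTML, so normalize them.
        attrs[match[1].toLowerCase()] = value;
    }
    return attrs;
}

// Hypothetical sketch: builds an object shaped like the OUTPUT record
// ({ url, title, meta }) from an already-downloaded HTML string.
function extractMetadata(url, html) {
    const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    const result = {
        url,
        title: titleMatch ? titleMatch[1].trim() : null,
        meta: {},
    };

    // Assumption: a <meta> tag contributes an entry only when it has a
    // content attribute plus one of name, property, or http-equiv.
    for (const tag of html.match(/<meta\b[^>]*>/gi) || []) {
        const attrs = parseAttributes(tag);
        const key = attrs.name || attrs.property || attrs['http-equiv'];
        if (key && attrs.content !== undefined) {
            result.meta[key] = attrs.content;
        }
    }
    return result;
}
```

Calling `extractMetadata(url, html)` with the page URL and the HTML body of the HTTP response returns an object that can be serialized with `JSON.stringify` and stored under the `OUTPUT` key.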
