{"id":38590964,"url":"https://github.com/devmehq/open-graph-extractor","last_synced_at":"2026-01-17T08:24:01.642Z","repository":{"id":37052120,"uuid":"442245303","full_name":"devmehq/open-graph-extractor","owner":"devmehq","description":"Extract Open Graph and Metadata from html in node.js","archived":false,"fork":false,"pushed_at":"2025-08-27T03:54:18.000Z","size":1423,"stargazers_count":9,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-03T01:29:57.599Z","etag":null,"topics":["extractor","metadata-extractor","opengraph","opengraph-tags"],"latest_commit_sha":null,"homepage":"https://dev.me/products/url-scrapper","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devmehq.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-12-27T18:46:36.000Z","updated_at":"2025-09-24T14:25:09.000Z","dependencies_parsed_at":"2023-12-18T01:25:29.607Z","dependency_job_id":"a6ea7fba-97f9-44ef-aa2e-842865b92240","html_url":"https://github.com/devmehq/open-graph-extractor","commit_stats":{"total_commits":158,"total_committers":6,"mean_commits":"26.333333333333332","dds":0.5189873417721519,"last_synced_commit":"99948f8898758a732ce19fed9bd6dcb88a48e3cf"},"previous_names":[],"tags_count":72,"template":false,"template_full_name":null,"purl":"pkg:github/devmehq/open-graph-extractor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmehq%2Fopen-graph-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/devmehq%2Fopen-graph-extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmehq%2Fopen-graph-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmehq%2Fopen-graph-extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devmehq","download_url":"https://codeload.github.com/devmehq/open-graph-extractor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devmehq%2Fopen-graph-extractor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28504363,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T06:57:29.758Z","status":"ssl_error","status_checked_at":"2026-01-17T06:56:03.931Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["extractor","metadata-extractor","opengraph","opengraph-tags"],"created_at":"2026-01-17T08:24:01.530Z","updated_at":"2026-01-17T08:24:01.616Z","avatar_url":"https://github.com/devmehq.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Open Graph Extractor 🚀\n\n[![Build Status](https://github.com/devmehq/open-graph-extractor/actions/workflows/ci.yml/badge.svg)](https://github.com/devmehq/open-graph-extractor/actions/workflows/ci.yml)\n[![NPM 
version](https://img.shields.io/npm/v/@devmehq/open-graph-extractor.svg)](https://www.npmjs.com/package/@devmehq/open-graph-extractor)\n[![Downloads](https://img.shields.io/npm/dm/@devmehq/open-graph-extractor.svg)](https://www.npmjs.com/package/@devmehq/open-graph-extractor)\n\n**Fast, lightweight, and comprehensive Open Graph extractor for Node.js with advanced features** \n\nExtract Open Graph tags, Twitter Cards, structured data, and 60+ meta tag types with built-in caching, validation, and bulk processing. Optimized for performance and security.\n\n## ✨ Why Choose This Library?\n\n- 🚀 **Lightning Fast**: Built-in caching with tiny-lru and optimized parsing\n- 🎯 **Production Ready**: Comprehensive error handling, validation, and security features  \n- 🏆 **Most Complete**: Extracts Open Graph, Twitter Cards, JSON-LD, Schema.org, and 60+ meta tags\n- 📊 **Smart Analytics**: Built-in validation, social scoring, and performance metrics\n- 🛡️ **Security First**: HTML sanitization, URL validation, and PII protection (Node.js only)\n- 🔧 **Developer Friendly**: Full TypeScript support, modern async/await API\n\n## 🌟 Key Features\n\n### Core Extraction\n- ✅ **60+ Meta Tags**: Open Graph, Twitter Cards, Dublin Core, App Links\n- ✅ **JSON-LD Extraction**: Complete structured data parsing\n- ✅ **Schema.org Support**: Microdata and RDFa extraction\n- ✅ **Smart Fallbacks**: Intelligent content detection when tags are missing\n\n### Advanced Features  \n- 🖼️ **Smart Media**: Automatic format detection and best image selection\n- 📹 **Rich Metadata**: Video, audio, and responsive image support\n- 💾 **Smart Caching**: Built-in memory cache with tiny-lru\n- 🚀 **Bulk Processing**: Concurrent extraction for multiple URLs\n\n### Quality \u0026 Analytics\n- ✨ **Data Validation**: Comprehensive Open Graph and Twitter Card validation  \n- 📈 **Social Scoring**: 0-100 score for social media optimization\n- 🎯 **SEO Insights**: Performance metrics and recommendations\n- ⏱️ **Performance 
Tracking**: Detailed timing and statistics\n\n### Security \u0026 Privacy\n- 🛡️ **HTML Sanitization**: XSS protection using Cheerio (Node.js only)\n- 🔐 **PII Protection**: Automatic detection and masking of sensitive data\n- 🌐 **URL Security**: Domain filtering and validation\n- 🚫 **Content Safety**: Malicious content detection\n\n## 📦 Installation\n\n```bash\n# Using yarn (recommended)\nyarn add @devmehq/open-graph-extractor\n\n# Using npm\nnpm install @devmehq/open-graph-extractor\n```\n\n## 🚀 Quick Start\n\n### Basic Usage (Synchronous)\n\n```typescript\nimport axios from 'axios';\nimport { extractOpenGraph } from '@devmehq/open-graph-extractor';\n\n// Fetch HTML and extract Open Graph data\nconst { data: html } = await axios.get('https://example.com');\nconst ogData = extractOpenGraph(html);\n\nconsole.log(ogData);\n// {\n//   ogTitle: 'Example Title',\n//   ogDescription: 'Example Description',\n//   ogImage: 'https://example.com/image.jpg',\n//   twitterCard: 'summary_large_image',\n//   favicon: 'https://example.com/favicon.ico'\n//   // ... 
60+ more fields\n// }\n```\n\n### Advanced Usage (Async with All Features)\n\n```typescript\nimport { extractOpenGraphAsync } from '@devmehq/open-graph-extractor';\n\n// Extract with validation, caching, and structured data\nconst result = await extractOpenGraphAsync(html, {\n  extractStructuredData: true,\n  validateData: true,\n  generateScore: true,\n  cache: {\n    enabled: true,\n    ttl: 3600, // 1 hour\n    storage: 'memory'\n  },\n  security: {\n    sanitizeHtml: true,\n    validateUrls: true\n  }\n});\n\nconsole.log(result);\n// {\n//   data: { /* Complete Open Graph data */ },\n//   structuredData: { /* JSON-LD, Schema.org, etc */ },\n//   confidence: 95,\n//   errors: [],\n//   warnings: [],\n//   metrics: { /* Performance data */ }\n// }\n```\n\n## 🎯 Advanced Features\n\n### JSON-LD \u0026 Structured Data Extraction\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  extractStructuredData: true\n});\n\nconsole.log(result.structuredData);\n// {\n//   jsonLD: [...],        // All JSON-LD scripts  \n//   schemaOrg: {...},     // Schema.org microdata\n//   dublinCore: {...},    // Dublin Core metadata\n//   microdata: {...},     // Microdata\n//   rdfa: {...}          // RDFa data\n// }\n```\n\n### Bulk Processing\n\n```typescript\nimport { extractOpenGraphBulk } from '@devmehq/open-graph-extractor';\n\nconst urls = ['url1', 'url2', 'url3' /* ... */];\n\nconst results = await extractOpenGraphBulk({\n  urls,\n  concurrency: 5,\n  rateLimit: {\n    requests: 100,\n    window: 60000 // 1 minute\n  },\n  onProgress: (completed, total, url) =\u003e {\n    console.log(`Processing ${completed}/${total}: ${url}`);\n  }\n});\n```\n\n### Validation \u0026 Scoring\n\n```typescript\nimport { validateOpenGraph, generateSocialScore } from '@devmehq/open-graph-extractor';\n\n// Validate Open Graph data\nconst validation = validateOpenGraph(ogData);\nconsole.log(validation);\n// {\n//   valid: false,\n//   errors: [...],\n//   warnings: [...],\n//   score: 
75,\n//   recommendations: [...]\n// }\n\n// Get social media score\nconst score = generateSocialScore(ogData);\nconsole.log(score);\n// {\n//   overall: 82,\n//   openGraph: { score: 90, ... },\n//   twitter: { score: 75, ... },\n//   recommendations: [...]\n// }\n```\n\n### Security Features\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  security: {\n    sanitizeHtml: true,      // XSS protection using Cheerio\n    detectPII: true,         // PII detection\n    maskPII: true,           // Mask sensitive data\n    validateUrls: true,      // URL validation\n    allowedDomains: ['example.com'],\n    blockedDomains: ['malicious.com']\n  }\n});\n```\n\n### Caching\n\n```typescript\n// With built-in memory cache (tiny-lru)\nconst result = await extractOpenGraphAsync(html, {\n  cache: {\n    enabled: true,\n    ttl: 3600,              // 1 hour\n    storage: 'memory',\n    maxSize: 1000\n  }\n});\n\n// With custom cache (Redis example)\nimport Redis from 'ioredis';\nconst redis = new Redis();\n\nconst result = await extractOpenGraphAsync(html, {\n  cache: {\n    enabled: true,\n    ttl: 3600,\n    storage: 'custom',\n    customStorage: {\n      async get(key) {\n        const value = await redis.get(key);\n        return value ? 
JSON.parse(value) : null;\n      },\n      async set(key, value, ttl) {\n        await redis.setex(key, ttl, JSON.stringify(value));\n      },\n      async delete(key) {\n        await redis.del(key);\n      },\n      async clear() {\n        await redis.flushdb();\n      },\n      async has(key) {\n        return (await redis.exists(key)) === 1;\n      }\n    }\n  }\n});\n```\n\n### Enhanced Media Support\n\n```typescript\nconst result = await extractOpenGraphAsync(html);\n\n// Automatically detects and prioritizes best images\nconsole.log(result.data.ogImage);\n// {\n//   url: 'https://example.com/image.jpg',\n//   type: 'jpg',\n//   width: '1200',\n//   height: '630',\n//   alt: 'Description'\n// }\n\n// For multiple images, set allMedia: true\nconst allMediaResult = extractOpenGraph(html, { allMedia: true });\nconsole.log(allMediaResult.ogImage);\n// [\n//   { url: '...', width: '1200', height: '630', type: 'jpg' },\n//   { url: '...', width: '800', height: '600', type: 'png' }\n// ]\n```\n\n## 📋 Complete API Reference\n\n### Core Functions\n\n#### `extractOpenGraph(html, options?)`\n**Synchronous extraction** - Fast and lightweight for basic use cases.\n\n```typescript\nimport { extractOpenGraph } from '@devmehq/open-graph-extractor';\n\nconst data = extractOpenGraph(html, {\n  customMetaTags: [\n    { multiple: false, property: 'article:author', fieldName: 'author' }\n  ],\n  allMedia: true,              // Extract all images/videos\n  ogImageFallback: true,       // Fallback to page images\n  onlyGetOpenGraphInfo: false  // Include fallback content\n});\n```\n\n#### `extractOpenGraphAsync(html, options?)`\n**Asynchronous extraction** - Full feature set with advanced capabilities.\n\n```typescript\nimport { extractOpenGraphAsync } from '@devmehq/open-graph-extractor';\n\nconst result = await extractOpenGraphAsync(html, {\n  // Core options\n  extractStructuredData: true,    // JSON-LD, Schema.org, Microdata\n  validateData: true,             // Data 
validation\n  generateScore: true,            // SEO/social scoring\n  extractArticleContent: true,    // Article text extraction\n  detectLanguage: true,           // Language detection\n  normalizeUrls: true,           // URL normalization\n  \n  // Advanced features\n  cache: { enabled: true, ttl: 3600 },\n  security: { sanitizeHtml: true, validateUrls: true }\n});\n```\n\n### Configuration Options\n\n#### `IExtractOpenGraphOptions` (Sync)\n| Option | Type | Default | Description |\n|--------|------|---------|-------------|\n| `customMetaTags` | Array | `[]` | Custom meta tags to extract |\n| `allMedia` | boolean | `false` | Extract all images/videos instead of just the first |\n| `onlyGetOpenGraphInfo` | boolean | `false` | Skip fallback content extraction |\n| `ogImageFallback` | boolean | `false` | Enable image fallback from page content |\n\n#### `IExtractOpenGraphOptions` (Async) - Extends Sync Options\n| Option | Type | Default | Description |\n|--------|------|---------|-------------|\n| `extractStructuredData` | boolean | `false` | Extract JSON-LD, Schema.org, Microdata |\n| `validateData` | boolean | `false` | Validate extracted Open Graph data |\n| `generateScore` | boolean | `false` | Generate SEO/social media score (0-100) |\n| `extractArticleContent` | boolean | `false` | Extract main article text content |\n| `detectLanguage` | boolean | `false` | Detect content language and text direction |\n| `normalizeUrls` | boolean | `false` | Normalize and clean all URLs |\n| `cache` | ICacheOptions | `undefined` | Caching configuration |\n| `security` | ISecurityOptions | `undefined` | Security and validation settings |\n\n#### `ICacheOptions`\n| Option | Type | Default | Description |\n|--------|------|---------|-------------|\n| `enabled` | boolean | `false` | Enable caching |\n| `ttl` | number | `3600` | Time-to-live in seconds |\n| `storage` | string | `'memory'` | Storage type: 'memory', 'redis', 'custom' |\n| `maxSize` | number | `1000` | Maximum cache 
entries (memory only) |\n| `keyGenerator` | Function | - | Custom cache key generator |\n| `customStorage` | ICacheStorage | - | Custom storage implementation |\n\n#### `ISecurityOptions`\n| Option | Type | Default | Description |\n|--------|------|---------|-------------|\n| `sanitizeHtml` | boolean | `false` | Sanitize HTML content (XSS protection) |\n| `detectPII` | boolean | `false` | Detect personally identifiable information |\n| `maskPII` | boolean | `false` | Mask detected PII in results |\n| `validateUrls` | boolean | `false` | Validate and filter URLs |\n| `maxRedirects` | number | `5` | Maximum URL redirects to follow |\n| `timeout` | number | `10000` | Request timeout in milliseconds |\n| `allowedDomains` | string[] | `[]` | Allowed domains whitelist |\n| `blockedDomains` | string[] | `[]` | Blocked domains blacklist |\n\n### Return Types\n\n#### `IOGResult` (Sync)\nBasic extraction result with 60+ fields:\n\n```typescript\n{\n  ogTitle?: string;\n  ogDescription?: string;\n  ogImage?: string | string[] | IOgImage | IOgImage[];\n  ogUrl?: string;\n  ogType?: OGType;\n  twitterCard?: TwitterCardType;\n  favicon?: string;\n  // ... 
50+ more fields including:\n  // Twitter Cards, App Links, Article metadata,\n  // Product info, Music data, Dublin Core, etc.\n}\n```\n\n#### `IExtractionResult` (Async)\nEnhanced result with validation and metrics:\n\n```typescript\n{\n  data: IOGResult;              // Extracted Open Graph data\n  structuredData: {             // Structured data extraction\n    jsonLD: any[];\n    schemaOrg: any;\n    microdata: any;\n    rdfa: any;\n    dublinCore: any;\n  };\n  errors: IError[];             // Validation errors\n  warnings: IWarning[];        // Validation warnings\n  confidence: number;           // Confidence score (0-100)\n  confidenceLevel: 'high' | 'medium' | 'low';\n  fallbacksUsed: string[];      // Which fallbacks were used\n  metrics: IMetrics;            // Performance metrics\n  validation?: IValidationResult;  // Validation details (if enabled)\n  socialScore?: ISocialScore;      // Social media scoring (if enabled)\n}\n```\n\n### Utility Functions\n\n#### `validateOpenGraph(data)`\nValidates Open Graph data against specifications.\n\n```typescript\nimport { validateOpenGraph } from '@devmehq/open-graph-extractor';\n\nconst validation = validateOpenGraph(ogData);\nconsole.log(validation);\n// {\n//   valid: boolean,\n//   errors: IError[],\n//   warnings: IWarning[],\n//   score: number,\n//   recommendations: string[]\n// }\n```\n\n#### `generateSocialScore(data)`\nGenerates social media optimization score (0-100).\n\n```typescript\nimport { generateSocialScore } from '@devmehq/open-graph-extractor';\n\nconst score = generateSocialScore(ogData);\nconsole.log(score);\n// {\n//   overall: number,\n//   openGraph: { score, present, missing, issues },\n//   twitter: { score, present, missing, issues },\n//   schema: { score, present, missing, issues },\n//   seo: { score, present, missing, issues },\n//   recommendations: string[]\n// }\n```\n\n#### `extractOpenGraphBulk(options)`\nProcess multiple URLs concurrently with rate 
limiting.\n\n```typescript\nimport { extractOpenGraphBulk } from '@devmehq/open-graph-extractor';\n\nconst results = await extractOpenGraphBulk({\n  urls: ['url1', 'url2', 'url3'],\n  concurrency: 5,                    // Process 5 URLs simultaneously\n  rateLimit: {                       // Rate limiting\n    requests: 100,                   // Max 100 requests\n    window: 60000                    // Per 60 seconds\n  },\n  continueOnError: true,             // Don't stop on individual failures\n  onProgress: (completed, total, url) =\u003e {\n    console.log(`Progress: ${completed}/${total} - ${url}`);\n  },\n  onError: (url, error) =\u003e {\n    console.error(`Failed to process ${url}:`, error);\n  }\n});\n\nconsole.log(results.summary);\n// {\n//   total: number,\n//   successful: number,\n//   failed: number,\n//   totalDuration: number,\n//   averageDuration: number\n// }\n```\n\n## 🎨 Custom Meta Tags\n\n```typescript\n// Extract custom meta tags\nconst result = extractOpenGraph(html, {\n  customMetaTags: [\n    {\n      multiple: false,\n      property: 'article:author',\n      fieldName: 'articleAuthor'\n    },\n    {\n      multiple: true,\n      property: 'article:tag',\n      fieldName: 'articleTags'\n    }\n  ]\n});\n\nconsole.log(result.articleAuthor); // Custom field\nconsole.log(result.articleTags);   // Array of tags\n```\n\n## 🌟 **Complete Feature Guide**\n\n### **Core Extraction Features**\n\n#### **Meta Tag Extraction (60+ Types)**\n- **Open Graph**: Complete og:* tag support with type validation\n- **Twitter Cards**: All twitter:* tags including player and app cards  \n- **Dublin Core**: dc:* metadata extraction\n- **App Links**: al:* tags for mobile app deep linking\n- **Article Metadata**: Publishing dates, authors, sections, tags\n- **Product Info**: Prices, availability, condition, retailer data\n- **Music Metadata**: Albums, artists, songs, duration\n- **Place/Location**: GPS coordinates and location data\n\n```typescript\n// 
Automatically extracts all supported meta types\nconst data = extractOpenGraph(html);\nconsole.log(data.ogTitle, data.twitterCard, data.articleAuthor);\n```\n\n#### **Intelligent Fallbacks**\nWhen meta tags are missing, the library intelligently falls back to:\n- `\u003ctitle\u003e` tags for ogTitle\n- Meta descriptions for ogDescription  \n- Page images for ogImage\n- Canonical URLs for ogUrl\n- Page content analysis for missing data\n\n```typescript\n// Fallbacks work automatically\nconst data = extractOpenGraph(html, { ogImageFallback: true });\n// Will find images even if og:image is missing\n```\n\n### **Advanced Extraction Features**\n\n#### **Structured Data Extraction**\n- **JSON-LD**: Parses all `\u003cscript type=\"application/ld+json\"\u003e` blocks\n- **Schema.org**: Extracts microdata with itemscope/itemprop\n- **RDFa**: Resource Description Framework attributes\n- **Microdata**: HTML5 microdata extraction\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  extractStructuredData: true\n});\n\nconsole.log(result.structuredData);\n// {\n//   jsonLD: [{ \"@type\": \"Article\", \"headline\": \"...\" }],\n//   schemaOrg: { \"Product\": { \"name\": \"...\", \"price\": \"...\" }},\n//   microdata: { \"Review\": { \"rating\": \"5\" }},\n//   rdfa: { \"Person\": { \"name\": \"John Doe\" }}\n// }\n```\n\n#### **Content Analysis**\n- **Article Extraction**: Finds and extracts main article content\n- **Reading Time**: Calculates estimated reading time  \n- **Word Count**: Counts words in extracted content\n- **Language Detection**: Auto-detects content language and text direction\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  extractArticleContent: true,\n  detectLanguage: true\n});\n\nconsole.log(result.data.articleContent);  // Main article text\nconsole.log(result.data.readingTime);     // 5 (minutes)\nconsole.log(result.data.language);        // \"en-US\"\nconsole.log(result.data.textDirection);   // 
\"ltr\"\n```\n\n### **Data Quality Features**\n\n#### **Comprehensive Validation**\n- **Open Graph Validation**: Checks required fields and formats\n- **Twitter Card Validation**: Ensures proper card types and content\n- **URL Validation**: Verifies image and video URLs\n- **Content Validation**: Checks for reasonable field lengths\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  validateData: true\n});\n\nif (!result.validation.valid) {\n  console.log(\"Issues found:\");\n  result.validation.errors.forEach(error =\u003e {\n    console.log(`- ${error.field}: ${error.message}`);\n  });\n  \n  console.log(\"Recommendations:\");\n  result.validation.recommendations.forEach(rec =\u003e {\n    console.log(`- ${rec}`);\n  });\n}\n```\n\n#### **Social Media Scoring**\nGenerates SEO and social media optimization scores (0-100):\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  generateScore: true\n});\n\nconsole.log(`Overall Score: ${result.socialScore.overall}/100`);\nconsole.log(`Open Graph: ${result.socialScore.openGraph.score}/100`);\nconsole.log(`Twitter: ${result.socialScore.twitter.score}/100`);\n\n// Get actionable recommendations\nresult.socialScore.recommendations.forEach(rec =\u003e {\n  console.log(`💡 ${rec}`);\n});\n// 💡 Add og:image for better social sharing\n// 💡 Include twitter:card for Twitter optimization\n```\n\n### **Performance Features**\n\n#### **Smart Caching System**\n- **Memory Cache**: Built-in LRU cache with tiny-lru\n- **Redis Support**: Enterprise-ready Redis caching\n- **Custom Storage**: Implement your own cache backend\n- **TTL Control**: Configurable expiration times\n\n```typescript\n// Memory caching\nconst result = await extractOpenGraphAsync(html, {\n  cache: {\n    enabled: true,\n    ttl: 3600,        // 1 hour\n    maxSize: 1000,    // Max entries\n    storage: 'memory'\n  }\n});\n\n// Redis caching\nconst result = await extractOpenGraphAsync(html, {\n  cache: {\n    enabled: true,\n    
ttl: 7200,        // 2 hours  \n    storage: 'redis'  // Requires Redis setup\n  }\n});\n```\n\n#### **Bulk Processing with Rate Limiting**\nProcess multiple URLs efficiently with concurrency control:\n\n```typescript\nconst results = await extractOpenGraphBulk({\n  urls: siteUrls,\n  concurrency: 10,           // 10 simultaneous requests\n  rateLimit: {\n    requests: 100,           // Max 100 requests\n    window: 60000           // Per minute\n  },\n  onProgress: (done, total, url) =\u003e {\n    updateProgressBar(done / total);\n  }\n});\n\nconsole.log(`Processed ${results.summary.successful}/${results.summary.total} URLs`);\n```\n\n#### **Performance Monitoring**\nDetailed metrics for optimization:\n\n```typescript\nconst result = await extractOpenGraphAsync(html);\n\nconsole.log(\"Performance Metrics:\");\nconsole.log(`- Total time: ${result.metrics.performance.totalTime}ms`);\nconsole.log(`- HTML parsing: ${result.metrics.performance.htmlParseTime}ms`);  \nconsole.log(`- Meta extraction: ${result.metrics.performance.metaExtractionTime}ms`);\nconsole.log(`- Found ${result.metrics.metaTagsFound} meta tags`);\nconsole.log(`- Used fallbacks: ${result.fallbacksUsed.join(', ')}`);\n```\n\n### **Security Features**\n\n#### **Content Sanitization**\n- **XSS Protection**: Sanitizes HTML content using Cheerio\n- **URL Validation**: Prevents SSRF attacks\n- **Domain Control**: Allow/block specific domains\n- **Content Filtering**: Remove malicious content\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  security: {\n    sanitizeHtml: true,        // Clean HTML content\n    validateUrls: true,        // Verify all URLs\n    allowedDomains: [          // Only allow these domains\n      'example.com',\n      'cdn.example.com'\n    ],\n    blockedDomains: [          // Block these domains\n      'malicious.com'\n    ],\n    maxRedirects: 3,          // Limit URL redirects\n    timeout: 5000             // 5 second timeout\n  }\n});\n```\n\n#### 
**Privacy Protection**\n- **PII Detection**: Automatically detects personal information\n- **Data Masking**: Optional masking of sensitive content\n- **Safe Extraction**: Removes potentially harmful data\n\n```typescript\nconst result = await extractOpenGraphAsync(html, {\n  security: {\n    detectPII: true,    // Detect emails, phones, addresses\n    maskPII: true       // Mask detected PII in results\n  }\n});\n\n// PII will be masked in the output\n// \"Contact: j***@example.com\" instead of \"Contact: john@example.com\"\n```\n\n### **Enhanced Media Support**\n\n#### **Smart Image Processing**\n- **Format Detection**: Supports JPG, PNG, GIF, WebP, AVIF, SVG\n- **Size Optimization**: Automatically selects best image sizes\n- **Responsive Images**: Handles srcset and multiple formats\n- **Fallback Images**: Finds images when og:image is missing\n\n```typescript\n// Enhanced image extraction\nconst result = await extractOpenGraphAsync(html, {\n  allMedia: true  // Extract all images, not just the first\n});\n\nconsole.log(result.data.ogImage);\n// [\n//   { url: 'image1.jpg', width: 1200, height: 630, type: 'jpg' },\n//   { url: 'image2.png', width: 800, height: 600, type: 'png' }\n// ]\n```\n\n#### **Video \u0026 Audio Metadata**\n- **Video Information**: Duration, thumbnails, captions, chapters\n- **Audio Metadata**: Track info, artists, albums, duration\n- **Streaming Support**: Handles video players and streaming URLs\n\n```typescript\nconst result = await extractOpenGraphAsync(videoPageHtml);\n\nconsole.log(result.data.ogVideo);\n// {\n//   url: 'video.mp4',\n//   duration: 300,\n//   thumbnails: [{ url: 'thumb.jpg', width: 1280, height: 720 }],\n//   captions: [{ language: 'en', url: 'captions.vtt' }]\n// }\n```\n\n## 📈 Metrics \u0026 Monitoring\n\n```typescript\nconst result = await extractOpenGraphAsync(html);\n\nconsole.log(result.metrics);\n// {\n//   extractionTime: 125,        // ms\n//   htmlSize: 54321,           // bytes\n//   metaTagsFound: 15,\n//  
 structuredDataFound: 3,\n//   imagesFound: 8,\n//   videosFound: 1,\n//   fallbacksUsed: ['title', 'description'],\n//   performance: {\n//     htmlParseTime: 20,\n//     metaExtractionTime: 10,\n//     structuredDataExtractionTime: 15,\n//     validationTime: 5,\n//     totalTime: 125\n//   }\n// }\n```\n\n## 🧪 Testing\n\n```bash\n# Run tests\nyarn test\n\n# Run with coverage\nyarn test --coverage\n```\n\n## 🔧 Development\n\n```bash\n# Install dependencies\nyarn install\n\n# Build\nyarn build\n\n# Lint and format with Biome\nyarn lint\nyarn format\n\n# Type check\nyarn typecheck\n```\n\n## 🤝 API / Cloud Service\n\nWe offer this as a managed Cloud API Service. Try it here: [URL Scraping \u0026 Metadata Service](https://dev.me/products/url-scrapper)\n\n## 📖 TypeScript Support\n\nThe library is fully typed with comprehensive TypeScript definitions:\n\n- `IOGResult` - Main result interface with 60+ fields\n- `IExtractionResult` - Async extraction result with metrics\n- `IExtractOpenGraphOptions` - Configuration options\n- `IStructuredData` - JSON-LD and structured data types\n- `IValidationResult` - Data validation results\n- `ISocialScore` - Social media scoring details\n- `IMetrics` - Performance tracking metrics\n\nAll types are exported for your use in TypeScript projects.\n\n## 🌟 Why Choose This Library?\n\n| Feature | This Library | Others |\n|---------|-------------|---------|\n| Open Graph | ✅ Complete (60+ fields) | ✅ Basic |\n| Twitter Cards | ✅ Complete | ⚠️ Partial |\n| JSON-LD | ✅ Full Extraction | ❌ No |\n| Schema.org | ✅ Microdata/RDFa | ❌ No |\n| Caching | ✅ Built-in (tiny-lru) | ❌ No |\n| Bulk Processing | ✅ Concurrent | ❌ No |\n| Validation | ✅ Comprehensive | ❌ No |\n| Security | ✅ Node.js optimized | ❌ No |\n| TypeScript | ✅ Full Types | ⚠️ Partial |\n| Performance | ✅ Optimized | ⚠️ Variable |\n| Maintenance | ✅ Active | ⚠️ Variable |\n\n## 🛡️ Security\n\n- **HTML Sanitization**: Uses Cheerio for safe HTML parsing (Node.js only)\n- **PII 
Detection**: Automatic detection and masking of sensitive data\n- **URL Validation**: Prevents SSRF attacks with domain filtering\n- **Content Security**: Malicious content detection and filtering\n\n## 📈 Performance\n\n- **Fast Extraction**: Sub-100ms for average pages\n- **Smart Caching**: Built-in tiny-lru cache reduces repeated processing\n- **Concurrent Processing**: Configurable concurrency for bulk operations\n- **Optimized Parsing**: Cheerio-based parsing for Node.js performance\n\n## 🤝 Contributing\n\nWe welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.\n\n## 📄 License\n\n[MIT](LICENSE.md)\n\n## 🙏 Acknowledgments\n\nBuilt with:\n- [Cheerio](https://cheerio.js.org/) - Fast, flexible \u0026 lean implementation of jQuery for Node.js\n- [tiny-lru](https://github.com/avoidwork/tiny-lru) - Tiny LRU cache for high-performance caching\n- [Biome](https://biomejs.dev/) - Fast formatter and linter for JavaScript and TypeScript\n\n---\n\n**Made with ❤️ by [DEV.ME](https://dev.me)**\n\n*Need help or custom features? [Contact us](https://dev.me/contact)*","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevmehq%2Fopen-graph-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevmehq%2Fopen-graph-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevmehq%2Fopen-graph-extractor/lists"}