Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/suhailroushan13/techcrunch-api
TechCrunch API is a Node.js package that allows you to scrape articles from TechCrunch based on categories or tags. This package is designed for systems using Ubuntu or other Debian-based distributions that support sudo commands, leveraging Puppeteer
api news newsapi nodejs npm npm-package techcrunch
Last synced: about 6 hours ago
- Host: GitHub
- URL: https://github.com/suhailroushan13/techcrunch-api
- Owner: suhailroushan13
- Created: 2024-05-04T18:04:26.000Z (7 months ago)
- Default Branch: master
- Last Pushed: 2024-05-06T21:01:42.000Z (6 months ago)
- Last Synced: 2024-10-13T17:35:42.781Z (about 1 month ago)
- Topics: api, news, newsapi, nodejs, npm, npm-package, techcrunch
- Language: JavaScript
- Homepage: https://www.npmjs.com/package/techcrunch-api
- Size: 19.5 KB
- Stars: 20
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# TechCrunch API 🧑💻
TechCrunch API is a Node.js package that allows you to scrape articles from TechCrunch based on categories or tags. This package is designed for systems using Ubuntu or other Debian-based distributions that support `sudo` commands, leveraging Puppeteer to navigate and scrape content from a headless Chromium environment. 🌐
## Features 🚀
- **Scrape by Category:** Automatically retrieve all articles under a specified category. 📂
- **Scrape by Tag:** Collect articles that are tagged with a specific keyword. 🏷️
- **Headless Browser Support:** Runs Chromium in headless mode to scrape dynamic content. 👻
- **Optimized for Ubuntu:** Includes installation instructions specifically for Ubuntu, but compatible with other Linux distributions. 🐧
## Prerequisites 📋
Before installing the TechCrunch API package, make sure your system has the following dependencies installed:
- Node.js (Version 14 or later recommended) 🟢
- Puppeteer 🎭
- Dependencies required for Puppeteer and headless Chromium 🔧
## Installation
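The Node.js version recommended in the prerequisites can be checked with a short script before going further. This is only a minimal sketch: `meetsMinimumNode` and `MIN_NODE_MAJOR` are hypothetical names introduced here for illustration and are not part of the package.

```javascript
// Minimum major version, taken from the prerequisites list (assumption: "14 or later").
const MIN_NODE_MAJOR = 14;

// Hypothetical helper: takes a version string such as "18.17.1" and reports
// whether its major component is at least minMajor.
function meetsMinimumNode(version, minMajor = MIN_NODE_MAJOR) {
  const major = Number.parseInt(String(version).split(".")[0], 10);
  return Number.isInteger(major) && major >= minMajor;
}

// process.versions.node holds the running Node.js version without a leading "v".
if (meetsMinimumNode(process.versions.node)) {
  console.log(`Node.js ${process.versions.node} meets the recommended minimum.`);
} else {
  console.warn(`Node.js ${process.versions.node} is older than version ${MIN_NODE_MAJOR}.`);
}
```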
Follow these steps to set up the TechCrunch API package:
### Step 1: Install System Dependencies
Open a terminal and execute the following commands to install necessary libraries:
```bash
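# Install Puppeteer into the current project
npm install puppeteer
# Refresh the package index, then install the libraries Chromium depends on
sudo apt-get update
sudo apt-get install -y libgbm-dev xvfb chromium-browser libvpx7 libevent-2.1-7 libharfbuzz-icu0 libwebpdemux2 libenchant-2-2 libsecret-1-0 libmanette-0.2-0 libflite1 libgles2-mesa
# Start a virtual X display on :99 in the background and point DISPLAY at it
Xvfb :99 -screen 0 1920x1080x24 &
export DISPLAY=:99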
```
### Step 2: Install TechCrunch API Package
Install the package via npm with the following command:
```bash
npm install techcrunch-api
```
## Usage
After installation, you can use the package in your Node.js scripts as follows:
### ES6 Syntax
```javascript
import { getByCategory, getByTag } from "techcrunch-api";

// Valid categories/tags for fetching articles (must be used in lowercase):
// 1. media-entertainment
// 2. transportation
// 3. cryptocurrency
// 4. security
// 5. artificial-intelligence
// 6. apps
// 7. fintech
// 8. startups
// 9. venture
// 10. hardware

// Fetch articles by category using async/await
const fetchArticles = async () => {
  try {
    const articles = await getByCategory("security");
    console.log(articles);
  } catch (error) {
    console.error("Error fetching articles:", error);
  }
};

fetchArticles();

const fetchTag = async () => {
  try {
    const tags = await getByTag("apis");
    console.log(tags);
  } catch (error) {
    console.error("Error fetching tags:", error);
  }
};

fetchTag();
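```
## Running the Scraper
Assuming the usage example is saved as `app.js` in an ES module project (for instance with `"type": "module"` set in `package.json`, since the script uses `import`), run it with Node.js: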
```bash
node app.js
```
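Because category names must be lowercase slugs from the list in the usage example, a script such as `app.js` could validate them before any browser work starts. The following is a minimal sketch under stated assumptions: `KNOWN_CATEGORIES`, `normalizeCategory`, and `fetchCategories` are hypothetical helpers, not part of `techcrunch-api`, and the fetch function is passed in as a parameter so the sketch makes no assumption about the shape of the data the package returns.

```javascript
// Category slugs copied from the comment in the usage example.
const KNOWN_CATEGORIES = [
  "media-entertainment", "transportation", "cryptocurrency", "security",
  "artificial-intelligence", "apps", "fintech", "startups", "venture", "hardware",
];

// Hypothetical helper: trims and lowercases the input, then rejects anything
// that is not in the documented list.
function normalizeCategory(input) {
  const slug = String(input).trim().toLowerCase();
  if (!KNOWN_CATEGORIES.includes(slug)) {
    throw new Error(
      `Unknown category "${input}". Expected one of: ${KNOWN_CATEGORIES.join(", ")}`
    );
  }
  return slug;
}

// Hypothetical helper: validates every name up front, then calls the supplied
// fetcher one category at a time so only one scrape runs at once.
// A failure for one category is recorded and does not stop the others.
async function fetchCategories(names, fetcher) {
  const slugs = names.map(normalizeCategory);
  const results = {};
  for (const slug of slugs) {
    try {
      results[slug] = await fetcher(slug);
    } catch (error) {
      results[slug] = { error: error.message };
    }
  }
  return results;
}
```

With the package installed, `getByCategory` from the usage example could be passed as the fetcher, for instance `fetchCategories(["Security", "fintech"], getByCategory).then(console.log)`. Running the categories sequentially is a design choice made here to keep resource use low, not something the package requires.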