Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mirelianavioletzyra/sandbox
Playing around with some JS
- Host: GitHub
- URL: https://github.com/mirelianavioletzyra/sandbox
- Owner: mirelianavioletzyra
- Created: 2024-01-16T18:04:04.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-01-18T15:01:30.000Z (12 months ago)
- Last Synced: 2024-11-08T02:56:52.560Z (2 months ago)
- Language: JavaScript
- Size: 6.17 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
Awesome Lists containing this project
README
### How This Code Works
1. **Robots.txt Check:** The script first fetches and parses the `robots.txt` file from the target site's base URL, then checks whether the specific URL (`https://www.instructables.com/spinning-yarn/` in this case) is allowed to be scraped (see the sketch after this list).
2. **Scraping the Page:** If scraping is allowed, the script fetches the HTML content of the page with Axios.
3. **Extracting Data:** Cheerio parses the HTML and extracts the page's title and meta description.
4. **Output:** The title and description are printed to the console.
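The README does not include the script itself, so the following is a minimal sketch of the flow described above. It assumes Axios, Cheerio, and the `robots-parser` package are installed (`npm install axios cheerio robots-parser`); the variable names are illustrative, and the original script may parse `robots.txt` by hand instead of using a library.

```js
const axios = require('axios');
const cheerio = require('cheerio');
const robotsParser = require('robots-parser'); // assumption: original may parse robots.txt manually

const targetUrl = 'https://www.instructables.com/spinning-yarn/';

async function main() {
  try {
    // 1. Robots.txt check: fetch robots.txt from the site's base URL
    const base = new URL(targetUrl).origin;
    const robotsUrl = `${base}/robots.txt`;
    const robotsRes = await axios.get(robotsUrl);
    const robots = robotsParser(robotsUrl, robotsRes.data);

    if (!robots.isAllowed(targetUrl)) {
      console.log(`Scraping disallowed by robots.txt: ${targetUrl}`);
      return;
    }

    // 2. Scraping the page: fetch the HTML with Axios
    const { data: html } = await axios.get(targetUrl);

    // 3. Extracting data: parse the static HTML with Cheerio
    const $ = cheerio.load(html);
    const title = $('title').text().trim();
    const description = $('meta[name="description"]').attr('content') || '(no description)';

    // 4. Output: print the results to the console
    console.log('Title:', title);
    console.log('Description:', description);
  } catch (err) {
    // Basic error handling for network or parsing failures
    console.error('Scraping failed:', err.message);
  }
}

main();
```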
### Running the Script
To run the script, open your terminal or command prompt, navigate to the directory containing the script, and run it with Node.js using `node scriptname.js` (replace `scriptname.js` with the actual name of your file).
### Note
- **Dynamic Content:** If the page loads content dynamically with JavaScript, Cheerio will not see it, because it only parses the static HTML returned by the server. For dynamic content, you would need a tool like Puppeteer (see the sketch after this list).
- **Error Handling:** The script includes basic error handling for network requests and parsing issues. It's good practice to handle possible errors robustly in any web scraping script.
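As a point of comparison, here is a hedged sketch of how the same title/description extraction could be done with Puppeteer for pages that render content client-side (`npm install puppeteer`); the function name and options are illustrative, not part of the original script.

```js
const puppeteer = require('puppeteer');

async function scrapeDynamic(url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so dynamically injected content can render.
    await page.goto(url, { waitUntil: 'networkidle2' });

    const title = await page.title();
    const description = await page
      .$eval('meta[name="description"]', el => el.content)
      .catch(() => '(no description)');

    console.log('Title:', title);
    console.log('Description:', description);
  } finally {
    await browser.close();
  }
}

scrapeDynamic('https://www.instructables.com/spinning-yarn/');
```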