Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/mirelianavioletzyra/sandbox

Playing around with some JS
https://github.com/mirelianavioletzyra/sandbox

Last synced: 10 days ago
JSON representation

Playing around with some JS

Awesome Lists containing this project

README

        

### How This Code Works

1. **Robots.txt Check:** The script first fetches and parses the `robots.txt` file from the base URL of the target website. It then checks if the specific URL (`https://www.instructables.com/spinning-yarn/` in this case) is allowed to be scraped.

2. **Scraping the Page:** If scraping is allowed, the script then fetches the HTML content of the page using Axios.

3. **Extracting Data:** Cheerio is used to parse the HTML content and extract the title and meta description of the page.

4. **Output:** The title and description are printed to the console.

### Running the Script

To run this script, follow the steps previously mentioned: open your terminal or command prompt, navigate to the directory containing the script, and run it with Node.js using the command `node scriptname.js` (replace `scriptname.js` with the actual name of your file).

### Note

- **Dynamic Content:** If the page loads content dynamically with JavaScript, Cheerio might not be able to scrape it, as it only parses static HTML content. For dynamic content, you would need a tool like Puppeteer.
- **Error Handling:** The script includes basic error handling for network requests and parsing issues. It's always good practice to robustly handle possible errors in web scraping scripts.