Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mirelianavioletzyra/sandbox
Playing around with some JS
- Host: GitHub
- URL: https://github.com/mirelianavioletzyra/sandbox
- Owner: mirelianavioletzyra
- Created: 2024-01-16T18:04:04.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2024-01-18T15:01:30.000Z (12 months ago)
- Last Synced: 2024-11-08T02:56:52.560Z (2 months ago)
- Language: JavaScript
- Size: 6.17 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
Awesome Lists containing this project
README
### How This Code Works
1. **Robots.txt Check:** The script first fetches and parses the `robots.txt` file from the target site's base URL, then checks whether the specific URL (`https://www.instructables.com/spinning-yarn/` in this case) is allowed to be scraped (see the sketch after this list).
2. **Scraping the Page:** If scraping is allowed, the script fetches the HTML content of the page with Axios.
3. **Extracting Data:** Cheerio parses the HTML and extracts the page's title and meta description.
4. **Output:** The title and description are printed to the console.
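The README does not include the script itself, so the following is a minimal sketch of the flow described above. It assumes Axios, Cheerio, and the `robots-parser` package are installed (`npm install axios cheerio robots-parser`); the variable names are illustrative, and the original script may parse `robots.txt` by hand instead of using a library.

```js
const axios = require('axios');
const cheerio = require('cheerio');
const robotsParser = require('robots-parser'); // assumption: original may parse robots.txt manually

const targetUrl = 'https://www.instructables.com/spinning-yarn/';

async function main() {
  try {
    // 1. Robots.txt check: fetch robots.txt from the site's base URL
    const base = new URL(targetUrl).origin;
    const robotsUrl = `${base}/robots.txt`;
    const robotsRes = await axios.get(robotsUrl);
    const robots = robotsParser(robotsUrl, robotsRes.data);

    if (!robots.isAllowed(targetUrl)) {
      console.log(`Scraping disallowed by robots.txt: ${targetUrl}`);
      return;
    }

    // 2. Scraping the page: fetch the HTML with Axios
    const { data: html } = await axios.get(targetUrl);

    // 3. Extracting data: parse the static HTML with Cheerio
    const $ = cheerio.load(html);
    const title = $('title').text().trim();
    const description = $('meta[name="description"]').attr('content') || '(no description)';

    // 4. Output: print the results to the console
    console.log('Title:', title);
    console.log('Description:', description);
  } catch (err) {
    // Basic error handling for network or parsing failures
    console.error('Scraping failed:', err.message);
  }
}

main();
```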
### Running the Script
To run the script, open your terminal or command prompt, navigate to the directory containing the script, and run it with Node.js using `node scriptname.js` (replace `scriptname.js` with the actual name of your file).
### Note
- **Dynamic Content:** If the page loads content dynamically with JavaScript, Cheerio will not see it, because it only parses the static HTML returned by the server. For dynamic content, you would need a tool like Puppeteer (see the sketch after this list).
- **Error Handling:** The script includes basic error handling for network requests and parsing issues. It's good practice to handle possible errors robustly in any web scraping script.
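As a point of comparison, here is a hedged sketch of how the same title/description extraction could be done with Puppeteer for pages that render content client-side (`npm install puppeteer`); the function name and options are illustrative, not part of the original script.

```js
const puppeteer = require('puppeteer');

async function scrapeDynamic(url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so dynamically injected content can render.
    await page.goto(url, { waitUntil: 'networkidle2' });

    const title = await page.title();
    const description = await page
      .$eval('meta[name="description"]', el => el.content)
      .catch(() => '(no description)');

    console.log('Title:', title);
    console.log('Description:', description);
  } finally {
    await browser.close();
  }
}

scrapeDynamic('https://www.instructables.com/spinning-yarn/');
```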