https://github.com/theritikchoure/crawlyx
Crawlyx is an open-source command-line interface (CLI) based web crawler built using Node.js. It is designed to crawl websites and extract useful information like links, images, and text. It is lightweight, fast, and easy to use.
cli command-line-tool crawler crawlyx hacktoberfest hacktoberfest-2023 hacktoberfest-accepted nodejs npmjs open-source scraper web-scraping
- Host: GitHub
- URL: https://github.com/theritikchoure/crawlyx
- Owner: theritikchoure
- License: mit
- Created: 2023-03-20T09:45:27.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-10-11T08:39:40.000Z (over 1 year ago)
- Last Synced: 2024-09-28T15:21:06.294Z (4 months ago)
- Topics: cli, command-line-tool, crawler, crawlyx, hacktoberfest, hacktoberfest-2023, hacktoberfest-accepted, nodejs, npmjs, open-source, scraper, web-scraping
- Language: JavaScript
- Homepage: http://crawlyx.js.org/
- Size: 8.2 MB
- Stars: 9
- Watchers: 3
- Forks: 1
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# Crawlyx
Crawlyx is a powerful CLI-based web crawler built using Node.js that can help you extract valuable data from websites and improve your website's SEO ranking. Whether you're a marketer, SEO professional, or web developer, Crawlyx can be an essential tool in your arsenal for website analysis, optimization, and monitoring.
With Crawlyx, you can easily crawl any website and extract data such as page titles, meta descriptions, headings, links, images, and more. You can also use Crawlyx to analyze the internal linking structure of a website, identify broken links, duplicate content, and other issues that may be hurting the SEO ranking of your website.
In addition, Crawlyx provides a custom report feature that allows you to generate detailed reports based on the data extracted from websites. You can generate reports in various output formats such as CSV, JSON, and HTML, and customize the report to include or exclude specific data fields.
With the HTML report feature, you can generate visually appealing reports that provide insights into the SEO ranking, user experience, and other aspects of a website. These reports can help you make data-driven decisions and optimize your website for better performance.
So if you want to improve your website's SEO ranking, optimize your content, and stay on top of changes to your website, Crawlyx is the tool for you. Try Crawlyx today and unleash the power of web crawling!
[![NPM](https://img.shields.io/npm/v/crawlyx.svg)](https://www.npmjs.com/package/crawlyx) [![JavaScript Style Guide](https://img.shields.io/badge/code_style-standard-brightgreen.svg)](https://standardjs.com) [![install size](https://packagephobia.com/badge?p=crawlyx)](https://packagephobia.com/result?p=crawlyx)
[![npm](https://img.shields.io/npm/dw/crawlyx?style=social)](https://www.npmjs.com/package/crawlyx)
[![NPM](https://nodei.co/npm/crawlyx.png)](https://nodei.co/npm/crawlyx/)
## Demo
![demo](https://raw.githubusercontent.com/theritikchoure/crawlyx/main/docs/assets/images/demo.gif)
## Installation
```bash
npm i -g crawlyx
```
Make sure you install it globally.
To verify the installation, open Command Prompt or Windows Terminal and run:
```bash
crawlyx --version
```
### Installation troubleshoot
If you still get an error after the global installation, you can change the PowerShell execution policy to allow running unsigned scripts. Open a terminal in VS Code or whichever IDE you use and run the following command:
```powershell
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy Unrestricted
```
## CLI Usage
Start crawling with the following command:
```bash
crawlyx
```
## Features
1. **Web Crawling:** Crawlyx can crawl any website and extract valuable data such as page titles, meta descriptions, headings, links, images, and more.
2. **SEO Analysis:** Crawlyx can analyze the internal linking structure of a website, identify broken links, duplicate content, missing tags, and other issues that may be hurting the SEO ranking of your website.
3. **Customizable Reports:** Crawlyx provides a custom report feature that allows you to generate reports in various output formats such as CSV, JSON, and HTML. You can customize the report to include or exclude specific data fields and generate visually appealing reports that provide insights into the SEO ranking, user experience, and other aspects of a website.
4. **User-Friendly CLI:** Crawlyx has a user-friendly command-line interface that makes it easy to use, even for those who are not familiar with web crawling or programming.
5. **Cross-Platform Support:** Crawlyx works on multiple platforms, including Windows, Mac, and Linux.
6. **Open-Source:** Crawlyx is an open-source project, which means that its source code is freely available for anyone to use and contribute to.
*With these features, Crawlyx can be a valuable tool for marketers, SEO professionals, web developers, and anyone who needs to extract data from websites or monitor changes to a website.*
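The customizable reports feature can be pictured with a small sketch. This is not Crawlyx's actual implementation: the `pages` records, their field names, and the `buildReport` helper are all hypothetical, chosen only to show how a report might be limited to selected fields and written as CSV or JSON.

```javascript
// Hypothetical page records; the field names are assumptions, not Crawlyx's real schema.
const pages = [
  { url: 'https://example.com/', title: 'Home', links: 12, images: 3 },
  { url: 'https://example.com/about', title: 'About "us", briefly', links: 4, images: 0 }
]

// Quote a CSV field when it contains a comma, double quote, or line break.
function csvField (value) {
  const text = String(value)
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text
}

// Build a report that contains only the requested fields, in the requested format.
function buildReport (records, fields, format) {
  const rows = records.map(record => fields.map(field => record[field] ?? ''))
  if (format === 'json') {
    const objects = rows.map(row => Object.fromEntries(row.map((value, i) => [fields[i], value])))
    return JSON.stringify(objects, null, 2)
  }
  if (format === 'csv') {
    return [fields, ...rows].map(row => row.map(csvField).join(',')).join('\n')
  }
  throw new Error(`Unsupported format: ${format}`)
}

console.log(buildReport(pages, ['url', 'title'], 'csv'))
```

Passing a different field list, such as `['url', 'links']`, excludes the other fields from the output, which is one plausible way to model the include/exclude option described above.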
## Supported operating systems
| Windows (7, 8, 10, and Server versions) | macOS (10.10 and higher) | Linux (Ubuntu, Debian, Fedora, CentOS, etc.) |
| ------ | ---- | ------- |
| ✅ | ✅ | ✅ |
## How it works
1. **Parsing the command-line arguments:** Crawlyx uses the popular commander.js library to parse the command-line arguments and options. This allows users to specify the website URL and other options.
2. **Crawling the website:** Crawlyx uses the `fetch` function and the `JSDOM` library to crawl the website and extract data such as page titles, meta descriptions, headings, links, images, and other elements. This data is stored in an internal data structure that can be processed and exported later.
3. **Analyzing the website:** Crawlyx uses various algorithms to analyze the internal linking structure of the website, identify broken links, duplicate content, missing tags, and other issues that may be hurting the SEO ranking of the website.
4. **Generating the report:** Crawlyx uses the specified output format to generate the report. This can be in CSV, JSON, or HTML format, depending on the user's choice. The report contains various data fields such as page title, meta description, headings, links, images, and other data extracted from the website.
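The crawling and analysis steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Crawlyx's source code: `extractLinks`, `crawl`, and the injected `fetchPage` function are hypothetical names, and links are pulled out with a simple regular expression to keep the sketch dependency-free, whereas Crawlyx itself is described as parsing pages with `JSDOM`.

```javascript
// Extract href values with a regex. This is a simplification for the sketch;
// a real HTML parser such as JSDOM handles far more markup variations.
function extractLinks (html, baseUrl) {
  const links = []
  for (const match of html.matchAll(/<a\s[^>]*href="([^"]*)"/gi)) {
    try {
      const url = new URL(match[1], baseUrl) // resolve relative links against the page URL
      url.hash = '' // treat /page and /page#section as the same page
      links.push(url.href)
    } catch {
      // skip hrefs that cannot be parsed as URLs
    }
  }
  return links
}

// Breadth-first crawl restricted to the start URL's host.
// `fetchPage` is a hypothetical injected function: url -> Promise<{ status, body }>.
async function crawl (startUrl, fetchPage) {
  const start = new URL(startUrl)
  const inbound = new Map([[start.href, 0]]) // url -> number of internal links pointing at it
  const statuses = new Map() // url -> HTTP status code
  const queue = [start.href]

  while (queue.length > 0) {
    const url = queue.shift()
    const { status, body } = await fetchPage(url)
    statuses.set(url, status)
    if (status >= 400) continue // broken page: record it, do not parse it
    for (const link of extractLinks(body, url)) {
      if (new URL(link).host !== start.host) continue // stay on the same site
      if (!inbound.has(link)) queue.push(link) // first sighting: schedule a visit
      inbound.set(link, (inbound.get(link) ?? 0) + 1)
    }
  }

  const broken = [...statuses].filter(([, status]) => status >= 400).map(([url]) => url)
  return { inbound, broken }
}

// In-memory stand-in for a website, so the sketch runs without network access.
const site = {
  'https://example.com/': '<a href="/about">About</a> <a href="/missing">Gone</a> <a href="https://other.example/">External</a>',
  'https://example.com/about': '<a href="/">Home</a> <a href="/missing#top">Gone</a>'
}
const fakeFetch = async url => url in site ? { status: 200, body: site[url] } : { status: 404, body: '' }

crawl('https://example.com/', fakeFetch).then(({ inbound, broken }) => {
  console.log('Inbound link counts:', Object.fromEntries(inbound))
  console.log('Broken links:', broken)
})
```

Against a live site on Node.js 18 or later, `fetchPage` could wrap the built-in `fetch`, for example `async url => { const res = await fetch(url); return { status: res.status, body: await res.text() } }`. Injecting the fetcher keeps the crawl logic testable without a network connection.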
## Contribution
**Note** - Give a ⭐ to this project
- Fork this repository (Click the Fork button in the top right of this page, click your Profile Image)
- Clone your fork down to your local machine
```bash
git clone https://github.com/your-username/crawlyx.git
```
- Create a branch
```bash
git checkout -b branch-name
```
- Make your changes (choose from any task below)
- Commit and push
```bash
git add .
git commit -m 'Commit message'
git push origin branch-name
```
- Create a new pull request from your forked repository (Click the New Pull Request button located at the top of your repo)
- Wait for your PR review and merge approval!
- Star this repository if you had fun!

For more information, please read [CONTRIBUTING.md](https://github.com/theritikchoure/crawlyx/blob/main/CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.
## Attribution
You can use this badge for attribution in your project's readme file.
[![](https://img.shields.io/badge/generated%20with-Crawlyx-%2328b76b?style=for-the-badge)](https://theritikchoure.github.io/crawlyx/docs/)
```markdown
[![](https://img.shields.io/badge/generated%20with-Crawlyx-%2328b76b?style=for-the-badge)](https://theritikchoure.github.io/crawlyx/docs/)
```
## Author
- [@theritikchoure](https://github.com/theritikchoure)
## Feedback
If you have any feedback/queries, please reach out to us at [email protected]
## License
This package is licensed under the [MIT](https://github.com/theritikchoure/crawlyx/blob/main/LICENSE) license.