https://github.com/codedotjs/urlist
A Python script that extracts URLs from a text file containing a list of websites.
- Host: GitHub
- URL: https://github.com/codedotjs/urlist
- Owner: CodeDotJS
- License: mit
- Created: 2023-07-05T07:17:41.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-07-05T22:54:10.000Z (over 2 years ago)
- Last Synced: 2025-03-20T00:41:20.924Z (9 months ago)
- Topics: collection, extractor, python, scraping, urls
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

URLs from URL(s)
---
### Purpose
- This script reads a text file containing a list of websites, extracts all the URLs found on each site, and saves them in JSON format.
- Handles missing URL schemes and resolves relative URLs to ensure accurate results.
- Uses multithreading to concurrently process multiple websites, so it's fast!
---
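To illustrate the scheme and relative-URL handling described under Purpose, here is a minimal sketch that uses only the standard library. It is not the code from `extractor.py`: the helper names `ensure_scheme` and `extract_links` are hypothetical, the `https` default is an assumption, and `html.parser` stands in for BeautifulSoup so the example runs without extra installs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def ensure_scheme(site, default="https"):
    """Prefix a bare host such as 'example.com' with a default scheme (assumed: https)."""
    site = site.strip()
    if "://" not in site:
        site = f"{default}://{site}"
    return site


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links from html, in page order, without duplicates."""
    collector = _LinkCollector()
    collector.feed(html)
    seen = {}
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # turns '/about' into a full URL
        if urlparse(absolute).scheme in ("http", "https"):
            seen.setdefault(absolute, None)  # dict keeps first-seen order
    return list(seen)
```

For example, `extract_links(ensure_scheme("example.com"), page_html)` would resolve an `href="/about"` against `https://example.com` and skip non-web links such as `mailto:` addresses.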
### Usage
- Install the required modules
```sh
$ pip install aiohttp beautifulsoup4 fake_useragent
```
- Download the script
```sh
$ curl -OL https://raw.githubusercontent.com/CodeDotJS/urlist/master/extractor.py
```
- Run
```sh
$ python extractor.py
```
__Note:__ To save all the links in the JSON output to a text file, download the companion script `generateTxt.py`:
```sh
$ curl -OL https://raw.githubusercontent.com/CodeDotJS/urlist/master/generateTxt.py
```
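For readers who only want the idea behind the JSON-to-text step, the following is a rough sketch and not the contents of `generateTxt.py`. The layout of the JSON output is not documented here, so the sketch assumes nothing about it and simply walks any nesting of objects and arrays, keeping string values that start with `http://` or `https://`. The names `iter_urls` and `json_to_txt` and the file paths are hypothetical.

```python
import json


def iter_urls(node):
    """Yield every string value that looks like a web URL, at any nesting depth."""
    if isinstance(node, str):
        if node.startswith(("http://", "https://")):
            yield node
    elif isinstance(node, dict):
        for value in node.values():  # keys are ignored, only values are walked
            yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)


def json_to_txt(json_path, txt_path):
    """Write one URL per line to txt_path and return how many were written."""
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    urls = list(dict.fromkeys(iter_urls(data)))  # drop duplicates, keep order
    with open(txt_path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
    return len(urls)
```

A call such as `json_to_txt("urls.json", "urls.txt")` would then produce a plain list of links, assuming `urls.json` is the file written by the extractor.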
### Reason
I needed a tool to generate thousands of active URLs and dump them as JSON, so I built one.
### License
MIT