https://github.com/basemax/kashan-university-phone-directory
This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.
- Host: GitHub
- URL: https://github.com/basemax/kashan-university-phone-directory
- Owner: BaseMax
- License: mit
- Created: 2025-01-03T09:18:31.000Z
- Default Branch: main
- Last Pushed: 2025-01-03T09:29:10.000Z
- Last Synced: 2025-02-07T10:17:23.151Z
- Topics: crawler, crawlers, database, html-scraper, json, kashan, kashan-university, scraper, scraper-api, scraper-html, scrapers, university, university-of-kashan
- Language: HTML
- Homepage:
- Size: 128 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# University of Kashan Phone Directory Scraper
This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and personnel from the University of Kashan. It includes tools to scrape, parse, and export data from an HTML file into JSON format.
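The repository's own extraction logic lives in `extract.php` and is not reproduced here. As a rough illustration of the scrape, parse, and export flow, the following Python sketch assumes the directory is laid out as an HTML table with one person per `<tr>` row; that layout, and the names `TableRowParser` and `html_to_rows`, are assumptions made for this example, and the real markup of `demo.html` may differ.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so cell values compare cleanly.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_rows(html):
    """Return the table rows found in an HTML string as a list of lists."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Hypothetical sample markup, not taken from the real directory page.
    sample = """
    <table>
      <tr><th>Name</th><th>Position</th><th>Phone Number</th></tr>
      <tr><td>Example User</td><td>Lecturer</td><td>123456789</td></tr>
    </table>
    """
    # ensure_ascii=False keeps non-Latin names readable in the JSON output.
    print(json.dumps(html_to_rows(sample), ensure_ascii=False, indent=2))
```

The sketch uses only the Python standard library so it can be run without installing anything; a page with nested tables or irregular markup would need a more careful parser.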
## Features
- HTML parsing to extract structured data.
- Export of extracted data in JSON format.
- Modular and adaptable code for similar scraping tasks.
## Project Structure
```
organization-phone-118
.
├── demo.html # Sample HTML data file.
├── extract.php # Script for extracting data from HTML.
├── output.json # Extracted data in JSON format.
└── load.php # Configuration and utility script.
```
## Prerequisites
- **PHP**: Version 7.4 or higher.
- **Web Server**: Optional, such as Apache or Nginx.
## Usage
1. Clone the repository:
```bash
git clone https://github.com/BaseMax/kashan-university-phone-directory.git
cd kashan-university-phone-directory
```
2. Place the HTML file to be parsed in the root directory and name it `demo.html`.
3. Run the extraction script:
```bash
php extract.php
```
4. View the output in `output.json`:
```bash
cat output.json
```
### Output Format
The extracted data is written to `output.json` as a JSON array of rows, with a structure similar to this:
```json
[
["Name", "Position", "Phone Number"],
["Example User", "Lecturer", "123456789"]
]
```
### Contribution
Contributions are welcome! Please submit issues or pull requests on the GitHub repository.
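Before opening a pull request that changes the extractor, it may help to check that the regenerated data still follows the layout shown under Output Format. The Python sketch below assumes the first row is a header and every later row has the same number of fields; both points are inferred from the README's example rather than confirmed by the repository, and `rows_to_records` is a hypothetical helper name.

```python
import json


def rows_to_records(rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by the header."""
    if not rows:
        return []
    header, *data = rows
    records = []
    for index, row in enumerate(data, start=1):
        # A row with a different field count suggests the extraction went wrong.
        if len(row) != len(header):
            raise ValueError(
                f"row {index} has {len(row)} fields, expected {len(header)}"
            )
        records.append(dict(zip(header, row)))
    return records


if __name__ == "__main__":
    # Inline sample mirroring the README's Output Format example. For a real
    # check, load the file instead:
    #     with open("output.json", encoding="utf-8") as f:
    #         rows = json.load(f)
    rows = json.loads(
        '[["Name", "Position", "Phone Number"],'
        ' ["Example User", "Lecturer", "123456789"]]'
    )
    for record in rows_to_records(rows):
        print(record)
```

Keyed records are easier to filter or diff than positional rows, and the length check flags malformed rows early.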
### License
This project is licensed under the MIT License.
### Disclaimer
Ensure compliance with local laws and regulations regarding the publication of personal data. Obtain permission if necessary before sharing extracted information.
### Copyright
Data source: 118 Kashan University Directory. https://118.kashanu.ac.ir/
### Author
Developed by BaseMax.
Copyright 2024-2025, Max Base