https://github.com/basemax/kashan-university-phone-directory

This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and other personnel from the University of Kashan. It includes tools to scrape, parse, and export data from a given HTML file into JSON format.

Host: GitHub
URL: https://github.com/basemax/kashan-university-phone-directory
Owner: BaseMax
License: mit
Created: 2025-01-03T09:18:31.000Z
Default Branch: main
Last Pushed: 2025-01-03T09:29:10.000Z
Last Synced: 2025-02-07T10:17:23.151Z
Topics: crawler, crawlers, database, html-scraper, json, kashan, kashan-university, scraper, scraper-api, scraper-html, scrapers, university, university-of-kashan
Language: HTML
Homepage:
Size: 128 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# University of Kashan Phone Directory Scraper

This repository contains a scraper and dataset for extracting and publishing the phone directory of employees and personnel from the University of Kashan. It includes tools to scrape, parse, and export data from an HTML file into JSON format.

## Features

- HTML parsing to extract structured data.
- Export of extracted data in JSON format.
- Modular and adaptable code for similar scraping tasks.
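
The first two features can be illustrated with a minimal sketch. The repository's `extract.php` is not reproduced here, so this is not its actual code: the function name `extractRows` and the assumption that each directory entry is one `<tr>` of `<th>`/`<td>` cells are hypothetical. It uses PHP's bundled DOM extension to read the table rows and `json_encode` to serialize them.

```php
<?php
// Hypothetical sketch, not the repository's extract.php.
// Assumption: each directory entry is one <tr> made of <th>/<td> cells.

function extractRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The leading XML declaration hints UTF-8 so non-ASCII (e.g. Persian) text survives.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            // Collapse runs of whitespace left over from the markup.
            $cells[] = trim(preg_replace('/\s+/u', ' ', $cell->textContent));
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Inline sample standing in for the contents of demo.html.
$sample = '<table>'
    . '<tr><th>Name</th><th>Position</th><th>Phone Number</th></tr>'
    . '<tr><td> Example User </td><td>Lecturer</td><td>123456789</td></tr>'
    . '</table>';

$rows = extractRows($sample);
echo json_encode($rows, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT), PHP_EOL;
```

To run the sketch against a real file, the inline sample could be swapped for `file_get_contents('demo.html')`. `JSON_UNESCAPED_UNICODE` keeps non-ASCII names readable in the output instead of `\uXXXX` escapes.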

## Project Structure

```
organization-phone-118
.
├── demo.html # Sample HTML data file.
├── extract.php # Script for extracting data from HTML.
├── output.json # Extracted data in JSON format.
└── load.php # Configuration and utility script.
```

## Prerequisites

- **PHP**: Version 7.4 or higher.
- **Web Server**: Optional (for example, Apache or Nginx). The extraction script runs from the command line.

## Usage

1. Clone the repository:
```bash
git clone https://github.com/BaseMax/kashan-university-phone-directory.git
cd kashan-university-phone-directory
```

2. Place the HTML file to be parsed in the root directory and name it `demo.html`.

3. Run the extraction script:

```bash
php extract.php
```

4. View the output in `output.json`:

```bash
cat output.json
```

### Output Format

The extracted data is stored in a JSON file with a structure similar to this:

```json
[
["Name", "Position", "Phone Number"],
["Example User", "Lecturer", "123456789"]
]
```

## Contribution

Contributions are welcome! Please submit issues or pull requests on the GitHub repository.

## License

This project is licensed under the MIT License.

## Disclaimer

Ensure compliance with local laws and regulations regarding the publication of personal data. Obtain permission if necessary before sharing extracted information.

## Copyright

Data source: 118 Kashan University Directory. https://118.kashanu.ac.ir/

## Author

Developed by BaseMax.