{"id":24360410,"url":"https://github.com/dms-codes/scrape_dosen_unair","last_synced_at":"2025-07-24T03:06:52.911Z","repository":{"id":199617962,"uuid":"703329323","full_name":"dms-codes/scrape_dosen_unair","owner":"dms-codes","description":"Web Scraping with Python This Python script is used for web scraping data from the Universitas Airlangga (Unair) faculty directory. The script fetches faculty URLs, generates pages URLs for each faculty, and extracts information from lecturer pages. The collected data is then saved to a CSV file.","archived":false,"fork":false,"pushed_at":"2023-10-11T03:37:00.000Z","size":47,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-12T08:34:49.558Z","etag":null,"topics":["beautifulsoup4","docent","python","requests","webscraper","webscraping"],"latest_commit_sha":null,"homepage":"https://github.com/dms-codes/scrape_dosen_unair","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dms-codes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-10-11T03:35:44.000Z","updated_at":"2023-10-11T11:26:45.000Z","dependencies_parsed_at":"2023-10-11T08:50:33.056Z","dependency_job_id":null,"html_url":"https://github.com/dms-codes/scrape_dosen_unair","commit_stats":null,"previous_names":["dms-codes/scrape_dosen_unair"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dms-codes/scrape_dosen_unair","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dms-codes%2Fscrape_dosen_unair","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d
ms-codes%2Fscrape_dosen_unair/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dms-codes%2Fscrape_dosen_unair/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dms-codes%2Fscrape_dosen_unair/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dms-codes","download_url":"https://codeload.github.com/dms-codes/scrape_dosen_unair/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dms-codes%2Fscrape_dosen_unair/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266786798,"owners_count":23983871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-24T02:00:09.469Z","response_time":99,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup4","docent","python","requests","webscraper","webscraping"],"created_at":"2025-01-18T21:19:43.368Z","updated_at":"2025-07-24T03:06:52.670Z","avatar_url":"https://github.com/dms-codes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping with Python\n\nThis Python script is used for web scraping data from the Universitas Airlangga (Unair) faculty directory. The script fetches faculty URLs, generates pages URLs for each faculty, and extracts information from lecturer pages. 
The collected data is then saved to a CSV file.\n\n## Prerequisites\n\nBefore running the script, install the required packages with pip:\n\n```\npip install requests beautifulsoup4\n```\n\n## Usage\n\n1. Update the `BASE_URL` constant with the URL of the Unair faculty directory.\n2. Set the desired `TIMEOUT` value for requests.\n3. Run the script using Python:\n\n```\npython your_script_name.py\n```\n\n## Script Explanation\n\n- `extract_text`: Extracts and cleans the text of an element.\n\n- `extract_faculties`: Extracts faculty URLs from the main page.\n\n- `extract_pages`: Generates page URLs for each faculty.\n\n- `extract_dosen_pages`: Extracts lecturer information from each lecturer's page.\n\n- The script initializes a session and sends a GET request to the specified `BASE_URL`.\n\n- It extracts faculty URLs, generates page URLs for each faculty, and extracts information from lecturer pages.\n\n- The collected data is saved to a CSV file named `data_dosen_unair.csv`.\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n- [Requests library](https://docs.python-requests.org/en/master/)\n- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdms-codes%2Fscrape_dosen_unair","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdms-codes%2Fscrape_dosen_unair","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdms-codes%2Fscrape_dosen_unair/lists"}