{"id":21524179,"url":"https://github.com/alcidesrc/scraping-correos-with-php","last_synced_at":"2026-02-11T18:09:17.403Z","repository":{"id":254731508,"uuid":"844890637","full_name":"AlcidesRC/scraping-correos-with-php","owner":"AlcidesRC","description":"Obtaining the Spanish postal codes via web scrapping using PHP ","archived":false,"fork":false,"pushed_at":"2024-08-25T17:10:27.000Z","size":33,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-24T20:37:36.961Z","etag":null,"topics":["csv","csv-files","docker","php","scraper","scraping-websites"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AlcidesRC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-20T07:02:37.000Z","updated_at":"2024-09-06T12:39:55.000Z","dependencies_parsed_at":"2024-08-25T18:28:41.311Z","dependency_job_id":"e144bb66-5505-4db2-91b8-ca7dd772c004","html_url":"https://github.com/AlcidesRC/scraping-correos-with-php","commit_stats":null,"previous_names":["alcidesrc/scraping-correos-with-php"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AlcidesRC/scraping-correos-with-php","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlcidesRC%2Fscraping-correos-with-php","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlcidesRC%2Fscraping-correos-with-php/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlcidesRC%2Fscraping-correos-with-php/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlcidesRC%2Fscraping-correos-with-php/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AlcidesRC","download_url":"https://codeload.github.com/AlcidesRC/scraping-correos-with-php/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AlcidesRC%2Fscraping-correos-with-php/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29340497,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T16:14:43.024Z","status":"ssl_error","status_checked_at":"2026-02-11T16:14:15.258Z","response_time":97,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","csv-files","docker","php","scraper","scraping-websites"],"created_at":"2024-11-24T01:21:23.771Z","updated_at":"2026-02-11T18:09:17.366Z","avatar_url":"https://github.com/AlcidesRC.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scraping Correos with PHP\n\n[TOC]\n\n\u003e [!TIP]\n\u003e\n\u003e This `Markdown` document may contain [Mermaid](https://mermaid.js.org/) diagrams, so consider installing [Typora](https://typora.io/) to read and manage `Markdown` files without missing any advanced features. 
\n\n\n\n## Summary\n\nThis repository contains a web scraper that builds a database of Spanish postal codes using data from [Sociedad Estatal de Correos y Telégrafos](https://www.correos.es/).\n\nThe application is built with **PHP + Guzzle** and sends concurrent requests to improve performance.\n\n\n\n------\n\n\n\n## Technical Requirements\n\n| Tool   | Required/Recommended | Description                                  |\n| ------ | -------------------- | -------------------------------------------- |\n| Git    | Required             | To interact with the VCS repository          |\n| Docker | Required             | To manage the development environment        |\n| Make   | Recommended          | To interact with the development environment |\n\n### Available Commands\n\n```bash\n╔══════════════════════════════════════════════════════════════════════════════╗\n║                                                                              ║\n║                           .: AVAILABLE COMMANDS :.                           ║\n║                                                                              ║\n╚══════════════════════════════════════════════════════════════════════════════╝\n\n· show-context                   Setup: show context\n· build                          Docker: builds the service\n· up                             Docker: starts the service\n· restart                        Docker: restarts the service\n· down                           Docker: stops the service\n· logs                           Docker: exposes the service logs\n· bash                           Docker: establish a bash session into main container\n· init                           Application: initializes the scrape process\n```\n\n\n\n------\n\n\n\n## Analysis\n\nPostal codes in Spain were created on July 1, 1984, when the [Sociedad Estatal de Correos y Telégrafos](https://www.correos.es/) introduced automated mail sorting. 
\n\nThis company, also known as Correos, provides [a web form for looking up postal codes](https://www.correos.es/es/en/tools/codigos-postales/details). Opening the browser's *Developer Tools / Network* tab while using the form shows that it sends a background request to retrieve postal code suggestions.\n\n\n\n\u003e [!IMPORTANT]\n\u003e\n\u003e This application uses that *endpoint* to check whether a postal code exists.\n\n\n\n### Postal Codes\n\nA postal code **consists of a five-digit number between 01000..52999**, where the first two digits (01..52) correspond to one of the 50 provinces of Spain or to one of the two Spanish autonomous cities on the African coast. 
The last three digits (000..999) identify the individual postal codes within that province.\n\n#### Examples\n\n| Province  | Province ID | Last three digits | Possible postal codes |\n| --------- | ----------- | ----------------- | --------------------- |\n| Barcelona | 08          | 000..999          | 08000..08999          |\n| Málaga    | 29          | 000..999          | 29000..29999          |\n| Ceuta     | 51          | 000..999          | 51000..51999          |\n\n\n\n\u003e [!TIP]\n\u003e\n\u003e You can find the list of Spanish provinces and their corresponding codes at the [Instituto Nacional de Estadística](https://www.ine.es/en/daco/daco42/codmun/cod_provincia_en.htm).\n\n\n\n### Endpoint\n\n#### Valid Requests\n\n```text\nGET https://api1.correos.es/digital-services/searchengines/api/v1/suggestions?text=08001\n```\n\n##### Response\n\nHTTP 200 OK\n\n```json\n{\n  \"suggestions\": [\n    {\n      \"text\": \"08001, Barcelona, Barcelona, Cataluña, ESP\",\n      \"longitude\": 2.1686990270000592,\n      \"latitude\": 41.380160001000036\n    }\n  ]\n}\n```\n\n#### Invalid Requests\n\n```text\nGET https://api1.correos.es/digital-services/searchengines/api/v1/suggestions?text=52999\n```\n\n##### Response\n\nHTTP 200 OK\n\n```json\n{\n  \"code\": \"404\",\n  \"message\": \"Not Found\",\n  \"moreInformation\": {\n    \"description\": \"Not results found.\",\n    \"link\": \"www.correos.es\"\n  }\n}\n```\n\nThe status line is `200 OK` in both cases: a non-existent postal code is signalled only by the `code` and `message` fields of the JSON body, so the payload must be inspected rather than the HTTP status.\n\n\n\n------\n\n\n\n## Implementation\n\nFor each province, this application generates every possible postal code (for example, 08000..08999 for Barcelona) and checks whether it exists using the endpoint described above.\n\n\n\n\u003e [!IMPORTANT]\n\u003e\n\u003e If the response is valid, the postal code details are stored in a CSV file for easy processing.\n\n\n\n\u003e [!IMPORTANT]\n\u003e\n\u003e The scrape process runs as a PHPUnit test suite: tests allow long executions without a timeout, and they also allow the imported data to be validated on each iteration. 
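
A minimal sketch of the scrape loop described in the Implementation section, in plain PHP. It is not the project's actual code: the function names (`candidatePostalCodes`, `suggestionsUrl`, `parseSuggestion`, `toCsv`) and the four-column CSV layout are hypothetical. The HTTP layer (Guzzle with concurrent requests in this project) is deliberately left out so the helpers can run offline, and it assumes a suggestion matches when its `text` starts with the requested postal code followed by a comma.

```php
<?php

// Hypothetical helpers sketching the scrape loop; not the project's actual code.
// The HTTP layer (Guzzle with concurrent requests in the project) is omitted.

const SUGGESTIONS_ENDPOINT = 'https://api1.correos.es/digital-services/searchengines/api/v1/suggestions';

/**
 * Every candidate postal code for a two-digit province ID, e.g. "08" -> "08000".."08999".
 *
 * @return list<string>
 */
function candidatePostalCodes(string $provinceId): array
{
    $id = (int) $provinceId;
    if (preg_match('/^\d{2}\z/', $provinceId) !== 1 || $id < 1 || $id > 52) {
        throw new InvalidArgumentException("Invalid province ID: {$provinceId}");
    }

    $codes = [];
    for ($suffix = 0; $suffix <= 999; $suffix++) {
        // %03d zero-pads the last three digits, so 8 becomes "008".
        $codes[] = sprintf('%s%03d', $provinceId, $suffix);
    }

    return $codes;
}

/** URL of the suggestions request for one candidate postal code. */
function suggestionsUrl(string $postalCode): string
{
    return SUGGESTIONS_ENDPOINT . '?' . http_build_query(['text' => $postalCode]);
}

/**
 * Turns a response body into a CSV row, or null when the postal code does not exist.
 * The endpoint answers HTTP 200 either way, so only the JSON payload is inspected.
 *
 * @return array{0: string, 1: string, 2: float, 3: float}|null
 */
function parseSuggestion(string $postalCode, string $body): ?array
{
    $data = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

    foreach ($data['suggestions'] ?? [] as $suggestion) {
        // Assumption: a match is a suggestion whose text starts with "<code>,".
        if (str_starts_with($suggestion['text'] ?? '', $postalCode . ',')) {
            return [
                $postalCode,
                $suggestion['text'],
                (float) $suggestion['latitude'],
                (float) $suggestion['longitude'],
            ];
        }
    }

    // e.g. {"code": "404", "message": "Not Found", ...}
    return null;
}

/** Serialises rows as CSV text (a real run would write to province-XX.csv instead). */
function toCsv(array $rows): string
{
    $handle = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // An explicit empty escape character disables PHP's non-standard CSV escaping.
        fputcsv($handle, $row, ',', '"', '');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);

    return $csv;
}
```

In a full run, each URL returned by `suggestionsUrl()` would be fetched (for example through a Guzzle request pool), each body passed to `parseSuggestion()`, and the non-null rows written to the province's CSV file. Keeping parsing separate from fetching is what makes the per-iteration validation described above easy to test.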
\n\n\n\n------\n\n\n\n## Getting Started\n\nJust clone the repository into your preferred path:\n\n```bash\n$ mkdir -p ~/path/to/my-new-project \u0026\u0026 cd ~/path/to/my-new-project\n$ git clone git@github.com:alcidesrc/scraping-correos-with-php.git .\n```\n\n### Start the scrape process\n\n```bash\n$ make init\n\n ℹ  Stopping the service \n\n[+] Running 2/2\n ✔ Container correos-app-run-7e80add999ab  Removed           10.2s \n ✔ Network correos_default                 Removed           0.2s \n\n ✓  Task done!\n\n\n ℹ  Building the image \n\n[+] Building 1.1s (24/24) FINISHED                           docker:default\n =\u003e [app internal] load build definition from Dockerfile     0.0s\n =\u003e =\u003e transferring dockerfile: 4.76kB                       0.0s\n =\u003e [app internal] load .dockerignore                                                                                               ...\n =\u003e =\u003e naming to docker.io/library/correos:dev               0.0s\n\n ✓  Task done!\n\n\n ℹ  Installing PHP dependecies... \n\ndocker compose run --rm --user 1000:1000 app composer install\n[+] Creating 1/1\n ✔ Network correos_default  Created                          0.1s \nInstalling dependencies from lock file (including require-dev)\nVerifying lock file contents can be installed on current platform.\nNothing to install, update or remove\nGenerating optimized autoload files\n32 packages you are using are looking for funding.\nUse the `composer fund` command to find out more!\n\n ✓  Task done!\n\n\n ℹ  Generating CSV files... \n \nPHPUnit 11.3.1 by Sebastian Bergmann and contributors.\n\nRuntime:       PHP 8.3.10 with PCOV 1.0.11\nConfiguration: /code/phpunit.xml\nRandom Seed:   2453001523\n\n.......................................................           
55 / 55 (100%)\n\nTime: 18:15.756, Memory: 14.00 MB\n\nCorreos (Tests\\Unit\\Importers\\Correos)\n ✔ Check exception is raised with wrong province\n ✔ Scrape province with ALBACETE\n ✔ Scrape province with ÁVILA\n ✔ Scrape province with CÓRDOBA\n ✔ Scrape province with ZARAGOZA\n ✔ Scrape province with MELILLA\n ✔ Scrape province with HUESCA\n ✔ Scrape province with RIOJA,·LA\n ✔ Scrape province with BIZKAIA\n ✔ Scrape province with CORUÑA,·A\n ✔ Scrape province with BURGOS\n ✔ Scrape province with TOLEDO\n ✔ Scrape province with CASTELLÓN/CASTELLÓ\n ✔ Scrape province with MADRID\n ✔ Scrape province with LLEIDA\n ✔ Scrape province with BADAJOZ\n ✔ Scrape province with ARABA/ÁLAVA\n ✔ Scrape province with GUADALAJARA\n ✔ Scrape province with CEUTA\n ✔ Scrape province with BARCELONA\n ✔ Scrape province with BALEARS,·ILLES\n ✔ Scrape province with VALLADOLID\n ✔ Scrape province with PALMAS,·LAS\n ✔ Scrape province with TERUEL\n ✔ Scrape province with ALICANTE/ALACANT\n ✔ Scrape province with SEVILLA\n ✔ Scrape province with CUENCA\n ✔ Scrape province with SALAMANCA\n ✔ Scrape province with GRANADA\n ✔ Scrape province with NAVARRA\n ✔ Scrape province with GIRONA\n ✔ Scrape province with SEGOVIA\n ✔ Scrape province with ALMERÍA\n ✔ Scrape province with SORIA\n ✔ Scrape province with MÁLAGA\n ✔ Scrape province with ZAMORA\n ✔ Scrape province with SANTA·CRUZ·DE·TENERIFE\n ✔ Scrape province with GIPUZKOA\n ✔ Scrape province with TARRAGONA\n ✔ Scrape province with LEÓN\n ✔ Scrape province with CÁDIZ\n ✔ Scrape province with HUELVA\n ✔ Scrape province with JAÉN\n ✔ Scrape province with CANTABRIA\n ✔ Scrape province with ASTURIAS\n ✔ Scrape province with CÁCERES\n ✔ Scrape province with CIUDAD·REAL\n ✔ Scrape province with PONTEVEDRA\n ✔ Scrape province with LUGO\n ✔ Scrape province with VALENCIA/VALÈNCIA\n ✔ Scrape province with PALENCIA\n ✔ Scrape province with MURCIA\n ✔ Scrape province with OURENSE\n ✔ Validate first postal code from specific provinces with ARABA/ÁLAVA\n 
 ✔ Validate first postal code from specific provinces with ALBACETE\n\nOK (55 tests, 110 assertions)\n\nGenerating code coverage report in HTML format ... done [00:00.015]\n\n\nCode Coverage Report:    \n  2024-08-20 12:36:15    \n                         \n Summary:                \n  Classes: 50.00% (1/2)  \n  Methods: 83.33% (5/6)  \n  Lines:   97.18% (69/71)\n\nApp\\CsvHandler\n  Methods:  50.00% ( 1/ 2)   Lines:  88.89% ( 16/ 18)\nApp\\Importers\\Correos\n  Methods: 100.00% ( 4/ 4)   Lines: 100.00% ( 53/ 53)\n\n ✓  Task done!\n```\n\n\n\n\u003e [!TIP]\n\u003e\n\u003e CSV files are stored at `./src/output/province-XX.csv` for easy processing.\n\n \n\n\u003e [!IMPORTANT]\n\u003e\n\u003e Here you can find [a Gist with all Spanish postal codes](https://gist.github.com/AlcidesRC/14f80f7842acc91e14c11dc22b52d177) combined into a single file.\n\n\n\n------\n\n\n\n\n## Security Vulnerabilities\n\nPlease review our security policy on how to report security vulnerabilities:\n\n**PLEASE DON'T DISCLOSE SECURITY-RELATED ISSUES PUBLICLY**\n\n### Supported Versions\n\nOnly the latest major version receives security fixes.\n\n### Reporting a Vulnerability\n\nIf you discover a security vulnerability within this project, please [open an issue here](https://github.com/AlcidesRC/scraping-correos-with-php/issues). All security vulnerabilities will be promptly addressed.\n\n\n------\n\n\n\n## License\n\nThe MIT License (MIT). Please see the [LICENSE](./LICENSE) file for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falcidesrc%2Fscraping-correos-with-php","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falcidesrc%2Fscraping-correos-with-php","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falcidesrc%2Fscraping-correos-with-php/lists"}