https://github.com/abdullahashfaq-ds/earth-engine-data-scraper

A Python-based web scraper designed to extract and organize dataset metadata from the Google Earth Engine Datasets Catalog for research and analysis purposes.

beautifulsoup data data-science python requests scraper web-scraping

Last synced: 6 months ago


Host: GitHub
URL: https://github.com/abdullahashfaq-ds/earth-engine-data-scraper
Owner: abdullahashfaq-ds
License: mit
Created: 2024-03-03T10:54:03.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-20T06:11:48.000Z (9 months ago)
Last Synced: 2024-11-17T10:19:18.931Z (8 months ago)
Topics: beautifulsoup, data, data-science, python, requests, scraper, web-scraping
Language: Jupyter Notebook
Homepage:
Size: 19.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# Earth Engine Data Scraper

This repository contains a Python-based web scraper designed to extract dataset metadata from the [Google Earth Engine Datasets Catalog](https://developers.google.com/earth-engine/datasets/catalog). It can be used to gather data for research and analysis, or integrated into larger systems for exploring environmental or geospatial data.

## Features

- Scrapes dataset information from multiple pages of the Google Earth Engine Datasets Catalog.
- Extracts detailed metadata, including:
  - Dataset title
  - Availability information
  - Provider name and URL
  - Associated tags
  - Table values, when available
- Outputs the scraped data in a structured format for easy access and further analysis.
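The fields listed above map naturally onto a small record type filled in by an HTML parser. The sketch below illustrates that idea under explicit assumptions: it uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup (which this project actually uses), and the CSS class names (`dataset-title`, `availability`, `provider`, `tag`) as well as the names `DatasetRecord`, `DatasetPageParser`, and `parse_dataset_page` are hypothetical placeholders, not the real catalog markup or the notebook's code. The selectors would need to be adapted to the live pages.

```python
from dataclasses import asdict, dataclass, field
from html.parser import HTMLParser


@dataclass
class DatasetRecord:
    """Structured metadata for one catalog entry (field names are illustrative)."""
    title: str = ""
    availability: str = ""
    provider_name: str = ""
    provider_url: str = ""
    tags: list = field(default_factory=list)
    table_rows: list = field(default_factory=list)


class DatasetPageParser(HTMLParser):
    """Fills a DatasetRecord from elements carrying HYPOTHETICAL CSS classes."""

    def __init__(self):
        super().__init__()
        self.record = DatasetRecord()
        self._current = None  # (field name, tag that opened it) while inside a field
        self._row = None      # cells of the table row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "dataset-title" in classes:
            self._current = ("title", tag)
        elif "availability" in classes:
            self._current = ("availability", tag)
        elif tag == "a" and "provider" in classes:
            self._current = ("provider_name", tag)
            self.record.provider_url = attrs.get("href") or ""
        elif "tag" in classes:
            self._current = ("tags", tag)
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        # Leave a field once the element that opened it closes.
        if self._current and tag == self._current[1]:
            self._current = None
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.record.table_rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_cell and self._row is not None:
            self._row[-1] = (self._row[-1] + " " + text).strip()
        elif self._current:
            name = self._current[0]
            if name == "tags":
                self.record.tags.append(text)  # one tag per text chunk
            else:
                joined = (getattr(self.record, name) + " " + text).strip()
                setattr(self.record, name, joined)


def parse_dataset_page(html: str) -> DatasetRecord:
    """Parse one dataset page's HTML into a DatasetRecord."""
    parser = DatasetPageParser()
    parser.feed(html)
    parser.close()
    return parser.record


if __name__ == "__main__":
    sample = """
    <h1 class="dataset-title">Example Dataset</h1>
    <dl><dd class="availability">2000-01-01 to 2020-12-31</dd></dl>
    <a class="provider" href="https://example.org">Example Provider</a>
    <span class="tag">landcover</span><span class="tag">global</span>
    <table><tr><th>Name</th><th>Units</th></tr><tr><td>B1</td><td>m</td></tr></table>
    """
    record = parse_dataset_page(sample)
    print(record.tags)                      # ['landcover', 'global']
    print(record.table_rows)                # [['Name', 'Units'], ['B1', 'm']]
    print(asdict(record)["provider_url"])   # https://example.org
```

Using a dataclass keeps each entry's shape explicit, and `dataclasses.asdict` turns it into a plain dict ready for CSV, JSON, or a pandas DataFrame. With BeautifulSoup, the hand-written state tracking above would collapse into `soup.select_one(...)` and `soup.select(...)` calls against the real selectors.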

## Installation

To set up and run the scraper, follow these steps:

1. **Clone the Repository**

```bash
git clone git@github.com:abdullahashfaq-ds/Earth-Engine-Data-Scraper.git
cd Earth-Engine-Data-Scraper
```

2. **Create and Activate a Virtual Environment**

```bash
python -m venv venv

# For Windows, use:
venv\Scripts\activate

# For macOS/Linux, use:
source venv/bin/activate
```

3. **Install Dependencies**

```bash
pip install -r requirements.txt
```

4. **Run the Scraper**

The scraper logic lives in a Jupyter notebook in the `Notebooks` directory. Open it in JupyterLab or Jupyter Notebook and run its cells to start the scraping process.
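For orientation, the kind of loop a notebook cell might run to visit several catalog pages and save the results can be sketched as follows. This is an assumption-laden sketch rather than the notebook's actual code: it uses the standard library's `urllib.request` instead of `requests` so it runs without extra packages, `fetch_html`, `crawl`, and `to_csv` are hypothetical helper names, `parse` stands in for whatever per-page extraction the notebook performs, and the list of page URLs is supplied by the caller.

```python
import csv
import io
import time
from typing import Callable, Iterable
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download one page (hypothetical helper; the project itself uses requests)."""
    req = Request(url, headers={"User-Agent": "earth-engine-data-scraper-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(urls: Iterable[str],
          parse: Callable[[str], dict],
          fetch: Callable[[str], str] = fetch_html,
          delay: float = 1.0) -> list:
    """Fetch and parse each page in turn, pausing between requests."""
    rows = []
    for i, url in enumerate(urls):
        if i and delay:
            time.sleep(delay)  # be polite to the server between requests
        try:
            html = fetch(url)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"skipping {url}: {exc}")
            continue
        row = dict(parse(html))
        row["url"] = url  # remember where each record came from
        rows.append(row)
    return rows


def to_csv(rows: list) -> str:
    """Serialize rows to CSV text, using the union of their keys as columns."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Offline demo: a dict stands in for the network and len() for real parsing.
    pages = {"page-1": "<p>one</p>", "page-2": "<p>two!</p>"}
    rows = crawl(pages, parse=lambda html: {"chars": len(html)},
                 fetch=pages.__getitem__, delay=0)
    print(to_csv(rows), end="")
```

Passing `fetch` and `parse` in as arguments keeps the loop testable offline and lets the network layer be swapped (for example, for `requests.get(url).text`) without touching the crawl logic.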

## Note

If some commits show an unverified signature, don't worry: I've just misplaced my GPG key!

## License

This project is licensed under the MIT License. For more details, see the [LICENSE](LICENSE) file.
