https://github.com/gosom/google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, review count, latitude and longitude, reviews, email and more for each place.
distributed-scraper distributed-scraping golang google-maps google-maps-scraping web-scraper web-scraping


# Google maps scraper
![build](https://github.com/gosom/google-maps-scraper/actions/workflows/build.yml/badge.svg)
[![Go Report Card](https://goreportcard.com/badge/github.com/gosom/google-maps-scraper)](https://goreportcard.com/report/github.com/gosom/google-maps-scraper)

> A command line and web UI google maps scraper

---






supported by the community


Special thanks to:






Google Maps API for easy SERP scraping







CapSolver automates CAPTCHA solving for efficient web scraping. It supports reCAPTCHA V2, reCAPTCHA V3, hCaptcha, and more. With API and extension options, it's perfect for any web scraping project.


---

![Google maps scraper](https://github.com/gosom/google-maps-scraper/blob/main/banner.png)

A command-line and web-based Google Maps scraper built using the
[scrapemate](https://github.com/gosom/scrapemate) web crawling framework.

You can use this repository as is, or you can use its code as a base and
customize it to your needs.

## Try it

### Web UI:

![Example GIF](img/example.gif)

```
mkdir -p gmapsdata && docker run -v $PWD/gmapsdata:/gmapsdata -p 8080:8080 gosom/google-maps-scraper -data-folder /gmapsdata
```

Or download the [binary](https://github.com/gosom/google-maps-scraper/releases) for your platform and run it.

Note: Even with a single keyword, the results take at least 3 minutes to arrive. This is the minimum configured runtime.

Note: On macOS the docker command may not work. **HELP REQUIRED**

### Command line:

```
touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m
```

The file `results.csv` will contain the parsed results.

**If you want emails, additionally use the `-email` parameter**

## 🌟 Support the Project!

If you find this tool useful, consider giving it a **star** on GitHub.
Feel free to check out the **Sponsor** button on this repository to see how you can further support the development of this project.
Your support helps ensure continued improvement and maintenance.

## Features

- Extracts many data points from google maps
- Exports the data to CSV, JSON or PostgreSQL
- Performance: about 120 URLs per minute (-depth 1 -c 8)
- Extensible: you can write your own exporter
- Dockerized for easy running on multiple platforms
- Scalable across multiple machines
- Optionally extracts emails from the website of the business
- SOCKS5/HTTP/HTTPS proxy support
- Serverless execution via AWS Lambda functions (experimental & no documentation yet)

## Notes on email extraction

By default, email extraction is disabled.

If you enable email extraction (see Quickstart), the scraper will visit the
website of the business (if one exists) and try to extract emails from the
page.

For the moment it checks only one page of the website (the one registered in Google Maps). Support for extracting from other pages, such as about, contact, or impressum, may be added later.

Keep in mind that enabling email extraction results in longer processing time, since more
pages are scraped.

## Extracted Data Points

```
input_id
link
title
category
address
open_hours
popular_times
website
phone
plus_code
review_count
review_rating
reviews_per_rating
latitude
longitude
cid
status
descriptions
reviews_link
thumbnail
timezone
price_range
data_id
images
reservations
order_online
menu
owner
complete_address
about
user_reviews
emails
```

**Note**: `emails` is empty by default (see Quickstart)

**Note**: `input_id` is an ID that you can define per query. By default it's a UUID.
To define it yourself, use an input file like:

```
Matsuhisa Athens #!#MyIDentifier
```

## Quickstart

### Using docker:

```
touch results.csv && docker run -v $PWD/example-queries.txt:/example-queries -v $PWD/results.csv:/results.csv gosom/google-maps-scraper -depth 1 -input /example-queries -results /results.csv -exit-on-inactivity 3m
```

The file `results.csv` will contain the parsed results.

**If you want emails, additionally use the `-email` parameter**

### On your host

(tested only on Ubuntu 22.04)

```
git clone https://github.com/gosom/google-maps-scraper.git
cd google-maps-scraper
go mod download
go build
./google-maps-scraper -input example-queries.txt -results restaurants-in-cyprus.csv -exit-on-inactivity 3m
```

Be a little patient: on the first run it downloads the required libraries.

The results are written to the `results` file you specified as they arrive.

**If you want emails, additionally use the `-email` parameter**

### Command line options

Try `./google-maps-scraper -h` to see the available command line options:

```
  -aws-access-key string
        AWS access key
  -aws-lambda
        run as AWS Lambda function
  -aws-lambda-chunk-size int
        AWS Lambda chunk size (default 100)
  -aws-lambda-invoker
        run as AWS Lambda invoker
  -aws-region string
        AWS region
  -aws-secret-key string
        AWS secret key
  -c int
        sets the concurrency [default: half of CPU cores] (default 11)
  -cache string
        sets the cache directory [no effect at the moment] (default "cache")
  -data-folder string
        data folder for web runner (default "webdata")
  -debug
        enable headful crawl (opens browser window) [default: false]
  -depth int
        maximum scroll depth in search results [default: 10] (default 10)
  -dsn string
        database connection string [only valid with database provider]
  -email
        extract emails from websites
  -exit-on-inactivity duration
        exit after inactivity duration (e.g., '5m')
  -function-name string
        AWS Lambda function name
  -geo string
        set geo coordinates for search (e.g., '37.7749,-122.4194')
  -input string
        path to the input file with queries (one per line) [default: empty]
  -json
        produce JSON output instead of CSV
  -lang string
        language code for Google (e.g., 'de' for German) [default: en] (default "en")
  -produce
        produce seed jobs only (requires dsn)
  -proxies string
        comma separated list of proxies to use in the format protocol://user:pass@host:port example: socks5://localhost:9050 or http://user:pass@localhost:9050
  -results string
        path to the results file [default: stdout] (default "stdout")
  -s3-bucket string
        S3 bucket name
  -web
        run web server instead of crawling
  -writer string
        use custom writer plugin (format: 'dir:pluginName')
  -zoom int
        set zoom level (0-21) for search
```

## Using a custom writer

When the results need to be written in a custom format or to another system, such as a database or a message queue, you can use the Go plugin system.

Write a Go plugin (see an example in examples/plugins/example_writer.go)

Compile it using (for Linux):

```
go build -buildmode=plugin -tags=plugin -o ~/mytest/plugins/example_writer.so examples/plugins/example_writer.go
```

and then run the program using the `-writer` argument.

See an example:

1. Write your plugin (use the examples/plugins/example_writer.go as a reference)
2. Build your plugin `go build -buildmode=plugin -tags=plugin -o ~/myplugins/example_writer.so plugins/example_writer.go`
3. Download the latest [release](https://github.com/gosom/google-maps-scraper/releases/) or build the program
4. Run the program like `./google-maps-scraper -writer ~/myplugins:DummyPrinter -input example-queries.txt`

### Plugins and Docker

It is possible to use plugins with the docker image.
In that case, make sure that the shared library is built using a GLIBC version compatible with the docker image.
Otherwise you will encounter an error like:

```
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /plugins/example_writer.so)
```

## Using the Database Provider (PostgreSQL)

To run on your local machine:

```
docker-compose -f docker-compose.dev.yaml up -d
```

The above starts a PostgreSQL container and creates the required tables.

To access the database:

```
psql -h localhost -U postgres -d postgres
```

The password is `postgres`.

Then from your host run:

```
go run main.go -dsn "postgres://postgres:postgres@localhost:5432/postgres" -produce -input example-queries.txt --lang el
```

(configure your queries and the desired language)

This will populate the table `gmaps_jobs`.

You can then run the scraper using:

```
go run main.go -c 2 -depth 1 -dsn "postgres://postgres:postgres@localhost:5432/postgres"
```

If you have a database server and several machines you can start multiple instances of the scraper as above.

### Kubernetes

You may run the scraper in a Kubernetes cluster. This makes it easier to scale.

Assuming you have a kubernetes cluster and a database that is accessible from the cluster:

1. First populate the database as shown above
2. Create a deployment file `scraper.deployment`

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: google-maps-scraper
spec:
  selector:
    matchLabels:
      app: google-maps-scraper
  replicas: {NUM_OF_REPLICAS}
  template:
    metadata:
      labels:
        app: google-maps-scraper
    spec:
      containers:
      - name: google-maps-scraper
        image: gosom/google-maps-scraper:v0.9.3
        imagePullPolicy: IfNotPresent
        args: ["-c", "1", "-depth", "10", "-dsn", "postgres://{DBUSER}:{DBPASSWD}@{DBHOST}:{DBPORT}/{DBNAME}", "-lang", "{LANGUAGE_CODE}"]
```

Please replace the values or the command args accordingly.

Note: Keep in mind that because the application starts a headless browser, it requires CPU and memory.
Use an appropriately sized Kubernetes cluster.

## Telemetry

Anonymous usage statistics are collected for debug and improvement reasons.
You can opt out by setting the env variable `DISABLE_TELEMETRY=1`

## Performance

The expected speed with a concurrency of 8 and depth 1 is 120 jobs per minute.
Each search is 1 job plus the number of results it contains.

Based on the above:
if we have 1000 keywords to search and each returns 16 results, that is 1000 * (1 + 16) = 17000 jobs.

We expect this to take about 17000/120 ~ 142 minutes ~ 2.4 hours

If you want to scrape many keywords, it's better to use the Database Provider in
combination with Kubernetes for convenience and start multiple scrapers on more than one machine.

## References

For more instructions, you may also read the following links:

- https://blog.gkomninos.com/how-to-extract-data-from-google-maps-using-golang
- https://blog.gkomninos.com/distributed-google-maps-scraping
- https://github.com/omkarcloud/google-maps-scraper/tree/master (also a nice project) [many thanks for the idea to extract the data by utilizing the JS objects]

## Licence

This code is licenced under the MIT Licence

## Contributing

Please open an ISSUE or make a Pull Request

Thank you for considering support for the project. Every bit of assistance helps maintain momentum and enhances the scraper's capabilities!

## Notes

Please use this scraper responsibly.

The banner was generated using OpenAI's DALL-E.

## Sponsors

searchapi.com sponsors this project via GitHub Sponsors.

If you register via the links on my page, I get a commission. This is another way to support my work.