Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/sodascience/artscraper

Python package for downloading art and metadata of WikiArt and Google Arts & Culture

art download google-arts-and-culture odissei wikiart

Last synced: about 1 month ago

Host: GitHub
URL: https://github.com/sodascience/artscraper
Owner: sodascience
License: mit
Created: 2022-03-22T12:07:08.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2024-04-08T14:54:11.000Z (9 months ago)
Last Synced: 2024-04-09T15:00:08.651Z (9 months ago)
Topics: art, download, google-arts-and-culture, odissei, wikiart
Language: Python
Homepage:
Size: 1.17 MB
Stars: 6
Watchers: 3
Forks: 1
Open Issues: 5
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# ArtScraper

ArtScraper is a tool to download images and metadata for artworks available on WikiArt (www.wikiart.org/) and Google Arts & Culture (artsandculture.google.com/).

Functionality:

- `WikiArt` and `Google Arts & Culture`: Download images and metadata from a list of artwork URLs
- `Google Arts & Culture`: Download all images and metadata on the site, or from specific artists

## 1. Installation and setup

The ArtScraper package can be installed with pip, which automatically installs the Python dependencies:

```
pip install artscraper
```

## 2. Downloading art from WikiArt

To download data from WikiArt, you first need to obtain [API](https://www.wikiart.org/en/App/GetApi) keys. After obtaining them, put them in a file called `.wiki_api` in the working directory of your script. The format is: the API access key, a newline, the API secret key, and a newline, e.g.:

```
7e57a60844
3defc62d8f
```

Alternatively, when ArtScraper doesn't detect the file `.wiki_api`, it will ask for the API keys.
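If you would rather create the key file from a script, the format described above can be written as follows. This is a minimal sketch: `write_wiki_api_file` is a hypothetical helper for illustration, not part of ArtScraper.

```python
from pathlib import Path


def write_wiki_api_file(access_key, secret_key, directory="."):
    """Write a .wiki_api file: access key, newline, secret key, newline.

    Hypothetical helper for illustration; not part of ArtScraper.
    """
    key_file = Path(directory) / ".wiki_api"
    key_file.write_text(f"{access_key}\n{secret_key}\n", encoding="utf-8")
    return key_file
```

Calling `write_wiki_api_file("7e57a60844", "3defc62d8f")` with your own keys in place of these placeholders writes the file into the current working directory, which is where ArtScraper looks for it.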

An example of fetching data is shown below and in the [notebook](examples/example_artscraper.ipynb). 

```python
from artscraper import WikiArtScraper

art_url = "https://www.wikiart.org/en/edvard-munch/anxiety-1894"

with WikiArtScraper(output_dir="data") as scraper:
    scraper.load_link(art_url)
    scraper.save_metadata()
    scraper.save_image()
```

This will store both the image itself and the metadata in separate folders. If you use ArtScraper in this way, it will skip images and metadata that are already present. Remove the directory to force a redownload.
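Removing the directory can also be scripted. A minimal sketch, assuming the directory to clear is whichever folder ArtScraper created under `output_dir` for the artwork; `remove_download_dir` is a hypothetical helper, and the path you pass depends on your own layout:

```python
import shutil
from pathlib import Path


def remove_download_dir(directory):
    """Delete a download directory so the next run fetches it again.

    Returns True if a directory was removed, False if it did not exist.
    Hypothetical helper for illustration; not part of ArtScraper.
    """
    path = Path(directory)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

Run this before the scraper is called again; everything inside the directory is deleted, so point it at the narrowest folder that covers what you want to refresh.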

Results: [Anxiety (1894) by Edvard Munch on WikiArt](https://www.wikiart.org/en/edvard-munch/anxiety-1894)

## 3. Downloading art from Google Arts & Culture

To download data from Google Arts & Culture, you need to install [Firefox](https://www.mozilla.org/en-US/firefox/new/).

ArtScraper will open a new Firefox window, navigate to the image, zoom in on it, and take a screenshot. This takes a few seconds. Do not minimize the browser window, and do not let the screensaver start.

### 3.1 Downloading art from Google Arts & Culture using artwork URLs

An example of fetching data is shown below and in the [notebook](examples/example_artscraper.ipynb). 

```python
from artscraper import GoogleArtScraper

art_url = "https://artsandculture.google.com/asset/anxiety-edvard-munch/JgE_nwHHS7wTPw"

with GoogleArtScraper() as scraper:
    scraper.load_link(art_url)
    metadata = scraper.get_metadata()  # or scraper.save_metadata()
    scraper.save_image("data/anxiety_munch.jpg")
    print(metadata)
```

### 3.2 Downloading all art from Google Arts & Culture 

See [example notebook](examples/example_collect_all_artworks.ipynb).

The final structure of the results will be:

- data
  - artist_links.txt (All artists, with one URL per line)
  - Artist_1
    - description.txt (Description of artist, from Wikidata)
    - metadata.json (Metadata of artist, from Wikidata)
    - works.txt (All artworks, with one URL per line)
    - works
      - work1
        - artwork.png (Artwork image)
        - metadata.json (Metadata of artwork, from Google Arts & Culture)
      - work2
        - ...
  - Artist_2
    - ...
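Once a run has finished, the layout above can be inspected with a few lines of standard-library Python. This is a minimal sketch that relies only on the folder names listed above; `list_downloaded_works` is a hypothetical helper, not part of ArtScraper.

```python
from pathlib import Path


def list_downloaded_works(data_dir):
    """Map each artist folder name to the sorted names of its work folders.

    Assumes the layout <data_dir>/<artist>/works/<work>/ described above.
    Hypothetical helper for illustration; not part of ArtScraper.
    """
    result = {}
    for artist_dir in sorted(Path(data_dir).iterdir()):
        works_dir = artist_dir / "works"
        # Skip plain files such as artist_links.txt and artists without works.
        if not artist_dir.is_dir() or not works_dir.is_dir():
            continue
        result[artist_dir.name] = sorted(
            work.name for work in works_dir.iterdir() if work.is_dir()
        )
    return result
```

For example, `list_downloaded_works("data")` returns a dictionary keyed by artist folder name, which makes it easy to see which artists still have no downloaded works.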

A full example (but please check the [example notebook](examples/example_collect_all_artworks.ipynb) to add retries):

```python
from artscraper import GoogleArtScraper
from artscraper.find_artists import get_artist_links
# Import path assumed from the package layout; check the example notebook.
from artscraper.find_artworks import FindArtworks

output_dir = "data"
min_wait_time = 1  # Minimum wait time, passed to each scraper below

# Get links for all artists, as a list
artist_urls = get_artist_links(min_wait_time=min_wait_time,
                               output_file=f'{output_dir}/artist_links.txt')

# Find artworks for each artist
for artist_url in artist_urls:
    with FindArtworks(artist_link=artist_url, output_dir=output_dir,
                      min_wait_time=min_wait_time) as scraper:
        # Save list of artworks, the description, and metadata for an artist
        scraper.save_artist_information()
        # Find artist directory
        artist_dir = output_dir + '/' + scraper.get_artist_name()

    # Scrape artworks
    with GoogleArtScraper(artist_dir + '/' + 'works', min_wait=min_wait_time) as subscraper:
        # Get list of links to this artist's works
        with open(artist_dir + '/' + 'works.txt', 'r') as file:
            artwork_links = [line.rstrip() for line in file]
        # Download all artwork links (slow)
        for url in artwork_links:
            print(f'artwork URL: {url}')
            # Download the image and metadata for this link
            subscraper.save_artwork_information(url)
```
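Since long scraping runs can fail partway through, the retries mentioned above can be added with a small generic wrapper. The following is a minimal sketch of one way to do it, not the approach used in the example notebook; `call_with_retries` and its parameters are hypothetical names.

```python
import time


def call_with_retries(func, *args, max_attempts=3, wait_seconds=1.0, **kwargs):
    """Call func(*args, **kwargs), retrying when it raises an Exception.

    Re-raises the last exception once max_attempts is exhausted.
    Hypothetical helper for illustration; not part of ArtScraper.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as error:
            if attempt == max_attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); retrying")
            time.sleep(wait_seconds)
```

Inside the download loop, the call for each link could then be wrapped as `call_with_retries(subscraper.save_artwork_information, url)`. Catching every `Exception` is deliberately broad for a sketch; in practice you may want to retry only on the errors you expect from the browser or the network.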

## Contributing

Contributions are what make the open source community an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

Please refer to the [CONTRIBUTING](https://github.com/sodascience/artscraper/blob/main/CONTRIBUTING.md) file for more information on issues and pull requests.

## License and citation

The package `artscraper` is published under an MIT license. When using `artscraper` for academic work, please cite:

    Schram, Raoul, Mitra, Modhurita, Garcia-Bernardo, Javier, van Kesteren, Erik-Jan, de Bruin, Jonathan, & Stamkou, Eftychia. (2022).
    ArtScraper: A Python library to scrape online artworks (0.1.1). Zenodo. https://doi.org/10.5281/zenodo.7129975

## Contact

This project is developed and maintained by the [ODISSEI Social Data Science (SoDa)](https://odissei-data.nl/nl/soda/) team.

Do you have questions, suggestions, or remarks? File an issue in the issue tracker or feel free to contact the team via https://odissei-data.nl/en/using-soda/.