https://github.com/agoutsmedt/central_bank_database

Project to collect documents published by central banks and to clean this data to create a database of central banks' documents
https://github.com/agoutsmedt/central_bank_database

central-bank central-bank-communication database scraping

Last synced: 9 months ago
JSON representation

Project to collect documents published by central banks and to clean this data to create a database of central banks' documents

Host: GitHub
URL: https://github.com/agoutsmedt/central_bank_database
Owner: agoutsmedt
Created: 2023-11-08T12:43:40.000Z (over 2 years ago)
Default Branch: master
Last Pushed: 2024-04-05T16:01:10.000Z (over 2 years ago)
Last Synced: 2024-04-05T17:25:49.452Z (over 2 years ago)
Topics: central-bank, central-bank-communication, database, scraping
Language: R
Homepage:
Size: 33.2 KB
Stars: 4
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# The Central Bank Database

This repository gathers scripts to collect documents published by central banks and to clean this data to create a database of central banks' documents.

## Scraping scripts

You will find here all the scripts to scrap the documents published by central banks.

### scraping_bis.R

This script allows you to scrap the speeches published on the [Bank of International Settlements website](https://www.bis.org/cbspeeches/index.htm):

- It extracts the metadata of the speeches (Title, date, author, etc...)
- It cleans this metadata to identify more clearly the speaker, the central bank of the speaker, etc...
- It downloads the corresponding pdf version of the scraped speeches
- It extracts the text of these pdf versions.

## Cleaning scripts

Here are all the scripts to clean the data scraped and to merge them to build one big database of central banks communication.

## Helper scripts

This directory gathers a script of `helper_functions.R` used in various scripts. It also gathers some background scripts used for longer operation: it launches the script as a background job, notably for downloading pdf or running OCR, to avoid waiting for the end of the operation that can take hour. As we don't want to overload website by too many requests, we use slow downloading, making background jobs necessary.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/agoutsmedt/central_bank_database

Awesome Lists containing this project

README