https://github.com/dav009/congresovisible

Data dumps of Colombian Senate votes

colombia open-data scraper


Host: GitHub
URL: https://github.com/dav009/congresovisible
Owner: dav009
Created: 2014-11-05T23:47:38.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2014-11-15T20:48:10.000Z (almost 11 years ago)
Last Synced: 2025-02-06T03:32:19.553Z (8 months ago)
Topics: colombia, open-data, scraper
Language: Python
Homepage:
Size: 5.62 MB
Stars: 3
Watchers: 4
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md


# Dumps congresovisible.org

[Congresovisible.org](http://www.congresovisible.org) is a great project that provides information about:

- Colombian bills (law projects)
- How those bills were voted on
- Votes cast by Senators and Representatives

Sadly, the site doesn't provide an API for this valuable information, so this repo provides:

- Code to scrape the website and extract that information
- Data dumps (in JSON format)

## How the data is structured

### JSON Dump

Every line of the JSON dump is a JSON dictionary representing one voting event. Each event contains the following data (pretty-printed here for readability; in the dump it sits on a single line):

```json
{
  "camara": "Cámara de Representantes",
  "estado": "aprobado",
  "id": 3014,
  "ano": "2014",
  "mes_dia": "Sep 03",
  "desacuerdo": "1%",
  "comisiones": "",
  "acuerdo": "99%",
  "procedimiento": "Descripcion proyecto de ley",
  "detailed": {
    "Álvaro Uribe": {"party": "Centro Democratico", "vote": "Aprobado"},
    ....
    ....
  }
}
```

- `camara`: the chamber of Congress that voted
- `id`: the Congresovisible.org database identifier
- `ano`: the year in which the vote took place
- `mes_dia`: the month and day on which the vote took place
- `detailed`: a dictionary whose keys are politicians' names and whose values are JSON objects describing each politician's party and vote

Each line of the file should be a parsable JSON object.
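Because each line is an independent JSON object, the dump can be read one line at a time. The following is a minimal sketch in Python that relies only on the fields shown above. The function names `parse_events` and `tally_votes` are hypothetical helpers, not part of this repo, and the sample record is made up for illustration.

```python
import json
from collections import Counter


def parse_events(lines):
    """Yield one voting event (a dict) per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def tally_votes(event):
    """Count how many politicians cast each kind of vote in one event."""
    return Counter(info["vote"] for info in event["detailed"].values())


# Illustrative record using the field names from the example above;
# the politicians and parties here are placeholders, not real data.
sample = [
    '{"id": 3014, "camara": "Cámara de Representantes", "detailed": '
    '{"A": {"party": "X", "vote": "Aprobado"}, '
    '"B": {"party": "Y", "vote": "Aprobado"}}}'
]
events = list(parse_events(sample))
print(events[0]["id"], dict(tally_votes(events[0])))
```

With a real dump, an open file handle can be passed in place of `sample`, since iterating over a text file yields its lines.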

### TSV Data

The TSV data is split into two files:

- `votes.tsv`: contains the votes of politicians in sessions; each session is an identifier referencing a session description in `sessions.tsv`
- `sessions.tsv`: contains each session's description, date, and legislature.
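Since `votes.tsv` refers to sessions by identifier, using the two files together means joining them on that identifier. The sketch below shows one way to do that with Python's standard `csv` module. The column names (`session_id`, `politician`, `vote`, `date`, `legislature`) and the presence of a header row are assumptions made for illustration; check the actual files and adjust the `key` argument accordingly.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def join_votes_with_sessions(votes, sessions, key="session_id"):
    """Attach the matching session description to every vote row."""
    by_id = {row[key]: row for row in sessions}
    joined = []
    for vote in votes:
        session = by_id.get(vote[key], {})
        # Vote columns take precedence if both files share a column name.
        joined.append({**session, **vote})
    return joined


# Tiny in-memory stand-ins for votes.tsv and sessions.tsv.
# The column names are assumed, not taken from the real files.
votes_tsv = "session_id\tpolitician\tvote\n3014\tA\tAprobado\n"
sessions_tsv = (
    "session_id\tdate\tlegislature\n"
    "3014\t2014-09-03\tCámara de Representantes\n"
)

votes = read_tsv(io.StringIO(votes_tsv))
sessions = read_tsv(io.StringIO(sessions_tsv))
joined = join_votes_with_sessions(votes, sessions)
print(joined[0])
```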

# How to use it?

- If you just want to use the data, clone this repo, go to the `dumps` folder, and pick the file you need.

- If you want to generate a new dump:

  1. Create a virtualenv with Python 3.4
  2. `pip install -r requirements.txt`
  3. `python main.py`

## Examples

### Clustering Senators

![](https://d262ilb51hltx0.cloudfront.net/max/2000/1*EMhjnbqtFA5Qjf8wBWo54w.png)

`clustering.r`:

- Set your working directory to the clustering sample folder:
```
setwd("path...to..repo/congresovisible/samples/senators_clustering/")
```

- Run the clustering with `source("clustering.r")`

- Note: install the required R packages first.
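The contents of `clustering.r` are not reproduced here. As a rough illustration of the kind of input a clustering step needs, the Python sketch below turns events from the JSON dump into a per-politician vote table and computes how often two politicians voted alike. This is not the method used in `clustering.r`: the helper names `vote_table` and `agreement` are hypothetical, and the sample events and vote labels are made up.

```python
from itertools import combinations


def vote_table(events):
    """Map politician -> {event id -> vote} from the `detailed` field."""
    table = {}
    for event in events:
        for name, info in event["detailed"].items():
            table.setdefault(name, {})[event["id"]] = info["vote"]
    return table


def agreement(votes_a, votes_b):
    """Share of events both politicians voted in where their votes match."""
    shared = votes_a.keys() & votes_b.keys()
    if not shared:
        return None  # no overlapping events, so agreement is undefined
    same = sum(1 for eid in shared if votes_a[eid] == votes_b[eid])
    return same / len(shared)


# Made-up events following the structure of the JSON dump.
events = [
    {"id": 1, "detailed": {"A": {"party": "X", "vote": "Aprobado"},
                           "B": {"party": "Y", "vote": "Aprobado"}}},
    {"id": 2, "detailed": {"A": {"party": "X", "vote": "Aprobado"},
                           "B": {"party": "Y", "vote": "Negado"}}},
]
table = vote_table(events)
for a, b in combinations(sorted(table), 2):
    print(a, b, agreement(table[a], table[b]))
```

A pairwise agreement score like this can serve as a similarity measure for a clustering algorithm, with one minus the score acting as a distance.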

## Contact

dav.alejandro@gmail.com
