https://github.com/douglasrizzo/enem2012-mariadb
Scripts that import the "Exame Nacional do
- Host: GitHub
- URL: https://github.com/douglasrizzo/enem2012-mariadb
- Owner: douglasrizzo
- Created: 2015-07-18T18:20:29.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2019-12-09T19:00:38.000Z (about 5 years ago)
- Last Synced: 2024-12-30T20:19:33.121Z (25 days ago)
- Language: TSQL
- Size: 50.8 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# SQL scripts to import Enem 2012 microdata into a MariaDB/MySQL database
The Enem 2012 microdata consist of 3 huge CSV files whose structure is optimized for import into IBM SPSS. I don't have that program, so I wrote a set of scripts that normalize the data and import it into a MariaDB/MySQL database, and that generate dichotomous (correct/incorrect) responses for all 4 tests.
Of particular importance: although students take 4 tests (natural sciences, human sciences, literature, mathematics) over the span of 2 days, there are actually 25 different test kits. The kits differ only in the order of the items. I manually tracked down where each item appears in each test kit and reordered the students' answers accordingly, so that all answers to a given item end up in the same column.
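To illustrate the reordering idea, here is a minimal Python sketch. It is not the code in `import.sql`. The kit names, the `KIT_ITEM_ORDER` mapping, and the assumption that each student's answers arrive as a single string with one character per booklet position are all hypothetical, chosen only to keep the example short.

```python
# Hypothetical mapping: for each test kit, the canonical item id printed at
# each booklet position. Real kits contain many more items than this.
KIT_ITEM_ORDER = {
    "kit_a": [1, 2, 3],
    "kit_b": [3, 1, 2],
}


def reorder_answers(kit, answers):
    """Return the answers rearranged into canonical item order."""
    order = KIT_ITEM_ORDER[kit]
    if len(answers) != len(order):
        raise ValueError("answer string length does not match the kit")
    canonical = [""] * len(order)
    for position, item_id in enumerate(order):
        # The answer given at this booklet position belongs to item_id.
        canonical[item_id - 1] = answers[position]
    return "".join(canonical)
```

After this step, position `i` of every returned string refers to the same item regardless of which kit the student received, which is what makes it possible to store all answers to one item in one column.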
## The scripts
- `import.sql`: imports the data into a relational database, which makes it easier to query, and normalizes it into multiple tables to avoid redundancy.
- `binary_responses.sql`: reads the tables created by `import.sql` and dichotomizes the item responses, creating one CSV file for each of the 4 tests applied in Enem 2012. The files are generated from a random sample of 50000 students, and a different sample is drawn for each of the 4 tests.
- `import_treated.sql`: if the tables created by `import.sql` are ever exported to CSV files, this script imports them into a new database, which takes considerably less time than importing the raw microdata.
## Instructions
1. Download the 2012 Enem data, available [here](http://portal.inep.gov.br/microdados), and unzip it. If the link is broken, I suggest searching for it, as its location has changed quite a few times over the years.
2. In a MariaDB database (or MySQL, which is untested), run the `import.sql` script.
3. _(Optional)_ execute `binary_responses.sql` to generate the CSV files with dichotomous responses.
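
For readers unfamiliar with the term, the optional step can be pictured with the following Python sketch of dichotomizing responses and sampling students. It is only an illustration under assumptions, not what `binary_responses.sql` actually does: the function names, the answer key, and the use of a seed are hypothetical.

```python
import random


def dichotomize(answers, answer_key):
    """Score each item as 1 if it matches the key and 0 otherwise."""
    if len(answers) != len(answer_key):
        raise ValueError("answers and answer key differ in length")
    # Blank or invalid marks never equal the key, so they score 0.
    return [int(given == correct) for given, correct in zip(answers, answer_key)]


def sample_students(student_ids, size, seed):
    """Draw a random sample without replacement, reproducible via the seed."""
    rng = random.Random(seed)
    return rng.sample(student_ids, min(size, len(student_ids)))
```

For example, `dichotomize("ABCDE", "ABDDA")` gives `[1, 1, 0, 1, 0]`. Calling `sample_students` once per test with a different seed is one way to obtain a separate sample for each of the 4 tests.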