https://github.com/appeler/clean-names
Deduplicate and parse list of `dirty names'
- Host: GitHub
- URL: https://github.com/appeler/clean-names
- Owner: appeler
- Created: 2015-01-29T01:19:04.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2020-11-04T08:28:23.000Z (over 5 years ago)
- Last Synced: 2025-09-09T09:30:19.041Z (9 months ago)
- Topics: firstname, lastname
- Language: Python
- Homepage:
- Size: 41 KB
- Stars: 23
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: ReadMe.md
### Clean Names
[Travis CI build](https://travis-ci.org/appeler/clean-names)
[AppVeyor build](https://ci.appveyor.com/project/appeler/clean-names)
The script takes a CSV file with a column 'Name' containing 'dirty names', that is, names in varying formats: lastname firstname, firstname lastname, middlename lastname firstname, etc. (see [sample input file](sample_input.csv)). It produces a CSV file that has all the columns of the original file plus the following columns: 'uniqid', 'FirstName', 'MiddleInitial/Name', 'LastName', 'RomanNumeral', 'Title', 'Suffix'. By default, the script removes duplicate names (see [sample output file](sample_output.csv)).
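As a quick sanity check on the output described above, the following minimal sketch tests whether an output CSV carries the added columns. The column names come from the description above; the helper name `missing_columns` is hypothetical and is not part of this repository.

```python
import csv

# Columns that the description above says are added to the output CSV.
ADDED_COLUMNS = [
    "uniqid", "FirstName", "MiddleInitial/Name", "LastName",
    "RomanNumeral", "Title", "Suffix",
]


def missing_columns(output_csv_path):
    """Return the added columns that are absent from the file's header row.

    Hypothetical helper for illustration; not shipped with clean-names.
    """
    with open(output_csv_path, newline="", encoding="utf-8") as handle:
        # fieldnames is None for an empty file, so fall back to an empty list.
        header = csv.DictReader(handle).fieldnames or []
    return [name for name in ADDED_COLUMNS if name not in header]
```

If the output file matches the description, `missing_columns("sample_output.csv")` should return an empty list.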
#### Application
The script was used to fix names in CF-Scores from the [Database on Ideology, Money in Politics, and Elections](http://data.stanford.edu/dime). The processed database with clean names is posted on [Harvard DVN](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28949).
#### Installation
1. Clone this repository: `git clone https://github.com/soodoku/clean-names.git`
2. Navigate to `clean-names`
3. Run `python setup.py install`
#### Using Clean Names
Usage: `process_names.py [options] <input file>`
#### Command Line Options
```
  -h, --help            show this help message and exit
  -o OUTFILE, --out=OUTFILE
                        Output file in CSV (default: sample_output.csv)
  -c COLUMN, --column=COLUMN
                        Column name in CSV that contains Names (default: Name)
  -a, --all             Export all names (do not take duplicate names out)
                        (default: False)
```
#### Example
    python process_names.py -a sample_input.csv
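To call the script from another Python program, the command line can be assembled from the documented options. This is a minimal sketch: it assumes `process_names.py` is in the current working directory and that the input file is passed as a positional argument, as in the example above. The function names `build_command` and `run_clean_names` are hypothetical.

```python
import subprocess
import sys


def build_command(input_csv, out=None, column=None, keep_all=False):
    """Assemble the argument list for process_names.py (hypothetical helper)."""
    # Assumption: process_names.py sits in the current working directory.
    cmd = [sys.executable, "process_names.py"]
    if out:
        cmd += ["-o", out]       # output CSV path
    if column:
        cmd += ["-c", column]    # name of the column holding the names
    if keep_all:
        cmd.append("-a")         # keep duplicate names
    cmd.append(input_csv)        # positional input file, as in the example
    return cmd


def run_clean_names(input_csv, **options):
    """Run the script and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(input_csv, **options), check=True)
```

For instance, `run_clean_names("sample_input.csv", keep_all=True)` would issue the same command as the example above, using the current Python interpreter.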
### License
The scripts are released under the [MIT License](https://opensource.org/licenses/MIT).