https://github.com/ajwad-shaikh/ibmocha
A Caffeinated Solution To Privacy - Hack Solution for Abhikalpan 2k19 - IBM Hackathon | IBM Watson NLU
- Host: GitHub
- URL: https://github.com/ajwad-shaikh/ibmocha
- Owner: ajwad-shaikh
- License: mit
- Created: 2019-03-02T06:56:13.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-09T14:27:29.000Z (almost 3 years ago)
- Last Synced: 2025-04-04T09:26:39.551Z (6 months ago)
- Topics: aadhaar, data-representation, ejs, gdpr, hacktoberfest, hacktoberfest2019, ibm-bluemix, ibm-cloud, ibm-watson, node-js, privacy, redaction, sensitive-data-discovery, sensitive-data-exposure, sensitive-word-filter, watson-api, watson-natural-language, watson-nlu, watson-nlu-api
- Language: HTML
- Homepage:
- Size: 1.8 MB
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# IBMocha
### A Caffeinated Solution To Privacy
```
Modern Problems Require Modern Solutions
```
In the race to big data solutions and data-driven analytics, it is important to preserve the privacy of the information source as data propagates into the loops of the Internet.
IBMocha is a hack built on IBM Watson NLU that uses cloud-hosted machine learning to find and redact sensitive information published on the Internet.
If you are a web-admin, you can use this code to look for potential exposure of private data on your pages. This can help you screen your website for possible **GDPR Violations**.
IBMocha is also modelled to target the recent [exposure of Aadhaar card data](https://github.com/fs0c131y/AadhaarSearchEngine) that exploited search engine crawlers.
### Exposures Identified
1. Individual Names
2. Locations
3. Email Addresses
4. Phone Numbers
5. Aadhaar Numbers (primitive check, XXXX-XXXX-XXXX format only)
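The pattern-based items in this list (email addresses, phone numbers, Aadhaar numbers) can be illustrated with a minimal sketch. The regular expressions and the `findExposures` helper below are assumptions made for illustration, not the rules the project actually ships; names and locations are left to the NLU service.

```javascript
// Hypothetical patterns for illustration; the project's real rules may differ.
const PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  // Assumption: a phone number is any standalone run of exactly 10 digits.
  phone: /\b\d{10}\b/g,
  // Aadhaar numbers in the XXXX-XXXX-XXXX format only.
  aadhaar: /\b\d{4}-\d{4}-\d{4}\b/g,
};

// Return every match with its type and position, ordered by position.
function findExposures(text) {
  const found = [];
  for (const [type, pattern] of Object.entries(PATTERNS)) {
    for (const match of text.matchAll(pattern)) {
      found.push({ type, value: match[0], index: match.index });
    }
  }
  return found.sort((a, b) => a.index - b.index);
}
```

Calling `findExposures(pageText)` yields an array of `{ type, value, index }` objects. A check this simple will miss other number formats and can flag unrelated digit runs, which is why the Aadhaar check is described as primitive.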
### Get API credentials
1. Go to [IBM Cloud Console](https://console.bluemix.net/dashboard/apps/) -> Login/Register -> Visit Dashboard
2. Visit Catalog -> AI -> Natural Language Understanding or visit [Natural Language Understanding](https://console.bluemix.net/catalog/services/natural-language-understanding)
3. Create a Watson NLU Service
4. Go to [Dashboard](https://console.bluemix.net/dashboard/apps/)
5. Select your newly created Natural Language Understanding service
6. Go to the Service Credentials tab
7. Create new credentials if none are listed
8. Click view credentials
9. Create `config.json` in the root directory of the repo
10. Paste the credentials in JSON format into `config.json`
11. Add `config.json` to `.gitignore` so the credentials are never committed
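Once `config.json` is in place, a quick sanity check can catch a bad paste before the server starts. The sketch below assumes the credentials contain `apikey` and `url` fields, as IAM-style IBM Cloud credentials usually do; older service instances may expose `username` and `password` instead, so adjust `REQUIRED_FIELDS` to match what the console shows. The helper names are hypothetical.

```javascript
const fs = require('fs');

// Assumed field names; check them against the credentials shown in the console.
const REQUIRED_FIELDS = ['apikey', 'url'];

// Parse the credentials JSON and fail early if an expected field is absent.
function parseCredentials(jsonText) {
  const config = JSON.parse(jsonText);
  const missing = REQUIRED_FIELDS.filter((field) => !config[field]);
  if (missing.length > 0) {
    throw new Error(`config.json is missing: ${missing.join(', ')}`);
  }
  return { apikey: config.apikey, url: config.url };
}

// Read config.json from the given path (the repo root by default).
function loadCredentials(path = 'config.json') {
  return parseCredentials(fs.readFileSync(path, 'utf8'));
}
```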
### Development
1. Clone the repo: `git clone https://github.com/ajwad-shaikh/IBMocha.git`
2. `cd IBMocha`
3. Install dependencies: `npm install`
4. Install nodemon globally: `npm install nodemon -g`
5. Start the server: `npm run serve` (use `win-serve` on a Windows machine)
### Usage
1. Open `localhost:8008` in your browser
2. There are two modes of input - **Text** and **URL**
3. **URL Mode** - Enter URL and click on submit to analyse the website for personal information exposure using **IBM Watson NLU Service**


4. **Text Mode** - Enter text and click on submit to analyse the text for personal information exposure using **IBM Watson NLU Service**.


5. Text Mode also renders a redacted preview that masks personal information.
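The two input modes and the redacted preview can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `buildAnalyzeParams` and `redact` are hypothetical helpers. The first builds a request body in the shape the Watson NLU analyze endpoint accepts (`text` or `url` plus a `features` object), and the second masks each detected value with block characters.

```javascript
// Build an analyze request body for either input mode ('text' or 'url').
function buildAnalyzeParams(mode, input) {
  if (mode !== 'text' && mode !== 'url') {
    throw new Error(`Unknown mode: ${mode}`);
  }
  // Request entity extraction only; other NLU features are left out here.
  return { [mode]: input, features: { entities: {} } };
}

// Mask every occurrence of each detected value with block characters.
function redact(text, values) {
  return values
    .filter((value) => value.length > 0)
    // Longest first, so a full name is masked before any part of it.
    .sort((a, b) => b.length - a.length)
    .reduce(
      (out, value) => out.split(value).join('█'.repeat(value.length)),
      text
    );
}
```

In a real flow, the `values` passed to `redact` would be the entity strings returned by the NLU service together with any pattern matches. Masking with the same number of characters keeps the layout of the preview close to the original text.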

### Further Development
* Include PDF file input.
* Include redacted website preview.
* Include PDF output.