Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pratyush1712/data-engineering
Cornell Financial Data Collection leverages Python, Selenium, and NLP to aggregate and analyze financial data from Cornell's corporate donors, offering a unique exploration of data collection and analysis techniques.
- Host: GitHub
- URL: https://github.com/pratyush1712/data-engineering
- Owner: pratyush1712
- Created: 2023-08-16T18:40:27.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-08-16T18:41:21.000Z (over 1 year ago)
- Last Synced: 2024-11-07T04:14:17.910Z (about 2 months ago)
- Topics: flask, nextjs, nlp, parallel-computing, selenium-webdriver, server-side-events
- Language: Python
- Homepage:
- Size: 437 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Cornell Innovation and Entrepreneurship - Data Analysis Platform
A centralized data analysis platform for the Cornell Innovation and Entrepreneurship Lab. This repository contains scripts for data collection, cleaning, and analysis.
## Getting Started
### Prerequisites
- Python 3.9
- pip
- virtualenv
- Cornell email

### Installation
1. Clone the repository
```bash
git clone https://github.com/pratyush1712/data-engineering
```
2. Create a virtual environment
```bash
virtualenv venv
```
3. Activate the virtual environment
```bash
source venv/bin/activate
```
4. Change into the `server` directory
```bash
cd server
```
5. Install the dependencies
```bash
pip install -r requirements.txt
```
6. Create a `.env` file in the server directory
```bash
touch .env
```
7. Add the following environment variables to the `.env` file (the sketch at the end of this README shows one way the scrapers might read these credentials)
```bash
export CORNELL_NETID="your_cornell_netid"
export CORNELL_PASSWORD="your_cornell_password"
export CAPITAL_IQ_USERNAME="your_capital_iq_username"
export CAPITAL_IQ_PASSWORD="your_capital_iq_password"
```
8. Source the `.env` file
```bash
source .env
```
9. Run the server
```bash
python app.py
```
10. Open a new terminal window and change into the client directory
```bash
cd cornell-data
```
11. Install the dependencies
```bash
npm install
```
12. Run the client
```bash
npm start
```

## Usage
The platform can be used to collect company data in the following ways (a rough sketch of how these commands might be handled follows the list):
1. Collect data for a list of companies from the Capital IQ, Mergent Intellect, or Guidestar websites, one source at a time.
```bash
cd scraping
```

```bash
python index.py --source
```
2. Collect data for a list of companies from the Capital IQ, Mergent Intellect, or Guidestar websites, in bulk.
```bash
python index.py
```
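The repository does not document the exact command-line interface of the scraping entry point, so the following is only a minimal sketch of how the `--source` flag and the bulk mode above might be handled. The source names, file name, and scraper functions are illustrative assumptions, not the project's actual API.

```python
# sketch_index.py -- hypothetical argument handling for the scraping entry point.
# Source names and scraper functions are assumptions for illustration only.
import argparse


def scrape_capitaliq(companies):
    print(f"Scraping {len(companies)} companies from Capital IQ...")


def scrape_mergent(companies):
    print(f"Scraping {len(companies)} companies from Mergent Intellect...")


def scrape_guidestar(companies):
    print(f"Scraping {len(companies)} companies from GuideStar...")


SCRAPERS = {
    "capitaliq": scrape_capitaliq,
    "mergent": scrape_mergent,
    "guidestar": scrape_guidestar,
}


def main():
    parser = argparse.ArgumentParser(description="Collect company data from donor databases.")
    parser.add_argument("--source", choices=SCRAPERS,
                        help="Scrape a single source; omit to scrape every source in bulk.")
    parser.add_argument("--companies", default="companies.txt",
                        help="File with one company name per line (hypothetical default).")
    args = parser.parse_args()

    with open(args.companies) as f:
        companies = [line.strip() for line in f if line.strip()]

    # --source given: run one scraper; otherwise run all scrapers (bulk mode).
    sources = [args.source] if args.source else list(SCRAPERS)
    for name in sources:
        SCRAPERS[name](companies)


if __name__ == "__main__":
    main()
```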
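The credentials placed in `.env` during installation step 7 are presumably how the Selenium scrapers authenticate against the data sources. Below is a minimal sketch of that pattern, assuming the scripts read the variables with `os.environ` and drive a browser through Selenium; the login URL, element locators, and function name are placeholders, not the ones the repository actually uses.

```python
# sketch_login.py -- illustrative only; the URL and element IDs are placeholders.
import os

from selenium import webdriver
from selenium.webdriver.common.by import By


def capital_iq_login():
    # Read the credentials exported from .env (see installation step 7).
    username = os.environ["CAPITAL_IQ_USERNAME"]
    password = os.environ["CAPITAL_IQ_PASSWORD"]

    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
    try:
        driver.get("https://www.capitaliq.com/")                      # placeholder login page
        driver.find_element(By.ID, "username").send_keys(username)    # placeholder locator
        driver.find_element(By.ID, "password").send_keys(password)    # placeholder locator
        driver.find_element(By.ID, "login-button").click()            # placeholder locator
        return driver  # caller continues scraping with the authenticated session
    except Exception:
        driver.quit()
        raise


if __name__ == "__main__":
    capital_iq_login().quit()
```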