https://github.com/pratyush1712/data-engineering

Cornell Financial Data Collection leverages Python, Selenium, and NLP to aggregate and analyze financial data from Cornell's corporate donors, offering a unique exploration of data collection and analysis techniques.
flask nextjs nlp parallel-computing selenium-webdriver server-side-events

# Cornell Innovation and Entrepreneurship - Data Analysis Platform

Centralized data analysis platform for the Cornell Innovation and Entrepreneurship Lab. This repository contains scripts for data collection, data cleaning, and data analysis.

## Getting Started

### Prerequisites

- Python 3.9
- pip
- virtualenv
- Cornell Email

### Installation

1. Clone the repository

```bash
git clone https://github.com/pratyush1712/data-engineering
```

2. Create a virtual environment

```bash
virtualenv venv
```

3. Activate the virtual environment

```bash
source venv/bin/activate
```

4. Change into the server directory

```bash
cd server
```

5. Install the dependencies

```bash
pip install -r requirements.txt
```

6. Create a .env file in the server directory

```bash
touch .env
```

7. Add the following environment variables to the .env file

```bash
# No spaces around "=": the shell rejects `export NAME = "value"`
export CORNELL_NETID="your_cornell_netid"
export CORNELL_PASSWORD="your_cornell_password"
export CAPITAL_IQ_USERNAME="your_capital_iq_username"
export CAPITAL_IQ_PASSWORD="your_capital_iq_password"
```

8. Source the .env file

```bash
source .env
```
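
Since the server is started right after the `.env` file is sourced, it presumably reads these credentials from the environment. A quick check before running it can catch a variable that was never exported. The sketch below is a hypothetical helper, not part of the repository: the name `missing_env_vars` is an assumption, and only the four variable names come from step 7.

```python
import os

# Variable names taken from step 7 of the installation instructions.
REQUIRED_VARS = (
    "CORNELL_NETID",
    "CORNELL_PASSWORD",
    "CAPITAL_IQ_USERNAME",
    "CAPITAL_IQ_PASSWORD",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Run it from the same terminal session in which the `.env` file was sourced, because exported variables are only visible to processes started from that shell.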

9. Run the server

```bash
python app.py
```

10. Open a new terminal window and change into the client directory

```bash
cd cornell-data
```

11. Install the dependencies

```bash
npm install
```

12. Run the client

```bash
npm start
```

## Usage

The platform can collect company data in the following ways:

1. Collect data for a list of companies from Capital IQ, Mergent Intellect, or Guidestar, one source at a time.

```bash
cd scraping
```

```bash
python index.py --source
```

2. Collect data for a list of companies from Capital IQ, Mergent Intellect, or Guidestar in bulk.

```bash
python index.py
```
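
The two invocations above differ only in whether `--source` is supplied. How `index.py` parses its arguments is not documented here, so the following is only a sketch of one way such a flag could be handled with `argparse`. It assumes that omitting `--source` means "use every source". The source identifiers (`capitaliq`, `mergent`, `guidestar`) and the function `parse_sources` are hypothetical, chosen for illustration.

```python
import argparse

# Hypothetical source identifiers; the values actually accepted by
# index.py are not documented in this README.
SOURCES = ("capitaliq", "mergent", "guidestar")


def parse_sources(argv=None):
    """Return the list of sources to scrape for the given arguments."""
    parser = argparse.ArgumentParser(description="Collect company data.")
    parser.add_argument(
        "--source",
        choices=SOURCES,
        default=None,
        help="scrape a single source; omit to scrape every source",
    )
    args = parser.parse_args(argv)
    # One source when --source is given, otherwise all of them (assumption).
    return [args.source] if args.source else list(SOURCES)
```

Under these assumptions, `parse_sources(["--source", "guidestar"])` returns `["guidestar"]`, while `parse_sources([])` returns all three identifiers.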