https://github.com/gamehunterkaan/companyenum

OSINT sweep on a company name — Flask dashboard that scrapes Craft.co, Trustpilot, CareerBliss, WHOIS, and web-tech scanners in parallel.
cybersecurity cybersecurity-tools osint osint-python osint-tool python python3

# CompanyEnum

A Flask web app that runs a one-shot OSINT sweep on a company name and presents the findings in a live dashboard. You type a company, watch each scraper report progress in real time, and get a tabbed report covering the company profile, financials, people, web-tech posture, and customer/employee reviews.

![CompanyEnum Input](images/companyenum-input.png)

## What it collects

| Tab | Sources | Data |
| ----------- | ------------------------------------ | ------------------------------------------------------------------------------------- |
| Summary | Craft.co, Google (website fallback) | Company name, website, HQ, founded date, description, sectors, competitors |
| Financials | Craft.co | Stock price, market cap, revenue |
| People | Craft.co | Executives with roles |
| Technology | securityheaders.com, SSL Shopper, Sucuri SiteCheck, whois.com | HTTP security-header grade, raw headers, TLS cert + SANs, Sucuri ratings and recommendations, WHOIS records |
| Ratings | Trustpilot, CareerBliss | Aggregate scores and recent reviews with star widgets |

## How it works

Submissions start a background thread that runs each scraper sequentially and writes per-step state into an in-memory job store. The browser gets redirected to a loading page that polls `/status/` every 500 ms and renders each step as pending → running → done / skipped / error. When the job finishes, the page auto-navigates to `/result/`.
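
The flow above can be sketched with the standard library alone. This is a minimal illustration, not the code in `main.py`: the names `JOBS`, `SkipStep`, `start_job`, and `job_status` are hypothetical, and the Flask routes that would call them are left out.

```python
import threading
import uuid

# Hypothetical in-memory job store: job_id -> per-step state and results.
JOBS = {}
JOBS_LOCK = threading.Lock()


class SkipStep(Exception):
    """Raised by a scraper that has nothing to do (e.g. no website known)."""


def _run_job(job_id, company, steps):
    # Run the scrapers one after another, recording each state transition.
    for name, scraper in steps:
        with JOBS_LOCK:
            JOBS[job_id]["steps"][name] = "running"
        try:
            result, state = scraper(company), "done"
        except SkipStep:
            result, state = None, "skipped"
        except Exception:
            # A failing scraper must not abort the remaining steps.
            result, state = None, "error"
        with JOBS_LOCK:
            JOBS[job_id]["steps"][name] = state
            JOBS[job_id]["results"][name] = result
    with JOBS_LOCK:
        JOBS[job_id]["finished"] = True


def start_job(company, steps):
    """Register a job, start its worker thread, return (job_id, thread)."""
    job_id = uuid.uuid4().hex
    with JOBS_LOCK:
        JOBS[job_id] = {
            "steps": {name: "pending" for name, _ in steps},
            "results": {},
            "finished": False,
        }
    thread = threading.Thread(
        target=_run_job, args=(job_id, company, steps), daemon=True
    )
    thread.start()
    return job_id, thread


def job_status(job_id):
    """Snapshot that a status endpoint could serialise as JSON."""
    with JOBS_LOCK:
        job = JOBS[job_id]
        return {"steps": dict(job["steps"]), "finished": job["finished"]}
```

A status route would return `job_status(job_id)` on every poll, and the loading page would redirect once `finished` is true. The thread is returned here only so that callers (and tests) can `join()` it.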

Each scraper in [submodules/](submodules/) uses one of three strategies, chosen by how aggressively the target site blocks bots:

- **requests / BeautifulSoup** for sites that don't block (whois.com, SSL Shopper, Sucuri API).
- **cloudscraper** for Cloudflare-protected sites that still accept a good browser fingerprint (Craft.co, securityheaders.com).
- **Playwright + playwright-stealth** (headless Chromium) for sites whose anti-bot won't budge without a real browser (Trustpilot, CareerBliss).
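
As a rough sketch of how the three strategies differ in code, the helpers below fetch a page's HTML each way and map every source to one of them. The function names and the `FETCHERS` table are hypothetical, not taken from the repository; third-party imports are deferred into the functions so that only the strategy actually used needs its dependency installed. The playwright-stealth patching is omitted because its API differs between releases.

```python
def fetch_plain(url):
    """Plain HTTP client, for sites that do not block scripted requests."""
    import requests

    response = requests.get(url, timeout=15)
    response.raise_for_status()
    return response.text


def fetch_cloudflare(url):
    """cloudscraper session, for Cloudflare-protected sites."""
    import cloudscraper

    scraper = cloudscraper.create_scraper()  # a requests.Session subclass
    response = scraper.get(url, timeout=15)
    response.raise_for_status()
    return response.text


def fetch_browser(url):
    """Headless Chromium via Playwright, for sites that need a real browser."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.content()
        finally:
            browser.close()


# Hypothetical lookup table mirroring the list above.
FETCHERS = {
    "whois.com": fetch_plain,
    "SSL Shopper": fetch_plain,
    "Sucuri": fetch_plain,
    "Craft.co": fetch_cloudflare,
    "securityheaders.com": fetch_cloudflare,
    "Trustpilot": fetch_browser,
    "CareerBliss": fetch_browser,
}


def fetcher_for(source):
    """Return the fetch function assigned to a source name."""
    return FETCHERS[source]
```

Keeping the mapping in one table makes it cheap to move a source to a heavier strategy when it tightens its bot detection.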

![CompanyEnum Summary](images/companyenum-summary.png)

## Project layout

```
main.py                  Flask app, routes, background job runner
requirements.txt         Python dependencies
submodules/
  craftco.py             Craft.co profile + executives scraper
  trustpilot.py          Trustpilot reviews (Playwright)
  careerbliss.py         CareerBliss reviews (Playwright)
  findwebsite.py         Website resolver (Craft.co field + Google fallback)
  securityheaders.py     securityheaders.com scanner
  sslhopper.py           SSL Shopper cert checker
  sucuri.py              Sucuri SiteCheck API client
  whoisquery.py          whois.com scraper
  compiledata.py         HTML rendering for each output tab
static/
  style.css              Input page (waves, gradient, search bar)
  loading.css            Loading page (progress bar + step list)
  output-style.css       Report page (cards, tags, stars, grade badge)
  script.js              Output page tab switching + scroll progress
templates/
  input.html             Search form with example chips
  loading.html           Live progress UI
  output.html            Tabbed report
```

## Installation

Requires Python 3.9+.

```bash
pip install -r requirements.txt
playwright install chromium
```

The second command downloads the headless Chromium binary Playwright needs for Trustpilot and CareerBliss.

## Running

```bash
python main.py
```

Then open `http://127.0.0.1:5000/` and enter a company name. Use the example chips on the input page for a quick test.

## Notes

- The in-memory job store is capped at 32 entries and evicts oldest-first.
- A single scan takes roughly 20-40 seconds depending on network and which scrapers stall.
- If Craft.co can't find the company, the website resolver falls back to a Google search. That search can fail silently when Google serves a CAPTCHA; in that case the Technology tab shows "Company website not found" placeholders and every other tab still works.
- The scrapers target real HTML and API shapes that change without warning. Expect occasional breakage when a source restructures its page.
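
The capped job store from the first note could look roughly like this. It is a sketch under assumptions: the `JobStore` class and its method names are hypothetical, and only the cap of 32 and the oldest-first eviction come from the note itself.

```python
from collections import OrderedDict


class JobStore:
    """In-memory store that keeps at most `max_entries` jobs (FIFO eviction)."""

    def __init__(self, max_entries=32):
        self._jobs = OrderedDict()  # remembers insertion order
        self._max_entries = max_entries

    def add(self, job_id, job):
        self._jobs[job_id] = job
        # Drop the oldest entries until the store is back under the cap.
        while len(self._jobs) > self._max_entries:
            self._jobs.popitem(last=False)

    def get(self, job_id):
        # Returns None for unknown or already evicted jobs.
        return self._jobs.get(job_id)

    def __len__(self):
        return len(self._jobs)
```

With this shape, a result page requested for an evicted job simply gets `None` back and can show a "job expired" message instead of raising.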