https://github.com/scanner-research/tv-news-viewer
Interactive exploration of a decade of TV news
- Host: GitHub
- URL: https://github.com/scanner-research/tv-news-viewer
- Owner: scanner-research
- License: bsd-3-clause
- Created: 2019-03-19T19:14:52.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-06-03T17:25:03.000Z (8 months ago)
- Last Synced: 2024-06-03T20:16:20.487Z (8 months ago)
- Language: JavaScript
- Homepage:
- Size: 5.42 MB
- Stars: 4
- Watchers: 4
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
# Stanford TV News Analyzer
#### Setup instructions
1. Install Rust (see https://rustup.rs/)
2. Clone submodules: `git submodule init && git submodule update`
3. Run `./install_deps.sh` to install the submodules
4. `cd vgrid-widget` and run `./install.sh`.
   - This requires `npm` and other JavaScript dependencies
     (`npm install --save react react-dom mobx mobx-react`).
     Install them as needed.
   - Once this succeeds, `cd ..` to return to the top level.
5. Install Python dependencies: `pip3 install -r requirements.txt`
6. Copy/symlink the indexed captions as `data/index`
7. Copy/symlink the data directory as `data`
8. Run `./derive_data.py` to generate derived data
9. Run `./develop.py` to start a development server, or edit `config.json` to
serve using WSGI.
#### Running tests
Run `pytest -vs tests` from the top directory.
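If the tests or `./derive_data.py` fail because files are missing, a quick layout check may help. The following is a minimal sketch, not part of this repository: it assumes the index lives at `data/index` (setup step 6) and uses the entry names listed in the two directory sections below. The function name `missing_entries` and the required/optional split are assumptions made for illustration.

```python
from pathlib import Path

# Entry names are taken from this README. Treating everything not marked
# "(optional)" as required is an assumption of this sketch.
INDEX_ENTRIES = ["documents.txt", "lexicon.txt", "index.bin", "data"]
DATA_REQUIRED = ["videos.json", "faces.ilist.bin", "people", "face-bboxes"]


def missing_entries(data_dir="data"):
    """Return the expected paths that do not exist under data_dir.

    `derived` is not checked because `./derive_data.py` generates it, and
    `people.metadata.json` and `hosts.csv` are skipped because they are optional.
    """
    root = Path(data_dir)
    expected = [root / name for name in DATA_REQUIRED]
    expected += [root / "index" / name for name in INDEX_ENTRIES]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_entries()
    for path in missing:
        print("missing:", path)
    if not missing:
        print("data layout looks complete")
```

The check only looks at whether each path exists; it does not validate file contents or formats.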
#### Indexed captions directory
There should be four entries in this directory:
- `documents.txt` (a list of documents that are indexed)
- `lexicon.txt` (a list of all the words)
- `index.bin` (a directory or inverted index file)
- `data` (a directory of all the binary encoded captions)
#### Data directory
The data directory consists of the following files and directories:
- `videos.json` (metadata about the videos)
- `faces.ilist.bin` (intervals when faces are on screen)
- `people` (directory containing intervals when identified people are on screen)
- `people.metadata.json` (optional; JSON dictionary of names to metadata tags)
- `hosts.csv` (optional; a list of people and channels that they are hosts of)
- `face-bboxes` (directory containing face bounding boxes)
- `derived` (this directory is generated by `./derive_data.py`)
#### Terminology in the code
- IntervalList (or ilist) - These are files that store intervals with a binary
bit-vector payload. The intervals can overlap, but must be sorted by start
time.
- IntervalSet (or iset) - These are files that store non-overlapping intervals,
sorted by start time. Unlike IntervalList, there is no bit-vector payload.
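
The difference between the two terms can be sketched in memory as follows. This is an illustration only: it does not read the binary file formats, and the `Interval` type, the function names, the integer time units, and the half-open `[start, end)` convention are all assumptions rather than code from this repository.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    start: int    # start time (units are an assumption of this sketch)
    end: int      # end time, same units
    payload: int  # bit-vector payload, modeled here as a plain int


def is_valid_interval_list(intervals: List[Interval]) -> bool:
    """IntervalList rule: sorted by start time; overlaps are allowed."""
    return all(a.start <= b.start for a, b in zip(intervals, intervals[1:]))


def is_valid_interval_set(intervals: List[Tuple[int, int]]) -> bool:
    """IntervalSet rule: non-overlapping and sorted by start time."""
    well_formed = all(start <= end for start, end in intervals)
    disjoint = all(prev[1] <= cur[0] for prev, cur in zip(intervals, intervals[1:]))
    return well_formed and disjoint


def to_interval_set(intervals: List[Interval]) -> List[Tuple[int, int]]:
    """Drop payloads and merge overlapping or touching intervals.

    Assumes the input is already sorted by start time.
    """
    merged: List[Tuple[int, int]] = []
    for iv in intervals:
        if merged and iv.start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], iv.end))
        else:
            merged.append((iv.start, iv.end))
    return merged


if __name__ == "__main__":
    # Hypothetical payload bits; their meaning is not described in this README.
    faces = [Interval(0, 10, 0b01), Interval(5, 15, 0b10), Interval(20, 30, 0b01)]
    print(is_valid_interval_list(faces))  # True
    print(to_interval_set(faces))         # [(0, 15), (20, 30)]
```

Merging is one way to turn overlapping intervals into a set. Whether the code in this repository derives its IntervalSet files this way is not stated here.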