https://github.com/ittia-research/check
Automated fact-check
https://github.com/ittia-research/check
embedding factcheck llm rag rerank
Last synced: about 1 month ago
JSON representation
Automated fact-check
- Host: GitHub
- URL: https://github.com/ittia-research/check
- Owner: ittia-research
- License: mit
- Created: 2024-07-22T10:46:16.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-10-12T12:52:37.000Z (over 1 year ago)
- Last Synced: 2025-04-24T00:12:37.661Z (about 1 year ago)
- Topics: embedding, factcheck, llm, rag, rerank
- Language: Python
- Homepage: https://check.ittia.net
- Size: 180 KB
- Stars: 7
- Watchers: 0
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
True, false, or just opinions? Maybe not binary, but a percentage.
Fact-checking tools to combat disinformation.
## How It Works
Extract a list of statements from given text.
For each statement search via search engine and read the top URLs.
For each hostname as one source, extract most related info from the read content.
For each sources, generate one verdict and citation with the extracted content.
Combine all verdicts of one statements into a final verdict.
Return a list of statements with verdicts, citations and others related.
## Get Started
Online demo: https://check.ittia.net
Using existing API: https://github.com/ittia-research/check/tree/main/packages/ittia_check
### Self-hosting API Server
Main components:
- Check server: see docker-compose.yml
- LLM: any OpenAI compatible API, self-hosting via vllm or Ollama
- Embedding: self-hosting via Ollama or Infinity
- Rerank: self-hosting via Infinity
- Search: https://search.ittia.net
### Other Tools
- Start a wiki_dpr retrieval server (ColBERTv2) for development: https://github.com/ittia-research/check/tree/main/datasets/wiki_dpr
### Search backend
- Using `search.ittia.net` for better optimization.
- Features:
- Customizable source count.
- Supports search sessions: streaming, resuming.
- Utilizes state-of-the-art search engine (currently Google).
## Design
Input something.
Analyze percentage of facts and opinions.
Fact-check like what a researcher will do:
* Use search engine as data source and AI as the verdict.
Output analysis:
* MARKDOWN as the default format, JSON optional.
### Philosophy:
- For new information, doubts as default, fact-check follows.
### Elements
Input types:
- facts
- opinions
- questions
Verdicts:
- false
- true
- tie: false and true verdicts counts are the same and above zero
- irrelevant: context processed irrelevant to the statement
## References
### Fact-check
- https://www.snopes.com
- https://www.bmi.bund.de/SharedDocs/schwerpunkte/EN/disinformation/examples-of-russian-disinformation-and-the-facts.html
## Acknowledgements
- TPU Research Cloud team at Google
- Google Search
- Jina Reader