https://github.com/commoncrawl/cc-citations

Scientific articles using or citing Common Crawl data

# Common Crawl Citations – BibTeX Database

BibTeX files are in [bib/](./bib/).

Note: this is a work in progress; the database still contains only a fraction of recent articles.

## Fields Specific to Common Crawl

The following non-standard fields are used to add information about how the publications relate to Common Crawl:


- `cc-author-affiliation`: affiliation of the authors
- `cc-class`: classification of the publication (domain of research, topics, keywords)
- `cc-snippet`: snippet citing Common Crawl
- `cc-dataset-used`: subset of Common Crawl used, e.g., CC-MAIN-2016-07
- `cc-derived-dataset-about`: the publication describes a dataset which has been derived from Common Crawl, e.g., GloVe-word-embeddings
- `cc-derived-dataset-used`: a dataset has been used which is derived from Common Crawl, e.g., GloVe-word-embeddings
- `cc-derived-dataset-cited`: a derived dataset is cited but not used
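
To illustrate how these fields might be read back out of an entry, here is a minimal Python sketch. It is not part of this repository. The sample entry is entirely hypothetical (placeholder key, title, author, and values), the function name `extract_cc_fields` is made up for this example, and the regular expression assumes each `cc-` field sits on its own line with a brace-delimited value that contains no nested braces.

```python
import re

# Hypothetical entry for illustration only; it does not describe a real publication.
SAMPLE_ENTRY = """
@InProceedings{example-key-2016,
  title = {A Placeholder Title},
  author = {Jane Doe},
  year = {2016},
  cc-author-affiliation = {Example University},
  cc-class = {example-topic, example-keyword},
  cc-dataset-used = {CC-MAIN-2016-07},
  cc-derived-dataset-used = {GloVe-word-embeddings},
}
"""

# Matches `cc-name = {value}` at the start of a line.
# Assumption: values are wrapped in braces and contain no nested braces.
CC_FIELD = re.compile(r"^\s*(cc-[a-z-]+)\s*=\s*\{([^{}]*)\}", re.MULTILINE)


def extract_cc_fields(entry: str) -> dict:
    """Return a mapping of cc-* field names to their values for one entry."""
    return {name: value.strip() for name, value in CC_FIELD.findall(entry)}


if __name__ == "__main__":
    for name, value in extract_cc_fields(SAMPLE_ENTRY).items():
        print(f"{name}: {value}")
```

A full BibTeX parser would be more robust for real files; the regular expression is only meant to show which fields carry the Common Crawl metadata.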

## Formatting and Export of Citations

The [Makefile](./Makefile) contains targets that apply consistent formatting to the citations and export them. The following BibTeX tools are required: [bibtex2html](https://www.lri.fr/~filliatr/bibtex2html/), [bibclean](https://ctan.org/tex-archive/biblio/bibtex/utils/bibclean), [bibtool](http://www.gerd-neugebauer.de/software/TeX/BibTool/en/).

(Do not confuse bibclean with the PyPI package of the same name, which is entirely different. bibclean, bibtool, and bibtex2html are available as OS packages, at least on apt-based distributions.)
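
Before running the Makefile, it can help to confirm that the three tools are installed. The following Python sketch is one way to do that and is not part of this repository. It assumes the executables on `PATH` carry the same names as the tools (`bibtex2html`, `bibclean`, `bibtool`); the helper name `missing_tools` is made up for this example.

```python
import shutil

# Assumption: the executables are named after the tools listed above.
REQUIRED_TOOLS = ("bibtex2html", "bibclean", "bibtool")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing BibTeX tools:", ", ".join(missing))
    else:
        print("All required BibTeX tools were found on PATH.")
```

Note that this only checks for an executable with a matching name; it cannot tell the TeX `bibclean` utility apart from an unrelated program with the same name.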

## Citations from Google Scholar Alerts

As an initial step, and to achieve higher coverage, citations are extracted from Google Scholar Alert e-mails received from April 2016 to date. See [gscholar_alerts](./gscholar_alerts/).