https://github.com/aserg-ufmg/csindex
Transparent data about Brazilian scientific production in Computer Science
academic-publishing computer-science scientific-publishing
- Host: GitHub
- URL: https://github.com/aserg-ufmg/csindex
- Owner: aserg-ufmg
- License: mit
- Created: 2017-12-26T16:14:01.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-10-23T22:27:56.000Z (2 months ago)
- Last Synced: 2024-10-24T11:34:13.792Z (2 months ago)
- Topics: academic-publishing, computer-science, scientific-publishing
- Language: HTML
- Homepage: http://csindexbr.org
- Size: 14.5 MB
- Stars: 34
- Watchers: 9
- Forks: 21
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: citations.py
- Authors: authors.html
# CSIndexbr
CSIndexbr (https://csindexbr.org) provides transparent data about Brazilian scientific production in Computer Science. We index full research papers published in selected conferences and journals. The papers are retrieved from DBLP.
# Dependencies
CSIndexbr is implemented in Python 3.9 (backend scripts). The front-end uses HTML and pure JavaScript.
We also use:
* [requests](https://pypi.org/project/requests/): "a simple, yet elegant HTTP library", which is used to retrieve data from DBLP.
* [xmltodict](https://pypi.org/project/xmltodict/): "a Python module that makes working with XML feel like you are working with JSON", which is used to parse the XML files returned by DBLP.
# Scripts
**All these scripts must be called from the "data" folder:**
* *./run se pl chi*: update the papers (and related data) for the listed research areas (se, pl, and chi, in the example).
* *./runall*: update the papers (and related data) for **all** research areas
* *./rundblp*: download dblp files (xml, with papers) for **all** tracked professors
* *./runcitations*: update citations for **all** research areas
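Since the scripts must be called from the "data" folder, an automated job has to set the working directory before invoking them. The following is a minimal sketch of one way to do that from Python. The helper names `build_command` and `run_update` and the `repo_root` argument are hypothetical and not part of CSIndexbr.

```python
import subprocess
from pathlib import Path


def build_command(areas):
    """Return the command line that updates the given research areas.

    An empty list maps to ./runall, which updates all research areas.
    """
    if not areas:
        return ["./runall"]
    return ["./run", *areas]


def run_update(repo_root, areas=()):
    """Run the update script with the "data" folder as working directory.

    repo_root is a hypothetical path to a local checkout of the repository.
    """
    data_dir = Path(repo_root) / "data"
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(build_command(list(areas)), cwd=data_dir, check=True)
```

Under these assumptions, `run_update("/path/to/checkout", ["se", "pl", "chi"])` corresponds to running `./run se pl chi` inside the "data" folder.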
# Input files
**These files must be placed in the "data" folder:**
There are two "global" configuration files:
* [all-researchers.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/all-researchers.csv): Brazilian CS professors (i.e., from CS departments) whose papers are tracked by CSIndexbr, with three columns:
  * Professor name (do not use "-" or accents in names)
  * University (do not use distinct names for the same university; e.g. PUC-Rio and PUC-RIO)
  * DBLP PID (see in this [screenshot](https://github.com/aserg-ufmg/CSIndex/blob/master/figs/dblp-pid-screenshot.jpg) how to retrieve PIDs from DBLP profiles)
* [research-areas-config.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/research-areas-config.csv): research areas covered by CSIndexbr, with two columns:
  * research area acronym (e.g., se)
  * minimum size of the conference papers indexed in this area (e.g., 10).

The following files are specific to a given research area (i.e., each area has all the files listed next; the list uses "se" as an example):
* [se-confs.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-confs.csv): conferences and journals indexed in a given research area ("se", in this case), with three columns:
  * venue name at DBLP:
    * for conferences, use the "booktitle" XML entry; see [example](https://dblp.uni-trier.de/rec/xml/conf/esem/CoelhoVSS18.xml)
    * for journals, use the "journal" XML entry; see [example](https://dblp.uni-trier.de/rec/xml/journals/jss/BritoHVR18.xml)
  * venue name in the charts and tables generated by CSIndexbr
  * venue type, as follows:
    * 1: top-conference
    * 2: not used anymore
    * 3: "regular" conference (i.e., non-top)
    * 4: top-journal
    * 5: "regular" journal (i.e., non-top)
    * 6: magazine or journal that accepts short papers (>= 6 pages)
    * 7: journals with low normalized-h5-index (see the [FAQ](https://csindexbr.org/faq.html) for details)
* [se-black-list.txt](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-black-list.txt): list of papers that **must not** be indexed, although they meet the basic indexing criteria. For example, papers published in tracks other than the main research track of a conference. Each line contains the "url" XML field of the paper (see [example](https://dblp.uni-trier.de/rec/xml/conf/icse/NetoCLGM13.xml))
* [se-white-list.txt](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-white-list.txt): list of papers that **must** be indexed. For example, papers that do not have page numbers in the DBLP metadata (see [example](https://dblp.uni-trier.de/rec/xml/journals/smr/SilvaVBAE17.xml))
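The formatting rules for the input files (no "-" or accents in professor names, a single spelling per university, venue types from 1 to 7) can be checked before running the scripts. The following is a minimal sketch of such a check. It is not part of CSIndexbr: the function names are hypothetical, and it assumes comma-separated files with exactly three columns and no header row. The sample rows and PIDs are made up.

```python
import csv
import io
import unicodedata

# Venue type codes as listed for the *-confs.csv files.
VENUE_TYPES = {
    1: "top conference",
    2: "not used anymore",
    3: "regular conference",
    4: "top journal",
    5: "regular journal",
    6: "magazine or journal that accepts short papers",
    7: "journal with low normalized h5-index",
}


def has_accent(text):
    """True if any character decomposes into a base letter plus a combining mark."""
    return any(unicodedata.combining(ch) for ch in unicodedata.normalize("NFD", text))


def check_researchers(rows):
    """Yield (row_number, message) for rows that break the naming rules."""
    seen = {}  # lowercase university name -> first spelling seen
    for n, (name, university, _pid) in enumerate(rows, start=1):
        if "-" in name or has_accent(name):
            yield n, f"name contains '-' or accents: {name}"
        first = seen.setdefault(university.lower(), university)
        if first != university:
            yield n, f"university spelled both {first!r} and {university!r}"


def check_venues(rows):
    """Yield (row_number, message) for rows whose venue type is not 1..7."""
    for n, (_dblp_name, _display_name, venue_type) in enumerate(rows, start=1):
        try:
            known = int(venue_type) in VENUE_TYPES
        except ValueError:
            known = False
        if not known:
            yield n, f"unknown venue type: {venue_type}"


# Made-up sample: the second row has an accent and a second spelling of PUC-Rio.
sample = io.StringIO("Ana Souza,PUC-Rio,00/0001\nJosé Silva,PUC-RIO,00/0002\n")
for row_number, message in check_researchers(csv.reader(sample)):
    print(row_number, message)
```

Comparing university names in lowercase only catches spellings that differ in letter case, as in the PUC-Rio example. Other variants (abbreviations, extra spaces) would need additional normalization.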
# Output files
**These files are generated in the "data" folder:**
Examples, assuming the "se" research area:
* [se-out-confs.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-out-confs.csv): number of papers in indexed conferences
* [se-out-journals.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-out-journals.csv): number of papers in indexed journals
* [se-out-profs-list.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-out-profs-list.csv): professors with indexed papers in the area (and their departments)
* [se-out-profs.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-out-profs.csv): number of professors with indexed papers (in the area) per department
* [se-out-scores.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-out-scores.csv): department scores (see formula in the [FAQ](https://csindexbr.org/faq.html))
* [se-out-papers.csv](https://github.com/aserg-ufmg/CSIndex/blob/master/data/se-out-papers.csv): metadata about indexed papers: year, venue, title, departments, authors, doi, top or null (otherwise), journal (J) or conference (C), arxiv url or no_arxiv (otherwise), and number of citations
# License
MIT (for the source code) and CC BY-NC-SA 4.0 (for the data).