Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mjlee111/arxiv-crawler-py
Arxiv Paper crawling python code. Supports Markdown table output
https://github.com/mjlee111/arxiv-crawler-py
Last synced: 10 days ago
JSON representation
Arxiv Paper crawling python code. Supports Markdown table output
- Host: GitHub
- URL: https://github.com/mjlee111/arxiv-crawler-py
- Owner: mjlee111
- License: apache-2.0
- Created: 2024-11-05T06:36:48.000Z (12 days ago)
- Default Branch: master
- Last Pushed: 2024-11-05T06:55:50.000Z (12 days ago)
- Last Synced: 2024-11-05T07:29:06.433Z (12 days ago)
- Language: Python
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Arxiv-Crawler-Py
Python scripts for crawling arXiv papers and converting CSV to Markdown tables.
[![Python](https://img.shields.io/badge/Python-3.8%2B-blue)](https://www.python.org/)
[![License](https://img.shields.io/badge/License-Apache-green.svg)](LICENSE)## Overview
This repository contains two Python scripts:1. `crawler.py`: Crawls arXiv papers and saves them to a CSV file.
2. `csv_to_markdown_table.py`: Converts a CSV file to a Markdown table.## Requirements
| Component | Version | Notes |
| --------- | ------- | ----- |
| Python | 3.8+ | |
| BeautifulSoup4 | Latest | For XML parsing |
| Pandas | Latest | For data handling |
| Requests | Latest | For HTTP requests |## Development Environment
| Component | Version |
| --------- | ------- |
| Python | 3.10 |
| Cursor AI | Latest |
| OS | Ubuntu 22.04 LTS |## Scripts
| Script | Description | Usage |
| --------- | ------- | ----- |
| crawler.py | Crawls arXiv papers and saves them to a CSV file. | `python crawler.py --start_year 2021 --end_year 2024 --max_results 1000` |
| csv_to_markdown_table.py | Converts a CSV file to a Markdown table. | `python csv_to_markdown_table.py [markdown_file_path]` |## Features
`crawler.py`
- Search papers by topic
- Filter by year range
- Configurable maximum results
- Exports results to CSV
- Automatic rate limiting
- Error handling for missing data`csv_to_markdown_table.py`
- Converts CSV to Markdown table format
- Collapsible summary sections
- Automatic file naming
- Handles special characters
- Preserves data formatting## Usage
`crawler.py`
```bash
$ python3 crawler.py
# Follow the interactive prompts:
# 1. Enter search topic
# 2. Enter start year
# 3. Enter end year
# 4. Enter maximum results (-1 for unlimited)
````csv_to_markdown_table.py`
```bash
# With custom markdown filename
$ python3 csv_to_markdown_table.py input.csv output.md# Using default filename (same as CSV)
$ python3 csv_to_markdown_table.py input.csv
```## Contributing
I welcome all contributions! Whether it's bug reports, feature suggestions, or pull requests, your input helps me to improve. If you're interested in contributing, please check out my contributing guidelines or submit an issue.## License
This project is licensed under the [Apache 2.0 License](LICENSE). Feel free to use and distribute it according to the terms of the license.## Contact
If you have any questions or feedback, don't hesitate to reach out! You can contact me at [[email protected]][email].[email]: mailto:[email protected]