Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arxiv/arxiv-browse
Flask app for article abstract and listing pages
https://github.com/arxiv/arxiv-browse
arxiv-ce flask in-production python
Last synced: 10 days ago
JSON representation
Flask app for article abstract and listing pages
- Host: GitHub
- URL: https://github.com/arxiv/arxiv-browse
- Owner: arXiv
- License: mit
- Created: 2017-08-02T15:54:37.000Z (over 7 years ago)
- Default Branch: develop
- Last Pushed: 2024-05-22T18:05:00.000Z (9 months ago)
- Last Synced: 2024-05-22T18:45:24.851Z (9 months ago)
- Topics: arxiv-ce, flask, in-production, python
- Language: Python
- Homepage:
- Size: 49 MB
- Stars: 102
- Watchers: 17
- Forks: 56
- Open Issues: 25
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# arxiv-browse
## Running Browse with the Flask development server
You can run the browse app directly.
```bash
make venv
````(the make rarely works)
or
```bash
python --version
# 3.11.x
python -m venv ./venv
source ./venv/bin/activate
pip install poetry==1.3.2
poetry install
python main.py
```
Note -- make sure you have python dev installed befoore doing the above steps, or the `poetry install` will fail trying to build the mySQL library dependency. E.g.: `sudo apt-get install python3.11-dev`Then go to http://127.0.0.1:8080/abs/0906.5132
This will monitor for any changes to the Python code and restart the server.
Unfortunately static files and templates are not monitored, so you'll have to
manually restart to see those changes take effect.By default, the application will use the directory trees in
`tests/data/abs_files` and `tests/data/cache` and when looking for the document
metadata and PDF files. These paths can be overridden via environment variables
(see `browse/config.py`).This method will only give you access to minimal data in the built-in test
dataset. For full (read-only) access to the production database, the following
steps are necessary:## Running with access to the production database
Prerequisites:
you are logged into gcloud and have default application credentials. This
can be achieved by calling `gcloud auth login --update-adc --force` and
logging into your @arxiv.org account.Next, **all** the following steps have to be done.
Also, in the following we assume that you two port variable set in the environment:
```
MAIN_DB_PORT=3301
LATEXML_DB_PORT=3302
```
(the numbers are not important, but must be different)### Create a .env file
First, you'd need to create the '.env' file somewhere. Using tests/.env is suggested.
```
DOCUMENT_ABSTRACT_SERVICE=browse.services.documents.db_docs
ABS_PATH_ROOT=gs://arxiv-production-data
DOCUMENT_CACHE_PATH=gs://arxiv-production-data/ps_cache
DOCUMENT_LISTING_PATH=gs://arxiv-production-data/ftp
DISSEMINATION_STORAGE_PREFIX=gs://arxiv-production-data
LATEXML_ENABLED=True
LATEXML_BUCKET="gs://latexml_document_conversions"
LATEXML_BASE_URL=//127.0.0.1:8080
FLASKS3_ACTIVE=1
CLASSIC_DB_URI=" SEE BELOW "
LATEXML_DB_URI=" SEE BELOW "
```The value of `CLASSIC_DB_URI` one can obtain by
```
MAIN_SECRET=$(gcloud secrets versions access latest --project=arxiv-production --secret=readonly-arxiv-db-uri)
echo $MAIN_SECRET | sed -e "s#/arXiv.*#127.0.0.1:${MAIN_DB_PORT}/arXiv#"
```The value of `LATEXML_DB_URI` one can obtain by
```
LATEXML_SECRET=$(gcloud secrets versions access latest --project=arxiv-production --secret=latexml_db_uri_psycopg2)
echo $LATEXML_SECRET | sed -e "s#/latexmldb.*#127.0.0.1:${LATEXML_DB_PORT}/latexmldb#"
```If you have a PyCharm,
script: main.py
Enable env files
Add tests/.env![docs/development/pycharm-run-setting.png](docs/development/pycharm-run-setting.png)
### Running cloud-sql-proxy
You can obtain the database name `MAIN_DB_NAME` from the same `$SECRET` as in the previous step:
```
MAIN_DB_NAME=$(echo $MAIN_SECRET | sed -e "s#^.*unix_socket=/cloudsql/##")
```Similar, for the `LATEXML_DB_NAME` one needs to do
```
LATEXML_DB_NAME=$(echo $LATEXML_SECRET | sed -e "s#^.*host=/cloudsql/##")
```NOTE: `cloud_sql_proxy` and `cloud-sql-proxy` (new) have different options.
```
cloud-sql-proxy --address 0.0.0.0 --port ${MAIN_DB_PORT} ${MAIN_DB_NAME}
cloud-sql-proxy --address 0.0.0.0 --port ${LATEXML_DB_PORT} ${LATEXML_DB_NAME}
```For the old version
```
cloud_sql_proxy --instances=${MAIN_DB_NAME}=tcp:${MAIN_DB_PORT}
cloud_sql_proxy --instances=${LATEXML_DB_NAME}=tcp:${LATEXML_DB_PORT}
```If the proxy is working, you can use mysql client to connect to the db.
```bash
mysql -u browse -p --host 127.0.0.1 --port ${MAIN_DB_PORT} arXiv
Enter password:
...
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.mysql> show tables;
+------------------------------------------+
| Tables_in_arXiv |
+------------------------------------------+
| Subscription_UniversalInstitution |
````### Run the main server
Run
```
python main.py`
```
should start up and run### Test suite
Run the main test suite with the following command:
```bash
pytest tests
```### Running Browse in Docker
You can also run the browse app in Docker. The following commands will build and
run browse using defaults for the configuration parameters and will use the test
data from `tests/data`. Install [Docker](https://docs.docker.com/get-docker/) if
you haven't already, then run the following:```bash
script/start
```This command will build the docker image and run it. If all goes well,
http://localhost:8000/ will render the home page.### Configuration Parameters
See `browse/config.py` for configuration parameters and defaults). Any of these
can be overridden with environment variables.### Serving static files on S3
We use [Flask-S3](https://flask-s3.readthedocs.io/en/latest/) to serve static
files via S3.After looking up the AWS keys and region and bucket:
```bash
cd arxiv-browse
git pull
AWS_ACCESS_KEY_ID=x AWS_SECRET_ACCESS_KEY=x \
AWS_REGION=us-east-1 FLASKS3_BUCKET_NAME=arxiv-web-static1 \
pipenv run python upload_static_assets.py
```In AWS -> CloudFront, select the static.arxiv.org distribution, -> Invalidations -> Create invalidation,
and enter a list of url file paths, eg: /static/browse/0.3.4/css/arXiv.css.It may be help to use a web browser's inspect->network to find the active release version.
### Tests and linting for PRs
There is a github action that runs on PRs that merge to develop. PRs for which
these tests fail will be blocked. It is the equivalent of running:```
# if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statisticspytest tests
```### Setting up pytest in PyCharm
![docs/development/pycharm-run-setting.png](docs/development/pycharm-pytest.png)