https://github.com/vsoch/wikipedia-equations
word2vec embeddings for statistics and math equations from Wikipedia
embeddings equations math staistics wikipedia word2vec
- Host: GitHub
- URL: https://github.com/vsoch/wikipedia-equations
- Owner: vsoch
- Created: 2019-01-12T19:27:44.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-10-07T16:57:47.000Z (over 6 years ago)
- Last Synced: 2025-01-28T00:36:10.801Z (over 1 year ago)
- Topics: embeddings, equations, math, staistics, wikipedia, word2vec
- Language: Jupyter Notebook
- Homepage:
- Size: 41.1 MB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Equation Mapping
[DOI](https://zenodo.org/badge/latestdoi/165427328)
This dataset provides word2vec embeddings that describe equations
from statistics and math articles on Wikipedia. The embeddings are built for groups of links
that generally fall into these categories:
- [statistics](statistics)
- [mathematics](math)
For the first, we use a list of statistics articles. For the second, we make a
best effort to parse pages of math topics. You are free to use the vectors
for your own analysis and projects! Here are some interesting questions:
1. Can you build a model to predict an equation from one or more terms?
2. Can you predict terms from equations?
Please refer to the README.md in each folder for further details.
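As a starting point for the first question above, here is a minimal sketch of ranking equations against one or more terms by cosine similarity. It assumes that term vectors and equation vectors live in the same embedding space. The vectors, the dictionary names, and the `rank_equations` helper are hypothetical toy values for illustration and are not taken from the dataset.

```python
import math

# Toy vectors standing in for real embeddings (hypothetical values,
# not taken from the dataset).
term_vectors = {
    "variance": [0.9, 0.1, 0.0],
    "mean": [0.7, 0.3, 0.1],
    "derivative": [0.0, 0.2, 0.9],
}
equation_vectors = {
    r"\sigma^2 = E[(X - \mu)^2]": [0.8, 0.2, 0.05],
    r"f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}": [0.05, 0.1, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_equations(terms, term_vectors, equation_vectors):
    """Return (score, equation) pairs, most similar to the averaged terms first."""
    query = average([term_vectors[term] for term in terms])
    scored = [(cosine(query, vec), eq) for eq, vec in equation_vectors.items()]
    return sorted(scored, reverse=True)


best_score, best_equation = rank_equations(
    ["variance", "mean"], term_vectors, equation_vectors
)[0]
print(best_equation)
```

The second question (terms from equations) can be approached the same way by swapping the roles of the two dictionaries. With real vectors you would likely want a library such as numpy for the arithmetic.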
## 1. Install Requirements
To recreate the data for either category, first install the requirements.
These include two libraries I created as a graduate
student, [wordfish](https://vsoch.github.io/2016/2016-wordfish/) and
[repofish](https://pypi.org/project/repofish/).
wordfish is a small library that uses gensim to run word2vec, and repofish uses it
to parse various internet resources for words, etc.
```bash
pip install -r requirements.txt
```
Then continue with the instructions in the subfolder of your choice. The steps are generally the same,
but the second (math) was developed after statistics.