https://github.com/baggepinnen/thesistools.jl

Some tools for working with a thesis written in Latex

Host: GitHub
URL: https://github.com/baggepinnen/thesistools.jl
Owner: baggepinnen
License: mit
Created: 2018-12-20T14:07:36.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2022-11-26T07:03:25.000Z (almost 3 years ago)
Last Synced: 2025-01-22T04:13:26.118Z (9 months ago)
Language: Julia
Size: 92.8 KB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# ThesisTools

This Julia package contains some tools I wrote while writing my [thesis](https://lup.lub.lu.se/search/publication/ffb8dc85-ce12-4f75-8f2b-0881e492f6c0).

# Functions

```julia
"""
    text, headings = process(filename, [sectionsplit::String])

Reads a tex file and removes all tex-code to produce a clean output without environments or commands (thus, all figure captions will be removed). If the optional `sectionsplit` is set, splits the string into a vector at the specified section level.

`sectionsplit` ∈ ["part", "chapter", "section", "subsection"...]
"""
function process(filename, sectionsplit)
```

`process` is the main function. It takes your main tex file and compiles all text into a single string by following `\include` and `\input` commands, then detexes the string using the function `detex` below. If `sectionsplit` is provided, `text` will be a vector with one string per chapter, section, etc.
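The splitting step can be pictured roughly as follows. This is a minimal sketch and not the package's implementation: `split_at` is a hypothetical helper, and it assumes headings are written as `\chapter{...}` (or the starred form) without nested braces in the title.

```julia
# Hypothetical helper illustrating the idea behind `sectionsplit`.
# Splits `text` at every `\<level>{...}` heading and returns the pieces
# that follow each heading together with the heading titles.
function split_at(text::AbstractString, level::AbstractString = "chapter")
    # Matches e.g. \chapter{Title} or \chapter*{Title}
    pattern = Regex("\\\\" * level * "\\*?\\{([^}]*)\\}")
    headings = [String(m.captures[1]) for m in eachmatch(pattern, text)]
    parts = split(text, pattern)
    # parts[1] is whatever precedes the first heading, so it is dropped
    return String.(parts[2:end]), headings
end
```

Returning the text pieces first and the headings second mirrors the `text, headings` order documented for `process`.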

---

```julia
"""
    text = compile(filename)

Return a string representing the tex-document. Follows \\input{} recursively.

See `process` for a function doing everything you want ;)
"""
compile(filename)
```

`compile` handles the compilation of many separate tex-files into one string.
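As an illustration of what following `\input` recursively involves, here is a minimal sketch. It is not the code used by `compile`: the name `inline_inputs` and the `root` keyword are made up for this example, and it assumes that included paths are given relative to the directory of the main file and that a missing `.tex` extension should be appended.

```julia
# Hypothetical sketch: replace every \input{...} / \include{...} with the
# contents of the referenced file, recursing into the included files.
function inline_inputs(filename::AbstractString;
                       root::AbstractString = dirname(abspath(filename)))
    src = read(filename, String)
    pattern = r"\\(?:input|include)\{([^}]+)\}"
    # `replace` hands the matched substring to the function, so the file
    # name has to be extracted from it again.
    function expand(m)
        name = match(pattern, m).captures[1]
        path = joinpath(root, endswith(name, ".tex") ? name : name * ".tex")
        # Leave the command untouched if the file cannot be found
        return isfile(path) ? inline_inputs(path; root = root) : String(m)
    end
    return replace(src, pattern => expand)
end
```

Leaving unresolved commands in place, rather than throwing an error, is a choice made for this sketch so that a single missing file does not abort the whole run.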

---

```julia
"""
    outputtext = detex(inputtext)

Removes preamble, environments and latex tags from the inputtext
"""
function detex(t)
```

`detex` tries to remove all texiness from the string. It removes `\commands`, `$math$` and `$$math$$`, `% comments`, and all **environments** such as `\begin{figure} ... \end{figure}` (this means that all captions are removed also :/ )
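The removal steps listed above can be sketched with a handful of regular expressions. This is an approximation under stated assumptions, not the package's `detex`: the function name `strip_tex` is hypothetical, nested environments with the same name and escaped dollar signs are not handled, and command arguments such as the `word` in `\emph{word}` are kept as plain text.

```julia
# Hypothetical regex-based sketch of the detexing steps.
function strip_tex(t::AbstractString)
    # Keep only the document body, which also drops the preamble
    m = match(r"\\begin\{document\}(.*)\\end\{document\}"s, t)
    body = m === nothing ? String(t) : String(m.captures[1])
    body = replace(body, r"(?<!\\)%[^\n]*" => "")                      # % comments
    body = replace(body, r"\\begin\{([^}]+)\}.*?\\end\{\1\}"s => "")   # environments
    body = replace(body, r"\$\$.*?\$\$"s => "")                        # $$math$$
    body = replace(body, r"\$.*?\$"s => "")                            # $math$
    body = replace(body, r"\\[A-Za-z]+\*?(?:\[[^\]]*\])?" => "")       # \commands
    body = replace(body, r"[{}]" => "")                                # leftover braces
    # Collapse the runs of spaces left behind by the removals
    return String(strip(replace(body, r"[ \t]+" => " ")))
end
```

The order matters: comments go first so that a `%` line cannot hide an `\end{...}`, and environments are removed before inline math so that math inside a figure or equation environment disappears with it.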

---

```julia
"""
    ϕ,θ,topics = categorize(crps, ntopics=8;
        iters           = 2010,      # number of Gibbs sampling iterations in LDA
        α               = 1/ntopics, # hyperparameter: topics per document
        β               = 0.001,     # hyperparameter: words per topic
        words_per_topic = 30)

See `lda` for more help on options.
"""
function categorize(crps, ntopics=8;
    iters           = 2010,      # number of Gibbs sampling iterations
    α               = 1/ntopics, # hyperparameter: topics per document
    β               = 0.001,     # hyperparameter: words per topic
    words_per_topic = 30)
```

`categorize` performs LDA (Latent Dirichlet Allocation) on the corpus `crps`; see the usage example below.
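One plausible way to obtain a word table like `topics` from the LDA output is sketched here. This is an assumption about the general technique, not a description of what `categorize` does internally: `top_words` and `vocabulary` are hypothetical names, and `ϕ` is assumed to be laid out as topics × words, with column `j` corresponding to `vocabulary[j]`.

```julia
# Hypothetical sketch: pick the k most probable words for every topic.
# Returns a k × ntopics matrix of strings, one column per topic.
function top_words(ϕ::AbstractMatrix, vocabulary::Vector{String}, k::Integer)
    ntopics = size(ϕ, 1)
    k = min(k, length(vocabulary))
    table = Matrix{String}(undef, k, ntopics)
    for topic in 1:ntopics
        # Word indices ordered from most to least probable in this topic
        order = sortperm(ϕ[topic, :]; rev = true)
        table[:, topic] = vocabulary[order[1:k]]
    end
    return table
end
```

With `k = 30` and four topics this would give a 30×4 array of strings, the same shape as `topics` in the usage example.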

---

```julia
"""
    find_missing(filename, opening_char, closing_char)

Locates missing brackets etc. in a LaTeX document.

Example: `find_missing("thesis.tex", '{', '}')`

Does not work if opening and closing chars are the same, e.g., \$ \$
"""
function find_missing(filename, oc, cc)
```
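The underlying idea can be illustrated with a small stack-based scan. This is a sketch rather than the package's code: `unmatched_delimiters` is a hypothetical name, and it works on a string instead of a file name so that it is easy to try out.

```julia
# Hypothetical sketch: report the line numbers of delimiters that are never
# closed and of closing delimiters that have no matching opener.
function unmatched_delimiters(text::AbstractString, oc::Char, cc::Char)
    open_lines = Int[]     # stack holding the line of each open delimiter
    stray_closers = Int[]  # lines with a closer but nothing left to close
    for (lineno, line) in enumerate(split(text, '\n'))
        for c in line
            if c == oc
                push!(open_lines, lineno)
            elseif c == cc
                if isempty(open_lines)
                    push!(stray_closers, lineno)
                else
                    pop!(open_lines)
                end
            end
        end
    end
    return (unclosed = open_lines, unopened = stray_closers)
end
```

When `oc == cc`, every occurrence takes the first branch and is treated as an opener, which shows why this kind of scan cannot handle delimiters such as `$ $`.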

---

```julia

wikiscan(text)

```

Looks for misspelled words etc. using Wikipedia regexes.

# Example usage

```julia

using ThesisTools, TextAnalysis
using TextAnalysis: sentence_tokenize, text

filename                    = "/local/home/fredrikb/phdthesis/phdthesis.tex";
chapters1, headings1        = process(filename, "chapter");
valid_chapter_inds          = length.(chapters1) .> 400;
valid_chapter_inds[[3,16]] .= false;
chapters                    = chapters1[valid_chapter_inds];
headings                    = headings1[valid_chapter_inds];
docs                        = StringDocument.(deepcopy(chapters));
crps                        = Corpus(deepcopy(docs));
prepare!(crps, strip_corrupt_utf8 | strip_case | strip_articles | strip_prepositions | strip_pronouns | strip_stopwords | strip_whitespace | strip_non_letters | strip_numbers)
# stem!(crps)
update_lexicon!(crps)

ϕ,θ,topics = ThesisTools.categorize(crps, 4); # LDA: Latent Dirichlet Allocation, takes about 10 seconds for a 160 page thesis and 4 categories.

julia> topics
30×4 Array{String,2}:
 "model"        "calibration"   "robot"         "model"
 "friction"     "matrix"        "seam"          "system"
 "functions"    "methods"       "sensor"        "time"
 "basis"        "data"          "measurement"   "learning"
 "function"     "method"        "error"         "models"
 "estimation"   "estimate"      "filter"        "dynamics"
 "signal"       "thesis"        "laser"         "function"
 "parameters"   "using"         "particle"      "optimization"
 "proposed"     "parameters"    "errors"        "linear"
 "position"     "linear"        "measurements"  "trajectory"
 "models"       "plane"         "trajectory"    "data"
 "spectral"     "sensor"        "uncertainty"   "regularization"
 "temperature"  "set"           "distribution"  "identification"
 "estimated"    "laser"         "forces"        "algorithm"
 "using"        "system"        "model"         "control"
 "matrix"       "based"         "estimation"    "jacobian"
 "method"       "frame"         "gaussian"      "prior"
 "parameter"    "procedure"     "estimator"     "noise"
 "velocity"     "algorithm"     "space"         "parameters"
 "data"         "vector"        "fsw"           "input"
 "dependence"   "approach"      "tracking"      "methods"
 "form"         "coordinate"    "forward"       "nonlinear"
 "due"          "initial"       "process"       "using"
 "squares"      "flange"        "tool"          "form"
 "varying"      "machine"       "function"      "optimal"
 "estimate"     "found"         "sensors"       "systems"
 "linear"       "kinematic"     "based"         "solution"
 "methods"      "optimization"  "deflections"   "weight"
 "dependent"    "research"      "kinematics"    "decay"
 "motor"        "rotation"      "modeling"      "network"

julia> topicnames = [ # These have to be manually chosen based on the words appearing in `topics`
       "Modeling",
       "State Estimation and Calibration",
       "Robotics",
       "Learning Dynamics",
       ];

julia> using Plots

julia> heatmap(θ, yticks=(1:4, topicnames),xticks=(1:length(headings), headings), ylabel="Topic", xlabel="Chapter", size=(2000,600), color=:blues, xrotation=45);gui()

```

![window](lda.png)
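The heatmap can also be summarised numerically by picking the strongest topic for each chapter. The sketch below uses a made-up `θ` and made-up headings so that it runs on its own. It assumes, as the tick labels in the `heatmap` call suggest, that `θ` has one row per topic and one column per chapter.

```julia
# Made-up stand-in for the θ returned by `categorize` (topics × chapters)
θ = [0.7 0.1 0.2;
     0.1 0.6 0.1;
     0.1 0.2 0.1;
     0.1 0.1 0.6]

topicnames = ["Modeling", "State Estimation and Calibration",
              "Robotics", "Learning Dynamics"]
headings = ["Chapter A", "Chapter B", "Chapter C"]  # hypothetical headings

# For every chapter (column), find the topic (row) with the largest weight
dominant = [topicnames[argmax(col)] for col in eachcol(θ)]

for (h, t) in zip(headings, dominant)
    println(h, " => ", t)
end
```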
