https://github.com/stufield/stufield
Personal GitHub profile repository
- Host: GitHub
- URL: https://github.com/stufield/stufield
- Owner: stufield
- Created: 2020-08-18T03:45:13.000Z (almost 6 years ago)
- Default Branch: main
- Last Pushed: 2024-12-04T15:02:36.000Z (over 1 year ago)
- Last Synced: 2025-01-25T06:13:08.589Z (over 1 year ago)
- Size: 12.2 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Welcome To My Homepage!
Senior Data Scientist · Machine Learning · Model Development · Tool Development · Software Programmer · R Software · Leadership
-----------
> "Making predictions is easy ... making accurate ones is much more difficult."
> -- Me --
-----------
## About Me 
I love to solve problems.
Often the problem can be understanding a complex biological process, but
it can also be as simple as fixing something that's broken
(e.g. a door that jams, a bicycle, or even machine learning software).
In particular, I like to apply my data science skills to better
understand, or even solve, the problems we face.
Over the past 12+ years I have combined my statistical knowledge
and Open-Source Software tools to solve complex problems in the
high-dimensional Life Sciences proteomics space.
In so doing, I have created a comprehensive R-based machine
learning analysis ecosystem that standardizes and enables
biomarker discovery and predictive model development.
Sometimes the problem is inconsistency across teams or analysts ...
thus I promote adherence to "tidy" data principles and am a
strong proponent of reproducible research and the use of bioinformatics pipelines.
Other times the problem is sharing results across the organization ...
thus I develop Application Programming Interface (API) infrastructure
that enables anyone to access model results with ease.
With my teaching background, I find it important to mentor junior team members
while simultaneously leading more senior members. This collaborative spirit
is essential to building an effective team that delivers to stakeholders,
fosters a sense of accomplishment, and drives revenue generation.
I am always open to discussing possible roles and whether my skill set
can solve problems in your space. Please reach out via:
| How | Where |
|:--- |:------------ |
| Email | [stu.g.field@gmail.com](mailto:stu.g.field@gmail.com) |
| Phone | 720.259.9982 |
| LinkedIn | www.linkedin.com/in/stu-field-sr-data-sci |
--------------------
### Skills
| Machine Learning | Statistics | Open-Source | Software Tools |
|:---------------------- |:-------------------- |:-------------- |:----------------- |
| Random Forest | Logistic regression | R | Linux, macOS |
| Naive Bayes | Linear regression | C++ | Git, GitHub :octocat: |
| Lasso/ridge regression | GLMMs | Python | AWS |
| k-Nearest neighbour | Mixed-effects models | LaTeX | BASH, GNU |
| PCA | Survival analysis | CI/CD | Bitbucket |
| Ensemble methods | Multivariate statistics | Docker | Slack |
| Maximum Likelihood | ANOVA | | Kubernetes |
#### Application of Skill Set
- **Data Analysis:** created a high-dimensional,
high-throughput, multiplex proteomics machine
learning analysis ecosystem that enabled (and standardized)
biomarker discovery and model development across analysts.
- **Project Leadership:** led a highly successful Open-Source Software (OSS)
initiative enabling customers not only to understand highly complex
analysis concepts in the proteomics space, but also to conduct those analyses themselves.
- **Analysis Reports:** generated standardized analysis templates
enabling reproducible research and results across the organization.
- **Leadership:** successfully led a team of 3-5 direct reports through
analyses, code review, self-enablement, and career development.
- **Written Accomplishment:** proven ability to summarize complex analyses via a strong
publication record.
## Tech Notes & Vignettes
| Topic |
|:------------------------------------------------------ |
| [False Discovery](articles/false-pos-q-values.md) |
| [Mixture Models](articles/mixture-em.md) |
| [Logistic Regression](articles/logistic-regression.md) |
| [Naive Bayes](articles/naive-bayes-tech-note.md) |
| [The Birthday Paradox](articles/birthday-paradox.md) |
| [Mack-Wolfe Tests](articles/mack-wolfe.md) |
| [Mixed Effects](articles/mixed-effects-models.md) |
| [Monty Hall Paradox](articles/monty-hall-paradox.md) |
| [Decision Boundaries](articles/decision-boundaries.md) |
| [Class Imbalance](articles/class-imbalance.md) |
## Baseball
| Topic |
|:------------------------------------------------------ |
| [Pitch Classifier](articles/baseball-strike-classifier.md) |
----------------
#### Other Interests
- I am currently learning woodworking ... I'm not very good, but I can make a lot of sawdust!
- Ask me about: bikes and `R` ... I'll talk your ear off!
- I'm an avid cyclist:
come say hi on [Strava][5]
----------------
### More Details
- I maintain several `R` software libraries (packages) that implement
  statistical and machine learning techniques in biomarker discovery.
  Some of my popular published packages are:
  - [SomaDataIO](https://cran.r-project.org/web/packages/SomaDataIO/index.html)
    ([CRAN](https://cloud.r-project.org/))
  - [SomaPlotr](https://github.com/SomaLogic/SomaPlotr)
  - [gitr](https://github.com/stufield/gitr)
  - [helpr](https://github.com/stufield/helpr)
  - [power](https://github.com/stufield/power)
- These projects support analyses in the general Life Sciences (BioTech)
  space to generate proteomic-based insights in health spaces such as:
  - cardiovascular disease
  - liver disease (NASH/NAFLD)
  - alcohol effects
  - biological aging
  - exercise status
  - metabolic disease
- Favorite techniques:
  - random forest
  - logistic regression (ol' faithful)
  - naive Bayes
  - KKNN (nearest neighbor)
  - survival analyses
  - ensemble methods
- I am a proponent of open-source software, conducting the majority
  of my research/analysis via Linux toolkits, R, and the RStudio IDE.
- I promote adherence to so-called "tidy" data principles, a
  philosophy of data science designed to share underlying data
  structure, grammar, and format, which facilitates the generation
  of reproducible analyses.
-------------------
### Links & Resources
- [https://github.com/MartinHeinz/MartinHeinz](https://github.com/MartinHeinz/MartinHeinz)
- [https://dplyr.tidyverse.org/articles/programming.html](https://dplyr.tidyverse.org/articles/programming.html)
------------
[1.1]: http://i.imgur.com/tXSoThF.png (twitter icon with padding)
[2.1]: http://i.imgur.com/0o48UoR.png (github icon with padding)
[1.2]: http://i.imgur.com/wWzX9uB.png (twitter icon without padding)
[2.2]: http://i.imgur.com/9I6NRUm.png (github icon without padding)
[3.2]: https://raw.githubusercontent.com/stufield/stufield/main/linkedin-3-16.png
[4.2]: https://raw.githubusercontent.com/stufield/stufield/main/icons8-instagram-24.png
[5.1]: https://raw.githubusercontent.com/stufield/stufield/main/strava-icon.svg
[5.2]: https://raw.githubusercontent.com/stufield/stufield/main/icons8-strava-24.png
[1]: https://twitter.com/stufield3
[2]: https://github.com/stufield
[3]: https://www.linkedin.com/in/stu-field-133396a
[4]: https://www.instagram.com/carlito_caliente/
[5]: https://www.strava.com/athletes/3292229