https://github.com/tebe-nigrelli/personal-analysis
Data analysis of personal habits
emacs jupyter-notebook org-mode python quantified-self
Last synced: 20 days ago
- Host: GitHub
- URL: https://github.com/tebe-nigrelli/personal-analysis
- Owner: tebe-nigrelli
- License: cc0-1.0
- Created: 2024-11-02T17:13:53.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-20T18:00:40.000Z (about 2 months ago)
- Last Synced: 2024-11-20T18:26:22.261Z (about 2 months ago)
- Topics: emacs, jupyter-notebook, org-mode, python, quantified-self
- Language: Jupyter Notebook
- Homepage:
- Size: 102 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# This notebook
As mentioned in my [doom-emacs repository](https://github.com/tebe-nigrelli/doomemacs-config), I like to collect data on my habits and analyze it using mathematical methods.
This page gives an outline for my process. The full jupyter notebooks are available for reference ([1](Common%20Agenda%20AnalysisFramework.ipynb), [2](Summary%20Agenda%20Analysis.ipynb)), while here I have highlighted some features and results. The code is quite messy and begging to be formatted, which I intend to do eventually.
For the time being, I use two notebooks, one to understand single events, and the other to study daily, weekly or monthly trends. The image below gives a sense of how data is first created in Emacs and is exported to different formats using three scripts.
![](assets/programs_sketch.png)
Skip to the Data Analysis
- [Clustering](#clustering): creating categories of days and interpreting them.
- [Correlations](#correlations): studying how the times dedicated to each activity relate.
- [Energy Function](#energy-function): using a contextualised energy function to describe days and understand productivity.

# Data Collection
I store most of my data through a customised version of a program called doom-emacs, which I published [here](https://github.com/tebe-nigrelli/doomemacs-config).
For example, whenever I watch a movie, I open [Emacs](https://www.gnu.org/software/emacs/tour/) and use some keyboard shortcuts to quickly record an entry in one of my log files. For the writing format, I use [org-mode](https://orgmode.org/), one of the available plugins for Emacs, which I extended slightly to include the time zone in the time stamps.

| | Original | Extended |
|---|---|---|
| C fmt | `[%Y-%m-%d %a %H:%M]` | `[%Y-%m-%d %a %H:%M %z]` |
| org-mode | | |

The following excerpt shows how a typical entry is stored in the .org text format. Here, the 'Film' subheading, which falls under the tag/category MDI (ie. Media), has a single log, lasting for the time between two time stamps. This amounts to one hour and fifty-six minutes, with 'Dune' as entry note.
```
** Film :MDI:
:LOGBOOK:
CLOCK: [2024-10-16 Wed 22:24 +0200]--[2024-10-17 Thu 00:20 +0200] => 1:56
- Dune
:END:
```
# Exporting
The data is exported to a csv table using [Jeff Filipovits](https://github.com/legalnonsense)'s brilliant [org-clock-export](https://github.com/legalnonsense/org-clock-export) package. It is particularly useful because it was designed to be extensible: users can define functions that retrieve data for each row, adding the results to the export file. The full code is available at [org-clock-export](https://github.com/legalnonsense/org-clock-export) and the file 'org-csv-util.el' of this repository contains my settings.
The following is my export format: each name defines a column and is paired with an expression specifying how to extract that information for each log entry. I make sure to include the _position_ of the log inside the file (ie. "outline"), its tags and any included notes.
```lisp
'("filename" (file-name-nondirectory (buffer-file-name))
"outline" (tn/list-to-string (org-get-outline-path t t))
"date" (concat start-year "-" start-month "-" start-day)
"tzone" (tn/get-tzone)
"start" (concat start-hour ":" start-minute)
"duration" (number-to-string (+ (* (string-to-number total-hours) 60)
(string-to-number total-minutes)))
"tags" (or (org-entry-get (point) "ALLTAGS") "nil")
"note" (tn/get-lognote))
```
At export, the data looks like this, all the way down for 7184 rows, as of November 3rd 2024.

| filename | outline | date | tzone | start | duration | tags | note |
|---|---|---|---|---|---|---|---|
| Calendar.org | “Projects” “Quantified-Self-Study” “Export Report” | 2024-01-05 | +0100 | 23:00 | 55 | :2024:PRJ: | Computing matrix linear transformation |
| Calendar.org | “Projects” “Quantified-Self-Study” “Export Report” | 2024-01-05 | +0100 | 22:15 | 13 | :2024:PRJ: | Simplifying code |
| Calendar.org | “Projects” “Quantified-Self-Study” “Export Report” | 2024-01-03 | +0100 | 23:30 | 50 | :2024:PRJ: | Experimenting |
| Calendar.org | “Projects” “Quantified-Self-Study” “Export Report” | 2024-01-03 | +0100 | 22:00 | 69 | :2024:PRJ: | R Markov Chain automated improvements |

I use primarily the **pandas** Python library to conduct my investigation, as its methods are particularly efficient and straightforward, in addition to supporting the necessary types. In Jupyter, the table with the data is read into a pandas dataframe, which I will use throughout my code.
# Data Cleaning
Data is first filtered and formatted for analysis: it is read as is, and the user is able to filter entries by features, which are either numerical, string lists or of set type. For example, only events happening within a specific time range, sporting a given tag or duration may be selected.
## Parsing
As the data is originally in text format, columns are converted into their proper types and added to the dataframe.
The columns affected are time stamps, which become 'datetime' objects, time durations, made into 'timedelta' objects, and tags, which are converted into frozenset types. The latter is Python's immutable version of the set type: immutability makes the data hashable, so the pandas library can filter the dataframe efficiently.
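As a rough illustration of this parsing step, the sketch below reads a small sample in the export layout and adds the typed columns. The helper `parse_tags`, the column names `start_datetime` and `tags_set`, and the second sample row are assumptions made for this example; the notebooks may organise this differently.

```python
import io
import pandas as pd

# Sample in the export layout (outline column omitted for brevity).
# The first row comes from the table above; the second is made up for illustration.
SAMPLE_CSV = """filename,date,tzone,start,duration,tags,note
Calendar.org,2024-01-05,+0100,23:00,55,:2024:PRJ:,Computing matrix linear transformation
University.org,2024-01-08,+0100,09:00,90,:2023_2024:LES:GER:@aulae:,
"""

def parse_tags(raw: str) -> frozenset:
    """Turn an org tag string such as ':2024:PRJ:' into a frozenset of tags."""
    return frozenset(tag for tag in raw.split(":") if tag and tag != "nil")

# dtype=str keeps '+0100' from being read as the integer 100.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype=str, keep_default_na=False)

# Time stamps: date, start time and UTC offset combined into one aware datetime.
# (Rows with different offsets would need extra care, e.g. utc=True.)
df["start_datetime"] = pd.to_datetime(
    df["date"] + " " + df["start"] + " " + df["tzone"],
    format="%Y-%m-%d %H:%M %z",
)
# Durations: minutes stored as text become timedelta objects.
df["duration_timedelta"] = pd.to_timedelta(df["duration"].astype(int), unit="min")
# Tags: immutable, hashable sets.
df["tags_set"] = df["tags"].map(parse_tags)

# Example filter: keep rows whose tags are a subset of the allowed tags.
allowed = frozenset({"2024", "PRJ", "MDI"})
mask = df["tags_set"].map(allowed.issuperset)
print(df.loc[mask, ["filename", "start_datetime", "duration_timedelta"]])
```

Reading everything as text first and converting column by column keeps the original columns untouched, which matches the habit of adding results rather than overwriting data.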
Typically, methods add results to the dataframe without overwriting existing data, unless it was generated by the function itself. For instance, calling a method will add a column the first time, but calling it again, even with different parameters, will overwrite that column.
In hindsight, I believe that a better choice would have been to produce columns and add them separately, but for the time being the code runs to a satisfactory degree.
## Event Analysis
Some simple properties of the data are generally observed as a preliminary step to analysis.
For instance, the following picture shows the relation between the duration of activities and the time at which they begin. This data relates to reading research papers (September to November 2024). The code allows me to filter the events, determine their labels and plot them with distinct colors.
![](assets/research_papers_scatterplot.png)
## Outlier Detection
Outlier events are identified based on deviation from the mean of their duration, which is measured in standard deviations. A critical number of standard deviations is fixed and all values that surpass the bounds are considered extreme.
The following command shows how a selection of the data (ie. boolean mask) is typically extracted: in this case I group events by outline, that is, by where the heading is placed in the file, in order to only compare similar events.
> extract_outliers_by_group_mask(df, "outline", "duration_timedelta", 3)

| | outline | duration (h) |
|---|---|---|
| 53 | [Work, Helping-Various] | 02:30 |
| 331 | [Projects, Thesis-Help, Thesis-Data] | 02:13 |
| 380 | [Projects, Attimo-Personal-Clocking, Coding] | 03:00 |
| 577 | [Learning, Series, The Boys] | 04:38 |
| 598 | [Learning, Series, Better Call Saul] | 02:30 |

The reason to group events before looking for outliers stems from the range of recordings, as some kinds are much longer than others. If one grouped all events together, activities such as sleep, being longer on average, would seem outliers and be excluded. I should note that this method is also useful to identify events that were misrecorded, helping to correct faulty data.
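To make the grouping argument concrete, here is a minimal sketch of a group-wise outlier mask. It is not the `extract_outliers_by_group_mask` function from the notebooks, whose body is not shown here: the function name `outlier_mask_by_group`, the column names and the sample durations are all assumptions, and groups are keyed by a plain string because list-valued outlines would first need converting to something hashable such as a tuple.

```python
import pandas as pd

def outlier_mask_by_group(df, group_col, value_col, n_std=3.0):
    """Flag rows lying more than n_std standard deviations from their group's mean."""
    grouped = df.groupby(group_col)[value_col]
    mean = grouped.transform("mean")
    std = grouped.transform("std")
    # Single-row groups (std is NaN) and zero-spread groups give NaN here,
    # and NaN > n_std is False, so they never flag anything.
    z = (df[value_col] - mean) / std
    return z.abs() > n_std

# Hypothetical durations in minutes: films of about two hours plus one
# suspicious 600-minute entry, and ordinary nights of sleep.
events = pd.DataFrame({
    "group": ["Film"] * 7 + ["Sleep"] * 5,
    "duration_min": [110, 120, 115, 125, 118, 122, 600, 480, 470, 490, 475, 485],
})

# A threshold of 2 is used because, with only 7 samples in a group,
# no point can be more than about 2.27 standard deviations from the mean.
mask = outlier_mask_by_group(events, "group", "duration_min", n_std=2)
print(events[mask])
```

Only the 600-minute film is flagged; the 480-minute nights are left alone because they are compared with other nights, not with films.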
## Time Deltas
Outliers may also be detected from observing the distribution of time between consecutive activities. For instance, if an activity is suspended for months, it should be excluded altogether as it amounts to unbalanced and incomplete data.
![](assets/project_events_timedeltas.png)
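The gaps themselves are cheap to compute. The following sketch, with made-up time stamps and an arbitrary 30-day threshold (both assumptions for this example), sorts the start times of one activity and measures the time between consecutive events.

```python
import pandas as pd

# Hypothetical start times of a single activity.
starts = pd.Series(pd.to_datetime([
    "2024-09-03 18:30",
    "2024-09-01 10:00",
    "2024-11-20 14:00",
    "2024-09-04 09:15",
])).sort_values(ignore_index=True)

# Time elapsed since the previous event (the first entry has no predecessor).
gaps = starts.diff()

# Gaps longer than the chosen threshold mark periods where the activity was suspended.
threshold = pd.Timedelta(days=30)
long_gaps = gaps[gaps > threshold]
print(long_gaps)
```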
## Utilities
The **pandas** and **matplotlib** libraries offer a variety of methods and shortcuts to filter dataframes by the values of their columns, or to visualise data quickly. As some data types I use are not standard, I wrote some methods to help with operations.
## Outline Navigation
Outlines are lists of strings which represent the position of a log inside a file. Consider the following file structure.
```
Heading 1
 └─► Subheading 2
 └─► Subheading 3
      Log A
      Log B
 │
 └─► Subheading 4
      Log1
      Log2
```
From the example, _Log A_ will have as outline: ["Heading 1", "Subheading 3"]. Specialised methods are used to select clocks based on which outline criteria they match. For example, _Log A_ and _Log B_ are under "Subheading 3" but not "Subheading 4". The following methods are used:
```python
import pandas as pd

# Signatures only; bodies are omitted here.
def get_exact_outline_mask(df: pd.DataFrame, outline: list) -> pd.DataFrame: ...
def get_any_outline_mask(df: pd.DataFrame, outline: str) -> pd.DataFrame: ...
def get_index_outline(df: pd.DataFrame, outline: str, index: int) -> pd.DataFrame: ...
```
The same is done for tags: events can be selected if their tags are a subset of the desired tags.
```python
def get_subset_match_tags_mask(cl: pd.Series, tags) -> pd.Series: ...
```
### Histograms
I often plot the histograms of single properties, such as duration, to better understand data.
```python
plot_histogram(df["duration"], title="Duration histogram", bins=40)
```
![](assets/Duration_Histogram.png)
In the image above, one can see the relation between frequency of recording and duration of the log. The bell-shaped distribution to the right is sleep, whereas the first peak on the left corresponds to personal activities and the second, lower peak corresponds to lessons, typically lasting 90 minutes.
There are also situations where one might want to visualize 2D histograms, so I coded this functionality:
![](assets/sleep2Dhist.png)
I should note that the visualisation code accounts for nonstandard types: in the following plot I compare tags, of frozenset type, to duration, of timedelta type.
![](assets/sleep2Dhisttag.png)
# Summary Analysis
As a choice of my study, I typically group logs by time and category, running the scripts only on the summarised data.
> Instead of studying N events that happened in a week, I group them and only model their combined duration. This ignores their number and variation by event, instead focusing on the total effect. Doing an activity for 1 hour, 10 times, will look the same as doing it once, for 10 hours - within the same time period.
The transformation is done for practical reasons: to reduce the size of the dataset, and to make the effect of particularly long events uniform. Moreover, I automatically store the summary table to reduce running time, using the file when needed.
## Grouping in time
Logs are first grouped into discrete time chunks: the user picks a "time step size", typically 1 day, 1 week or 1 month, and all events that fall under each time period are summed into the number of minutes dedicated to each activity.
> Instead of considering N occurrences of an event in each week, I just count the total minutes dedicated to each event type per week.
This subdivision results in a summary table which is a lot smaller compared to the original data: for instance, a 7 day summary of a full year will amount to only 52 rows, from an original 3000. It should also be noted that choosing very long or very short steps will result in either too few data points or many time chunks which are occupied in full by single events. In both cases, analysis is not very indicative.
## Grouping by category
It should be noted that each event has multiple tags associated with it. Consider the following entry, tagged with year, type, subject and location: 2023_2024 for the school year, _LES_ for Lessons, _GER_ for German and _@aulae_ for classroom 'e'.
```
Filename, Heading, ..., Tags
University.org, "GER", ..., :2023_2024:LES:GER:@aulae:
```
The main tags that I use are standardised:
- _SWO_, _SFR_ - sleep (with the distinction of waking up with an alarm or freely).
- _LES_, _REV_, _EXM_ - lessons, revision and exams (for university).
- _R_, _E_, _S_: revision, exercises and social (eg. revision in group)
- _BUR_, _WRK_, _TDY_ - bureaucracy (eg. documents), various work tasks, tidying up.
- _PRJ_ - time dedicated to personal projects.
- _MDI_ - media such as reading books, watching movies or series.

Since events generally have multiple tags, one would want to calculate combined duration, while remaining capable of differentiating between them by tag. For example, I may want to compare only how "REV" and "LES" correlate in time, so I would need to calculate two distinct sums, without confusion between the two.
My solution is to group events by their set of tags, and compute the sum within each distinct group of tags in the dataset. This is the natural way of grouping the events without loss of information.
The process results in a summary table which has one column for each unique combination of tags. These columns are empty for most of the time, but they can be combined as needed, based on a desired merging rule.
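The two grouping steps described above can be sketched with a single pivot. In this example the events, the column names and the `summarise` helper are hypothetical; each tag set is turned into a sorted, colon-joined label (an assumption of this sketch) so that it can serve as a column name.

```python
import pandas as pd

# Hypothetical events: start time, duration in minutes and tag set.
events = pd.DataFrame({
    "start_datetime": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 14:00", "2024-01-01 21:00",
        "2024-01-02 09:00", "2024-01-09 10:00",
    ]),
    "duration_min": [90, 60, 30, 90, 120],
    "tags_set": [
        frozenset({"LES", "GER"}), frozenset({"REV", "R"}), frozenset({"REV", "R"}),
        frozenset({"LES", "GER"}), frozenset({"PRJ"}),
    ],
})

def summarise(events: pd.DataFrame, step: str = "1D") -> pd.DataFrame:
    """Total minutes per time chunk (rows) and per distinct tag set (columns)."""
    keyed = events.assign(
        # Fixed-width steps such as '1D' or '7D'; calendar months need another approach.
        period=events["start_datetime"].dt.floor(step),
        # Canonical label for each tag set, e.g. frozenset({'LES', 'GER'}) -> 'GER:LES'.
        tag_key=events["tags_set"].map(lambda tags: ":".join(sorted(tags))),
    )
    return keyed.pivot_table(
        index="period", columns="tag_key", values="duration_min",
        aggfunc="sum", fill_value=0,
    )

summary = summarise(events, step="1D")
print(summary)
```

Chunks without any event do not appear as rows in this sketch; reindexing on a full date range would add them as all-zero rows if the analysis needs them.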
## Merging by category
The next step is to pick the subject of analysis, which determines how the events, already grouped by tags, are merged into categories, thus reducing the number of variables.
For instance, an analysis that seeks to obtain a complete understanding of how all activities interact in the agenda will combine them uniformly (ie. "standard" in the following example). However, a targeted analysis, such as one of university study, may split some tags into multiple categories. For instance, "REV" is split into its "R", "E" and "P", differentiating between revision, exercise and university projects. In both cases, grouping is justified because there is very little overlap in how tags are assigned, which prevents double counting. In the code, a tag_tree dictionary is used as a simple way to store merging rules.
```python
tag_tree = {
    "standard": {
        "Sleep": ["SWO", "SFR"],
        "Lessons": ["LES"],
        "Revision": ["REV", "EXM"],
        "Repetitive": ["BUR", "WRK", "TDY", "ORG", "REP"],
        "Projects": ["PRJ"],
        "Media": ["MDI"],
        "Social": ["CAL", "OUT", "EVE", "DOG"],
    },
    "study": {
        "Theory": ["R"],
        "Exercise": ["E"],
        "Projects": ["P"],
        "Exams": ["EXM"],
        "Lessons": ["LES"],
    },
}
```
Following a two-step process may seem inefficient, as events are first grouped by tag set and only later merged into categories. However, this makes it possible to cache results, running multiple analyses from the same summary table.
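A merging rule can then be applied to the cached summary table in a few lines. The sketch below assumes that summary columns are labelled with colon-joined tags (for example "GER:LES"), which is a convention chosen for this example; the `merge_columns` helper and the numbers are hypothetical.

```python
import pandas as pd

# A subset of the "standard" rule, for brevity.
rules = {
    "Sleep": ["SWO", "SFR"],
    "Lessons": ["LES"],
    "Revision": ["REV", "EXM"],
}

# Hypothetical summary table: minutes per day and per tag set.
summary = pd.DataFrame(
    {"GER:LES": [90, 90], "R:REV": [90, 0], "E:REV": [0, 45], "SFR": [480, 450]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02"]),
)

def merge_columns(summary: pd.DataFrame, rules: dict) -> pd.DataFrame:
    """Sum the tag-set columns that contain at least one tag of each category."""
    merged = {}
    for category, tags in rules.items():
        wanted = set(tags)
        columns = [c for c in summary.columns if wanted & set(c.split(":"))]
        # A column matching two categories would be counted twice; the rule
        # relies on tags overlapping very little, as noted above.
        merged[category] = summary[columns].sum(axis=1)
    return pd.DataFrame(merged)

merged = merge_columns(summary, rules)
print(merged)
```

Because the merge only reads the summary table, several rules can be tried on the same cached file without going back to the raw events.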
Interestingly, merging tags represents a change in paradigm: the user decides which sets of tags should be counted in the same basket, and which represent different objects. Depending on the perspective, the merging rule could change.
## Methods
In the next section, I discuss the practicality and insight from using different methods to analyse the data.
It is important to note that events and summaries have different properties and distributions. More precisely, the summary table is composed of rows with multiple data columns, as shown by the following picture, with histograms of how each column is spread.
![](assets/sparsity_university_histograms.png)
### Sparsity
Something that I found surprising when first looking at the data is how sparse activities tend to be: I always thought that I would do multiple activities, every day, but for some categories such as those relating to university, most days do not involve any activities.
![](assets/sparsity_standard_histogram.png)
The chart above, taking 1 day as the step size, sorts activities by how rarely they occupy days. The first column indicates that nearly 70% of days go without lessons. Naturally, this visualisation is susceptible to which time period is considered: in this case all of 2023, and 2024 up to November, overrepresenting summer days.
### Outliers
There is a constant effort to verify the existence of outliers in the data, as forgetting to insert data or 'over correcting' missing values may produce irregularities, and skew analysis.
The standard z-score normalisation is used to verify the presence of outliers, and can help in understanding the nature of the data. The following table gives an example for a typical output. Here, values are written as minutes, and it is clear which are plausible and which are highly irregular.
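
| | max | from time | to time | Theory | Exercise | Projects | Exams | Lessons |
|---|---|---|---|---|---|---|---|---|
| 0 | Theory | 2024-01-27 | 2024-01-28 | 465 | 45 | 0 | 0 | 0 |
| 1 | Exercise | 2024-01-30 | 2024-01-31 | 95 | 365 | 0 | 25 | 0 |
| 2 | Projects | 2024-05-17 | 2024-05-18 | 0 | 39 | 489 | 0 | 90 |
| 3 | Exams | 2024-01-23 | 2024-01-24 | 385 | 10 | 0 | 183 | 0 |
| 4 | Lessons | 2024-04-18 | 2024-04-19 | 0 | 0 | 0 | 0 | 600 |

### Normalisation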
In order for some data analysis methods to work properly, numeric data is extracted from the summary table, then each column is normalised making the mean 0 and the standard deviation 1 (z-score).
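A minimal sketch of the normalisation, assuming the numeric columns have already been extracted into their own dataframe. The `zscore` helper and the sample values are made up for this example.

```python
import pandas as pd

def zscore(table: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to mean 0 and standard deviation 1."""
    std = table.std(ddof=0)
    # Constant columns have zero spread; dividing by 1 leaves them at 0.
    return (table - table.mean()) / std.where(std > 0, 1.0)

# Hypothetical daily totals in minutes.
numeric = pd.DataFrame({
    "Revision": [120, 0, 240, 60, 30],
    "Lessons": [90, 180, 0, 90, 0],
})

normalised = zscore(numeric)
print(normalised.round(2))
```

The result keeps the index of the input, so rows of the normalised copy can still be matched with rows of the original table.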
The **pandas** library makes it possible to work on a copy of the table, then use the new indices to filter the old table, helping to match results with their computed labels. This process is often used in the following sections.
### Clustering
Consider how each data point in the summary table represents a combination of total times dedicated to each activity within a fixed timeframe:
> Two data points of step-size '1 day' represent each a combination of times dedicated to every activity. A 'distance' can be defined between them, to record their relative difference. More complex operations can be built on this, to obtain interesting results.
Based on the relative 'similarity' between rows of the summary table, it is possible to group data points into 'clusters'. This can help to interpret different behaviours in the time-steps, such as _productive_ as opposed to _not productive_ during the university period.
It should be noted that clustering is not much affected by problems in the data that hinder methods such as regression, for example unevenly collected data.
After choosing the number of clusters, I prefer to check the relative size of the groups to ensure that the split is somewhat even. It is possible for single distant points to be assigned their own group, which is not useful in analysis.
The following gives the spread of 5 clusters over the [standard](#merging-by-category) grouping.
+ Cluster 0 :: 56%, 297 samples
+ Cluster 1 :: 5%, 29 samples
+ Cluster 2 :: 23%, 120 samples
+ Cluster 3 :: 10%, 51 samples
+ Cluster 4 :: 6%, 34 samples

A **dendrogram** is sometimes used to get a sense of the relative shape of the clusters: if the tree is balanced (eg. the green one), the cluster is somewhat 'compact', as opposed to having 'tails' or spikes (eg. the red cluster).
![](assets/hierarchical_standard_dendrogram.png)
I have found that **Agglomerative Clustering** works best in avoiding such extreme cases. In addition, varying the number of clusters can be useful to divide numerous groups into more precise categories depending on the number of points available.
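For reference, a minimal sketch of this step with scikit-learn's `AgglomerativeClustering`, followed by the size check mentioned earlier. The data is synthetic (three well-separated blobs standing in for the normalised summary table), and the default Ward linkage is an assumption: the notebooks may use other settings.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# Synthetic stand-in for the normalised summary table: 3 blobs of 30, 15 and 5 points.
centres = np.array([[0.0, 0.0, 0.0], [6.0, 0.0, 0.0], [0.0, 6.0, 0.0]])
X = np.vstack([
    centre + rng.normal(scale=0.5, size=(n, 3))
    for centre, n in zip(centres, (30, 15, 5))
])

# Assign every row to one of 3 clusters.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Relative size of each cluster, to spot tiny groups made of distant points.
sizes = np.bincount(labels)
for cluster, size in enumerate(sizes):
    print(f"Cluster {cluster} :: {size / sizes.sum():.0%}, {size} samples")
```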
Some automatic methods were also developed to interpret the nature of each cluster:
- _Computing the group mean for each category_: one can calculate the 'average' point in each cluster to get a sense of some typical values. This assumes that the group is convex, which tends to be almost always satisfied in the data, although it is not guaranteed by **Agglomerative Clustering**.
- _Identifying how each cluster stands out from the others_: given the mean value of each coordinate in a cluster, one can identify extreme values and automatically produce labels containing 'high' and 'low' qualities, using a z-score method.

Expanding on the previously mentioned 5 clusters, the following table shows their noteworthy features, taken at a precision of at least _n_=1 standard deviation. The number _n_ is chosen by hand to regulate the number of noteworthy features: higher numbers will exclude all but the most outstanding qualities.

| | High | Low |
|---|---|---|
| 0 | ['Projects'] | [] |
| 1 | ['Repetitive'] | [] |
| 2 | ['Sleep', 'Revision', 'Media'] | ['Projects'] |
| 3 | ['Lessons'] | ['Media'] |
| 4 | ['Social'] | ['Sleep'] |

From the table, it seems that if one considers 5 groups of behaviour, _Projects_ and _Repetitive_ tasks (eg. going to the Post Office) do not significantly reduce the time dedicated to other activities, as opposed to _Lessons_, which reduce _Media_ consumption, for example.
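One possible reading of the labelling method is sketched below: cluster means are computed on the normalised table, and a category is reported as 'High' or 'Low' when the mean is at least _n_ standard deviations away from zero. The function name `describe_clusters` and the sample values are hypothetical, and the exact rule in the notebooks may differ.

```python
import pandas as pd

def describe_clusters(normalised: pd.DataFrame, labels: pd.Series, n: float = 1.0) -> dict:
    """For each cluster, list the categories whose mean z-score is beyond +n or -n."""
    means = normalised.groupby(labels).mean()
    description = {}
    for cluster, row in means.iterrows():
        description[cluster] = {
            "High": row[row >= n].index.tolist(),
            "Low": row[row <= -n].index.tolist(),
        }
    return description

# Hypothetical, already normalised rows with their cluster labels.
normalised = pd.DataFrame({
    "Projects": [1.4, 1.2, -0.6, -0.8, -0.6, -0.6],
    "Media": [0.1, -0.1, 0.9, 0.7, -1.1, -1.3],
    "Lessons": [-0.5, -0.5, -0.7, -0.7, 1.2, 1.2],
})
labels = pd.Series([0, 0, 1, 1, 2, 2], index=normalised.index)

for cluster, features in describe_clusters(normalised, labels, n=1.0).items():
    print(cluster, features)
```

Raising `n` shortens both lists, which is the hand-tuned trade-off described above.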
Clustering has the advantage of being nonparametric and nonlinear, making it effective at modelling qualitative properties, though this comes at the cost of not explaining quantitatively how the data is spread.
### Distributions in Time
Having identified clusters and their labels, I also like to visualise them in relation to relevant time frames.
For example, fixing 5 clusters over the [standard](#merging-by-category) division of tags, I produce histograms of how 'days' are spread throughout: _Project_ days focus on Sundays, whereas _Social_ days are preferred on Saturday and Tuesday.
![](assets/standard_week_distribution.png)
The same can be done over the months, which confirms the correctness of the labels.
![](assets/standard_month_distribution.png)
### PCA
[Principal Component Analysis](https://youtu.be/FD4DeN81ODY) is an elementary technique to reduce the number of variables needed to represent the data. It is useful to both visualise and understand datasets, assuming they are 'simple' enough.
More technically, PCA identifies the main directions in which the data points are 'spread' - you could imagine the whole set of data points as a cloud, with PCA looking for its principal axes in decreasing order of relevance. After finding these directions, the data is drawn in terms of them (ie. as a linear combination of a basis of feature vectors), also converting to this new format.
![](assets/PCA_standard_variance.png)
When applying PCA on data with _n_ variables, _n_ principal vectors are found in decreasing order of importance. From the graph, one sees that using only the first three components to describe the data, a high 67% of the variance (spread) is still explained (red line), which makes the reduction useful for some purposes like visualisation. This means that the data will be reduced from _n_ = 7 variables to 3 variables, obtained from a matrix transformation of the original data.
The following image is a 3D plot of PCA applied to the [5 clusters](#clustering) from the previous section. As 67% of total variance is maintained, it is a good representation of the true distribution of the data in its original 7-dimensional space. The number 7 comes from the columns specified by the [standard](#merging-by-category) merging rule. The new plot shows each point in terms of the three new PCA component vectors. Moreover, the labels are obtained from the [cluster means method](#clustering) in the previous section.
![](assets/3d_PCA_standard.png)
PCA is particularly useful if the data is 'simple' enough (ie. features are explained linearly), because it means that points may be expressed as a weighted sum of properties.
Here, one can associate PC1 with the notion of 'doing more revision', PC3 with social occasions and PC2 with 'projects' and 'repetitive tasks'. With this method, about 67% of the variance in a day's description is captured by the sum of the three.
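The mechanics of PCA fit in a few lines of NumPy. The sketch below uses the singular value decomposition on synthetic data with 7 columns; the `pca` helper and the data are assumptions of this example, and a library implementation such as scikit-learn's would serve equally well.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Return projected points, component vectors and explained variance ratios."""
    centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    components = Vt[:n_components]
    # Each point expressed as coordinates along the first n_components directions.
    scores = centred @ components.T
    return scores, components, explained

rng = np.random.default_rng(0)
# Synthetic stand-in for a normalised summary table: 200 days, 7 categories
# with decreasing spread.
X = rng.normal(size=(200, 7)) * np.array([5.0, 4.0, 3.0, 1.0, 1.0, 0.5, 0.5])

scores, components, explained = pca(X, n_components=3)
print("variance kept by 3 components:", round(float(explained[:3].sum()), 3))
```

The cumulative sum of `explained` is the quantity drawn as the red line in a variance plot like the one above.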
### Markov Chains
A _Transition Matrix_ is a square table of probabilities that measure the rate of transition from some state [i] to a state [j] in some time period. In my [case](#clustering), I can construct a matrix which summarises the likelihood of switching from a day of 'University' activities to one of 'Projects' or 'Social'.
Using a fixed step size, it is possible to estimate the probability of going from cluster A to B by adding up all the times a transition happened. This method is used to construct the coefficients of the matrix.
![](assets/standard_transition_matrix.png)
The object may be observed directly, or it can be interpreted as a [Markov Chain](https://en.wikipedia.org/wiki/Markov_chain) process. More specifically, a transformation that acts on a vector of probabilities (ie. where entry i represents the probability of being in the cluster i) and produces a new vector representing the probabilities at the next time step.
Since the probabilities are calculated numerically, one can assume that the matrix is regular enough (ie. ergodic), and compute the eigenvector associated to eigenvalue 1, which will give the stable point probability vector.
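Both steps, counting transitions and extracting the stable vector, can be sketched with NumPy. The label sequence below is made up, and the convention that row _i_ holds the probabilities of leaving state _i_ is an assumption of this example.

```python
import numpy as np

def transition_matrix(labels, n_states: int) -> np.ndarray:
    """Estimate P[i, j], the probability of moving from state i to state j."""
    counts = np.zeros((n_states, n_states))
    for current, following in zip(labels[:-1], labels[1:]):
        counts[current, following] += 1
    # Every state must occur at least once as a starting point,
    # otherwise its row would be divided by zero.
    return counts / counts.sum(axis=1, keepdims=True)

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of P for the eigenvalue closest to 1, scaled to sum to 1."""
    eigenvalues, eigenvectors = np.linalg.eig(P.T)
    vector = np.real(eigenvectors[:, np.argmin(np.abs(eigenvalues - 1))])
    return vector / vector.sum()

# Hypothetical sequence of daily cluster labels.
labels = [0, 0, 1, 0, 2, 0, 1, 1, 0, 2, 2, 0]

P = transition_matrix(labels, n_states=3)
pi = stationary_distribution(P)
print(P.round(2))
print(pi.round(3))
```

Applying `P` to `pi` leaves it unchanged, which is what makes it a distribution that can be maintained indefinitely.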
Coming back to the real world, this method takes a given allocation of clusters and predicts the distribution in the next moment. Consider the problem of finding the point of optimal productivity. If you push yourself beyond it, your productivity in the next day will be lower. The following text gives the division of tasks that can be maintained indefinitely - according to my routine during the university week.
```
Stable Distribution:
Projects - 0.30443171,
Repetitive - 0.07481842,
Revision - 0.30414416,
Lessons - 0.21057073,
Social - 0.10603498.
```
### Correlations
Moving on, given that clustering does not give a precise sense of how different categories relate, correlation matrices are computed for each component of the data, giving a finer notion of interactions between columns in the summary table.
A correlation matrix is a square table of correlation coefficients, each measuring how two variables tend to vary together. Coefficients range from -1 to +1: values near +1 mean that the two variables tend to be high at the same time, values near -1 that one tends to be high when the other is low, and values near 0 that they are unrelated.
The following picture gives a sense of how different activities are correlated: a red value represents that the two activities tend to be high at the same time, whereas a blue value refers to one activity being high when the other is low. In either case, one should observe that correlation tends to be low (0.05), which should be attributed to a lot of hidden variables and unpredictability affecting the result.
![](assets/standard_correlation_matrix_comparison.png)
I experimented with more than one kind of correlation: Pearson, which is susceptible to outliers, and Kendall, which is more robust to extreme cases. The difference can be seen in how _Sleep_ and _Revision_ appear drastically different between the two, due to the presence of outliers, ie. points with either a lot of lessons or a lot of revision, which skew the whole statistic. This suggests that **Kendall Correlation** is more reliable.
In my study of the correlations, I also observed correlations between present and future values, answering the question, "If I do a lot of one thing now, how much more do I do another thing later, on average?". One weakness of this method is that it is symmetric: it does not distinguish between 'high now and low later' and 'low now and high later'. Still, it is interesting to see some interpretable results: a lot of _projects_ will reduce _media consumption_.
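A present-versus-future correlation can be obtained by shifting the summary table before correlating. In the sketch below the `lagged_correlation` helper and the daily totals are hypothetical; Pearson is used by default, and pandas also accepts `method="kendall"`, which as far as I know relies on SciPy being installed.

```python
import pandas as pd

def lagged_correlation(table: pd.DataFrame, lag: int = 1, method: str = "pearson") -> pd.DataFrame:
    """Correlate each column now (rows) with each column `lag` steps later (columns)."""
    later = table.shift(-lag).add_suffix(" (later)")
    # The last `lag` rows have no future value and are dropped.
    joined = pd.concat([table, later], axis=1).dropna()
    correlations = joined.corr(method=method)
    return correlations.loc[table.columns, later.columns]

# Hypothetical daily totals in minutes.
table = pd.DataFrame({
    "Projects": [120, 0, 90, 0, 150, 0, 60, 0],
    "Media": [60, 10, 80, 20, 90, 0, 70, 15],
})

lagged = lagged_correlation(table, lag=1)
print(lagged.round(2))
```

In this made-up sample, days with a lot of _Projects_ are followed by days with little _Media_, so the corresponding entry is negative.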
![](assets/kendall_correlation_standard.png)
### Energy Function
I eventually asked myself whether combinations of activities in a time step could be described in terms of an 'energy budget'. For instance, doing 'costly' activities such as _Revision_ would consume the energy, reducing it for other costly activities such as _Projects_. Then, regenerative activities such as _Media_ consumption would add energy.
I considered the simplest model: using a vector of costs $c$, define the energy of a vector $x$ of activities in the time step as $E(x) := c\cdot x$. The choice of optimal $c$ would result from the minimisation of the mean spread of the energy value, over all recordings:
$$c := \text{argmin}_{c \in \mathbb{R}^n} \mathbb{E}\left[\text{Var}(c \cdot x)\right]$$
The reasoning behind this formula is that the 'Energy' value should be as close to constant as possible (it would not be a good definition otherwise). Moreover, depending on the time of the year, it is reasonable to assume that energy dynamics change. For example, what is typically done to rest or for fun during the exam period is fundamentally different from summer or during lessons, so any energy calculations should be contextualised.
The following heatmap proposes multiple cost vectors for the university period, ranking them by variance and showing the histogram for $c\cdot x$ at the right.
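One way to make the minimisation concrete is sketched below, under an assumption that is not stated above: $c$ is constrained to unit length, since otherwise $c = 0$ trivially gives zero variance. With that constraint, $\text{Var}(c \cdot x) = c^\top \Sigma c$ for the covariance matrix $\Sigma$ of the data, and the minimiser is the eigenvector of $\Sigma$ with the smallest eigenvalue. The data is synthetic and the function name is hypothetical.

```python
import numpy as np

def min_variance_cost(X: np.ndarray):
    """Unit vector c minimising Var(X @ c), and the variance it achieves.

    Assumes the constraint ||c|| = 1 (an assumption of this sketch).
    """
    covariance = np.cov(X, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    return eigenvectors[:, 0], eigenvalues[0]

rng = np.random.default_rng(0)
# Synthetic days: revision and projects compete for a budget of about 300 minutes,
# while media varies independently.
revision = rng.uniform(0, 300, size=200)
projects = 300 - revision + rng.normal(scale=10, size=200)
media = rng.uniform(0, 120, size=200)
X = np.column_stack([revision, projects, media])

c, variance = min_variance_cost(X)
print("cost vector:", c.round(3))
print("variance of the energy:", round(float(variance), 1))
```

The sign of the returned vector is arbitrary, since $c$ and $-c$ give the same variance.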
![](assets/lessons_standard_energy_bars.png)
At a fundamental level, it is unclear how to pick the best cost vector because the true uncertainty of the data is not known. This is also related to linear models performing poorly on the data: there is too much randomness in the features and how they are distributed, not to mention the problem of distinguishing between 'energy' and 'time available'.
The following matrix shows the best cost vectors by time period. It should be noted that sign has no absolute meaning, as vectors $c$ and $-c$ produce the same result.
![](assets/standard_energy_comparison.png)
On a final note, energy methods fail to capture nonlinear effects: activities twice as long will be considered twice as 'costly'. There is also a deep ambiguity in what is generally done and what is exhausting to do: just being signed up for class does not guarantee paying attention, hence being drained by the energy expense.
# Extensions
In its current state, the code has a lot of useful features that can be used to analyse the data, or as a basis for other data analysis methods, but it requires interactive development within Jupyter.
I had plans to bundle the code into a flexible script that would allow users to 'order' a certain output, whether data, a report, or a visual graph. The program would identify all the intermediate steps needed to compute the results, and save them to memory, while generating the output, to reduce its average running time.
Although the idea was scrapped due to its complexity and time requirements, I would still like to revisit the project in the future. For instance, I would like to have a script that automatically compiles yearly reports on my habits and productivity.
These days, most of my time is spent developing [Attimo](https://github.com/quercia-dev/Attimo/), a free and open source productivity tool.