https://github.com/max607/decision-theory23

Last synced: 11 months ago
JSON representation
Host: GitHub
URL: https://github.com/max607/decision-theory23
Owner: max607
License: gpl-3.0
Created: 2023-04-17T07:39:35.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-07-22T11:37:55.000Z (almost 2 years ago)
Last Synced: 2024-07-22T13:56:08.575Z (almost 2 years ago)
Size: 37.8 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

README

          

[TOC]

# Preliminary

## Orga

* TA $\rightarrow$ 23.05., CJ $\rightarrow$ 05.06.

* We can have access to old videos (recorded Zoom lecture)

* Monday 10:05 -- 11:45

* Tuesday 17:05 -- 19:45 + break

* Timeseries may be interesting

## Learning objectives

* Learn to design and solve decision problems

* Get an outside perspective on / widen the horizons of "common statistics and machine learning"

  * Frame it into decision problems

  * Reflect on the notion of uncertainty

# Introduction

* Choose an action out of many \

  $\rightarrow$ Consequences (Utility / loss, depend on an unknown nature) \

  $\rightarrow$ Optimize

* Unknown $\rightarrow$ uncertainty (statistics yay)

  * Also data

  * And losses

* Interdisciplinary importance

  * Rational choice theory

  * Uncertainty in AI, expert systems, decision support systems

  * Decision theory

## Rational choice

* Macro-level by aggregation of individual actions

* Goal-oriented rational actor

* Game theory

* In sociology: e.g., Coleman: Foundations of Social Theory

* Modern: bounded rationality $\rightarrow$ behavioural economics (Kahneman & Tversky 1979, Nobel prize 2002)

* Munich Center for Mathematical Philosophy (MCMP)

## Uncertainty in AI

* Expert systems

* Uncertainty in AI (based on name of a group)

* E.g., portfolio management

* How would an expert behave? Model that

* Make expert knowledge widely available

* Paper on autonomous driving systems (ADS)

* Two types of uncertainty

  * Parameter uncertainty - epistemic uncertainty

  * Prediction uncertainty

* MYCIN $\rightarrow$ methodological consequences

  * Uncertainty factors

  * Comparative probabilities

  * $A \cup B$ is likely, but we don't know which

  * $\mathbb{P}(A \cup B)$ high

  * $\mathbb{P}(A) + \mathbb{P}(B):$ where is the mass?

  * Probabilities are based on sets $\rightarrow$ move to fuzzy sets 

# Missed

$\rightarrow$ slides 34 -- 79

* What this is: $(\mathbb{A}, \Theta, u(\cdot))$

* Type I vs type II uncertainty

## Semantics of the data-free decision problem

* $(\mathbb{A}, \Theta, u(\cdot))$

* $\mathbb{A}$ is known

* $\Theta$ is known (closed world assumption)

* Consequences $c(\alpha, \vartheta), \alpha \in \mathbb{A}, \vartheta \in \Theta$ are unique

* ...

* "Marxist Group" -- Wer war das?

* Decisions theory to handle non-linearity of money

  * 1000€ for an average person vs. football player

* Difficult: act-state independence (e.g., more rockets, war)

# Examples

* Hiking

* Lotto

* Cake

* Investment

* Production planing

  * $a = \left( \begin{matrix} a(1) \\

                               \vdots \\

                               a(q) \\

                \end{matrix} \right)$

  * $a(\ell)$: Production of $a(\ell)$ units of good $\ell, \ell = 1, ..., q$

  * $u: \mathbb{A} \times \Theta \rightarrow \mathbb{R}$

  * $(\alpha, \vartheta) \mapsto u(\alpha, \vartheta)$

* TODO slides 96 -- 98

* Slide 103 $\rightarrow$ nice tutorial paper

# Randomized actions - mixed extension

* Hardcore definition

* Utility under fixed state-of-nature is the expectancy w.r.t. the actions

* $\mathbb{P}(a) = \delta(a_\ell) \rightarrow$ pure action

* Discussion of randomized actions $\rightarrow$ next Monday (a.k.a. problem sheet)

# Deciding with data

* Observation of random experiment (sampling, asking an expert, ...)

* Decision functions of sample

  * $d: \mathcal{X} \rightarrow \mathbb{A}$

  * $x \mapsto d(x)$ \

    $\rightarrow$ evaluate decision functions

  * $p_{\vartheta_1}(\{x_1\}) = p(\{x_1\} || \vartheta_1)$

  * $||:$ frequentist: $\vartheta_1$ is not random!

  * TODO statistics and constructivism

* Now: randomized actions, but the randomness is from sampling given true data

  * Fix strategy before seeing the data. E.g., tempting in the case of searching for new medicine

  * $U(d^+, \vartheta_j) := \sum_{l = 1}^s u(d^+(x_\ell), \vartheta_j)\ p_{\vartheta_j}(\{x_\ell\})$

* Data-based decision problem

  * $((\mathbb{A}, \Theta, u(a)), (\mathcal{X}, \sigma(\mathcal{X}), p_\vartheta))$

  * $a \in \mathbb{A}, \vartheta \in \Theta$

  * Need $\vartheta$ two times: for the utility function and for the probability for actions

* Common to use loss - so risk notation $\rightarrow$ similar to machine learning

* Data-free problem can be induced by data $\rightarrow$ just a special case

* Formally every decision problem can be written as a data-free decision problem

# Re-framing estimation and testing

* Estimation part on Monday

* Testing part on Tuesday

* From slide 141

# Decision rules and principles

* Rules give optimal decisions, but only under certain conditions

* Principles: only avoid obviously dumb actions

  * One curve is completely under another $\rightarrow$ principle

  * Curves overlap $\rightarrow$ aggregate by rule (loose information, make a lot of assumptions)

* Meta rules: interesting paper, decision rules for decision rules 

## Order theory

* Is about $<, >, \leq, \geq, ...$

* Easy: $\left( \begin{matrix} 1 \\ 2 \\ \end{matrix} \right) \overset{?}{<} \left( \begin{matrix} 2 \\ 3 \\ \end{matrix} \right) \rightarrow$ principle

* Problem: $\left( \begin{matrix} 1 \\ 2 \\ \end{matrix} \right) \overset{?}{<} \left( \begin{matrix} 2 \\ 1 \\ \end{matrix} \right) \rightarrow$ rule

* Pareto front

* Dominance: not reasonable to choose an inadmissible action

* Admissible: action, which is not strictly dominated by any action

* Admissibility of a certain action can be lost when moving to mixed extension

* Mixed extension makes Pareto front smooth (Pareto front: smallest set of admissible actions, relation to complete cases)

* Randomization: decision based on information-less data

```r

# Example Pareto front

plot(c(7, 6, 6, 4, 1, 2, 6, 6.5), c(3, 5, 5, 4, 6, 5, 4, 3.5))

lines(c(1, 6, 6.5, 7), c(6, 5, 3.5, 3))

```

## Complete classes

* $\forall a \in \mathbb{A} \setminus \mathbb{C}\ \exists a^* \in \mathbb{C}: a^* \succ a$

* Set $\mathbb{C}$

## Optimality criterion

* Lexicographic order: Like in a dictionary (aa, ab, ba, ...)

  * q-step optimality criterion $\rightarrow$ makes an odering, rule: take the best

## Exercise 2

### Problem 3

| decision theory           | estimation theory                            | testing theory                                                                                                                     |

|---------------------------|----------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|

| state space $\Theta$      | parameter space                              | $\{H_0, H_1\}$                                                                                                                     |

| state $\vartheta$         | true parameter                               | $H_i$                                                                                                                              |

| action space $\mathbb{A}$ | all possible estimates                       | (don't) reject $H_0$                                                                                                               |

| loss function             | negative log-likelihood, distance measure    | decide for a vier-Felder-Tafel                                                                                                     |

| **data based**            |                                              |                                                                                                                                    |

| decision function         | estimator, $\hat{\vartheta}(Z_1, ..., Z_n)$  | $d(T(\bm{Z}) = t) = \begin{cases} H_1,& \text{if } t \in \mathbb{C}(q_{\Upsilon_0}, \alpha) \\ H_0,& \text{otherwise} \end{cases}$ |

| risk function             | see below                                    | $\mathcal{R}(d, H_i) = \mathbb{E}_X(\mathcal{L}(d(X),H_i))$                                                                               |

| information structure     | $\mathcal{X}$, $\sigma$-field, $p_\vartheta$ |                                                                                                                                    |

* $Z_i \overset{iid}{\sim} N(\mu, \sigma^2)$

* $\mathcal{X} = \mathbb{R}^n$

* $p_\vartheta$: How are the data distributed? $\rightarrow p_\vartheta^{\otimes n} \overset{ind}{=} \prod_{i = 1}^n q_\vartheta(Z_i = z_i)$

* risk function: $f(\text{decision function}, \text{true parameter})$

* $\mathcal{R}(\hat{\vartheta}(Z_1, ..., Z_n), \vartheta) = \mathbb{E}_{p_\vartheta}(\mathcal{L}(\hat{\vartheta}(Z_1, ..., Z_n), \vartheta))= \text{MSE}_\vartheta(\hat{\vartheta}(Z_1, ..., Z_n), \vartheta) \overset{unbiased}{=} \mathbb{V}_{p_\vartheta}(\hat{\vartheta}(Z_1, ..., Z_n))$

* Uniformly optimal procedure?

  * $\mathcal{R}(\hat{\vartheta}^*(Z_1, ..., Z_n), \vartheta) \le \mathcal{R}(\hat{\vartheta}(Z_1, ..., Z_n), \vartheta), \forall \hat\vartheta \in \mathcal{D}, \vartheta \in \Theta$

  * Impossible, e.g., for $\text{MSE}$: choose $\hat\vartheta \equiv \vartheta, \forall \vartheta \in \Theta$

### Problem 6

```r

plot(c(7, 6, 6, 4, 1, 2, 6, 6.5), c(3, 5, 5, 4, 6, 5, 4, 3.5))

points(c(6.5, 4.6), c(4, 4.1), col = "red")

lines(c(1, 6, 6.5, 7), c(6, 5, 4, 3))

lines(c(1, 2, 4, 7), c(6, 5, 4, 3))

```

# Bayes 

In a Bayesian setting randomization does not pay out

* because we have an equivalent pure action a, for every randomized action $\tilde{a}$

Are reversely admissible actions also Bayes actions?

* Bayes Action $\rightarrow$ admissible does not hold if regularity conditions do not hold

* What are regularity conditions

Discussion between subjectivity and objectivity

* Prior distribution might be subjektiv

* still if you don't agree with the prior, you have an admissible solution for the decision problem

* super-objectivists are in some sense subjective as well

Probability is a degree to which something can be proven.

Propensity concept (Popper):

Probability allows inference concepts

Critique on frequentist: What are similar cases $\leftarrow$ That's subjective

## Text

* Probability is a property of the subject not of the object: In which way is the subject uncertain?

* It's not important if there is randomness

* Random numbers are deterministic, because they are constructed in a deterministic way.

* $\rightarrow$ If we assign Probabilities we are all subjectivists in definition

* "I can not criticize the bases/assumptions, i can only criticize the consistency"

* In Bayes probabilities are assigned to parameters, because we are uncertain about them

## Laplace criterion

* all that holds for Bayesian, holds for Laplace, too (randomization does not work, admissibility)

# Regret

* Always a loss problem -- weather one starts with utility or loss

* Minimax action with transformed utility function

* Doesn't make that much sense, but kind of models consumer behavior

# Hurwicz criterion, Hodges-Lehmann

* Convex combination: $\Phi_\alpha(x, y) = \alpha x + (1 - \alpha) y$

* Sensitivity analysis w.r.t. $\alpha$ possible
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/max607/decision-theory23

Awesome Lists containing this project

README