A wee bit of theory

(An) Implementation of the RSA

A learning case

Four worlds, many messages

Non fully informed – with a “?”

Four wor(l)ds

All possibilities

IIFD – no question

IIFD – artificial eroteticity

Cost plot

Heat map – square one

Heat map – priors

Can we play with numbers?

Moving on to Question Under Discussion (QUD)

Tools

A wee bit of theory

(An) Implementation of the RSA

A learning case

Four worlds, many messages

Non fully informed – with a “?”

Four wor(l)ds

All possibilities

IIFD – no question

IIFD – artificial eroteticity

Cost plot

Heat map – square one

Heat map – priors

Can we play with numbers?

Moving on to Question Under Discussion (QUD)

Tools

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{1}\)

S\({}_{1}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{30}\)

S\({}_{30}\)

L\({}_{10}\)

S\({}_{10}\)

Many thanks to Benjamin Spector

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{1}\)

S\({}_{1}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{10}\)

S\({}_{10}\)

The parameters

L\({}_{30}\)

S\({}_{30}\)

L\({}_{10}\)

S\({}_{10}\)

To do. Or better, refer to someone more competent (see previous title).

Let:

\(P : w \rightarrow [0,1]\) the prior distribution on possible worlds.
\(c : u \rightarrow \mathbb{R}\) a cost function on utterances.
\(Lex : (u,w) \rightarrow 1\) iff \(u\) is true at \(w\), \(0\) otherwise
\(\lambda \in \mathbb{R}^+\) a free energy temperature parameterThe higher the lambda, the more rational the speaker/listener

We define by recursion over \(n \in \mathbb{N}\) following triplet \(S, L, U\)Speaker, Listener, Utility

where \(L_0\) is the naïve listener.

\[ \begin{eqnarray} L_0&(w&\mid& u) &\propto& P(w)Lex(u,w) \\ U_{n+1}&(u&\mid& w) &=& \log(L_n(w\mid u)) - c(u) \\ S_{n+1}&(u&\mid& w) &\propto& e^{\lambda U_{n+1}(u\mid w)} \\ L_{n+1}&(w&\mid& u) &\propto& P(w)S_{n+1}(u\mid w) \end{eqnarray} \]

Remarks about the code:

A single recursive function handles both the speaker and the listener
The structure distinguish the naïve listener, hard coded, and furthers listeners/speakers that are computed, hence the immediate if
There’s an helper function to avoid passing the whole universe around. So the main function gets them but the recursive calls only have the “level” so as to avoid a huge call stack
I’m hiding some external functions here

Apart from that, thanks to R syntactic sugar, the implementation follows very closely the definitions.

Things we could to improve this if need be:

Make Sn and Ln an external array so that it doesn’t get passed up though the call stack. This is a quick modification, will do if it gets slow
Optimise normCol and normRow, these ones are dumb-ish right now
I’m open to suggestions?

Here be the code:

L <- function(priors,logic,cost,temp,n) {

  Ln <- normRow(logic * (1/sum(logic))) # L0

  for (i in 1:n) {
    Sn <- normCol(exp(temp * (log(Ln) - cost))) # Def of Sn
    Ln <- normRow(t(t(Sn) * priors))            # Def of Ln
  }

  list(Ln, Sn) # Returns the (Listener, Speaker) pair

}

I will only show how to call the function once. Then I’ll hide this.

worldNames <- c("∃¬∀", "∀")
some       <- c(1    , 1  )
all        <- c(0    , 1  )
justSome   <- c(1    , 0  )
lexNames   <-            c("some", "all", "justSome")
cost       <-            c(0     , 0    , 1         )
logic      <- t(data.frame(some  , all  , justSome  ))
priors     <- rep((1/length(worldNames)), length(worldNames)) # Flat
temp       <- 1
iter       <- 10

names(cost)     <- lexNames
colnames(logic) <- worldNames

list[Ln, Sn] <- L(priors,logic,cost,temp,iter)
knitr::kable(t(Ln))

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

cost        <- c("cost", cost)
names(cost) <- c("Lex", lexNames)

Key points:

Worlds: \(\{∃¬∀, ∀\}\)
Lexicon: \(\{some, all, justSome\}\)
\(\lambda = 1\)
Flat priors
Costs:

Lex	some	all	justSome
cost	0	0	1

Truth table:

	∃¬∀	∀
some	1	1
all	0	1
justSome	1	0

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

Key points:

Worlds: \(\{∅, A¬B, B¬A, A∨B\}\)
Lexicon: \(\{A, B, AorB, AnotB, BnotA, notA, notB, neither\}\)
\(\lambda = 5\)
Flat priors
Costs:

Lex	A	B	AorB	AnotB	BnotA	notA	notB	neither
cost	0	0	1	1	1	1	1	1

Truth table:

	∅	A¬B	B¬A	A∨B
A	0	1	0	1
B	0	0	1	1
AorB	0	0	0	1
AnotB	0	1	0	0
BnotA	0	0	1	0
notA	1	0	1	0
notB	1	1	0	0
neither	1	0	0	0

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

Key points:

Worlds: \(\{?, ∃¬∀, ∀\}\)
Lexicon: \(\{some, all, someNotAll\}\)
\(\lambda = 2\)
Flat priors
Costs:

Lex	some	all	someNotAll
cost	0	0	2

Truth table:

	?	∃¬∀	∀
some	1	1	1
all	0	0	1
someNotAll	0	1	0

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

Key points:

Worlds: \(\{∅, ∃¬∀, ∀\}\)
Lexicon: \(\{no, some, all, notAll\}\)
\(\lambda = 4\)
Flat priors
Costs:

Lex	no	some	all	notAll
cost	0	0	0	1

Truth table:

	∅	∃¬∀	∀
no	1	0	0
some	0	1	1
all	0	0	1
notAll	1	1	0

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

Key points:

Worlds: \(\{∅, ∃, ∃¬∀, ∀, ¬∀, ∅∨∀\}\)
Lexicon: \(\{no, some, someNotAll, all, notAll, NoneOrAll\}\)
\(\lambda = 2\)
Flat priors
Costs:

Lex	no	some	someNotAll	all	notAll	NoneOrAll
cost	2	2	4	2	3	4

Truth table:

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

Key points:

Worlds: \(\{C, BC, AC, ABC, AB\}\)
Lexicon: \(\{a, b, c, ab, ac, bc, abc\}\)
\(\lambda = 1\)
Flat priors
Costs:

Lex	a	b	c	ab	ac	bc	abc
cost	0	0	0	1	1	1	2

Truth table:

	C	BC	AC	ABC	AB
a	0	0	1	1	1
b	0	1	0	1	1
c	1	1	1	1	0
ab	0	0	0	1	1
ac	0	0	1	1	0
bc	0	1	0	1	0
abc	0	0	0	1	0

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

Key points:

Worlds: \(\{AB?, C?\}\)
Lexicon: \(\{a, b, c, ab, aorc, borc\}\)
\(\lambda = 1\)
Flat priors
Costs:

Lex	a	b	c	ab	aorc	borc
cost	0	0	0	1	1	1

Truth table:

	AB?	C?
a	1	0
b	1	0
c	0	1
ab	1	0
aorc	1	1
borc	1	1

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

L(AB?|{a,b}) vs L(AB?|{ab}) as a function of cost

Key points:

Worlds: \(\{∃¬∀, ∀\}\)
Lexicon: \(\{some, all, justSome\}\)
\(\lambda \in [0,10]\)
Flat priors
Costs: \([0, 0, \alpha]\text{ for }\alpha\in[0,10]\)
Truth table:

	∃¬∀	∀
some	1	1
all	0	1
justSome	1	0

Iterations: \(100\)

Generate data:

st      <- 0.1
mi      <- 0.1
ma      <- 2
lambdas <- seq(mi,ma,st)
costs   <- seq(mi,ma,st)
size    <- length(costs)

Llist <- matrix(rep(NA, size*size), nrow=size)

Llist <- foreach(i=1:size, .combine='rbind') %dopar% {
  foreach(j=1:size, .combine='c') %dopar% {
    list[Ln, Sn] <- L(priors,logic,costs[i]*cost,lambdas[j],iter)
    Ln[1,1]
  }
}

df = data.frame(lambda=rep(lambdas,each=size),
                cost=rep(costs,size),
                p=as.vector(Llist))

ggplot(df,aes(x = lambda, y = cost)) +
  scale_fill_gradientn(colours=cbPalette) +
  geom_raster(aes(fill = p), interpolate = TRUE) +
  coord_equal()

Heat map of (lambda, cost) -> L(∃¬∀|‘some’)

Key points:

Worlds: \(\{∅, ∃¬∀, ∀\}\)
Lexicon: \(\{no, some, all, justSome\}\)
\(\lambda \in [0,7.5]\)
Costs: \([0, 0, 1]\)
Truth table:

	∅	∃¬∀	∀
no	1	0	0
some	0	1	1
all	0	0	1
justSome	0	1	0

Iterations: \(500\) Generate data:

lambdas <- seq(0.05,7.5,0.02)
prior   <- seq(0.05,0.95,0.02)

start_time <- Sys.time()
Llist <- foreach(i=1:length(lambdas), .combine='rbind') %dopar% {
  foreach(j=1:length(prior), .combine='c') %dopar% {
    remain <- (1 - prior[j])/2
    priors <- c(remain, remain, prior[j])
    list[Ln, Sn] <- L(priors,logic,cost,lambdas[i],iter)
    Ln[2,2]
  }
}
end_time <- Sys.time()

Order of magnitude for what is computed:

Total \(L_n\) computations: \(3,730,000\)
Data points: \(7,460\)
Elapsed time
- \(2.20711203018824\) minutes,
- \(0.295859521472955\text{ms}/\text{data point}\)
- \(0.591719042945909\text{ns}/\text{iteration}\)

df = data.frame(lambda=rep(lambdas,length(prior)),
                prior=rep(prior,each=length(lambdas)),
                p=as.vector(Llist))

ggplot(df,aes(x = lambda, y = prior)) +
  scale_fill_gradientn(colours=cbPalette) +
  geom_raster(aes(fill = p), interpolate = TRUE)

Heat map of (lambda, prior) -> L(∃¬∀|‘some’)

l <- 15
worldNames <- c(seq(1,l))
lex <- matrix(1, l, l)
lex[lower.tri(lex)] <- 0
lexNames   <- worldNames
cost       <- c(log(seq(1,l)))/10
logic      <- t(data.frame(lex))
priors     <- rep((1/length(worldNames)), length(worldNames))
temp       <- 1
iter       <- 10

names(cost)     <- lexNames
colnames(logic) <- worldNames

list[Ln, Sn] <- L(priors,logic,cost,temp,iter)

cost2        <- c("cost", signif(cost, digits=2))
names(cost2) <- c("Lex", lexNames)

Key points:

Worlds: \(\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15\}\)
Lexicon: \(\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15\}\)
\(\lambda = 1\)
Flat priors
Costs:

Lex	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
cost	0	0.069	0.11	0.14	0.16	0.18	0.19	0.21	0.22	0.23	0.24	0.25	0.26	0.26	0.27

Truth table, an at least fashion:

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
X1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
X2	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0
X3	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
X4	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0
X5	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0
X6	1	1	1	1	1	1	0	0	0	0	0	0	0	0	0
X7	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0
X8	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0
X9	1	1	1	1	1	1	1	1	1	0	0	0	0	0	0
X10	1	1	1	1	1	1	1	1	1	1	0	0	0	0	0
X11	1	1	1	1	1	1	1	1	1	1	1	0	0	0	0
X12	1	1	1	1	1	1	1	1	1	1	1	1	0	0	0
X13	1	1	1	1	1	1	1	1	1	1	1	1	1	0	0
X14	1	1	1	1	1	1	1	1	1	1	1	1	1	1	0
X15	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1

	some	all	justSome
∃¬∀	0.81	0	1
∀	0.19	1	0

	some	all	justSome
∃¬∀	0.69	0.00	0.31
∀	0.16	0.84	0.00

To do:

\(q\) such that \(\forall w \in W, q_a(w)\) returns the equivalence classes of \(w\) in \(W\) with respect to the answer for question \(a\), i.e. all the \(w \in W\) that answer the same question with regard to \(a\).
Implement this in the strategies

Here’s an attempt:

L <- function(priors,logic,cost,temp,n) {

  Ln <- normRow(logic * (1/sum(logic))) # L0

  for (i in 1:n) {
    Sn <- normCol(exp(temp * (log(Ln) - cost))) # Def of Sn
    Ln <- normRow(t(t(Sn) * priors))            # Def of Ln
    # This should be replaced with an update that updates all elements of the same equivalence class for a given question.
  }

  list(Ln, Sn) # Returns the (Listener, Speaker) pair

}

Here are some functions/options used up there:

# Let's not print too many digits for the human eye
options(digits=2)

normCol = function(m) { # Normalises on columns.
  t(t(m)/colSums(m))
}

normRow = function(m) { # Normalises on rows. This was benchmarked.
  m/rowSums(m)
}

# https://stat.ethz.ch/pipermail/r-help/2004-June/053343.html
# See below for a "why"
list <- structure(NA,class="result")
"[<-.result" <- function(x,...,value) {
   args <- as.list(match.call())
   args <- args[-c(1:2,length(args))]
   length(value) <- length(args)
   for(i in seq(along=args)) {
     a <- args[[i]]
     if(!missing(a)) eval.parent(substitute(a <- v,list(a=a,v=value[[i]])))
   }
   x
}

# Allows returning/matching tuples from functions:
useless   <- function() { list(a = 2, b = 6) }
list[a,]  <- useless() # sets a to 2, ignores "b"
list[,b]  <- useless() # sets b to 6, ignores "a"
list[c,d] <- useless() # sets c to 2 and d to 6

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
X1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
X2	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0
X3	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
X4	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0
X5	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0
X6	1	1	1	1	1	1	0	0	0	0	0	0	0	0	0
X7	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0
X8	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0
X9	1	1	1	1	1	1	1	1	1	0	0	0	0	0	0
X10	1	1	1	1	1	1	1	1	1	1	0	0	0	0	0
X11	1	1	1	1	1	1	1	1	1	1	1	0	0	0	0
X12	1	1	1	1	1	1	1	1	1	1	1	1	0	0	0
X13	1	1	1	1	1	1	1	1	1	1	1	1	1	0	0
X14	1	1	1	1	1	1	1	1	1	1	1	1	1	1	0
X15	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
X1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
X2	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0
X3	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
X4	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0
X5	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0
X6	1	1	1	1	1	1	0	0	0	0	0	0	0	0	0
X7	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0
X8	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0
X9	1	1	1	1	1	1	1	1	1	0	0	0	0	0	0
X10	1	1	1	1	1	1	1	1	1	1	0	0	0	0	0
X11	1	1	1	1	1	1	1	1	1	1	1	0	0	0	0
X12	1	1	1	1	1	1	1	1	1	1	1	1	0	0	0
X13	1	1	1	1	1	1	1	1	1	1	1	1	1	0	0
X14	1	1	1	1	1	1	1	1	1	1	1	1	1	1	0
X15	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1

someNotAll

	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
X1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
X2	1	1	0	0	0	0	0	0	0	0	0	0	0	0	0
X3	1	1	1	0	0	0	0	0	0	0	0	0	0	0	0
X4	1	1	1	1	0	0	0	0	0	0	0	0	0	0	0
X5	1	1	1	1	1	0	0	0	0	0	0	0	0	0	0
X6	1	1	1	1	1	1	0	0	0	0	0	0	0	0	0
X7	1	1	1	1	1	1	1	0	0	0	0	0	0	0	0
X8	1	1	1	1	1	1	1	1	0	0	0	0	0	0	0
X9	1	1	1	1	1	1	1	1	1	0	0	0	0	0	0
X10	1	1	1	1	1	1	1	1	1	1	0	0	0	0	0
X11	1	1	1	1	1	1	1	1	1	1	1	0	0	0	0
X12	1	1	1	1	1	1	1	1	1	1	1	1	0	0	0
X13	1	1	1	1	1	1	1	1	1	1	1	1	1	0	0
X14	1	1	1	1	1	1	1	1	1	1	1	1	1	1	0
X15	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1