## Loading required package: iterators
## Loading required package: parallel
Grab the source of this file here
Many thanks to Benjamin Spector
His seminar on RSA
introduced me to this and he provided real-time feedback on my attempts to implement it.
A wee bit of theory
To do. Or better, refer to someone more competent (see previous title).
Let:
- \(P : w \rightarrow [0,1]\) the prior distribution on possible worlds.
- \(c : u \rightarrow \mathbb{R}\) a cost function on utterances.
- \(Lex : (u,w) \rightarrow 1\) iff \(u\) is true at \(w\), \(0\) otherwise
- \(\lambda \in \mathbb{R}^+\) a free energy temperature parameterThe higher the lambda, the more rational the speaker/listener
We define by recursion over \(n \in \mathbb{N}\) following triplet \(S, L, U\)Speaker, Listener, Utility
where \(L_0\) is the naïve listener.
\[ \begin{eqnarray} L_0&(w&\mid& u) &\propto& P(w)Lex(u,w) \\ U_{n+1}&(u&\mid& w) &=& \log(L_n(w\mid u)) - c(u) \\ S_{n+1}&(u&\mid& w) &\propto& e^{\lambda U_{n+1}(u\mid w)} \\ L_{n+1}&(w&\mid& u) &\propto& P(w)S_{n+1}(u\mid w) \end{eqnarray} \]
(An) Implementation of the RSA
Remarks about the code:
- A single recursive function handles both the speaker and the listener
- The structure distinguish the naïve listener, hard coded, and furthers listeners/speakers that are computed, hence the immediate
if
- There’s an helper function to avoid passing the whole universe around. So the main function gets them but the recursive calls only have the “level” so as to avoid a huge call stack
- I’m hiding some external functions here
Apart from that, thanks to R
syntactic sugar, the implementation follows very closely the definitions.
Things we could to improve this if need be:
- Make
Sn
andLn
an external array so that it doesn’t get passed up though the call stack. This is a quick modification, will do if it gets slow - Optimise
normCol
andnormRow
, these ones are dumb-ish right now - I’m open to suggestions?
Here be the code:
<- function(priors,logic,cost,temp,n) {
L
<- normRow(logic * (1/sum(logic))) # L0
Ln
for (i in 1:n) {
<- normCol(exp(temp * (log(Ln) - cost))) # Def of Sn
Sn <- normRow(t(t(Sn) * priors)) # Def of Ln
Ln
}
list(Ln, Sn) # Returns the (Listener, Speaker) pair
}
A learning case
The parameters
I will only show how to call the function once. Then I’ll hide this.
<- c("∃¬∀", "∀")
worldNames <- c(1 , 1 )
some <- c(0 , 1 )
all <- c(1 , 0 )
justSome <- c("some", "all", "justSome")
lexNames <- c(0 , 0 , 1 )
cost <- t(data.frame(some , all , justSome ))
logic <- rep((1/length(worldNames)), length(worldNames)) # Flat
priors <- 1
temp <- 10
iter
names(cost) <- lexNames
colnames(logic) <- worldNames
<- L(priors,logic,cost,temp,iter)
list[Ln, Sn] ::kable(t(Ln)) knitr
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
<- c("cost", cost)
cost names(cost) <- c("Lex", lexNames)
Key points:
- Worlds: \(\{∃¬∀, ∀\}\)
- Lexicon: \(\{some, all, justSome\}\)
- \(\lambda = 1\)
- Flat priors
- Costs:
Lex | some | all | justSome |
---|---|---|---|
cost | 0 | 0 | 1 |
- Truth table:
∃¬∀ | ∀ | |
---|---|---|
some | 1 | 1 |
all | 0 | 1 |
justSome | 1 | 0 |
L\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
Four worlds, many messages
The parameters
Key points:
- Worlds: \(\{∅, A¬B, B¬A, A∨B\}\)
- Lexicon: \(\{A, B, AorB, AnotB, BnotA, notA, notB, neither\}\)
- \(\lambda = 5\)
- Flat priors
- Costs:
Lex | A | B | AorB | AnotB | BnotA | notA | notB | neither |
---|---|---|---|---|---|---|---|---|
cost | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
- Truth table:
∅ | A¬B | B¬A | A∨B | |
---|---|---|---|---|
A | 0 | 1 | 0 | 1 |
B | 0 | 0 | 1 | 1 |
AorB | 0 | 0 | 0 | 1 |
AnotB | 0 | 1 | 0 | 0 |
BnotA | 0 | 0 | 1 | 0 |
notA | 1 | 0 | 1 | 0 |
notB | 1 | 1 | 0 | 0 |
neither | 1 | 0 | 0 | 0 |
L\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
Non fully informed – with a “?”
The parameters
Key points:
- Worlds: \(\{?, ∃¬∀, ∀\}\)
- Lexicon: \(\{some, all, someNotAll\}\)
- \(\lambda = 2\)
- Flat priors
- Costs:
Lex | some | all | someNotAll |
---|---|---|---|
cost | 0 | 0 | 2 |
- Truth table:
? | ∃¬∀ | ∀ | |
---|---|---|---|
some | 1 | 1 | 1 |
all | 0 | 0 | 1 |
someNotAll | 0 | 1 | 0 |
L\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
Four wor(l)ds
The parameters
Key points:
- Worlds: \(\{∅, ∃¬∀, ∀\}\)
- Lexicon: \(\{no, some, all, notAll\}\)
- \(\lambda = 4\)
- Flat priors
- Costs:
Lex | no | some | all | notAll |
---|---|---|---|---|
cost | 0 | 0 | 0 | 1 |
- Truth table:
∅ | ∃¬∀ | ∀ | |
---|---|---|---|
no | 1 | 0 | 0 |
some | 0 | 1 | 1 |
all | 0 | 0 | 1 |
notAll | 1 | 1 | 0 |
L\({}_{1}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{1}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
All possibilities
The parameters
Key points:
- Worlds: \(\{∅, ∃, ∃¬∀, ∀, ¬∀, ∅∨∀\}\)
- Lexicon: \(\{no, some, someNotAll, all, notAll, NoneOrAll\}\)
- \(\lambda = 2\)
- Flat priors
- Costs:
Lex | no | some | someNotAll | all | notAll | NoneOrAll |
---|---|---|---|---|---|---|
cost | 2 | 2 | 4 | 2 | 3 | 4 |
- Truth table:
∅ | ∃ | ∃¬∀ | ∀ | ¬∀ | ∅∨∀ | |
---|---|---|---|---|---|---|
no | 1 | 0 | 0 | 0 | 0 | 0 |
some | 0 | 1 | 1 | 1 | 0 | 0 |
someNotAll | 0 | 0 | 1 | 0 | 0 | 0 |
all | 0 | 0 | 0 | 1 | 0 | 0 |
notAll | 1 | 0 | 1 | 0 | 1 | 0 |
noneOrAll | 1 | 0 | 0 | 1 | 0 | 1 |
L\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
IIFD – no question
The parameters
Key points:
- Worlds: \(\{C, BC, AC, ABC, AB\}\)
- Lexicon: \(\{a, b, c, ab, ac, bc, abc\}\)
- \(\lambda = 1\)
- Flat priors
- Costs:
Lex | a | b | c | ab | ac | bc | abc |
---|---|---|---|---|---|---|---|
cost | 0 | 0 | 0 | 1 | 1 | 1 | 2 |
- Truth table:
C | BC | AC | ABC | AB | |
---|---|---|---|---|---|
a | 0 | 0 | 1 | 1 | 1 |
b | 0 | 1 | 0 | 1 | 1 |
c | 1 | 1 | 1 | 1 | 0 |
ab | 0 | 0 | 0 | 1 | 1 |
ac | 0 | 0 | 1 | 1 | 0 |
bc | 0 | 1 | 0 | 1 | 0 |
abc | 0 | 0 | 0 | 1 | 0 |
L\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
IIFD – artificial eroteticity
The parameters
Key points:
- Worlds: \(\{AB?, C?\}\)
- Lexicon: \(\{a, b, c, ab, aorc, borc\}\)
- \(\lambda = 1\)
- Flat priors
- Costs:
Lex | a | b | c | ab | aorc | borc |
---|---|---|---|---|---|---|
cost | 0 | 0 | 0 | 1 | 1 | 1 |
- Truth table:
AB? | C? | |
---|---|---|
a | 1 | 0 |
b | 1 | 0 |
c | 0 | 1 |
ab | 1 | 0 |
aorc | 1 | 1 |
borc | 1 | 1 |
L\({}_{30}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{30}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
Cost plot
Heat map – square one
Key points:
- Worlds: \(\{∃¬∀, ∀\}\)
- Lexicon: \(\{some, all, justSome\}\)
- \(\lambda \in [0,10]\)
- Flat priors
- Costs: \([0, 0, \alpha]\text{ for }\alpha\in[0,10]\)
- Truth table:
∃¬∀ | ∀ | |
---|---|---|
some | 1 | 1 |
all | 0 | 1 |
justSome | 1 | 0 |
- Iterations: \(100\)
Generate data:
<- 0.1
st <- 0.1
mi <- 2
ma <- seq(mi,ma,st)
lambdas <- seq(mi,ma,st)
costs <- length(costs)
size
<- matrix(rep(NA, size*size), nrow=size)
Llist
<- foreach(i=1:size, .combine='rbind') %dopar% {
Llist foreach(j=1:size, .combine='c') %dopar% {
<- L(priors,logic,costs[i]*cost,lambdas[j],iter)
list[Ln, Sn] 1,1]
Ln[
}
}
= data.frame(lambda=rep(lambdas,each=size),
df cost=rep(costs,size),
p=as.vector(Llist))
ggplot(df,aes(x = lambda, y = cost)) +
scale_fill_gradientn(colours=cbPalette) +
geom_raster(aes(fill = p), interpolate = TRUE) +
coord_equal()
Heat map – priors
Key points:
- Worlds: \(\{∅, ∃¬∀, ∀\}\)
- Lexicon: \(\{no, some, all, justSome\}\)
- \(\lambda \in [0,7.5]\)
- Costs: \([0, 0, 1]\)
- Truth table:
∅ | ∃¬∀ | ∀ | |
---|---|---|---|
no | 1 | 0 | 0 |
some | 0 | 1 | 1 |
all | 0 | 0 | 1 |
justSome | 0 | 1 | 0 |
- Iterations: \(500\) Generate data:
<- seq(0.05,7.5,0.02)
lambdas <- seq(0.05,0.95,0.02)
prior
<- Sys.time()
start_time <- foreach(i=1:length(lambdas), .combine='rbind') %dopar% {
Llist foreach(j=1:length(prior), .combine='c') %dopar% {
<- (1 - prior[j])/2
remain <- c(remain, remain, prior[j])
priors <- L(priors,logic,cost,lambdas[i],iter)
list[Ln, Sn] 2,2]
Ln[
}
}<- Sys.time() end_time
Order of magnitude for what is computed:
- Total \(L_n\) computations: \(3,730,000\)
- Data points: \(7,460\)
- Elapsed time
- \(2.20711203018824\) minutes,
- \(0.295859521472955\text{ms}/\text{data point}\)
- \(0.591719042945909\text{ns}/\text{iteration}\)
= data.frame(lambda=rep(lambdas,length(prior)),
df prior=rep(prior,each=length(lambdas)),
p=as.vector(Llist))
ggplot(df,aes(x = lambda, y = prior)) +
scale_fill_gradientn(colours=cbPalette) +
geom_raster(aes(fill = p), interpolate = TRUE)
Can we play with numbers?
<- 15
l <- c(seq(1,l))
worldNames <- matrix(1, l, l)
lex lower.tri(lex)] <- 0
lex[<- worldNames
lexNames <- c(log(seq(1,l)))/10
cost <- t(data.frame(lex))
logic <- rep((1/length(worldNames)), length(worldNames))
priors <- 1
temp <- 10
iter
names(cost) <- lexNames
colnames(logic) <- worldNames
<- L(priors,logic,cost,temp,iter)
list[Ln, Sn]
<- c("cost", signif(cost, digits=2))
cost2 names(cost2) <- c("Lex", lexNames)
Key points:
- Worlds: \(\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15\}\)
- Lexicon: \(\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15\}\)
- \(\lambda = 1\)
- Flat priors
- Costs:
Lex | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
cost | 0 | 0.069 | 0.11 | 0.14 | 0.16 | 0.18 | 0.19 | 0.21 | 0.22 | 0.23 | 0.24 | 0.25 | 0.26 | 0.26 | 0.27 |
- Truth table, an at least fashion:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
X1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X3 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X4 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X5 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X6 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
X9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
X10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
X11 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
X12 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
X13 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
X14 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
X15 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
L\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.81 | 0 | 1 |
∀ | 0.19 | 1 | 0 |
S\({}_{10}\)
some | all | justSome | |
---|---|---|---|
∃¬∀ | 0.69 | 0.00 | 0.31 |
∀ | 0.16 | 0.84 | 0.00 |
Moving on to Question Under Discussion (QUD)
To do:
- \(q\) such that \(\forall w \in W, q_a(w)\) returns the equivalence classes of \(w\) in \(W\) with respect to the answer for question \(a\), i.e. all the \(w \in W\) that answer the same question with regard to \(a\).
- Implement this in the strategies
Here’s an attempt:
<- function(priors,logic,cost,temp,n) {
L
<- normRow(logic * (1/sum(logic))) # L0
Ln
for (i in 1:n) {
<- normCol(exp(temp * (log(Ln) - cost))) # Def of Sn
Sn <- normRow(t(t(Sn) * priors)) # Def of Ln
Ln # This should be replaced with an update that updates all elements of the same equivalence class for a given question.
}
list(Ln, Sn) # Returns the (Listener, Speaker) pair
}
Tools
Here are some functions/options used up there:
# Let's not print too many digits for the human eye
options(digits=2)
= function(m) { # Normalises on columns.
normCol t(t(m)/colSums(m))
}
= function(m) { # Normalises on rows. This was benchmarked.
normRow /rowSums(m)
m
}
# https://stat.ethz.ch/pipermail/r-help/2004-June/053343.html
# See below for a "why"
<- structure(NA,class="result")
list "[<-.result" <- function(x,...,value) {
<- as.list(match.call())
args <- args[-c(1:2,length(args))]
args length(value) <- length(args)
for(i in seq(along=args)) {
<- args[[i]]
a if(!missing(a)) eval.parent(substitute(a <- v,list(a=a,v=value[[i]])))
}
x
}
# Allows returning/matching tuples from functions:
<- function() { list(a = 2, b = 6) }
useless <- useless() # sets a to 2, ignores "b"
list[a,] <- useless() # sets b to 6, ignores "a"
list[,b] <- useless() # sets c to 2 and d to 6 list[c,d]