Learning RSA

Toward an account of the Illusory Inference?

November 2017 — Mathias Sablé Meyer

Table of contents

## Loading required package: iterators
## Loading required package: parallel

Grab the source of this file here

Many thanks to Benjamin Spector

His seminar on RSA introduced me to this and he provided real-time feedback on my attempts to implement it.

A wee bit of theory

To do. Or better, refer to someone more competent (see previous title).

Let:

We define by recursion over \(n \in \mathbb{N}\) following triplet \(S, L, U\)Speaker, Listener, Utility

 where \(L_0\) is the naïve listener.

\[ \begin{eqnarray} L_0&(w&\mid& u) &\propto& P(w)Lex(u,w) \\ U_{n+1}&(u&\mid& w) &=& \log(L_n(w\mid u)) - c(u) \\ S_{n+1}&(u&\mid& w) &\propto& e^{\lambda U_{n+1}(u\mid w)} \\ L_{n+1}&(w&\mid& u) &\propto& P(w)S_{n+1}(u\mid w) \end{eqnarray} \]

(An) Implementation of the RSA

Remarks about the code:

Apart from that, thanks to R syntactic sugar, the implementation follows very closely the definitions.

Things we could to improve this if need be:

Here be the code:

L <- function(priors,logic,cost,temp,n) {

  Ln <- normRow(logic * (1/sum(logic))) # L0

  for (i in 1:n) {
    Sn <- normCol(exp(temp * (log(Ln) - cost))) # Def of Sn
    Ln <- normRow(t(t(Sn) * priors))            # Def of Ln
  }

  list(Ln, Sn) # Returns the (Listener, Speaker) pair

}

A learning case

The parameters

I will only show how to call the function once. Then I’ll hide this.

worldNames <- c("∃¬∀", "∀")
some       <- c(1    , 1  )
all        <- c(0    , 1  )
justSome   <- c(1    , 0  )
lexNames   <-            c("some", "all", "justSome")
cost       <-            c(0     , 0    , 1         )
logic      <- t(data.frame(some  , all  , justSome  ))
priors     <- rep((1/length(worldNames)), length(worldNames)) # Flat
temp       <- 1
iter       <- 10

names(cost)     <- lexNames
colnames(logic) <- worldNames

list[Ln, Sn] <- L(priors,logic,cost,temp,iter)
knitr::kable(t(Ln))
some all justSome
∃¬∀ 0.81 0 1
0.19 1 0
cost        <- c("cost", cost)
names(cost) <- c("Lex", lexNames)

Key points:

  • Worlds: \(\{∃¬∀, ∀\}\)
  • Lexicon: \(\{some, all, justSome\}\)
  • \(\lambda = 1\)
  • Flat priors
  • Costs:
Lex some all justSome
cost 0 0 1
  • Truth table:
∃¬∀
some 1 1
all 0 1
justSome 1 0

L\({}_{10}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{10}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

Four worlds, many messages

The parameters

Key points:

  • Worlds: \(\{∅, A¬B, B¬A, A∨B\}\)
  • Lexicon: \(\{A, B, AorB, AnotB, BnotA, notA, notB, neither\}\)
  • \(\lambda = 5\)
  • Flat priors
  • Costs:
Lex A B AorB AnotB BnotA notA notB neither
cost 0 0 1 1 1 1 1 1
  • Truth table:
A¬B B¬A A∨B
A 0 1 0 1
B 0 0 1 1
AorB 0 0 0 1
AnotB 0 1 0 0
BnotA 0 0 1 0
notA 1 0 1 0
notB 1 1 0 0
neither 1 0 0 0

L\({}_{10}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{10}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

Non fully informed – with a “?”

The parameters

Key points:

  • Worlds: \(\{?, ∃¬∀, ∀\}\)
  • Lexicon: \(\{some, all, someNotAll\}\)
  • \(\lambda = 2\)
  • Flat priors
  • Costs:
Lex some all someNotAll
cost 0 0 2
  • Truth table:
? ∃¬∀
some 1 1 1
all 0 0 1
someNotAll 0 1 0

L\({}_{10}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{10}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

Four wor(l)ds

The parameters

Key points:

  • Worlds: \(\{∅, ∃¬∀, ∀\}\)
  • Lexicon: \(\{no, some, all, notAll\}\)
  • \(\lambda = 4\)
  • Flat priors
  • Costs:
Lex no some all notAll
cost 0 0 0 1
  • Truth table:
∃¬∀
no 1 0 0
some 0 1 1
all 0 0 1
notAll 1 1 0

L\({}_{1}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{1}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

All possibilities

The parameters

Key points:

  • Worlds: \(\{∅, ∃, ∃¬∀, ∀, ¬∀, ∅∨∀\}\)
  • Lexicon: \(\{no, some, someNotAll, all, notAll, NoneOrAll\}\)
  • \(\lambda = 2\)
  • Flat priors
  • Costs:
Lex no some someNotAll all notAll NoneOrAll
cost 2 2 4 2 3 4
  • Truth table:
∃¬∀ ¬∀ ∅∨∀
no 1 0 0 0 0 0
some 0 1 1 1 0 0
someNotAll 0 0 1 0 0 0
all 0 0 0 1 0 0
notAll 1 0 1 0 1 0
noneOrAll 1 0 0 1 0 1

L\({}_{10}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{10}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

IIFD – no question

The parameters

Key points:

  • Worlds: \(\{C, BC, AC, ABC, AB\}\)
  • Lexicon: \(\{a, b, c, ab, ac, bc, abc\}\)
  • \(\lambda = 1\)
  • Flat priors
  • Costs:
Lex a b c ab ac bc abc
cost 0 0 0 1 1 1 2
  • Truth table:
C BC AC ABC AB
a 0 0 1 1 1
b 0 1 0 1 1
c 1 1 1 1 0
ab 0 0 0 1 1
ac 0 0 1 1 0
bc 0 1 0 1 0
abc 0 0 0 1 0

L\({}_{10}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{10}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

IIFD – artificial eroteticity

The parameters

Key points:

  • Worlds: \(\{AB?, C?\}\)
  • Lexicon: \(\{a, b, c, ab, aorc, borc\}\)
  • \(\lambda = 1\)
  • Flat priors
  • Costs:
Lex a b c ab aorc borc
cost 0 0 0 1 1 1
  • Truth table:
AB? C?
a 1 0
b 1 0
c 0 1
ab 1 0
aorc 1 1
borc 1 1

L\({}_{30}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{30}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

Cost plot

L(AB?|{a,b}) vs L(AB?|{ab}) as a function of cost

Heat map – square one

Key points:

∃¬∀
some 1 1
all 0 1
justSome 1 0

Generate data:

st      <- 0.1
mi      <- 0.1
ma      <- 2
lambdas <- seq(mi,ma,st)
costs   <- seq(mi,ma,st)
size    <- length(costs)

Llist <- matrix(rep(NA, size*size), nrow=size)

Llist <- foreach(i=1:size, .combine='rbind') %dopar% {
  foreach(j=1:size, .combine='c') %dopar% {
    list[Ln, Sn] <- L(priors,logic,costs[i]*cost,lambdas[j],iter)
    Ln[1,1]
  }
}

df = data.frame(lambda=rep(lambdas,each=size),
                cost=rep(costs,size),
                p=as.vector(Llist))

ggplot(df,aes(x = lambda, y = cost)) +
  scale_fill_gradientn(colours=cbPalette) +
  geom_raster(aes(fill = p), interpolate = TRUE) +
  coord_equal()
Heat map of (lambda, cost) -> L(∃¬∀|‘some’)

Heat map – priors

Key points:

∃¬∀
no 1 0 0
some 0 1 1
all 0 0 1
justSome 0 1 0
lambdas <- seq(0.05,7.5,0.02)
prior   <- seq(0.05,0.95,0.02)

start_time <- Sys.time()
Llist <- foreach(i=1:length(lambdas), .combine='rbind') %dopar% {
  foreach(j=1:length(prior), .combine='c') %dopar% {
    remain <- (1 - prior[j])/2
    priors <- c(remain, remain, prior[j])
    list[Ln, Sn] <- L(priors,logic,cost,lambdas[i],iter)
    Ln[2,2]
  }
}
end_time <- Sys.time()

Order of magnitude for what is computed:

df = data.frame(lambda=rep(lambdas,length(prior)),
                prior=rep(prior,each=length(lambdas)),
                p=as.vector(Llist))

ggplot(df,aes(x = lambda, y = prior)) +
  scale_fill_gradientn(colours=cbPalette) +
  geom_raster(aes(fill = p), interpolate = TRUE)
Heat map of (lambda, prior) -> L(∃¬∀|‘some’)

Can we play with numbers?

l <- 15
worldNames <- c(seq(1,l))
lex <- matrix(1, l, l)
lex[lower.tri(lex)] <- 0
lexNames   <- worldNames
cost       <- c(log(seq(1,l)))/10
logic      <- t(data.frame(lex))
priors     <- rep((1/length(worldNames)), length(worldNames))
temp       <- 1
iter       <- 10

names(cost)     <- lexNames
colnames(logic) <- worldNames

list[Ln, Sn] <- L(priors,logic,cost,temp,iter)

cost2        <- c("cost", signif(cost, digits=2))
names(cost2) <- c("Lex", lexNames)

Key points:

Lex 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
cost 0 0.069 0.11 0.14 0.16 0.18 0.19 0.21 0.22 0.23 0.24 0.25 0.26 0.26 0.27
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
X1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
X2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
X3 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
X4 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
X5 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
X6 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
X7 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
X8 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
X9 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
X10 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
X11 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
X12 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
X13 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0
X14 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
X15 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

L\({}_{10}\)

some all justSome
∃¬∀ 0.81 0 1
0.19 1 0

S\({}_{10}\)

some all justSome
∃¬∀ 0.69 0.00 0.31
0.16 0.84 0.00

Moving on to Question Under Discussion (QUD)

To do:

Here’s an attempt:

L <- function(priors,logic,cost,temp,n) {

  Ln <- normRow(logic * (1/sum(logic))) # L0

  for (i in 1:n) {
    Sn <- normCol(exp(temp * (log(Ln) - cost))) # Def of Sn
    Ln <- normRow(t(t(Sn) * priors))            # Def of Ln
    # This should be replaced with an update that updates all elements of the same equivalence class for a given question.
  }

  list(Ln, Sn) # Returns the (Listener, Speaker) pair

}

Tools

Here are some functions/options used up there:

# Let's not print too many digits for the human eye
options(digits=2)

normCol = function(m) { # Normalises on columns.
  t(t(m)/colSums(m))
}

normRow = function(m) { # Normalises on rows. This was benchmarked.
  m/rowSums(m)
}

# https://stat.ethz.ch/pipermail/r-help/2004-June/053343.html
# See below for a "why"
list <- structure(NA,class="result")
"[<-.result" <- function(x,...,value) {
   args <- as.list(match.call())
   args <- args[-c(1:2,length(args))]
   length(value) <- length(args)
   for(i in seq(along=args)) {
     a <- args[[i]]
     if(!missing(a)) eval.parent(substitute(a <- v,list(a=a,v=value[[i]])))
   }
   x
}

# Allows returning/matching tuples from functions:
useless   <- function() { list(a = 2, b = 6) }
list[a,]  <- useless() # sets a to 2, ignores "b"
list[,b]  <- useless() # sets b to 6, ignores "a"
list[c,d] <- useless() # sets c to 2 and d to 6