How to implement an experiment

CS’s advice to Cψ

Mathias Sablé Meyer

What is this document?

This is intended as some kind of conceptual how-to, a theoretical tutorial if you wish. I will not give actual programming tips in a given language, and when I do it is just for illustration purposes, not polished, production-ready code.

With that in mind, this is what I took away from a few years of theoretical computer science when confronted with an arguably horrible language, in a situation where my priority was not the conceptual beauty of a solution but the actual implementation of a practical experiment. I am in no way an expert in experiment design, and what is said here is my view on the matter, which you should take only as a source of inspired criticism.

Preliminary remarks

Figure: a typical RSVP presentation.

A few things to keep in mind:

Sounds about right?

What are your goals when designing an experiment?

Your code should ensure three things:

  1. You know what your stimuli are
  2. The data has to be safe at all cost
  3. The code should reflect the logic of the experiment

The order here is purely indicative. I’ll argue that 3. helps with 1. and 2. but it is still not a goal per se.

Note that 1. does not mean you should have absolute control over the stimuli: hardware fails, human mistakes happen, many things prevent you from having 100% control. But if such things occur — you miss a frame, a given audio stimulus slightly stutters, whatever — the least you should do, for the sake of reproducibility, is to know about it.

The second point is more practical and naïve: no matter what, you should somehow save the data. Maybe the experiment will stop mid-run — hardware failure, programming failure, subject failure, you name it. If that ever happens, then for the sake of science, for the sake of your time and of the subject's time, for the sake of the public funding that goes into research, what has been done so far has to be safe.

The third point is arguably even more central: if you want peer review to make sense, if you seek reproducibility, or if you want to work with co-researchers, then your code should be readable, and I would argue that this prevails over micro-optimisation in the long run.

Know your stimuli

If timing is not frame-critical, well, there is not much to say: if there is an obvious timing problem you will probably see it; otherwise it may not be worth investigating.

I would like to give some general advice though.

Having orders of magnitude in mind

I'll give a quick anonymised example of a discussion with a colleague whose experiment displayed words. Many words, quickly, but words, one per frame. Now, I was piloting, and he told me to wait while the experiment was preparing. Surprised that it was computing things, I asked him what was going on, to which he answered:

I'm pre-computing all the frames and storing them, and you see how long that takes, so I couldn't do it in real time.

I asked him where the frames were stored, and he said simply: “on the hard drive, why?”.

Let's assume for a second that each frame is indeed stored uncompressed on the hard drive. That's about 1000×1000 pixels, or 1M (r,g,b) tuples, hence a total of about 3MB, give or take. A typical mechanical hard drive found in an experimental setup (as opposed to an SSD, much quicker but still not everywhere) has a sequential read speed of about 150MB/s, plus some seek time that we will ignore. Thus reading a frame in order to display it takes about 0.02s, or 20ms. Now, with a refresh rate of 60 frames per second, you have about 16ms to produce each frame.
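To make the arithmetic concrete, here is the same back-of-envelope computation in MATLAB (the numbers are the rough estimates from above, not measurements):

    % Can a mechanical hard drive feed one uncompressed frame per refresh?
    width  = 1000; height = 1000;                    % pixels, rough estimate
    bytes_per_pixel = 3;                             % one byte each for r, g, b
    frame_size = width * height * bytes_per_pixel;   % ~3 MB per frame

    hdd_speed = 150e6;                               % ~150 MB/s sequential read
    read_time = frame_size / hdd_speed;              % ~0.02 s, i.e. 20 ms

    refresh_rate = 60;                               % Hz
    frame_budget = 1 / refresh_rate;                 % ~16.7 ms per frame

    fprintf('read: %.1f ms, budget: %.1f ms\n', ...
            1000 * read_time, 1000 * frame_budget);
    % read: 20.0 ms, budget: 16.7 ms -- the disk alone blows the budget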

Even with luck, good hardware and low-level optimisation, you’re playing with fire here: you’ll find yourself missing frames, while your experiment’s bottleneck will be the hard drive!

RAM is quicker, significantly so: one to two orders of magnitude for sequential access, much more for random access (remember the name? Random Access Memory? Note that we are talking about transfer rate, not access time, where there are more like 3–4 orders of magnitude of difference between the two!). But it is more limited in space, and for a 32-bit program it is hard-capped at about 3.75GB.

Is that enough? Let's see: suppose you reuse frames and you have 750 words (with a typical length of 8 words per sentence, that's about 100 sentences, which seems realistic), but say each of them also needs a variant with an additional white spot for a photodiode (see later), and let's forget about masks, fixation frames, and so on. That's still 1500×3MB, or 4.5GB, just for this! Now of course, the first remark from someone working in the field of information theory is that a given frame here contains less than 1KB of information (I mean, look at what one can fit in just 4k!): you are simply not being efficient at all by storing frames.
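The cheap alternative is to keep the words themselves in memory, a few bytes each, and let the graphics card rasterise them at display time. A minimal sketch, assuming Psychtoolbox (the word list is a made-up placeholder):

    % Draw each word just before its flip instead of storing 3 MB bitmaps.
    words = {'the', 'quick', 'brown', 'fox'};   % a few bytes per word
    win   = Screen('OpenWindow', 0);            % full-screen window on screen 0
    ifi   = Screen('GetFlipInterval', win);     % duration of one refresh

    vbl = Screen('Flip', win);
    for w = 1:numel(words)
        DrawFormattedText(win, words{w}, 'center', 'center', 0);
        % Rasterising a word takes microseconds, far below the ~16 ms budget;
        % schedule the flip one refresh after the previous one.
        vbl = Screen('Flip', win, vbl + 0.5 * ifi);
    end
    Screen('CloseAll');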

Controls during an experiment

<…>

Save the data

Anticipate bugs

They may happen. Embrace them; after all, from a semantic point of view a bug is just an undocumented feature — see how hard it is to formally define a bug? Maybe you changed the experiment on the fly for some reason, maybe you thought something was harmless and would not lead to an error, maybe the version on the experiment setup is slightly outdated and you did not anticipate enough, maybe something else. In any case, a bug may happen, and if it does, it should not lead to everything failing.

One way to anticipate this, ugly in many respects, is to program in such a way that you catch any unforeseen error, save every piece of relevant data, and only then raise the same error again for the higher-level functions to deal with — if any.

See for example this kind of structure (a minimal MATLAB sketch, where present_trial, n_trials and the file name are hypothetical placeholders):
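    n_trials = 100;                    % placeholder count
    Results  = {};                     % trial data accumulates here
    try
        for trial = 1:n_trials
            Results{end+1} = present_trial(trial);  % hypothetical trial function
        end
    catch err
        % Whatever just went wrong, first save what has been done so far...
        save('emergency_dump.mat', 'Results');
        % ...and only then re-raise the same error for higher functions, if any.
        rethrow(err);
    end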

A note about this structure: any error thrown in the “code” block will be caught, no matter its nature or importance. Generally speaking this is bad practice: errors are meaningful, and catching all of them under the same umbrella ignores that nature: you strip the error of its semantics just to run your own piece of code. Well, good enough for you if in the end you save your data, but it may be a good idea to re-throw the received error at the end of the saving part, to know what happened!

Another (ugly) thing you can do is to put your data in globally defined arrays: while I consider them bad practice in general, this means that in case of failure the variables are still declared and live somewhere in the environment; they were not destroyed along with their local environment. I usually start my code with global Results and then use this variable to store everything, as sketched below.
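A minimal sketch of that pattern (present_trial is again a hypothetical per-trial function):

    function run_block(n_trials)
        global Results                 % bound to the global workspace, so a
                                       % crash in this function does not
                                       % destroy the data gathered so far
        if isempty(Results), Results = {}; end
        for trial = 1:n_trials
            Results{end+1} = present_trial(trial);
        end
    end

After a crash, typing global Results at the command prompt gives you back whatever was recorded so far, ready to be saved by hand.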

What about hardware failure?

This is all good for software failure, but what if the power goes off? It shouldn't, sure, but what if? Having global variables in RAM is not enough then: you want the data on a type of memory that survives a power cut, typically a hard drive. But hard drives are also slower than memory, so if timing is critical, be smart: write to the drive from a background process, or whenever there is a pause of any kind in the stimuli; just avoid writing everything to the drive in real time if timing matters.
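For instance, appending one line per trial to a text file during the inter-trial interval, where a few milliseconds of disk latency cost nothing, already survives most crashes and power cuts. A sketch, with the same hypothetical present_trial and a made-up file name:

    n_trials = 100;                    % placeholder count
    fid = fopen('results.csv', 'a');   % append mode: rows written so far survive
    for trial = 1:n_trials
        [resp, rt] = present_trial(trial);   % hypothetical trial function
        % One line per trial, written in the inter-trial interval,
        % never between two frames of a stimulus.
        fprintf(fid, '%d,%s,%.4f\n', trial, resp, rt);
    end
    fclose(fid);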