Let's make it possible to go from thoughts -> text. Yeah, brain reading shit.

We can begin this work by predicting what words people are listening to. We can take a similar approach to "Semantic reconstruction of continuous language from non-invasive brain recordings" (Tang et al., 2023), but instead of fMRI we use low-cost EEG. This has the added benefit of being much more accessible, and hopefully more insightful, since EEG has much higher temporal resolution (although the signal is noisier).

Here's the basic pipeline:

We start by collecting EEG recordings of one person while they watch several videos. We must have time-synced captions for these videos. Let's go for a lot of data: I'm thinking 10 hours of recordings.
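The alignment step above can be sketched in a few lines. This is a minimal numpy sketch under assumed parameters (250 Hz sampling rate, 64 channels, a fixed 0.8 s window per word); a real pipeline would use each caption's actual offset for the window width, and the amplifier's true sampling rate:

```python
import numpy as np

# Hypothetical setup: 10 s of 64-channel EEG sampled at 250 Hz,
# plus time-synced captions as (word, onset_seconds) pairs.
FS = 250           # assumed sampling rate (Hz)
N_CHANNELS = 64
eeg = np.random.randn(N_CHANNELS, 10 * FS)
captions = [("the", 0.5), ("quick", 0.9), ("brown", 1.3), ("fox", 1.8)]

def word_window(eeg, onset_s, fs=FS, width_s=0.8):
    """Slice the EEG window starting at a word's caption onset.

    width_s is an assumed fixed per-word window; real word durations
    vary, so a production pipeline would use the caption's end time.
    """
    start = int(onset_s * fs)
    stop = start + int(width_s * fs)
    return eeg[:, start:stop]

# Build a (n_words, n_channels, n_samples) training tensor.
windows = np.stack([word_window(eeg, onset) for _, onset in captions])
print(windows.shape)  # (4, 64, 200)
```

These per-word windows (paired with the caption text) become the training examples for the model in the next step.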

Next, we train an encoder-decoder model (a transformer encodes the word sequence; a diffusion model generates the corresponding EEG signal) to predict an EEG recording given a word sequence. This is the one model we actually train, and it has to be good, which is why we collect so much training data.

Now, we use language models! They're pretty good at completing sequences. The plan is to use an LM to generate candidate word sequences, predict an EEG recording for each candidate, and compare these predictions to the actual recording to keep the most similar ones. We keep extending and re-scoring the best candidates... there, we might have thought2text!
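This generate-predict-compare loop amounts to a beam search over word sequences, scored by how well each candidate's predicted EEG matches the recording. A toy sketch, with a stub that proposes every vocabulary word in place of the LM, and a random linear map in place of the trained encoder (vocabulary, dimensions, and all function names are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

VOCAB = ["the", "dog", "ran", "cat", "sat", "fast"]
EEG_DIM = 128
W = rng.standard_normal((len(VOCAB), EEG_DIM))  # stand-in for the trained encoder

def predict_eeg(words):
    """Stub for the trained model: word sequence -> predicted EEG window."""
    return W[[VOCAB.index(w) for w in words]].sum(axis=0)

def extend_candidates(beams):
    """Stub LM: extend each beam with every word (a real LM proposes likely continuations)."""
    return [beam + [w] for beam in beams for w in VOCAB]

def score(pred, actual):
    """Similarity between predicted and recorded EEG (Pearson r)."""
    return np.corrcoef(pred, actual)[0, 1]

# Pretend the subject heard "the dog ran" and we recorded the matching EEG.
true_seq = ["the", "dog", "ran"]
recorded = predict_eeg(true_seq) + 0.05 * rng.standard_normal(EEG_DIM)

beams = [[]]
for _ in range(len(true_seq)):
    candidates = extend_candidates(beams)
    candidates.sort(key=lambda c: score(predict_eeg(c), recorded), reverse=True)
    beams = candidates[:3]  # keep the top-3 beams

print(beams[0])  # recovers the true words (in some order: this stub encoder ignores order)
```

Note the failure mode the stub exposes: a sum-based encoder can't distinguish word order, so the real model's job is exactly to make predictions sequence-sensitive enough that the right ordering scores highest.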

Some things to think about, in no particular order: