• theunknownmuncher@lemmy.world
    link
    fedilink
    arrow-up
    4
    ·
    11 hours ago

    I believe so, and some may not really translate well into text at all, and instead represent some kind of specific or abstract visual feature. There would be an entire other neural network or part of a neural network specifically for decoding the tokens into text