Recovering Large Language Model Token Subspaces by Systematic Prompting
Some large language models (LLMs) are open source and therefore fully available for scientific study, but many are proprietary, and their hidden internals hinder the research community's ability to study their behavior under controlled conditions. For instance, the token input embedding specifies the internal vector representation of each token used by the model. If the token input embedding is hidden, latent semantic information about the set of tokens is unavailable to researchers. This talk outlines a general and flexible method for prompting an LLM to reveal its token input embedding, even if this information is not published with the model. If the LLM can be prompted systematically, and certain benign conditions on the quantity of data collected from its responses are met, the token embedding is recovered up to homeomorphism. While the prompting itself can be a performance bottleneck, depending on the size and complexity of the LLM, the recovery runs within a few hours on a typical workstation. We demonstrate the method's effectiveness by recovering the token subspace of the Llemma-7B LLM.
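The prompt-then-recover pipeline described above can be sketched in miniature. This is an illustration only, not the talk's actual protocol: the `query_model` function, the toy vocabulary size, and the choice of Euclidean distance between response vectors are all assumptions made here for exposition. The idea it conveys is that systematically prompting with every token yields one response vector per token, and the metric structure of those vectors is what the recovery step works from.

```python
# Illustrative sketch only; query_model is a hypothetical stand-in for
# prompting the LLM with a single token and reading off its response
# distribution over the vocabulary.
import math
import random

random.seed(0)

VOCAB_SIZE = 8  # toy vocabulary; a real LLM has tens of thousands of tokens


def query_model(token_id: int) -> list[float]:
    """Hypothetical: return the model's next-token distribution for a
    one-token prompt. Here, a random softmax stands in for the LLM."""
    logits = [random.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
    z = sum(math.exp(x) for x in logits)
    return [math.exp(x) / z for x in logits]


# Systematically prompt with every token in the vocabulary and collect
# the response vectors.
responses = [query_model(t) for t in range(VOCAB_SIZE)]

# Pairwise distances between response vectors; under benign conditions on
# the amount of data collected, this metric structure is what determines
# the token subspace up to homeomorphism.
dist = [
    [math.dist(responses[i], responses[j]) for j in range(VOCAB_SIZE)]
    for i in range(VOCAB_SIZE)
]
```

In this sketch the expensive part is the `VOCAB_SIZE` calls to `query_model`, mirroring the abstract's observation that prompting, not recovery, is the bottleneck for a large model.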