Understanding Large Language Models



Exploring the Foundations of LLM Systems

In an era where the boundaries of technology are constantly expanding, the emergence and evolution of Large Language Models (LLMs) mark a significant milestone in the development of ‘artificial intelligence’ (AI).

LLMs harness the power of vast datasets and advanced algorithms to revolutionize human-computer interaction, enabling them to ingest and generate human-like text with rapidly growing sophistication. With their ability to parse and generate human language, LLMs are not just tools for productivity, but transformational catalysts towards larger-scale knowledge processing infrastructures.

This article, inspired by Orion Reed's presentation titled “The ‘Design Space’ of LLMs: Demystifying LLMs & LLM-based systems,” delves into the world of LLMs, shedding light on their fundamental building blocks, from the basic unit of tokens to the complexities of reasoning models. Another focal point is the integration of Retrieval-Augmented Generation (RAG) models, which combine natural language processing with dynamic information retrieval, adding a layer of information-processing depth to AI-generated content that reduces “hallucinations” and increases response accuracy. By demystifying the internal workings of LLMs, we aim to provide a better understanding of these powerful tools, setting the stage for informed discussions about their place in the future technological landscape and their implications for society, economy, and governance.

Exploring the Basics of LLMs: Tokenization, Context Windows & Embedding

At their core, LLMs interact with the digital representation of language, translating complex human communication into a loosely structured format of character codes and tokens. This includes character encoding systems that translate varied alphabets and symbols into digital data. Digital representations of language also extend to technologies such as Text-to-Speech and Speech Recognition, enabling interactive voice responses and converting spoken words to text. This process forms the bedrock upon which the capabilities of LLMs are built, giving them the ability to efficiently interpret and generate utterances in “human languages.” To do this, LLMs make use of various subfunctions – such as tokenization, context windows, and semantic embeddings – which are explored further below.

LLM Tokenization

Before processing any natural language text, it is critical to normalize it, converting it into a more convenient, standardized form. This normalization often involves separating, or tokenizing, words from large blocks of running text, which is vital for most language-related tasks. Tokens, in language processing, are common sequences of characters found in a body of text. Models learn the statistical relationships between these tokens, and excel at producing the next token in a sequence. Tokens can be words, phrases, symbols, or other elements of written language that the system uses for analysis and comprehension. It's important to note that the exact tokenization process varies between models – it is not a manual process, but one optimized for each LLM – so different models will produce different tokens for the same input text. This process is not just a technical accomplishment, but a crucial step in enabling LLMs to engage with the complexities of human language.
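As a rough illustration, the snippet below uses OpenAI's open-source tiktoken library – one real tokenizer among many. The exact token IDs and splits are model-specific and will differ for other LLMs:

```python
# A minimal tokenization sketch using the open-source `tiktoken` library.
# Other models use different tokenizers and split the same text differently.
import tiktoken

# cl100k_base is the encoding used by several OpenAI chat models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization turns text into integer IDs."
token_ids = enc.encode(text)

print(token_ids)                              # a list of integer token IDs
print([enc.decode([t]) for t in token_ids])   # the text piece behind each ID
print(f"{len(text)} characters -> {len(token_ids)} tokens")
```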

Context Windows & Positioning 

A context window refers to the amount of text the model can consider at one time when making predictions or generating text, which influences its understanding and responses. Just as tokenization breaks down text into manageable units, context windows frame these units within a broader linguistic landscape. This framing allows LLMs to grasp the contextual significance of words, in addition to their immediate meanings.

This figure shows the loss as a function of token position for Claude 2 on very long context data. Image by Anthropic

The decision to set strict limits on context windows is primarily driven by the model's performance across different lengths of context. Essentially, as the context length increases, there often comes a point where the quality of the model's output starts to decline. The creators of these models determine an optimal cutoff point where the balance between output quality and the amount of input text is most favorable. This means they choose to prioritize maintaining high-quality responses, even if it means restricting the amount of additional text the model can process as input.

Understanding how these context windows function provides deeper insight into the operational mechanics of LLMs, and their ability to mimic human-like understanding and communication in language processing. The size of the context window can significantly affect the performance of the model. A smaller window may lead to inaccurate responses, due to insufficient context, while a larger window can provide more information but may also increase computational complexity and the risk of including irrelevant information. In addition to computational power, larger context windows also require more memory, which can be a limiting factor in the model's design. Techniques such as memory-augmented neural networks or sparse attention patterns can be used to mitigate these issues. 
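To make the tradeoff concrete, here is a minimal sketch of the kind of sliding-window truncation many chat applications apply to conversation history. The whitespace-based count_tokens helper is a deliberately crude stand-in for a real tokenizer (such as the tiktoken example above):

```python
# A simplified sliding-window truncation: keep only the most recent
# messages that fit within a fixed token budget.
def count_tokens(message: str) -> int:
    return len(message.split())  # crude whitespace proxy, for illustration only

def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the remainder fits the window."""
    kept, used = [], 0
    for msg in reversed(messages):        # walk newest-first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))           # restore chronological order

history = ["first question", "a long answer " * 50, "follow-up question"]
print(fit_to_window(history, max_tokens=60))  # only the newest message fits
```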

An overview of LLM performance on long-context understanding. Image from LooGLE: Can Long-Context Language Models Understand Long Contexts?

Semantic Embedding

It’s imperative to understand that these systems are not merely Generative Pre-trained Transformers with a text box for input. Delving deeper into their architecture reveals a sophisticated combination of technologies designed to simulate an intuitive interaction with the user. Among the critical components of these systems are embedding functions, which play a pivotal role in how these models process a vast range of inputs – from the rumblings of seismic activity to the nuances of handwritten numerals.

Embedding functions translate varied forms of data into a multi-dimensional coordinate system known as “the embedding space.” Here, the concept of ‘distance’ between objects is repurposed to represent their similarity: closer points indicate greater likeness in the semantic domain, while more distant points represent greater difference between concepts. For instance, ‘red’, ‘apple’, and ‘fruit’ might find themselves in close proximity within this abstract embedding space. This spatial arrangement allows complex data to be understood and categorized with remarkable accuracy. An open-source prompt-based embedding model called “Instructor” exemplifies the adaptability of this approach: it leverages prompt-driven fine-tuning to adapt embeddings for different purposes, from clustering academic concepts to distinguishing linguistic styles for varied audiences.
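A hedged sketch of this idea in practice, using the open-source sentence-transformers library (the model name "all-MiniLM-L6-v2" is just one commonly used example): related words should score a higher cosine similarity than unrelated ones.

```python
# Embed a few words and compare their cosine similarities to show that
# related concepts land near each other in embedding space.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
words = ["red", "apple", "fruit", "carburetor"]
vectors = model.encode(words)  # one high-dimensional vector per word

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 'apple' should sit closer to 'fruit' than to 'carburetor'.
print(cosine(vectors[1], vectors[2]))  # higher similarity
print(cosine(vectors[1], vectors[3]))  # lower similarity
```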

An example of semantic embedding along two dimensions. Image from Supervised Understanding of Word Embeddings

Lastly, vector stores act as specialized databases that index these high-dimensional vectors, offering efficient retrieval based on their “distance” from each other. This distance is a proxy for conceptual similarity – a crucial feature for models like ChatGPT that need to distill extensive conversation histories into a condensed, comprehensible format. The context window of LLMs, despite its impressive capacity – spanning tens or hundreds of thousands of tokens – is not infinite. It cannot consider an entire file system simultaneously, which makes the efficiency of vector stores and embedding functions critical to the model's performance and effective memory. These components underlie what we experience as the deceptively simple ChatGPT interface, obscuring the intricate web of complex operations beneath.
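As a toy example of what a vector store does under the hood, the sketch below implements brute-force cosine-similarity search. Production systems (e.g. FAISS or pgvector) replace the linear scan with optimized indexes, but the retrieval contract is the same:

```python
# A toy vector store: brute-force cosine-similarity search over stored
# vectors. The 2-D vectors below are illustrative; real embeddings would
# come from an embedding function and have hundreds of dimensions.
import numpy as np

class ToyVectorStore:
    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str, vector: np.ndarray) -> None:
        # Normalize on insert so dot product equals cosine similarity.
        self.texts.append(text)
        self.vectors.append(vector / np.linalg.norm(vector))

    def search(self, query_vec: np.ndarray, k: int = 3) -> list[str]:
        """Return the k stored texts most similar to the query vector."""
        q = query_vec / np.linalg.norm(query_vec)
        scores = np.array([v @ q for v in self.vectors])
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]

store = ToyVectorStore()
store.add("apples are fruit", np.array([0.9, 0.1]))
store.add("cars need fuel", np.array([0.1, 0.9]))
print(store.search(np.array([0.8, 0.2]), k=1))  # -> ['apples are fruit']
```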

LLM Reasoning Models

LLMs are not monolithic machines; they can encompass diverse models of reasoning. These reasoning models represent a leap in LLM capabilities, enabling them to navigate complex thought processes and offer more nuanced and context-aware outputs – transforming our approach to problem-solving and decision-making processes.

Schematic illustration of various approaches to problem-solving in LLMs. Each rectangle represents a thought, which is a coherent language sequence that serves as an intermediate step toward problem-solving. Image from Tree of Thoughts Prompting

Chain of Thought reasoning, for instance, has emerged as a powerful tool in the realm of LLMs. Unlike traditional “direct response” models, which output answers directly, Chain of Thought reasoning prompts the LLM to articulate intermediate steps or “reasoning processes.” This approach mimics human-like problem-solving, where a series of logical steps leads to a final conclusion. The model is not just providing an answer, but also detailing the thought process behind it – thereby enhancing transparency and reliability.
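For illustration, here is roughly what the difference looks like at the prompt level. The exact wording below is our own, not a fixed recipe:

```python
# Contrast a direct prompt with a chain-of-thought prompt.
question = "A train travels 60 km in 40 minutes. What is its speed in km/h?"

direct_prompt = f"{question}\nAnswer:"

cot_prompt = (
    f"{question}\n"
    "Let's think step by step, showing each intermediate calculation "
    "before stating the final answer."
)
# Sent to a model, the second prompt tends to elicit worked reasoning
# (40 minutes = 2/3 hour; 60 / (2/3) = 90 km/h) rather than a bare guess.
```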

Self-consistency models, on the other hand, take a different – yet equally innovative – approach. These models generate multiple responses or hypotheses to a given query, then internally evaluate and ‘vote’ on the most plausible or accurate answer. This method leverages the idea of exploring diverse perspectives before converging on a single, most likely conclusion. The ‘Tree of Thoughts’ model, a variation on this approach, evaluates potential reasoning paths, discards the branches that it deems less feasible, and focuses on the more promising ones, iterating towards better responses. Despite these advances, there remain limitations, as discussed in “Language Models Do Not Possess the Uniquely Human Cognitive Ability of Relational Abstraction” by Monk et al. This study underscores the challenges LLMs face in mimicking complex human cognitive abilities – which is important to keep in mind, given the often impressive outputs of these tools.
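A minimal sketch of the self-consistency voting loop described above; sample_answer is a hypothetical stand-in for a real model call sampled at nonzero temperature:

```python
# Self-consistency: sample several reasoning paths, then majority-vote
# on the final answers.
import random
from collections import Counter

def sample_answer(prompt: str) -> str:
    # Hypothetical stand-in: a real system would call an LLM here, with
    # temperature > 0 so that repeated calls explore different paths.
    return random.choice(["90 km/h", "90 km/h", "90 km/h", "60 km/h"])

def self_consistent_answer(prompt: str, n_samples: int = 9) -> str:
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # the most frequent answer

print(self_consistent_answer("What is the train's speed?"))
```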

Guided and Augmented Generation in LLMs

Guided generation in language models refers to the process of directing the output of large language models to adhere to specific data schemas, structures, or syntaxes.

This is essential when the desired output is in a structured format other than plain text – e.g. JavaScript Object Notation (JSON), a format widely used for storing and transporting data in web applications. The simplest and most common approach to achieving this is through prompting: by instructing the model to produce responses in a particular format, such as a numbered list, the output can be more easily parsed. Due to the probabilistic nature of these models, however, this method can be brittle and can lead to inconsistent outputs.
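A hedged sketch of this prompt-based approach, including the defensive parsing it requires (the prompt text and model_output below are illustrative, not real API output):

```python
# Prompt-based guided generation: ask for JSON, then parse defensively,
# since prompting alone does not guarantee structure.
import json

prompt = (
    "Extract the person's name and age from the text below and reply "
    'with ONLY a JSON object like {"name": "...", "age": 0}.\n\n'
    "Text: Ada Lovelace was 36 years old."
)

model_output = '{"name": "Ada Lovelace", "age": 36}'  # hypothetical reply

try:
    record = json.loads(model_output)
    print(record["name"], record["age"])
except json.JSONDecodeError:
    # The probabilistic model may return malformed JSON; retry or repair.
    print("Output was not valid JSON; falling back.")
```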

Context-free grammars (CFGs) can be used to enhance robustness and guarantee structure. CFGs limit the model's output to a predefined syntax or grammar, ensuring that the generated text adheres to specific rules. This approach is particularly useful in scenarios where the output needs to match the grammar of a programming language, like Python, guaranteeing syntactical correctness (if not functional accuracy). A tool can be designed to generate SQL statements using Extended Backus-Naur Form (EBNF) grammar, for example, addressing concerns related to ambiguity and recursion.
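The sketch below illustrates the idea using the open-source lark parsing library and a deliberately tiny grammar. Note that this validates output after generation (a rejection-style check), whereas decoder-level guided generation constrains token sampling directly:

```python
# Validate model output against a tiny EBNF-style grammar using `lark`.
from lark import Lark
from lark.exceptions import LarkError

# Toy grammar for a restricted SQL SELECT statement (illustrative only).
SQL_GRAMMAR = r"""
    select_stmt: "SELECT" columns "FROM" NAME where_clause?
    columns: NAME ("," NAME)* | "*"
    where_clause: "WHERE" NAME "=" ESCAPED_STRING

    %import common.CNAME -> NAME
    %import common.ESCAPED_STRING
    %import common.WS
    %ignore WS
"""

parser = Lark(SQL_GRAMMAR, start="select_stmt")

def is_valid_sql(candidate: str) -> bool:
    """True if the model's output parses under the grammar."""
    try:
        parser.parse(candidate)
        return True
    except LarkError:
        return False

print(is_valid_sql('SELECT name, age FROM users WHERE city = "Berlin"'))  # True
print(is_valid_sql("DROP TABLE users"))                                   # False
```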

This flowchart offers a simplified overview of the complex processes involved in enabling a chat system to communicate and respond to user input. Image by Orion Reed

Another method for shaping the output of language models is augmented generation – which involves building a context window around the input, often by querying external databases and incorporating the retrieved text into the input structure. This approach, known as Retrieval-Augmented Generation (RAG), allows for the generation of more complex, better-grounded outputs. By anchoring responses in retrieved source material, RAG helps reduce hallucinations and preserve data integrity when transferring information within an LLM system. These methods represent a significant advancement in the field of natural language processing, enabling more precise and reliable outputs from language models.
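To ground the idea, here is a minimal, self-contained RAG sketch. The embed_fn, retrieve, and call_llm helpers are toy stand-ins (word overlap instead of a real embedding model, a canned string instead of a real LLM), but the retrieve-then-prompt shape is the essential pattern:

```python
# Retrieval-Augmented Generation in miniature: retrieve relevant
# passages, prepend them to the prompt, then generate.
DOCS = [
    "BlockScience is a complex systems engineering and R&D firm.",
    "RAG grounds model output in retrieved source documents.",
]

def embed_fn(text: str) -> set[str]:
    return set(text.lower().split())   # toy "embedding": a bag of words

def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed_fn(question)
    # Rank documents by word overlap as a crude similarity proxy.
    return sorted(DOCS, key=lambda d: -len(q & embed_fn(d)))[:k]

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion call.
    return f"[model response to a {len(prompt)}-character prompt]"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    prompt = (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)

print(rag_answer("What does RAG do?"))
```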

Evolving LLM Interfaces

The interfaces through which we interact with LLMs shape our collective understanding and expectations of this technology.

The chat interface is the most widely recognized, offering a simple experience: You type a message into a box, hit send, and receive a coherent response. This interaction model has significantly influenced public perception and acceptance of LLMs. Beyond chat, however, there are diverse ways to engage with these models – some of which are being experimented with as internal research projects at BlockScience, such as project Interlay, seen below.

A demo of project Interlay, an internal sandbox project at BlockScience, experimenting at the cross section of AI, knowledge management systems, and open source collaborative interfaces.

Diagrammatic representations, for example, allow users to collaboratively manipulate blocks or nodes in a graph, creating multi-player workflows that resemble signal processing chains. In these instances, LLMs can process streams of natural language in various capacities, akin to signal processing for audio data. Another approach involves spatially encoding semantics, where proximity on a canvas can reflect changes along a defined dimension – such as lengthening a text, or shifting its style.

Interface augmentation may soon transform our software interactions. Instead of chat interfaces and limited menus, imagine right-clicking text and receiving smart, context-aware actions from an LLM. Done properly, these tools could offer users the most relevant choices out of countless options, streamlining our digital experience by working in the background to enhance usability, without overwhelming users with complexity. Using LLMs in this way could redefine human-computer interaction in digital environments, making AI technologies a seamless and integral part of our societal knowledge organization infrastructure (KOI).

Conclusion

The exploration of LLMs opens a window into a future where language and technology converge in unprecedented ways. For instance, LLMs can be leveraged to summarize complex ideas into more digestible formats, translating not only between languages, but also between technical and casual explanations. This translation can involve analogies to bridge the gap between expert and layman understandings.

The integration of Large Language Models into the systems we design can take many forms and serve a variety of functions, each adding layers of sophistication and utility. LLMs can enhance existing user interfaces, making them more interactive and adaptive. They can perform tasks such as classification, categorization, ranking, and evaluation, which bring us into a new paradigm of data transformation.

These functionalities allow for the creation of systems that not only possess higher levels of apparent intelligence, but can also apply these skills reflexively. An appropriately-designed LLM could even evaluate and rank its own outputs, facilitating a form of self-improvement and quality control within the AI system itself – a process necessitating appropriate risk analysis and rigorous engineering design! While this list is not exhaustive, it underscores the vast potential applications of LLMs.

It is important to note that the role of an LLM in a system is not confined to direct interaction via a chat interface, although this modality is still the most common way that users interact with LLMs: An LLM can also operate behind the scenes, executing tasks that are integral to a system's efficacy and intelligence, even if they are not immediately visible to its users. This versatility is what makes LLMs such a powerful tool in the development of advanced information-processing solutions.

The continued evolution of LLMs holds the opportunity to rethink how humans interact with our digital tools – and to reimagine the ways that we manage information processing and knowledge generation, as individuals and as collectives. As we stand on the cusp of this new era, it is essential that we continue to build and use LLM-based tools ethically – recognizing them as public infrastructure, and engineering them to meet the needs of the people they are meant to serve.

Acknowledgements:

Article written by Hashir Nabi, inspired by a research presentation from Orion Reed, with feedback, edits, and publication by Jeff Emmett, David Sisson, Jessica Zartler, and Ilan Ben-Meir.


Citations

  1. The ‘Design Space’ of LLMs (Demystifying LLMs & LLM-based systems)
  2. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  3. Language model tokenizer
  4. Model Card and Evaluations for Claude Models
  5. Memory Augmented Neural Networks (MANNs): Enhancing the Power of Artificial Intelligence through Memory
  6. Generating Long Sequences with Sparse Transformers
  7. Lost in the Middle: How Language Models Use Long Contexts
  8. LooGLE: Can Long-Context Language Models Understand Long Contexts?
  9. Tree of Thoughts Prompting
  10. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  11. Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA
  12. SocraSynth: Multi-LLM Reasoning with Conditional Statistics
  13. Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  14. Language Models Do Not Possess the Uniquely Human Cognitive Ability of Relational Abstraction
  15. Efficient Guided Generation for Large Language Models
  16. Enabling Generative AI to Produce SQL Statements: A Framework for the Auto-Generation of Knowledge Based on EBNF Context-Free Grammars
  17. Knowledge Organization Infrastructure – Orion Reed & Luke Miller
  18. TreeGAN: Syntax-Aware Sequence Generation with Generative Adversarial Networks
  19. One Embedder, Any Task: Instruction-Finetuned Text Embeddings
  20. Instructor (open-source prompt-based embedding)
  21. t-SNE plot using supervised dimensions
  22. Generative Interfaces Beyond Chat // Linus Lee // LLMs in Production Conference
  23. Dynamic documents as personal software – Geoffrey Litt

About BlockScience

BlockScience® is a complex systems engineering, R&D, and analytics firm. By integrating cutting-edge research, applied mathematics, and computational engineering, we analyze and design safe and resilient socio-technical systems. We provide engineering, design, and analytics services to a wide range of clients, including for-profit, non-profit, academic, and government organizations, and contribute to open-source research and software development.
