Imagine a magical knowledge base whose entries are not merely static collections of words, but living entities that can understand and generate human text. Welcome to the world of large language models (LLMs), which expand the boundaries of natural language processing in remarkable ways. As software developers, we are about to embark on a fascinating journey through the inner structures of these language machines. After all, large language models have revolutionized the AI landscape in recent years.
Professor Dr. Michael Stal has worked at Siemens Technology since 1991. His research focuses on software architectures for large, complex systems (distributed systems, cloud computing, IIoT), embedded systems, and artificial intelligence. He advises business units on software architecture questions and is responsible for the architecture training of senior software architects at Siemens.
Fasten your seat belts!
- Introduction to LLMs
- Hardware requirements and pretrained models
- LLM architecture types
- Reasoning models

Basic principles of LLMs
A large language model is a type of artificial intelligence (AI) designed to process and generate human language. It is a machine learning model that uses deep neural networks to learn patterns and relationships within language data. The "large" in LLM refers to the huge amounts of training data and the large number of parameters, which can be thought of as adjustable knobs that the model uses to make predictions.
LLMs are based on the Transformer architecture, which was introduced in 2017 in the paper "Attention Is All You Need". Unlike RNNs or CNNs, the Transformer uses a self-attention mechanism to capture contextual relationships between all words of a text in parallel.
The major components of the Transformer include:
- Encoder-decoder structure (LLMs often use only the decoder)
- Multi-head attention
- Feed-forward networks
- Residual connections and layer normalization
Does all of this sound like gibberish? Let me explain the concepts in more detail.
The Transformer architecture consists of encoders and decoders, which in turn are built from neural networks.
(Image: Wikipedia)
An LLM is made up of the following components:
- Tokenizer: The tokenizer is responsible for splitting input text into small units called tokens. Tokens can be words, subwords (smaller units within words), or even individual characters.
- Embedding: The embedding layer is where the magic begins. It converts input tokens into numerical representations, called embeddings, that the model can work with. These embeddings capture the semantic meaning of the tokens, so the model can distinguish between words with similar meanings.
- Encoder: The encoder is the brain of the model, where the magic happens. It takes the embeddings and produces a contextual representation of the input text. This representation captures the nuances of the language, such as syntax, semantics, and context.
- Decoder: The decoder is the creative side of the model, responsible for generating text based on the input and the contextual representation. It behaves like a sophisticated language generator that can produce coherent and contextually relevant text.
- Training objective: The training objective is the model's guiding principle; it defines what the model should learn during training. Common objectives are masked language modeling (predicting missing tokens) and next-sentence prediction (predicting whether two sentences follow each other).
Each LLM contains a stack of neural layers, in fact dozens of similar Transformer layers, which process input hierarchically:
- Early layers capture local patterns (such as word groups).
- Deeper layers model global context and abstractions.
Example: GPT-3 has 96 layers, LLaMA 2 up to 70 layers.
The purpose of a tokenizer is to convert raw text into discrete units (tokens). There are several approaches:
- Byte Pair Encoding (BPE): merges frequent character pairs (e.g. "unhappy" from "un" and "happy").
- WordPiece: similar to BPE, optimized for subword units.
- SentencePiece: processes raw text without pre-tokenization.
Example: the sentence "This is fascinating!" might be split into the tokens ("This", "is", "fascin", "ating", "!").
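To make the idea concrete, here is a minimal sketch of a greedy subword tokenizer in the spirit of WordPiece. The tiny vocabulary is purely hypothetical and only serves to show how a word falls apart into known pieces; real tokenizers use learned vocabularies with tens of thousands of entries.

```python
# Minimal greedy subword tokenizer sketch (WordPiece-style longest match).
# The vocabulary is hypothetical and exists only to illustrate the idea.
VOCAB = {"This", "is", "fascin", "ating", "un", "happy", "!"}

def tokenize_word(word, vocab):
    """Split one word into the longest matching vocabulary pieces, left to right."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1                      # shrink the window until a piece matches
        if end == start:
            return ["[UNK]"]              # no known piece: emit an unknown marker
        tokens.append(word[start:end])
        start = end
    return tokens

def tokenize(text, vocab=VOCAB):
    """Whitespace pre-split, then subword-split each word."""
    tokens = []
    for word in text.replace("!", " !").split():
        tokens.extend(tokenize_word(word, vocab))
    return tokens

print(tokenize("This is fascinating!"))
# ['This', 'is', 'fascin', 'ating', '!']
```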
Embeddings: the building blocks of LLMs
Embeddings are the foundation of LLMs; they enable the model to represent tokens (words, subwords, or characters) as numeric vectors. Each token is converted into a high-dimensional vector (e.g. 768 or 4096 dimensions). The model learns these vectors during training, and they capture the semantic meaning of the input tokens. There are different types of embeddings, including:
- Word embeddings represent words as vectors in a high-dimensional space. Each vector consists of real numbers and often contains several hundred elements. Each dimension corresponds to a feature chosen by the AI, such as color or animal species.
- Subword embeddings represent subwords (smaller units within words) as embedding vectors.
- Character embeddings represent individual characters as vectors.
It is important to note that we developers do not define these dimensions ourselves. The model does so during training without our intervention. One dimension might roughly correspond to color, another to length. In any case, words like "tomcat" and "cat" end up very close together in the multi-dimensional vector space, while words like "ice cream" and "outer space" are far apart.
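The following minimal sketch illustrates this notion of closeness with cosine similarity on toy vectors. The four dimensions and all numbers are invented for illustration; real embeddings have hundreds or thousands of learned dimensions.

```python
import numpy as np

# Toy 4-dimensional embeddings; the values are invented for illustration only.
embeddings = {
    "cat":       np.array([0.90, 0.80, 0.10, 0.05]),
    "tomcat":    np.array([0.85, 0.75, 0.20, 0.05]),
    "ice cream": np.array([0.05, 0.10, 0.90, 0.70]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1 mean 'very similar'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["tomcat"]))     # close to 1
print(cosine_similarity(embeddings["cat"], embeddings["ice cream"]))  # much smaller
```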
Positional encoding and embeddings
Positional encoding plays an important role in LLMs by preserving the order of the input tokens. LLMs break input text into a sequence of tokens and convert each token into a numerical representation. In this conversion, however, the order of the tokens is lost, even though it is needed to understand the relationships between tokens and their context. To solve this problem and preserve the token order, a positional encoding is added to the token embeddings. It encodes the position of each token in the sequence and thereby enables the model to understand the relationship between tokens and their positions. Depending on the variant, the LLM learns the positional encodings during training.
There are different types of positional encoding:
- Absolute positional encoding uses a fixed encoding for each position in the sequence.
- Relative positional encoding uses an encoding that depends on the distance between tokens.
- Learned positional encoding is learned by the model during training and can be adapted to specific tasks and data sets.
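As a concrete example, here is a minimal sketch of the fixed sinusoidal variant of absolute positional encoding used in the original Transformer paper; learned variants would train these vectors instead of computing them.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed absolute positional encoding: even dimensions use sine, odd
    dimensions use cosine, with wavelengths that grow across dimensions."""
    positions = np.arange(seq_len)[:, np.newaxis]         # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]              # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                      # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])
    encoding[:, 1::2] = np.cos(angles[:, 1::2])
    return encoding

# The encoding is simply added to the token embeddings,
# so every token carries information about its position.
token_embeddings = np.random.randn(10, 16)                # 10 tokens, 16 dimensions
model_input = token_embeddings + sinusoidal_positional_encoding(10, 16)
print(model_input.shape)                                  # (10, 16)
```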
Feed-forward layers: the workhorses of LLMs
Feed-forward layers (fully connected layers) are the workhorses of LLMs. They take embeddings as input and produce a contextual representation of the input text. A feed-forward layer consists of:
- Linear layers, which apply a linear transformation to the input embeddings to generate a new set of vectors.
- Activation functions such as ReLU (rectified linear unit) or GELU (Gaussian error linear unit), which introduce non-linearity into the model so that it can capture complex patterns and relationships.
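Below is a minimal NumPy sketch of such a block, assuming the common two-layer design (expand, apply a non-linearity, project back). The toy sizes and random weights are placeholders; real models train these weights and use much larger dimensions.

```python
import numpy as np

def gelu(x):
    """GELU activation (tanh approximation), a common non-linearity in LLMs."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward block: linear -> non-linearity -> linear."""
    hidden = gelu(x @ w1 + b1)   # expand to a larger hidden dimension
    return hidden @ w2 + b2      # project back to the model dimension

d_model, d_hidden = 16, 64       # toy sizes; real models are far larger
rng = np.random.default_rng(0)
x = rng.standard_normal((10, d_model))                    # 10 token embeddings
w1, b1 = rng.standard_normal((d_model, d_hidden)), np.zeros(d_hidden)
w2, b2 = rng.standard_normal((d_hidden, d_model)), np.zeros(d_model)
print(feed_forward(x, w1, b1, w2, b2).shape)              # (10, 16)
```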
Multi-head attention, self-attention, and cross-attention: keys to contextual understanding
Consider the sentences "The dog chased the cat through the whole house. But she was able to hide in time." The individual words do not simply stand next to each other in isolation. In the first sentence, "dog" relates to the activity "chased" and to the "cat" being chased, so the word "dog" has a strong relationship to two other words in the same sentence. Every pair of words in a sentence has such a relationship, and these relationships can be strong or weak. An LLM computes, for each word in a sentence, its relationship to every other word in that sentence. This is called self-attention: it describes how the words of a text "pay attention" to each other.
Because an LLM builds such contextual relationships at many places in a text in parallel, we speak of multi-head attention. Without a sufficiently broad view, however, an LLM loses important relationships. If the LLM processed each sentence in isolation, it would miss the fact that "she" in the second sentence refers to the "cat" in the first. Cross-attention serves to determine attention relationships across a larger context. Importantly, LLMs are limited in the size of the context they can consider: the larger the context, the more RAM is required. Context sizes range from a few kilobytes (a few typewritten pages) to several megabytes (the content of an entire book). If an LLM has taken in too much context information and its memory "overflows", it begins to "forget" the earlier context.
Self-attention is thus a mechanism that lets the model focus on different parts of the input text and generate a contextual output; each word attends more or less strongly to every other word.
Multi-head attention is a mechanism that lets the model attend to different parts of the input text from several perspectives at the same time. Instead of a single attention operation, the Transformer uses several "heads": each head captures different kinds of relationships (e.g. syntax versus semantics). The outputs of the heads are concatenated and linearly projected.
Benefit: the model learns diverse contextual dependencies.
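Here is a minimal NumPy sketch of this split, attend, concatenate, and project pattern, with random placeholder weights and without the masking, batching, and dropout a real implementation would add:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, num_heads, w_q, w_k, w_v, w_o):
    """Split the model dimension into heads, run scaled dot-product attention
    in each head independently, then concatenate and linearly project."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def project_and_split(w):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = project_and_split(w_q), project_and_split(w_k), project_and_split(w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # per-head attention scores
    heads = softmax(scores) @ v                            # (num_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o                                    # final linear projection

d_model, num_heads, seq_len = 16, 4, 10
rng = np.random.default_rng(1)
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
print(multi_head_attention(x, num_heads, w_q, w_k, w_v, w_o).shape)   # (10, 16)
```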
Cross-attention is a mechanism that lets the model attend to external information when generating tokens, for example the input text (prompt) or the output of other models. It is particularly important for passing information from the encoder to the decoder layers.
Attention is computed in the following stages:
- Query, key, and value vectors: The model produces query, key, and value vectors from the input embeddings. The query vectors represent the context, the key vectors represent the input tokens, and the value vectors represent the meaning of each token.
- Attention weights: The model computes the weights as the scalar products of the query and key vectors. These weights represent the importance of each token in the given context.
- Weighted sum: The model computes the weighted sum of the value vectors, using the attention weights as coefficients. This produces a contextual representation of the input text.
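These three stages can be written down for a single attention head in a few lines. The following sketch uses random placeholder weights and adds the usual scaling by the square root of the head dimension plus a softmax, as in standard scaled dot-product attention.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(x, w_q, w_k, w_v):
    """The three stages above, for a single attention head."""
    # 1. Query, key, and value vectors from the input embeddings.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # 2. Attention weights: scaled dot products of queries and keys,
    #    normalized to a probability distribution with softmax.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # 3. Weighted sum of the value vectors.
    return weights @ v, weights

d_model = 8
rng = np.random.default_rng(2)
x = rng.standard_normal((5, d_model))                     # 5 token embeddings
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
output, weights = scaled_dot_product_attention(x, w_q, w_k, w_v)
print(output.shape, weights.shape)                        # (5, 8) (5, 5)
```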
After this first look at the inner workings of LLMs, the next post will cover hardware requirements and various pretrained models.
(RME)
