GPT-4.1 builds upon its predecessor, GPT-4.5 (codenamed “Orion”), which was released on February 27, 2025. While specific architectural details about GPT-4.1 have not been publicly disclosed, it is known to support a context window of up to 1 million tokens and has a knowledge cutoff of June 2024. The model has been tested across various benchmarks, including academic knowledge, coding, instruction following, vision, and long-context tasks.
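For readers who want to try the long context window, the model is reachable through the standard OpenAI Python SDK. The sketch below assumes an OPENAI_API_KEY in the environment, and the prompt text is a placeholder:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Minimal chat request against gpt-4.1; the large context window means the
# messages payload can carry very long documents in a single request.
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Summarize this report: ..."}],
)
print(response.choices[0].message.content)
```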
Parameters: GPT-4 is reported to have approximately 1.8 trillion parameters distributed across 120 layers.
Architecture: GPT-4 reportedly employs a Mixture of Experts (MoE) design comprising 16 experts of roughly 111 billion parameters each. During inference, only two experts are routed to per forward pass, so only about 280 billion parameters are active at a time (see the MoE sketch after this list).
Training Data: The model was reportedly trained on approximately 13 trillion tokens, drawn from datasets such as CommonCrawl and RefinedWeb, and possibly from platforms like Twitter, Reddit, and YouTube, as well as textbooks.
Training Cost: The cost to train GPT-4 is estimated at around $63 million, based on the compute and time required (a back-of-envelope version of this estimate appears below).
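To make the top-2 routing concrete, here is a minimal sketch of such a layer in PyTorch. This is not GPT-4's actual implementation, which remains undisclosed; the class name, layer sizes, and routing details are illustrative assumptions. The one idea carried over from the reported design is that a router selects 2 of 16 experts per token, so most expert parameters sit idle on any given forward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Minimal top-2 Mixture-of-Experts layer (illustrative, not GPT-4's code):
    a router scores every expert, but only the two highest-scoring experts
    actually run for each token."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        gate_logits = self.router(x)                # (tokens, n_experts)
        weights, idx = gate_logits.topk(2, dim=-1)  # keep the top-2 experts per token
        weights = F.softmax(weights, dim=-1)        # normalize the two gate scores
        out = torch.zeros_like(x)
        for slot in range(2):                       # only 2 of the 16 experts execute
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Toy usage: 4 tokens of width 64 through a 16-expert layer.
layer = Top2MoE(d_model=64, d_ff=256)
y = layer(torch.randn(4, 64))
print(y.shape)  # torch.Size([4, 64])
```

On the arithmetic: two active experts account for roughly 2 × 111B ≈ 222B parameters, and the gap to the ~280B active figure would be shared components such as attention, which every token passes through regardless of routing.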
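The $63 million figure is consistent with a simple GPU-hours calculation. The GPU count, run length, and hourly rate below are illustrative assumptions in line with circulated estimates, not confirmed numbers:

```python
# Back-of-envelope GPT-4 training cost; every input here is an assumption.
gpus = 25_000            # assumed number of A100 GPUs
days = 105               # assumed training duration in days
usd_per_gpu_hour = 1.00  # assumed effective cost per A100-hour

gpu_hours = gpus * 24 * days             # 63,000,000 GPU-hours
cost_usd = gpu_hours * usd_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ${cost_usd:,.0f}")  # -> $63,000,000
```

Small changes to any of these inputs move the total by tens of millions of dollars, which is why published estimates of this cost vary so widely.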
While GPT-4.1 is the most recent model available, detailed information about its architecture and training parameters remains undisclosed. Previous models like GPT-4 and GPT-4.5 have been characterized by their large-scale parameter counts and extensive training data, leveraging advanced architectures such as the Mixture of Experts to optimize performance and efficiency.
#chatgpt #openAI #AI #MOE #GPT4 #orion #models #training #data #GPT4.5 #expert #varun #leaked