Architecture & Model Design

(a) Large Language Models (LLMs)

  • Core architecture: Transformer-based deep neural networks.
  • Training Process:
    • Uses self-supervised learning (e.g., masked language modeling or next-token prediction); a loss sketch follows this list.
    • Fine-tuned on specific datasets for tasks like summarization, translation, and reasoning.
  • Computational Complexity:
    • Requires massive amounts of data (tokens) and compute power (TPUs/GPUs).
    • Scaling laws suggest that performance improves predictably as models, data, and compute grow (e.g., GPT-4 over GPT-3.5).
  • Key Techniques:
    • Attention Mechanism (Self-Attention, Cross-Attention); a self-attention sketch follows this list.
    • Positional Encoding for sequence handling; a sinusoidal-encoding sketch follows this list.
    • Fine-tuning & Reinforcement Learning from Human Feedback (RLHF).
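
As a rough illustration of the next-token objective referenced above, here is a minimal sketch in NumPy; the logits and token ids are random placeholders, not outputs of a real model.

```python
# A minimal sketch of the next-token prediction objective (assumed setup):
# each position is trained to predict the following token via cross-entropy.
# In practice the logits come from a Transformer; here they are random.
import numpy as np

def next_token_loss(logits, token_ids):
    """logits: (seq_len, vocab) scores; token_ids: (seq_len,) integer ids."""
    # Shift by one: position t predicts token t+1, so drop the last logit row
    # and the first target token.
    pred, target = logits[:-1], token_ids[1:]
    pred = pred - pred.max(axis=-1, keepdims=True)                     # stability
    log_probs = pred - np.log(np.exp(pred).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(target)), target].mean()

rng = np.random.default_rng(0)
vocab, seq_len = 50, 10
logits = rng.normal(size=(seq_len, vocab))
tokens = rng.integers(0, vocab, size=seq_len)
print(round(float(next_token_loss(logits, tokens)), 3))  # roughly log(50) ~ 3.9
```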
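
The self-attention step can be sketched in a few lines; the embeddings and projection matrices below are random toy values rather than trained weights.

```python
# A minimal sketch of single-head scaled dot-product self-attention,
# using random toy embeddings and untrained projection matrices.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_*: (d_model, d_head) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # pairwise token similarities
    weights = softmax(scores, axis=-1)           # attention distribution per token
    return weights @ v                           # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # (4, 8)
```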
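
Positional encoding can likewise be illustrated with the sinusoidal scheme from the original Transformer paper; the sequence length and model width here are arbitrary.

```python
# A minimal sketch of sinusoidal positional encoding: each position is mapped
# to fixed sin/cos features so the model can distinguish token order.
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]                # (1, d_model / 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)   # one frequency per dim pair
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                        # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                        # odd dimensions: cosine
    return pe

print(positional_encoding(seq_len=6, d_model=8).shape)  # (6, 8)
```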

(b) Large Concept Models (LCMs)

  • Core architecture: Can integrate multiple AI paradigms:
    • Transformers for text understanding (like LLMs).
    • Neuro-symbolic AI (combining neural networks with logical rules).
    • Knowledge Graphs (explicitly modeling relationships between concepts).
    • Multimodal Learning (combining vision, audio, structured data, and text).
  • Training Process:
    • Uses concept embeddings and semantic representations rather than just next-token prediction.
    • Can be trained on structured knowledge, ontologies, and real-world reasoning datasets.
  • Computational Complexity:
    • Often more efficient than pure LLMs because they leverage structured knowledge rather than relying purely on statistical token prediction.
  • Key Techniques:
    • Graph Neural Networks (GNNs) to model relationships between concepts; a message-passing sketch follows this list.
    • Multimodal Fusion to integrate images, text, and symbols; a fusion sketch follows this list.
    • Logic-augmented learning (e.g., Prolog-style rule inference); a forward-chaining sketch follows this list.
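
As a rough illustration of the GNN idea, here is one message-passing step over a three-node toy concept graph; the adjacency matrix, features, and weights are made-up values.

```python
# A minimal sketch of one message-passing step over a tiny concept graph:
# each node averages its neighbours' features, then applies a projection.
# The graph, features, and weights are toy values.
import numpy as np

def gnn_layer(adj, features, weight):
    """adj: (n, n) adjacency with self-loops; features: (n, d); weight: (d, d_out)."""
    deg = adj.sum(axis=1, keepdims=True)          # node degrees for mean aggregation
    messages = (adj @ features) / deg             # average over neighbours (and self)
    return np.maximum(messages @ weight, 0.0)     # linear projection + ReLU

# Toy concept graph: dog -- mammal -- animal, with self-loops on the diagonal.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))                # one embedding per concept
weight = rng.normal(size=(4, 4))
print(gnn_layer(adj, features, weight).shape)     # (3, 4)
```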
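
Multimodal fusion can be sketched as a simple late-fusion step; the embedding sizes and projection matrices below are assumptions for illustration, not any specific model's interface.

```python
# A minimal sketch of late multimodal fusion: per-modality embeddings are
# projected into a shared space and averaged. The vectors below stand in for
# real vision- and text-encoder outputs.
import numpy as np

def fuse(image_emb, text_emb, w_img, w_txt):
    """Project each modality into a shared space and average the results."""
    shared_img = image_emb @ w_img        # (d_shared,)
    shared_txt = text_emb @ w_txt         # (d_shared,)
    return (shared_img + shared_txt) / 2  # simple averaging fusion

rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)          # stand-in for a vision-encoder output
text_emb = rng.normal(size=768)           # stand-in for a text-encoder output
w_img = rng.normal(size=(512, 256))       # toy (untrained) projection matrices
w_txt = rng.normal(size=(768, 256))
print(fuse(image_emb, text_emb, w_img, w_txt).shape)  # (256,)
```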
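
The symbolic side of logic-augmented learning can be illustrated with a tiny forward-chaining loop; the facts and the single transitivity rule are invented for the example.

```python
# A minimal sketch of Prolog-style forward chaining over symbolic facts:
# rules are applied repeatedly until no new facts can be derived.
# The facts and the transitivity rule are illustrative only.
facts = {("is_a", "dog", "mammal"), ("is_a", "mammal", "animal")}
rules = [
    # Transitivity: is_a(X, Y) and is_a(Y, Z) => is_a(X, Z).
    lambda fs: {("is_a", x, z)
                for (p1, x, y) in fs if p1 == "is_a"
                for (p2, y2, z) in fs if p2 == "is_a" and y2 == y},
]

def forward_chain(facts, rules):
    """Apply every rule until a fixed point is reached."""
    derived = set(facts)
    while True:
        new = set().union(*(rule(derived) for rule in rules)) - derived
        if not new:
            return derived
        derived |= new

print(("is_a", "dog", "animal") in forward_chain(facts, rules))  # True
```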