Architecture & Model Design
(a) Large Language Models (LLMs)
- Core architecture: Transformer-based deep neural networks.
- Training Process:
  - Uses self-supervised learning (e.g., masked language modeling or next-word prediction; a minimal objective sketch follows this list).
  - Fine-tuned on specific datasets for tasks like summarization, translation, and reasoning.
- Computational Complexity:
  - Requires massive amounts of data (tokens) and compute power (TPUs/GPUs).
  - Scaling laws suggest performance improves predictably as model size, data, and compute grow (e.g., GPT-4 vs. GPT-3.5).
- Key Techniques:
  - Attention Mechanism (Self-Attention, Cross-Attention; see the sketch after this list).
  - Positional Encoding to represent token order (attention itself is order-agnostic).
  - Fine-tuning & Reinforcement Learning from Human Feedback (RLHF).
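
To make the self-supervised training bullet concrete, here is a minimal sketch of the next-token objective. The key point is that the labels are just the input shifted by one position, so no manual annotation is needed. The tokens, logits, and dimensions below are random stand-ins invented for the example; this shows only the loss, not a full training loop, and masked language modeling would instead score masked-out positions.

```python
# Minimal sketch of the next-token (self-supervised) objective.
import numpy as np

def next_token_loss(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Mean cross-entropy of predicting token t+1 from position t.

    logits:    (seq_len, vocab_size) scores produced by the model
    token_ids: (seq_len,) the observed token sequence (also the labels)
    """
    pred = logits[:-1]                      # predictions for positions 0..n-2
    target = token_ids[1:]                  # labels are simply the next tokens
    # numerically stable log-softmax over the vocabulary
    shifted = pred - pred.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(target)), target].mean())

rng = np.random.default_rng(1)
vocab_size, seq_len = 50, 8
token_ids = rng.integers(0, vocab_size, size=seq_len)   # toy "document"
logits = rng.normal(size=(seq_len, vocab_size))         # stand-in for model output
print(round(next_token_loss(logits, token_ids), 3))
```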
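
The attention and positional-encoding bullets are likewise easiest to see in code. The sketch below is a single-head, NumPy-only illustration with made-up dimensions and random weights, not the implementation of any particular model; cross-attention differs only in that the keys and values come from a second sequence.

```python
# Illustrative sketch: scaled dot-product self-attention over a toy sequence,
# with sinusoidal positional encoding added to the token embeddings.
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding as in 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]             # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                  # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                    # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])               # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])               # odd dims: cosine
    return pe

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)             # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x: np.ndarray, wq, wk, wv) -> np.ndarray:
    """Single-head self-attention: every position attends to every position."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])             # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)                  # attention weights
    return weights @ v                                  # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model = 6, 16
token_embeddings = rng.normal(size=(seq_len, d_model))  # stand-in for learned embeddings
x = token_embeddings + positional_encoding(seq_len, d_model)
wq, wk, wv = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
print(self_attention(x, wq, wk, wv).shape)              # -> (6, 16)
```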
(b) Large Concept Models (LCMs)
- Core architecture: Can integrate multiple AI paradigms:
  - Transformers for text understanding (like LLMs).
  - Neuro-symbolic AI (combining neural networks with logical rules).
  - Knowledge Graphs (explicitly modeling relationships between concepts).
  - Multimodal Learning (combining vision, audio, structured data, and text).
- Training Process:
  - Uses semantic understanding and concept embeddings rather than just next-word prediction.
  - Can be trained on structured knowledge, ontologies, and real-world reasoning datasets.
- Computational Complexity:
  - Often more efficient than pure LLMs because they leverage structured knowledge rather than relying purely on statistical token prediction.
- Key Techniques:
  - Graph Neural Networks (GNNs) to model relationships between concepts (see the message-passing sketch after this list).
  - Multimodal Fusion to integrate images, text, and symbols.
  - Logic-augmented learning (e.g., Prolog-like inference; see the forward-chaining sketch after this list).
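
To make the GNN bullet concrete, the sketch below runs a single GCN-style message-passing step over a toy "is-a" concept graph. The graph, features, and weights are invented for illustration and are not drawn from any published LCM.

```python
# Illustrative sketch: one message-passing step over a tiny concept graph,
# the core idea behind using GNNs to model relationships between concepts.
import numpy as np

concepts = ["dog", "mammal", "animal", "cat"]
edges = [("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")]  # "is-a" links

n, d = len(concepts), 8
index = {c: i for i, c in enumerate(concepts)}
adj = np.zeros((n, n))
for src, dst in edges:                      # treat relations as undirected here
    adj[index[src], index[dst]] = adj[index[dst], index[src]] = 1.0
adj += np.eye(n)                            # self-loops so a node keeps its own features

rng = np.random.default_rng(2)
h = rng.normal(size=(n, d))                 # initial concept embeddings (random here)
w = rng.normal(size=(d, d)) * 0.1           # weight matrix (learned in practice)

# One GCN-style layer: average each concept's neighbourhood, transform, apply ReLU.
deg = adj.sum(axis=1, keepdims=True)
h_next = np.maximum(0.0, ((adj @ h) / deg) @ w)

print(h_next.shape)                         # -> (4, 8): updated concept embeddings
```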
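
Finally, for the logic-augmented learning bullet, here is a minimal sketch of Prolog-style forward chaining, with one hard-coded rule (transitivity of "is_a") standing in for a general rule engine. The facts and the rule are invented for the example; a real neuro-symbolic system would combine such symbolic inference with learned components.

```python
# Illustrative sketch: forward chaining over an explicit fact base.

facts = {("is_a", "dog", "mammal"), ("is_a", "mammal", "animal")}

def apply_transitivity(facts: set) -> set:
    """If (a is_a b) and (b is_a c) are known, derive (a is_a c)."""
    derived = set()
    for rel1, a, b in facts:
        for rel2, c, d in facts:
            if rel1 == rel2 == "is_a" and b == c:
                derived.add(("is_a", a, d))
    return derived

# Forward chaining: apply the rule until no new facts appear (a fixpoint).
while True:
    new_facts = apply_transitivity(facts) - facts
    if not new_facts:
        break
    facts |= new_facts

print(("is_a", "dog", "animal") in facts)   # -> True: derived, never stated directly
```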