
Deepseek R1 671b Running LOCAL AI LLM is a ChatGPT Killer!


This YouTube video documents the YouTuber’s attempt to run the DeepSeek R1 671B large language model (LLM) locally on a CPU-only system. Key points include:

Technical Challenges & Setup:

  • Resource Intensive: Running the 671B model locally requires a very large amount of RAM (the YouTuber uses an R930 server with multiple CPUs and ~1.5TB of RAM). This is not a recommended approach for most users.
  • Parallel Processing Issue: The YouTuber encounters significant problems with parallel processing, resulting in much lower throughput than expected (a reported 1-35 tokens per second). The context window size is unexpectedly multiplied by four, which increases memory use and hinders performance. They are asking the community for help resolving this.
  • Hardware: The setup uses an older high-end server platform (R930) and massive amounts of RAM. The YouTuber details cost-effective ways to build such a system, but cautions against this approach for most individuals. Alternative hardware, such as the MZ32-AR0 motherboard, is discussed as better suited for scaling RAM.
  • VM as a Potential Solution: A virtual machine (VM) is considered as a possible workaround for the parallel processing issue.
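
To give a rough sense of the RAM figures and the context multiplication described above, here is a minimal back-of-the-envelope sketch. It rests on assumptions that are not from the video: the weight footprint is taken as parameter count times bits per weight (ignoring the KV cache and runtime overhead), and the "multiplied by four" behavior is modeled as one full context per parallel slot, which is how some inference servers allocate memory but is not confirmed as the cause here. The function names and the 4096-token context are hypothetical.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.

    Ignores KV cache, activations, and runtime overhead (assumption).
    """
    return n_params * bits_per_weight / 8 / 1e9


def total_context_tokens(per_request_context: int, parallel_slots: int) -> int:
    """Total context allocated if each parallel slot gets a full window.

    This allocation model is an assumption, not confirmed by the video.
    """
    return per_request_context * parallel_slots


if __name__ == "__main__":
    n_params = 671e9  # parameter count named in the video title

    # Weight footprint at a few common precisions.
    for bits in (4, 8, 16):
        gb = weight_footprint_gb(n_params, bits)
        print(f"{bits}-bit weights: ~{gb:,.1f} GB")

    # Hypothetical 4096-token window with four parallel slots.
    print("allocated context:", total_context_tokens(4096, 4), "tokens")
```

Under these assumptions, even the weights alone run to hundreds of gigabytes at low precision, which is consistent with the summary's point that this setup is out of reach for most users.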

DeepSeek 671B Performance & Observations:

  • Slow Speed: The model’s speed is extremely slow when run locally on CPUs, taking hours to answer simple questions. Performance degrades significantly over time, as shown by the varying tokens-per-second rates across multiple questions.
  • Accuracy: While the model correctly answers some questions, it fails on others, including a “one-shot” prompt. The performance is erratic.
  • Reasoning Capabilities: Despite the slow speed, the model demonstrates impressive reasoning abilities in certain scenarios (e.g., the “Armageddon with a Twist” prompt), considering multiple perspectives.
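
The relationship between generation rate and waiting time can be sketched with simple arithmetic. This is only an illustration: the answer length and the rates below are hypothetical values, not measurements from the video, and the estimate ignores prompt processing time. The function name is also hypothetical.

```python
def generation_time_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a constant rate (prompt processing ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    answer_tokens = 4000  # hypothetical length, including reasoning tokens

    # Hypothetical rates, chosen only to show how quickly the wait grows.
    for rate in (0.5, 1.0, 2.0, 4.0):
        minutes = generation_time_seconds(answer_tokens, rate) / 60
        print(f"{rate:>4} tokens/s -> about {minutes:.0f} minutes")
```

Because reasoning models emit many intermediate tokens before the final answer, a drop in tokens per second over a session translates directly into much longer waits per question.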

Broader Implications & Discussion:

  • DeepSeek’s Impact: The YouTuber comments on the market reaction to DeepSeek, suggesting it’s overblown. They highlight DeepSeek’s open-source nature and the acceleration of inference speeds achieved through optimized code as key factors.
  • AGI Potential: The video explores the potential implications of DeepSeek and open-source AI models for the future of Artificial General Intelligence (AGI).
  • Future Work: The YouTuber plans to post detailed results and the full text of the “Armageddon with a Twist” interaction on their blog. They also mention upcoming reviews of other models, such as Janus and the new Qwen vision model.

In short, the video is a technical deep-dive into the challenges and rewards of running a large LLM locally, alongside commentary on the broader implications of DeepSeek’s advancements and the ongoing race towards AGI.
