The distilled versions of DeepSeek are not as good as the full model. They are vastly inferior, and other models outperform them handily. Running the full model with a 16K-or-greater context window is possible for about $2,000, at about 4 tokens per second, on the machine below.
Machine Specs
AMD EPYC 7702
512GB DDR4-2400 ECC in 32GB 2×4 DIMMs
Gigabyte MZ32-AR0 Single Socket Mobo
Typical storage is a 4TB mirror of U.2 NVMe drives, but that is pulled right now for the storage redo, leaving the 512GB NVMe boot mirror pair.
100GbE Mellanox ConnectX-4
Proxmox 8.3.3
4× MSI Ventus RTX 3090 GPUs (no SLI)
Corsair 1500W PSU
Rig Frame
Corsair H170i Elite XT 420mm water cooler; works for SP3 with a bracket.
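A quick sanity check on the ~4 tokens per second figure: CPU-side token generation is memory-bandwidth bound, so the DDR4-2400 spec above sets a hard ceiling. The active-parameter count (~37B for the MoE model) and the ~4.5-bit quantization level are assumptions on my part, not figures from this build:

```python
# Back-of-envelope ceiling for CPU token generation (memory-bandwidth bound).
channels = 8                 # EPYC 7702 has 8 memory channels
transfers_per_s = 2400e6     # DDR4-2400
bytes_per_transfer = 8       # 64-bit channel width
bandwidth = channels * transfers_per_s * bytes_per_transfer   # bytes/s

# Assumptions: MoE model with ~37B active params/token at ~4.5 bits/weight.
active_params = 37e9
bytes_per_token = active_params * 4.5 / 8

ceiling = bandwidth / bytes_per_token     # theoretical tokens/s upper bound
print(f"{bandwidth/1e9:.1f} GB/s -> ceiling ~{ceiling:.1f} tok/s")
```

That gives roughly 153.6 GB/s and a ceiling around 7 tokens per second, so an observed ~4 tok/s (real-world efficiency, NUMA effects, KV-cache traffic) is plausible.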
LXC container settings. The container runs Docker, and Docker runs the Ollama/Open WebUI stack.
120 CPUs (SMT threads; recommend backing off by 8 to keep peak temps down about 4°C)
496GB RAM
unprivileged container
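The Ollama/Open WebUI stack inside the container might look like the following two `docker run` commands. This is a minimal sketch: the image names are the published Ollama and Open WebUI images, but the ports, volume names, and GPU flags are illustrative and assume the NVIDIA container toolkit is already working inside the LXC:

```shell
# Ollama server with GPU access (assumes NVIDIA container toolkit configured)
docker run -d --name ollama \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Open WebUI, pointed at the Ollama API on the host
docker run -d --name open-webui \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

With that up, the web interface is reachable on port 3000 and models are pulled through Ollama as usual.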
Upgrading the GPUs from RTX 3090s to RTX 5080s can improve the speed.
RTX 5090: 21,760 CUDA cores, 32GB GDDR7 memory, 575W TGP ($1999)
RTX 5080: 10,752 CUDA cores, 16GB GDDR7 memory, 360W TGP ($999)
RTX 5070 Ti: 8,960 CUDA cores, 16GB GDDR7 memory, 300W TGP ($799)
RTX 5070: 6,144 CUDA cores, 12GB GDDR7 memory, 250W TGP ($549)