
  • Currently I’m running a Q6_K quant of Hermes 4 14B with a 32K context window via llama.cpp, and it works pretty well. Generation is a comfy ~50 tok/sec. These V100s are 16GB each, but there are 32GB versions available too.

    I’m running everything via NixOS and have to do package overrides to get inference engines to build with the right CUDA versions.

    My goal is to get a cohesive environment set up for Hermes Agent to learn my system/lab/network and help me grow it over time.

    Overall, I’m happy with them. The mezzanine board is good quality, and I’m using PTM sheets under those massive heatsinks plus some Arctic P9 fans to keep them at around 60°C under load.
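
For a rough sense of how a Q6_K 14B model with a 32K context sits against 16GB cards, here is a back-of-the-envelope sketch. Only the 6.5625 bits per weight for Q6_K comes from the llama.cpp format itself; the parameter count is taken at face value from "14B", and the layer count, KV head count, and head dimension are assumed placeholder values, not figures from the comment. It also assumes an unquantized f16 KV cache and ignores compute buffers.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB.

    K and V each hold n_kv_heads * head_dim values per layer per token,
    hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem / 2**30


# Q6_K packs 256 weights into 210 bytes, i.e. 6.5625 bits per weight.
Q6_K_BITS = 6.5625

# ASSUMPTIONS (hypothetical values for illustration only):
N_PARAMS = 14e9      # "14B" taken literally
N_LAYERS = 40        # assumed
N_KV_HEADS = 8       # assumed (grouped-query attention)
HEAD_DIM = 128       # assumed
N_CTX = 32 * 1024    # the 32K context window from the comment

weights = weights_gib(N_PARAMS, Q6_K_BITS)
kv = kv_cache_gib(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX)

print(f"weights : {weights:.1f} GiB")
print(f"kv cache: {kv:.1f} GiB")
print(f"total   : {weights + kv:.1f} GiB (before compute buffers)")
```

Under these assumptions the total lands close to the capacity of a single 16GB card before any compute buffers, so spreading the model across more than one card (or quantizing the KV cache) would be the natural way to get headroom. Swap in the real values from the model's GGUF metadata for a better estimate.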