Running LLM models at the crib

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,785
Reputation
1,587
Daps
22,362
I've had a passing interest in AI for the last few years, but only realized in the last month or so that I don't need an expensive laptop to run models locally.

Share what you are running, what front-end you are using, any tips/tricks you know of, etc.
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
63,087
Reputation
9,641
Daps
172,668
i've used LM Studio when i tested some quantized 6B & 7B models.
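if you want to script against it instead of using the chat window, LM Studio can also run a local server that speaks the OpenAI API (default port 1234, last i checked). a minimal sketch, assuming the server is running and a model is already loaded:

```python
# Minimal sketch: talking to LM Studio's local OpenAI-compatible server.
# Assumes the server is running on the default port (1234) and a model is
# already loaded in the app; the api_key can be any placeholder string.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # LM Studio routes this to whatever model is loaded
    messages=[{"role": "user", "content": "Explain GGUF quantization in one sentence."}],
)
print(response.choices[0].message.content)
```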



 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,785
Reputation
1,587
Daps
22,362
Just use Jupyter and Hugging Face
I already have JupyterLab in Docker for data analysis, but that UX is meh. I might as well just use the terminal instead lol.

I have been using Jan.ai and it does allow you to grab new models directly from Hugging Face.
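If you'd rather grab the files yourself, the huggingface_hub package can pull a GGUF straight down. A quick sketch; the repo and filename here are just examples, swap in whatever you actually want:

```python
# Minimal sketch: downloading a GGUF file from Hugging Face with
# huggingface_hub (pip install huggingface_hub). Repo and filename are
# example values -- substitute the model you actually want.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # example repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",   # example quant
)
print(f"Downloaded to: {path}")
```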


What models are yall using and what are your use cases? @bnew @greenvale @Ty Daniels
 

Ty Daniels

Superstar
Joined
Dec 13, 2019
Messages
2,022
Reputation
3,409
Daps
14,320
Macallik86 said:
I already have JupyterLab in Docker for data analysis, but that UX is meh. I might as well just use the terminal instead lol.

I have been using Jan.ai and it does allow you to grab new models directly from Hugging Face.

What models are yall using and what are your use cases? @bnew @greenvale @Ty Daniels


I'm mainly using it for AI art/editing, mostly Stable Diffusion 1.5 and XL, along with Flux

Using Forge UI, Krita AI Diffusion, and sometimes Fooocus

Tools I use
- Krita (with Krita AI Diffusion) (like Adobe's Generative Fill, but free)
- Pinokio
- Stability Matrix
- Google Colab
- ControlNet (SD 1.5 and XL)

I've also played with Llama installed locally, but mainly use ChatGPT (Claude, etc.) for any non-art-related tasks
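If anyone wants to skip the UIs and script it, here's a bare-bones sketch of running SD 1.5 from Python with the diffusers library. Assumes a CUDA GPU; the repo id below is the classic one and may have moved mirrors, so check Hugging Face if it 404s:

```python
# Bare-bones sketch: Stable Diffusion 1.5 text-to-image with diffusers
# (pip install diffusers transformers accelerate). Assumes a CUDA GPU;
# for CPU, drop .to("cuda") and use torch.float32, but expect it to crawl.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # classic repo id; may have moved mirrors
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a cozy reading nook, watercolor style").images[0]
image.save("output.png")
```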

 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,785
Reputation
1,587
Daps
22,362
Damn I'm actually running KDE but never tried Krita. I'll have to spin the block for that one.
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,785
Reputation
1,587
Daps
22,362
Wake up brehettes, the new GGUF QAT model just dropped:


What They're Doing

The post describes Unsloth's "Dynamic 2.0" quantization method for large language models, which they claim outperforms other quantization approaches including QAT (Quantization-Aware Training). They're primarily focusing on:


  1. Improved quantization techniques that reduce KL Divergence (a measure of how much the quantized model differs from full precision; see the sketch after this list)
  2. Better calibration datasets (using conversational style data instead of WikiText)
  3. More accurate benchmarking methodology
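
For anyone unfamiliar, here's a toy sketch of what that KL Divergence comparison means in code. The numbers are made up; the point is just that a lower KL means the quantized model's output distribution tracks the full-precision one more closely:

```python
# Toy sketch of the KL-divergence idea: compare the next-token distribution
# of a full-precision model against a quantized one. Lower KL = the quant
# tracks the original more closely. Purely illustrative numbers.
import torch
import torch.nn.functional as F

full_logits = torch.tensor([2.0, 1.0, 0.5, -1.0])   # pretend fp16 model output
quant_logits = torch.tensor([1.9, 1.1, 0.4, -0.9])  # pretend 4-bit model output

p = F.log_softmax(full_logits, dim=-1)   # reference distribution (log-probs)
q = F.log_softmax(quant_logits, dim=-1)  # quantized distribution (log-probs)

# KL(P || Q): information lost by using the quantized model in place of P
kl = F.kl_div(q, p, log_target=True, reduction="sum")
print(f"KL divergence: {kl.item():.4f}")
```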

Key Comparisons


For Gemma 3 27B, they show various quantization levels comparing old vs new methods and QAT vs non-QAT approaches. The notable claim is that their Dynamic 4-bit quantization achieves +1% better performance on MMLU than Google's QAT while using 2GB less disk space.

So it sounds like you can do more w/ less. I am updating my models on my devices as we speak.
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,785
Reputation
1,587
Daps
22,362
Forgot the links:

Long story short, look for models with 'UD' in the name, which stands for 'Unsloth Dynamic'.

I'm now running the unsloth:gemma-3-4b-it-GGUF:gemma-3-4b-it-UD-Q4_K_XL.gguf on my chrultrabook (i3-1125g4) and the unsloth:gemma-3-12b-it-GGUF:gemma-3-12b-it-UD-IQ3_XXS.gguf on my desktop (i5-8500), in spite of both devices being CPU-only :banderas:
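If anyone wants to try these without a GUI, llama-cpp-python can pull the quant straight from the repo. A minimal sketch; n_ctx and n_threads are guesses, tune them for your own box:

```python
# Minimal sketch: loading one of the UD quants with llama-cpp-python
# (pip install llama-cpp-python huggingface-hub). CPU-only works, just slow.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/gemma-3-4b-it-GGUF",
    filename="*UD-Q4_K_XL*",  # glob matching the Unsloth Dynamic 4-bit file
    n_ctx=4096,               # context window; shrink if RAM is tight
    n_threads=4,              # roughly match your physical core count
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hi in five words."}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```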
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
63,087
Reputation
9,641
Daps
172,668
Macallik86 said:
Forgot the links:

Long story short, look for models with 'UD' in the name, which stands for 'Unsloth Dynamic'.

I'm now running the unsloth:gemma-3-4b-it-GGUF:gemma-3-4b-it-UD-Q4_K_XL.gguf on my chrultrabook (i3-1125g4) and the unsloth:gemma-3-12b-it-GGUF:gemma-3-12b-it-UD-IQ3_XXS.gguf on my desktop (i5-8500), in spite of both devices being CPU-only :banderas:

how many tokens per second?
 

Macallik86

Superstar
Supporter
Joined
Dec 4, 2016
Messages
6,785
Reputation
1,587
Daps
22,362
bnew said:
how many tokens per second?
10 t/s for the 4B on my Chrultrabook which is manageable
1.5 t/s for the 12B on my Desktop which is not 😭
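
In case anyone wants to sanity-check their own numbers, here's roughly how I'd measure it with llama-cpp-python. Hand-rolled, not a proper benchmark (prompt processing and caching skew it), and the model_path is a placeholder for wherever your GGUF actually lives:

```python
# Rough sketch: measuring tokens/sec by timing a generation and dividing
# completion tokens by wall-clock seconds. model_path is a placeholder
# for wherever your local GGUF file actually lives.
import time
from llama_cpp import Llama

llm = Llama(model_path="gemma-3-4b-it-UD-Q4_K_XL.gguf", n_ctx=2048)

start = time.perf_counter()
out = llm.create_completion(
    "Write a short paragraph about quantization.",
    max_tokens=128,
)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.1f}s = {n_tokens / elapsed:.1f} t/s")
```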


The 4B seems too small for my desktop tbh... Qwen3 was launched today (and then walked back), and I believe it has an 8B model that should be a better fit for my desktop, so I'm waiting on Unsloth's optimized version.
 

JayJedi

Pro
Supporter
Joined
Nov 26, 2016
Messages
384
Reputation
85
Daps
887
Reppin
618
I'm interested in accessing LLMs as a hobbyist, but I'm evaluating the potential payoff in my own life.

Are there any obvious practical uses for LLMs at home, or is it primarily number crunching, etc.?

I've tried one local model that converted images into brief three-second videos.

Is there any such thing as a local virtual assistant or grammar/spell bot... or is it pretty much tied to whatever datasets or APIs you can access and analyze?
 

bnew

Veteran
Joined
Nov 1, 2015
Messages
63,087
Reputation
9,641
Daps
172,668
JayJedi said:
I'm interested in accessing LLMs as a hobbyist, but I'm evaluating the potential payoff in my own life.

Are there any obvious practical uses for LLMs at home, or is it primarily number crunching, etc.?

I've tried one local model that converted images into brief three-second videos.

Is there any such thing as a local virtual assistant or grammar/spell bot... or is it pretty much tied to whatever datasets or APIs you can access and analyze?

the immediate benefit of using image models locally is privacy: you avoid uploading your private photos, which could later be trained on.

local models can also help with things like your taxes or personal finances, when you don't want to divulge that info to an A.I. company, etc.
 