Which model fits my computer?
Most GPs and specialists run Miccy on a practice PC, a laptop in the consult room, or a computer at home for letters and admin. You download speech and optional note-structuring models inside the app; this page only suggests where to start. Exact names and versions are listed in Miccy's built-in picker.
Starting points for typical practice setups
Choose the card that best matches the machine you actually dictate on - whether that is a shared computer at the clinic, your own laptop, or a desktop at home. If you are unsure how much memory the PC has, check with your practice IT or in the system settings.
Practice desktop or older laptop (~8 GB RAM)
Common for a reception PC, a shared computer in a small clinic, or an older notebook: start with a smaller speech model so dictation feels responsive. Add optional note structuring later, or run that step on a faster machine if this one feels sluggish.
Current MacBook, iMac, or Windows laptop (~16 GB)
Typical for a solo GP, specialist consult room, or home office: a medium speech model is often a good fit. Once transcription feels reliable, try light note structuring - then adjust in the app if needed.
Higher-spec workstation or Apple with plenty of memory
For busy clinics, long dictation sessions, or when your EHR and browser already use a lot of RAM: you can usually aim for larger speech models and more capable optional structuring - then fine-tune in the picker.
Fine-tune in Miccy
Every computer is a bit different. Use the cards above as a starting point, then download from Miccy's model picker, run a short test dictation, and move up or down if speed or accuracy is not right.
Technical details for IT and advanced users
Know your machine (30 seconds)
Before picking a model, note:
- System RAM (e.g. 8 / 16 / 32 GB) - on Apple Silicon this is shared with GPU tasks.
- Discrete GPU VRAM (NVIDIA/AMD) if you have one - e.g. 4 / 8 / 12 GB. Integrated graphics usually shares system RAM.
- Apple Silicon (M1/M2/M3) - treat "unified memory" as both RAM and GPU pool; larger models need more total memory headroom.
- CPU-only is fine for smaller ASR models and small quantised LLMs - expect higher latency than with a GPU.
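If you prefer checking from a terminal, the first item in the list above can be read programmatically. This is a minimal sketch using only Python's standard library; it relies on POSIX `sysconf` values, so it works on Linux and most Unix-like systems (Windows users can read total RAM from Task Manager instead):

```python
import os

def total_ram_gb() -> float:
    """Report total system RAM in GiB via POSIX sysconf (Linux/Unix)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / (1024 ** 3)

if __name__ == "__main__":
    print(f"Total RAM: {total_ram_gb():.1f} GiB")
```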
Speech-to-text (ASR)
Whisper-class models are usually named by size. Larger = better accuracy for many accents and languages, but heavier on disk, RAM, and compute. Numbers below are rules of thumb, not guarantees.
| Whisper tier (illustrative) | Rough fit |
|---|---|
| Tiny / Base | Low-end laptops, CPU-only, or very tight VRAM. Fastest, lowest accuracy - ok for quick tests, not ideal for difficult clinical dictation. |
| Small | Balanced for many machines with limited GPU; good compromise on CPU if you can wait a bit. |
| Medium | Strong quality; prefers a modern CPU with many cores or a GPU with several GB VRAM / enough unified memory on Apple Silicon. |
| Large | Best quality for demanding audio; needs a capable GPU or Apple chip with plenty of unified RAM. Slow on weak CPUs. |
| Turbo (where offered) | Often a speed-focused variant - check release notes in the app; still match to your hardware tier above. |
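The table above can be condensed into a rough selection heuristic. The thresholds below are assumptions made for this sketch, not Miccy's actual logic; always confirm with a short test dictation in the app's picker:

```python
def suggest_whisper_tier(ram_gb: float, vram_gb: float = 0.0) -> str:
    """Illustrative mapping from hardware to a Whisper-class tier.

    Thresholds are assumptions for this sketch, not values from Miccy.
    """
    if vram_gb > 0:  # discrete GPU: VRAM is usually the binding constraint
        if vram_gb >= 10:
            return "large"
        if vram_gb >= 5:
            return "medium"
        if vram_gb >= 2:
            return "small"
        return "tiny/base"
    # CPU-only or unified memory: leave headroom for the OS, EHR and browser
    if ram_gb >= 32:
        return "large"
    if ram_gb >= 16:
        return "medium"
    if ram_gb >= 8:
        return "small"
    return "tiny/base"
```

For example, a 16 GB laptop with no discrete GPU lands on "medium", matching the starting-point card earlier on this page.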
Parakeet / Moonshine (where available): typically optimised for English and very fast on supported hardware. Prefer them when your workflow is English-only and latency matters more than maximum language coverage - still confirm fit in the picker.
Optional note structuring (LLM)
Structuring uses a text model (often GGUF weights via llama.cpp, or your own Ollama-compatible server). Model "size" is measured in billions of parameters (7B, 13B, 70B...) and quantisation (Q4, Q5, Q8...) - lower-bit quantisation means a smaller file and less RAM/VRAM, with some quality trade-off.
Very approximate VRAM / RAM needs (GGUF, inference):
- ~7B at Q4 - often on the order of 4-6 GB VRAM or unified memory for comfortable speed; CPU-only possible with enough system RAM but slower.
- ~13B at Q4 - often ~8 GB+ VRAM or headroom on unified memory machines.
- Larger (34B / 70B) - typically high-end GPUs or workstations; not a first choice on laptops with 8 GB RAM.
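The rules of thumb above follow from simple arithmetic: weight memory is parameter count times bits per weight, plus an allowance for the KV cache and runtime buffers. A back-of-envelope sketch (the 4.5 bits/weight figure and fixed overhead are assumptions for illustration; real usage varies with context length and backend):

```python
def gguf_memory_gb(params_billions: float, bits_per_weight: float,
                   overhead_gb: float = 1.5) -> float:
    """Very rough GGUF inference footprint: weights plus a fixed overhead.

    overhead_gb is an assumed allowance for KV cache and runtime buffers.
    """
    weights_gb = params_billions * 1e9 * bits_per_weight / 8 / (1024 ** 3)
    return weights_gb + overhead_gb

# A 7B model at ~4.5 bits/weight (typical of Q4 variants) needs roughly
# 3.7 GiB for weights alone; with overhead it lands in the 4-6 GB range.
```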
If you point Miccy at Ollama (or similar) on another host, that machine's RAM/VRAM limits apply - your laptop only sends text, but the remote box must handle the model size you load there.
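Before loading a large model on a remote box, it can help to confirm the server is reachable and see what is installed. This sketch uses only Python's standard library and Ollama's documented `/api/tags` endpoint; the host and port are whatever you configured (11434 is Ollama's default):

```python
import json
import urllib.error
import urllib.request

def list_remote_models(host: str, port: int = 11434, timeout: float = 3.0):
    """Ask an Ollama server for its installed models via GET /api/tags.

    Returns a list of model names, or None if the host is unreachable.
    """
    url = f"http://{host}:{port}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError):
        return None
    return [m["name"] for m in data.get("models", [])]
```

If this returns None, fix connectivity first; if it returns an empty list, the server is up but has no models pulled yet.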