Which model fits my computer?
Most GPs and specialists run Miccy on a practice PC, a laptop in the consult room, or a computer at home for letters and admin. You download speech and optional note-structuring models inside the app; this page only suggests where to start. Exact names and versions are listed in Miccy's built-in picker.
Starting points for typical practice setups
Choose the card that best matches the machine you actually dictate on - whether that is a shared computer at the clinic, your own laptop, or a desktop at home. If you are unsure how much memory the PC has, check with your practice IT or in the system settings.
Practice desktop or older laptop (~8 GB RAM)
Common for a reception PC, a shared computer in a small clinic, or an older notebook: start with a smaller speech model so dictation feels responsive. Add optional note structuring later, or run that step on a faster machine if this one feels sluggish.
Current MacBook, iMac, or Windows laptop (~16 GB)
Typical for a solo GP, specialist consult room, or home office: a medium speech model is often a good fit. Once transcription feels reliable, try light note structuring - then adjust in the app if needed.
Higher-spec workstation or Apple with plenty of memory
For busy clinics, long dictation sessions, or when your EHR and browser already use a lot of RAM: you can usually aim for larger speech models and more capable optional structuring - then fine-tune in the picker.
Fine-tune in Miccy
Every computer is a bit different. Use the cards above as a starting point, then download from Miccy's model picker, run a short test dictation, and move up or down if speed or accuracy is not right.
Technical details for IT and advanced users
Know your machine (30 seconds)
Before picking a model, note:
- System RAM (e.g. 8 / 16 / 32 GB) - on Apple Silicon this is shared with GPU tasks.
- Discrete GPU VRAM (NVIDIA/AMD) if you have one - e.g. 4 / 8 / 12 GB. Integrated graphics usually shares system RAM.
- Apple Silicon (M1/M2/M3) - treat "unified memory" as both RAM and GPU pool; larger models need more total memory headroom.
- CPU-only is fine for smaller ASR models and small quantised LLMs - expect higher latency than with a GPU.
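If you prefer checking from a terminal, the first item in the list above can be read programmatically. This is a minimal sketch using only Python's standard library; it relies on POSIX `sysconf` values, so it works on Linux and most Unix-like systems (Windows users can read total RAM from Task Manager instead):

```python
import os

def total_ram_gb() -> float:
    """Report total system RAM in GiB via POSIX sysconf (Linux/Unix)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    page_count = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * page_count / (1024 ** 3)

if __name__ == "__main__":
    print(f"Total RAM: {total_ram_gb():.1f} GiB")
```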
Speech-to-text (ASR)
Whisper-class models are usually named by size. Larger = better accuracy for many accents and languages, but heavier on disk, RAM, and compute. Numbers below are rules of thumb, not guarantees.
| Whisper tier (illustrative) | Rough fit |
|---|---|
| Tiny / Base | Low-end laptops, CPU-only, or very tight VRAM. Fastest, lowest accuracy - ok for quick tests, not ideal for difficult clinical dictation. |
| Small | Balanced for many machines with limited GPU; good compromise on CPU if you can wait a bit. |
| Medium | Strong quality; prefers a modern CPU with many cores or a GPU with several GB VRAM / enough unified memory on Apple Silicon. |
| Large | Best quality for demanding audio; needs a capable GPU or Apple chip with plenty of unified RAM. Slow on weak CPUs. |
| Turbo (where offered) | Often a speed-focused variant - check release notes in the app; still match to your hardware tier above. |
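The table above can be condensed into a rough selection heuristic. The thresholds below are assumptions made for this sketch, not Miccy's actual logic; always confirm with a short test dictation in the app's picker:

```python
def suggest_whisper_tier(ram_gb: float, vram_gb: float = 0.0) -> str:
    """Illustrative mapping from hardware to a Whisper-class tier.

    Thresholds are assumptions for this sketch, not values from Miccy.
    """
    if vram_gb > 0:  # discrete GPU: VRAM is usually the binding constraint
        if vram_gb >= 10:
            return "large"
        if vram_gb >= 5:
            return "medium"
        if vram_gb >= 2:
            return "small"
        return "tiny/base"
    # CPU-only or unified memory: leave headroom for the OS, EHR and browser
    if ram_gb >= 32:
        return "large"
    if ram_gb >= 16:
        return "medium"
    if ram_gb >= 8:
        return "small"
    return "tiny/base"
```

For example, a 16 GB laptop with no discrete GPU lands on "medium", matching the starting-point card earlier on this page.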
Parakeet / Moonshine (where available): typically optimised for English and very fast on supported hardware. Prefer them when your workflow is English-only and latency matters more than maximum language coverage - still confirm fit in the picker.
Optional note structuring (LLM)
Structuring uses a text model (often GGUF weights via llama.cpp, or your own Ollama-compatible server). Model "size" is measured in billions of parameters (7B, 13B, 70B...) and quantisation (Q4, Q5, Q8...) - lower-bit quantisation means a smaller file and less RAM/VRAM, with some quality trade-off.
Very approximate VRAM / RAM needs (GGUF, inference):
- ~7B at Q4 - often on the order of 4-6 GB VRAM or unified memory for comfortable speed; CPU-only possible with enough system RAM but slower.
- ~13B at Q4 - often ~8 GB+ VRAM or headroom on unified memory machines.
- Larger (34B / 70B) - typically high-end GPUs or workstations; not a first choice on laptops with 8 GB RAM.
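The rules of thumb above follow from simple arithmetic: weight memory is parameter count times bits per weight, plus an allowance for the KV cache and runtime buffers. A back-of-envelope sketch (the 4.5 bits/weight figure and fixed overhead are assumptions for illustration; real usage varies with context length and backend):

```python
def gguf_memory_gb(params_billions: float, bits_per_weight: float,
                   overhead_gb: float = 1.5) -> float:
    """Very rough GGUF inference footprint: weights plus a fixed overhead.

    overhead_gb is an assumed allowance for KV cache and runtime buffers.
    """
    weights_gb = params_billions * 1e9 * bits_per_weight / 8 / (1024 ** 3)
    return weights_gb + overhead_gb

# A 7B model at ~4.5 bits/weight (typical of Q4 variants) needs roughly
# 3.7 GiB for weights alone; with overhead it lands in the 4-6 GB range.
```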
If you point Miccy at Ollama (or similar) on another host, that machine's RAM/VRAM limits apply - your laptop only sends text, but the remote box must handle the model size you load there.
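Before loading a large model on a remote box, it can help to confirm the server is reachable and see what is installed. This sketch uses only Python's standard library and Ollama's documented `/api/tags` endpoint; the host and port are whatever you configured (11434 is Ollama's default):

```python
import json
import urllib.error
import urllib.request

def list_remote_models(host: str, port: int = 11434, timeout: float = 3.0):
    """Ask an Ollama server for its installed models via GET /api/tags.

    Returns a list of model names, or None if the host is unreachable.
    """
    url = f"http://{host}:{port}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError):
        return None
    return [m["name"] for m in data.get("models", [])]
```

If this returns None, fix connectivity first; if it returns an empty list, the server is up but has no models pulled yet.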