Local AI
Download a model once and run every rewrite entirely on your device. No API key, no internet required, zero text sent to external servers.
Why use Local AI
| Reason | Detail |
|---|---|
| Privacy | Your text never leaves your computer — not to ReWryte, not to any AI provider. |
| Offline use | Works without internet — on a plane, in a meeting room, anywhere. |
| Zero cost | No API keys, no tokens, no billing — download once, use forever. |
| No rate limits | No RPD or RPM caps — rewrite as fast as your hardware allows. |
| Sensitive documents | Legal, medical, HR, financial — content that should not go to a third-party cloud. |
Trade-off: Local models are slower than cloud on most machines, and the quality ceiling is lower than top-tier cloud models like GPT-4 or Claude Sonnet. For short-to-medium rewrites, the difference in quality is often imperceptible.
System detection
When you open Main Window → Local AI, ReWryte automatically scans your machine and shows you exactly which model fits your hardware — in real time, no manual configuration required.
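A minimal sketch of the kind of hardware scan described above, assuming a macOS or Linux machine. `scan_system` is a hypothetical name, not ReWryte's code. It uses only the Python standard library and leaves out available RAM, because the standard library has no portable call for it.

```python
import os
import platform
import shutil


def scan_system(data_dir: str = ".") -> dict:
    """Collect the hardware facts a local-model picker needs (hypothetical helper).

    Total RAM comes from POSIX sysconf, which exists on macOS and Linux
    but not on Windows. Available RAM is omitted: there is no portable
    standard-library call for it.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    total_ram_gb = page_size * phys_pages / 1024**3
    free_disk_gb = shutil.disk_usage(data_dir).free / 1024**3
    return {
        "chip": platform.machine(),   # e.g. "arm64" on Apple Silicon, "x86_64" on Intel
        "total_ram_gb": round(total_ram_gb, 1),
        "cpu_cores": os.cpu_count(),  # logical cores; may be None if undeterminable
        "free_disk_gb": round(free_disk_gb, 1),
    }


if __name__ == "__main__":
    print(scan_system())
```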
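What gets analyzed
ReWryte reads your chip type, total RAM, available RAM, CPU core count, and free disk space.
What you see on the page
A system card summarizing those values together with your recommended tier, followed by the model catalog with a "Recommended" badge on the model that fits your hardware.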
Tier assignment by RAM
| Your RAM | Recommended tier | Model selected |
|---|---|---|
| Under 8 GB | Light | Qwen 2.5 1.5B (~940 MB) |
| 8 GB – 15 GB | Light | Qwen 2.5 1.5B (~940 MB) |
| 16 GB – 23 GB | Balanced | Qwen 2.5 3B (~1.9 GB) |
| 24 GB or more | Quality | Qwen 2.5 7B (~4.4 GB) |
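The tier table reduces to two RAM thresholds. The function below is a hypothetical illustration of that mapping, not ReWryte's code, and it assumes RAM is reported in GB, so a value such as 15.5 GB counts as under 16 GB.

```python
def recommend_tier(total_ram_gb: float) -> tuple[str, str]:
    """Map total RAM to a (tier, model) pair using the thresholds in the tier table."""
    if total_ram_gb < 16:
        # "Under 8 GB" and "8 GB - 15 GB" both land on the Light tier.
        return ("Light", "Qwen 2.5 1.5B")
    if total_ram_gb < 24:
        return ("Balanced", "Qwen 2.5 3B")
    return ("Quality", "Qwen 2.5 7B")


for ram in (8, 16, 24, 36):
    tier, model = recommend_tier(ram)
    print(f"{ram} GB -> {tier} ({model})")
```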
The three Qwen 2.5 variants
ReWryte offers three variants of Qwen 2.5 Instruct, each quantized to Q4_K_M, a llama.cpp 4-bit format that balances quality against file size. All three run fully on-device via llama.cpp.
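As a rough check on the download sizes, a quantized file is about parameter count × average bits per weight ÷ 8. This sketch assumes an average of about 4.85 bits per weight for Q4_K_M (an approximation, since the format keeps some tensors at higher precision than 4 bits) and uses the parameter counts published on the Qwen 2.5 model cards. It is an estimate, not how ReWryte computes anything.

```python
# Approximate average bits per weight for Q4_K_M (assumption; the format
# keeps some tensors at higher precision than 4 bits).
Q4_K_M_BITS_PER_WEIGHT = 4.85

# Total parameter counts from the Qwen 2.5 model cards.
PARAMS = {
    "Qwen 2.5 1.5B": 1.54e9,
    "Qwen 2.5 3B": 3.09e9,
    "Qwen 2.5 7B": 7.61e9,
}


def estimated_size_gb(params: float, bits_per_weight: float = Q4_K_M_BITS_PER_WEIGHT) -> float:
    """Estimate file size in GB (10**9 bytes): params * bits per weight / 8 bits per byte."""
    return params * bits_per_weight / 8 / 1e9


for name, params in PARAMS.items():
    print(f"{name}: ~{estimated_size_gb(params):.2f} GB")
```

The estimates come out at roughly 0.93, 1.87, and 4.61 GB. Gaps from the catalog figures come from file metadata, the exact tensor mix, and whether sizes are quoted in GB or GiB.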
| Model | Download | RAM (min / rec.) | Context | Best for |
|---|---|---|---|---|
| Qwen 2.5 1.5B (Light) | ~940 MB | 3 GB / 4 GB | 2,048 tokens | Any laptop, quick rewrites |
| Qwen 2.5 3B (Balanced, Recommended) | ~1.9 GB | 6 GB / 8 GB | 2,048 tokens | Mid-range Macs, everyday use |
| Qwen 2.5 7B (Quality) | ~4.4 GB | 10 GB / 16 GB | 2,048 tokens | High-RAM Macs, best local quality |
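A hypothetical pre-download check built from the catalog figures: a model needs at least its minimum RAM and enough free disk for the file. The names (`LocalModel`, `fit_status`) are illustrative, not ReWryte's API, and the check ignores RAM already in use by other apps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalModel:
    name: str
    download_gb: float
    min_ram_gb: float
    rec_ram_gb: float


# Figures taken from the model catalog table.
CATALOG = [
    LocalModel("Qwen 2.5 1.5B", 0.94, 3, 4),
    LocalModel("Qwen 2.5 3B", 1.9, 6, 8),
    LocalModel("Qwen 2.5 7B", 4.4, 10, 16),
]


def fit_status(model: LocalModel, total_ram_gb: float, free_disk_gb: float) -> str:
    """Classify a model as 'recommended-fit', 'minimum-fit', or 'does-not-fit'."""
    if free_disk_gb < model.download_gb or total_ram_gb < model.min_ram_gb:
        return "does-not-fit"
    if total_ram_gb < model.rec_ram_gb:
        return "minimum-fit"
    return "recommended-fit"


for m in CATALOG:
    print(m.name, fit_status(m, total_ram_gb=8, free_disk_gb=20))
```

The RAM-based tier table only recommends each model well above these minimums (for example, Balanced starts at 16 GB), which leaves headroom for the rest of the system.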
Download and activate
- Open Main Window → Local AI. ReWryte analyzes your system and displays the model catalog with a "Recommended" badge on the right model for your hardware.
- Read the system card at the top — it shows your chip type, total RAM, available RAM, CPU cores, free disk, and recommended tier. Confirm the recommended model fits your setup.
- Click Download on the model with the Recommended badge. A progress bar shows download percentage and MB in real time.
  - Qwen 2.5 1.5B: ~940 MB (~3–5 min on a typical connection)
  - Qwen 2.5 3B: ~1.9 GB (~6–10 min)
  - Qwen 2.5 7B: ~4.4 GB (~15–25 min)
- Once downloaded, click Activate. ReWryte automatically switches to local inference mode.
- Go to Main Window → Pick your AI to confirm the local model shows as "Active" with an "On your device" tag.
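The download times in the steps follow from size × 8 ÷ bandwidth. The docs do not state what a "typical connection" is; assuming 30 Mbit/s (our assumption), a short calculation lands inside each quoted range.

```python
def download_minutes(size_mb: float, mbit_per_s: float) -> float:
    """Minutes to download size_mb megabytes at mbit_per_s megabits per second."""
    megabits = size_mb * 8  # 1 byte = 8 bits
    return megabits / mbit_per_s / 60


for name, size_mb in [("1.5B", 940), ("3B", 1900), ("7B", 4400)]:
    # Prints ~4.2, ~8.4, and ~19.6 minutes respectively.
    print(f"Qwen 2.5 {name}: ~{download_minutes(size_mb, 30):.1f} min at 30 Mbit/s")
```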
Switching between cloud and local
You can toggle between cloud and local inference at any time — no restart needed.
Method 1 — Local AI page toggle
Main Window → Local AI → "Where rewrites run" toggle (On your device / In the cloud)
Method 2 — Pick your AI
Main Window → Pick your AI → click "Use this →" on any cloud provider (switches to cloud) or on a downloaded local model (switches to local)
Method 3 — Dashboard
Main Window → Dashboard → "Currently writing with" card → "Change" button → opens Pick your AI
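One way to picture why no restart is needed: all three methods flip the same setting, which is read when each rewrite starts. This is a hypothetical sketch (`Backend`, `InferenceRouter`, and the provider name are illustrative, not ReWryte internals); it also models the documented fallback to the active cloud provider when local is toggled off.

```python
from enum import Enum
from typing import Optional


class Backend(Enum):
    LOCAL = "On your device"
    CLOUD = "In the cloud"


class InferenceRouter:
    """Hypothetical model of the cloud/local switch: one setting read per rewrite."""

    def __init__(self, cloud_provider: str):
        self.cloud_provider = cloud_provider     # stays connected in both modes
        self.local_model: Optional[str] = None   # set once a model is activated
        self.backend = Backend.CLOUD

    def activate_local(self, model_name: str) -> None:
        self.local_model = model_name
        self.backend = Backend.LOCAL

    def use_cloud(self) -> None:
        # Toggling local off falls back to the active cloud provider.
        self.backend = Backend.CLOUD

    def target(self) -> str:
        """Where the next rewrite runs; checked fresh each time, so no restart is needed."""
        if self.backend is Backend.LOCAL and self.local_model:
            return f"local:{self.local_model}"
        return f"cloud:{self.cloud_provider}"


router = InferenceRouter(cloud_provider="example-provider")
router.activate_local("Qwen 2.5 3B")
print(router.target())  # local:Qwen 2.5 3B
router.use_cloud()
print(router.target())  # cloud:example-provider
```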
Model comparison
How the Qwen 2.5 family compares with other local models for on-device text rewriting. All ratings are for Q4_K_M builds tested on an Apple Silicon M2.
| Model | Size (Q4_K_M) | RAM needed | Instruction following | Multilingual |
|---|---|---|---|---|
| Qwen 2.5 1.5B (ReWryte Light tier) | ~940 MB | 3–4 GB | Good for its size | Excellent |
| Qwen 2.5 3B (ReWryte Balanced tier) | ~1.9 GB | 6–8 GB | Very good | Excellent |
| Qwen 2.5 7B (ReWryte Quality tier) | ~4.4 GB | 10–16 GB | Excellent | Excellent |
| Llama 3.2 3B (Meta's small model, English-first) | ~2.0 GB | 6–8 GB | Good | Moderate |
| Llama 3.1 8B (strong English quality) | ~4.7 GB | 10–16 GB | Very good | Moderate |
| Phi-3.5 Mini 3.8B (Microsoft's compact model) | ~2.3 GB | 6–8 GB | Good | Moderate |
| Gemma 2 2B (Google's smallest model) | ~1.4 GB | 4–6 GB | Fair | Limited |
| Mistral 7B v0.3 (general purpose, good baseline) | ~4.4 GB | 10–16 GB | Good | Moderate |
| Phi-4 Mini 3.8B (Microsoft, strong reasoning) | ~2.3 GB | 6–8 GB | Very good | Moderate |
The first three rows (Qwen 2.5) are the models ReWryte uses. Ratings reflect performance on tone-guided rewriting tasks, not general benchmarks.
Why Qwen 2.5 is our choice
We chose the Qwen 2.5 family after testing each option across all four built-in tones in all 10 supported output languages. Here is the reasoning:
Multilingual strength
Qwen 2.5 was trained on a much larger multilingual corpus than Llama or Phi. If you write in any language other than English, Qwen performs significantly better. This matters because ReWryte supports output in 10 languages.
Instruction following at small sizes
The 1.5B and 3B Qwen models follow tone instructions more reliably than comparable-sized Llama or Gemma models. For a rewriting app where the prompt IS the instruction, this consistency is the most important factor.
Efficiency above weight class
Qwen 2.5 achieves competitive quality at lower parameter counts. For the same file size and RAM usage, the 3B and 7B variants punch above their weight compared with similarly sized models from other families.
Quantization quality
Q4_K_M is one of the most quality-preserving 4-bit quantization formats available. Qwen 2.5's architecture retains more capability after quantization than older model families, meaning less quality loss for the same file size reduction.
Frequently asked questions
Does Local AI work offline?
Yes. Once the model is downloaded, ReWryte requires no internet connection for rewrites. The model file is stored in the app data directory on your device.
Can I use Local AI and cloud providers at the same time?
Yes. Cloud providers remain connected while you use local inference. If you toggle local off, ReWryte falls back to the active cloud provider instantly. Switch between them any time using Pick your AI or the Local AI page toggle.
How much disk space does the model take?
Qwen 2.5 1.5B is ~940 MB, 3B is ~1.9 GB, and 7B is ~4.4 GB. You can have multiple models downloaded simultaneously. Delete any model from Main Window → Local AI to reclaim disk space — the catalog entry remains so you can re-download anytime.
Is GPU acceleration required?
No. llama.cpp can run entirely on the CPU, so a GPU is not required. On Apple Silicon Macs, however, it uses Metal GPU acceleration by default, which makes inference roughly 3–5× faster than CPU-only mode. On an M1 Mac with 16 GB RAM, Qwen 2.5 3B typically responds in 1–3 seconds per rewrite.
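A minimal sketch of the CPU/GPU choice, assuming the llama-cpp-python bindings (ReWryte's own llama.cpp integration is not documented here). `llama_kwargs` and the model path are hypothetical; `model_path`, `n_ctx`, and `n_gpu_layers` are real constructor parameters of `llama_cpp.Llama`, where `n_gpu_layers=-1` offloads every layer and `0` keeps inference on the CPU.

```python
import platform


def llama_kwargs(model_path: str, system: str = "", machine: str = "") -> dict:
    """Build constructor arguments for llama-cpp-python's Llama class (hypothetical helper).

    Offloads all layers to the GPU on Apple Silicon (Metal, when the package
    is built with Metal support) and stays on the CPU everywhere else.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    apple_silicon = system == "Darwin" and machine == "arm64"
    return {
        "model_path": model_path,
        "n_ctx": 2048,  # matches the context length in the model table
        "n_gpu_layers": -1 if apple_silicon else 0,
    }


if __name__ == "__main__":
    kwargs = llama_kwargs("models/qwen2.5-3b-instruct-q4_k_m.gguf")  # hypothetical path
    print(kwargs)
    # With llama-cpp-python installed, pass these straight through:
    #   from llama_cpp import Llama
    #   llm = Llama(**kwargs)
```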
Which model should I use on an Intel Mac?
Intel Macs run inference on the CPU only, which is inherently slower. Use Qwen 2.5 1.5B for the fastest possible response, and expect 5–15 seconds per rewrite depending on RAM and the complexity of the text.
Can I download a model while using the app?
Yes. You can use cloud AI providers for rewrites while a local model is downloading in the background. The download appears as a progress bar on the Local AI page.
Can I resume a stopped download?
No. If a download is cancelled or fails, clicking Download restarts from the beginning. Ensure a stable connection before starting the 3B or 7B models.