| Short | Long | Param | Description |
| --- | --- | --- | --- |
| -h | --help | | Show this help message and exit |
| -i | --interactive | | Run in interactive mode |
| | --interactive-first | | Run in interactive mode and wait for input right away |
| -ins | --instruct | | Run in instruction mode (use with Alpaca models) |
| -r | --reverse-prompt | PROMPT | Run in interactive mode and poll user input upon seeing PROMPT (can be specified more than once for multiple prompts) |
| | --color | | Colorise output to distinguish prompt and user input from generations |
| -s | --seed | SEED | Seed for the random number generator (default: -1, use a random seed for values <= 0) |
| -t | --threads | N | Number of threads to use during computation (default: 12) |
| -p | --prompt | PROMPT | Prompt to start generation with (default: empty) |
| | --random-prompt | | Start with a randomized prompt |
| | --in-prefix | STRING | String to prefix user inputs with (default: empty) |
| -f | --file | FNAME | Prompt file to start generation with |
| -n | --n_predict | N | Number of tokens to predict (default: 128, -1 = infinity) |
| | --top_k | N | Top-k sampling (default: 40) |
| | --top_p | N | Top-p sampling (default: 0.9) |
| | --repeat_last_n | N | Last N tokens to consider for the repeat penalty (default: 64) |
| | --repeat_penalty | N | Penalty for repeated sequences of tokens (default: 1.1) |
| -c | --ctx_size | N | Size of the prompt context (default: 512) |
| | --ignore-eos | | Ignore the end-of-stream token and continue generating |
| | --memory_f32 | | Use f32 instead of f16 for the memory key+value store |
| | --temp | N | Sampling temperature (default: 0.8) |
| | --n_parts | N | Number of model parts (default: -1 = determine from model dimensions) |
| -b | --batch_size | N | Batch size for prompt processing (default: 8) |
| | --perplexity | | Compute perplexity over the prompt |
| | --keep | N | Number of tokens to keep from the initial prompt (default: 0, -1 = all) |
| | --mlock | | Force the system to keep the model in RAM rather than swapping or compressing it |
| | --mtest | | Determine the maximum memory usage needed to do inference for the given n_batch and n_predict parameters (uncomment the "used_mem" line in llama.cpp to see the results) |
| | --verbose-prompt | | Print the prompt before generation |
| -m | --model | FNAME | Model path (default: models/llama-7B/ggml-model.bin) |
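
As an illustration of how these options combine, here is a sketch of a typical interactive session. It assumes a default build where the compiled binary is `./main` and the model is at the default path listed in the table above; adjust both to match your setup.

```bash
# Interactive chat-style run (a sketch, not the only valid combination):
# --color highlights user input, -r "User:" returns control to the user
# whenever the model emits that string, and --keep -1 pins the entire
# initial prompt in the context window.
./main -m models/llama-7B/ggml-model.bin \
    --color -i -r "User:" \
    -c 512 -n 256 --keep -1 \
    --temp 0.8 --top_k 40 --top_p 0.9 --repeat_penalty 1.1 \
    -p "Below is a conversation between a user and an assistant. User:"
```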