

LMSA gives you direct control over the inference parameters sent to the AI model on every request. You can fine-tune how creative, focused, long-winded, or thoughtful the model is — all from within the app. These settings live in Settings → Step 2 (Options) under the AI Settings group.

Parameters at a glance

Parameter          | Default        | What it does
Temperature        | 0.3            | Controls randomness in token selection
Max Output Tokens  | 0 (unlimited)  | Caps the length of the AI’s response
Top-P              | Server default | Nucleus sampling threshold
Repetition Penalty | Server default | Reduces repeated phrases
Reasoning Level    | Default        | Thinking effort for reasoning models
Reasoning Timeout  | 300 s          | Max wait time for reasoning before stopping
Default Model      | None           | Model that auto-selects when LMSA loads
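
Under the hood, each of these settings travels as a field on the chat request. As a rough sketch, assuming the OpenAI-compatible /v1/chat/completions endpoint that LM Studio exposes (the address and model name are placeholders, and the repetition-penalty field name varies by backend):

```python
import requests

# Placeholder address and model name; LMSA fills these in from your
# server connection and selected model.
payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.3,         # Temperature
    "max_tokens": 512,          # Max Output Tokens (omitted when set to 0)
    "top_p": 0.9,               # Top-P
    "repetition_penalty": 1.1,  # Repetition Penalty (field name varies by backend)
}
r = requests.post("http://192.168.1.10:1234/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])
```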

Temperature

Temperature controls how random or deterministic the model’s word choices are. The slider ranges from 0 to 2.
  • Low (0.1 – 0.3) — More focused and predictable. Best for factual Q&A, coding, and structured tasks.
  • Medium (0.5 – 0.7) — Balanced creativity and coherence. Good for general conversation.
  • High (0.8 – 1.5+) — More creative and varied. Suited for storytelling, brainstorming, and roleplay.
The default of 0.3 works well for most tasks. Raise it toward 0.7–1.0 when you want more imaginative or varied responses, and lower it toward 0.1 for precise, factual answers.
The temperature slider is locked by default to prevent accidental changes mid-conversation. Tap the lock icon next to the temperature label to unlock it before adjusting.
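For intuition, temperature rescales the model's raw scores (logits) before a token is sampled; the server does this internally, but a minimal sketch of the idea looks like this:

```python
import numpy as np

def sample_with_temperature(logits, temperature):
    # Dividing logits by the temperature sharpens the distribution when
    # temperature < 1 (more deterministic) and flattens it when > 1.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5]
print(sample_with_temperature(logits, 0.1))  # almost always token 0
print(sample_with_temperature(logits, 1.5))  # noticeably more varied
```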

Max Output Tokens

This setting limits how many tokens (roughly three-quarters of a word each) the model can produce in a single response. The default value of 0 means no limit — the server decides.
  • Set a specific number to cap very long responses and reduce wait times.
  • Tap the Default button next to the input field to clear the limit and return to the server default.
  • Only whole numbers are accepted; non-numeric characters are stripped automatically.
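
As a sketch of how a 0-means-unlimited setting commonly maps onto an OpenAI-style max_tokens field (the helper below is hypothetical, not LMSA's actual code):

```python
def build_payload(messages, max_output_tokens=0):
    # Hypothetical helper: 0 means "no limit", so the field is omitted
    # and the server's own default applies.
    payload = {"model": "your-model-name", "messages": messages}
    if max_output_tokens > 0:
        payload["max_tokens"] = max_output_tokens
    return payload

print(build_payload([{"role": "user", "content": "Hi"}], 256))
```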

Top-P (Nucleus Sampling)

Top-P restricts token selection to the smallest set of likely candidates whose cumulative probability exceeds the threshold you set. A lower value (e.g. 0.5) makes responses tighter; a higher value (e.g. 0.95) allows more diverse word choices. This parameter is sent directly to the server when supported.
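
For intuition, a minimal sketch of nucleus sampling as a server typically implements it (illustrative only; the real work happens on the backend):

```python
import numpy as np

def top_p_sample(probs, top_p=0.9):
    # Keep the smallest set of highest-probability tokens whose
    # cumulative probability reaches the threshold, then renormalize
    # and sample only from that set.
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # keep at least one token
    keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()
    return np.random.choice(keep, p=kept)

probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_sample(probs, 0.9))  # samples only from the first three tokens
```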

Repetition Penalty

The repetition penalty discourages the model from repeating the same phrases or ideas. Increasing this value reduces looping or echo-like output. It is passed through to the inference backend when supported.
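
For intuition, a common backend formulation (used in similar form by llama.cpp and Hugging Face samplers) divides the positive logits, and multiplies the negative logits, of every token that has already appeared; a sketch:

```python
import numpy as np

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    # Tokens that already appeared become less likely: positive logits
    # are divided by the penalty, negative ones multiplied by it.
    logits = np.asarray(logits, dtype=float).copy()
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= penalty
        else:
            logits[token_id] *= penalty
    return logits

print(apply_repetition_penalty([2.0, -0.5, 1.0], generated_ids=[0, 1]))
```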

System Prompt

The system prompt is a block of instructions sent to the model at the start of every conversation — before any of your messages. Use it to set a persona, define a task, or constrain the model’s behavior.
1. Go to Settings → Step 3 (Prompt). Navigate to the Prompt step using the Next button from Options, or by tapping the step indicator at the top of the Settings modal.
2. Tap Edit to open the system prompt editor.
3. Write your instructions. They are sent to the model before every chat message.
4. Save (optional). Tap Save to list to keep the prompt as a reusable entry you can restore later without retyping.
LMSA ships with a library of built-in prompt templates (Math Tutor, Code Assistant, Story Writer, etc.) accessible from the Templates screen. Applying a template populates the system prompt for you.
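
In OpenAI-compatible terms, the system prompt is simply the first entry in the messages array, sent with role "system" ahead of your own messages. A minimal sketch (the prompt text and model name are examples):

```python
payload = {
    "model": "your-model-name",
    "messages": [
        # The system prompt always comes first...
        {"role": "system", "content": "You are a patient math tutor."},
        # ...followed by the conversation so far.
        {"role": "user", "content": "Explain long division."},
    ],
}
```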

Reasoning Level

For models that support extended thinking, such as DeepSeek-R1 and other reasoning models, you can control how much effort the model invests before responding.
Level    | Behavior
Default  | Model uses its built-in reasoning behavior
Disabled | Thinking is turned off entirely
Low      | Minimal reasoning; fastest responses
Medium   | Balanced thinking and speed
High     | Maximum reasoning effort; slowest but most thorough
Find this in Settings → Options → AI Settings → Reasoning Level.
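
How the chosen level reaches the model depends on the backend. As a purely hypothetical sketch, assuming an OpenAI-style reasoning_effort request field (the field name and accepted values vary by server and model):

```python
def apply_reasoning_level(payload, level):
    # Hypothetical mapping: "Default" sends nothing, so the model's
    # built-in reasoning behavior applies unchanged.
    mapping = {"Disabled": "none", "Low": "low",
               "Medium": "medium", "High": "high"}
    if level in mapping:
        payload["reasoning_effort"] = mapping[level]
    return payload

print(apply_reasoning_level({"model": "your-model-name"}, "High"))
```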

Reasoning Timeout

When a reasoning model is thinking, LMSA waits up to the timeout before stopping the request. The default is 300 seconds (5 minutes). You can reduce this if you prefer faster cut-offs on long-running reasoning.
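
From the client's point of view this behaves like a read timeout on the request. A sketch using Python's requests library (the address is a placeholder):

```python
import requests

payload = {"model": "your-model-name",
           "messages": [{"role": "user", "content": "Prove it step by step."}]}
try:
    r = requests.post(
        "http://192.168.1.10:1234/v1/chat/completions",  # placeholder address
        json=payload,
        timeout=(10, 300),  # (connect, read) in seconds; read mirrors the 300 s default
    )
except requests.exceptions.ReadTimeout:
    print("Reasoning timed out; the request was stopped.")
```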

Default Model

You can pin a specific model so it is pre-selected every time LMSA loads, instead of reverting to the server’s first available model. Once set, LMSA attempts to activate that model automatically on launch.
In the model picker (tap the model name in the chat header), select any model and look for the Set as Default option. You can also clear the default from the same menu.
If the default model is not present on the connected server — for example, because you switched presets — LMSA falls back to the first model returned by the server.
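
That fallback rule can be sketched against the OpenAI-compatible /v1/models listing (illustrative only, not LMSA's actual code):

```python
import requests

def pick_startup_model(server_url, saved_default=None):
    # Use the saved default if the server still offers it; otherwise
    # fall back to the first model the server returns.
    models = [m["id"] for m in
              requests.get(f"{server_url}/v1/models").json()["data"]]
    if saved_default in models:
        return saved_default
    return models[0] if models else None

print(pick_startup_model("http://192.168.1.10:1234"))  # placeholder address
```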

Hide Thinking

When Hide Thinking is enabled, the internal reasoning chain produced by thinking models is not displayed in the chat. The model still reasons; you just don’t see the intermediate steps. Toggle this in Settings → Options → AI Settings → Hide Thinking.
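
For intuition, assuming the model wraps its chain of thought in <think> tags, as DeepSeek-R1-style models do, hiding it amounts to stripping that span before display:

```python
import re

def hide_thinking(text):
    # Remove the <think>...</think> span (assumed tag format) and show
    # only the final answer; the model still produced the reasoning.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>Check both cases first...</think>The answer is 42."
print(hide_thinking(raw))  # -> "The answer is 42."
```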