litmus — System Prompt Tester

Test, analyze, and version system prompts against the LLM you ship on. BYOK, local-first, no backend.

As of June 2026, litmus — System Prompt Tester is listed in the Developer Tools category; no user count is shown.

Users: no change (0%)
Rating: no change (0%), review count not shown
Reviews: no change (0%)
Version: 0.1.0
Manifest: V3

History

1 snapshot

Tracking since Jun 27, 2026.

Date | Users | Rating | Reviews | Version
Jun 27, 2026 | - | - | - | 0.1.0
Now | - | - | - | 0.1.0

Permissions & access

Permissions: sidePanel, storage, activeTab, scripting
Host access: https://api.openai.com/*, https://api.anthropic.com/*, https://generativelanguage.googleapis.com/*

About

litmus turns "does this prompt work?" from a gut feeling into a measured result — for plain prompts, tool calls, and multi-step agents alike.

Paste a system prompt, pick the model you actually ship on, and choose what you're testing. Everything runs locally in your browser with your own API keys. There is no litmus backend, no account, and no tracking.

── TWO WAYS TO TEST ──

1) OUTPUT QUALITY — litmus analyzes your prompt, auto-writes a rigorous LLM-as-judge rubric per quality dimension, generates typical/edge/adversarial test cases, runs them on your target model, and scores each output. Then it proposes ranked fixes and can auto-apply them for the next pass.

2) TOOL & AGENT BEHAVIOR — define your tools (JSON schema) and litmus checks, deterministically (no LLM judge), that the model calls the right tool with valid arguments and avoids the ones it shouldn't. For agents, define a goal plus mock tools with scripted results (inject a failure to test recovery); litmus runs the model in a multi-step loop and scores the trajectory across goal completion, tool selection, argument validity, recovery, and efficiency. This mode skips the rubric steps — pick it on the first screen and go straight to your tests.

── WHAT YOU GET ──

• Auto-generated rubrics and test cases — including tool tests proposed from your catalog.
• Deterministic tool/agent checks that don't drift run-to-run.
• Variance built in — run each case N times to see the spread (mean ± range), so a noisy score is visible, not hidden.
• Speed measured live (time-to-first-byte, tokens/sec) for quality runs.
• Versioning — every run is saved; reload any version, compare by dimension, export as Markdown or JSON.
• Works with OpenAI, Anthropic, and Google targets.
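
The "mean ± range" summary from the variance bullet above could be computed roughly like this. It is a sketch under one stated assumption: the value after "±" is taken to be half the min-to-max spread, which the listing does not specify. The function names are hypothetical.

```python
from statistics import mean


def summarize_runs(scores: list[float]) -> dict[str, float]:
    """Summarize N repeated scores for one test case."""
    if not scores:
        raise ValueError("need at least one score")
    lowest, highest = min(scores), max(scores)
    return {
        "mean": mean(scores),
        "min": lowest,
        "max": highest,
        # Assumption: report half the min-to-max spread after the ± sign.
        "half_range": (highest - lowest) / 2,
    }


def format_summary(scores: list[float]) -> str:
    """Render the summary as a single human-readable line."""
    s = summarize_runs(scores)
    return (
        f"{s['mean']:.2f} ± {s['half_range']:.2f} "
        f"(min {s['min']:.2f}, max {s['max']:.2f}, n={len(scores)})"
    )


print(format_summary([4.0, 3.0, 5.0]))  # prints "4.00 ± 1.00 (min 3.00, max 5.00, n=3)"
```

Showing the minimum and maximum next to the mean keeps a single outlier run visible instead of averaging it away.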

── PRIVACY & CONTROL ──

• Bring your own key (BYOK). Keys are stored only in your browser.
• Local-first — no litmus servers. Your data goes only to the provider you choose, to run the test. Tools in agent runs are mocked — nothing real is executed.
• No analytics, no ads, no account.
• A spend cap you set blocks runs that would cost more than you want.
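
A spend cap like the one in the last bullet could work by estimating a run's worst-case cost up front and refusing to start if it exceeds the limit. The sketch below assumes that approach; the `Price` type, the function names, and the per-token prices are placeholders for illustration, not real provider rates or litmus internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens (placeholder values, not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(
    price: Price,
    input_tokens: int,
    max_output_tokens: int,
    cases: int,
    repeats: int,
) -> float:
    """Upper-bound cost of a run: every call is assumed to use max_output_tokens."""
    per_call = (
        input_tokens * price.input_per_million
        + max_output_tokens * price.output_per_million
    ) / 1_000_000
    return per_call * cases * repeats


def allow_run(estimated_usd: float, cap_usd: float) -> bool:
    """Block the run when the estimate exceeds the user's cap."""
    return estimated_usd <= cap_usd


price = Price(input_per_million=1.0, output_per_million=2.0)  # hypothetical
estimate = estimate_cost(price, input_tokens=1000, max_output_tokens=500, cases=10, repeats=3)
print(f"estimated ${estimate:.2f}, allowed: {allow_run(estimate, cap_usd=0.05)}")
```

Estimating from the output-token ceiling errs on the side of blocking: the actual cost can only come in at or below the figure the cap was checked against.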

── GOOD FOR ──

Prompt engineers and AI app developers who want to quickly verify a prompt, tool, or agent before shipping — without standing up a cloud eval platform.

Pick a judge model different from your target to reduce self-preference bias and get more trustworthy quality scores.

Technical

Version: 0.1.0
Manifest: V3
Size: 78.85 KiB
Min Chrome: 116
Languages: 1
Featured: No

Metadata

ID: djkioeiacfngjlmfpdlpmgiamdnhpeje
Developer ID: uc736cf49f19ca8e107143103fffd6c58
Developer Email: [email protected]
Created: Jun 26, 2026
Last Updated (Store): Jun 26, 2026
Last Scraped: Jun 27, 2026
Support URL: (not listed)

Data sourced from the Chrome Web Store · last verified Jun 27, 2026.