litmus — System Prompt Tester

Author: Sharaj

Test, analyze, and version system prompts against the LLM you ship on. BYOK, local-first, no backend.

As of June 2026, no user count is reported yet for litmus — System Prompt Tester, which is listed in the Developer Tools category.

Version: 0.1.0 (Manifest V3)

History

1 snapshot

Tracking since Jun 27, 2026.

Date	Users	Rating	Reviews	Version
Jun 27, 2026	—	—	—	0.1.0
Now	—	—	—	0.1.0

Permissions & access

Permissions: sidePanel, storage, activeTab, scripting
Host access: https://api.openai.com/*, https://api.anthropic.com/*, https://generativelanguage.googleapis.com/*

About

litmus turns "does this prompt work?" from a gut feeling into a measured result — for plain prompts, tool calls, and multi-step agents alike.

Paste a system prompt, pick the model you actually ship on, and choose what you're testing. Everything runs locally in your browser with your own API keys. There is no litmus backend, no account, and no tracking.

── TWO WAYS TO TEST ──

1) OUTPUT QUALITY — litmus analyzes your prompt, auto-writes a rigorous LLM-as-judge rubric per quality dimension, generates typical/edge/adversarial test cases, runs them on your target model, and scores each output. Then it proposes ranked fixes and can auto-apply them for the next pass.

2) TOOL & AGENT BEHAVIOR — define your tools (JSON schema) and litmus checks, deterministically (no LLM judge), that the model calls the right tool with valid arguments and avoids the ones it shouldn't. For agents, define a goal plus mock tools with scripted results (inject a failure to test recovery); litmus runs the model in a multi-step loop and scores the trajectory across goal completion, tool selection, argument validity, recovery, and efficiency. This mode skips the rubric steps — pick it on the first screen and go straight to your tests.
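
A deterministic tool check of the kind described above can be sketched in a few lines. This is only an illustration under stated assumptions, not litmus's actual code: the function names, the shape of the `call` dictionary, and the example tools (`get_weather`, `delete_account`) are all hypothetical, and only a small subset of JSON Schema (`properties`, `required`, primitive `type`) is handled.

```python
# Hypothetical sketch of a deterministic tool-call check (no LLM judge).
# Names and data shapes are assumptions, not litmus's real format.

TYPE_MAP = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def matches_type(value, type_name):
    """Check a value against a primitive JSON Schema type name."""
    expected = TYPE_MAP.get(type_name)
    if expected is None:
        return True  # type not covered by this sketch: do not fail on it
    if type_name == "boolean":
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False  # bool is a subclass of int in Python; reject it here
    return isinstance(value, expected)


def check_tool_call(call, tools, expected_tool, forbidden=()):
    """Return a list of failure messages; an empty list means the call passed."""
    failures = []
    name = call.get("name")
    if name in forbidden:
        failures.append(f"called forbidden tool: {name}")
    if name != expected_tool:
        failures.append(f"expected tool {expected_tool}, got {name}")
    schema = tools.get(name)
    if schema is None:
        failures.append(f"unknown tool: {name}")
        return failures
    args = call.get("arguments", {})
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            failures.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            failures.append(f"unexpected argument: {key}")
        elif not matches_type(value, props[key].get("type")):
            failures.append(f"argument {key} should be {props[key]['type']}")
    return failures


# Example tool catalog (hypothetical).
tools = {
    "get_weather": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "days": {"type": "integer"}},
        "required": ["city"],
    },
    "delete_account": {"type": "object", "properties": {}, "required": []},
}

good = {"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}
bad = {"name": "get_weather", "arguments": {"days": "3"}}

print(check_tool_call(good, tools, "get_weather", forbidden=("delete_account",)))
print(check_tool_call(bad, tools, "get_weather", forbidden=("delete_account",)))
```

Because the check is plain comparison against the schema, the same model output always produces the same verdict, which is the property the listing contrasts with LLM-judged scoring.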

── WHAT YOU GET ──

• Auto-generated rubrics and test cases — including tool tests proposed from your catalog.
• Deterministic tool/agent checks that don't drift run-to-run.
• Variance built in — run each case N times to see the spread (mean ± range), so a noisy score is visible, not hidden.
• Speed measured live (time-to-first-byte, tokens/sec) for quality runs.
• Versioning — every run is saved; reload any version, compare by dimension, export as Markdown or JSON.
• Works with OpenAI, Anthropic, and Google targets.
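
To make the "mean ± range" idea in the variance bullet concrete, here is a minimal sketch. It assumes the spread is reported as the minimum and maximum score across N runs of one test case; the listing does not say exactly how litmus computes or displays it, and the function names are hypothetical.

```python
# Hypothetical sketch: summarize repeated scores for a single test case.
from statistics import mean


def summarize_runs(scores):
    """Return the mean, min, max, and range (max - min) of repeated scores."""
    if not scores:
        raise ValueError("need at least one score")
    low, high = min(scores), max(scores)
    return {"mean": mean(scores), "min": low, "max": high,
            "range": high - low, "n": len(scores)}


def format_summary(summary):
    """Render a summary as a short human-readable line."""
    return (f"mean {summary['mean']:.2f}, "
            f"range {summary['min']:.2f} to {summary['max']:.2f} "
            f"over {summary['n']} runs")


# Four runs of the same case with made-up scores.
summary = summarize_runs([7.0, 9.0, 8.0, 8.0])
print(format_summary(summary))  # mean 8.00, range 7.00 to 9.00 over 4 runs
```

A single run would report only one of these scores; repeating the case shows whether a change of one point between prompt versions is inside the normal run-to-run spread.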

── PRIVACY & CONTROL ──

• Bring your own key (BYOK). Keys are stored only in your browser.
• Local-first — no litmus servers. Your data goes only to the provider you choose, to run the test. Tools in agent runs are mocked — nothing real is executed.
• No analytics, no ads, no account.
• A spend cap you set blocks runs that would cost more than you want.
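
A spend cap like the one in the last bullet amounts to a pre-run cost estimate compared against a limit. The sketch below assumes cost is estimated from token counts and per-million-token prices; the function names, token counts, and prices are placeholders, not litmus's logic or any provider's real pricing.

```python
# Hypothetical sketch of a pre-run spend-cap check. All numbers are placeholders.


def estimate_run_cost_usd(calls, input_tokens_per_call, output_tokens_per_call,
                          usd_per_mtok_in, usd_per_mtok_out):
    """Estimate total cost in USD for a batch of model calls."""
    total_in = calls * input_tokens_per_call
    total_out = calls * output_tokens_per_call
    return (total_in * usd_per_mtok_in + total_out * usd_per_mtok_out) / 1_000_000


def within_spend_cap(estimated_usd, cap_usd):
    """Return True if the run may proceed under the user's cap."""
    return estimated_usd <= cap_usd


# 20 test cases, each repeated 3 times, with assumed token counts and prices.
calls = 20 * 3
estimate = estimate_run_cost_usd(calls, 1500, 500, 2.0, 8.0)
print(f"estimated ${estimate:.2f}")      # estimated $0.42
print(within_spend_cap(estimate, 0.25))  # False: the run would be blocked
print(within_spend_cap(estimate, 0.50))  # True: the run may proceed
```

Note that repeating each case N times for variance multiplies the number of calls, so the estimate scales with the repeat count as well as the number of test cases.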

── GOOD FOR ──

Prompt engineers and AI app developers who want to quickly verify a prompt, tool, or agent before shipping — without standing up a cloud eval platform.

Pick a judge model different from your target to reduce self-preference bias and get more trustworthy quality scores.

Technical

Version: 0.1.0
Manifest: V3
Size: 78.85 KiB
Min Chrome: 116
Languages: 1
Featured: No

Metadata

ID: djkioeiacfngjlmfpdlpmgiamdnhpeje
Developer ID: uc736cf49f19ca8e107143103fffd6c58
Developer Email: [email protected]
Created: Jun 26, 2026
Last Updated (Store): Jun 26, 2026
Last Scraped: Jun 27, 2026
Website: https://sharaj.pages.dev/
Support URL: —
Privacy Policy: https://raw.githubusercontent.com/srewoo/litmus/refs/heads/main/app/public/privacypolicy.html

Data sourced from the Chrome Web Store · last verified Jun 27, 2026.