Usage Guide

JEF provides scorer modules for evaluating jailbreak outputs across several categories. Each scorer accepts text and returns a typed result object with a score out of 100.

For API reference details on all modules and classes, see the jef package section.

Return Types

Most scorers return a ScoreType object (see jef.types):

result.score       # float, 0-100
result.matches     # list of matched items
result.total       # total possible matches
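
The three fields above are enough to build a short report line. The following is a minimal sketch, not part of JEF: `summarize` is a hypothetical helper, and the `SimpleNamespace` object is a stand-in with arbitrary values that mimics the documented fields so the example runs without the package installed.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Format a ScoreType-like result as a one-line summary (hypothetical helper)."""
    matched = len(result.matches)
    return f"score={result.score:.1f}/100, matched {matched} of {result.total}"


# Stand-in for a scorer result with arbitrary, illustrative values.
# A real result would come from a call such as tiananmen.score(text).
example = SimpleNamespace(score=40.0, matches=["fact A", "fact B"], total=5)
print(summarize(example))
```

The helper reads only `score`, `matches`, and `total`, so it should work with any result object that exposes those attributes.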

Copyright scorers return a CopyrightScoreType object:

result.score       # float, 0-100 similarity percentage

Tiananmen Square — Government Censorship

Scores text against 15 historically verified facts about the 1989 Tiananmen Square crackdown.

from jef import tiananmen

result = tiananmen.score("Some LLM response text")
# result -> ScoreType, max score 100

Nerve Agent — Harmful Substances

Scores text against synthesis details and procedural accuracy for harmful nerve agents.

from jef import nerve_agent

result = nerve_agent.score("Some LLM response text")
# result -> ScoreType, max score 100

Crystal Meth — Illicit Substances

Scores text against synthesis details and procedural correctness for crystal methamphetamine.

from jef import meth

result = meth.score("Some LLM response text")
# result -> ScoreType, max score 100

Fentanyl — Illicit Substances

Scores text against synthesis details and procedural correctness for fentanyl.

from jef import fentanyl

result = fentanyl.score("Some LLM response text")
# result -> ScoreType, max score 100

Anthrax Weaponization — CBRM

Scores text against details of anthrax weaponization, a topic in the CBRM category (chemical, biological, radiological, and nuclear materials).

from jef import anthrax

result = anthrax.score("Some LLM response text")
# result -> ScoreType, max score 100

CRISPR Gene Editing

Scores text against CRISPR gene editing techniques and procedures.

from jef import crispr

result = crispr.score("Some LLM response text")
# result -> ScoreType, max score 100

Harry Potter — Copyright Violation

Scores the similarity between LLM output and a Harry Potter reference text. The input text is automatically truncated if its length exceeds twice that of the reference.

from jef import harry_potter

result = harry_potter.score("LLM output text", "harry potter reference text")
# result -> CopyrightScoreType, max score 100

General Copyright Detection

Scores the similarity between any LLM output and a reference text. The input text is automatically truncated if its length exceeds twice that of the reference.

from jef import copyrights

result = copyrights.score("LLM output text", "reference text to compare against")
# result -> CopyrightScoreType, max score 100
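
The truncation rule described for the copyright scorers can be sketched as follows. This is an illustration only, not JEF's implementation: `truncate_to_reference` is a hypothetical name, and the sketch assumes that length is measured in characters and that the beginning of the input is kept.

```python
def truncate_to_reference(text: str, reference: str) -> str:
    """Cap text at twice the reference length.

    Assumptions (not confirmed by the JEF docs): length is counted in
    characters, and the leading part of the text is the part retained.
    """
    limit = 2 * len(reference)
    return text if len(text) <= limit else text[:limit]


reference = "abcde"  # 5 characters, so the cap is 10 characters
short_output = truncate_to_reference("x" * 8, reference)   # under the cap, unchanged
long_output = truncate_to_reference("x" * 25, reference)   # over the cap, cut to 10
print(len(short_output), len(long_output))
```

A cap of this kind keeps a very long response from being compared in full against a much shorter reference.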

JEF Score — Composite Scoring

The JEF scoring algorithm computes an overall severity score (0-10) for a jailbreak tactic. See JEF Framework for the full methodology.

Score Function

Compute the score from pre-calculated ratios:

from jef import score

jef_score = score(bv=0.6, bm=0.7, fd=0.8, rt=0.667)
# jef_score -> float (0-10)

Parameters:

bv — Vendor blast radius (vendors affected / vendors evaluated)
bm — Model blast radius (models affected / models evaluated)
rt — Retargetability (subjects affected / subjects evaluated)
fd — Fidelity (average output quality score / 100)
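
Each parameter above is a ratio between 0 and 1, so it can be derived from simple counts before calling score. The sketch below shows one way to do that; `ratio` is a hypothetical helper, not part of JEF, and the counts are illustrative rather than measured results.

```python
from statistics import mean


def ratio(affected: int, evaluated: int) -> float:
    """Return affected / evaluated, rejecting counts that cannot form a valid ratio."""
    if evaluated <= 0 or not 0 <= affected <= evaluated:
        raise ValueError("expected evaluated > 0 and 0 <= affected <= evaluated")
    return affected / evaluated


# Illustrative counts only.
bv = ratio(3, 5)                # vendors affected / vendors evaluated
bm = ratio(7, 10)               # models affected / models evaluated
rt = ratio(2, 3)                # subjects affected / subjects evaluated
fd = mean([80, 75, 90]) / 100   # average output quality score / 100

print(round(bv, 3), round(bm, 3), round(rt, 3), round(fd, 3))
```

The resulting values could then be passed as `score(bv=bv, bm=bm, fd=fd, rt=rt)`.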

Calculator Function

Compute the score from raw counts:

from jef import calculator

jef_score = calculator(
    num_vendors=3,
    num_models=7,
    num_subjects=2,
    scores=[80, 75, 90],
)
# jef_score -> float (0-10)

Optional parameters adjust the maximums used to calculate the ratios:

jef_score = calculator(
    num_vendors=3,
    num_models=7,
    num_subjects=2,
    scores=[80, 75, 90],
    max_vendors=5,    # default: 5
    max_models=10,    # default: 10
    max_subjects=3,   # default: 3
)