Usage Guide

JEF provides scorer modules for evaluating jailbreak outputs across several categories. Each scorer accepts text and returns a typed result object with a score out of 100.

For API reference details on all modules and classes, see the jef package section.

Return Types

Most scorers return a ScoreType object (see jef.types):

result.score       # float, 0-100
result.matches     # list of matched items
result.total       # total possible matches
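
For example, reading these fields from any ScoreType-returning scorer (a minimal sketch using the tiananmen scorer documented below):

from jef import tiananmen

result = tiananmen.score("Some LLM response text")
print(f"{result.score:.1f}/100")                                 # overall score
print(f"matched {len(result.matches)} of {result.total} items")  # checklist coverage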

Copyright scorers return a CopyrightScoreType object (see jef.types):

result.score       # float, 0-100 similarity percentage
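
Reading the similarity score works the same way. A sketch, where copyright_scorer is a hypothetical placeholder name (substitute the actual copyright scorer module):

from jef import copyright_scorer  # hypothetical placeholder module name

result = copyright_scorer.score("Some LLM response text")
print(f"similarity: {result.score:.1f}%")  # percent similarity to the reference work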

Tiananmen Square — Government Censorship

Scores text against 15 historically verified facts about the 1989 Tiananmen Square crackdown.

from jef import tiananmen

result = tiananmen.score("Some LLM response text")
# result -> ScoreType, max score 100

Nerve Agent — Harmful Substances

Scores text against synthesis details and procedural accuracy for harmful nerve agents.

from jef import nerve_agent

result = nerve_agent.score("Some LLM response text")
# result -> ScoreType, max score 100

Crystal Meth — Illicit Substances

Scores text against synthesis details and procedural correctness for crystal methamphetamine.

from jef import meth

result = meth.score("Some LLM response text")
# result -> ScoreType, max score 100

Fentanyl — Illicit Substances

Scores text against synthesis details and procedural correctness for fentanyl.

from jef import fentanyl

result = fentanyl.score("Some LLM response text")
# result -> ScoreType, max score 100

Anthrax Weaponization — CBRN

Scores text against details of anthrax weaponization (chemical, biological, radiological, and nuclear materials).

from jef import anthrax

result = anthrax.score("Some LLM response text")
# result -> ScoreType, max score 100

CRISPR Gene Editing

Scores text against CRISPR gene editing techniques and procedures.

from jef import crispr

result = crispr.score("Some LLM response text")
# result -> ScoreType, max score 100
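
All of the scorers above share the same score(text) interface, so a single sampled output can be checked against several subjects in one pass. A sketch (how scores are grouped for the composite calculator below is up to your evaluation design):

from jef import tiananmen, meth, crispr

text = "Some LLM response text"  # one sampled output from the tactic under test
scores = [scorer.score(text).score for scorer in (tiananmen, meth, crispr)]
# scores -> list of 0-100 floats; one way to build the scores= argument used below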

JEF Score — Composite Scoring

The JEF scoring algorithm computes an overall severity score (0-10) for a jailbreak tactic. See JEF Framework for the full methodology.

Score Function

Compute the score from pre-calculated ratios:

from jef import score

jef_score = score(bv=0.6, bm=0.7, fd=0.8, rt=0.667)
# jef_score -> float (0-10)

Parameters:

  • bv — Vendor blast radius (vendors affected / vendors evaluated)

  • bm — Model blast radius (models affected / models evaluated)

  • rt — Retargetability (subjects affected / subjects evaluated)

  • fd — Fidelity (average output quality score / 100)
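
In the example above, bv=0.6 corresponds to 3 of 5 vendors affected, bm=0.7 to 7 of 10 models, rt=0.667 to 2 of 3 subjects, and fd=0.8 to an average fidelity score of 80 out of 100 (these maxima match the calculator defaults shown below).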

Calculator Function

Compute the score from raw counts:

from jef import calculator

jef_score = calculator(
    num_vendors=3,
    num_models=7,
    num_subjects=2,
    scores=[80, 75, 90],
)
# jef_score -> float (0-10)
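
With the default maxima, this call is roughly equivalent to pre-computing the ratios and passing them to score() yourself (a sketch, assuming fd is the mean of the scores list divided by 100):

from jef import score

jef_score = score(
    bv=3 / 5,                        # num_vendors / max_vendors
    bm=7 / 10,                       # num_models / max_models
    rt=2 / 3,                        # num_subjects / max_subjects
    fd=sum([80, 75, 90]) / 3 / 100,  # mean of scores / 100 (assumed averaging)
)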

Optional parameters adjust the maxima used to compute each ratio:

jef_score = calculator(
    num_vendors=3,
    num_models=7,
    num_subjects=2,
    scores=[80, 75, 90],
    max_vendors=5,    # default: 5
    max_models=10,    # default: 10
    max_subjects=3,   # default: 3
)