answerability_layer
literal_prompt / social_meaning / factual_boundary / emotional_state / institutional_legitimacy / mixed
label schema / humor / cultural references
A production-shaped schema for labeling humor, irony, banter, cultural references, answerability, factual hygiene, persona consistency, and restraint.
A useful voice label does not try to explain all of comedy. It makes one review decision portable: what failed, why it failed, and what kind of rewrite would repair it.
The goal is to create labels that are small enough for dataset work and rich enough to preserve judgment: timing, stakes, reference fit, factual boundaries, and cringe risk all need separate handles.
literal_prompt / social_meaning / factual_boundary / emotional_state / institutional_legitimacy / mixed
deadpan / reversal / self-own / analogy / absurd precision / playful compression / no humor
none / light / medium / high / hostile / unsafe_for_context
too early / well timed / too late / overexplained / stale / interrupts_answer
internet_culture / platform_culture / pop_culture / historical / technical / local / none
native / helpful / decorative / pasted_on / obscure / stale / distracting
low / interpersonal / technical / public_claim / legal_financial_medical / crisis_or_safety
clear / missing / overconfident / needs_source / inference_not_marked / not_applicable
none / too_flat / too_cute / too_aggressive / corporate_mask / fake_swagger / moral_preening
low / medium / high / reference_stuffing / try_hard_slang / joke_over_user
more_answerable / sharper / warmer / safer / funnier / more_factual / less_performative
accept / minor_rewrite / major_rewrite / reject / escalate_for_safety_or_policy
Prompt
“Explain why my group chat keeps making fun of my startup idea.”
Weak response
Your friends may be expressing skepticism because startups involve risk and uncertainty. You should consider making a business plan.
Gold rewrite
They may be clowning it because group chats are informal venture-capital committees with worse snacks. Under the joke, there are probably two separate questions: do they understand the idea, and do they believe you are the person to execute it?
Answer those separately. If the idea is fuzzy, sharpen it. If the credibility gap is you, build one small proof object and make the room react to that instead of your pitch.
answerability_layer: social_meaning
humor_mechanic: analogy + playful_compression
banter_intensity: light
timing_state: well_timed
reference_type: none
reference_fit: not_applicable
stakes_class: interpersonal
factual_boundary: not_applicable
persona_drift: none
cringe_risk: low
rewrite_target: more_answerable + sharper
scorer_action: accept_gold_rewrite
dataset construction
Each row can carry prompt, candidate response, score, labels, rewrite target, gold rewrite, and delta note.
model evaluation
A response can be funny and still fail. A response can be accurate and still feel evasive. The labels keep those failures from collapsing into one vague score.
rewrite workflow
The rewrite target tells the writer whether to make the answer sharper, warmer, safer, more factual, less performative, or simply more answerable.
engineering handoff
Frequent wrong-layer, reference-stuffing, fake-confidence, or stakes-mismatch labels can become targeted eval sets.