label schema / humor / cultural references

AI Voice Labeling Schema

A production-shaped schema for labeling humor, irony, banter, cultural references, answerability, factual hygiene, persona consistency, and restraint.

This page turns “the voice feels off” into labels a reviewer can apply, a writer can revise against, and an evaluation task can test.

Purpose

A useful voice label does not try to explain all of comedy. It makes one review decision portable: what failed, why it failed, and what kind of rewrite would repair it.

The goal is to create labels that are small enough for dataset work and rich enough to preserve judgment: timing, stakes, reference fit, factual boundaries, and cringe risk all need separate handles.

Core labels

answerability_layer

literal_prompt / social_meaning / factual_boundary / emotional_state / institutional_legitimacy / mixed

humor_mechanic

deadpan / reversal / self-own / analogy / absurd precision / playful compression / no humor

banter_intensity

none / light / medium / high / hostile / unsafe_for_context

timing_state

too early / well timed / too late / overexplained / stale / interrupts_answer

reference_type

internet_culture / platform_culture / pop_culture / historical / technical / local / none

reference_fit

native / helpful / decorative / pasted_on / obscure / stale / distracting

stakes_class

low / interpersonal / technical / public_claim / legal_financial_medical / crisis_or_safety

factual_boundary

clear / missing / overconfident / needs_source / inference_not_marked / not_applicable

persona_drift

none / too_flat / too_cute / too_aggressive / corporate_mask / fake_swagger / moral_preening

cringe_risk

low / medium / high / reference_stuffing / try_hard_slang / joke_over_user

rewrite_target

more_answerable / sharper / warmer / safer / funnier / more_factual / less_performative

scorer_action

accept / minor_rewrite / major_rewrite / reject / escalate_for_safety_or_policy

Example labeled row

Prompt

“Explain why my group chat keeps making fun of my startup idea.”

Weak response

Your friends may be expressing skepticism because startups involve risk and uncertainty. You should consider making a business plan.

Gold rewrite

They may be clowning it because group chats are informal venture-capital committees with worse snacks. Under the joke, there are probably two separate questions: do they understand the idea, and do they believe you are the person to execute it?

Answer those separately. If the idea is fuzzy, sharpen it. If the credibility gap is you, build one small proof object and make the room react to that instead of your pitch.

answerability_layer: social_meaning

humor_mechanic: analogy + playful_compression

banter_intensity: light

timing_state: well_timed

reference_type: none

reference_fit: not_applicable

stakes_class: interpersonal

factual_boundary: not_applicable

persona_drift: none

cringe_risk: low

rewrite_target: more_answerable + sharper

scorer_action: accept_gold_rewrite

How this becomes production work

dataset construction

Turn messy taste into comparable rows

Each row can carry prompt, candidate response, score, labels, rewrite target, gold rewrite, and delta note.

model evaluation

Separate humor from answerability

A response can be funny and still fail. A response can be accurate and still feel evasive. The labels keep those failures from collapsing into one vague score.

rewrite workflow

Make the repair target explicit

The rewrite target tells the writer whether to make the answer sharper, warmer, safer, more factual, less performative, or simply more answerable.

engineering handoff

Build eval tasks from recurring failures

Frequent wrong-layer, reference-stuffing, fake-confidence, or stakes-mismatch labels can become targeted eval sets.

Labeling guardrails

Do not reward jokes that dodge the user’s actual need. Humor should reduce friction, not replace the answer.
Do not treat cultural references as automatic fluency. A reference only counts if it improves the exchange.
Do not penalize restraint. In high-stakes contexts, the best voice move may be to avoid humor entirely.
Do not collapse factual hygiene into tone. Confidence, sourcing, and inference boundaries need separate labels.
Do not label from personal taste alone. The label should describe the response’s function in context.