tkersey
Organizations: @grays

Pinned

  1. For future reference but maybe not.

    # 2025

    ## February

    * ## [Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs](https://www.emergent-misalignment.com)

      > We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this _emergent misalignment_. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned.
  2. algebra-to-co-monads.md

    # [fit] Algebra to

    # [fit] **(Co)monads**

    ---

    # **$$C^{BA} = (C^B)^A$$**

    ---
  3. resume Public

  4. dotfiles Public

    public dot files

    Shell 12
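The pinned `algebra-to-co-monads.md` deck centers on the exponential law $$C^{BA} = (C^B)^A$$, which in a cartesian closed category is the currying isomorphism. A minimal TypeScript sketch of the two directions of that isomorphism (the names `curry` and `uncurry` are illustrative, not taken from the deck):

```typescript
// Exponential law C^(B×A) ≅ (C^B)^A: a function out of a pair
// corresponds to a function returning a function, and vice versa.
type Uncurried<A, B, C> = (ab: [A, B]) => C;
type Curried<A, B, C> = (a: A) => (b: B) => C;

const curry = <A, B, C>(f: Uncurried<A, B, C>): Curried<A, B, C> =>
  (a) => (b) => f([a, b]);

const uncurry = <A, B, C>(g: Curried<A, B, C>): Uncurried<A, B, C> =>
  ([a, b]) => g(a)(b);

// Round trip: the two directions are mutually inverse.
const add: Uncurried<number, number, number> = ([a, b]) => a + b;
console.log(curry(add)(2)(3));            // 5
console.log(uncurry(curry(add))([2, 3])); // 5
```

The isomorphism is natural in all three objects, which is what makes currying a structural fact about the category of types rather than a feature of any one language.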