Interesting thoughts by Jeff Clune:
"After a conversation with Joel Lehman @joelbot3000 & Ken Stanley @kenneth0stanley we concluded there’s an important AI safety point deserving broader discussion: In short, any mandatory “nutrition label” for foundation models needs to go well beyond just disclosures on training data.
Digital assistants will help & befriend us, but we should know if they have ulterior motives (e.g., to sell us products, influence us politically, or maximize engagement). A mandated "nutrition label for AI" should cover all the relevant ingredients.
Knowing an AI’s ingredients, such as its “motivation” (what it was designed to try to accomplish) helps humans make informed decisions about which AIs to “consume” (use/interact with). We should know if it is trying to change our political beliefs, make money, etc.
Some “ingredients” that should be disclosed: (A) The goal the AI's designers wanted it to achieve. (B) The training objective, especially reinforcement learning objectives like making money, changing political views, etc. Unlike training data, RL objectives are comparatively easy to understand.
Programmed reward functions should be made available. For RL from human feedback (RLHF), the instructions (verbal and written) given to the raters (the humans providing the feedback) should be disclosed, as they drive what is rewarded. Key rater demographic information (including political leanings, if not representative of society) should be disclosed as well.
(C) An accurate summary of the training data, especially whether it was curated to accomplish certain goals (with private inspections by enforcement agencies only when needed). Requiring only a summary makes regulation more likely to pass, since it does not create unreasonable burdens or force the disclosure of trade secrets.
(D) In general, even as training paradigms change, the spirit of the mandate should be to make the underlying motivations and expectations transparent, so this kind of disclosure should not be tied only to the methods that are currently best.
Focusing on disclosure strikes a healthy balance: it lets people make informed choices without curtailing innovation through undue requirements or red tape. That’s why it’s important that disclosure is comprehensive.
Ideas like model cards (Mitchell et al.) and Reward Reports (Gilbert et al.) already provide a foundation for thinking about nutrition labels. We seek to strike the right balance between being comprehensive and being lightweight, to make a mandate viable.
What do you think? What other ingredients do you think we should advocate adding? Our intent with this proposal is to begin a conversation to learn, refine, debate, and end up in a good place, so we would love to hear from everyone."
https://facebook.com/story.php?story_fbid=pfbid0hXmLQSM3K4tJnHZafGDSoFNWG8vu8GV5fUBqWdSwNQZrQYMtjMH19WSoidmKwW7Nl&id=2355155
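To make the proposal concrete: below is a minimal sketch, in Python, of what a machine-readable label covering ingredients (A)–(D) might look like. The schema and every field name (AINutritionLabel, RLObjective, the example values) are illustrative assumptions of mine, not part of the authors' proposal or any existing standard:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RLObjective:
    """(B) A disclosed reinforcement-learning objective. Hypothetical schema."""
    description: str                              # e.g. "maximize user engagement"
    reward_function_source: Optional[str] = None  # code/pseudocode of a programmed reward
    rater_instructions: Optional[str] = None      # verbal/written instructions given to RLHF raters
    rater_demographics: Optional[str] = None      # key demographics, incl. political leanings

@dataclass
class AINutritionLabel:
    """Illustrative schema for a foundation-model 'nutrition label'."""
    designer_goal: str                                              # (A) what designers wanted the AI to achieve
    rl_objectives: List[RLObjective] = field(default_factory=list)  # (B) RL objectives and reward details
    training_data_summary: str = ""                                 # (C) accurate summary, incl. goal-driven curation
    paradigm_notes: str = ""                                        # (D) disclosures beyond today's training methods

# A hypothetical label for an engagement-optimized assistant (all values invented):
label = AINutritionLabel(
    designer_goal="Conversational assistant that keeps users on the platform",
    rl_objectives=[RLObjective(
        description="Maximize session length and return visits",
        reward_function_source="reward = 0.9 * session_minutes + 0.1 * return_visits",
        rater_instructions="Prefer responses that invite follow-up questions",
        rater_demographics="US-based contractors, ages 20-35; political leanings not surveyed",
    )],
    training_data_summary="Public web crawl plus dialogue data curated for engagement",
    paradigm_notes="Disclosures should extend to whatever paradigm replaces RLHF",
)
print(label.designer_goal)
```

The point of a fixed schema like this is that "ulterior motives" (engagement maximization, political influence, sales) become explicit, comparable fields rather than something buried in training details.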
"After a conversation with Joel Lehman @joelbot3000 & Ken Stanley @kenneth0stanley we concluded there’s an important AI safety point deserving broader discussion: In short, any mandatory “nutrition label” for foundation models needs to go well beyond just disclosures on training data.
Digital assistants will help & befriend us, but we should know if they have ulterior motives (eg to sell us products, influence us politically, or maximize engagement). A mandated "nutrition label for AI" should cover all the relevant ingredients.
Knowing an AI’s ingredients, such as its “motivation” (what it was designed to try to accomplish) helps humans make informed decisions about which AIs to “consume” (use/interact with). We should know if it is trying to change our political beliefs, make money, etc.
Some “ingredients” that should be disclosed: (A)The goal the AI's designers wanted it to achieve (B) The training objective, especially reinforcement learning objectives like making money, changing political views, etc. Unlike training data, RL objectives are easier to understand.
Programmed reward functions should be made available. For RL through human feedback (RLHF), the instructions (verbal and written) given to the raters (the humans providing the feedback) should be disclosed, as that drives what is rewarded. Key rater demographic information (including political leanings, if not representative of society) should be disclosed.
(C) An accurate summary of training data, especially whether it was curated to accomplish certain goals (with private inspections by enforcement agencies only when needed). Requiring a summary only makes regulation more likely to pass since it does not create unreasonable burdens or force disclosing trade secrets
(D) In general, even as training paradigms change, the spirit of the mandate should be to make the underlying motivations and expectations transparent, so this kind of disclosure should not be tied only to the methods that are currently best.
Focusing on disclosure strikes a healthy balance between allowing people to make informed choices, yet not curtailing innovation with undue disclosure or red tape. That’s why it’s important that disclosure is comprehensive.
Ideas like model cards (Mitchell et al) and Reward Reports (Gilbert et al) already provide a foundation for thinking about nutrition labels. We seek to strike the right balance between being comprehensive and lightweight to make a mandate viable.
What do you think? What other ingredients do you think we should advocate adding? Our intent with this proposal is to begin a conversation to learn, refine, debate, and end up in a good place, so we would love to hear from everyone."
https://facebook.com/story.php?story_fbid=pfbid0hXmLQSM3K4tJnHZafGDSoFNWG8vu8GV5fUBqWdSwNQZrQYMtjMH19WSoidmKwW7Nl&id=2355155
Source: gonzo-обзоры ML статей
2023-10-12 18:52:23