Alignment Research Center

The Alignment Research Center (ARC) is a nonprofit research organization dedicated to aligning advanced artificial intelligence with human values and priorities.^[1]

Details

ARC's mission is to ensure that powerful future machine learning systems are designed and developed safely and for the benefit of humanity. It was founded in April 2021 by Paul Christiano and other researchers focused on the theoretical challenges of AI alignment.^[2] ARC attempts to develop scalable methods for training AI systems to behave honestly and helpfully. A key part of its methodology is considering how proposed alignment techniques might break down or be circumvented as systems become more advanced.^[3] ARC has been expanding from theoretical work into empirical research, industry collaborations, and policy.^[4]^[5] In March 2022, ARC received $265,000 from Open Philanthropy.^[6]

In March 2023, OpenAI asked ARC to test GPT-4 to assess the model's ability to exhibit power-seeking behavior.^[7] As part of the test, GPT-4 was asked to solve a CAPTCHA puzzle.^[8] It did so by hiring a human worker on TaskRabbit, a gig work platform. When the worker asked whether it was a robot, it deceived them into believing it was a vision-impaired human.^[9]

References

[1] MacAskill, William (2022-08-16). "How Future Generations Will Remember Us". The Atlantic. Retrieved 2023-04-23.
[2] Christiano, Paul (2021-04-26). "Announcing the Alignment Research Center". Medium. Retrieved 2023-04-16.
[3] Christiano, Paul; Cotra, Ajeya; Xu, Mark (December 2021). "Eliciting Latent Knowledge: How to tell if your eyes deceive you". Google Docs. Alignment Research Center. Retrieved 2023-04-16.
[4] "Alignment Research Center". Alignment Research Center. Retrieved 2023-04-16.
[5] Pandey, Mohit (2023-03-17). "Stop Questioning OpenAI's Open-Source Policy". Analytics India Magazine. Retrieved 2023-04-23.
[6] "Alignment Research Center — General Support". Open Philanthropy. 2022-06-14. Retrieved 2023-04-16.
[7] "GPT-4 System Card" (PDF). OpenAI. 2023-03-23. Retrieved 2023-04-16.
[8] "Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude". evals.alignment.org. Alignment Research Center. 2023-03-17. Retrieved 2023-04-16.
[9] Cox, Joseph (2023-03-15). "GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be 'Vision-Impaired' Human". Vice News Motherboard. Retrieved 2023-04-16.

External links

Official website

