## Open Phil AI Fellowship — 2022 Class

Open Philanthropy recommended a total of approximately $1,840,000 over five years in PhD fellowship support to eleven promising machine learning researchers who together represent the 2022 class of the Open Phil AI Fellowship. This total is an estimate because of uncertainty around future tuition costs and currency exchange rates, and may be updated as costs are finalized. These fellows were selected for their academic excellence, technical knowledge, careful reasoning, and interest in making the long-term, large-scale impacts of AI a central focus of their research. This falls within our focus area of potential risks from advanced artificial intelligence.

We believe that progress in artificial intelligence may eventually lead to changes in human civilization that are as large as the agricultural or industrial revolutions; while we think it’s most likely that this would lead to significant improvements in human well-being, we also see significant risks. Open Phil AI Fellows have a broad mandate to think through which kinds of research are likely to be most valuable, to share ideas and form a community with like-minded students and professors, and ultimately to act in the way that they think is most likely to improve outcomes from progress in AI. The intent of the Open Phil AI Fellowship is both to support a small group of promising researchers and to foster a community with a culture of trust, debate, excitement, and intellectual excellence.

Fellows with an asterisk (*) next to their names are also Vitalik Buterin Postdoctoral Fellows — winners of grants from the Future of Life Institute (FLI). Open Philanthropy and FLI split funding equally for those fellows.

## The 2022 Class of Open Phil AI Fellows

#### Adam Gleave

Adam is a fifth-year PhD candidate in Computer Science at UC Berkeley. His research focuses on out-of-distribution robustness for deep RL, with a particular emphasis on value learning and multi-agent adversarial robustness.
To validate learned reward functions, Adam has developed the EPIC distance and interpretability methods. Adam has demonstrated the existence of adversarial policies in multi-agent environments — policies that cause an opponent to fail despite behaving seemingly randomly — and is currently investigating whether narrowly superhuman policies have similar failure modes. For more information, see his website.

#### Cassidy Laidlaw

Cassidy is a PhD student in computer science at UC Berkeley, advised by Stuart Russell and Anca Dragan. He is interested in developing scalable and robust methods for aligning AI systems with human values. His current work focuses on modeling uncertainty in reward learning, scaling methods for human-AI interaction, and adversarial robustness to unforeseen attacks. Prior to his PhD, Cassidy received bachelor’s degrees in computer science and mathematics from the University of Maryland, College Park. For more information, see his website.

#### Cynthia Chen*

Cynthia is an incoming PhD student at ETH Zurich, supervised by Prof. Andreas Krause. She is broadly interested in building AI systems that are aligned with human preferences, especially in situations where mistakes are costly and human signals are sparse. She aspires to develop AI solutions that can improve the world in the long run. Prior to ETH, Cynthia interned at the Center for Human-Compatible AI at UC Berkeley and graduated with honours from the University of Hong Kong. You can find out more about Cynthia’s research at her website.

#### Daniel Kunin

Daniel is a PhD student in the Institute for Computational and Mathematical Engineering at Stanford, co-advised by Surya Ganguli and Daniel Yamins. His research uses tools from physics to build theoretical models for the learning dynamics of neural networks trained with stochastic gradient descent.
Thus far, his research has focused on the effect of regularization on the geometry of the loss landscape, the identification of symmetries and conservation laws in the learning dynamics, and how ideas from thermodynamics can be used to understand the impact of stochasticity in SGD. His current research focuses on linking the dynamics of SGD to the implicit biases that shape the generalization and robustness properties of the trained network. To learn more about his research, visit his Google Scholar page.

#### Erik Jenner*

Erik is an incoming CS PhD student at UC Berkeley. He is interested in developing techniques for aligning AI with human values that could scale to arbitrarily powerful future AI systems. Erik has previously worked on reward learning with the Center for Human-Compatible AI, focusing on interpretability of reward models and on a better theoretical understanding of the structure of reward functions. He received a BSc in physics from the University of Heidelberg in 2020 and is currently finishing a Master’s in artificial intelligence at the University of Amsterdam. For more information about his research, see his website.

#### Johannes Treutlein*

Johannes is an incoming PhD student in Computer Science at UC Berkeley. He is broadly interested in empirical and theoretical research to ensure that AI systems remain safe and reliable with increasing capabilities. He is currently working on investigating learned optimization in machine learning models and on developing models whose objectives generalize robustly out of distribution. Previously, Johannes studied computer science and mathematics at the University of Toronto, the University of Oxford, and the Technical University of Berlin. For more information, visit his website.

#### Lauro Langosco

Lauro is a PhD student with David Krueger at the University of Cambridge. His primary research interest is AI alignment: the problem of building generally intelligent systems that do what their operator wants them to do.
He also investigates the science and theory of deep learning — that is, the study of how DL systems generalize and scale. Previously, Lauro interned at the Center for Human-Compatible AI in Berkeley and studied mathematics at ETH Zurich.

#### Maksym Andriushchenko

Maksym is a PhD student in Computer Science at École Polytechnique Fédérale de Lausanne (EPFL), advised by Nicolas Flammarion. His research focuses on making machine learning algorithms adversarially robust and improving their reliability. He is also interested in developing a better understanding of generalization in deep learning, including generalization under distribution shifts. He has co-developed RobustBench, an ongoing standardized robustness benchmark, and his research on robust image fingerprinting models has been applied for content authenticity purposes. Prior to EPFL, he worked with Matthias Hein at the University of Tübingen on adversarial robustness. To learn more about his research, visit his scholar page.

#### Qian Huang

Qian is a first-year PhD student in Computer Science at Stanford University, advised by Jure Leskovec and Percy Liang. She is broadly interested in aligning machine reasoning with human reasoning, especially for rationality, interpretability, and extensibility with new knowledge. Currently, she is excited about disentangling general reasoning ability from domain knowledge, particularly through the use of graph neural networks and foundation models. Qian received her B.A. in Computer Science and Mathematics from Cornell University. For more information, see her website.

#### Usman Anwar*

Usman is a PhD student at the University of Cambridge, where he is advised by David Krueger. His research interests span reinforcement learning, deep learning, and cooperative AI. Usman’s goal in AI research is to develop useful, versatile, and human-aligned AI systems that can learn from humans and each other.
His research focuses on identifying the factors that make it difficult to develop human-aligned AI systems, and on developing techniques to work around these factors. In particular, he is interested in exploring ways in which rich human preferences and desires can be adaptively communicated to AI agents, especially in complex scenarios such as multi-agent planning and settings with time-varying preferences, with the ultimate goal of both broadening the scope of tasks that AI agents can undertake and making AI agents more aligned and trustworthy. For publications and other details, please visit https://uzman-anwar.github.io

#### Zhijing Jin*

Zhijing is a PhD student in Computer Science at the Max Planck Institute, Germany, and ETH Zürich, Switzerland. She is co-supervised by Prof Bernhard Schoelkopf, Rada Mihalcea, Mrinmaya Sachan, and Ryan Cotterell. She is broadly interested in making natural language processing (NLP) systems better serve humanity. Specifically, she uses causal inference to improve the robustness and explainability of language models (as part of the “inner alignment” goal) and to make language models align with human values (as part of the “outer alignment” goal). Previously, Zhijing received her bachelor’s degree at the University of Hong Kong, during which she had visiting semesters at MIT and National Taiwan University. She was also a research intern at Amazon AI with Prof Zheng Zhang. For more information, see her website.

## Rethink Priorities — AI Governance Research (2022)

Open Philanthropy recommended a grant of $2,728,319 over two years to Rethink Priorities to expand its research on topics related to AI governance.

This follows our July 2021 support and falls within our focus area of potential risks from advanced artificial intelligence.

## Berkeley Existential Risk Initiative — David Krueger Collaboration

Open Philanthropy recommended a grant of $40,000 to the Berkeley Existential Risk Initiative to support its collaboration with Professor David Krueger. This falls within our focus area of potential risks from advanced artificial intelligence.

## Université de Montréal — Research Project on Artificial Intelligence

Open Philanthropy recommended a grant of CAN$266,200 ($210,552 at the time of conversion) to the Université de Montréal to support a research project investigating AI consciousness and moral patienthood. The research will be conducted in collaboration with Mila and the Future of Humanity Institute. This funding will support post-docs and students studying the topic, as well as publications and workshops. This falls within our focus area of potential risks from advanced artificial intelligence.

## Berkeley Existential Risk Initiative — MineRL BASALT Competition

Open Philanthropy recommended a grant of $70,000 to the Berkeley Existential Risk Initiative to support the MineRL BASALT competition. The competition asks participants to build AI systems that learn from human feedback within the Minecraft video game, with the intent that the competition will spur more interest in learning from human feedback, using feedback efficiently, and doing so in complex environments.

This falls within our focus area of potential risks from advanced artificial intelligence.

## Berkeley Existential Risk Initiative — AI Standards

Open Philanthropy recommended a grant of $300,000 to the Berkeley Existential Risk Initiative to support work on the development and implementation of AI safety standards that may reduce potential risks from advanced artificial intelligence. An additional grant to the Center for Long-Term Cybersecurity will support related work.

This follows our January 2020 support and falls within our focus area of potential risks from advanced artificial intelligence.

## Berkeley Existential Risk Initiative — CHAI Collaboration (2022)

Open Philanthropy recommended a grant of $1,126,160 to the Berkeley Existential Risk Initiative (BERI) to support continued work with the Center for Human-Compatible AI (CHAI) at UC Berkeley. BERI will use the funding to facilitate the creation of an in-house compute cluster for CHAI’s use, purchase compute resources, and hire a part-time system administrator to help manage the cluster.

This follows our November 2019 support and falls within our focus area of potential risks from advanced artificial intelligence.

## Alignment Research Center — General Support

Open Philanthropy recommended a grant of $265,000 to the Alignment Research Center (ARC) for general support. ARC focuses on developing strategies for AI alignment that can be adopted by industry today and scaled to future machine learning systems. This falls within our focus area of potential risks from advanced artificial intelligence.

## Center for Long-Term Cybersecurity — AI Standards

Open Philanthropy recommended a gift of $25,000 to the Center for Long-Term Cybersecurity (CLTC), via UC Berkeley, to support work by CLTC’s AI Security Initiative on the development and implementation of AI standards. An additional grant to the Berkeley Existential Risk Initiative will support related work.

This falls within our focus area of potential risks from advanced artificial intelligence.