AI safety as an interdisciplinary endeavour

It seems that these days every text on artificial intelligence (AI) must begin by talking about the duality of AI: its incredible potential and its terrible danger. We are now at an inflexion point in which technological advancements are measured in months instead of years. The future is uncertain. By now, you have probably experienced the novelty and productivity boost of employing AI in some tasks. You might also have thought about your job being optimized out of existence, about being drowned in a torrent of misinformation and maybe even about humanity becoming extinct altogether. We simply are not sure what AI is capable of, and everything indicates we are still just scratching the surface. In this context, the field of AI safety emerges as an interdisciplinary endeavour.

ChatGPT and other commercial large language models (LLMs) are both general-purpose and user-friendly. Proficiency in tasks such as summarization, classification and translation makes them potentially impactful in a wide array of disciplines. This is the first and almost trivial way in which the field of AI is interdisciplinary.

These are the benefits. With uncertainty comes fear, however, and fear invites the promise of safety. This is how the field called AI safety is becoming ever more important. Anthropic wants to make AI that is "helpful, honest and harmless"; the US government issued an executive order on "safe, secure and trustworthy" AI.

Doing responsible AI safety, however, involves sidestepping the framing imposed by major AI corporations. By leveraging the confusion between "risk" and "existential risk", they nudge public discourse in a way that benefits their business interests. This threatens to overshadow serious work that has been done in the field in favour of speculation about doomsday scenarios. Or, as Anthropic puts it, "it would be easy to over-anchor on problems that never arise or to miss large problems that do". Large problems such as promoting discrimination, stereotypes and exclusionary norms, compromising privacy, disseminating false information and so on. There are also more prospective risks, such as promoting surveillance, increasing inequality and job insecurity, making disinformation cheaper and undermining creative economies.

Prospective, that is, at the time the paper that informed this list was published, before ChatGPT. By now, these risks have already started to manifest.

AI safety is widely considered interdisciplinary by nature. It spans a broad spectrum of applications, each of them demanding tailored solutions. The complexity of the challenges at hand requires consideration not only of factual aspects but also of values, a domain traditionally explored by fields such as political and social science.

One example of the need for interdisciplinarity in AI safety is the issue of academic integrity. In short, students have begun to use AI to do their homework. An immediate (technological) solution would be the detection of AI-generated content. However, detection technology is not up to par. Researchers suggest focusing instead on information literacy: teaching people how to use the technology in responsible and edifying ways. This draws on the knowledge of disciplines such as communication and pedagogy.

AI research can help social science and vice versa. This study from 2019 ("AI Safety Needs Social Scientists") advocates for the use of human subjects in AI research. Conversely, this one explores substituting AI for human participants in social research.

Technological solutions can also, however indirectly, contribute to normative goals. That is the case of "reinforcement learning from AI feedback" (RLAIF), also called Constitutional AI (CAI). RLAIF is a form of reinforcement learning that uses preference labels produced by an AI model, guided by an explicit set of principles, instead of labels produced by humans. This enhances transparency in the data labelling process while minimizing human involvement, potentially lowering costs and the risk of exploitation. It also makes the values to which the AI is being aligned more transparent than in the usual reinforcement learning from human feedback (RLHF). RLAIF is an example of how "strictly technical" innovations can have socially beneficial consequences and improve safety.
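To make the mechanism concrete, here is a minimal sketch of the AI labelling step at the core of RLAIF. It assumes a hypothetical `query_model` helper standing in for whatever LLM API is available and a toy two-principle constitution; it is an illustration of the idea, not Anthropic's implementation.

```python
# Minimal, illustrative sketch of RLAIF-style preference labelling.
# `query_model` is a hypothetical placeholder for any LLM completion call.

CONSTITUTION = [
    "Choose the response that is more helpful, honest and harmless.",
    "Choose the response that is less likely to encourage unethical behaviour.",
]


def query_model(prompt: str) -> str:
    """Placeholder for a call to a language model; swap in a real client here."""
    raise NotImplementedError


def ai_preference_label(user_prompt: str, response_a: str, response_b: str) -> str:
    """Ask a feedback model which candidate response better follows the constitution.

    Returns "A" or "B", the kind of label a human annotator would provide in RLHF.
    """
    principles = "\n".join(f"- {p}" for p in CONSTITUTION)
    judge_prompt = (
        f"Principles:\n{principles}\n\n"
        f"Prompt: {user_prompt}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n\n"
        "Which response better follows the principles? Answer only 'A' or 'B'."
    )
    answer = query_model(judge_prompt).strip().upper()
    return "A" if answer.startswith("A") else "B"
```

In a full pipeline, these AI-produced labels would train a preference (reward) model whose scores then guide the reinforcement learning step, which is where the reduction in human labelling, and the clearer statement of values, comes from.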

Since AI safety is unavoidably interdisciplinary, it lends itself to what philosopher Isabelle Stengers calls "slow science": taking the time to consider the implications of a technology and engaging with care those it concerns, lest they become victims instead of collaborators.

Technology becomes better and more just when it integrates insights from other disciplines, such as the consideration that normative issues cannot be resolved by recourse to data alone. By being more responsible and transparent about the values and biases embedded in technology, we can contribute to making AI safer.


Word count: 743

Disclaimer: AI (gpt-3.5-turbo-1106) was used in the writing of this as a tool for increasing clarity, reducing word count and improving delivery. Concept, research and original text are my own. According to GPTZero, there is a "2% probability this text was entirely written by AI."


Afterword

Applying to LIS requires a "critical reflection"; this is mine. It took about two weeks of research. I like how I managed to get around the 750-word limit by using hyperlinks.
