A downloadable project

In our paper, we argue that only through a diffuse network of AIs of comparable power can society ensure AI safety in the long term. We first discuss how thinkers like Thomas Hobbes, Adam Smith, Richard Dawkins, and Thomas Schelling approached the problem of cooperation in society, and how their insights may help us design a world in which unaligned AI systems reach a human-aligned equilibrium when interacting with each other. Building on these ideas, we then reinterpret a theoretical model based on Ronald Coase's theory of the firm in the context of transformative AI, in which the number of advanced AI systems that will coexist, and what share of a given system each will control, is determined endogenously (and simulated in Python). The main intuition from our model is that friction between AI systems and the marginal benefits of expansion are the key variables policymakers can influence to keep the relative power of AI systems close to parity. Based on that, we tentatively propose a system of compliant watchman AI systems that could maintain a human-aligned equilibrium even in the presence of a significant number of unaligned AI systems, analogously to many social systems.
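The Coasean intuition behind the model can be sketched in a few lines of Python. This is a hypothetical toy, not the paper's actual simulation: it assumes each AI system absorbs additional tasks until the (diminishing) marginal benefit of controlling one more unit falls to the friction cost of organizing it internally, so higher friction yields smaller systems and hence more of them at comparable scale. The function names and parameter values are illustrative assumptions.

```python
# Toy Coase-style sketch (illustrative only; not the paper's model).
# A system grows its share of tasks until the marginal benefit of
# absorbing one more unit no longer exceeds the internal friction cost.

def equilibrium_share(marginal_benefit, friction, step=0.001):
    """Return the share at which expansion stops paying off."""
    share = 0.0
    while share < 1.0 and marginal_benefit(share) > friction:
        share += step
    return share

# Assumed diminishing marginal benefit of controlling a larger share.
mb = lambda s: 1.0 - s

for friction in (0.1, 0.5, 0.9):
    s_star = equilibrium_share(mb, friction)
    n_systems = max(1, round(1 / s_star))
    print(f"friction={friction}: share~{s_star:.2f}, "
          f"~{n_systems} comparably sized systems")
```

Under these toy assumptions, raising friction shrinks the equilibrium share each system can profitably control, which pushes the ecosystem toward many systems of comparable power rather than one dominant singleton.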

Authors (alphabetical order):

Gabriel de Almeida Prado
Henrique Dau
Kauã Victor
Matheus Hector
Tomás Aguirre
Vinícius Marcial

Download

Can the pin factory defeat the paperclip factory.pdf 877 kB