AI Safety Blog
MariChatmen: Teaching a Chatbot to Write Andalûh Without Becoming a Mascot
What I learnt while building an experimental written Andalûh assistant: spelling was the easy part; keeping the answer useful was the real problem.
Read Post
Consensus Through Debate
How can we align advanced AI systems with human values without falling into the trap of polarized "winner-takes-all" debates? A hybrid methodology for AI safety.
Read Post