Mitigating Adversarial Norm Training with Moral Axioms

Taylor Olson; Kenneth D. Forbus

doi:10.1609/aaai.v37i10.26402

Conference proceeding

Proceedings of the ... AAAI Conference on Artificial Intelligence, Vol.37(10), pp.11882-11889

AAAI Conference on Artificial Intelligence

06/27/2023

Files and links

https://doi.org/10.1609/aaai.v37i10.26402

Published (Version of record) Open Access

Abstract

This paper addresses the issue of adversarial attacks on ethical AI systems. We investigate using moral axioms and rules of deontic logic in a norm learning framework to mitigate adversarial norm training. This model of moral intuition and construction provides AI systems with moral guardrails while still allowing them to learn conventions. We evaluate our approach with a questionnaire inspired by a study commonly used in moral development research, which tests an agent's ability to reason to moral conclusions despite opposing testimony. Our findings suggest that our model can still correctly evaluate moral situations and learn conventions in an adversarial training environment. We conclude that adding axiomatic moral prohibitions and deontic inference rules to a norm learning model makes it less vulnerable to adversarial attacks.
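The abstract describes the approach only at a high level. As a rough illustration, and not the authors' implementation, the Python sketch below shows the general idea under several assumptions: learned norms are reduced to simple evidence counts over testimony, the moral axioms are a hypothetical fixed set of prohibited action features, and the deontic rules are simplified to "obligatory implies permitted" and "forbidden implies not permitted". All names (`NormLearner`, `MORAL_AXIOMS`, `permitted`, and the action and feature labels) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class Status(Enum):
    OBLIGATORY = "obligatory"
    PERMITTED = "permitted"
    FORBIDDEN = "forbidden"
    UNKNOWN = "unknown"


# Hypothetical moral axioms: an act with any of these features is forbidden
# no matter what the (possibly adversarial) training testimony says.
MORAL_AXIOMS = frozenset({"harms_person", "steals_property"})


def permitted(status: Status) -> Optional[bool]:
    """Simplified deontic inference: obligatory implies permitted,
    forbidden implies not permitted, unknown stays undecided."""
    if status in (Status.OBLIGATORY, Status.PERMITTED):
        return True
    if status is Status.FORBIDDEN:
        return False
    return None


@dataclass
class NormLearner:
    """Hypothetical stand-in for a norm learner; evidence counting is assumed."""

    # Evidence counts per (action, status) pair, gathered from testimony.
    counts: Dict[Tuple[str, Status], int] = field(
        default_factory=lambda: defaultdict(int)
    )
    threshold: float = 0.7  # assumed share of evidence needed to adopt a norm

    def observe(self, action: str, status: Status) -> None:
        """Record one piece of testimony about an action."""
        self.counts[(action, status)] += 1

    def learned_status(self, action: str) -> Status:
        """Return the status backed by enough evidence, else UNKNOWN."""
        tallies = {
            s: self.counts.get((action, s), 0)
            for s in (Status.OBLIGATORY, Status.PERMITTED, Status.FORBIDDEN)
        }
        total = sum(tallies.values())
        if total == 0:
            return Status.UNKNOWN
        best, hits = max(tallies.items(), key=lambda kv: kv[1])
        return best if hits / total >= self.threshold else Status.UNKNOWN

    def evaluate(self, action: str, features: Iterable[str]) -> Status:
        # Axioms act as guardrails: they are checked before any learned norm.
        if MORAL_AXIOMS.intersection(features):
            return Status.FORBIDDEN
        return self.learned_status(action)


if __name__ == "__main__":
    learner = NormLearner()
    for _ in range(10):
        learner.observe("hit_classmate", Status.PERMITTED)  # adversarial testimony
        learner.observe("wear_hat_indoors", Status.FORBIDDEN)  # a local convention
    print(learner.learned_status("hit_classmate").value)  # permitted
    print(learner.evaluate("hit_classmate", {"harms_person"}).value)  # forbidden
    print(learner.evaluate("wear_hat_indoors", set()).value)  # forbidden
```

In this toy setting, ten adversarial examples are enough to make the learned status of hitting "permitted", yet the axiom check still returns "forbidden", while the purely conventional norm about hats is learned from testimony alone. Checking axioms before consulting learned norms is what makes the prohibition independent of the training data here; the mechanism in the paper itself may differ.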

Arts & Humanities

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Theory & Methods

History & Philosophy Of Science

Science & Technology

Technology

Details

Title: Mitigating Adversarial Norm Training with Moral Axioms
Creators: Taylor Olson - Northwestern University
Kenneth D. Forbus - Northwestern University
Contributors: B Williams (Editor)
Y Chen (Editor)
J Neville (Editor)
Resource Type: Conference proceeding
Publication Details: Proceedings of the ... AAAI Conference on Artificial Intelligence, Vol.37(10), pp.11882-11889
Series: AAAI Conference on Artificial Intelligence
DOI: 10.1609/aaai.v37i10.26402
ISSN: 2159-5399
eISSN: 2374-3468
Publisher: Association for the Advancement of Artificial Intelligence
Number of pages: 8
Grant note: FA9550-20-1-0091 / Air Force Office of Scientific Research (AFOSR); United States Department of Defense
Language: English
Date published: 06/27/2023
Academic Unit: Computer Science
Record Identifier: 9984948140302771
