The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021) took place virtually last week, from Aug. 1-6, with numerous scholars presenting work on machine translation (MT), cybersecurity, and machine learning.
One particularly innovative cybersecurity presentation, from researchers at Pennsylvania State University and Yonsei University in Korea, concerned the models used to detect fake news and spam or phishing emails. The researchers created a machine learning-based framework to defend against “universal trigger”-based attacks. Such attacks allow malicious groups or individuals to “fool an indefinite number of inputs” using a phrase or set of words. In response, the researchers essentially developed a “honeypot” that makes it easier to identify bad actors.
“Attackers try to find these universal attack phrases, so we try to make it very attractive for them to find the phrases that we already set,” said Thai Le, a doctoral student of information sciences and technology at Penn State and the lead author on the paper. “We try to make the attacking job very easy for them, and then they fall into a trap.”
Currently, most defenses against such attacks are reactive: they respond after an attack has occurred rather than preemptively guarding against it. When these attacks target natural language processing (NLP) applications, they can degrade the applications' performance on tasks such as fake news detection or spam filtering, leading to an influx of fake news on social media or spam emails. The researchers developed a model, called DARCY, to bait attacks against NLP applications and then catch them, protecting the machine learning models during the training process.
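For readers who want a concrete picture of the bait-and-catch idea, here is a deliberately simplified Python sketch. It is not DARCY's implementation: the bait tokens, the function names (`contains_trapdoor`, `guarded_predict`, `toy_classifier`) and the plain keyword check are all hypothetical stand-ins chosen for illustration, whereas the researchers describe a machine learning-based framework.

```python
# Toy illustration of a "honeypot" defense. All names and tokens below are
# hypothetical; a simple keyword match stands in for a learned detector.

# Bait tokens the defender plants so that attackers searching for a
# universal trigger phrase are likely to find these ones.
TRAPDOORS = {"zoning", "tapping"}


def contains_trapdoor(text, trapdoors=TRAPDOORS):
    """Return True if any planted bait token appears in the text."""
    tokens = text.lower().split()
    return any(tok.strip(".,!?\"'") in trapdoors for tok in tokens)


def guarded_predict(text, classifier):
    """Flag inputs that carry bait tokens; otherwise classify as usual."""
    if contains_trapdoor(text):
        return "flagged_as_attack"
    return classifier(text)


def toy_classifier(text):
    """Stand-in spam filter, used only to make the example runnable."""
    return "spam" if "free money" in text.lower() else "not_spam"


print(guarded_predict("Claim your free money now", toy_classifier))
print(guarded_predict("Claim your free money now zoning tapping", toy_classifier))
```

In this sketch, an attacker who appends the planted phrase in the hope of flipping the filter's decision is caught by the guard instead, which is the trap Le describes.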
According to Le, the model significantly outperformed current means of detecting universal trigger attacks, with a true-positive rate of 99% and a false-positive rate of less than 2%.
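As a reminder of what those two figures measure, the short Python sketch below computes a true-positive rate (the share of real attacks that get flagged) and a false-positive rate (the share of benign inputs flagged by mistake). The function name `detection_rates` and the sample counts are invented for illustration; they are not the researchers' data.

```python
def detection_rates(labels, flagged):
    """Compute (true-positive rate, false-positive rate).

    labels:  True where the input really is an attack.
    flagged: True where the detector flagged the input.
    """
    pairs = list(zip(labels, flagged))
    tp = sum(1 for y, p in pairs if y and p)          # attacks caught
    fn = sum(1 for y, p in pairs if y and not p)      # attacks missed
    fp = sum(1 for y, p in pairs if not y and p)      # benign inputs flagged
    tn = sum(1 for y, p in pairs if not y and not p)  # benign inputs passed
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Invented example: 100 attacks (99 caught) and 100 benign inputs (1 flagged).
labels = [True] * 100 + [False] * 100
flagged = [True] * 99 + [False] * 1 + [True] * 1 + [False] * 99

tpr, fpr = detection_rates(labels, flagged)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

A detector is most useful when the first number is high and the second is low, since a high false-positive rate would mean legitimate messages being treated as attacks.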
“As far as we know, this is the first work that utilizes the concept of honeypot from the cybersecurity domain in defending textual neural network models against adversarial attacks,” said Dongwon Lee, a professor of information sciences and technology at Penn State and the principal investigator of the project.