Guiding Deep Learning System Testing Using Surprise Adequacy
Korea Adv Inst Sci & Technol, KOR.
Blekinge Institute of Technology, Faculty of Computing, Department of Software Engineering; Chalmers Univ, Dept Comp Sci & Engn, Gothenburg, Sweden; Blekinge Inst Technol, Dept Software Engn, Karlskrona, Sweden. ORCID iD: 0000-0002-5179-4205
Korea Adv Inst Sci & Technol, KOR.
2019 (English) In: International Conference on Software Engineering, IEEE, 2019, p. 1039-1049. Conference paper, Published paper (Refereed)
Abstract [en]

Deep Learning (DL) systems are rapidly being adopted in safety and security critical domains, urgently calling for ways to test their correctness and robustness. Testing of DL systems has traditionally relied on manual collection and labelling of data. Recently, a number of coverage criteria based on neuron activation values have been proposed. These criteria essentially count the number of neurons whose activation during the execution of a DL system satisfied certain properties, such as being above predefined thresholds. However, existing coverage criteria are not sufficiently fine grained to capture subtle behaviours exhibited by DL systems. Moreover, evaluations have focused on showing correlation between adversarial examples and proposed criteria rather than evaluating and guiding their use for actual testing of DL systems. We propose a novel test adequacy criterion for testing of DL systems, called Surprise Adequacy for Deep Learning Systems (SADL), which is based on the behaviour of DL systems with respect to their training data. We measure the surprise of an input as the difference in DL system's behaviour between the input and the training data (i.e., what was learnt during training), and subsequently develop this as an adequacy criterion: a good test input should be sufficiently but not overtly surprising compared to training data. Empirical evaluation using a range of DL systems from simple image classifiers to autonomous driving car platforms shows that systematic sampling of inputs based on their surprise can improve classification accuracy of DL systems against adversarial examples by up to 77.5% via retraining.
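
To make the notion of surprise concrete, the following is a minimal Python sketch (not part of the record): it scores an input's surprise as the distance from its activation trace to the nearest activation trace seen during training, then selects candidate inputs that are "sufficiently but not overtly surprising". The synthetic arrays, the nearest-neighbour distance, and the quantile thresholds are illustrative assumptions; the paper's concrete surprise measures and sampling strategy differ in detail.

import numpy as np

# Sketch only: synthetic stand-ins for real neuron activations at one layer.
rng = np.random.default_rng(0)
train_traces = rng.normal(0.0, 1.0, size=(1000, 32))  # training-time activation traces

def surprise(trace, train_traces):
    """Illustrative surprise score: distance to the nearest training-time trace."""
    return np.linalg.norm(train_traces - trace, axis=1).min()

# Score candidate test inputs and keep those whose surprise falls in a
# mid-to-high band, i.e. surprising but not extreme (thresholds are arbitrary).
candidates = rng.normal(0.5, 1.5, size=(200, 32))
scores = np.array([surprise(t, train_traces) for t in candidates])
lo, hi = np.quantile(scores, [0.5, 0.95])
selected = candidates[(scores >= lo) & (scores <= hi)]
print(f"selected {len(selected)} of {len(candidates)} candidate inputs for retraining")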

Place, publisher, year, edition, pages
IEEE, 2019. p. 1039-1049
Series
International Conference on Software Engineering, ISSN 0270-5257
Keywords [en]
Test Adequacy, Deep Learning Systems
National Category
Software Engineering; Other Computer and Information Science
Identifiers
URN: urn:nbn:se:bth-20385
DOI: 10.1109/ICSE.2019.00108
ISI: 000560373200090
ISBN: 978-1-7281-0869-8 (print)
OAI: oai:DiVA.org:bth-20385
DiVA, id: diva2:1465030
Conference
41st IEEE/ACM International Conference on Software Engineering (ICSE), MAY 25-31, 2019, Montreal, CANADA
Available from: 2020-09-08. Created: 2020-09-08. Last updated: 2023-06-30. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Authority records

Feldt, Robert
