The amount of spyware on the Internet is increasing rapidly, and it is usually hard for the average user to know whether a software application hosts spyware. This paper investigates the hypothesis that it is possible to detect from the End User License Agreement (EULA) whether the associated software hosts spyware. We generated a data set by collecting 100 applications with EULAs and classifying each EULA as either good or bad. We then conducted an experiment in which 15 popular data mining algorithms, in their default configurations, were applied to the data set. The results show that 13 algorithms perform significantly better than random guessing; we therefore conclude that the hypothesis can be accepted. Moreover, 2 of the algorithms also perform significantly better than the current state-of-the-art EULA analysis method. Based on these results, we present a novel tool that can be used to prevent the installation of spyware.
The spread of spyware has increased dramatically, and it is often hard for the user to know whether spyware will be installed along with a downloaded application. This study investigates whether it is possible to determine if an application contains spyware by applying data mining techniques to the application's end user license agreement.