Explain how/why data mining and web mining respectively pose a threat to both individual and group privacy. Why does it matter? List at least three privacy-enhancement technologies (PETs) that exist to protect users and how/whether they work.
Web mining and data mining issues:
Data mining and web mining each raise privacy issues. Data mining applies a collection of algorithms and methods to extract patterns from large data sets; web mining applies the same kind of techniques to web data such as page content, link structure, and usage logs. In both cases the behaviour recorded in the data is used to analyze the users behind it. Because the techniques operate directly on personal data, they can reveal facts about an individual, or about a group the individual belongs to, that were never knowingly disclosed, so safeguards are needed to protect users' privacy.
A key concern for both individuals and businesses is transactional data. During transactions, large amounts of personal data, preferences, and user habits are exposed and can be collected. Once mined, this data threatens the privacy of individuals and groups and puts the database that holds it at risk. It matters because a leak or misuse endangers people's sensitive data and also damages the reputation of the business.
Privacy-enhancing technologies (PETs) aim to reduce the use and exposure of personal data by protecting the underlying information.
Three PETs in use are:
1. Cryptographic algorithms:
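Cryptographic algorithms are used to encrypt data and to perform secure computations on it. With homomorphic encryption, operations are carried out directly on the encrypted data, and the decrypted result matches the result of performing the same operations on the original data. Data protected in this way can be transferred over the internet and processed by a third party without being revealed. The main operations supported are addition and multiplication. Partially homomorphic schemes support only one of the two, while fully homomorphic schemes support both with no restriction on how many operations are performed, at a significant computational cost.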
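As a rough illustration of computing on encrypted data, here is a minimal sketch of the Paillier scheme, which supports addition on ciphertexts. Paillier is chosen here only as an example; the answer above does not name a specific scheme. The primes `P` and `Q` and all function names are assumptions made for the demo, and the key size is far too small to be secure.

```python
import math
import secrets

# Toy primes for illustration only (assumption); real keys use primes of 1024+ bits.
P, Q = 1789, 1861

def keygen(p=P, q=Q):
    """Return the public modulus n and the private pair (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # this shortcut is valid because the generator is fixed to n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the sum of the plaintexts."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
c1, c2 = encrypt(n, 1200), encrypt(n, 345)
c_sum = add_encrypted(n, c1, c2)   # computed without ever decrypting c1 or c2
print(decrypt(n, priv, c_sum))     # 1545
```

The party that calls `add_encrypted` needs only the public value `n`, so it can add the two amounts without learning either of them. Paillier supports addition only, which makes it a partially homomorphic scheme.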
2. Data masking:
Data masking is used by many businesses to secure the sensitive fields of a data set. Obfuscation is a common masking approach in which sensitive values are replaced with misleading but realistic-looking ones. In other methods, identifier fields are removed or replaced with fictitious data.
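The masking methods described above can be sketched as follows. This is a minimal example under stated assumptions: the record layout, the field names (`name`, `email`, `card`), the key `SECRET_KEY`, and the helper names are all hypothetical and not taken from any particular product.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-change-me"  # hypothetical key; keep a real one out of source code

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable keyed token (same input gives same token)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]

def mask_card(number: str) -> str:
    """Hide every digit of a card number except the last four."""
    digits = [ch for ch in number if ch.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])

def mask_record(record: dict) -> dict:
    """Return a masked copy of the record; the original is left untouched."""
    masked = dict(record)
    masked.pop("name", None)                         # remove a direct identifier
    masked["email"] = pseudonymize(record["email"])  # replace with a fictitious token
    masked["card"] = mask_card(record["card"])       # partially obscure the value
    return masked

record = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "card": "4111 1111 1111 1111",
    "basket_total": 42.5,
}
print(mask_record(record))
```

Non-sensitive fields such as `basket_total` pass through unchanged, so the masked data set stays useful for analysis. Masking alone does not guarantee anonymity, since combinations of the remaining fields can sometimes still identify a person.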
3. AI algorithms:
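Some AI algorithms generate synthetic data, that is, artificial records that preserve the statistical properties of the real data set. Synthetic data is useful in testing scenarios, where even a third-party client can be given access without seeing real records. Federated learning is another approach. In this method the model is trained across many edge devices. Each device holds its own data and trains locally, and only the resulting model updates are sent to a central server, which aggregates them. Because the raw data never leaves the device, this also supports data minimization.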
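The federated learning idea can be sketched with a simple weighted-averaging loop, in the style of federated averaging. This is only a toy under stated assumptions: the model is a single weight `w` for y ≈ w * x, the three client data sets are made up, and the function names are hypothetical.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps for y ≈ w * x using one device's data only."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(global_w, client_datasets, rounds=50):
    """Server loop: send the weight out, collect local results, average by sample count."""
    for _ in range(rounds):
        # Each tuple is (locally trained weight, number of samples); no raw data is included.
        updates = [(local_update(global_w, d), len(d)) for d in client_datasets]
        total = sum(count for _, count in updates)
        global_w = sum(w * count for w, count in updates) / total
    return global_w

# Made-up data for three devices, all generated from y = 3 * x (assumption).
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = federated_average(0.0, clients)
print(round(w, 4))  # close to 3.0
```

The server in this sketch only ever handles weights and sample counts. In practice, model updates can still leak information about the training data, so federated learning is often combined with other protections.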