Big data is typically characterized by 3, 5, or 7 Vs. The 3 Vs stand for volume, variety and value (O’Leary, 2015). In other words, big data involves large volumes of data that originate from many different sources (such as smart devices, social media, weblogs, operational databases, flat files and so on) and that are likely to be useful to analyze.
The information that exists about different entities on the Internet is unevenly distributed. There is a lot of data on some organizations and people, while others remain largely anonymous. The less information there is about a person, the more privacy that person has (O’Leary, 2015). For big data, however, less data usually means that the inferences made by the algorithms analyzing it are biased (O’Leary, 2015).
Privacy allows individuals to ensure that there are information asymmetries over their data while big data allows organizations to reduce existing information asymmetries (O’Leary, 2015). Privacy involves the right to constrain people from learning certain facts about you while big data allows organizations to learn more about you. Essentially, big data and privacy are opposites.
The harms that can originate from big data result from an attempt to use that data for useful purposes. Some examples of this phenomenon are shown in the table below:
Table 1: Examples of Harms vs. Benefits of Big Data (Agnellutti, 2014)
| Type of issue | Positive side | Negative side |
|---|---|---|
| Invasion of private communications | Social networking: social and political participation on a very large scale | Very low barriers to interception |
| Public disclosure of inferred private facts | Analytics lead to better and timelier treatments in the health sphere, ads that you might be interested in, etc. | Analytics can infer personal facts from harmless input data |
| Tracking, stalking | Location can be used for navigation, finding better routes, finding nearby friends, avoiding natural hazards, etc. | A burglar can use the information to enter your home while you are out |
Privacy protection has yet to catch up with the ongoing technological changes. To this end, the US President’s Council of Advisors on Science and Technology (PCAST) has issued the following recommendations on protecting privacy in the age of big data (Agnellutti, 2014):
- Policy should focus on the uses of big data rather than on its collection and analysis (the harm is typically caused neither by the data nor by the program that works with it, but by the way that data is used with that particular program)
- Policies and regulations should not embed technological solutions but should state intended outcomes (because technology and technological situations change rapidly, policies should focus on the purpose of the big data and its analysis)
- U.S. research is needed into privacy-related technologies, the legal aspects of privacy and the social mechanisms for preserving privacy
- Education and training in privacy protection is necessary, and the technology industry should involve digital privacy experts in fields such as software development and management
- The USA needs to lead this privacy movement through privacy-enabling cloud services, standards and so on
With big data, anonymization becomes harder, and people can be re-identified. Arguably, the current problem with privacy is that the burden rests on users, who must give consent to each piece of software or hardware they use. To remedy this, PCAST has proposed the idea of “privacy protection profiles.” Users choose a profile from a list of possible ones, and the service/app provider vets apps against the chosen profile (US White House, 2014). The idea is that privacy protection would rest more in the hands of the service/app provider.
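The vetting step behind privacy protection profiles can be pictured with a short sketch. This is only an illustration under stated assumptions: the PCAST report does not prescribe a data model, so the `PrivacyProfile` and `App` classes, the data category names and the `violations` and `vet` functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyProfile:
    """Hypothetical profile: the data categories a user allows apps to access."""
    name: str
    allowed_data: frozenset


@dataclass(frozen=True)
class App:
    """Hypothetical app manifest: the data categories the app asks for."""
    name: str
    requested_data: frozenset


def violations(app: App, profile: PrivacyProfile) -> frozenset:
    """Return the requested categories that the chosen profile does not allow."""
    return app.requested_data - profile.allowed_data


def vet(app: App, profile: PrivacyProfile) -> bool:
    """An app passes vetting only if it requests nothing outside the profile."""
    return not violations(app, profile)


# Illustrative profiles and apps; every name here is made up for the sketch.
STRICT = PrivacyProfile("strict", frozenset({"diagnostics"}))
BALANCED = PrivacyProfile("balanced", frozenset({"diagnostics", "location"}))

MAPS_APP = App("maps", frozenset({"diagnostics", "location"}))
FLASHLIGHT_APP = App("flashlight", frozenset({"contacts", "location"}))

if __name__ == "__main__":
    for app in (MAPS_APP, FLASHLIGHT_APP):
        for profile in (STRICT, BALANCED):
            status = "ok" if vet(app, profile) else "rejected"
            print(app.name, profile.name, status, sorted(violations(app, profile)))
```

In this sketch the user makes one decision (picking a profile) and the provider runs the check for every app, which mirrors the shift of responsibility described above.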
Privacy needs in the IoT
Who needs to ensure privacy in the IoT?
The stakeholders that need to ensure that privacy is maintained do not end with the IoT device manufacturers. Entities that are responsible for maintaining privacy include the actual IoT device user, the government which enacts policies, standards, and regulations that guide the IoT, the providers of the cloud services and platforms of the device and any potential third-party app developers (Perera et al., 2015).
User consent is typically requested through privacy terms and policies, which are essentially long passages of text. This has proved inefficient: developers may provide inaccurate information, users may not have the technical knowledge to understand exactly what they are consenting to, or users may not have the time to read everything (Perera et al., 2015). This method of acquiring consent needs to be improved to ensure greater privacy. Principles from human-computer interaction and the cognitive sciences should be used to arrive at a more efficient and effective way of requesting consent, one that takes into account users’ limited time and technical skills (Perera et al., 2015).
Data control needs to be enhanced in the IoT field. Currently, IoT solutions give users only limited control (Perera et al., 2015). To remedy this, the owners of the data generated by the devices need to be able to move the data elsewhere or delete specific data. They also need to be able to choose what kind of data they share with each service provider and with what rights, and to revoke or modify their consent at any time (Perera et al., 2015).
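The data-control requirements above (per-provider sharing rights, revocable consent, deleting and moving data) can be sketched as a small in-memory registry. This is a minimal illustration, not an implementation from Perera et al.: the `ConsentRegistry` class, its method names, and the provider and right names are all assumptions made for the example.

```python
from collections import defaultdict


class ConsentRegistry:
    """Hypothetical owner-side record of collected data and granted rights."""

    def __init__(self):
        # provider -> {data_type: set of rights, e.g. {"read", "share"}}
        self._grants = defaultdict(dict)
        # collected readings, stored as (data_type, value) pairs
        self._records = []

    def grant(self, provider, data_type, rights):
        """Grant or modify the rights a provider has over one data type."""
        self._grants[provider][data_type] = set(rights)

    def revoke(self, provider, data_type):
        """Withdraw consent for one data type; safe to call at any time."""
        self._grants.get(provider, {}).pop(data_type, None)

    def is_allowed(self, provider, data_type, right):
        """Check whether a provider currently holds a given right."""
        return right in self._grants.get(provider, {}).get(data_type, set())

    def add_record(self, data_type, value):
        self._records.append((data_type, value))

    def delete_records(self, data_type):
        """Delete all records of one data type and return how many were removed."""
        before = len(self._records)
        self._records = [r for r in self._records if r[0] != data_type]
        return before - len(self._records)

    def export_records(self):
        """Return a copy of all records so the owner can move them elsewhere."""
        return list(self._records)


if __name__ == "__main__":
    registry = ConsentRegistry()
    registry.grant("cloud-provider", "temperature", {"read"})
    registry.add_record("temperature", 21.5)
    registry.add_record("location", "home")
    print(registry.is_allowed("cloud-provider", "temperature", "read"))
    registry.revoke("cloud-provider", "temperature")
    print(registry.is_allowed("cloud-provider", "temperature", "read"))
```

A real deployment would also have to enforce these checks on the provider side and persist them securely; the sketch only shows the shape of the controls the data owner would need.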
Contemporary IoT solutions are typically not anonymized, and the user’s location can be revealed with ease (Perera et al., 2015). To remedy this, new IoT solutions should adopt technologies such as Tor to ensure anonymity (and thus privacy) (Perera et al., 2015).
All of the stakeholders mentioned above need to ensure that devices and the software connected with them are secure. As with any software, there are many security principles to observe, such as automatically updating and patching the software to address newly found vulnerabilities. In IoT devices, users should also be able to disable individual hardware components whenever they want (Perera et al., 2015). This would give them better control over their privacy. For example, they would be able to disable security cameras whenever they are at home, which would mitigate potential privacy leaks due to stolen camera footage.
Privacy Implications of Big Data Mining of Social Media
Analyzing the data available in different social media can also have a negative impact on its users.
Analyzing user-generated data from Facebook, such as a user’s likes, posts, tags, groups and friends, can give accurate predictions of the user’s personality (Mansour, 2016, p. 349), which is an obvious privacy risk.
Furthermore, the user’s Facebook likes alone (which are usually publicly available) may be sufficient to predict a variety of personal information beyond personality, such as the user’s sexual orientation, ethnicity, religion, political orientation, IQ and drug use (Mansour, 2016, p. 349).
It may also be possible to determine the user’s social strategies and social motivation by analyzing social networking patterns such as the connectivity patterns of users over time. For example, the connectivity patterns of users over time only differ substantially from their normal rate when a user is creating connections with the purposeful intention to seek social capital (Mansour, 2016, p. 349).
Thus, big data can have a negative impact on users as well. The potential of big data in social media “could be used to intelligently craft tools and traps to manipulate users of social networking websites, or indeed, other websites” (Mansour, 2016, p. 350).
Social media also carries security risks. Attackers rely on various forms of technological and psychological manipulation. They place imitations of the real navigational buttons of social media websites’ GUIs, launch bots (fake profiles with a fake history, wall and pictures) that carry out information attacks, and so on (Mansour, 2016, pp. 350-351).
Big data is here, and it brings both positive and negative effects. It has many faces and uses. To combat the negative effects, all involved stakeholders must focus not only on the benefits that they can reap from it but also on finding ways of using, storing and analyzing that data that will not lead to security issues.
Mansour, R. (2016). Understanding how big data leads to social networking vulnerability. Computers in Human Behavior, 57, pp.348-351.
O’Leary, D. (2015). Big Data and Privacy: Emerging Issues. IEEE Intelligent Systems, 30(6), pp.92-96.
Agnellutti, C. (2014). Big Data: An Exploration of Opportunities, Values, and Privacy Issues. Nova Science Publishers Incorporated.
Perera, C., Ranjan, R., Wang, L., Khan, S. and Zomaya, A. (2015). Big Data Privacy in the Internet of Things Era. IT Professional, 17(3), pp.32-39.
US White House. (2014). Report to the President: Big Data and Privacy: A Technological Perspective. [online] Available at: https://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_big_data_and_privacy_-_may_2014.pdf [Accessed 12 Apr. 2016].