Tuesday, 23 June 2020

i4C Blogathon - First Runner Up Prize Winning Entry by Mr. Saurabh Gupta | Theme - Cyber Security and Data Protection

Innovative Cyber Security and Data Protection Practices for the Digitally Driven World.

First Runner Up Prize Winning Entry  |  Cyber Security and Data Protection 

Mr. Saurabh Gupta
PhD, Comp/IT
Indraprastha Institute of Information Technology, Delhi

A digitally driven world is a place where people and services understand the value of the data they generate and the innovative techniques needed to use it efficiently. As the common saying goes, data is the new oil. It is therefore hardly a surprise that most organizations today focus on collecting user data, be it purchase or browsing history, mobile trace information, or even medical records. Those that cannot generate this data tend to acquire it from elsewhere. However, when personally identifiable information is involved in a transaction between two entities, the threat of data theft and leaks becomes highly significant. In this blog, I will discuss how we as consumers, knowingly or unknowingly, are exposed to cyber-attacks, and what defenses exist, with a focus on data privacy and its protection.

The State of Legislation 

Several major countries do not currently have a privacy protection law in place to prevent misuse and mishandling of personal data. As a result, proper security measures, which are usually expensive to implement, are often a second priority (well behind usability) for profit-driven organizations. This is one of the main reasons why we encounter countless stories of companies suffering data breaches that end up leaking information about millions of users [1]. This carelessness leaves a large number of vulnerabilities in applications, sitting there like ticking time bombs. Recently, Aarogya Setu, an app launched by the Indian government, was found to have privacy flaws: it lacked user input validation and could be exploited to learn whether your neighbor has the coronavirus [2]. Similarly, a Twitter user found that the Delhi Traffic Police website leaks the mobile numbers of challaned vehicle owners [3].
There are several other examples where servers leak sensitive data without any authentication or authorization, fueled by the lack of accountability and legislation. In countries like India and the US, despite the lack of a dedicated privacy protection law, data protection and privacy are promoted through other existing instruments such as the UN's Universal Declaration of Human Rights, which recognizes privacy as a fundamental right [4]. India released the first draft of its data protection law in 2018, and it is currently under review.

Cyber Security for Breakfast

Now I’ll talk about how cyber security plays a role in our day-to-day life. From the moment we wake up in the morning until we go to sleep, we use a lot of technology that can pose a threat or carry a privacy flaw. For the sake of simplicity, I’ll use 'Saurabh' as the name of our example user. Saurabh is concerned about his health, so he wakes up and goes for a walk with a fitness tracker on his arm. After returning from the walk, he unlocks his phone with a fingerprint or face recognition sensor and checks his social media. Then he pours milk into his bowl of cereal, which he bought online using his browser. After breakfast, he turns on the burglar alarm and a security camera, locks his house, and leaves. He travels by cab with location services turned on and accesses his company’s infrastructure using a VPN. Once Saurabh enters his organization, he trusts its infrastructure to protect his privacy. After work, he comes back home, plays video games, and surfs social media for recreation.

In a typical day in Saurabh’s life, a lot happens behind the scenes just to ensure that his privacy is protected at all times. All the services mentioned above use at least two of the five pillars of cyber security, viz. confidentiality, integrity, availability, authorization, and authentication. The fitness tracker uses lightweight authentication to prevent unauthorized access to health data [5]. Mobile companies like Apple and Google claim to store only a representation of your fingerprint and your face, so your images cannot be used for malicious purposes [6]. While online, whether accessing social media platforms or buying something from online shops, the browser uses the Transport Layer Security (TLS) protocol [7] to make sure that the right packets reach it, unchanged and ‘un-sniffed’. A VPN uses protocols like IPSec or SSL, along with white/black-listing, to protect valuable assets inside the organization [8]. Video games use many of the same protections for their users, but they pose different types of cyber threats, like cyber bullying or trolling [9]. Overall, a holistic viewpoint is necessary, as you are only as secure as your weakest link.
As can be seen from the above example, there are various points where our privacy may get compromised. It can be daunting to know that there are several adversaries out there, trying hard to find new ways to harm internet users. But there is a yang for every yin, a hero for every villain. Cyber security researchers and ethical hackers keep coming up with new techniques to improve protection against known threats. Although the scope of discussion on cyber security is huge, as promised in the beginning, I will now discuss the data privacy protection landscape with some examples borrowed from the community and a few ideas of my own.

The Types of Privacy Needs

From an abstract viewpoint, privacy requirements can be divided into three cases:
i) When data is stored and controlled by a central authority using central/distributed servers. Social media platforms like Facebook & Twitter, or services like Uber & Paytm, have a large number of users accessing their services. These companies are required to protect an individual’s data, as they collect a lot of sensitive information like phone numbers, card information for payments, personal chats, and so on. Protecting data on central/distributed servers has been studied for decades, and various solutions already exist. Some examples are: nudging users to choose a strong password [10], storing passwords in a hashed or encrypted format rather than as plain text [11], encrypting data before storing it in databases or on servers, having a firewall with a white/black-list of IP addresses for access control [12], and so on. The majority of current infrastructure is protected using these techniques, as such systems are the most commonplace today.
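As an illustration of the password-storage point, one widely used approach is to keep only a salted, deliberately slow hash of each password instead of the password itself. The following is a minimal sketch using Python's standard library; the function names and the iteration count are assumptions made for this example, not a recommendation tuned for any particular system.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; real deployments tune this value.
ITERATIONS = 100_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for a new password using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


# Only the salt and digest are stored; the plain-text password is never kept.
salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("password123", salt, stored))
```

Because the salt differs per user, two users with the same password end up with different stored digests, which makes precomputed lookup tables far less useful to an attacker who steals the database.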
ii) When data is being shared for analysis and to generate statistics about a population. For meaningful policy-making, statistics about a population are often essential to ensure that services meant to help actually reach the right people. Behaviour modeling to deliver relevant ads to users, or understanding demographics from census data, are some examples. This leads to two possible scenarios: first, the data is collected by government agencies and needs to be shared with third parties to provide various services to users; second, the data owners want to analyze the data themselves without leaking any private information. Anonymization techniques like k-anonymity [13], l-diversity [14], t-closeness [15], and their variations are used in the former, and differential privacy [16] (or homomorphic encryption [17], hardly used in practice as of now) is used to achieve the latter.
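To make k-anonymity concrete: a table is k-anonymous when every combination of quasi-identifier values (attributes such as ZIP code or age band that could be matched against outside data) is shared by at least k records. Below is a minimal sketch that measures k for a small table; the records, column names, and function name are invented purely for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values for all quasi-identifier columns."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical, already-generalized records (ZIP truncated, age bucketed).
records = [
    {"zip": "1100**", "age": "20-29", "condition": "flu"},
    {"zip": "1100**", "age": "20-29", "condition": "asthma"},
    {"zip": "1100**", "age": "30-39", "condition": "diabetes"},
    {"zip": "1100**", "age": "30-39", "condition": "flu"},
    {"zip": "1100**", "age": "30-39", "condition": "flu"},
]

k = k_anonymity(records, ["zip", "age"])
print(k)  # 2: the smallest group (age 20-29) contains two records
```

A higher k means each person hides in a larger crowd, but it says nothing about how varied the sensitive values inside each group are, which is the gap that l-diversity and t-closeness try to address.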
However, anonymization is vulnerable to attacks and has been shown to be insufficient to protect privacy. Using inference and cross-referencing attacks, researchers successfully exposed information about users present in the anonymized Netflix Prize dataset [18]. Another group of researchers revealed the health records of the Massachusetts governor using anonymized public health records [19]. Researchers have also shown that 87 percent of all Americans could be uniquely identified using only three pieces of information: ZIP code, birth date, and gender [20]. In a similar study of my own, I was able to start from a tweet posted during the elections and cross-link it with information in the electoral rolls to reveal a lot of PII about Twitter users, including their age, family, address, and voter IDs (the work is under review and is therefore not public yet). The figure below shows an example of the cross-linking:

A Twitter user NaXXXa XXXe (name censored) tweeted about their vote, thereby revealing their preference for a party and losing their voter privacy [21]. The Twitter display name is successfully linked to their entry in the electoral rolls. I have censored their voter ID, husband's name, and house number from the results for ethical reasons.
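The general idea behind such cross-referencing can be sketched in a few lines: join an "anonymized" table with a public one on shared quasi-identifiers, and keep the rows that match exactly one person. This is not the method used in the study above; it is a toy illustration, and every record, name, and field below is made up.

```python
def link(anonymized, public, keys):
    """Return (name, anonymized_row) pairs for rows whose quasi-identifiers
    match exactly one entry in the public dataset."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)

    matches = []
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match means re-identification
            matches.append((candidates[0]["name"], row))
    return matches


# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "02138", "birth_date": "1970-01-01", "gender": "M", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1985-05-05", "gender": "F", "diagnosis": "asthma"},
]

# Hypothetical public list (for example, a voter roll) that includes names.
public = [
    {"name": "Person A", "zip": "02138", "birth_date": "1970-01-01", "gender": "M"},
    {"name": "Person B", "zip": "02139", "birth_date": "1985-05-05", "gender": "F"},
    {"name": "Person C", "zip": "02139", "birth_date": "1985-05-05", "gender": "F"},
]

reidentified = link(anonymized, public, ["zip", "birth_date", "gender"])
for name, row in reidentified:
    print(name, "->", row["diagnosis"])
```

Here only Person A is re-identified, because Persons B and C share the same quasi-identifiers and therefore shield each other. That is exactly the protection k-anonymity aims for, and it disappears as soon as one more distinguishing attribute becomes available.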
Due to the lack of protection offered by anonymization, researchers have shifted towards using differential privacy for analysis [22], and have created frameworks to use it efficiently with data [23]. Generative models are also being used to generate synthetic samples, creating a separate dataset altogether that has the same distribution as the original dataset but is less likely to reveal any private information [24]. The generated dataset, with its 'fake' samples, can then be consumed to generate statistics about a population.
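For intuition, the simplest differentially private building block is the Laplace mechanism: answer a counting query with the true count plus random noise whose scale is 1/epsilon, since adding or removing one person changes a count by at most 1. The sketch below uses only Python's standard library; the data, the function names, and the choice of epsilon are assumptions for illustration, and it is not how any specific framework cited here is implemented.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two
    independent exponential variables with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values satisfying `predicate`.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for value in values if predicate(value))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical ages; the true number of people aged 40 or above is 4.
ages = [23, 35, 41, 29, 62, 57, 33, 48]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
print(noisy)
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives more accurate answers at the cost of weaker guarantees.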
iii) When data is stored and controlled by decentralized blockchain-based platforms, where the ownership of data remains with the user. Amidst all the privacy breaches we witness, one might feel that there is no perfect system that can protect your privacy. Enter blockchains. Many describe blockchains as unhackable, on the grounds that their decentralized nature and cryptographic algorithms make recorded data very hard to tamper with. However, there are several trade-offs when it comes to using blockchains to replace the systems currently in place. For one, they are slow, and with no central authority, accountability is lacking. Even so, blockchain-based frameworks do exist, especially in the healthcare domain, where the data is highly sensitive and privacy preservation is the utmost priority.
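The tamper-resistance claim rests largely on hash chaining: each block stores the hash of the previous block, so changing any earlier record breaks every link after it. The sketch below shows only that one property on a single machine. It deliberately omits consensus, networking, and signatures, and all names in it are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that links back to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Check that every block's hash and back-link are consistent."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for record in ["record A", "record B", "record C"]:
    append_block(chain, record)
print(is_valid(chain))  # True

chain[1]["data"] = "tampered"  # modify an earlier record in place
print(is_valid(chain))  # False: the stored hash no longer matches
```

In a real blockchain, many independent nodes hold copies of the chain and must agree on new blocks, which is what makes quietly rewriting history impractical, and also what makes these systems slow.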
MedRec [25] and MeDShare [26] are two such blockchain-based platforms that can be used in healthcare to provide a secure flow of data among the stakeholders. However, the maturity of blockchain technology is still debated, and such platforms are likely at least five years away from actual adoption. To make such architectures more usable and generic, I am working on a data-sharing platform for secure and consensual data movement among several entities. Suppose Saurabh goes to a hospital H for regular checkups, to a food court F to eat, and to a gym G to work out. Currently, the three entities, H, F, and G, do not share data, and Saurabh’s subscription to each of them is independent. The platform will allow these entities to access data from other entities, with Saurabh’s consent, to provide him with better services. For example, suppose Saurabh’s health report says he has high cholesterol. When Saurabh goes to order food, F can look at his health records and suggest a low-cholesterol meal. Similarly, when he goes to G, G can look at data from H and F and plan his workout accordingly.

Data sharing platform for secure and consensual data movement between a user and the several entities he is subscribed to
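The consent idea in the H, F, and G example can be illustrated with a very small access check: an entity may read another entity's record about the user only if the user has granted that specific pairing. This is not the design of the platform described above, which is not public. It is a hypothetical sketch for a single user, and every class, function, and field name in it is invented.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Tracks which (owner, requester) pairs the user has approved."""
    grants: set = field(default_factory=set)

    def grant(self, owner: str, requester: str) -> None:
        self.grants.add((owner, requester))

    def revoke(self, owner: str, requester: str) -> None:
        self.grants.discard((owner, requester))

    def allowed(self, owner: str, requester: str) -> bool:
        return (owner, requester) in self.grants


@dataclass
class Entity:
    """A service (hospital, food court, gym) holding records about the user."""
    name: str
    records: dict = field(default_factory=dict)


def request_data(registry, requester, owner, key):
    """Return the owner's record only if the user consented to this pairing."""
    if not registry.allowed(owner.name, requester.name):
        raise PermissionError(f"{requester.name} has no consent to read from {owner.name}")
    return owner.records.get(key)


hospital = Entity("H", {"cholesterol": "high"})
food_court = Entity("F")
gym = Entity("G")

consent = ConsentRegistry()
consent.grant("H", "F")  # the user lets the food court read hospital data

print(request_data(consent, food_court, hospital, "cholesterol"))  # high
try:
    request_data(consent, gym, hospital, "cholesterol")
except PermissionError as error:
    print(error)
```

Keeping consent as explicit, revocable pairs means the user can later withdraw the food court's access without affecting any other entity.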

Conclusion

From journalists to politicians, no one is safe with a piece of modern technology in their hands. Even WhatsApp, a highly secure app with more than a billion users, was hacked using spyware named Pegasus [27]. Despite vast knowledge about known threats, zero-day vulnerabilities are still being exploited to break into highly secure systems. However, as the attack landscape broadens, innovative defenses are also being invented. For instance, Pan-European Privacy-Preserving Proximity Tracing [28] was developed to protect individual privacy while still tracing contacts during the coronavirus pandemic.
In theory and research, more privacy-enhancing technologies are being invented as we speak. To conclude, I believe there is a need to build systems that intrinsically use privacy-enhancing technologies from research and make them accessible to non-technical users. The gap between theory and practice needs to be reduced. While the research continues at large, Saurabh should be careful about how much data he shares, and with whom.

[1] "8 biggest data leaks of 2019 that hit Indian users hard - What ...." 17 Dec. 2019, https://economictimes.indiatimes.com/industry/tech/8-biggest-data-leaks-of-2019-that-hit-indian-users-hard/what-causes-data-breach/slideshow/72839190.cms. Accessed 25 May 2020.
[2] "Hacker Elliot Alderson explains privacy flaws in Aarogya Setu ...." 7 May 2020, https://www.businesstoday.in/technology/news/hacker-elliot-alderson-explains-privacy-flaws-in-aarogya-setu-app-claims-five-unwell-at-pmo/story/403101.html. Accessed 25 May 2020.
[3] "NotABot (@federated_monk) | Twitter." https://twitter.com/federated_monk. Accessed 25 May 2020.
[4] "Privacy and Human Rights - Overview - Global Internet Liberty ...." http://www.gilc.nl/privacy/survey/intro.html. Accessed 25 May 2020.
[5] "Lightweight authentication protocols for wearable devices ...." https://dl.acm.org/citation.cfm?id=3162915. Accessed 25 May 2020.
[6] "Apple on Twitter: "Face ID only stores a mathematical ...." 9 Jan. 2020, https://twitter.com/apple/status/1215289219972849664?lang=hi. Accessed 25 May 2020.
[7] "Transport Layer Security - Wikipedia." https://en.wikipedia.org/wiki/Transport_Layer_Security. Accessed 25 May 2020.
[8] "How to use a VPN to protect your internet privacy | ZDNet." 17 May 2018, https://www.zdnet.com/article/how-to-use-a-vpn-to-protect-your-internet-privacy/. Accessed 25 May 2020.
[9] "Cyber-bullying and video games | VentureBeat." 26 Sep. 2014, https://venturebeat.com/community/2014/09/26/cyber-bullying-and-video-games/. Accessed 25 May 2020.
[10] "The Importance of Strong, Secure Passwords." https://www.securedatarecovery.com/resources/the-importance-of-strong-secure-passwords. Accessed 25 May 2020.
[11] "Safely Storing User Passwords: Hashing vs. Encrypting - Dark ...." 4 Jun. 2014, https://www.darkreading.com/safely-storing-user-passwords-hashing-vs-encrypting/a/d-id/1269374. Accessed 25 May 2020.
[12] "Firewall security for network protection | ESET." https://www.eset.com/int/firewall/. Accessed 25 May 2020.
[13] "k-Anonymity: a model for protecting privacy - Data Privacy Lab." https://dataprivacylab.org/projects/kanonymity/index.html. Accessed 25 May 2020.
[14] "l-diversity - Wikipedia." https://en.wikipedia.org/wiki/L-diversity. Accessed 25 May 2020.
[15] "t-closeness - Wikipedia." https://en.wikipedia.org/wiki/T-closeness. Accessed 25 May 2020.
[16] "Differential privacy - Wikipedia." https://en.wikipedia.org/wiki/Differential_privacy. Accessed 25 May 2020.
[17] "Homomorphic encryption - Wikipedia." https://en.wikipedia.org/wiki/Homomorphic_encryption. Accessed 25 May 2020.
[18] "De-anonymization of Netflix Reviews using ... - csail - MIT." https://courses.csail.mit.edu/6.857/2018/project/Archie-Gershon-Katchoff-Zeng-Netflix.pdf. Accessed 25 May 2020.
[19] "“Anonymized” data really isn't—and here's why not | Ars ...." 8 Sep. 2009, https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/. Accessed 25 May 2020.
[20] "Simple Demographics Often Identify People Uniquely - Data ...." https://dataprivacylab.org/projects/identifiability/paper1.pdf. Accessed 25 May 2020.
[21] "Voting Privacy - EPIC." https://epic.org/privacy/voting/. Accessed 30 May 2020.
[22] "Differential Privacy for Statistics - The University of Texas at ...." 14 Jan. 2009, https://www.utdallas.edu/~muratk/courses/crypto-for-dbsec10s_files/DworkSmith.pdf. Accessed 25 May 2020.
[23] "OpenMined." https://www.openmined.org/. Accessed 25 May 2020.
[24] "generating differentially private datasets using gans." https://openreview.net/pdf?id=rJv4XWZA-. Accessed 25 May 2020.
[25] "MedRec: Using Blockchain for Medical Data Access and ...." 22 Sep. 2016, https://ieeexplore.ieee.org/document/7573685. Accessed 25 May 2020.
[26] "MeDShare: Trust-Less Medical Data Sharing Among Cloud ...." 24 Jul. 2017, https://ieeexplore.ieee.org/document/7990130/. Accessed 25 May 2020.
[27] "WhatsApp Hack: What Is Pegasus Spyware That Allegedly ...." 31 Oct. 2019, https://gadgets.ndtv.com/apps/news/pegasus-spyware-whatsapp-attack-facebook-nso-group-what-2125279. Accessed 25 May 2020.
[28] "PEPP-PT: HOME." https://www.pepp-pt.org/. Accessed 25 May 2020.
