How private is the Instagram Private mode?
Introduction:
In today's world, maintaining a social media presence is just as important as maintaining a resume. Globalization brings a need for fast and reliable communication, and social media platforms like Twitter, Facebook, Instagram and LinkedIn are quick to fulfill it. The result is a still-expanding market of people's identities, expressions and emotions, built on computer algorithms, powerful processors and mighty memory units.
"Why call it a market?", one might ask, and rightfully so. Despite the wholesome implications of bringing families and friends together, social media platforms are businesses run by big corporations. They run on advertisers' money (among other things) and their ultimate goal is to rake in more and more profit. A business whose defining feature is to collect, store and display the private and sensitive data of billions of people must implement at least some measure of hiding such information. Instagram's answer to this problem was the "Private Mode", which limits the information shared with the general public to the Profile Image, Username and Bio. Only the followers of the account can access the posts and any other information. (A similar idea was later implemented by Facebook and WhatsApp.)
It is a given today that many people have accounts on multiple social media platforms, either for entertainment or out of necessity. Each of them holds private information that can reveal not only details of the user's private life but also their likes and dislikes, which is truly terrifying in the hands of businesses. (Ever felt like some Netflix recommendation was "meant for you"?)
Just like a mug with more than one hole leaks coffee everywhere, a person who is active on multiple social media platforms becomes more and more vulnerable to data leaks. No private mode on one site can save a person from having their data farmed from a different site (unless that site too is protected by a similar private mode). We try to justify this statement through this project.
The purpose of this project is to take the limited information scraped from private Instagram profiles and use it to gather new (and sensitive) information from other social media accounts of the same person. To sum it up, we see "How Private is the Instagram Private mode."
Problem Statement:
Our goal is to find as much information as we can about a person on other platforms, starting from only the limited information exposed by their private Instagram profile.
Solution Plan:
The first step towards solving any real world problem is to introduce a layer of abstraction, through which we can express our real life problem as a set of smaller, simpler, algorithmically achievable tasks. Based on this approach, what we need to do is...
1) Scrape data from "Private" Instagram profile.
The idea is to gather whatever limited information we can receive from the private Instagram profile, that is, the username (NOT necessarily the real name), Instagram Bio and Profile Photo. These are the only bits of information we get to further search for the user on other social media sites.
2) A method to match users on different platforms: Face Recognition Algorithms.
A person might change their username from profile to profile, but unless they have a serious concern for their privacy AND a huge sum of money, they cannot change their face across different profile pictures. We exploit this simple fact to identify the Instagram user on other social media sites. For this we employ a Python script that detects the people in the profile picture (there can be more than one) and compares their faces, thus helping to match the profiles.
3) Scraping data from the matched profiles on a different social media platform.
For the purpose of this experiment, we use only two external social media sites, Facebook (because it has the largest number of users) and LinkedIn. We use a face recognition algorithm to identify the Instagram user's Facebook profile and a Facebook scraper to collect all the information we can. LinkedIn bans scraping on its platform, so we are forced to collect that data manually. We gather all the data we can about the subject, thus rendering the Instagram "Private" mode useless.
4) Automating the process.
All the above-mentioned steps need to be integrated into one final script that, given some input, completes the above three steps on its own and gives us the desired output: the information scraped from the other social media platforms.
The Code:
1) Instagram Scraping:
We mainly make use of two libraries in this step:
1) Instaloader: Instaloader lets us retrieve media and metadata from Instagram.
2) Selenium Webdriver: This module provides webdriver implementations that are useful for scraping dynamic websites like Instagram.
Using these two libraries we obtain the name, profile picture and bio of the target account. The name and profile picture are used directly while the bio is processed to find extra information like DOB, school/university name, city, etc.
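The bio processing mentioned above can be sketched with a few regular expressions. This is only a minimal illustration, not the script used in the project: the function name `extract_bio_fields`, the three patterns and the sample bio are all assumptions made for this example, and real bios are far messier than these patterns allow for.

```python
import re

# Hypothetical patterns for illustration; they only cover simple, well-formed bios.
# A date such as 14/02/1999, 14-02-1999 or 14.02.1999.
DATE_RE = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")
# One or more capitalized words followed by an institution keyword.
SCHOOL_RE = re.compile(
    r"\b((?:[A-Z][A-Za-z.&']*\s+)+(?:University|College|School|Institute))\b"
)
# A capitalized place name after "lives in" or "based in".
CITY_RE = re.compile(
    r"(?:[Ll]ives in|[Bb]ased in)\s+([A-Z][A-Za-z]+(?:\s[A-Z][A-Za-z]+)*)"
)


def extract_bio_fields(bio: str) -> dict:
    """Return whichever of date, school and city can be found in a bio string."""
    fields = {}
    match = DATE_RE.search(bio)
    if match:
        fields["date"] = match.group(0)
    match = SCHOOL_RE.search(bio)
    if match:
        fields["school"] = match.group(1)
    match = CITY_RE.search(bio)
    if match:
        fields["city"] = match.group(1)
    return fields


if __name__ == "__main__":
    sample = "Coffee lover | 14/02/1999 | Example State University | based in Springfield"
    print(extract_bio_fields(sample))
```

A date found this way is only a candidate: it may be a birthday, an anniversary or something unrelated, so it still has to be cross-checked against the other sites.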
2) User identification on Facebook:
For this step we first use the name of the user as a search term on Facebook and scrape the profile pictures from all the matching accounts. We then run face recognition on these images using the Instagram profile picture as reference.
For face recognition we use the Python library ‘face_recognition’, which builds on deep learning models from the open-source library ‘dlib’ to provide several face recognition methods. We use the ‘compare_faces’ method.
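To make the matching step concrete: ‘compare_faces’ takes a list of known face encodings, one encoding to check and a tolerance (0.6 by default), and returns one boolean per known encoding, based on the Euclidean distance between the encodings. The sketch below reproduces that decision rule in plain Python so that it runs without the library. The function names `sketch_compare_faces` and `best_match` and the short three-number vectors are assumptions for illustration only; real encodings are 128-dimensional and come from the library itself.

```python
import math


def sketch_face_distance(known_encodings, encoding_to_check):
    """Euclidean distance from each known encoding to the encoding being checked."""
    return [math.dist(known, encoding_to_check) for known in known_encodings]


def sketch_compare_faces(known_encodings, encoding_to_check, tolerance=0.6):
    """True for every known encoding within `tolerance` of the checked encoding."""
    distances = sketch_face_distance(known_encodings, encoding_to_check)
    return [distance <= tolerance for distance in distances]


def best_match(known_encodings, encoding_to_check, tolerance=0.6):
    """Index of the closest known encoding, or None if none is within tolerance."""
    distances = sketch_face_distance(known_encodings, encoding_to_check)
    if not distances:
        return None
    index = min(range(len(distances)), key=distances.__getitem__)
    return index if distances[index] <= tolerance else None


if __name__ == "__main__":
    # Toy vectors standing in for real face encodings.
    reference = [0.1, 0.2, 0.3]          # e.g. the Instagram profile picture
    candidates = [
        [0.9, 0.9, 0.9],                 # clearly a different face
        [0.12, 0.18, 0.33],              # very close to the reference
        [0.5, 0.2, 0.3],                 # within tolerance, but further away
    ]
    print(sketch_compare_faces(candidates, reference))
    print(best_match(candidates, reference))
```

Because several candidates can fall within the tolerance at once, picking the smallest distance (as `best_match` does) is one reasonable way to settle on a single profile; lowering the tolerance makes the match stricter.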
3) Facebook Scraping:
Once we have identified the user account on Facebook, we try to extract the available profile information. We use two libraries:
1) re: Python's regular expression module, used to pull pattern-based fields out of the scraped text.
2) Selenium Webdriver: This module provides webdriver implementations that are useful for scraping dynamic websites like Facebook.
We can typically gain a lot of information, such as:
1) Current Location
2) Native Place
3) School/University related information
4) Relationship status
5) Sexual Orientation
6) Some bios also include quotes, which might tell us more about the user’s personality
7) Interests
8) Date of Birth, Marriage Anniversary Date and so on.
9) Relatives of the user
The available information is extracted and then stored or displayed in the terminal.
4) LinkedIn Scraping:
Attempts were made to create a LinkedIn scraper. However, due to the strong anti-scraping measures taken by the site, we had to do this step manually. The names from Facebook and Instagram were used as search terms, the profiles were found by hand, and the available information was recorded. This information typically included:
1) School/University and related information
2) Current and past workplaces
3) Position held at workplace
This data was used in conjunction with other site data to create a more complete profile of the individual.
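Combining the per-site data can be as simple as merging dictionaries while remembering which site each value came from. The sketch below is an assumption about how this could be done, not the project's actual code; the function name `merge_records`, the field names and the placeholder values are all made up for illustration.

```python
def merge_records(records):
    """Merge per-site field dicts into {field: {value: [sites that reported it]}}."""
    merged = {}
    for site, fields in records.items():
        for field, value in fields.items():
            merged.setdefault(field, {}).setdefault(value, []).append(site)
    return merged


if __name__ == "__main__":
    # Placeholder data only; no real person is described here.
    records = {
        "instagram": {"name": "A. Example", "city": "Springfield"},
        "facebook": {"name": "A. Example", "school": "Example State University"},
        "linkedin": {"school": "Example State University", "employer": "Example Corp"},
    }
    for field, values in merge_records(records).items():
        print(field, values)
```

Keeping the source sites next to each value makes it easy to see which details are confirmed by more than one platform and which come from a single profile.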
Interface:
Results:
The following are the numbers of accounts identified on Facebook and LinkedIn.
It can be seen that we had a significantly lower success rate on LinkedIn. This is mainly due to a large number of people not having LinkedIn accounts as the website is used mainly by professionals.
Other reasons for not identifying an account include:
1) User didn’t have an account on Facebook
2) User's face was not visible in the profile picture, due to which the face recognition failed
3) Locked Facebook account (scraping is not possible on locked profile)
4) Incomplete information on Instagram/Facebook (no profile picture or no name)
Demonstration:
As a demonstration of how this data can be used by people with less-than-noble intentions, we designed a small-scale experiment.
The goal of the experiment was to get the user to click on a link, and to see how a standard spam message would fare against a ‘personalized’ message created from the user’s scraped data.
Typical Spam Message:
“Hi, please click the link below and fill the survey to help with our research, thanks. <link>”
Personalized Message:
“Hi <name>, I am your junior from <current/past school or college>, I am conducting some research, please help me out by clicking this link, thanks! <link>”
The links led to blank Google Forms that simply counted the number of visitors. Both types of messages were sent via Instagram DM to 25 users each. These are the results.
As seen here, 19 out of the 25 people (76%) responded to the personalized message. This shows how the effectiveness of such a basic spam attack can be increased manifold by including some simple data that can easily be found on the user’s own accounts. While we used a benign link, a malicious person could just as easily have linked to a virus download or a phishing site made to look like a login page.
This clearly shows how dangerous it can be to carelessly provide personal data on multiple social media sites.
Conclusion:
Although the blatant lack of awareness regarding online privacy is indeed alarming, things can be made better rather easily: from rock bottom we can only go up. People often disregard their online privacy and leave their security to chance. After all, in a big chain no link ever expects to be the one that will be targeted and attacked, but then again, a chain is only as strong as its weakest link.
It is imperative that people understand that although their data on its own is not of much value, the data of millions of like-minded people together is something that can decide life and death and move nations, which it has already done (https://www.netflix.com/in/title/81254224?s=a&trkid=13747225&t=wha&vlang=en&clip=81275293). People must also understand that getting their data stolen and used against them is far easier and more common than they think. Maintaining and regularly updating accounts on multiple social media sites does not help them one bit. The chances of getting phished or deceived have never been greater!
Your social media presence is an entry into your world, a virtual journal that compiles whatever data you put into it, only it is open for all to see online. Every private mode is breachable and can be easy to bypass, as shown in our project. Your conscious effort is the best defense you have. As we have just established, "Instagram Private mode is not that private."
Important Links and Images:
Credits:
Amar Gwari (2019102009)
Harshwardhan Prasad (2019102021)
Vishal Kumar Singh (2019102002)
Rishabh Kumar Singh (2019102013)
Rishin Chakraborty (2019112008)
Link to YouTube Video:
Video explaining the project: https://youtu.be/cf0uOpGrrns