CP Analyzer: Extract the coding habits of students
Introduction
Nowadays, it is very common for students to list their handles on online coding platforms in their resumes when applying for jobs or internships. However, this information can be used against a candidate to extract details that the candidate might not want to disclose to the interviewer. This kind of privacy exploitation is called an inference attack: an attack in which a person's sensitive information is inferred from data that the person has disclosed. In this project, we took the example of Codeforces, one of the most widely used platforms for online coding competitions and coding practice.
Implementation
Data Fetching
Codeforces provides a variety of API methods. One of them (user.status) returns all the submissions of a user. The result of this API call for a user is given below.
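As an illustration of the fetching step, here is a minimal sketch in Python that assumes the public user.status method of the Codeforces API. The helper names (build_status_url, parse_status_response, fetch_submissions) are hypothetical and are not taken from the project's code.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://codeforces.com/api"


def build_status_url(handle, start=1, count=None):
    """Build the user.status URL for a handle (hypothetical helper)."""
    # "from" is the 1-based index of the first submission to return;
    # "count" is optional and limits how many submissions come back.
    params = {"handle": handle, "from": start}
    if count is not None:
        params["count"] = count
    return f"{API_BASE}/user.status?{urllib.parse.urlencode(params)}"


def parse_status_response(payload):
    """Return the list of submissions from a raw JSON response string."""
    data = json.loads(payload)
    # The API wraps every answer as {"status": "OK", "result": [...]}
    # or {"status": "FAILED", "comment": "..."}.
    if data.get("status") != "OK":
        raise RuntimeError(data.get("comment", "Codeforces API request failed"))
    return data["result"]


def fetch_submissions(handle):
    """Download all submissions of a user (performs a network request)."""
    with urllib.request.urlopen(build_status_url(handle), timeout=30) as resp:
        return parse_status_response(resp.read().decode("utf-8"))
```

Calling fetch_submissions("tourist") would return a list of submission objects, each carrying fields such as problem, programmingLanguage, and verdict. Splitting URL construction and parsing from the network call keeps the logic easy to check offline.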
Data Interpretation
Once we have all the submissions of a user, we need to extract various kinds of information from them. First, the number of problems solved by the user in each language, for example 40 problems in C++ and 50 problems in Python. This information helps an interviewer or any adversary identify the user's strongest and weakest languages. Second, the number of problems solved by the user at each problem rating. On Codeforces, the difficulty of a problem is labeled with a rating tag. This information can be used to infer the user's problem-solving level, that is, how many difficult and how many easy-to-medium problems the user can solve. Third, and most important, the grouping of problems by topic. For example, if the user solved 70 problems on graphs, we report the ratings of those 70 problems, so that the interviewer gets insight into every topic the user has attempted, along with the number of submissions in each topic that received Wrong Answer, Time Limit Exceeded (TLE), Runtime Error, and so on. Last but not least, the list of problems in each topic that the user was not able to solve: problems the user attempted, got wrong submissions on, and has not solved to date.
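The aggregations described above can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the function names and the shape of the returned report are assumptions. The field names (problem, programmingLanguage, verdict, rating, tags) and the verdict value "OK" follow the Codeforces submission objects.

```python
from collections import Counter, defaultdict


def problem_key(problem):
    # contestId may be absent for some problems, so the name is included too.
    return (problem.get("contestId"), problem.get("index"), problem.get("name"))


def analyze(submissions):
    """Summarize a user's submissions (hypothetical report layout)."""
    attempted = {}                         # key -> problem, for every attempt
    solved = {}                            # key -> problem, accepted at least once
    solved_langs = set()                   # distinct (problem key, language) pairs
    verdicts_by_tag = defaultdict(Counter)

    for sub in submissions:
        problem = sub["problem"]
        key = problem_key(problem)
        attempted[key] = problem
        verdict = sub.get("verdict", "UNKNOWN")
        # Count every submission's verdict under each topic tag of its problem.
        for tag in problem.get("tags", []):
            verdicts_by_tag[tag][verdict] += 1
        if verdict == "OK":
            solved[key] = problem
            solved_langs.add((key, sub.get("programmingLanguage", "unknown")))

    # Problems solved per language (a problem solved in two languages counts in both).
    by_language = Counter(lang for _, lang in solved_langs)
    # Problems solved per rating; unrated problems are skipped here.
    by_rating = Counter(p["rating"] for p in solved.values() if "rating" in p)

    # Ratings of the solved problems, grouped by topic tag (None if unrated).
    ratings_by_tag = defaultdict(list)
    for p in solved.values():
        for tag in p.get("tags", []):
            ratings_by_tag[tag].append(p.get("rating"))

    # Problems that were attempted but never accepted, grouped by topic tag.
    unsolved_by_tag = defaultdict(list)
    for key, p in attempted.items():
        if key not in solved:
            for tag in p.get("tags", []):
                unsolved_by_tag[tag].append(p.get("name"))

    return {
        "by_language": dict(by_language),
        "by_rating": dict(by_rating),
        "ratings_by_tag": dict(ratings_by_tag),
        "verdicts_by_tag": {t: dict(c) for t, c in verdicts_by_tag.items()},
        "unsolved_by_tag": dict(unsolved_by_tag),
    }
```

Given the list of submissions for a handle, analyze returns one dictionary per statistic, which can then be passed to any plotting library.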
Results
After fetching all the results, we plotted several kinds of graphs so that an interviewer could infer the coding habits of a user at a glance. To showcase the results, we use the example user tourist, a very well-known competitive programmer on Codeforces.
Conclusion and Mitigation
In conclusion, the inference attack was carried out successfully on the Codeforces platform. It could be extended to other platforms such as CodeChef and AtCoder. To mitigate this, the platform should let users choose the visibility of their submissions, that is, users should be free to make their submissions public or private.
The Team
- Danial Kafeel
- Janmejay Pratap Singh Baghel
- Akshay M
- Divya Donapati
- Chirag Shilwant
Acknowledgement
This project was carried out as part of the course Online Privacy, under the guidance of Professor Ponnurangam Kumaraguru, at the International Institute of Information Technology, Hyderabad.