Publications

Investigating the "Wisdom of Crowds" at Scale

Published in Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology.

Under the guidance of Dr Sharad Goel, Assistant Professor, Stanford University.

In a variety of problem domains, it has been observed that the aggregate opinions of groups are often more accurate than those of the constituent individuals, a phenomenon that has been termed the "wisdom of the crowd". Yet, perhaps surprisingly, there is still little consensus on how generally the phenomenon holds, how best to aggregate crowd judgements, and how social influence affects estimates. We investigate these questions by taking a meta wisdom of crowds approach. With a distributed team of over 100 student researchers across 17 institutions in the United States and India, we develop a large-scale online experiment to systematically study the wisdom of crowds effect for 1,000 different tasks in 50 subject domains. These tasks involve various types of knowledge (e.g., explicit knowledge, tacit knowledge, and prediction), question formats (e.g., multiple choice and point estimation), and inputs (e.g., text, audio, and video). To examine the effect of social influence, participants are randomly assigned to one of three different experiment conditions in which they see varying degrees of information on the responses of others. In this ongoing project, we are now preparing to recruit participants via Amazon's Mechanical Turk.
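As a minimal sketch of what comparing aggregation rules on a point-estimation task can look like, the snippet below measures the error of a crowd aggregate and the share of individuals it beats. The function names and the choice of median and mean as aggregates are illustrative assumptions, not the method used in the study.

```python
from statistics import mean, median

def crowd_error(estimates, truth, aggregate=median):
    """Absolute error of the aggregated crowd estimate."""
    return abs(aggregate(estimates) - truth)

def fraction_beaten(estimates, truth, aggregate=median):
    """Fraction of individuals whose own error is strictly larger than the crowd's."""
    crowd = crowd_error(estimates, truth, aggregate)
    worse = sum(1 for estimate in estimates if abs(estimate - truth) > crowd)
    return worse / len(estimates)

# Hypothetical guesses for a quantity whose true value is 100.
guesses = [60, 85, 95, 105, 120, 300]
median_error = crowd_error(guesses, 100)        # median is robust to the 300 outlier
mean_error = crowd_error(guesses, 100, mean)    # mean is pulled upward by it
```

Passing a different callable as `aggregate` (for example a trimmed mean) lets the same two functions compare other aggregation rules.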

Keywords :

Crowdsourcing
online experiment
crowd consensus

Curating A Semantic Bibliographic Catalog

Published in 2016 IEEE Tenth International Conference on Semantic Computing (ICSC).

Under the guidance of Dr Kavi Mahesh, Professor and Dean of Research, PES University.

Large online libraries are now common across the Web. However, these libraries are usually fragmented and not semantically connected, which makes them difficult to search and access. There are also no large bibliographic Linked Open Datasets available for use in research and analytics. This paper shows how to create a large, comprehensive RDF triple store of semantic data about books. The primary aim of this work is to establish linked relations between all the available books in the world and connect them to the already available linked datasets. The Open Library dataset, which has over 45 million records, is first serialized by converting it into a triple format and then linked together using predicates from different ontologies. A simple endpoint for a semantic book search engine called BookLOD is created to demonstrate the utility of the dataset.
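The serialization step can be sketched roughly as follows. This is a minimal illustration, assuming a simplified record shape (a `key`, an optional `title`, and a list of author keys) and assuming Dublin Core terms as the predicates; the actual record layout and ontologies used in the paper may differ.

```python
DCTERMS = "http://purl.org/dc/terms/"   # assumed vocabulary for this sketch
BASE = "https://openlibrary.org"        # assumed base for resource IRIs

def escape_literal(text):
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r"))

def record_to_ntriples(record):
    """Turn one simplified (hypothetical) book record into N-Triples lines."""
    subject = f"<{BASE}{record['key']}>"
    triples = []
    title = record.get("title")
    if title is not None:
        literal = escape_literal(title)
        triples.append(f'{subject} <{DCTERMS}title> "{literal}" .')
    for author_key in record.get("authors", []):
        # Linking to the author resource (not a string) is what connects records.
        triples.append(f"{subject} <{DCTERMS}creator> <{BASE}{author_key}> .")
    return triples
```

Emitting the author as an IRI rather than a plain string is the design choice that turns a flat catalog row into a link between two resources.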

Keywords :

digital libraries
linked data
n-triples
library catalog

Application of Bloom's Taxonomy in day-to-day Examinations

Published in 6th IEEE International Advance Computing Conference (IACC-2016).

Under the guidance of Dr Nitin V Pujari, Professor and Head, PES Institute of Technology.

Bloom’s Taxonomy describes the classification of learning into various domains. The “Cognitive domain” separates the learning process into different levels based on a person’s ability to think. This is very helpful for detecting whether a given question is memory based or application based. The primary aim of this paper is to demonstrate the use of Bloom’s Taxonomy to grade a given paragraph and the use of prediction models over that grading. This paper describes a technique that uses the past marks of a student and the contents of the question paper to classify the question paper into a particular level using the taxonomic principles of the Cognitive domain, and then applies linear regression to predict the total marks that the student may score. This paper also describes various techniques by which Bloom’s Taxonomy can be applied and analyses the accuracy of each of those techniques.
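The two steps described above, assigning a cognitive level and then regressing marks on it, can be sketched as below. The verb lists are abbreviated and hypothetical, and the keyword lookup is only one possible way to assign a level; it is not necessarily the technique evaluated in the paper.

```python
# Hypothetical, abbreviated verb lists, ordered from lowest to highest level.
BLOOM_VERBS = {
    "remember": {"define", "list", "state"},
    "understand": {"explain", "summarize", "describe"},
    "apply": {"solve", "compute", "demonstrate"},
    "analyze": {"compare", "differentiate", "examine"},
    "evaluate": {"justify", "critique", "assess"},
    "create": {"design", "construct", "propose"},
}
LEVELS = list(BLOOM_VERBS)

def bloom_level(question):
    """Return the 1-based index of the highest level whose verb appears, or 0."""
    words = {word.strip(".,?:;").lower() for word in question.split()}
    best = 0
    for index, level in enumerate(LEVELS, start=1):
        if words & BLOOM_VERBS[level]:
            best = index
    return best

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

With past papers scored by `bloom_level` as `xs` and a student's marks on them as `ys`, `fit_line` gives a slope and intercept from which the marks on a new paper could be estimated.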

Keywords :

Bloom's Taxonomy
Classification Algorithms
Data Analysis
Predictive models

Measurement of the Zone of Inhibition of an Antibiotic

Published in 6th IEEE International Advance Computing Conference (IACC-2016).

Under the guidance of Mahendra M Nayak, Associate Professor, PES Institute of Technology.

In the disk diffusion antibiotic sensitivity test (the Kirby-Bauer test), a thin film of bacteria applied on a plate is subjected to various antibiotics. The zone of inhibition is a circular area around the spot of the antibiotic in which the bacterial colonies do not grow. The zone of inhibition can be used to measure the susceptibility of the bacteria towards the antibiotic. The process of measuring the diameter of this zone can be automated using image processing. In this work, an algorithm is developed, using computer vision, to detect the zones of inhibition of the bacteria. This work demonstrates an effective approach to measuring the zone of inhibition: the radius of the zone is calculated by drawing contours after setting a suitable threshold value. This work also determines whether a particular bacterium is susceptible or resistant to the applied antibiotic using the calculated zone of inhibition and the prescribed standard values.
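The measurement idea can be illustrated with a dependency-free sketch. Note that it swaps in a simpler technique than the one described above: instead of tracing contours, it thresholds a grayscale grid and derives the diameter from the area of the thresholded region, treating the zone as a circle. It also assumes the clear zone is darker than the surrounding bacterial lawn and that the image contains a single zone. The breakpoint arguments are placeholders, not published standard values.

```python
import math

def zone_diameter_mm(image, threshold, mm_per_pixel):
    """Estimate the zone diameter from the area of pixels darker than threshold.

    image is a list of rows of grayscale values; the zone is assumed circular,
    so diameter = 2 * sqrt(area / pi).
    """
    area_px = sum(1 for row in image for value in row if value < threshold)
    radius_px = math.sqrt(area_px / math.pi)
    return 2 * radius_px * mm_per_pixel

def classify(diameter_mm, resistant_max, susceptible_min):
    """Compare a diameter against two (hypothetical) breakpoint values."""
    if diameter_mm <= resistant_max:
        return "resistant"
    if diameter_mm >= susceptible_min:
        return "susceptible"
    return "intermediate"
```

The quality of the result depends almost entirely on the threshold: too low and part of the zone is missed, too high and the lawn leaks into the measured area.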

Keywords :

Biomedical Image Processing
Computer Vision

BookLOD - An Open Library Catalog as Semantic Data

Published in 5th International Library and Information Professionals Summit 2016.

Under the guidance of Dr Kavi Mahesh, Professor and Dean of Research, PES University.

Online library catalogs are very common these days. Thanks to the digitization of literary works, many libraries have come forward to catalog online the books available on their shelves. However, many of these catalogs are stored simply in tables and other similar data structures. Very few catalogs have tried to exploit the linked nature of the data to create either graph-based or semantic catalogs. Hence there is no single large linked catalog of all the books that have been published. In this project we aim to create a single large curated semantic dataset by converting the Open Library data dump, a comprehensive dataset of books with over 45 million records, to RDF. We present BookLOD, a web-based graphical interface for easily extracting required information from the data using Semantic Web technologies such as RDF and Linked Open Data.

Keywords :

E Books
Linked Data
Online Library Catalogs

Ongoing Projects

Analysis of the ACOA Dataset

Under the guidance of Dr Stan Matwin, Director, BigData Analytics Institute, Dalhousie University

The ACOA database, maintained by the Atlantic Canada Opportunities Agency (ACOA), contains information about projects that have been approved by ACOA since 1995. This information includes the amount allocated to a particular project and the location of the project. Another well-known dataset, maintained by Elections Canada, is the list of the elected representatives of the Canadian parliament. The main aim of this project is to visualize the political affiliation of the elected representative of a given region alongside the funds allocated to that region by ACOA. The main challenge lies in the fact that the ACOA database is organized by postal code whereas the other dataset is organized by federal electoral district number. We describe the process of extracting relevant data from these datasets, visualizing them geospatially, generating a few interesting patterns and finally analyzing these patterns.
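The key-mismatch problem described above amounts to a join through a lookup table. The sketch below is a minimal illustration: the field names `postal_code` and `amount` and the `postal_to_district` mapping are hypothetical, and it assumes each postal code maps to exactly one electoral district, which may not hold for every code in practice.

```python
from collections import defaultdict

def funding_by_district(projects, postal_to_district):
    """Sum project amounts per electoral district via a postal-code lookup.

    Returns (totals, unmatched) so that projects whose postal code is missing
    from the lookup can be inspected instead of being silently dropped.
    """
    totals = defaultdict(float)
    unmatched = []
    for project in projects:
        # Normalize "b3h 4r2" and "B3H4R2" to the same key.
        code = project["postal_code"].replace(" ", "").upper()
        district = postal_to_district.get(code)
        if district is None:
            unmatched.append(project)
        else:
            totals[district] += project["amount"]
    return dict(totals), unmatched
```

The per-district totals can then be joined with the representatives dataset on the district number for visualization.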

CommonLAK - Extracting information and Visualization of LAK Dataset

Under the guidance of Dr Nitin V Pujari, Head, Department of Comp Sc., PESIT.

The LAK dataset, or the Learning Analytics and Knowledge dataset, is composed of the details of the papers published in both the Educational Data Mining conferences and the Learning Analytics conferences. Provided as Linked Open Data (LOD) by the Learning Analytics Summer Institute (LASI), it can be analyzed using RDF queries. The CommonLAK project aims to create an exploration and visualization tool for this dataset using JavaScript. The name CommonLAK was chosen because the project tries to bring the LAK dataset to the common people who do not have the knowledge needed to parse it.

NATOBot - Bringing 10k moderation to All

The New Answers to Old Questions tool on Stack Overflow helps us find all the answers added to questions that are more than 30 days old. However, the tool is not real time. There are requests on Stack Overflow Meta not only to make it real time but also to add more features to it. However, these requests have not yet been implemented. The NATOBot project aims not only to overcome the "not real time" issue but also to let users with less than 10,000 reputation view and moderate these answers.
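The core filter behind such a tool is small. A minimal sketch follows, assuming each answer is a dictionary carrying hypothetical `question_created` and `answer_created` timestamps, and reading "old" as the question's age at the moment the answer was posted; how NATOBot actually fetches and represents posts is not shown here.

```python
from datetime import datetime, timedelta

OLD_QUESTION_AGE = timedelta(days=30)

def new_answers_to_old_questions(answers):
    """Keep answers posted more than 30 days after their question was asked."""
    return [
        answer for answer in answers
        if answer["answer_created"] - answer["question_created"] > OLD_QUESTION_AGE
    ]
```

Running this filter on each newly observed answer, rather than on a periodic batch, is what would make the output real time.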

Other Projects

Sales Data Visualization and Analysis

Under the guidance of Dr Somayeh Fatahi, Researcher, BigData Analytics Institute, Dalhousie University

Sales in a given region depend on many factors, such as the population, ethnicity and age of the people, the presence of other retail stores, and so on. Using data visualization and analytics, we can therefore identify the trends that drive sales in that region. In this project we aim to visualize the sales data of supermarkets in a few cities of Canada using geospatial visualization. We describe the procedure to create bubble charts and ink-drop visualizations of maps, and the cluster analysis of those bubbles.

Autoscaling of Cloud Foundry

Under the guidance of Dr Dinkar Sitaram, Head, Centre for Cloud Computing, PESIT.

Cloud Foundry is a portable, open source platform-as-a-service (PaaS) initiative maintained by EMC, VMware and General Electric that has recently garnered a great deal of attention due to its newly formed multivendor foundation. Cloud Foundry implements autoscaling using the BOSH autoscaler, which is a reactive model of autoscaling. This reactive approach has some disadvantages. The primary aim of our work is to avoid these disadvantages by taking a preemptive approach to autoscaling using machine learning models. We intend to use machine learning to predict the parameters involved in maintaining quality of service (QoS). In this way we can overcome the problems associated with the reactive approach of BOSH.

Mapping Engineering Research in India

Under the guidance of Dr. I K Ravichandra Rao & Dr. K S Raghavan, Researchers, Center for Knowledge Analytics and Ontological Engineering.

Higher education institutions in India, especially in Engineering, barring a few notable institutions, have largely focused on teaching at undergraduate and graduate levels. In recent years, some programmes have been initiated (e.g. TEQIP) to encourage research in these institutions. In this paper an attempt has been made to examine Indian research output in Engineering. Papers published in any of the IEEE journals with at least one author having an Indian affiliation have been considered for this study. The growth of output over the 10-year period from 2004 to 2013 has been examined and compared with that of China. The major countries that produce a substantial volume of research literature in engineering have been identified based on the data, and their h-indices computed and compared. The top institutions in India have been identified, as have the major journals in which Indian research appears.
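For reference, the h-index of a set of papers is the largest h such that h of the papers have at least h citations each. A minimal sketch of that standard definition, applied to a plain list of citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```

For a country-level comparison, `citations` would hold the citation counts of all papers attributed to that country in the dataset.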

Undergraduate Course Projects

Simulation of a Load Balancer

Under the guidance of Dr. Ashutosh Bhatia

The project work for partial credit of the subject, System Modelling Simulation, involved the simulation of a load balancer and multiple queues. The various methods of load balancing and their visualization were implemented using JavaScript. The load-balancing methods studied were Round Robin, Random allocation and Least Connections. The results were then analyzed using Python, and Round Robin was found to be a better choice than the rest.
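The three policies can be sketched in a few lines each. This is an illustrative Python model rather than the project's JavaScript code: the class names, the `pick` interface, and the one-request-per-tick `simulate` loop are all assumptions made for the sketch, and it does not attempt to reproduce the project's results.

```python
import itertools
import random

class RoundRobin:
    """Hand requests to the servers in a fixed rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)
    def pick(self, active):
        return next(self._cycle)

class RandomAllocation:
    """Hand each request to a uniformly random server."""
    def __init__(self, servers, seed=None):
        self._servers = list(servers)
        self._rng = random.Random(seed)
    def pick(self, active):
        return self._rng.choice(self._servers)

class LeastConnections:
    """Hand each request to the server with the fewest open connections."""
    def __init__(self, servers):
        self._servers = list(servers)
    def pick(self, active):
        return min(self._servers, key=lambda server: active[server])

def simulate(policy, servers, durations):
    """Dispatch one request per tick; return the peak connections on any server.

    durations[i] is the number of ticks request i keeps its connection open.
    """
    active = {server: 0 for server in servers}
    open_connections = []  # (tick at which the connection closes, server)
    peak = 0
    for tick, duration in enumerate(durations):
        still_open = []
        for end, server in open_connections:
            if end <= tick:
                active[server] -= 1
            else:
                still_open.append((end, server))
        open_connections = still_open
        server = policy.pick(active)
        active[server] += 1
        open_connections.append((tick + duration, server))
        peak = max(peak, max(active.values()))
    return peak
```

Which policy comes out ahead depends on the workload: with uniform request durations Round Robin and Least Connections behave almost identically, while highly variable durations change the picture.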

Local Network Drive

Under the guidance of Prof. H Phalachandra

The project work for partial credit of the subject, Software Engineering, involved the creation of a real-time network-based service which monitors different storage devices and allows mutual sharing of the disks upon request from the users of these disks. The heart of the project was the Tracker server, a RESTful API. The tracker keeps track of all the changes going on in the system by making use of a SQL database backend. The Reliable Transfer Protocol (RTP) and the Real Time Streaming Protocol (RTSP) were used to create components for sharing files and streaming video respectively. The entire project could be accessed using a web-based GUI.

Implementation of First and Follow of a Simple Grammar

Under the guidance of Prof. H B Mahesh

The project work for partial credit of the subject, Compiler Design, involved the generation of the FIRST and FOLLOW sets of any given grammar. These sets are essential for the LL parsing of input. Written completely in Python, the project provides a tkinter GUI where the user can enter a grammar. The FIRST and FOLLOW sets are generated after analyzing the grammar.
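The textbook fixed-point computation of these sets can be sketched as follows. The grammar representation (a dictionary from nonterminal to a list of productions, with an empty list standing for an epsilon production) and the function names are assumptions of this sketch, not necessarily those of the project.

```python
EPSILON = "ε"
END = "$"

def first_of_sequence(symbols, first, grammar):
    """FIRST of a string of symbols, given the FIRST sets of the nonterminals."""
    result = set()
    for symbol in symbols:
        if symbol not in grammar:          # terminal: it starts the string
            result.add(symbol)
            return result
        result |= first[symbol] - {EPSILON}
        if EPSILON not in first[symbol]:
            return result
    result.add(EPSILON)                    # every symbol can vanish, or the string is empty
    return result

def compute_first(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            for production in productions:
                new = first_of_sequence(production, first, grammar)
                if not new <= first[nonterminal]:
                    first[nonterminal] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    """Iterate until no FOLLOW set grows any further."""
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nonterminal, productions in grammar.items():
            for production in productions:
                for i, symbol in enumerate(production):
                    if symbol not in grammar:
                        continue
                    tail = first_of_sequence(production[i + 1:], first, grammar)
                    new = tail - {EPSILON}
                    if EPSILON in tail:    # the rest can vanish: inherit FOLLOW of the head
                        new |= follow[nonterminal]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow
```

Any symbol that is not a key of the grammar dictionary is treated as a terminal, so the caller does not need to declare terminals separately.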

Hospital Management Systems

Under the guidance of Prof. Prafullata K Auradkar

The project work for partial credit of the subject, Database Management Systems, involved creating a management interface for hospitals to increase efficiency in managing patient records, doctor logs, etc. The entire project was written in Java, using PostgreSQL as the backend. The GUI was written using Java Swing.