Nishant - My Projects

PROJECTS AT A GLANCE

Visual Question Answering

Real Time Vehicle Recognition and Automatic Number Plate Recognition

RT Vehicle Recognition

Visual Story Telling

Merging Point Clouds

Predicting Human Arm Reach

Adaptive Strategies for Prisoner's Dilemma

Adaptive Strategies for IPD

Personality Trait Analysis

ADA Compiler

Adaptive Scan Pooling

Multi View Clustering

CS425A Course Project

Word Embeddings using multiple prototypes

Multiple Word Embeddings

Alternate Paths in Graphs

Social Networks

NachOS Operating System

Geometric Data Structures

Multi Modal Emotion Recog.

News Report Classification

Other Minor Projects

INTERNSHIPS

University of British Columbia

THE INTERNSHIP

I interned at the CARIS Lab, University of British Columbia, during the summer of '16 through the MITACS internship programme. I was mentored by Justin Hart, Post Doc, CARIS Lab, and Elizabeth Croft, Head, CARIS Lab, on predicting single-arm reaching motion by humans in order to create smooth and safe human-robot interactions. The internship involved training in ROS (Robot Operating System) and working on several robot platforms, including a Barrett WAM 7-DOF robot and the Willow Garage PR2 robot.

SINGLE-ARM REACH PREDICTION

The main project I worked on was the prediction of single-arm motion by humans to create smooth human-robot interactions, mentored by Justin Hart and Elizabeth Croft. It involved studying and analyzing the performance of multiple hand and model trackers and including them in our pipeline. I also developed multiple interfaces to be used in the experimental setup for the final human-subject experiments.

MERGING POINT CLOUDS

The project aims at merging unaligned point clouds from multiple Kinects to gain additional information, and was mentored by Justin Hart, Post Doc, CARIS Lab. It involved a literature survey of existing work on camera calibration and distortion reduction in cameras. A transformation between the Kinects was computed using the extracted camera parameters, and was then used to align the point clouds. Averaging using the Rodrigues representation, along with bundle adjustment, was performed to improve the results.
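The averaging step can be illustrated with a minimal sketch. It assumes each Kinect-to-Kinect rotation estimate is stored as a Rodrigues (axis-angle) vector and that the estimates are close together, so that an arithmetic mean of the vectors is a reasonable approximation. The function names are hypothetical, and bundle adjustment is not shown.

```python
import numpy as np

def rotvec_to_matrix(r):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    r = np.asarray(r, dtype=float)
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    # Skew-symmetric cross-product matrix of the unit axis k.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def average_rotvecs(rotvecs):
    """Naive average of several axis-angle estimates (assumes they are close)."""
    return np.mean(np.asarray(rotvecs, dtype=float), axis=0)

def apply_transform(points, rotvec, translation):
    """Rotate and translate an (N, 3) point cloud into the other camera frame."""
    R = rotvec_to_matrix(rotvec)
    return np.asarray(points, dtype=float) @ R.T + np.asarray(translation, dtype=float)

# Hypothetical usage: two noisy estimates of a rotation about the z axis.
mean_rot = average_rotvecs([[0.0, 0.0, 0.1], [0.0, 0.0, 0.3]])
aligned = apply_transform([[1.0, 0.0, 0.0]], [0.0, 0.0, np.pi / 2], [0.0, 0.0, 0.0])
```

Averaging in axis-angle space avoids averaging rotation matrices element-wise, which generally does not yield a valid rotation.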

PRESENTATION

CONFIDENTIAL (FOR NOW)

Xerox Research Centre India

THE INTERNSHIP

I interned at Xerox Research Centre India, Bangalore, during the winter of '15 (i.e. December '15). I was mentored by Om Deshmukh, Senior Researcher (Area Manager, Multimedia Analytics), XRCI, and Sumit Negi, Principal Researcher, XRCI, on developing and evaluating algorithms for Multi View Clustering using Non-Negative Matrix Factorization. The work involved proposing and evaluating various algorithms for the task. Further details will be available upon completion of the work. The work has been accepted as an Oral presentation at the International Conference on Pattern Recognition '16.

THE PROJECT

The project deals with the problem of clustering data using information present in multiple views. We propose several extensions to tackle the Partial Multi View problem, which involves data with missing views, i.e. not all instances have all views. There has been relatively little work in this area even though missing views are a realistic assumption for real-world data. Our proposed models have simple update rules, which makes them easy to compute. We compare the performance of our approaches with previous models on diverse datasets (including image and textual datasets) and find that our model outperforms them.
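To give a feel for what "simple update rules" look like in NMF-based methods, here is a minimal sketch of the standard single-view multiplicative updates for minimizing the Frobenius reconstruction error. It is not the partial multi-view model proposed in the project; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def nmf(X, k, iters=200, seed=0, eps=1e-9):
    """Plain single-view NMF, X ~ W @ H, via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    errors = []
    for _ in range(iters):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(X - W @ H)))
    return W, H, errors

# Hypothetical usage on random non-negative data.
X = np.random.default_rng(1).random((20, 15))
W, H, errors = nmf(X, k=4)
clusters = W.argmax(axis=1)  # one common way to read cluster labels off W
```

Each update is a few matrix products, which is why this family of models is cheap to implement and run.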

THE RESULTS

We compare the performance of our approaches with previous models in the field; our model outperforms the state of the art methods. We use both Image and Textual data to ensure diversity during the experiments. As our approach includes graph regularization, we also study the effect of different kernels on the performance. Additional experiments have been described in the paper.

PRESENTATION

PROJECT DETAILS

CODE REPOSITORY

I.N.R.I.A. Rocquencourt

THE INTERNSHIP

I interned at I.N.R.I.A., Rocquencourt, France, during the summer of '15 (i.e. May '15 - July '15). I worked on two projects simultaneously during my stay there. I was mentored by Laurent Viennot, Senior Researcher, INRIA, and Adrian Kosowski, Researcher, INRIA, on finding routes substantially different from the shortest path based on different criteria. The work involved proposing and evaluating various algorithms for the task. I also worked with Adrian Kosowski, Researcher, INRIA, on finding good local features which are suitable predictors for global features. Further details of the alternate path project will be available upon completion of the work.

ALTERNATE PATHS

The aim of the project was to find routes substantially different from the shortest path based on different criteria. I implemented various shortest path algorithms and compared their efficiency on real-world road networks. It also involved proposing algorithms to compute paths according to another feasible (also proposed) definition. Finally, we created measures to compare different algorithms and developed efficient algorithms for the computations involved. Exact plots of the computed alternate paths on various road networks and scores/results will be available soon.

SOCIAL NETWORK ANALYSIS

The project aims at finding good local features which are suitable predictors for global features. I implemented and studied randomized rumor spreading, in particular the relation between the size of the graph and the number of steps needed for the rumor to spread. I further explored different local features in graphs based on walks, subgraph densities and centrality measures, and their relation with other global properties, along with arguments to explain the obtained results. I also collaborated with the World Seastems project (funded by the European Research Council, ERC) during the course of the internship.
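As a rough illustration of randomized rumor spreading, the sketch below simulates the classic synchronous "push" protocol, in which every informed node tells one uniformly random neighbour per round. The exact protocol studied in the project is not stated here, so this choice, the function name and the complete-graph example are all assumptions.

```python
import random

def push_rumor_rounds(adjacency, start, seed=0):
    """Count rounds until every node knows the rumor under the push protocol.

    Assumes the graph is connected and every node has at least one neighbour;
    otherwise the loop would never terminate.
    """
    rng = random.Random(seed)
    informed = {start}
    rounds = 0
    while len(informed) < len(adjacency):
        # Every informed node pushes the rumor to one random neighbour.
        newly_informed = {rng.choice(adjacency[v]) for v in informed}
        informed |= newly_informed
        rounds += 1
    return rounds

# Hypothetical usage: complete graph on n nodes.
n = 64
complete = {v: [u for u in range(n) if u != v] for v in range(n)}
rounds = push_rumor_rounds(complete, start=0)
```

Because the informed set can at most double in each round, the round count is at least log2(n); running the simulation for several values of n shows how the number of steps grows with graph size.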

MORE SOON

CONFIDENTIAL (FOR NOW)

PROJECTS

AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

Abstract:

We propose a novel method for temporally pooling frames in a video for the task of human action recognition. The method is motivated by the observation that there are only a small number of frames which, together, contain sufficient information to discriminate an action class present in a video, from the rest. The proposed method learns to pool such discriminative and informative frames, while discarding a majority of the non-informative frames in a single temporal scan of the video. Our algorithm does so by continuously predicting the discriminative importance of each video frame and subsequently pooling them in a deep learning framework. We show the effectiveness of our proposed pooling method on standard benchmarks where it consistently improves on baseline pooling methods, with both RGB and optical flow based Convolutional networks. Further, in combination with complementary video representations, we show results that are competitive with respect to the state-of-the-art results on two challenging and publicly available benchmark datasets.
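The single-scan pooling idea in the abstract can be sketched as an importance-weighted running mean. This is only an illustration: the importance scores are passed in as given numbers, whereas the abstract describes them as being predicted by the network, and the function name is hypothetical.

```python
import numpy as np

def adaptive_scan_pool(frames, importances):
    """Pool frame features in one temporal scan, weighting each by its importance.

    frames: (T, D) array of per-frame features.
    importances: length-T sequence of non-negative scores (assumed given here).
    """
    frames = np.asarray(frames, dtype=float)
    pooled = np.zeros(frames.shape[1])
    total = 0.0
    for x, gamma in zip(frames, importances):
        new_total = total + gamma
        if new_total > 0:
            # Running weighted mean: frames with gamma == 0 leave it unchanged.
            pooled = (total * pooled + gamma * x) / new_total
        total = new_total
    return pooled

# Hypothetical usage: the third frame is judged non-informative and is ignored.
pooled = adaptive_scan_pool([[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]], [1.0, 1.0, 0.0])
```

With all importances equal this reduces to average pooling, which is one of the baseline pooling methods such an approach would be compared against.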

Results:

Relevant Links:

PAPER

CODE REPOSITORY (SOON)

Visual Story Telling

THE PROJECT

This was a course project in the course CS698A: Recent Advances in Computer Vision, under Prof. Gaurav Sharma. Visual storytelling is a relatively new task with hardly any prior work. The task involves mapping sequential images to sequential, human-like, narrative descriptive sentences or 'stories'. Simply put, it involves understanding a sequence of images and trying to explain its contents in a story-like manner. The problem was introduced recently along with a newly constructed dataset (released by MSR).

PROJECT DESCRIPTION

The project deals with the task of visual story telling i.e. constructing narratives given sequential images. In our approach, we encode the image sequence by passing it through a GRU. This is used as the initial hidden state of the story decoder network, which is also modelled as a GRU. Finally, beam search is used to produce the story for each sequence. Specific heuristics are also discussed to further improve performance. We work with a newly constructed dataset released by MSR which consists of around 80k images in 20k sequences. We use METEOR and BLEU scores as evaluation metrics for this task. The code repository is hosted at GitHub.
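The decoding step can be illustrated with a generic beam search. In this sketch the GRU decoder is replaced by a toy lookup table that returns next-token probabilities given only the last token; the table, the token names and the function signature are hypothetical stand-ins.

```python
import math

def beam_search(step_fn, start, end, beam_width=3, max_len=10):
    """Return the highest log-probability sequence found with a fixed-width beam."""
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for token, prob in step_fn(seq).items():
                candidates.append((seq + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        # Keep only the best beam_width expansions; completed ones leave the beam.
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == end else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished sequences at max_len
    return max(finished, key=lambda c: c[1])[0]

# Toy stand-in for the decoder: next-token probabilities given the last token.
TABLE = {
    "<s>": {"we": 0.6, "the": 0.4},
    "we": {"went": 0.9, "</s>": 0.1},
    "the": {"beach": 1.0},
    "went": {"home": 0.6, "</s>": 0.4},
    "home": {"</s>": 1.0},
    "beach": {"</s>": 1.0},
}

def toy_step(seq):
    return TABLE[seq[-1]]

greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
best = beam_search(toy_step, "<s>", "</s>", beam_width=2)
```

In this toy model greedy decoding (beam width 1) commits to "we" and ends with total probability 0.324, while a beam of width 2 keeps "the" alive and finds the sequence with probability 0.4.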

THE RESULTS

Although the metrics used (METEOR and BLEU) correlate well with human judgements, they are notably ineffective for such a task. They often fail to assign accurate scores to the generated stories, mainly because there are numerous 'correct' stories for each image sequence. Other qualitative results are provided in the report and presentation.

PRESENTATION

REPORT

CODE REPOSITORY (SOON)

Visual Question Answering via Semantic Attention

THE PROJECT

This was a course project in the course CS676A: Computer Vision and Image Processing, under Prof. Vinay Namboodiri. The project deals with the problem of Visual Question Answering. We propose several models to tackle the problem, ranging from Bag of Words models to deep networks (such as CNNs and LSTMs). We also explore the role of attention in improving the performance of the model. The dataset used is the popular VQA dataset based on MS COCO.

PROJECT DESCRIPTION

The project deals with the task of visual question answering, i.e. building a system capable of answering open-ended questions on real-world images. In our approach, we first compute a representation of the question using word vectors. We encode the image by taking the activations of VGG-16. These representations are then used to compute the answer. We study the effect of attention models on the performance of the system. More details are present in the report.

THE RESULTS

We use the VQA dataset based on the popular MS COCO image dataset. It currently has 360K questions on 120K images. All the questions are human-generated, and were specifically designed to stump a 'smart robot'. We find that although attention-based models do not provide competitive performance, they are able to get a few tough questions correct. Other qualitative results are provided in the report and presentation.

POSTER

REPORT

CODE

Bi-modal Regression for Apparent Personality Trait Recognition

Abstract:

The task of the ChaLearn Apparent Personality Analysis: First Impressions Challenge is to rate/quantify personality traits of users in short video sequences. Although the validity of personality judgments from short interactions is questionable, studies show the possibility of predicting attributed traits (First Impressions) using facial and acoustic features. The challenge introduces a newly constructed dataset which consists of manually annotated videos collected from YouTube. In this paper, we present our approach for predicting traits by combining multiple modality specific models. Our models include Deep Networks which focus on leveraging visual information in the given faces, Networks focusing on supplementary information from the background and models using acoustic features. We also discuss another approach for modeling traits as a combination of global and trait-specific variables. We explore methods for extracting fixed length descriptors of videos based on frame-level predictions. We also experiment with various methods for fusing model predictions. We observe that fusion achieves a considerable gain in accuracy over the best stand-alone model, possibly due to utilizing information from all modalities. The proposed method achieves an accuracy gain of approximately 18% above the provided challenge baseline.

Relevant Links:

PRESENTATION

PAPER

CODE REPOSITORY

Real Time Vehicle Classification and License Plate Recognition

THE PROJECT

This was a course project in the course CS771A: Machine Learning, Tools and Techniques, under Prof. Harish Karnick. The project aims at detecting and classifying relevant objects in a video stream (Surveillance Video) in real time. In case of four wheelers, detection and recognition of license plate is also desired. We discuss the major intermediate steps required for the same. In the report, we propose multiple methods and discuss the related results. We also study inter class relationships and effect of fusing classes on the performance.

PROJECT DESCRIPTION

The project aims at detecting and classifying relevant objects in a video stream (surveillance video) in real time. In the case of four-wheelers, detection and recognition of the license plate is also desired. The dataset consists of surveillance videos from the campus security cameras. We perform data preprocessing and cleaning. Due to the poor quality (and small size) of the dataset, aggressive data augmentation has been performed. The pipeline is divided into intermediate steps, which are discussed in detail in the report and presentation.

THE RESULTS

The dataset has been collected through crowdsourcing and consists of seven classes. There are many issues in the dataset, such as numerous incorrectly marked objects, low diversity and a low number of images. We experiment with various features and classification algorithms; their effect on performance is discussed in the report. We also experiment with various combinations of classes and find that some pairs of classes tend to be confused more often (such as bicycle and person). Other qualitative results are also provided.

PRESENTATION

REPORT

CODE

Word Embeddings Using Multiple Word Prototypes

THE PROJECT

This was a project for the course CS671A: Introduction to Natural Language Processing, under Prof. Amitabha Mukherjee. Since existing word vector models do not account for polysemy (which reduces their quality), the project aimed at improving word vectors by computing an individual word vector for each sense of a word. The project was completed during Aug '15 - Nov '15.

PROJECT DESCRIPTION

The project deals with the problem of constructing multiple sense embeddings for different words. We develop several models to tackle the problem, including online clustering methods and methods involving parameter estimation, and also look at methods related to word-word co-occurrence matrices. Our model is comparable to the state-of-the-art models and even outperforms them in some cases. We also discuss the possibility of our model giving better (semantically coherent) senses than present models. The proposal, poster, slides and the final report are available (given in the links below). The code repository is hosted at GitHub.

THE RESULTS

We compare the performance of our approaches with current state of the art models and find that our model is comparable to the state of the art models and even outperforms them in some cases. Our model is extremely efficient and takes less than 6 hours for complete training and computation of senses. The main task used for comparing the models is the SCWS task. The comparisons have been done using human judgment of semantic similarity.

PROPOSAL

POSTER

REPORT

PRESENTATION

CODE REPOSITORY

Partial Multi-View Clustering Using Graph Regularized NMF

Abstract:

Real-world datasets consist of data representations (views) from different sources which often provide information complementary to each other. Multi-view learning algorithms aim at exploiting the complementary information present in different views for clustering and classification tasks. Several multi-view clustering methods that aim at partitioning objects into clusters based on multiple representations of the object have been proposed. Almost all of the proposed methods assume that each example appears in all views or at least there is one view containing all examples. In real-world settings this assumption might be too restrictive. Recent work on Partial View Clustering addresses this limitation by proposing a Non-negative Matrix Factorization based approach called PVC. Our work extends the PVC work in two directions. First, the current PVC algorithm is designed specifically for two-view datasets. We extend this algorithm for the k partial-view scenario. Second, we extend our k partial-view algorithm to include view specific graph laplacian regularization. This enables the proposed algorithm to exploit the intrinsic geometry of the data distribution in each view. The proposed method, which is referred to as GPMVC (Graph Regularized Partial Multi-View Clustering), is compared against 7 baseline methods (including PVC) on 5 publicly available text and image datasets. In all settings the proposed GPMVC method outperforms all baselines. For the purpose of reproducibility, we provide access to our code.

Results:

Relevant Links:

PRESENTATION

PAPER

CODE REPOSITORY

Computer Networks Course Project

THE PROJECT

This was a course project for the course CS425A: Introduction to Computer Networks, under Prof. Sandeep Shukla. The project is an implementation intensive project consisting of multiple mini projects. The mini projects included implementing a basic HTTP server, Proxy server, basic STCP layer and a simple static router. The project was completed during the 7th semester (Aug '16 - Nov '16).

PROJECT DESCRIPTION

The project is implementation intensive consisting of multiple mini projects. It involved studying and implementing protocols and network layers. The mini projects included implementing a basic HTTP server, Proxy server, basic STCP layer and a simple static router. The relevant reports and code are provided below.

REPORT

CODE

ADA Compiler

THE PROJECT

This was a course Project for the completion of the course CS335A: Compiler Design, under Prof. Subhajit Roy. The project involved creating an End-to-End Compiler for a subset of the programming language ADA in the x86 architecture. The project was completed during the semester (Aug '15 - Nov '15).

PROJECT DESCRIPTION

The project involved implementing a Lexical Analyzer and Assembly-Code Generator in python, constructing grammar rules for parsing our identified language and creating the TAC (Three Address Code) for intermediate code. The compiler supported basic types, operations for Strings, Library support, Short circuiting, conditionals, Loops with strict type-checking and error handling. It also supported functions (Allowed overloading) with multiple return values and scopes.

NachOS Operating System

THE PROJECT

This was a project for course CS330A: Operating Systems, under Prof. Mainak Chaudhuri. The project involved extending the NachOS operating system to perform basic operating system functions including Fork, Join, Sleep and Exec. The project was completed during the semester (Aug '15 - Nov '15).

PROJECT DESCRIPTION

The project involved extending the NachOS operating system to perform basic operating system functions including Fork, Join, Sleep and Exec. Implemented and evaluated performance of various algorithms for scheduling processes. Developed and added support for Demand Paging, Shared Memory, Condition Variables and Semaphores.

Adaptive Strategies for Infinite Prisoner's Dilemma

THE PROJECT

This was a course project in the course ECO502A: Applied Game Theory, under Prof. Vimal Kumar. The project deals with the problem of computing successful strategies for Iterated (Infinite) Prisoner’s dilemma. The discussed algorithms include Evolutionary Strategies and Reinforcement Learning (Which compute optimal and also adaptive strategies which perform well against multiple opponents). We proposed new algorithms based on Reinforcement Learning to compute good strategies which perform well against a set of baseline algorithms.

PROJECT DESCRIPTION

The project deals with the problem of computing successful strategies for Iterated (Infinite) Prisoner’s dilemma. We simulate multiple results and also confirm previous findings and claims. We also show the relevance of choosing memory depth as three in Axelrod’s experiments and provide arguments for the same. The discussed algorithms are based on Evolutionary Strategies and Reinforcement Learning. We propose new algorithms based on Reinforcement Learning to compute good strategies performing well against a variety of other strategies. This strategy is also compared against Axelrod’s Evolutionary Strategy.
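For readers unfamiliar with the setup, the following minimal sketch simulates an iterated Prisoner's Dilemma match between two classic baseline strategies. The payoff values (3 for mutual cooperation, 1 for mutual defection, 5 and 0 for unilateral defection) are the commonly used ones and are an assumption here; the project's own evolutionary and reinforcement-learning strategies are not reproduced.

```python
# Payoffs indexed by (move_a, move_b); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(my_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"

def always_defect(my_history, opp_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return the total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

tft_vs_defector = play(tit_for_tat, always_defect)  # (9, 14)
tft_vs_tft = play(tit_for_tat, tit_for_tat)          # (30, 30)
```

A strategy in this interface only sees the two move histories, so a memory-depth-three strategy of the kind mentioned above would look at the last three entries of each list.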

THE RESULTS

To compute results and analyse the effectiveness of the strategies, we consider Axelrod's Tournament and use a similar setup. We find that our proposed adaptive strategy performs better than other variants. We also discuss the shortcomings and strengths of the algorithms. Other qualitative discussions and results are provided in the report.

REPORT

EXTRA

CODE REPOSITORY

Geometric Data Structures

THE PROJECT

This was a project for Advanced Track in course CS210: Data Structures and Algorithms, under Prof. Surendar Baswana. The project involved re-invention of several geometric data structures to efficiently answer specified queries. Some of the involved topics were Convex Hulls, Dynamic hulls, Fractional Cascading, Simplex query. The project was completed during the semester alongside the course (Sep '14 - Nov '14).

PROJECT DESCRIPTION

The project involved re-invention of several geometric data structures to efficiently answer specified queries. The queries handled were Point in Polygon, Polygon-Line intersection, Simplex problem, Orthogonal Range Search and Half Plane problem. We also re-invented solutions for problems such as improving the time complexity of Orthogonal Range Search using Fractional Cascading, and Dynamic Convex Hulls.

QUERIES HANDLED

The operations handled were: computation of the convex hull; point in polygon; maintaining a dynamic convex hull; line-polygon intersection; half-plane query (number of points on a given side of a line, using convex hulls); fractional cascading (improvement in the half-plane query); orthogonal range search (number of points in a rectangle); half-plane query using ham sandwich cuts; and the simplex problem.
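Two of the listed operations can be sketched briefly. The code below uses Andrew's monotone chain algorithm for the convex hull and a linear-time orientation test for point-in-convex-polygon; these are textbook techniques chosen for illustration and are not necessarily the data structures developed in the project.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def point_in_convex_polygon(q, hull):
    """True if q is inside or on a convex polygon given in counter-clockwise order."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)])
inside = point_in_convex_polygon((1, 1), hull)
outside = point_in_convex_polygon((3, 1), hull)
```

The hull takes O(n log n) time because of the sort. The point test shown is O(n) per query; on a convex polygon it can be reduced to O(log n) with binary search over the vertices.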

DESCRIPTION

REPORT

Multi Modal Emotion Recognition

THE PROJECT

This was a project in the Summer Camp organized by SnT Council in '14. It was completed in the summers under the Programming Club. The project aims at performing emotion recognition using multiple features i.e. text, visual and acoustic features. We then use ensemble methods on the classifiers formed to get improved results (To consider multiple aspects).

PROJECT DESCRIPTION

This project presents a multimodal emotion recognition model which aims at achieving higher accuracies than existing ones by using all the prevalent features in a video, namely text, audio and video. We classify the videos into neutral and the 6 basic Ekman emotions [1]. We use three independent classifiers, one each for text, audio and video. The text classifier gives positive-negative scores showing the extent to which a given statement is positive or negative.

THE RESULTS

I'll briefly describe the recognition rates of the different classifiers. Our model consisted of three independent classifiers which were merged later. For the text classifier, the accuracy was roughly 74% for all the models tried, with the Naive Bayes model performing slightly better. The image classifier used many different approaches; the best accuracy achieved was around 65%. The audio classifier uses Mel-frequency cepstral coefficients as features after a bit of pre- and post-processing. Support Vector Machines and K Nearest Neighbors are used; the best accuracy achieved was 60%. The result improves when the outputs of all three classifiers are considered together.

PROJECT WIKI

REPORT

DEMONSTRATION

CODE REPOSITORY

News Report Classification

THE PROJECT

This was a project taken in the second semester (of first year) under ACA (Association of Computing Activities). This was my first project related to Machine Learning, and it introduced me to the wonderful world of ML. The project aims at classifying news reports into multiple categories. We use basic concepts from Natural Language Processing and try a few models to compare their performance.

PROJECT DESCRIPTION

The project aims at classifying news articles into various categories. We trained a Naive Bayes classifier on scraped news data after processing the article text (tokenisation, stemming, removing stopwords, etc.). We implemented K Nearest Neighbors and automated the scraping of online news articles to collect training data. We also used the NLTK library and the Reuters news dataset for validation.

Other Minor Projects

Developed a bot to play Othello. It was based on the Minimax algorithm, with Alpha-beta Pruning to speed up computations. Secured 19th place amongst 2000+ participants from all over the world.
Designed a bot to play Battleship. A probabilistic model of the ships and board was used to decide the next move.
Designed application for Analysing brand sentiment through social media channels. Developed during Web-Dev, Takneek '14 and secured First position.
Developed models for predicting Search trends and detecting Spam messages.
Implemented models for Topic assignment for articles, Multi-Label Question classification (Tested on questions taken from Quora).
Developed a Captcha Decoder, able to work with mild occlusions. Clustering and Segmentation based methods used for extracting candidate regions containing characters. They are further passed through a classifier to get the final results.
Completed project to discover patterns and trends about the New York Subway, during the Udacity course: Intro to Data Science.
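The search technique behind the Othello bot listed above, minimax with alpha-beta pruning, can be sketched on a toy game tree. The nested-list tree format (lists are internal nodes, numbers are leaf evaluations) is an assumption made for illustration; no Othello move generation or board evaluation is shown.

```python
def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of a game tree, skipping branches that cannot matter."""
    if not isinstance(node, list):
        return node  # leaf: static evaluation of the position
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing player will never allow this branch
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option
    return best

# Toy tree: the maximizer picks a branch, then the minimizer picks a leaf.
value = alphabeta([[3, 5], [2, 9], [0, 1]])
```

In the second branch the search stops after seeing the leaf 2, because the first branch already guarantees the maximizer a value of 3. This kind of cut-off is what speeds up the computation.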

CODE REPOSITORY

BACK TO HOMEPAGE

Welcome to the Projects page

Read more for details about my projects

Projects at a Glance

Click below to see a categorized list of the projects

This page also contains details of my Internships

Descriptions of my internship projects are also provided below
