• Hello, I'm Nishant Rai

    Computer Vision and Machine Learning Enthusiast

    Especially interested in Vision-Language Models, Multi-Modal Learning, and Video Understanding.

    Currently a Tech Lead in the Perception team at Waymo
    Teaching robots how to understand the most complex agents on roads - humans!

    Soft spot for startups, having jumped around at Nuro, Fyusion, Rubrik, and more.

    Stanford University: Master's in CS, AI track. Member of Stanford Vision and Learning Lab.
    IIT Kanpur: Bachelor's in Computer Science, graduated top of the class.


    Beyond my professional pursuits, I am a passionate sports enthusiast with a particular love for skiing and diving. In quieter moments, I find solace in sketching and reading books.

  • Experience and Research


    I am broadly interested in Computer Vision and Deep Learning applications.
    While my journey in the field began with a focus on Multi-Modal Learning, it branched into Video Understanding and Self-Supervised Learning, and has recently expanded into Foundation Vision-Language Models.

    I was fortunate to have my time at Stanford funded through research and course appointments. During the program, I worked with Prof. Juan Carlos Niebles and Prof. Ehsan Adeli as a research assistant in the Stanford Vision and Learning Lab, focusing on human action understanding and video representation learning using multiple modalities.

    Prior to this, I was pursuing my undergrad in computer science at the Indian Institute of Technology Kanpur, where I was introduced to computer vision research working with Dr. Gaurav Sharma and Dr. Karan Sikka on human action understanding.

    Before finding my current research interests, I jumped around as a research assistant at the University of British Columbia, developing algorithms for safe human-robot interaction, and at I.N.R.I.A. Paris, researching algorithms for shortest routes in massive road networks.

  • Recognition


    • Selected as a "Rising Star in Self-Driving" in Business Insider's "Self-Driving: 35 under 35" list.
    • Received Distinction in Research at Stanford University for my thesis "Less is More: Video Understanding with Limited Supervision".
    • Gold Medal for best academic performance in the department at IIT Kanpur.
    • Academic Excellence Award for outstanding performance across all academic years.
    • Scholarship for outstanding all-round achievement at IIT Kanpur.
    • Gold Medal for exceptional performance in the Indian National Physics Olympiad.
    • One of 6 senior-track recipients awarded a Gold Medal in the Indian National Mathematics Olympiad.

    Teaching


    • Teaching Assistant for CS231N: Convolutional Neural Networks for Visual Recognition in '21 at Stanford.
    • Teaching Assistant for CS231N: Convolutional Neural Networks for Visual Recognition in '20 at Stanford.
    • Teaching Assistant for CS145: Data Management and Data Systems in '19 at Stanford.
    • Section Leader for ESC101: Introduction to Programming in '16 at IIT Kanpur.
    • Section Leader for ESC101: Introduction to Programming in '17 at IIT Kanpur.
    • Member of the institute-wide Academic Core Team in Counseling Service at IIT Kanpur.





ABOUT ME


I am currently a Tech Lead in the wonderful Perception team at Waymo. I focus on empowering robots to interpret complex road scenarios, particularly human behaviors.

My expertise lies in Computer Vision and Machine Learning, with a keen interest in Vision-Language Models, Multi-Modal Learning, and Video Understanding.

I graduated with a Master's in Computer Science from Stanford University, where I was fortunate to be funded through research and course appointments while working on human action understanding and video representation learning. During my undergrad at IIT Kanpur, my curiosity led me to explore human-robot interaction and shortest-path algorithms at the University of British Columbia and I.N.R.I.A., before I delved into computer vision and machine learning.

Apart from research, I had a strong interest in algorithmic challenges, which guided me to the world of competitive programming. I also held a passing interest in quantitative trading and still enjoy talking about markets and derivatives in day-to-day life.

Beyond my professional pursuits, I am a passionate sports enthusiast with a particular love for skiing and diving. In quieter moments, I find solace in sketching and reading books.




PUBLICATIONS


AdaScan: Adaptive Scan Pooling in Deep Convolutional Neural Networks for Human Action Recognition in Videos

Amlan Kar* (IIT Kanpur), Nishant Rai* (IIT Kanpur), Karan Sikka (SRI & UCSD), Gaurav Sharma (IIT Kanpur)
  • Accepted in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017

Bi-Modal Regression for Apparent Personality Trait Recognition

Nishant Rai (IIT Kanpur)
  • Accepted as an Oral Presentation in the workshop on Multimedia Challenges Beyond Visual Analysis at the 23rd International Conference on Pattern Recognition (ICPR)

Partial Multi-View Clustering Using Graph Regularized NMF

Nishant Rai (IIT Kanpur), Sumit Negi (Amazon Dev. Center), Santanu Chaudhury (IIT Delhi), Om Deshmukh (XRCI)
  • Accepted as an Oral Presentation at the 23rd International Conference on Pattern Recognition (ICPR)

INTERNSHIPS

CARIS Lab, University of British Columbia

MITACS Research Internship

XRCI Centre, Bangalore, India

XRCI Research Internship

INRIA Rocquencourt, France

INRIA Research Internship






SELECTED PROJECTS


AdaScan: Adaptive Scan Pooling for Human Action Recognition in Videos

  • Project aimed at Human Action Recognition with Deep Convolutional Neural Networks, proposing a novel method for temporally pooling frames in a video. The proposed method learns to pool discriminative and informative frames while discarding the majority of non-informative frames in a single temporal scan of the video.
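
  A minimal sketch of the adaptive pooling idea, assuming PyTorch and pre-extracted per-frame CNN features; the module names and the exact scoring/update rule here are illustrative, not the paper's formulation:

    # Rough sketch: scan frames once, predict an importance weight per frame,
    # and keep a running weighted aggregate as the video-level descriptor.
    import torch
    import torch.nn as nn

    class AdaptivePool(nn.Module):
        def __init__(self, feat_dim):
            super().__init__()
            # Scores the current frame given the running pooled descriptor.
            self.scorer = nn.Sequential(
                nn.Linear(2 * feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
            )

        def forward(self, frames):               # frames: (T, feat_dim)
            pooled = torch.zeros_like(frames[0])
            total = torch.tensor(1e-6)
            for f in frames:                     # single temporal scan
                alpha = torch.sigmoid(self.scorer(torch.cat([pooled, f])))
                pooled = pooled + alpha * f      # informative frames dominate
                total = total + alpha
            return pooled / total                # pooled video descriptor

    feats = torch.randn(30, 512)                 # e.g. 30 frames, 512-d features
    video_vec = AdaptivePool(512)(feats)         # (512,) video descriptor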

ChaLearn: Apparent Personality Trait Analysis

  • The task of the ChaLearn Apparent Personality Analysis: First Impressions Challenge is to rate/quantify the personality traits of people in short video sequences.
  • Our approach focuses on a combination of multiple modality-specific models, including deep networks leveraging visual information, networks focusing on supplementary background information, and models using acoustic features.
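
  As a rough illustration of the late-fusion step only, assuming scikit-learn and stand-in per-modality features (the actual models were deep networks, and the fusion weights would normally be tuned on a validation split):

    # Train one regressor per modality, then fuse the predicted trait scores.
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n, traits = 200, 5                            # clips, Big Five scores in [0, 1]
    X_visual = rng.normal(size=(n, 128))          # placeholder visual features
    X_audio = rng.normal(size=(n, 64))            # placeholder acoustic features
    y = rng.uniform(size=(n, traits))

    visual_model = Ridge().fit(X_visual, y)       # modality-specific models
    audio_model = Ridge().fit(X_audio, y)

    w_visual, w_audio = 0.6, 0.4                  # illustrative fusion weights
    fused = (w_visual * visual_model.predict(X_visual)
             + w_audio * audio_model.predict(X_audio))
    print(fused.shape)                            # (200, 5) fused trait predictions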

Visual Story Telling

  • Project aimed at generating stories given a sequence of images. The project involved implementing an attention-based model and baselines for the recently released Visual Storytelling task.

Visual Question Answering via Semantic Attention

  • Project aimed at implementing Deep Neural Network based approaches for answering open-ended questions about images. We propose a semantic attention based model which employs attention over the semantic representation of the image.
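
  A hypothetical PyTorch sketch of the semantic-attention idea: the question representation attends over concept-level embeddings extracted from the image, and the attended summary is combined with the question to score candidate answers. All names and dimensions below are illustrative.

    # Attention over a semantic (concept-level) image representation.
    import torch
    import torch.nn as nn

    class SemanticAttentionVQA(nn.Module):
        def __init__(self, dim=256, num_answers=1000):
            super().__init__()
            self.attn = nn.Linear(2 * dim, 1)          # scores each concept
            self.classifier = nn.Linear(2 * dim, num_answers)

        def forward(self, question_vec, concept_embs):
            # question_vec: (dim,), concept_embs: (K, dim) detected concepts
            K = concept_embs.size(0)
            q = question_vec.unsqueeze(0).expand(K, -1)
            scores = self.attn(torch.cat([q, concept_embs], dim=1))  # (K, 1)
            weights = torch.softmax(scores, dim=0)
            attended = (weights * concept_embs).sum(dim=0)           # (dim,)
            return self.classifier(torch.cat([question_vec, attended]))

    logits = SemanticAttentionVQA()(torch.randn(256), torch.randn(12, 256))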

Real Time Vehicle Recognition and Automatic Number Plate Recognition

  • Project aimed at real-time vehicle recognition along with extracting registration numbers from the license plates of four-wheelers in real-world surveillance videos.

Partial Multi-View Clustering Using Graph Regularized NMF

  • The project deals with the problem of clustering data using information present in multiple views. We propose several extensions to tackle the Partial Multi-View problem, which involves data with missing views, i.e., not all instances have all views.
  • We extend our algorithm to the k partial-view scenario and also include view-specific graph Laplacian regularization, enabling the model to exploit the intrinsic geometry of the data distribution in each view.
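
  A simplified NumPy sketch of graph-regularized NMF for a single, fully observed view, using standard multiplicative updates; the partial multi-view method extends this basic form, so treat the snippet as background rather than the proposed algorithm:

    # Graph-regularized NMF: X ~= U @ V.T with a graph penalty on the
    # instance factors V, minimized via multiplicative updates.
    import numpy as np

    def gnmf(X, W, k=10, lam=0.1, iters=200, eps=1e-9):
        # X: (m, n) nonnegative data, columns are instances.
        # W: (n, n) symmetric affinity graph over instances.
        m, n = X.shape
        rng = np.random.default_rng(0)
        U, V = rng.random((m, k)), rng.random((n, k))
        D = np.diag(W.sum(axis=1))                # degree matrix, L = D - W
        for _ in range(iters):
            U *= (X @ V) / (U @ (V.T @ V) + eps)
            V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
        return U, V

    X = np.abs(np.random.rand(50, 120))           # toy nonnegative data
    W = (np.random.rand(120, 120) > 0.9).astype(float)
    W = np.maximum(W, W.T)                        # symmetrize the affinity graph
    U, V = gnmf(X, W)
    labels = V.argmax(axis=1)                     # hard cluster assignments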

Alternate Paths in Road Networks

  • The project deals with finding alternate routes that are substantially different from the shortest path according to specific criteria.
  • The project involved implementing various shortest-path algorithms and comparing their efficiency on real-world road networks. We proposed algorithms to compute paths under another feasible definition of alternate routes, along with measures to compare different algorithms, and developed efficient algorithms for the involved computations.
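
  One simple baseline in this space is the penalty method: compute the shortest path, inflate the weights of its edges, and search again to obtain a substantially different route. The sketch below assumes an adjacency-list graph and is an illustration, not the project's exact algorithms:

    # Penalty-method alternate route on a {node: [(neighbor, weight), ...]} graph.
    import heapq

    def dijkstra(graph, src, dst):
        dist, prev, pq = {src: 0.0}, {}, [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                break
            if d > dist.get(u, float("inf")):
                continue                          # stale queue entry
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(pq, (nd, v))
        path, node = [], dst
        while node != src:                        # walk predecessors back to src
            path.append(node)
            node = prev[node]
        return [src] + path[::-1]

    def alternate_route(graph, src, dst, penalty=2.0):
        shortest = dijkstra(graph, src, dst)
        on_path = set(zip(shortest, shortest[1:]))
        penalized = {u: [(v, w * penalty if (u, v) in on_path else w)
                         for v, w in nbrs] for u, nbrs in graph.items()}
        return shortest, dijkstra(penalized, src, dst)

    g = {"a": [("b", 1), ("c", 2)], "b": [("d", 1)], "c": [("d", 1)], "d": []}
    print(alternate_route(g, "a", "d"))           # (['a', 'b', 'd'], ['a', 'c', 'd'])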




OTHER PROJECTS


Aug '15 - Nov '15

  • Project aimed at constructing Multiple Sense Embeddings for different words using purely unsupervised approaches

May '16 - Jul '16

  • Project (during internship) aimed at predicting single-arm reaching motion by humans in order to enable smooth and safe human-robot interactions

May '16 - Jul '16

  • Project (during internship) aimed at aligning point clouds received from multiple Kinects. A supporting project for improving the performance of other setups in the lab.

May '14 - Jun '14

  • Project aimed at performing Emotion Detection using multiple modalities, i.e., textual, speech, and visual.

Aug '16 - Nov '16

  • Course Project for the course CS425A: Computer Networks. Involved implementing a basic HTTP server, Proxy server, basic STCP layer and a router.

Jan '16 - Apr '16

  • Proposed, implemented, and analyzed new adaptive strategies for the Infinite Prisoner's Dilemma
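
  For context, a bare-bones iterated Prisoner's Dilemma harness of the kind used to evaluate such strategies; the players shown are classic baselines, not the adaptive strategies proposed in the project:

    # Repeated play between two strategies under the standard payoff matrix.
    C, D = "C", "D"
    PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

    def tit_for_tat(my_hist, opp_hist):
        return C if not opp_hist else opp_hist[-1]

    def always_defect(my_hist, opp_hist):
        return D

    def play(strat_a, strat_b, rounds=200):
        hist_a, hist_b, score_a, score_b = [], [], 0, 0
        for _ in range(rounds):
            a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
            pa, pb = PAYOFF[(a, b)]
            hist_a.append(a); hist_b.append(b)
            score_a += pa; score_b += pb
        return score_a, score_b

    print(play(tit_for_tat, always_defect))       # (199, 204)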

Jan '16 - Apr '16

  • Implemented an End-to-End Compiler for a subset of ADA in the x86 Architecture

May '15 - Jul '15

  • Project aimed at finding good local features which are suitable predictors for global features of small-world graphs

Jul '15 - Nov '15

  • Extended the NachOS operating system to perform basic operating system functions, implementing various scheduling algorithms and adding support for paging and locks

Jan '14 - Apr '14

  • Project aimed at classifying news articles into various categories (such as politics, technology, etc.)

Sep '14 - Nov '14

  • Project involved the re-invention, implementation, and analysis of geometric data structures to efficiently answer queries

Aug '14

  • Application developed during Web-Dev at Takneek '14; secured first position. Created an interface to analyze the past and present social sentiment around brands and their products.

Other Minor Projects

  • Developed a bot to play Othello. Secured 19th place among 2000+ participants from around the world.
  • Designed a bot to play Battleship. A probabilistic model of the ships and board was used to decide the next move (a rough sketch follows this list).
  • Developed models for predicting Search trends and detecting Spam messages.
  • Implemented models for Topic assignment for articles, Multi-Label Question classification (Tested on questions taken from Quora).
  • Designed a Captcha Decoder able to work with mild occlusions. Clustering- and segmentation-based methods were used to extract candidate regions containing characters.
  • Completed project to discover patterns and trends about the New York Subway, under the Udacity course: Intro to Data Science.
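
  A toy version of the probability-heatmap idea behind the Battleship bot mentioned above (an illustration, not the original code): enumerate the placements of each remaining ship consistent with the observed misses, count how often every cell is covered, and fire at the most frequently covered unknown cell.

    # Build a placement-count heatmap over the board and pick the hottest cell.
    import numpy as np

    def placements(length, size):
        # All horizontal and vertical placements of a ship of this length.
        for r in range(size):
            for c in range(size - length + 1):
                yield [(r, c + i) for i in range(length)]
        for c in range(size):
            for r in range(size - length + 1):
                yield [(r + i, c) for i in range(length)]

    def heatmap(misses, ships, size=10):
        counts = np.zeros((size, size))
        for length in ships:
            for cells in placements(length, size):
                if not any(cell in misses for cell in cells):
                    for cell in cells:
                        counts[cell] += 1
        for cell in misses:                       # never re-fire at a known miss
            counts[cell] = 0
        return counts

    def next_shot(misses, ships=(5, 4, 3, 3, 2)):
        counts = heatmap(set(misses), ships)
        return np.unravel_index(np.argmax(counts), counts.shape)

    print(next_shot([(0, 0), (5, 5)]))            # most promising cell to fire at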




GET IN TOUCH

Stanford, California

first_name + last_name (at) google (dot) com