
Affective Computing


Project Members:


- Nishant Rai
- Atanu Chakraborty
- Sahil Grover
- Amlan Kar


Project Info:


Inspiration for our approach :

Our project, in simple words, is "Emotion Recognition". Say we have a video in which we have to decide the emotion of a subject; the usual approach is to rely on the facial expressions alone, but this has drawbacks, since some emotions are easily confused with others.
So our approach is a triad-based method, i.e. it considers visual, audio and textual data together for classification. We expect this to increase the accuracy of the classifier: if the visual-emotion classifier is confused between some emotions, the audio classifier may come to the rescue. That is the inspiration for our three-way approach to emotion recognition.

(Visual data means timed frames of the video, audio data means features of the subject's voice, and textual data is essentially the subtitles of the video, which can be obtained through a speech-to-text API.)

GitHub Repository :

The link to our repo is : Github Repo.


Project Log:


17 May

- Text semantic analysis is tough! We may need to build a database manually; a ready-made corpus isn't easily available, so we're still searching.


18 May

Interesting Research Papers :
- We read a couple of research papers, out of which these were the most helpful:
- Text Semantics Analysis : http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4427100
- A wonderful paper on emotion recognition in sentences : http://www.scielo.org.mx/pdf/poli/n45/n45a7.pdf
- Recognizing Emotions in Text : http://www-scf.usc.edu/~saman/pubs/2007-MS-Thesis.pdf

Corpora found :
- We found these useful corpora:
- A good Image Database : http://www.kasrl.org/jaffe.html
- A good source of corpora : http://emotion-research.net/wiki/Databases
- Emotional Prosody Speech and Transcripts : https://catalog.ldc.upenn.edu/LDC2002S28. But we need permission to access it.
- Mailed the corpus TA at Stanford (Natalia Silveira) for possible access to the Emotional Prosody Speech and Transcripts corpus that's stored on the Stanford servers.
- A text corpus with relevant data for our emotion classifier : http://www.cse.unt.edu/~rada/affectivetext/
- Not exactly a corpus (more of a dictionary) : http://sentiwordnet.isti.cnr.it/.

- We have made some training data for it manually (courtesy of Atanu and Sahil).
- After some discussion, we decided to assign low priority to the text-emotion classifier, since in normal conversations you really can't decide the emotion solely from the sentences. We may rework the classifier later.
- Classifier #0 : So moving on… finally, a basic emotion classifier has been completed. Feels good to get things started. The approach is to find which 'words' correspond to which emotion and then apply a simple 'nearest k'-like vote (see the sketch at the end of this entry).
- Thinking of a better second approach for the emotion classifier after reading some of the above-mentioned papers.
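The word-to-emotion lookup can be sketched roughly as below; the toy training pairs and the simple voting rule are illustrative assumptions, not our exact code.

```python
from collections import Counter, defaultdict

# Hypothetical labelled sentences, standing in for our hand-made training data.
train = [("i am so happy today", "happiness"),
         ("this is awful and i hate it", "anger"),
         ("i miss her so much", "sadness")]

# Count how often each word appears under each emotion.
word_emotions = defaultdict(Counter)
for sentence, emotion in train:
    for word in sentence.lower().split():
        word_emotions[word][emotion] += 1

def classify(sentence):
    """Every known word votes with its emotion counts; the top emotion wins."""
    votes = Counter()
    for word in sentence.lower().split():
        votes.update(word_emotions[word])
    return votes.most_common(1)[0][0] if votes else "neutral"

print(classify("i am happy today"))
```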


19 May

- Trying to make a slightly different classifier which decides how positive or negative a sentence is.
- Decided to do two implementations:
- One is based on the SentiWordNet corpus, which has a large list of words with their associated positive and negative scores. We plan to do a simple approach consisting of the general housekeeping (removing stopwords, tokenisation, stemming, etc.) and then summing up the scores of the words and averaging them (see the sketch below).
- The second uses a Naïve Bayes approach on the following dataset : http://www.cs.cornell.edu/People/pabo/movie-review-data/. The feature vector consists of the dominant affective words remaining after the housekeeping.
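A minimal sketch of the first (score-averaging) idea; the score table and stopword list below are tiny stand-ins for the real SentiWordNet data and the NLTK housekeeping.

```python
# Hypothetical excerpt of SentiWordNet: word -> (positive score, negative score).
senti = {"good": (0.75, 0.0), "wonderful": (0.75, 0.0),
         "bad": (0.0, 0.625), "boring": (0.0, 0.5)}

stopwords = {"the", "a", "was", "is", "and", "it"}   # toy stopword list

def polarity(sentence):
    """Average the (positive - negative) score over the known content words."""
    words = [w for w in sentence.lower().split() if w not in stopwords]
    scores = [senti[w] for w in words if w in senti]
    if not scores:
        return 0.0
    return sum(p - n for p, n in scores) / len(scores)

print(polarity("The movie was good and the acting was wonderful"))
```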


20 May

- Wrote code to extract the relevant information from the complicated SentiWordNet 3.0 corpus. The output was written to another text file for future use.
- Classifier #1 : Tried to improve our approach by considering the POS (part of speech) of words in a sentence. We handled nouns, adverbs, adjectives and verbs; modifiers were handled differently, by multiplying their weights into those of the words that follow. POS tagging requires additional data, but thanks to the ready-made POS tagger in NLTK we didn't have to search for it (see the sketch at the end of this entry).
- The following links were referred for POS Tagging:
- http://www.monlp.com/2011/11/08/part-of-speech-tags/
- http://www.nltk.org/book/ch05.html
- We tried to classify many simple sentences, and our code produced nice results for a majority of them. Even complicated articles were classified somewhat correctly.
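Roughly how the NLTK tagger fits into Classifier #1; the word scores and modifier weights below are made-up numbers for illustration, and the tokenizer/tagger models need to be downloaded once via nltk.download.

```python
import nltk  # needs the 'punkt' tokenizer and the default POS tagger data

# Illustrative score tables; the real ones come from the extracted SentiWordNet file.
scores = {"good": 1.0, "nice": 0.8, "bad": -1.0, "terrible": -1.5}
modifiers = {"very": 1.5, "not": -1.0, "slightly": 0.5}

def polarity(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))   # e.g. [('very', 'RB'), ('good', 'JJ')]
    total, weight = 0.0, 1.0
    for word, tag in tagged:
        word = word.lower()
        if word in modifiers:
            weight *= modifiers[word]          # a modifier scales the next scored word
        elif tag[:2] in ("NN", "JJ", "RB", "VB") and word in scores:
            total += weight * scores[word]
            weight = 1.0                       # reset once applied
    return total

print(polarity("The food was very good"))      # positive
print(polarity("The service was not good"))    # negative
```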


21 May

- Classifier #2 : Finished the Naive Bayes classifier. It's a general approach consisting of some housekeeping followed by a standard Bernoulli model (see the sketch at the end of this entry). The accuracy of the classifier on the movie review dataset is around 0.7, which is reasonable.
- Classifier #3 : Thinking of a slight improvement to the Naïve Bayes classifier: it consists of marking words according to their POS and then using the Naïve Bayes approach. Had to write a program to check its accuracy. Ironically, this reduces the accuracy to 0.55, possibly because the dataset contains only review statements, not the natural sentences that occur in day-to-day conversation.
- Classifier #4 : Thinking of improving classifier #1 by merging it with support vector classification. Made a program to check its accuracy; somehow even this fails to surpass Classifier #2, coming in at around 0.6.
- Classifier #5 : Tried to use Classifier #1 with Support Vector Regression instead. The change in accuracy was very minor.
- Classifiers #1, #4 and #5 took an unconventional, original approach, so it feels bad that they didn't work out as expected. But I still doubt the quality of the train-test data.
- So we have tried almost 6 major implementations (with a lot of trial and error to find the ideal features). This officially brings an end to the text-classification part (for now).
- One interesting result is that positive sentiments are classified more accurately (around 0.75-0.8 across all of them), while negatives are classified poorly.
- Installed Git for Windows on our computers and finished uploading all the classifiers to the GitHub repo.
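Classifier #2 is essentially a pipeline like the sketch below, shown here with toy documents in place of the Cornell movie-review files; the exact housekeeping steps differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy stand-ins for the movie-review documents and their labels.
docs = ["a wonderful and moving film", "great story and great acting",
        "a boring waste of time", "terrible plot and awful acting"]
labels = ["pos", "pos", "neg", "neg"]

# Binary presence/absence features match the Bernoulli event model.
vec = CountVectorizer(binary=True, stop_words="english")
X = vec.fit_transform(docs)

clf = BernoulliNB().fit(X, labels)
print(clf.predict(vec.transform(["what a wonderful story"])))   # -> ['pos']
```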


22 May

- Installed OpenCV on Windows and integrated it with Python, so all the code can be written in one language.
- Learnt basic functions such as image and video I/O, using the laptop's camera from a program, etc. (see the sketch below).
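The basic camera loop we learnt looks roughly like this generic OpenCV sketch (not our exact program):

```python
import cv2

cap = cv2.VideoCapture(0)          # 0 = the laptop's default camera
while True:
    ok, frame = cap.read()         # grab one frame per iteration
    if not ok:
        break
    cv2.imshow("camera", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to quit
        break
cap.release()
cv2.destroyAllWindows()
```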


23 May

- Classifier #6 : Added a modified Naive Bayes classifier to the git repo. This time the accuracy has gone up to 0.75 on the movie review dataset.

- Learnt about the pickle library in Python; it will be an integral part of all our classifiers, because it helps in saving the state of a running program :
- Say I have some data as input in my program and I do loads of operations on it to get a final result, and the whole process takes around 2 hours. Now I want to use this output in some other program. I obviously won't redo the whole operation in the other program; I could write the value down manually, but if it's a huge array that creates problems. This is exactly what pickle solves. We will use it for saving trained classifiers, because we don't want to retrain them every time we run the program (see the sketch below).
- Learnt more OpenCV functions and operations like changing color spaces, image smoothing, blurring, edge detection, etc. Made a program which shows only specified colors (blue and red) in a video input.
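A minimal pickle sketch of the save-once, load-later idea (the tiny classifier here is just a stand-in for one of our slow-to-train ones):

```python
import pickle
from sklearn.naive_bayes import BernoulliNB

# Pretend this training step takes two hours.
clf = BernoulliNB().fit([[0, 1], [1, 0]], ["neg", "pos"])

# Save the trained object once...
with open("classifier.pkl", "wb") as f:
    pickle.dump(clf, f)

# ...and in any later run, load it back instead of retraining.
with open("classifier.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict([[1, 0]]))   # -> ['pos']
```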


24 May

- During the project review, we were told about a good API for sentiment analysis : Alchemy, http://www.alchemyapi.com/products/features/sentiment-analysis/.
- But using it would make the 6-8 classifiers we have built so far redundant, and we wouldn't learn anything significant about NLP (Natural Language Processing), which defeats the whole purpose of this project. Having said that, it really is a “GREAT” API! So we are keeping it on hold; if we can't get good results from our own classifiers, we shall use it.
- Integrated the Pickle library with the rest of the code.
- Continued learning more about OpenCV and its functions.


25 May

- Halfway through the OpenCV tutorials, we have learnt and made programs for Hough transforms and the GrabCut algorithm.
- Learning about feature detection (the relevant part for our project).


26 May

- Finished making programs for various feature detection techniques : Corner detection, line detection, etc.
- Finished the rest of the tutorial. Some interesting concepts which may be useful are background subtraction and optical flow.
- Stuck on the Haar Cascade function; it isn't giving any results. Searching the web for possible solutions.
- Finally finished the Haar Cascade program; the problem turned out to be a silly mistake. It finds facial features such as the nose, mouth, eyes, etc. (see the sketch below).
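The Haar Cascade usage is roughly as below; the cascade XML paths are placeholders for the files that ship with OpenCV.

```python
import cv2

# Placeholder paths to the cascade files bundled with OpenCV.
face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier("haarcascade_eye.xml")

gray = cv2.cvtColor(cv2.imread("subject.jpg"), cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) boxes; scaleFactor and minNeighbors
# are the knobs that decide how strict the detector is.
for (x, y, w, h) in face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
    face = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face, scaleFactor=1.1, minNeighbors=10)
    print("face at", (x, y, w, h), "with", len(eyes), "eye detections")
```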


27 May

- Searching for libraries which give us the facial feature 'points', since they are extremely important for our project.
- Interesting Libraries :
- FLANDMARK, it finds 7 feature points : http://cmp.felk.cvut.cz/~uricamic/flandmark/. (For C++)
- ASMLIB, it finds many more feature points : https://code.google.com/p/asmlib-opencv/. (For C++)
- STASM, it's surely a keeper, gives 77 feature points and the results are really good : http://www.milbo.users.sonic.net/stasm/. (For C++)
- PYINYOURFACE, looks promising : https://code.google.com/p/pyinyourface/. (For Python)


28 May

- Thought about the various models we could implement, the best options were :
- Classifying on the basis of motion of the face: The model is based on FACS (Facial Action Coding System). This can only classify emotions in a video.
- Classify on the basis of still images: This model classifies only on the basis of the facial expression at the instant, this can classify emotions in both a video and an image.
- But sadly, both of these need libraries which are only available in C++. So we may need to install OpenCV for C++, and start from there.
- Meanwhile, we thought of other possible implementations of the emotion classifier which can be done in Python. The ideas relate to the still-image method and are as follows :
(A) Convert the image to Eigenfaces and then compare them.
(B) Resize the image and construct a feature vector taking all the pixels as features and use SVM, kNN or other methods to classify.
(C) Convert the images to a feature vector, and then reduce the feature size by Principal Component Analysis. Then use classification methods on it.
(D) Instead of considering the whole face as features, use Haar Cascade detectors to find the nose, eyes and mouth, consider only these as features. Then use classification methods on it.
- The dataset we are planning to use is The Japanese Female Facial Expression (JAFFE) Database. It contains 213 images of 7 facial expressions (including neutral) posed by 10 Japanese female models.


29 May

- Started implementing method B first, since it's the most straightforward.
- We divided the data into two parts: a training part (Data X) and a testing part (Data Y).

- Implementation details :

- Did some general housekeeping such as:

  1. Use a Haar Cascade detector to find the face in each image.
  2. Crop the image to keep only the face region.
  3. Convert it to grayscale and resize it to (25,25), giving a feature vector of dimension 625.

- Now that our feature vector is ready, we train and evaluate several classification methods. The algorithms used and their results are as follows (a sketch of the pipeline appears at the end of this entry):

  1. Support Vector Machines: This wasn't expected to work very well since our training data was very small. It finally gave a recognition accuracy of about 0.35 on 'Data Y'.
  2. Support Vector Machines (with weights): We suspected that some specific emotions were causing havoc, so we tried playing with the weights. It only improved marginally, i.e. the recognition accuracy was 0.4.
  3. Decision trees: They scale well even with a small dataset, so we had some expectations of them. A single decision tree gave a recognition accuracy of about 0.3.
  4. Randomised decision trees: A better fit for our case, as it votes over many weak decision trees, so it should have worked. It didn't disappoint and gave a recognition accuracy of about 0.5… still bad, I know :(

- Then we tried other variations like one-vs-one and one-vs-rest classifiers, but the result was almost the same.
- Hmmm… well, if you gave a monkey some placards with different emotions written on them and asked it to tell your emotion, it would do a better job than our classifier. Despite an average accuracy of about 0.4, all the classifiers faltered badly (close to random guessing) when they dealt with real data (through my webcam). The possible reasons are… my webcam is bad (I don't want that) or our classifier sucks (I don't want that either!).
- After some debugging, we found that the classifier almost always gave the same answer for real data. This is a huge mystery. (Solved, see 31 May.)
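The pipeline described above, sketched end to end; the file paths, labels and the choice of sklearn classifier are placeholders, and we actually tried several of the algorithms listed.

```python
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

def face_vector(path):
    """Detect the face, crop it, grayscale it, resize to 25x25, flatten to 625 values."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    x, y, w, h = face_cascade.detectMultiScale(gray, 1.3, 5)[0]   # assume one face per image
    return cv2.resize(gray[y:y + h, x:x + w], (25, 25)).flatten()

# Placeholder file lists standing in for the Data X / Data Y split.
X_train = np.array([face_vector(p) for p in ["im1.tiff", "im2.tiff"]])
y_train = ["happiness", "sadness"]

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
print(clf.predict(X_train[:1]))
```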


30 May

- So a new day starts, hopefully better than yesterday.
- The day was spent mainly on reading some (i.e. many) research papers. Some of the interesting ones are as follows:
- Using PCA algorithm: http://www.ijsce.org/attachments/File/v3i4/D1824093413.pdf
- Using AAM models: http://people.uncw.edu/pattersone/research/publications/RatliffPatterson_HCI2008.pdf
- Using PCA method and then kNN: http://www.ijcaonline.org/volume9/number12/pxc3871933.pdf
- Uses motion of the face and then SVM: http://www.cs.cmu.edu/~pmichel/publications/Michel-FacExpRecSVMAbstract.pdf
- Uses PCA and neural networks: http://uav.ro/stiinte_exacte/journal/index.php/TAMCS/article/viewFile/2/11


31 May

- Finally found the explanation for the mystery: our code wasn't robust. It needed exact lighting conditions (the same as the training set), the same contrast, and so on. (Solution to the mystery.)
- Started implementing method C. Many research papers say it gives good results.
- We divided the data into two parts: a training part (Data X) and a testing part (Data Y).

- Implementation details :

- Did some general housekeeping such as:

  1. Use a Haar Cascade detector to find the face in each image.
  2. Crop the image to keep only the face region.
  3. Convert it to grayscale and resize it to (200,140); the large image size will be taken care of by the PCA step.

- We read about implementation of PCA from A Good tutorial and Implementing PCA in Python.
- Finally decided to use a library function and not waste time, since we are already behind. Read: Answers on Stack Overflow, A sklearn function for PCA and A matplotlib function.
- Decided to use the sklearn function. Extracted the features from the images and then finally made a reduced feature vector using the PCA function.
- Trained it using a self-implemented algorithm similar to k nearest neighbours (a sketch follows this entry).
- The result was again disheartening: a recognition accuracy of about 0.35. But we felt it would do better than the other classifiers on real data. Tried it on my webcam, and the results were poor (again, oh boy!).
- That was expected, though, since we didn't adjust for lighting conditions.
- Now working on improving the robustness of the program.
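The PCA step with sklearn looks roughly like this; random arrays stand in for the flattened 200x140 face crops, and sklearn's k-nearest-neighbours replaces our own nearest-k code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Placeholders for the flattened 200x140 face crops and their emotion labels.
X_train = np.random.rand(40, 200 * 140)
y_train = np.random.choice(["happiness", "sadness", "anger"], size=40)

# Squash the 28000-dimensional pixel vectors down to a handful of components.
pca = PCA(n_components=20)
X_reduced = pca.fit_transform(X_train)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_reduced, y_train)

X_test = np.random.rand(5, 200 * 140)
print(knn.predict(pca.transform(X_test)))
```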


1 June

- Read about normalizing image intensity: Stack Overflow, Wikipedia, its implementation in Python and some others.
- Tried to use binary images as features; they gave poor results compared to the grayscale ones.


2 June

- Tried to apply a mixture of ideas for the classifier, i.e. combined idea (D) with (C) and (B).
- Tried Haar Cascade classifiers for finding facial features; the initial results were a bit poor, i.e. they detected 2 noses, 4 eyes and a couple of mouths in a single face!
- Played around with the thresholds of the respective functions to increase accuracy.
- Much to our delight, it rocked! The accuracy of the nose detector was 100% (it always found one nose), the eye-pair detector found 2 eyes almost 96% of the time, and the mouth detector worked correctly in 92% of cases.
- Used the Haar Cascade detectors and clubbed them with idea (B) (see the sketch below).
- (D) with (B): It gave a recognition accuracy of about 0.3 with completely different faces, and 0.55 otherwise.
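Idea (D) clubbed with (B) amounts to building the feature vector from the detected parts instead of the whole face; a rough sketch follows. The nose and mouth cascade XMLs are third-party files, not bundled with OpenCV, and the patch sizes are arbitrary.

```python
import cv2
import numpy as np

# Cascade XML paths are placeholders; the nose/mouth cascades are third-party files.
eye_c = cv2.CascadeClassifier("haarcascade_eye.xml")
nose_c = cv2.CascadeClassifier("haarcascade_mcs_nose.xml")
mouth_c = cv2.CascadeClassifier("haarcascade_mcs_mouth.xml")

def part_features(face_gray):
    """Concatenate small resized patches of the detected eyes, nose and mouth."""
    parts = []
    for cascade, size in [(eye_c, (20, 10)), (nose_c, (16, 16)), (mouth_c, (20, 10))]:
        boxes = cascade.detectMultiScale(face_gray, scaleFactor=1.1, minNeighbors=10)
        if len(boxes) == 0:
            return None                    # skip images where a part isn't found
        x, y, w, h = boxes[0]
        parts.append(cv2.resize(face_gray[y:y + h, x:x + w], size).flatten())
    return np.concatenate(parts)           # feed this vector to the (B)-style classifier
```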


3 June

- Implemented the algorithm with (D) and (C).
- (D) with (C): It gave a recognition accuracy of about 0.35 with completely different faces, and 0.65 otherwise.


4 June

- Thinking about the possible improvements in the feature extraction process.
- The area with the largest scope for improvement is robustness. The things we have shortlisted are as follows:

  1. Noise reduction in original image
  2. Image Intensity normalization

- So we decided to do some research on them.


5 June

- For noise reduction, we read about different methods (in image smoothing) : Bilateral filter, Gaussian blur, Image noise, Median filter and Noise reduction.
- Finally we searched for OpenCV functions for image smoothing, and found an example performing normalized box filtering.
- We decided to blur the image with a 2×2 kernel using the OpenCV filter2D function (see the sketch below).
- Tried this method on our images, and the results were nice: the images looked much smoother, and unnecessary blots were reduced.
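The smoothing step is just a tiny averaging kernel; a minimal sketch (the file name is a placeholder):

```python
import cv2
import numpy as np

img = cv2.imread("face.png")                    # placeholder path

# Normalized 2x2 box kernel: each output pixel is the mean of a 2x2 neighbourhood.
kernel = np.ones((2, 2), np.float32) / 4.0
smoothed = cv2.filter2D(img, -1, kernel)        # cv2.blur(img, (2, 2)) is equivalent

cv2.imwrite("face_smoothed.png", smoothed)
```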


6 June

- For image intensity normalisation, we read different sources : Source (1), Source (2), Source (3), Source (4), Source (5).
- Finally decided to use histogram normalisation (Source 4) and the CLAHE algorithm (Source 5).
- Used the equalizeHist() function in OpenCV and tested it on our images; it worked very nicely (see the sketch below).
- Integrated it into our code along with the noise reducer.
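Both normalisation variants in one short sketch (the file name is a placeholder):

```python
import cv2

gray = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

# Plain histogram equalisation, which we kept ...
equalized = cv2.equalizeHist(gray)

# ... and CLAHE, the adaptive variant we also experimented with.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
adaptive = clahe.apply(gray)
```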


7 June

- After all those modifications we tested our code, and to our delight the recognition accuracy went up to 0.5 for different faces, and 0.72 otherwise.
- The accuracy was dragged down by some emotions such as disgust and fear.
- We tried the CLAHE algorithm, but it reduced the overall accuracy.


8-9 June

- A relaxing break in Nainital.


10 June

- Tried and tested other approaches, like changing class weights in LinearSVC, using decision trees, etc. The accuracy remained roughly the same.
- Tried a slightly different approach to normalisation, inspired by CLAHE. Instead of normalising the entire image histogram, we extract only the face from the image and normalise that. This helps when the background is distracting.
- The recognition accuracy became 0.45 after this, but we decided to keep it since it still felt alright.


11 June

- Tried classifying our own emotions from the webcam.
- The results were not bad, but not good either. We decided that a database with only Japanese women as subjects isn't really a good idea, so we searched for different databases and found many others : Link for many Databases.


12 June

- Downloaded a database with around 500 images, of which 300 were usable.
- The database consisted of many people with different ages and genders. So we were hopeful it would help our cause.
- Trained the classifier with it. The final recognition accuracy was 0.43 for different faces, and 0.62 otherwise.
- For real-world data it worked considerably better than the previous one. We didn't formally calculate the recognition accuracy, but it was good for emotions like happiness, surprise, anger and disgust, and poor otherwise.


13 June

- Started working on the audio classifier.
- Read some sources about emotion recognition using speech.

- Started looking for suitable libraries which would help in extracting the audio features.


14 June

- Tried installing OPENSMILE; it's a library which works in C++.
- Succeeded in installing it after a few failed attempts.
- Read the docs and finally concluded that it won't do us any good. Still, it has nice features if you ever want to do some audio processing.
- So we moved back to Python and read about libraries like pyaudio, pythonspeechfeatures, pythoninmusic and others.
- Tried installing some of them, but they were so unpopular that we couldn't use setuptools (*SIGH*).
- A way around this (jugaad) was to go to the source code and use the functions directly, copying them into our code.


15 June

- So we tried the jugaad, and it worked (*YAY*). Then we tried some basic I/O and other things the library provides.
- Then we read about the possible features of speech we could use:

  1. Pitch features
  2. Energy contours
  3. MFCCs and related statistics (their median, mean, maxima, etc.)
  4. LFPC and related data

- Due to the lack of appropriate libraries, we were only able to extract the MFCCs, and the shortage of time prevented us from implementing the other features ourselves.
- So we decided to build a basic model with only MFCC features (see the sketch below).
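Extracting the MFCC matrix for a clip looks roughly like this; the import name of the speech-features library varies by version, and the file path is a placeholder.

```python
import scipy.io.wavfile as wav
from python_speech_features import mfcc   # package name may differ by version

rate, signal = wav.read("clip.wav")       # placeholder path

# One 13-dimensional MFCC vector per 25 ms frame (10 ms step by default).
frames = mfcc(signal, samplerate=rate, numcep=13)
print(frames.shape)                       # (number_of_frames, 13)
```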


16 June

- The database we used was “Berlin emotional speech database”.
- It consisted of 6 emotions instead of the 7 we were aiming for, but we decided to add our own data for the 7th one later.
- We divided the data into two parts: a training part (Data X) and a testing part (Data Y).

- Implementation details :

- Used pyaudio and scikits.audiolab for audio input-output.
- We calculated MFCC features for all the frames of an audio clip. For each frame, the MFCC was a 13-dimensional vector.
- There was a problem: we couldn't use all of them as features, since the dimension of the feature vector would be overwhelming. So we read about vector quantization.
- We used the scipy.cluster library for vector quantization.
- We built a codebook of codes with values from 0-63, and each MFCC vector (13 values) was assigned a code using the scipy.cluster algorithm.
- Thus each clip became a sequence of numbers in the range 0-63, which turns the original problem into one well suited to Markov models (see the sketch below).
- We read about Hidden Markov Models and Gaussian mixture models.
- We decided to use HMMs as the classifier, but finding a good library for them was again a problem.
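The quantisation step with scipy.cluster, sketched with random data in place of the real MFCC frames:

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

frames = np.random.rand(500, 13)          # placeholder for the (num_frames, 13) MFCCs

# Build a 64-entry codebook, then map every frame to a code in 0..63.
whitened = whiten(frames)
codebook, _ = kmeans(whitened, 64)
codes, _ = vq(whitened, codebook)

print(codes[:20])                         # the symbol sequence an HMM would consume
```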


17 June

- We tried the sklearn HMM module, but it hasn't been maintained for a long time and gave us a lot of problems.
- Then we surfed the internet for other sources and found some homebrew code, but it didn't satisfy our needs.
- Found one implementation, but when we used it, it gave a lot of errors. We then tried to debug the whole source code (*IN VAIN*).


18 June

- So HMMs were the ones designed for the job, but due to the lack of libraries we couldn't use them.
- We then tried our good ol' SVM and nearest K algorithms.
- SVM gave a recognition rate of 0.25, while nearest K was at 0.32 (*DON'T LAUGH*).
- Then we tried removing the whole vector quantization step. Instead we directly used all 13 MFCC values.
- It helped a bit, the recognition rate was up to 0.43 for SVM, and 0.45 for nearest K.


19 June

- We then tried modifying the number of frames we considered at a time, to see if it changed the recognition rate. The maximum was attained for 100-200 frames.
- For 100 frames, the recognition rate was 0.525 for SVM, and 0.5 for nearest K.
- And just for those who mock us: the recognition rate for humans on the same database is about 0.7, so our classifier isn't extremely bad.
- We experimented with other possible features like the mean, variance and extremes of the MFCCs (see the sketch below).
- This approach really helped: the dimension was reduced from the previous 1300 to 17, and the accuracy stayed almost the same.
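The general idea of the statistics-based features; our final 17-dimensional vector was hand-picked, so this sketch only illustrates the per-coefficient statistics, not the exact feature set.

```python
import numpy as np

def summary_features(mfcc_frames):
    """Collapse a (num_frames, 13) MFCC matrix into per-clip statistics."""
    return np.concatenate([mfcc_frames.mean(axis=0),   # per-coefficient mean
                           mfcc_frames.var(axis=0),    # per-coefficient variance
                           mfcc_frames.min(axis=0),
                           mfcc_frames.max(axis=0)])

print(summary_features(np.random.rand(100, 13)).shape)   # placeholder frames
```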


20 June

- Next we thought of sorting the MFCC vectors according to some specific element (say, the second element of the vector). The inspiration came from the fact that the sounds may be displaced a bit in time.
- So we ran a loop to see which index is best, and the first and third ones were. The recognition rates for them were around 0.58, sometimes 0.64.


21 June

- We tested the classifier on microphone input, and it messed up really, really badly!
- The output was invariably 'sad'. Again the same problem as with the image classifier: it wasn't robust.
- So we read about the things we could do: noise reduction and loudness equalization.
- For this we needed another command-line tool called SoX. Fortunately, it installed without any major trouble.
- We wrote code for noise reduction, which should hopefully make the classifier work well in tough situations (see the sketch below).
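The SoX noise-reduction pass can be driven from Python roughly as below; the file names and the 0.21 sensitivity value are placeholders.

```python
import subprocess

# Step 1: build a noise profile from a short recording of background noise only.
subprocess.check_call(["sox", "background.wav", "-n", "noiseprof", "noise.prof"])

# Step 2: subtract that profile from the actual microphone clip.
subprocess.check_call(["sox", "mic_input.wav", "cleaned.wav",
                       "noisered", "noise.prof", "0.21"])
```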


22 June

- We started integrating the pickle library into our code, so that the classifiers can be loaded quickly.
- Finished making pickle files for all the classifiers.
- Selected the best classifiers, i.e. the ones with the highest accuracy.
- Merged all of them together into a single program with a basic Python interface.
- Implemented classifier merging, the whole aim of the project, and finished up the main program.


Finally, after a lot of hardships, our project is complete :)
