In this brief, the authors discuss recently explored spectral
(sub-segmental and pitch-synchronous) and prosodic (global and local
features at the word and syllable levels in different parts of the
utterance) features for discerning emotions robustly. The
authors also examine the complementary evidence obtained from
excitation-source, vocal-tract-system, and prosodic features to
enhance emotion recognition performance. Features based on
speaking-rate characteristics are explored with the help of multi-stage
and hybrid models to improve emotion recognition performance further.
The proposed spectral and prosodic features are evaluated on a real-life
emotional speech corpus.
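Combining complementary evidence from separate feature streams, as described above, is often done at the score level: each stream's classifier produces per-class scores, which are then merged by a weighted sum. The following is a minimal sketch of that idea, not code from the brief; the emotion labels, score values, and stream weights are all illustrative assumptions.

```python
# Hypothetical score-level fusion of three evidence streams
# (excitation source, vocal tract system, prosody). All numbers
# below are made up for illustration.

EMOTIONS = ["anger", "happiness", "neutral", "sadness"]

def fuse_scores(stream_scores, weights):
    """Weighted sum of per-class scores across evidence streams.

    stream_scores: one per-class score list per stream.
    weights: one weight per stream (assumed to sum to 1).
    """
    fused = [0.0] * len(stream_scores[0])
    for scores, w in zip(stream_scores, weights):
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused

def classify(stream_scores, weights):
    """Return the emotion label with the highest fused score."""
    fused = fuse_scores(stream_scores, weights)
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

# Illustrative posteriors from three single-stream classifiers:
excitation  = [0.40, 0.30, 0.20, 0.10]
vocal_tract = [0.25, 0.45, 0.20, 0.10]
prosody     = [0.50, 0.20, 0.20, 0.10]

label = classify([excitation, vocal_tract, prosody], [0.3, 0.4, 0.3])
```

In this toy example the excitation and prosody streams both favor anger while the vocal-tract stream favors happiness; the weighted fusion resolves the disagreement, which is the sense in which the streams are complementary.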