Publications and Data

This page organizes my research publications. Go to the 'Research' page for an overview of my research. See this poster for an overview of the kind of work I have done in the last few years in Computational Affect. If you are a student interested in working with me, go here.

 

Publications on the Google Scholar page:

 

 

Publications and Data by Area (papers within each area are organized reverse chronologically)


Emotions and Language    Digital Humanities    AI/NLP Ethics
Sentiment Analysis    Computational Social Science    AI/NLP Scientometrics
Lexical Semantics    NLP for Psychology, Well-Being    AI/NLP for Africa, Asia

in Emotions and Language, Sentiment Analysis
emotion analysis    emotion dynamics    sentiment analysis    Sentiment (Africa, Asia)    Stance detection*    Best-Worst Annotations*    Personality Traits    Music from Text*

in Lexical Semantics
antonymy, contrast    evolution of words    metaphor    relational similarity    semantic distance    text summarization    textual inference    word-colour associations    word sense disambiguation

Terms of use of the data are at the bottom of the page.

 


Emotions and Language, Emotion Analysis (joy, sadness, fear, optimism, anger, hope, etc.)

 

Pinned Data and Systems

Several word-emotion association lexicons (such as the NRC Emotion Lexicon), word-sentiment lexicons (such as the NRC Hashtag Sentiment Lexicon), and word-colour association lexicons are available here.

Emotion Dynamics: Python software to analyze emotions in text using emotion lexicons. The script generates a csv file with a number of emotion features of the text, including metrics of utterance emotion dynamics. Associated Paper.

For the 2013 and 2014 competition-winning NRC-Canada sentiment analysis system, go here.

Pinned Book Chapter

Sentiment Analysis: Automatically Detecting Valence, Emotions, and Other Affectual States from Text. Saif M. Mohammad, Emotion Measurement (Second Edition), Elsevier, 2021.
PDF (arxiv preprint arXiv:2005.11882)    BibTeX

Pinned Journal Paper

Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis. Saif M. Mohammad. To Appear in Computational Linguistics.
arXiv:2109.08256. June 2022.
Paper (pdf)    BibTeX    Slides    Poster

Paper

Tweet Emotion Dynamics: Emotion Word Usage in Tweets from US and Canada. Krishnapriya Vishnubhotla and Saif M. Mohammad. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC-2022), May 2022, Marseille, France.
Paper (pdf)    BibTeX      Project Home Page (Code and Data)    Poster   Slides

Journal Paper

Emotion Dynamics in Movie Dialogues. Will E. Hipson and Saif M. Mohammad. arXiv preprint arXiv:2103.01345. March 2021. (To appear in PLOS One, 2021)
Paper (pdf)    BibTeX     Code

Examining the Language of Solitude vs. Loneliness in Tweets.  Will E. Hipson, Svetlana Kiritchenko, Robert J. Coplan, Saif M. Mohammad. Journal of Social and Personal Relationships. March 2021. 
Paper (pdf)    BibTeX

Paper

PoKi: A Large Dataset of Poems by Children. Will E. Hipson and Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France.
Paper (pdf)    BibTeX     Project Home Page and Data

SOLO: A Corpus of Tweets for Examining the State of Being Alone. Svetlana Kiritchenko, Will Hipson, Robert Coplan, and Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France.
Paper (pdf)    BibTeX       Data

Journal Paper

AffectiveTweets: a Weka Package for Analyzing Affect in Tweets. Felipe Bravo-Marquez, Eibe Frank, Bernhard Pfahringer, Saif M. Mohammad. Journal of Machine Learning Research, 20(92):1−6, 2019.
Paper (pdf)    BibTeX      Code

Papers

How do we feel when a robot dies? Emotions expressed on Twitter before and after hitchBOT’s destruction. Kathleen C. Fraser, Frauke Zeller, David Harris Smith, Saif M. Mohammad, and Frank Rudzicz. In Proceedings of the NAACL workshop on computational approaches to subjectivity, sentiment, and social media analysis (WASSA-19), June 2019, Minnesota, USA.
Paper (pdf)    BibTeX       Slides       Visualizations

Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words. Saif M. Mohammad. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, July 2018.
Paper (pdf)    BibTeX    Project Page and Data       Presentation    Video    Poster

Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of *Sem, New Orleans, LA, USA, June 2018.
Paper (pdf)    BibTeX    Project Page and Data     Presentation

Agree or Disagree: Predicting Judgments on Nuanced Assertions. Michael Wojatzki, Torsten Zesch, Saif M. Mohammad, and  Svetlana Kiritchenko. In Proceedings of *Sem, New Orleans, LA, USA, June 2018.
Paper (pdf)    BibTeX        Project Page and Data        Presentation

SemEval-2018 Task 1: Affect in Tweets. Saif M. Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA, June 2018.
Paper (pdf)    BibTeX       Data and Visualization     Presentation

SemEval-2018 Task 1: Affect in Tweets Webpage

75 teams and about 200 participants.

DeepMiner at SemEval-2018 Task 1: Emotion Intensity Recognition Using Deep Representation Learning. Habibeh Naderi, Svetlana Kiritchenko, Saif M. Mohammad, and Stan Matwin. In Proceedings of International Workshop on Semantic Evaluation (SemEval-2018), New Orleans, LA, USA, June 2018.
Paper (pdf)    BibTeX  

WikiArt Emotions: An Annotated Dataset of Emotions Evoked by Art. Saif M. Mohammad and Svetlana Kiritchenko. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan.
Paper (pdf)    BibTeX       Poster       Project Page and Data

Word Affect Intensities. Saif M. Mohammad. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan.
Paper (pdf)    BibTeX       Presentation       Project Page and Data

Understanding Emotions: A Dataset of Tweets to Study Interactions between Affect Categories. Saif M. Mohammad and Svetlana Kiritchenko. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan.
Paper (pdf)    BibTeX       Presentation       Shared Task Page and Data

Quantifying Qualitative Data for Understanding Controversial Issues. Michael Wojatzki, Saif M. Mohammad, Torsten Zesch, and Svetlana Kiritchenko. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan.
Paper (pdf)    BibTeX        Presentation       Project Page and Data

WASSA-2017 Shared Task on Emotion Intensity. Saif M. Mohammad and Felipe Bravo-Marquez. In Proceedings of the EMNLP 2017 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), September 2017, Copenhagen, Denmark.
Paper (pdf)    BibTeX     Data and Shared Task    Presentation

Emotion Intensities in Tweets. Saif M. Mohammad and Felipe Bravo-Marquez. In Proceedings of the Sixth Joint Conference on Lexical and Computational Semantics (*Sem), August 2017, Vancouver, Canada.
Paper (pdf)    BibTeX     Data and Shared Task    AffectiveTweets package    Presentation

Word Affect Intensities. Saif M. Mohammad. arXiv preprint arXiv:1704.08798, April 2017.
Paper (pdf)   

Metaphor as a Medium for Emotion: An Empirical Study, Saif M. Mohammad, Ekaterina Shutova, and Peter Turney. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), August 2016, Berlin, Germany.
Paper (pdf)   BibTeX    Presentation       Data and Interactive Visualization

Book Chapter

Sentiment Analysis: Detecting Valence, Emotions, and Other Affectual States from Text. Saif M. Mohammad, Emotion Measurement, 2016.
Pre-print version     BibTeX
This is a survey on automatic methods for affect analysis.

Paper

Determining Word-Emotion Associations from Tweets by Multi-Label Classification. Felipe Bravo-Marquez, Eibe Frank, Saif Mohammad, and Bernhard Pfahringer. In Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI'16), October 2016, Omaha, Nebraska, USA.
Paper (pdf)    BibTeX    Data (scroll to section on this paper)

Interactive Visualization and Paper

Imagisaurus: An Interactive Visualizer of Valence and Emotion in the Roget’s Thesaurus. Saif M. Mohammad. In Proceedings of the EMNLP 2015 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), September 2015, Lisbon, Portugal.
Paper (pdf)    BibTeX    Interactive Visualization

Data

The NRC Emotion Lexicon is now available in over 20 languages.

Tutorial

Computational Analysis of Affect and Emotion in Language. Saif M. Mohammad and Cecilia Ovesdotter Alm. Tutorial at the 2015 Conference on Empirical Methods in Natural Language Processing, September 2015, Lisboa, Portugal.
Presentation       Annotated Bibliography       Extended Bibliography      Proposal

Visualization

Explore the interactive visualization for the NRC Word-Emotion Association Lexicon.

Symposium

My N is Ten Million: Using Social Media to Track Emotion, Mental Health, and Measure Personality Across Entire Populations. Gregory J Park, Saif M Mohammad, and Johannes C Eichstaedt. A symposium at the International Convention of Psychological Science (ICPS), March 2015, Amsterdam, The Netherlands.

Journal paper

Sentiment, Emotion, Purpose, and Style in Electoral Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Xiaodan Zhu, and Joel Martin. Information Processing and Management, Volume 51, Issue 4, July 2015, Pages 480–499.
Paper (pdf)    BibTeX     AnnotatedData

Papers

Semantic Role Labeling of Emotions in Tweets. Saif M. Mohammad, Xiaodan Zhu, and Joel Martin, In Proceedings of the ACL 2014 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2014, Baltimore, MD.
Paper (pdf)    BibTeX     AnnotatedData

Generating Music from Literature. Hannah Davis and Saif M. Mohammad, In Proceedings of the EACL Workshop on Computational Linguistics for Literature, April 2014, Gothenburg, Sweden.
Paper (pdf)   BibTeX    TransProse Website

Notable Press Mentions: The Physics arXiv Blog, March 20, 2014, TIME, May 7, 2014, PC World, May 15, 2014, Popular Science, May 14, 2014, io9, May 12, 2014, LiveScience, May 11, 2014.

Journal Papers

Using Hashtags to Capture Fine Emotion Categories from Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Computational Intelligence, Volume 31, Issue 2, Pages 301-326, May 2015.
Paper (pdf)    BibTeX

Crowdsourcing a Word-Emotion Association Lexicon, Saif Mohammad and Peter Turney, Computational Intelligence, 29 (3), 436-465, 2013.
Paper (pdf)    BibTeX

Press Mention: article in MIT Technology Review
Also published in crowdsourcing.org.

Data

The NRC Word-Emotion Association Lexicon (also called EmoLex) is available here. Explore the interactive visualization.

Papers

Using Nuances of Emotion to Identify Personality, Saif M. Mohammad and Svetlana Kiritchenko, In Proceedings of the ICWSM Workshop on Computational Personality Recognition, July 2013, Boston, USA.
Paper (pdf)    BibTeX    Poster

Identifying Purpose Behind Electoral Tweets, Saif Mohammad, Svetlana Kiritchenko and Joel Martin, In Proceedings of the KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM-2013), August 2013, Chicago, USA.
Paper (pdf)    BibTeX      AnnotatedData

Press Mention: article in TIME

Journal Papers

From Once Upon a Time to Happily Ever After: Tracking Emotions in Mail and Books, Saif Mohammad, Decision Support Systems, Volume 53, Issue 4, November 2012, Pages 730–741.
Paper (pdf)    BibTeX

The 2011 NRC technical report is available here: Sentiment Analysis of Mail and Books.

Binary Classifiers and Latent Sequence Models for Emotion Detection in Suicide Notes. Colin Cherry, Saif Mohammad, and Berry de Bruijn. Journal of Biomedical Informatics Insights, 5 (Suppl. 1), 147--154, January 2012.
Paper (pdf)    BibTeX

Paper

#Emotional Tweets, Saif Mohammad, In Proceedings of the First Joint Conference on Lexical and Computational Semantics (*Sem), June 2012, Montreal, Canada.
Paper (pdf)    BibTeX

Data

NRC Hashtag Emotion Lexicon. The Hashtag Emotion Corpus (aka Twitter Emotion Corpus, or TEC) used to create the lexicon.

Papers

Portable Features for Classifying Emotional Text, Saif Mohammad, In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 2012, Montreal, Canada.
Paper (pdf)    BibTeX

Getting Emotional About News. Alistair Kennedy, Anna Kazantseva, Saif Mohammad, Terry Copeck, Diana Inkpen, Stan Szpakowicz. In Proceedings of the Text Analysis Conference (TAC-2011), November 2011, Gaithersburg, MD.
Paper (pdf)    BibTeX

Tracking Sentiment in Mail: How Genders Differ on Emotional Axes, Saif Mohammad and Tony Yang, In Proceedings of the ACL 2011 Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation 

Data

Collections of love letters, hate mail, and suicide notes.
A mapping of directory names in the Enron email corpus to email ids and to gender.

Papers

From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales, Saif Mohammad, In Proceedings of the ACL 2011 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation

Associations of Words with Emotion, Polarity, and Colour: Crowdsourcing a Lexicon, Saif Mohammad and Peter Turney, Technical Report, National Research Council Canada, Ottawa, Canada.
Paper (pdf)    BibTeX

Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon, Saif Mohammad and Peter Turney, In Proceedings of the NAACL-HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, June 2010, LA, California.
Paper (pdf)    BibTeX    Presentation

Invited Talk

From Once Upon a Time to Happily Ever After: Tracking Emotions in Books and Mail.
         July 2011: Amazon, Social Media group and Digital Books group, Seattle, WA.
         June 2011: Social Media - "Big Data" Analysis Workshop, Defence R&D Canada, Ottawa, Canada.

 

Sentiment Analysis, Valence in Language and Text (positive, negative, neutral)

 

Data

Several word-emotion association lexicons (such as the NRC Emotion Lexicon), word-sentiment lexicons (such as the NRC Hashtag Sentiment Lexicon), and word-colour association lexicons are available here. For the NRC-Canada sentiment analysis system, go here.

Paper

Best-Worst Scaling More Reliable than Rating Scales: A Case Study on Sentiment Intensity Annotation. Kiritchenko, S. and Mohammad, S. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2017), Vancouver, Canada, 2017.
Paper (pdf)    BibTeX       Data

Journal Paper

Stance and Sentiment in Tweets. Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. Special Section of the ACM Transactions on Internet Technology on Argumentation in Social Media, 2017, 17(3).
Paper (pdf)    BibTeX       Data and Visualization

Paper

Detecting Stance in Tweets And Analyzing its Interaction with Sentiment. Parinaz Sobhani, Saif M. Mohammad, and Svetlana Kiritchenko. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), August 2016, Berlin, Germany.
Paper (pdf)   BibTeX     Presentation    Data and Visualization

Book Chapter

Challenges in Sentiment Analysis. Saif M. Mohammad, A Practical Guide to Sentiment Analysis, Springer, 2016.
Pre-print version (pdf)    BibTeX

Papers

Capturing Reliable Fine-Grained Sentiment Associations by Crowdsourcing and Best-Worst Scaling. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. June 2016, San Diego, California.
Paper (pdf)   BibTeX    Presentation     Data   

Sentiment Composition of Words with Opposing Polarities. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. June 2016, San Diego, California.
Paper (pdf)   BibTeX    Poster     Data: Opposing Polarity Sentiment Lexicon    Interactive Visualization

SemEval-2016 Task 6: Detecting Stance in Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. In Proceedings of the International Workshop on Semantic Evaluation (SemEval ’16). June 2016. San Diego, California.
Paper (pdf)    BibTeX    Presentation    Task Website

SemEval-2016 Task 7: Determining Sentiment Intensity of English and Arabic Phrases. Svetlana Kiritchenko, Saif M. Mohammad, and Mohammad Salameh. In Proceedings of the International Workshop on Semantic Evaluation (SemEval ’16). June 2016. San Diego, California.
Paper (pdf)    BibTeX    Presentation    Task Website

A Practical Guide to Sentiment Annotation: Challenges and Solutions. Saif M. Mohammad, In Proceedings of the NAACL 2016 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2016, San Diego, California.
Paper (pdf)   BibTeX    Presentation    

The Effect of Negators, Modals, and Degree Adverbs on Sentiment Composition. Svetlana Kiritchenko and Saif M. Mohammad, In Proceedings of the NAACL 2016 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2016, San Diego, California.
Paper (pdf)   BibTeX    Presentation     Data and Visualization

Sentiment Lexicons for Arabic Social Media. Saif M. Mohammad, Mohammad Salameh, and Svetlana Kiritchenko. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož (Slovenia).
Paper (pdf)    BibTeX    Presentation    Video        Data: Arabic Sentiment Lexicons

A Dataset for Detecting Stance in Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož (Slovenia).
Paper (pdf)    BibTeX    Presentation    Data: Stance Dataset    Interactive Visualization

Happy Accident: A Sentiment Composition Lexicon for Opposing Polarities Phrases. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož (Slovenia).
Paper (pdf)    BibTeX    Poster    Data: Opposing Polarity Sentiment Lexicon    Interactive Visualization

Journal Papers

How Translation Alters Sentiment. Saif M. Mohammad, Mohammad Salameh, and Svetlana Kiritchenko, Journal of Artificial Intelligence Research, January 2016, Volume 55, pages 95-130.
Paper (pdf)    BibTeX     Data: Arabic Sentiment Lexicons

Developing a Successful SemEval Task in Sentiment Analysis of Twitter and Other Social Media Texts. Preslav Nakov, Sara Rosenthal, Svetlana Kiritchenko, Saif M. Mohammad, Zornitsa Kozareva, Alan Ritter, Veselin Stoyanov, and Xiaodan Zhu. Language Resources and Evaluation. March 2016, Volume 50, Issue 1, pages 35-65.
Paper (pdf)    Preprint Version    BibTeX

Professional Community Involvement

I am organizing these shared task competitions under the aegis of SemEval-2016 (see webpage for schedule):

Detecting Stance in Tweets (new task). Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry.

Determining sentiment intensity of English and Arabic phrases. Svetlana Kiritchenko, Saif M Mohammad, and Mohammad Salameh. This is an expansion of the SemEval-2015 Task 10 subtask E - Determining strength of association of Twitter terms with positive sentiment (or, degree of prior polarity).

Paper

SemEval-2015 Task 10: Sentiment Analysis in Twitter. Sara Rosenthal, Preslav Nakov, Svetlana Kiritchenko, Saif M Mohammad, Alan Ritter, and Veselin Stoyanov. In Proceedings of the ninth international workshop on Semantic Evaluation Exercises (SemEval-2015), June 2015, Denver, Colorado.
Paper (pdf)   BibTeX

Sentiment After Translation: A Case-Study on Arabic Social Media Posts. Mohammad Salameh, Saif M Mohammad and Svetlana Kiritchenko, In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-2015), June 2015, Denver, Colorado.
Paper (pdf)   BibTeX   Data: Arabic Sentiment Lexicons

Data

Arabic BBN blog posts and Syrian tweets translated manually and automatically into English and annotated for sentiment. The original Arabic text is also annotated for sentiment.

BBN blog posts: A subset of 1200 Arabic (Levantine dialect) sentences chosen from the BBN Arabic-Dialect/English Parallel Text. The sentences are extracted from social media posts and are provided with their translations. We manually annotated this subset and its translations (both manual and automatic) for sentiment (positive, negative, or neutral).

Syrian tweets: A dataset of 2000 tweets originating from Syria (a country where Levantine dialectal Arabic is commonly spoken). These tweets were collected in May 2014 by polling the Twitter API. This dataset is not provided with manual English translation. We manually annotated this subset and its translations (both manual and automatic) for sentiment (positive, negative, or neutral).

Tutorial

Sentiment Analysis of Social Media Texts. Saif M. Mohammad and Xiaodan Zhu. Tutorial at the 2014 Conference on Empirical Methods in Natural Language Processing, October 2014, Doha, Qatar.
Presentation   Video    Proposal

Journal paper

Sentiment Analysis of Short Informal Texts. Svetlana Kiritchenko, Xiaodan Zhu and Saif Mohammad. Journal of Artificial Intelligence Research, volume 50, pages 723-762, August 2014.
Paper (pdf)    BibTeX

Data

Among other things, the paper above describes how we created a sentiment lexicon by crowdsourcing. This is the first manually created lexicon with real-valued sentiment scores. It was created using the MaxDiff technique. The data was also used in SemEval-2015 Task 10 (Sentiment Analysis in Twitter), subtask E - Determining strength of association of Twitter terms with positive sentiment (or, degree of prior polarity). Task description, trial data, test data, and other details available here.
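As a rough illustration of how MaxDiff (best-worst) annotations can yield real-valued scores, the sketch below uses the common counting procedure: a term's score is the fraction of times it was chosen as most positive minus the fraction of times it was chosen as least positive. This is an assumption for illustration and not necessarily the exact procedure used for this lexicon; the function name `best_worst_scores`, the tuple layout, and the example terms are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# One annotation: (terms shown together, term chosen as most positive,
# term chosen as least positive). This layout is a hypothetical choice.
Annotation = Tuple[Sequence[str], str, str]


def best_worst_scores(annotations: Iterable[Annotation]) -> Dict[str, float]:
    """Score each term as (#times best - #times worst) / #times shown."""
    seen: Counter = Counter()
    best: Counter = Counter()
    worst: Counter = Counter()
    for items, chosen_best, chosen_worst in annotations:
        seen.update(items)          # every displayed term counts as seen once
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    # Scores fall in [-1, 1]; missing Counter keys default to 0.
    return {term: (best[term] - worst[term]) / seen[term] for term in seen}


if __name__ == "__main__":
    # Invented annotations over invented terms, for illustration only.
    example = [
        (("great", "fine", "meh", "awful"), "great", "awful"),
        (("great", "fine", "meh", "okay"), "great", "meh"),
    ]
    for term, score in sorted(best_worst_scores(example).items()):
        print(term, score)
```

Each annotator only picks the two extremes of a small set, so the counts aggregate many easy comparative judgments into a score between -1 and 1 for every term.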

Papers

Sentiment, Emotion, Purpose, and Style in Electoral Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Xiaodan Zhu, and Joel Martin. Information Processing and Management, Volume 51, Issue 4, July 2015, Pages 480–499.
Paper (pdf)    BibTeX     AnnotatedData

NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews, Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif M. Mohammad. In Proceedings of the eighth international workshop on Semantic Evaluation Exercises (SemEval-2014), August 2014, Dublin, Ireland.
Paper (pdf)    BibTeX     Poster    Various Yelp and Amazon Datasets and Lexicons

Official Rankings: Our team (NRC-Canada) ranked first in three of the six subtasks. About 30 teams participated.

NRC-Canada-2014: Recent Improvements in the Sentiment Analysis of Tweets, Xiaodan Zhu, Svetlana Kiritchenko, and Saif M. Mohammad. In Proceedings of the eighth international workshop on Semantic Evaluation Exercises (SemEval-2014), August 2014, Dublin, Ireland.
Paper (pdf)    BibTeX

Official Rankings: Our team (NRC-Canada) ranked first in five of the ten subtask-domain combinations. About 40 teams participated.

An Empirical Study on the Effect of Negation Words on Sentiment. Xiaodan Zhu, Hongyu Guo, Saif Mohammad and Svetlana Kiritchenko. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, June 2014, Baltimore, MD.
Paper (pdf)    BibTeX

Semantic Role Labeling of Emotions in Tweets. Saif M. Mohammad, Xiaodan Zhu, and Joel Martin, In Proceedings of the ACL 2014 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), June 2014, Baltimore, MD.
Paper (pdf)    BibTeX

NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets, Saif M. Mohammad, Svetlana Kiritchenko, and Xiaodan Zhu, In Proceedings of the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013), June 2013, Atlanta, USA.
Paper (pdf)    BibTeX    System Description and Downloads     Poster     Slides

Official Rankings: Our team (NRC-Canada) ranked first in detecting sentiment of tweets (task 2B - tweets), first in detecting sentiment of SMS messages (task 2B - SMS), first in detecting sentiment of terms within a tweet (task 2A - tweets), and second in detecting sentiment of terms within an SMS message (task 2A - SMS). About 44 teams participated.

Data

Below are the two automatically created sentiment lexicons we used to generate our submissions to SemEval-2013 Task 2. If you use them, please cite this paper.

a. NRC Hashtag Sentiment Lexicon (version 0.1) is a list of words with associations to positive and negative sentiments. The lexicon is distributed in three files: unigrams-pmilexicon.txt (54,129 terms), bigrams-pmilexicon.txt (316,531 terms), and pairs-pmilexicon.txt (480,010 terms). Each line in the three files has the format:

term<tab>sentimentScore<tab>numPositive<tab>numNegative
where:
term is the target word or phrase.
In unigrams-pmilexicon.txt, term is a unigram (single word).
In bigrams-pmilexicon.txt, term is a bigram (two-word sequence). A bigram has the form: "string string". The bigram was seen at least once in the source tweets from which the lexicon was created.
In pairs-pmilexicon.txt, term is a unigram--unigram pair, unigram--bigram pair, bigram--unigram pair, or a bigram--bigram pair. The pairs were generated from a large set of source tweets. Tweets were examined one at a time, and all possible unigram and bigram combinations within the tweet were chosen. Pairs with certain punctuation marks, @ symbols, and some function words were removed.

sentimentScore is a real number. A positive score indicates positive sentiment. A negative score indicates negative sentiment. The absolute value is the degree of association with the sentiment.
numPositive is the number of times the term co-occurred with a positive marker such as a positive emoticon or a positive hashtag.
numNegative is the number of times the term co-occurred with a negative marker such as a negative emoticon or a negative hashtag.

The hashtag lexicon was created from a collection of tweets that had a positive or a negative word hashtag such as #good, #excellent, #bad, and #terrible. Version 0.1 was created from 775,310 tweets posted between April and December 2012 using a list of 78 positive and negative word hashtags. A list of these hashtags is shown in sentimenthashtags.txt.

b. Sentiment140 Lexicon (version 0.1) is also a list of words with associations to positive and negative sentiments. It has the same format as the NRC Hashtag Sentiment Lexicon. However, it was created from the sentiment140 corpus of 1.6 million tweets, and emoticons were used as positive and negative labels (instead of hashtagged words).
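Since both lexicons share the tab-separated format described above (term, sentimentScore, numPositive, numNegative), a small reader can be sketched as follows. This is a minimal sketch, not an official loader: the names `LexiconEntry`, `parse_lexicon_lines`, and `load_lexicon` are hypothetical, UTF-8 encoding is assumed, and the sample lines in the demo are invented rather than taken from the real files.

```python
from typing import Dict, Iterable, NamedTuple


class LexiconEntry(NamedTuple):
    term: str               # unigram, bigram ("string string"), or pair
    sentiment_score: float  # > 0 positive, < 0 negative
    num_positive: int       # co-occurrences with a positive marker
    num_negative: int       # co-occurrences with a negative marker


def parse_lexicon_lines(lines: Iterable[str]) -> Dict[str, LexiconEntry]:
    """Parse lines of the form term<tab>score<tab>numPositive<tab>numNegative."""
    entries: Dict[str, LexiconEntry] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # skip lines that do not match the four-field format
        term, score, num_pos, num_neg = parts
        entries[term] = LexiconEntry(term, float(score), int(num_pos), int(num_neg))
    return entries


def load_lexicon(path: str) -> Dict[str, LexiconEntry]:
    """Read one lexicon file, e.g. unigrams-pmilexicon.txt (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as handle:
        return parse_lexicon_lines(handle)


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = ["good\t1.5\t10\t2\n", "not good\t-0.75\t1\t4\n"]
    for entry in parse_lexicon_lines(sample).values():
        label = "positive" if entry.sentiment_score > 0 else "negative"
        print(entry.term, label, abs(entry.sentiment_score))
```

Splitting on tabs only (rather than on any whitespace) matters because bigram terms contain a space.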

 

Paper

Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus, Saif Mohammad, Bonnie Dorr, and Cody Dunne, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), August 2009, Singapore.
Paper (pdf)   BibTeX     Presentation 

Data

Access the Macquarie Semantic Orientation Lexicon (MSOL) here. It is described in the EMNLP-09 paper listed above. The paper describes a few different MSOL variants; the one available here for download is MSOL(ASL and GI).

 

AI/NLP Ethics
 

Journal Paper

Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis. Saif M. Mohammad. Computational Linguistics, 48(2):239-278. June 2022. 
Paper (pdf)    BibTeX    Slides    Poster

Paper

Forgotten Knowledge: Examining the Citational Amnesia in NLP. Janvijay Singh, Mukund Rungta, Diyi Yang, and Saif M. Mohammad. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL-2023), Toronto, Canada.

The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research. Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, and Karen Fort. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL-2023), Toronto, Canada.
Paper (pdf)    BibTeX    Slides

AI Usage Cards: Responsibly Reporting AI-generated Content. Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Norman Meuschke, and Bela Gipp. arXiv:2303.03886, 2023.
Paper (pdf)    BibTeX    Slides 

Best Practices in the Creation and Use of Emotion Lexicons. Saif M. Mohammad. EACL, 2023, Dubrovnik, Croatia.
Paper (pdf)    BibTeX    Slides

Geographic Citation Gaps in NLP Research. Mukund Rungta, Janvijay Singh, Saif M. Mohammad, Diyi Yang. EMNLP, 2022, Abu Dhabi, UAE.
Paper (pdf)    BibTeX    Slides

Ethics Sheets for AI Tasks. Saif M. Mohammad. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL-2022), May 2022, Dublin, Ireland.
Paper (pdf)    BibTeX    Slides    Poster

Practical and Ethical Considerations in the Effective use of Emotion and Sentiment Lexicons.  Saif M. Mohammad. arXiv preprint arXiv:2011.03492. December 2020. 
Paper (pdf)    BibTeX

Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020). July 2020. Seattle, USA.
Paper (pdf)    BibTeX       Video       Presentation       Project Home Page     Medium Blog Posts

Applied AI Ethics. Report on Canada-United Kingdom Symposia on Ethics in AI in Ottawa, Canada and London, UK. de Bruijn, B., Désillets, A., Fraser, K., Kiritchenko, S., Mohammad, S., Vinson, N., Bloomfield, P., Brace, H., Brzoska, K., Elhalal, A., Ho, K., Kinsey, L., McWhirter, R., Nazare, M., and Ofuri-Kuragu, E. Digital Catapult, London, UK / NRC, Ottawa, Canada, 2019. 
Paper (pdf)     Symposium Website

Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems. Svetlana Kiritchenko and Saif M. Mohammad. In Proceedings of *Sem, New Orleans, LA, USA, June 2018.
Paper (pdf)    BibTeX    Project Page and Data    Presentation

SemEval-2018 Task 1: Affect in Tweets Webpage
75 teams and about 200 participants. First SemEval shared task with an ethics-associated evaluation.

Invited Talks

Ethics Sheets for Social NLP Tasks. The 10th Social NLP Workshop at ACL 2022, Seattle, USA. July 14, 2022.

Ethics Sheets for Social AI Tasks. The Alan Turing Institute. July 28, 2022. London, UK.

Ethics Sheets for AI Tasks and a Case Study for Automatic Emotion Recognition. The University of British Columbia Language Sciences Talks, Vancouver, Canada. July 15, 2021.
Slides      Video

Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations. Women+@DCS Seminar Series, University of Sheffield, October 28 2020, Sheffield, UK.
Slides      Video

Fairness and Emotions in Language. The Globe and Mail. October 29, 2019, Toronto, Canada.

Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems. Invited talk at the Second ACL Workshop on Ethics in Natural Language Processing, New Orleans, LA, USA, June 2018.

Professional Community Involvement

Chair of the 2019 Canada--UK Symposium on Ethics in AI, Feb 21--22, Ottawa, Canada.

Hosted the Responsible AI Summit, October 24 2019, Montreal, Canada.

Blog Posts

Ethics Sheets for AI Tasks. July 5, 2021.     
Video

Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis. July 5, 2021.

Web/Press Mentions

Gender and Racial Bias in Cloud NLP Sentiment APIs, Aug 21, 2019. Article looking into race and gender biases in the Google and AWS cloud sentiment analysis APIs using the Equity Evaluation Corpus and the techniques we published in 2018.

 

AI/NLP Scientometrics
 

Journal Paper

Using Hashtags to Capture Fine Emotion Categories from Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Computational Intelligence, Volume 31, Issue 2, Pages 301-326, May 2015.
Paper (pdf)    BibTeX

Paper

D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research. Jan Philip Wahle, Terry Ruas, Saif Mohammad and Bela Gipp. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC-2022), May 2022, Marseille, France.
Paper (pdf)    BibTeX      Project Home Page (Code and Data)   Slides

Geographic Citation Gaps in NLP Research. Mukund Rungta, Janvijay Singh, Saif M. Mohammad, Diyi Yang. EMNLP, 2022, Abu Dhabi, UAE.
Paper (pdf)    BibTeX    Slides

Examining Citations of Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020), July 2020, Seattle, USA.
Paper (pdf)    BibTeX       Presentation       Project Home Page    Interactive Visualizations      Medium Blog Posts

NLP Scholar: An Interactive Visual Explorer for Natural Language Processing Literature. Saif M. Mohammad. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL-2020), July 2020, Seattle, USA.
Paper (pdf)    BibTeX       Presentation      Project Home Page    Interactive Visualizations      Medium Blog Posts

NLP Scholar: A Dataset for Examining the State of NLP Research. Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France. 
Paper (pdf)    BibTeX      Project Home Page and Data    Interactive Visualizations      Medium Blog Posts

The State of NLP Literature: A Diachronic Analysis of the ACL Anthology.  Saif M. Mohammad. arXiv preprint arXiv:1911.03562. November 2019. 
Paper (pdf)    BibTeX      Project Home Page and Data     Interactive Visualizations      Medium Blog Posts

 

 

Computational Social Science
 

Journal Paper

Examining the Language of Solitude vs. Loneliness in Tweets.  Will E. Hipson, Svetlana Kiritchenko, Robert J. Coplan, Saif M. Mohammad. Journal of Social and Personal Relationships. March 2021. 
Paper (pdf)    BibTeX

Stance and Sentiment in Tweets. Saif M. Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. Special Section of the ACM Transactions on Internet Technology on Argumentation in Social Media, 2017, 17(3).
Paper (pdf)    BibTeX       Data and Visualization

Paper

Ruddit: Norms of Offensiveness for English Reddit Comments. Rishav Hada, Sohi Sudhir, Pushkar Mishra, Helen Yannakoudakis, Saif M. Mohammad, and Ekaterina Shutova. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL-2021), August 2021.
Paper (pdf)    BibTeX     Code and Data

SOLO: A Corpus of Tweets for Examining the State of Being Alone. Svetlana Kiritchenko, Will Hipson, Robert Coplan, and Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France.
Paper (pdf)    BibTeX       Data

How do we feel when a robot dies? Emotions expressed on Twitter before and after hitchBOT’s destruction. Kathleen C. Fraser, Frauke Zeller, David Harris Smith, Saif M. Mohammad, and Frank Rudzicz. In Proceedings of the NAACL Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media Analysis (WASSA-19), June 2019, Minneapolis, USA.
Paper (pdf)    BibTeX       Slides       Visualizations

Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words. Saif M. Mohammad. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, July 2018.
Paper (pdf)    BibTeX    Project Page and Data    Presentation   Video   Poster

Agree or Disagree: Predicting Judgments on Nuanced Assertions. Michael Wojatzki, Torsten Zesch, Saif M. Mohammad, and  Svetlana Kiritchenko. In Proceedings of *Sem, New Orleans, LA, USA, June 2018.
Paper (pdf)    BibTeX        Project Page and Data        Presentation

Quantifying Qualitative Data for Understanding Controversial Issues. Michael Wojatzki, Saif M. Mohammad, Torsten Zesch, and Svetlana Kiritchenko. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan.
Paper (pdf)    BibTeX        Presentation      Project Page and Data

Detecting Stance in Tweets And Analyzing its Interaction with Sentiment. Parinaz Sobhani, Saif M. Mohammad, and Svetlana Kiritchenko. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), August 2016, Berlin, Germany.
Paper (pdf)    BibTeX    Presentation       Data and Visualization

SemEval-2016 Task 6: Detecting Stance in Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. In Proceedings of the International Workshop on Semantic Evaluation (SemEval ’16). June 2016. San Diego, California.
Paper (pdf)    BibTeX   Presentation    Task Website

A Dataset for Detecting Stance in Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož (Slovenia).
Paper (pdf)    BibTeX    Presentation     Data: Stance Dataset    Interactive Visualization

Identifying Purpose Behind Electoral Tweets, Saif Mohammad, Svetlana Kiritchenko and Joel Martin, In Proceedings of the KDD Workshop on Issues of Sentiment Discovery and Opinion Mining (WISDOM-2013), August 2013, Chicago, USA.
Paper (pdf)    BibTeX     AnnotatedData

Event

A symphony orchestra performed music composed using the NRC Emotion Lexicon under the glass of the Louvre museum in Paris on Sept. 20, 2016. Click here for a video of the performance.
Articles published in the Washington Post, CBS News, Columbia Tribune, and others.

 

Digital Humanities
 

Journal Paper

Using Hashtags to Capture Fine Emotion Categories from Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Computational Intelligence, Volume 31, Issue 2, Pages 301-326, May 2015.
Paper (pdf)    BibTeX

Sentiment, Emotion, Purpose, and Style in Electoral Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Xiaodan Zhu, and Joel Martin. Information Processing and Management, Volume 51, Issue 4, July 2015, Pages 480–499.
Paper (pdf)    BibTeX     AnnotatedData

From Once Upon a Time to Happily Ever After: Tracking Emotions in Mail and Books, Saif Mohammad, Decision Support Systems, Volume 53, Issue 4, November 2012, Pages 730–741.
Paper (pdf)    BibTeX

Paper

Voices Speaking To and About One Another: Introducing the Project Dialogism Novel Corpus. Adam Hammond, Krishnapriya Vishnubhotla, Graeme Hirst, and Saif M. Mohammad. In Proceedings of the Digital Humanities 2022 Conference, July 2022, virtual.
Paper (pdf)    BibTeX

PoKi: A Large Dataset of Poems by Children. Will E. Hipson, and Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France.
Paper (pdf)    BibTeX     Project Home Page and Data

Imagisaurus: An Interactive Visualizer of Valence and Emotion in the Roget’s Thesaurus. Saif M. Mohammad. In Proceedings of the EMNLP 2015 Workshop on Computational Approaches to Subjectivity, Sentiment, and Social Media (WASSA), September 2015, Lisbon, Portugal.
Paper (pdf)    BibTeX    Interactive Visualization

Generating Music from Literature. Hannah Davis and Saif M. Mohammad, In Proceedings of the EACL Workshop on Computational Linguistics for Literature, April 2014, Gothenburg, Sweden. 
Paper (pdf)    BibTeX    TransProse Website

Notable Press Mentions: The Physics arXiv Blog, March 20, 2014, TIME, May 7, 2014, PC World, May 15, 2014, Popular Science, May 14, 2014, io9, May 12, 2014, LiveScience, May 11, 2014.

Tracking Sentiment in Mail: How Genders Differ on Emotional Axes, Saif Mohammad and Tony Yang, In Proceedings of the ACL 2011 Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation 

From Once Upon a Time to Happily Ever After: Tracking Emotions in Novels and Fairy Tales, Saif Mohammad, In Proceedings of the ACL 2011 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation

Invited Talk

From Once Upon a Time to Happily Ever After: Tracking Emotions in Books and Mail.
         July 2011: Amazon, Social Media group and Digital Books group, Seattle, WA.
         June 2011: Social Media - "Big Data" Analysis Workshop, Defence R&D Canada, Ottawa, Canada.

 

 

Africa and Asia NLP
 

Journal Paper

How Translation Alters Sentiment. Saif M. Mohammad, Mohammad Salameh, and Svetlana Kiritchenko, Journal of Artificial Intelligence Research, January 2016, 55:95-130.
Paper (pdf)    BibTeX    Data: Arabic Sentiment Lexicons

Paper

SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval). Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam, David Ifeoluwa Adelani, Ibrahim Sa'id Ahmad, Nedjma Ousidhoum, Abinew Ayele, Saif M. Mohammad, Meriem Beloucif, Sebastian Ruder. SemEval 2023, Toronto, Canada.

AfriSenti SemEval-2023 Shared Task

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages. Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis Davis, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, Steven Arthur

Sentiment Lexicons for Arabic Social Media. Saif M. Mohammad, Mohammad Salameh, and Svetlana Kiritchenko. In Proceedings of the 10th edition of the Language Resources and Evaluation Conference, May 2016, Portorož (Slovenia).
Paper (pdf)    BibTeX    Presentation    Video        Data: Arabic Sentiment Lexicons

Sentiment After Translation: A Case-Study on Arabic Social Media Posts. Mohammad Salameh, Saif M. Mohammad, and Svetlana Kiritchenko, In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-2015), June 2015, Denver, Colorado.
Paper (pdf)    BibTeX    Data: Arabic Sentiment Lexicons

 

Personality Traits
 

Journal Paper

Using Hashtags to Capture Fine Emotion Categories from Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Computational Intelligence, Volume 31, Issue 2, Pages 301-326, May 2015.
Paper (pdf)    BibTeX

Paper

Using Nuances of Emotion to Identify Personality, Saif M. Mohammad and Svetlana Kiritchenko, In Proceedings of the ICWSM Workshop on Computational Personality Recognition, July 2013, Boston, USA.
Paper (pdf)    BibTeX

 
 

Capturing Word-Colour Associations
 

Papers

Colourful Language: Measuring Word-Colour Associations, Saif Mohammad, In Proceedings of the ACL 2011 Workshop on Cognitive Modeling and Computational Linguistics (CMCL), June 2011, Portland, OR.
Paper (pdf)    BibTeX     Presentation

Even the Abstract have Colour: Consensus in Word-Colour Associations, Saif Mohammad, In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, June 2011, Portland, OR.
Paper (pdf)    BibTeX     Poster  

Data

The NRC Word-Colour Association Lexicon (a.k.a. NRC Color Lexicon) has human annotations of colours associated with more than 24,200 word senses (about 14,200 word types). It is available here.

Visualization

An interactive visualization of the NRC Color Lexicon, called Lexichrome, is available here.


 

Computing Semantic Distance and Distributional Similarity
 

Papers

What Makes Sentences Semantically Related: A Textual Relatedness Dataset and Empirical Study. Mohamed Abdalla, Krishnapriya Vishnubhotla, Saif Mohammad. arXiv:2110.04845. Oct 2021.
Paper (pdf)    BibTeX     Data

Big Bird: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition. Shima Asaadi, Saif M. Mohammad, and Svetlana Kiritchenko. In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-2019), June 2019, Minneapolis, USA.
Paper (pdf)    BibTeX       Poster       Data       Project Home Page and Visualizations       Code

Measuring Semantic Distance using Distributional Profiles of Concepts, Saif Mohammad and Graeme Hirst. arXiv.
Paper (pdf)

Estimating semantic distance using soft semantic constraints in knowledge-source–corpus hybrid models, Yuval Marton, Saif Mohammad, and Philip Resnik, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2009), August 2009, Singapore.
Paper (pdf)    Presentation 

Measuring Semantic Distance using Distributional Profiles of Concepts, Saif Mohammad, Ph.D. thesis, University of Toronto, January 2008, Toronto, Canada.
Paper (pdf)    Presentation

Cross-lingual distributional profiles of concepts for measuring semantic distance, Saif Mohammad, Iryna Gurevych, Graeme Hirst, and Torsten Zesch, In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP/CoNLL-2007), June 2007, Prague, Czech Republic.
Paper (pdf)    Presentation 

Distributional Measures of Semantic Distance: A Survey. Saif Mohammad and Graeme Hirst. arXiv:1203.1858. 2007.
Paper (pdf) (Note: This is an updated version of the Jan 2006 paper below.)   

Distributional Measures as Proxies for Semantic Relatedness. Saif Mohammad and Graeme Hirst. arXiv:1203.1889. 2006.
Paper (pdf)

Distributional measures of concept-distance: A task-oriented evaluation, Saif Mohammad and Graeme Hirst, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2006), July 2006, Sydney, Australia.
Paper (pdf)    Presentation 

 

Computing Lexical Contrast
 

Data

Datasets described in Computing Lexical Contrast, Saif M. Mohammad, Bonnie J. Dorr, Graeme Hirst, and Peter D. Turney, Computational Linguistics, 39 (3), 555-590, 2013.

1. List of about 3.5 million antonym pairs identified from contrasting adjacent thesaurus categories.
2. List of about 3.2 million antonym pairs identified using affix patterns and the thesaurus structure.
3. Total set of 6.3 million antonym pairs obtained by combining 1 and 2, and removing duplicates.
4. Set of 1269 closest-to-opposite questions created for WordNet opposites: adjectives, adverbs, nouns, verbs
5. Set of 162 closest-to-opposite questions from GRE preparatory website 1: development set.
6. Set of 790 closest-to-opposite questions from GRE preparatory website 2: test set.
7. Questionnaires for determining information about kinds of opposites: adjectives, adverbs, nouns, verbs
8. Responses to crowdsourced questionnaires: adjectives, adverbs, nouns, verbs
9. Set of 209 adjacent categories in the Macquarie Thesaurus that were manually determined to be contrasting.
10. Set of 1358 WordNet opposites used to test the co-occurrence and the distributional hypotheses.
11. Set of 1358 WordNet synonyms used to test the co-occurrence and the distributional hypotheses.
12. Set of 1358 WordNet random word pairs used to test the co-occurrence and the distributional hypotheses.
13. Set of 15 affix rules that tend to generate opposites.
14. TURN dataset: 136 pairs of words (89 opposites and 47 synonyms) from various Web sites for learners of English as a second language (first described in Turney, 2008).
15. LZQZ dataset: 80 pairs of synonyms and 80 pairs of opposites from the Webster’s Collegiate Thesaurus (first described in Lin et al., 2003).

Journal Paper

Computing Lexical Contrast, Saif M. Mohammad, Bonnie J. Dorr, Graeme Hirst, and Peter D. Turney, Computational Linguistics, 39 (3), 555-590, 2013.
Paper (pdf)   BibTeX

Papers

Computing Word-Pair Antonymy, Saif Mohammad, Bonnie Dorr, and Graeme Hirst, In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-2008), October 2008, Waikiki, Hawaii.
Abstract    Paper (pdf)    Presentation 

Towards Antonymy-Aware Natural Language Applications, Saif Mohammad, Bonnie Dorr, and Graeme Hirst. Proceedings of the Symposium on Semantic Knowledge Discovery, Organization and Use (SKDOU-2008), November 2008, New York, NY.
Paper (pdf)    Poster

 

Evolution of Words
 

Journal Paper

The Natural Selection of Words: Finding the Features of Fitness. Peter D. Turney and Saif M. Mohammad. PLoS One, 14 (1):e0211512. January 2019.
Paper (pdf)    BibTeX     Code

Paper

WordWars: A Dataset to Examine the Natural Selection of Words. Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France.
Paper (pdf)    BibTeX       Data      Project Home Page and Visualizations

 

Word Sense Disambiguation and Word Sense Dominance
 

Papers

Distributional profiles of concepts for Unsupervised Word Sense Disambiguation, Saif Mohammad, Graeme Hirst, and Philip Resnik, In Proceedings of the Fourth International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SemEval-07), June 2007, Prague, Czech Republic.
Abstract    Paper (pdf)    Poster

Determining Word Sense Dominance Using a Thesaurus, Saif Mohammad and Graeme Hirst, In Proceedings of the 11th conference of the European chapter of the Association for Computational Linguistics (EACL-2006), April 2006, Trento, Italy.
Abstract    Paper (pdf)    Presentation 

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation, Saif Mohammad and Ted Pedersen, In Proceedings of the Conference on Computational Natural Language Learning (CoNLL-2004), May 2004, Boston, MA.
Paper (pdf)    Presentation

Complementarity of Lexical and Simple Syntactic Features: The SyntaLex Approach to Senseval-3, Saif Mohammad and Ted Pedersen, In Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SensEval-3), July 2004, Barcelona, Spain.
Paper (pdf)    Presentation

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation, Saif Mohammad, Master's thesis, University of Minnesota, August 2003, Minnesota.
Paper (pdf)    Presentation

Guaranteed Pre-Tagging for the Brill Tagger, Saif Mohammad and Ted Pedersen, In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2003), February 2003, Mexico City.
Paper (pdf)

 

Text Summarization
 

Journal Paper

Generating Extractive Summaries of Scientific Paradigms, Vahed Qazvinian, Dragomir R. Radev, Saif M. Mohammad, Bonnie Dorr, David Zajic, Michael Whidby, Taesun Moon. Journal of Artificial Intelligence Research (JAIR), 46, pages 165-201, 2013.
Paper (pdf)   BibTeX

Papers

Using Citations to Generate Surveys of Scientific Paradigms, Saif M. Mohammad, Bonnie Dorr, Melissa Egan, Ahmed Hassan, Pradeep Muthukrishan, Vahed Qazvinian, Dragomir Radev, and David Zajic, In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL-HLT-2009), May 2009, Boulder, Colorado.
Paper (pdf)    Presentation 

Multiple alternative sentence compressions and word-pair antonymy for automatic text summarization and recognizing textual entailment, Saif Mohammad, Bonnie Dorr, Melissa Egan, Jimmy Lin, and David Zajic. Proceedings of the Text Analysis Conference (TAC-2008), November 2008, Gaithersburg, MD.
Paper (pdf)    Poster

 

Multi-Document Coreference Resolution
 

Paper

Cross-Document Coreference Resolution: A Key Technology for Learning by Reading, James Mayfield, Bonnie Dorr, Jason Eisner, Tim Finin, Saif Mohammad, Douglas Oard, Ralph Weischedel, David Yarowsky, and others. March 2009. Proceedings of the AAAI Spring Symposium on Learning by Reading and Learning to Read (AAAI-09), Menlo Park, CA.
Paper (pdf)

 

Recognizing Textual Entailment
 

Journal Paper

Experiments with Three Approaches to Recognizing Lexical Entailment. Peter D. Turney, Saif M. Mohammad, Natural Language Engineering, Volume 21, Issue 3, May 2015.
Paper (pdf)    BibTeX

Paper

Multiple alternative sentence compressions and word-pair antonymy for automatic text summarization and recognizing textual entailment, Saif Mohammad, Bonnie Dorr, Melissa Egan, Jimmy Lin, and David Zajic. Proceedings of the Text Analysis Conference (TAC-2008), November 2008, Gaithersburg, MD.
Paper (pdf)    Poster



Relational Similarity
 

Paper

SemEval-2012 Task 2: Measuring Degrees of Relational Similarity, David Jurgens, Saif Mohammad, Peter Turney and Keith Holyoak, In Proceedings of the 2012 SemEval-2012: Semantic Evaluation Exercises, June 2012, Montreal, Canada.
Paper (pdf)    BibTeX

Data

Data we created for SemEval-2012: Semantic Evaluation Exercises -- Task 2: Measuring Degrees of Relational Similarity is available here.


Metaphor
 

Paper

Metaphor as a Medium for Emotion: An Empirical Study, Saif M. Mohammad, Ekaterina Shutova, and Peter Turney. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*Sem), August 2016, Berlin, Germany.
Paper (pdf)   BibTeX     Data and Interactive Visualization

Data

The data annotated as part of this project can be downloaded by clicking here.

 

NLP for Psychology, Health Applications, Pharmacovigilance
 

Journal Paper

Emotion Dynamics in Movie Dialogues. Will E. Hipson and Saif M. Mohammad. arXiv preprint arXiv:2103.01345. March 2021. (To appear in PLOS One, 2021)
Paper (pdf)    BibTeX     Code

Examining the Language of Solitude vs. Loneliness in Tweets.  Will E. Hipson, Svetlana Kiritchenko, Robert J. Coplan, Saif M. Mohammad. Journal of Social and Personal Relationships. March 2021. 
Paper (pdf)    BibTeX

Using Hashtags to Capture Fine Emotion Categories from Tweets. Saif M. Mohammad, Svetlana Kiritchenko, Computational Intelligence, Volume 31, Issue 2, Pages 301-326, May 2015.
Paper (pdf)    BibTeX

Paper

Tweet Emotion Dynamics: Emotion Word Usage in Tweets from US and Canada. Krishnapriya Vishnubhotla and Saif M. Mohammad. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC-2022), May 2022, Marseille, France.
Paper (pdf)    BibTeX      Project Home Page (Code and Data)    Poster   Slides

PoKi: A Large Dataset of Poems by Children. Will E. Hipson, and Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France.
Paper (pdf)    BibTeX     Project Home Page and Data

SOLO: A Corpus of Tweets for Examining the State of Being Alone. Svetlana Kiritchenko, Will Hipson, Robert Coplan, and Saif M. Mohammad. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), May 2020, Marseille, France.
Paper (pdf)    BibTeX       Data

Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words. Saif M. Mohammad. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, July 2018.
Paper (pdf)   BibTeX    Project Page and Data     Presentation   Video   Poster

Word Affect Intensities. Saif M. Mohammad. In Proceedings of the 11th Edition of the Language Resources and Evaluation Conference (LREC-2018), May 2018, Miyazaki, Japan.
Paper (pdf)   BibTeX       Presentation       Project Page and Data

Data and systems for medication-related text classification and concept normalization from Twitter: Insights from the Social Media Mining for Health (SMM4H)-2017 shared task. Abeed Sarker, Maksim Belusov, Jasper Friedrichs, Kai Hakala, Sifei Han, Svetlana Kiritchenko, Farrokh Mehryary, Anthony Rios, Tung Tran, Berry de Bruijn, Filip Ginter, Ramakanth Kavuluru, Debanjan Mahata, Saif M. Mohammad, Goran Nenadic, Graciela Gonzalez-Hernandez. Journal of the American Medical Informatics Association (JAMIA). 25(10), 1274--1283, October 2018.

NRC-Canada at SMM4H Shared Task: Classifying Tweets Mentioning Adverse Drug Reactions and Medication Intake. Svetlana Kiritchenko, Saif M. Mohammad, Jason Morin, and Berry de Bruijn (2017). In Proceedings of the Social Media Mining for Health Applications Workshop at AMIA-2017, Washington, DC, USA, 2017.
Paper (pdf)   BibTeX    Our System Homepage

Official Rankings: Our team (NRC-Canada) ranked first in the AMIA Shared Task on detecting adverse drug reactions in tweets.

Using Nuances of Emotion to Identify Personality, Saif M. Mohammad and Svetlana Kiritchenko, In Proceedings of the ICWSM Workshop on Computational Personality Recognition, July 2013, Boston, USA.
Paper (pdf)    BibTeX    Poster

Binary Classifiers and Latent Sequence Models for Emotion Detection in Suicide Notes. Colin Cherry, Saif Mohammad, and Berry de Bruijn. Journal of Biomedical Informatics Insights, 5 (Suppl. 1), 147--154, January 2012.
Paper (pdf)    BibTeX

 

Designated Contact Person:

Dr. Saif M. Mohammad
Senior Research Officer at NRC (and one of the creators of the resource on this page)
saif.mohammad@nrc-cnrc.gc.ca

Terms of Use:

  1. All rights for the resource(s) listed on this page are held by National Research Council Canada.

  2. The resources listed here are available free for research purposes. If you make use of them, cite the paper(s) associated with the resource in your research papers and articles.

  3. If interested in commercial use of any of these resources, send email to the designated contact person. A nominal one-time licensing fee may apply.

  4. If referenced in news articles and online posts, then cite the resource appropriately. For example: "This application/product/tool makes use of the <resource name>, created by <author(s)> at the National Research Council Canada." If possible, hyperlink the resource name to this page.

  5. If you use the resource in a product or application, then acknowledge this in the 'About' page and other relevant documentation of the application by stating the name of the resource, the authors, and NRC. For example: "This application/product/tool makes use of the <resource name>, created by <author(s)> at the National Research Council Canada." If possible, hyperlink the resource name to this page.

  6. Do not redistribute the resource/data. Direct interested parties to this page. They can also email the designated contact person.

  7. If you create a derivative resource from one of the resources listed on this page: 

    1. Please ask users to cite the source data paper (in addition to your paper). 

    2. Do not distribute the source data. See #6 above.

Examples of derivative resources include: translations into other languages, added annotations to the text instances, aggregations of multiple datasets, etc.

  8. If you are interested in uploading our resource to a third-party website or including the resource in any collection/aggregate of datasets, then:

    1. Email the designated contact person to begin the process to obtain permission.

    2. After obtaining permission, any curator of datasets that includes a resource listed here must take steps to ensure that users of the aggregate dataset still cite the papers associated with the individual datasets. This includes at minimum: stating this clearly in the README and providing the citing information of the source dataset.

By default, no one other than the creators of the resource has permission to upload the resource to a third-party website or to include the resource in any collection/aggregate of datasets.

  9. National Research Council Canada (NRC) disclaims any responsibility for the use of the resource(s) listed on this page and does not provide technical support. However, the contact listed above will be happy to respond to queries and clarifications.

If you send us an email, we will be thrilled to know about how you have used the resource.


Last Updated: July 2015