The Webis Clickbait Corpus 2016 (Webis-Clickbait-16) comprises 2992 tweets sampled from the top 20 news publishers on Twitter, as measured by retweets in 2014. The tweets were manually annotated by three independent annotators as to whether they can be considered clickbait. A total of 767 tweets are considered clickbait by the majority of annotators. The annotators' majority vote can be used as ground truth to build clickbait detection technology. This corpus is the first of its kind and is intended to foster the development of technology to tackle clickbait.
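As a minimal sketch of how a majority vote over three annotators yields a ground-truth label: the label strings, tweet IDs, and in-memory layout below are assumptions made for illustration and do not reflect the corpus's actual file format.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by a strict majority of the annotators."""
    label, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes):
        # With three annotators and two labels this cannot happen,
        # but the check keeps the helper safe for other setups.
        raise ValueError("no strict majority among the votes")
    return label


# Hypothetical annotations: tweet ID -> judgments of the three annotators.
annotations = {
    "tweet-1": ["clickbait", "clickbait", "no-clickbait"],
    "tweet-2": ["no-clickbait", "no-clickbait", "no-clickbait"],
    "tweet-3": ["no-clickbait", "clickbait", "no-clickbait"],
}

# Ground truth derived from the majority vote per tweet.
ground_truth = {
    tweet_id: majority_label(votes) for tweet_id, votes in annotations.items()
}
```

With an odd number of annotators and a binary label, a strict majority always exists, so every tweet receives exactly one ground-truth label.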
To download the corpus, use the following links:
(255 MB, MD5 sum: 7ae5a128350eecbcbad182ade4f42585)
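After downloading, the archive can be checked against the MD5 sum given above. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the file path passed to them are hypothetical, since the archive's file name is not stated here.

```python
import hashlib

# MD5 sum as published on this page.
EXPECTED_MD5 = "7ae5a128350eecbcbad182ade4f42585"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so the 255 MB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 sum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download` with the local path of the downloaded archive returns `True` only if the file is complete and unmodified; a `False` result suggests the download should be repeated.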
If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to refer to the corpus via [bib].
Clickbait refers to a certain kind of web content advertisement that is designed to entice its readers into clicking an accompanying link. Typically, it is spread on social media in the form of short teaser messages that may read like the following examples:
- A Man Falls Down And Cries For Help Twice. The Second Time, My Jaw Drops
- 9 Out Of 10 Americans Are Completely Wrong About This Mind-Blowing Fact
- Here’s What Actually Reduces Gun Violence
When reading these and similar messages, many get the distinct impression that something is odd about them: something unnamed is referred to, some emotional reaction is promised, some lack of knowledge is ascribed, some authority is claimed. Content publishers of all kinds have discovered clickbait as an effective tool to draw attention to their websites. The level of attention captured by a website determines the price of displaying ads there, where attention is measured in terms of unique page impressions, usually caused by clicking on a link that points to a given page (often abbreviated as “clicks”). Therefore, the target link that accompanies a clickbait teaser message usually leads to the sender’s website if the reader is elsewhere, or else to another page on the same site. The content found at the linked page often encourages the reader to share it, suggesting clickbait as the default message and thus spreading it virally. Clickbait on social media has been on the rise in recent years, and even some news publishers have adopted this technique. These developments have caused general concern among many outspoken bloggers, since clickbait threatens to clog up social media channels and violates journalistic codes of ethics.
For more information on the construction of the dataset, see the publication below.
Students: Sebastian Köpsel.