Science, AskScience, and BadScience: On the Coexistence of Highly Related Communities

Jack Hessel, Chenhao Tan, Lillian Lee


When large social-media platforms allow users to easily form and self-organize into interest groups, highly related communities can arise. For example, the Reddit site hosts not just a group called food, but also HealthyFood, foodhacks, foodporn, and cooking, among others. Are these highly related communities created for similar classes of reasons (e.g., "true" to distinguish one as a better community and "advice" to focus on helping fellow members)? How do users allocate attention between such close alternatives when they are available or emerge over time? Are there different types of relations between close alternatives such as sharing many users vs. a new community drawing away members of an older one vs. a splinter group failing to cohere into a viable separate community? We investigate the interactions between highly related communities using data from Reddit consisting of 975M posts and comments spanning an 8-year period. We identify a set of typical affixes that users adopt to create highly related communities and build a taxonomy of affixes. One interesting finding regarding users' behavior is: after a newer community is created, for several types of highly related community pairs, users that engage in a newer community tend to be more active in their original community than users that do not explore, even when controlling for previous level of engagement.

Paper Link and Citation

This paper appeared in ICWSM 2016. A PDF is available here. The bibtex entry for this article is:

     author = {Hessel, Jack and Tan, Chenhao and Lee, Lillian},
     title = {Science, AskScience, and BadScience: On the Coexistence of Highly Related Communities},
     year = {2016},
     booktitle = {The 10th International AAAI Conference on Web and Social Media}

Dataset download

The dataset includes metadata for around 1B Reddit posts and comments, along with the full reconstructed comment trees.

April 2018 Update: We upgraded the dataset from version 1.0 to version 1.1. See more details here.

The readme of the dataset is here. Please read over it before you download this 10GB file. The full text of the comments/posts is not provided (it is around 300GB, compressed) but is available upon request. Please e-mail the authors if you're interested.


Thanks to David Mimno and Mor Naaman for their helpful comments. Chenhao Tan was supported by a Facebook fellowship at the time of this work.