The Story Resource Page
Collecting and analyzing millions of personal stories of everyday life
1. Overview
Over the last several years, my students and I have been developing technologies for collecting, analyzing, and reasoning with millions of personal stories extracted from Internet weblogs.
For an overview of these efforts, see the following papers.
- Gordon, A. (2008) Story Management Technologies for Organizational Learning. International Conference on Knowledge Management, Special Track on Intelligent Assistance for Self-Directed and Organizational Learning, Graz, Austria, September 3-5, 2008.
- Gordon, A. and Swanson, R. (2008) Envisioning With Weblogs. International Conference on New Media Technology, Special Track on Knowledge Acquisition From the Social Web, Graz, Austria, September 3-5, 2008.
2. Large-scale story corpora
To facilitate the distribution of large-scale story corpora, our group has identified the individual blog posts that contain personal stories within existing large-scale corpora of posts. Most recently, we identified nearly one million personal stories in the ICWSM 2009 Spinn3r Blog Dataset; we call this collection the ICWSM 2009 Story Subset. Information about obtaining the ICWSM 2009 Spinn3r Blog Dataset is available here. Once you have this dataset, use one of the following releases to identify the story subset:
- Version 1.0 (5/29/09) Initial release
- Version 1.1 (6/15/09) Fixed off-by-one errors in both the index file and the Java extractor
- Version 2.0 (12/21/09) Results using the new classifier from Reid Swanson's PhD dissertation
- Version 2.1 (5/26/11) Made syntax changes to the Python extractor for compatibility with Python 2.6+
If you use this ICWSM 2009 Story Subset in your research, please send Andrew Gordon an email, and be sure to cite the following paper:
- Gordon, A. and Swanson, R. (2009) Identifying Personal Stories in Millions of Weblog Entries. Third International Conference on Weblogs and Social Media, Data Challenge Workshop, San Jose, CA, May 20, 2009.
3. Reasoning with weblog stories
One of the primary aims of our work is to solve the knowledge acquisition problem in commonsense reasoning. The papers below describe some of our previous attempts to exploit the commonsense knowledge contained in millions of weblog stories.
- Gordon, A., Bejan, C., and Sagae, K. (2011) Commonsense Causal Reasoning Using Millions of Personal Stories. Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-11), San Francisco, CA, August 7-11, 2011.
- Gordon, A. (2010) Mining Commonsense Knowledge From Personal Stories in Internet Weblogs. Proceedings of the First Workshop on Automated Knowledge Base Construction, Grenoble, France, May 17-19, 2010.
- Gerber, M., Gordon, A., and Sagae, K. (2010) Open-domain Commonsense Reasoning Using Discourse Relations from a Corpus of Weblog Stories. Proceedings of the 1st International Workshop on Formalisms and Methodology for Learning by Reading (FAM-LbR), NAACL 2010 Workshop, Los Angeles, CA, June 6, 2010.
4. Related work
Several groups have pursued related work on automated story extraction from text. Some of the more recent work achieves higher accuracy than ours and may be a more promising basis for future work.
- Eisenberg, J. and Finlayson, M. (2017) A Simpler and More Generalizable Story Detector using Verb and Character Features. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.
- Eisenberg, J. D., Yarlott, W. V. H., and Finlayson, M. A. (2016) Comparing Extant Story Classifiers: Results & New Directions. Seventh International Workshop on Computational Models of Narrative, Krakow, Poland.
- Ceran, B., Karad, R., Corman, S., and Davulcu, H. (2012) A Hybrid Model and Memory Based Story Classifier. Proceedings of the 3rd International Workshop on Computational Models of Narrative (CMN’12), Istanbul, Turkey, pages 60-64.