Digital Humanities Summer Institute
University of Victoria – June 8-12, 2009
SEASR in Action: Data Analytics for Humanities Scholars
Morning Session – Day 1
Participants: Loretta, Boris, Allison, Jen, Jessica, Kathy, Justin, Greg, Brett, Roman, Devon, Mary, Quinn, Lauren
SOA – service-oriented architecture
RDF – resource description framework
Dunning log-likelihood – statistical test for comparing word frequencies across different works.
Entity extraction using OpenNLP; dates viewed on Simile Timeline; location viewed on Google Maps.
Dendrogram visualization – for clustering of texts
NEMA – SEASR workflow for audio analysis
DISCUS – text summarization, visual maps of concepts
UIMA – IBM software for turning unstructured data into structured data. Visualization to track emotion across a document; see flare.prefuse.org
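As a rough illustration of the Dunning log-likelihood comparison noted above, here is a minimal Python sketch. It assumes the two-term form of the statistic often used in corpus comparison (observed vs. expected frequency of a word in each of two texts); the notes do not say which variant SEASR uses, and the function names `log_likelihood` and `compare_texts` are made up for this example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood (G2) for one word.

    a, b: occurrences of the word in text 1 and text 2
    c, d: total number of tokens in text 1 and text 2
    """
    total = c + d
    expected_1 = c * (a + b) / total
    expected_2 = d * (a + b) / total
    g2 = 0.0
    for observed, expected in ((a, expected_1), (b, expected_2)):
        if observed > 0:  # treat 0 * ln(0) as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def compare_texts(text_a, text_b):
    """Return (word, score) pairs, most distinctive words first."""
    # Naive whitespace tokenization; an assumption made to keep the sketch short.
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    scores = {}
    for word in set(counts_a) | set(counts_b):
        scores[word] = log_likelihood(
            counts_a[word], counts_b[word], len(words_a), len(words_b)
        )
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranked = compare_texts(
        "the whale the whale the sea",
        "the garden the house the garden",
    )
    for word, score in ranked:
        print(f"{word}\t{score:.3f}")
```

Words used at the same rate in both texts score 0; the higher the score, the more a word's frequency differs between the two works. The score alone does not say which text overuses the word, so in practice the raw counts are inspected alongside it.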
SEASR / Meandre Infrastructure –
Meandre Workbench visual programming tool
Meandre’s ZigZag scripting language (if you like that sort of thing)
Zotero (plugin for Firefox) – manages the collection
SEASR Community Hub –
Afternoon Session – Day 1
Steps: text pre-processing, feature generation, feature selection, text / data analytics, analyzing results.
Text characteristics – text must be converted to numerical values for most algorithms. Noisy data (spelling msitakes, abbrevs., ACRNYMS). Poorly structured text: email, chat, micro-blogs, transcribed speech. Dependency (order of words); ambiguity (multiple meanings).
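One common way to convert text to numerical values is a bag-of-words count matrix. The sketch below is a minimal illustration of that idea, not the representation used in the workshop; the helper names `tokenize` and `bag_of_words` and the simple regular-expression tokenizer are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, vectors): one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        # Counter returns 0 for terms that do not occur in this document.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, rows = bag_of_words(["The cat sat.", "The cat saw the dog!"])
    print(vocab)
    for row in rows:
        print(row)
```

Note that this representation discards word order, which is exactly the "dependency" problem listed above, and it treats a misspelled word as a separate feature, which is why noisy data matters.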
Text pre-processing – syntactic analysis; semantic analysis
Feature Selection – reduce dimensionality; irrelevant features (not all features help).
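A simple way to reduce dimensionality is to drop terms by document frequency: terms that occur in too few documents or in nearly all of them rarely help. The sketch below shows that one heuristic only; the notes do not say which selection method was presented, and the name `select_features` and its thresholds are hypothetical.

```python
def select_features(vocabulary, vectors, min_docs=2, max_fraction=0.9):
    """Keep terms found in at least min_docs documents and in at most
    max_fraction of all documents.

    vocabulary: list of terms, one per column
    vectors:    list of count vectors, one per document
    """
    n_docs = len(vectors)
    keep = []
    for idx in range(len(vocabulary)):
        doc_freq = sum(1 for row in vectors if row[idx] > 0)
        if doc_freq >= min_docs and doc_freq / n_docs <= max_fraction:
            keep.append(idx)
    reduced_vocabulary = [vocabulary[i] for i in keep]
    reduced_vectors = [[row[i] for i in keep] for row in vectors]
    return reduced_vocabulary, reduced_vectors


if __name__ == "__main__":
    vocabulary = ["cat", "dog", "the", "zebra"]
    vectors = [
        [1, 0, 2, 0],
        [1, 1, 1, 0],
        [0, 1, 3, 1],
    ]
    terms, rows = select_features(vocabulary, vectors)
    print(terms)
    print(rows)
```

In this toy data "the" is dropped because it appears in every document and "zebra" because it appears in only one, leaving two features instead of four.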
Syntactic analysis –
Semantic analysis –
Information extraction: entities (98% accuracy); attributes (80%), facts (60 – 70%), events (50 – 60%)
Hands-on work with Meandre
Robert Blake Plenary – “Teaching with Technology”
Technology syndrome – is the fault technical (server is down), or pedagogical?
Multiple entry points for using technology – web pages, CDs/DVDs, etc.; social networks (CMC/CSCW, tele/video-conferencing)
Emphasize how you use technology, not what you use
Create a student-centred classroom (e.g., wikis and blogs reinforce autonomy)
Support interactivity, agency and students as co-producers (see Sloan Foundation report, 2004)
Extend the curriculum beyond the space and time of the classroom
Technology is not a self-determining agent – only social forces working together can create a curriculum
Tools of the Trade:
- Wikis – e.g., through Moodle
- Second Life – Ciudad Bonita – see http://slclassmanagement.blogspot.com/
- Film clips (UC Berkeley film archive – tagged, annotated)