Documentation on Various Techniques
Using IBM's Watson Analytics for Basic Twitter Analysis
Mapping a Drive to VIDIA (Windows)
Search Social Media using Trackur.com
Analyzing Text in IBM SPSS Modeler with Text Analytics
Creating a Word List using RapidMiner
K-Means Cluster Analysis in RapidMiner
Harvesting Twitter Data Using IBM® SPSS® Modeler 16
The basic computation of tf-idf is defined in section 6.2.1 here.
There are various ways to weight it, but here's the essential rundown: tf-idf measures the importance of a word in a document, relative to a collection of documents.
term frequency
==========
The term frequency, tf, for a term in a document is the number of times the term appears in that document.
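As a minimal sketch of the definition above, term frequency can be counted in a few lines of Python. The function name `term_frequency` and the tokenization (lowercasing, then splitting on whitespace) are assumptions made for illustration; tools such as RapidMiner or SPSS Modeler apply their own tokenizers.

```python
from collections import Counter


def term_frequency(document: str) -> Counter:
    """Count how many times each term appears in one document."""
    # Assumption: a "term" is a lowercase, whitespace-separated token.
    return Counter(document.lower().split())


tf = term_frequency("the cat sat on the mat")
print(tf["the"])  # "the" appears twice in this document
print(tf["cat"])  # "cat" appears once
```

A term that never appears in the document simply has a count of zero.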
inverse document frequency
==================
The inverse document frequency, idf, of a term t is computed relative to a collection of documents as:
idf(for term t) = log(number of documents in collection / number of documents in collection containing term t).
So, the idf of a rare term is high, whereas the idf of a frequent term is low.
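The idf formula can be tried out on a tiny collection. This is only a sketch: the four sample documents are made up, the function name is hypothetical, and the natural logarithm is an assumption, since the formula above does not fix a base (a different base only rescales every idf value by the same factor).

```python
import math


def inverse_document_frequency(term: str, documents: list[str]) -> float:
    """idf = log(number of documents / number of documents containing the term)."""
    # Assumption: terms are lowercase, whitespace-separated tokens.
    n_docs = len(documents)
    n_containing = sum(1 for doc in documents if term in doc.lower().split())
    if n_containing == 0:
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log(n_docs / n_containing)


# Made-up sample collection, for illustration only.
docs = [
    "fracking protest today",
    "fracking news",
    "fracking water report",
    "weather today",
]

idf_frequent = inverse_document_frequency("fracking", docs)  # in 3 of 4 documents
idf_rare = inverse_document_frequency("water", docs)         # in 1 of 4 documents
print(idf_frequent, idf_rare)
```

The rare term ("water") gets the higher idf, and the frequent term ("fracking") gets the lower one.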
tf-idf
===
tf-idf provides a way to weight the importance of each term in each document. Equation 6.8 in the above reference gives:
tf-idf(of term t in document d) = (term frequency for the term in document d) x (idf of term t).
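To make the product concrete, here is a small worked sketch that multiplies a term's frequency in one document by its idf over the collection. The function name `tf_idf`, the sample documents, the whitespace tokenization, and the natural logarithm are all assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(term: str, doc_index: int, documents: list[str]) -> float:
    """tf-idf of a term in documents[doc_index], relative to the whole collection."""
    # Assumption: terms are lowercase, whitespace-separated tokens.
    tokenized = [doc.lower().split() for doc in documents]
    tf = Counter(tokenized[doc_index])[term]
    n_containing = sum(1 for tokens in tokenized if term in tokens)
    if n_containing == 0:
        return 0.0  # the term occurs nowhere, so it carries no weight
    idf = math.log(len(tokenized) / n_containing)
    return tf * idf


# Made-up sample collection, for illustration only.
docs = ["water water report", "fracking news", "fracking water", "weather today"]

# "water" appears twice in document 0 and in 2 of the 4 documents,
# so its weight there is 2 * log(4 / 2).
weight = tf_idf("water", 0, docs)
print(weight)
```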
Interpretation
=========
The tf-idf weights the importance of term t in document d. It is:
1. Highest when t occurs many times within a small number of documents in the collection (thus lending high discriminating power to these documents)
2. Lower when the term appears fewer times in a document, or occurs in many documents in the collection (so it is less important for discrimination)
3. Lowest when t occurs in virtually all documents in the collection, since its idf is then close to zero.
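The three cases above can be seen side by side in a small sketch that computes tf-idf for every term in every document. The helper name `tf_idf_table` and the four made-up "tweets" are assumptions for illustration, as are the whitespace tokenization and the natural logarithm.

```python
import math
from collections import Counter


def tf_idf_table(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} dictionary per document."""
    # Assumption: terms are lowercase, whitespace-separated tokens.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


# Made-up sample collection, for illustration only.
docs = [
    "tweet drilling drilling drilling",
    "tweet drilling weather",
    "tweet weather",
    "tweet weather",
]
weights = tf_idf_table(docs)

print(weights[0]["drilling"])  # case 1: many occurrences, few documents
print(weights[1]["drilling"])  # case 2: fewer occurrences in this document
print(weights[1]["weather"])   # case 2: term spread over many documents
print(weights[1]["tweet"])     # case 3: term in every document
```

In this collection "tweet" occurs in every document, so its idf is log(1) = 0 and its weight is zero no matter how often it appears.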
SOC 260: Social Class VIDIA Videos
The videos below are for Dr. Lowe's SOC 260 class. They may be useful to others but you should note they are specific to this class.
How to Log In to VIDIA and Run RapidMiner (video)
How to Load a CSV File into RapidMiner on VIDIA (video)
This video shows the import of a Twitter text file for Dr. Lowe's SOC 260 - Fall 2014 class.
How to Import a Process into RapidMiner, Make a Minor Edit to It and Run It (video)
This video shows the import and editing of a process for Dr. Lowe's SOC 260 - Fall 2014 class.
How to Select Key Terms for Extraction in Twitter Analysis (video)
How to Prune Results in Twitter Analysis (video)
PHIL 230: Environmental Ethics
The videos below are for Dr. Koedderman's PHIL 230 class.
How to Load the Hydraulic Fracturing Twitter Dataset into RapidMiner
How to Import a Process to Analyze the Twitter Dataset into RapidMiner
How to Do Some Basic Analysis, Pruning and Visualization
This video includes how to fix the infamous Write Output.CSV Permissions Error.
How to Import and Use a Basic Text Processing Stream for IBM's Modeler
POLS 279: Religion and Politics
The videos below are for Dr. Heindl's POLS 279 class.
How to Import Twitter Data to Include Date and Time Information
Example RapidMiner Process to Select Tweets by Date
SOC 390: Senior Seminar in Sociology
How to Use the Modeler Stream File Provided
How to Capture the Data for SOC 390
Modeler Stream File for Analysis
Using Flickr:
Creative Commons
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.