Documentation

Documentation on Various Techniques

Using IBM's Watson Analytics for Basic Twitter Analysis

Mapping a Drive to Vidia (Windows)

Search Social Media using Trackur.com

Coding Tweets in Excel

Analyzing Text in IBM SPSS Modeler with Text Analytics

Creating a Word List using RapidMiner

K-Means Cluster Analysis in RapidMiner

Harvesting Twitter Data Using IBM® SPSS® Modeler 16

Basic computation of tf-idf is defined in 6.2.1 here.

There are various ways to weight it, but the essential idea is this: tf-idf measures the importance of a word in a document, relative to a collection of documents.

term frequency
==========
The term frequency, tf, of a term in a document is the number of times the term appears in that document.
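As a minimal sketch of the definition above, the raw count can be computed with Python's standard `collections.Counter`. The function name `term_frequency` and the sample sentence are illustrative assumptions, and the document is assumed to be already lowercased and split into tokens.

```python
from collections import Counter

def term_frequency(tokens):
    """Return a mapping of term -> number of times it appears in one document."""
    return Counter(tokens)

# Hypothetical example document, tokenized by splitting on whitespace.
doc = "the cat sat on the mat".split()
tf = term_frequency(doc)

print(tf["the"])  # 2
print(tf["dog"])  # 0, because Counter returns 0 for terms that never appear
```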

inverse document frequency
==================
The inverse document frequency, idf, of a term t is computed relative to a collection of documents as:

idf(of term t) = log(number of documents in collection / number of documents in collection containing term t).

So, the idf of a rare term is high, whereas the idf of a frequent term is low.
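The formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function name, the tiny collection, and the choice of the natural logarithm are all assumptions (the text does not fix a base, and a different base only rescales every idf by the same constant). Each document is represented as a set of its terms.

```python
import math

def inverse_document_frequency(term, documents):
    """idf = log(N / df), where df is the number of documents containing the term."""
    n_docs = len(documents)
    n_containing = sum(1 for doc in documents if term in doc)
    if n_containing == 0:
        # The ratio is undefined for a term that appears nowhere in the collection.
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log(n_docs / n_containing)

# Hypothetical collection of three tiny documents, each reduced to a set of terms.
documents = [{"the", "cat"}, {"the", "dog"}, {"the", "fracking"}]

idf_frequent = inverse_document_frequency("the", documents)   # in all 3 documents
idf_rare = inverse_document_frequency("fracking", documents)  # in 1 of 3 documents
print(idf_frequent, idf_rare)
```

Here the term found in every document gets an idf of zero, while the term found in only one document gets log(3).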

tf-idf
===
tf-idf provides a way to weight the importance of each term in each document. Equation 6.8 in the above reference gives:

tf-idf(of term t in document d) = (term frequency for the term in document d) x (idf of term t).
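A small sketch of this product, assuming documents are lists of tokens and using the natural logarithm. The function name `tf_idf`, the sample collection, and the decision to return 0.0 for a term that is absent from the document or the collection are all illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(term, document, collection):
    """Weight of `term` in `document`: raw term frequency times idf over `collection`."""
    tf = Counter(document)[term]
    n_containing = sum(1 for doc in collection if term in doc)
    if tf == 0 or n_containing == 0:
        # Assumption: treat an absent term as having zero weight.
        return 0.0
    idf = math.log(len(collection) / n_containing)
    return tf * idf

# Hypothetical collection of three tokenized tweets.
collection = [
    ["fracking", "water", "fracking"],
    ["water", "policy"],
    ["water", "church"],
]

w_fracking = tf_idf("fracking", collection[0], collection)  # 2 * log(3 / 1)
w_water = tf_idf("water", collection[0], collection)        # 1 * log(3 / 3) = 0
print(w_fracking, w_water)
```

In this toy collection, "water" appears in every tweet and so carries no weight, while "fracking" is both repeated and confined to one tweet.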

Interpretation
=========

The tf-idf weights the importance of term t in document d. It is:

1. Highest when t occurs many times within a small number of documents in the collection (thus lending high discriminating power to these documents)

2. Lower when the term appears fewer times in a document, or occurs in many documents in the collection (so it is less important for discrimination)

3. Lowest when t occurs in virtually all documents in the collection.
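The three cases can be checked numerically. The sketch below plugs made-up counts into tf x log(N / df); the collection size of 1000 and every count are hypothetical values chosen only to illustrate the ordering, and the natural logarithm is an assumption.

```python
import math

def tf_idf_from_counts(tf, n_docs, n_containing):
    """tf-idf computed directly from counts: tf * log(n_docs / n_containing)."""
    return tf * math.log(n_docs / n_containing)

N = 1000  # hypothetical collection size

# Case 1: many occurrences in the document, term found in few documents.
concentrated = tf_idf_from_counts(tf=20, n_docs=N, n_containing=5)
# Case 2a: few occurrences in the document, term found in few documents.
sparse = tf_idf_from_counts(tf=2, n_docs=N, n_containing=5)
# Case 2b: many occurrences in the document, term found in many documents.
widespread = tf_idf_from_counts(tf=20, n_docs=N, n_containing=400)
# Case 3: term found in every document, so idf is log(1) = 0.
ubiquitous = tf_idf_from_counts(tf=20, n_docs=N, n_containing=N)

for name, value in [("concentrated", concentrated), ("sparse", sparse),
                    ("widespread", widespread), ("ubiquitous", ubiquitous)]:
    print(f"{name:>12}: {value:.2f}")
```

With these counts, the concentrated term receives the largest weight, both intermediate cases fall below it, and the term present in every document receives a weight of zero.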

SOC 260: Social Class VIDIA Videos

The videos below are for Dr. Lowe's SOC 260 class. They may be useful to others, but note that they are specific to this class.

How to login to VIDIA and Run Rapid Miner (video)

How to Load a CSV file into Rapid Miner on VIDIA (video)
This video shows the import of a Twitter text file for Dr. Lowe's SOC 260 - Fall 2014 class.

How to Import a Process into Rapid Miner, make a minor Edit to it and Run it (video)
This video shows the import and editing of a process for Dr. Lowe's SOC 260 - Fall 2014 course.

How to Select Key Terms for Extraction in Twitter Analysis (video)

How to Prune Results in Twitter Analysis (video)

PHIL 230: Environmental Ethics

The videos below are for Dr. Koedderman's PHIL 230 class.

How to Load the Hydraulic Fracturing Twitter Dataset into Rapid Miner

How to Import a Process to Analyze the Twitter Dataset into Rapid Miner

How to Do Some Basic Analysis, Pruning and Visualization
This includes how to fix the infamous Write Output.CSV permissions error.

How to Import and Use a Basic Text Processing Stream for IBM's Modeler

POLS 279: Religion and Politics

The videos below are for Dr. Heindl's POLS 279 class.

How to Import Twitter Data to Include Date and Time Information

Example Rapid Miner Process to Select Tweets by Date

SOC 390: Senior Seminar in Sociology

How to Use the Modeler Stream File Provided

How to Capture the Data for SOC 390

Modeler Stream File for Analysis

Using Flickr:
