Wordbag

Get Started With This Data

A token is a continuous string of non-whitespace characters that are NOT:

A wordbag is “a set of tokens that a particular Twitter user tweets more often than the average Twitter user.”

This is different from the set of tokens a particular user tweets most often. For example, @infochimps tweets the token “the” quite often, but at about the same rate as “the” is tweeted across all of Twitter. On the other hand, @infochimps tweets the tokens “viz”, “opendata”, “supercomputer”, and “cluster” much more often than most Twitter users do. These tokens provide an at-a-glance summary of the kinds of conversations @infochimps engages in.
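The idea can be sketched in a few lines of Python. This is only an illustration of "more often than average", not the method Infochimps actually uses: the function name `wordbag`, the sample counts, and the Twitter-wide baseline frequencies are all made up for the example, and the ratio used here is an assumed stand-in for whatever scoring the service applies.

```python
def wordbag(user_counts, global_freq_ppb, min_rel_freq=1.0):
    """Return (token, ratio) pairs for tokens a user over-uses, highest ratio first.

    user_counts:     token -> number of times the user tweeted it (hypothetical input)
    global_freq_ppb: token -> Twitter-wide usage per billion tokens (hypothetical input)
    """
    total = sum(user_counts.values())
    bag = []
    for tok, count in user_counts.items():
        baseline = global_freq_ppb.get(tok)
        if not baseline:
            continue  # no baseline to compare against, so skip the token
        user_ppb = count / total * 1e9   # the user's own usage per billion tokens
        rel = user_ppb / baseline        # > 1 means "more often than average"
        if rel > min_rel_freq:
            bag.append((tok, rel))
    return sorted(bag, key=lambda pair: pair[1], reverse=True)


# Made-up numbers: "the" is this user's most frequent token, but it is even
# more common across Twitter, so it does not make it into the wordbag.
user_counts = {"the": 50, "viz": 20, "opendata": 20, "cluster": 10}
global_freq_ppb = {"the": 6e8, "viz": 1e6, "opendata": 2e6, "cluster": 1e7}

bag = wordbag(user_counts, global_freq_ppb)
bag_tokens = [tok for tok, _ in bag]
print(bag_tokens)
```

With these invented inputs, the most-tweeted token drops out while the rarer, more distinctive tokens remain, which is the distinction the paragraph above draws.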

Use case examples: build a word cloud for a Twitter user, find users interested in a particular topic, target advertising by the words users tweet, build a library of vocabulary closely related to a particular keyword.

API Call

Parameters

GET http://api.infochimps.com/soc/net/tw/wordbag.json?screen_name=[screen_name]
GET http://api.infochimps.com/soc/net/tw/wordbag.json?user_id=[user_id]
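A request URL for either form can be assembled with Python's standard library. The sketch below only builds the URL and makes no network call. The helper name `build_wordbag_url` is hypothetical, the `apikey` parameter is taken from the example request on this page, and the rule that exactly one of `screen_name` or `user_id` must be given is an assumption, not something the API is documented to enforce.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.infochimps.com/soc/net/tw/wordbag.json"


def build_wordbag_url(apikey, screen_name=None, user_id=None):
    """Build a wordbag request URL for one user (hypothetical helper)."""
    # Assumption: a request identifies the user by one parameter, not both.
    if (screen_name is None) == (user_id is None):
        raise ValueError("pass exactly one of screen_name or user_id")
    if screen_name is not None:
        params = {"screen_name": screen_name}
    else:
        params = {"user_id": user_id}
    params["apikey"] = apikey
    # urlencode percent-escapes values, so unusual characters are safe
    return BASE_URL + "?" + urlencode(params)


url = build_wordbag_url("YOUR_API_KEY", screen_name="infochimps")
print(url)
```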

Returns

Example

GET http://api.infochimps.com/soc/net/tw/wordbag.json?screen_name=infochimps&apikey=xxxxxxxxxxxxxxxxxxxxxxxxxxx

  {
    "vocab": 666,
    "toks": [
      {
        "tok": "nsfdvw",
        "user_freq_ppb": 816326.5,
        "rel_freq": 12144117.7175036
      },
      {
        "tok": "datageeknet",
        "user_freq_ppb": 816326.5,
        "rel_freq": 12144117.7175036
      },
      ...
      {
        "tok": "pioneers",
        "user_freq_ppb": 816326.5,
        "rel_freq": 395.43213448577
      },
      {
        "tok": "sb44",
        "user_freq_ppb": 816326.5,
        "rel_freq": 386.915495384347
      }
    ],
    "total_usages": 1225
  }
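
A response like this can be consumed with Python's `json` module. The sketch below uses a trimmed copy of the sample above (three of its tokens, deliberately out of order) and two hypothetical helpers, `top_tokens` and `estimated_usages`. The field meanings are inferred rather than documented: in the sample, `user_freq_ppb` multiplied by `total_usages` and divided by one billion comes out to about 1, which suggests `user_freq_ppb` is the user's usage of the token per billion tokens tweeted, and `rel_freq` is treated here as the score to rank by.

```python
import json

# Trimmed copy of the sample response above, with the tokens shuffled.
response_text = """
{
  "vocab": 666,
  "toks": [
    {"tok": "pioneers", "user_freq_ppb": 816326.5, "rel_freq": 395.43213448577},
    {"tok": "nsfdvw", "user_freq_ppb": 816326.5, "rel_freq": 12144117.7175036},
    {"tok": "sb44", "user_freq_ppb": 816326.5, "rel_freq": 386.915495384347}
  ],
  "total_usages": 1225
}
"""


def top_tokens(response, n):
    """Return the n tokens with the highest rel_freq (hypothetical helper)."""
    ranked = sorted(response["toks"], key=lambda t: t["rel_freq"], reverse=True)
    return [t["tok"] for t in ranked[:n]]


def estimated_usages(entry, total_usages):
    """Estimate how many times the user tweeted a token.

    Assumes user_freq_ppb means usages per billion of the user's tokens;
    this is an inference from the sample, not a documented definition.
    """
    return round(entry["user_freq_ppb"] * total_usages / 1e9)


response = json.loads(response_text)
best = top_tokens(response, 2)
usages = estimated_usages(response["toks"][0], response["total_usages"])
print(best, usages)
```

For a word cloud, the `rel_freq` values could serve as weights, though the spread in the sample (hundreds to millions) suggests some scaling, such as a logarithm, would be needed first.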