A token is a continuous string of non-whitespace characters that is NOT one of the following (shown by example):
#opendata
@infochimps
http://infochimps.org
$AAPL
:)
A wordbag is “a set of tokens that a particular Twitter user tweets more often than the average Twitter user.”
This is different from the tokens a particular user tweets most. For example, @infochimps tweets the token “the” quite often, but at roughly the same rate at which “the” is tweeted across all of Twitter. On the other hand, @infochimps tweets the tokens “viz”, “opendata”, “supercomputer”, and “cluster” much more often than most Twitter users. These tokens provide an at-a-glance summary of the kind of conversations @infochimps engages in.
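The distinction between "most tweeted" and "tweeted more than average" can be sketched in a few lines of Python. This is only an illustration, not the service's actual implementation: the function names, the exclusion rules (prefix checks and a small emoticon set), and the average_freq baseline table are all assumptions made for the example.

```python
from collections import Counter

# Assumption: simple prefix checks stand in for the real exclusion rules
# (hashtags, mentions, stock symbols, URLs); the emoticon set is illustrative.
EXCLUDED_PREFIXES = ("#", "@", "$", "http://", "https://")
EMOTICONS = {":)", ":(", ";)", ":D"}


def tokenize(text):
    """Split on whitespace and drop strings that are not tokens."""
    return [
        word
        for word in text.split()
        if not word.startswith(EXCLUDED_PREFIXES) and word not in EMOTICONS
    ]


def wordbag(user_tweets, average_freq, limit=100):
    """Rank a user's tokens by user frequency divided by average frequency.

    average_freq is a hypothetical mapping from token to the fraction of
    all tokens on Twitter that are this token.
    """
    counts = Counter(tok for tweet in user_tweets for tok in tokenize(tweet))
    total = sum(counts.values())
    scored = []
    for tok, n in counts.items():
        baseline = average_freq.get(tok)
        if not baseline:
            continue  # no baseline, so the ratio is undefined in this sketch
        scored.append((tok, (n / total) / baseline))
    # Highest relative frequency first, as in a wordbag.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

With a baseline in which “the” is common and “viz” is rare, a user who tweets both equally often ends up with “viz” ranked far above “the”, which is the behaviour described above.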
Example use cases: build a word cloud for a Twitter user, find users interested in a particular topic, target advertising by words used, or build a library of vocabulary closely related to a particular keyword.
GET http://api.infochimps.com/soc/net/tw/wordbag.json?screen_name=[screen_name]
GET http://api.infochimps.com/soc/net/tw/wordbag.json?user_id=[user_id]
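As a rough sketch, the two request forms above could be assembled with Python's standard library. The helper name wordbag_url is hypothetical, the apikey parameter is taken from the example request in this document, and the rule that exactly one of screen_name or user_id is passed is an assumption based on the two forms shown.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.infochimps.com/soc/net/tw/wordbag.json"


def wordbag_url(apikey, screen_name=None, user_id=None):
    """Build a wordbag request URL for either a screen name or a user ID."""
    # Assumption: the two identifiers are alternatives, not combined.
    if (screen_name is None) == (user_id is None):
        raise ValueError("pass exactly one of screen_name or user_id")
    if screen_name is not None:
        params = {"screen_name": screen_name}
    else:
        params = {"user_id": user_id}
    params["apikey"] = apikey
    # urlencode percent-escapes values, so unusual characters stay safe.
    return BASE_URL + "?" + urlencode(params)
```

Calling wordbag_url("YOUR_KEY", screen_name="infochimps") yields a URL of the same shape as the example request, which can then be fetched with any HTTP client.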
screen_name – a Twitter screen name
user_id – a Twitter user ID
vocab – the number of distinct tokens ever emitted by the user
toks – an array consisting of information on the 100 tokens most-used by the user, sorted by rel_freq. Each element of the array is a hash with the following keys:
    tok – the token
    user_freq_ppb – the frequency (in parts per billion) with which the user emits the token
    rel_freq – the ratio of the user’s frequency for this token to the average frequency for this token
total_usages – the number of tokens (not necessarily distinct) ever emitted by the user
GET http://api.infochimps.com/soc/net/tw/wordbag.json?screen_name=infochimps&apikey=xxxxxxxxxxxxxxxxxxxxxxxxxxx
{
  "vocab": 666,
  "toks": [
    { "tok": "nsfdvw", "user_freq_ppb": 816326.5, "rel_freq": 12144117.7175036 },
    { "tok": "datageeknet", "user_freq_ppb": 816326.5, "rel_freq": 12144117.7175036 },
    ...
    { "tok": "pioneers", "user_freq_ppb": 816326.5, "rel_freq": 395.43213448577 },
    { "tok": "sb44", "user_freq_ppb": 816326.5, "rel_freq": 386.915495384347 }
  ],
  "total_usages": 1225
}
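The fields in the sample response can be related to one another with a little arithmetic. The sketch below assumes that user_freq_ppb is the token's count divided by total_usages, scaled to parts per billion; the sample numbers are consistent with that reading (one use in 1225 tokens is about 816326.5 ppb), but the document does not state it outright. The function name summarize and the output keys are hypothetical, and the sample is trimmed to two of the entries shown above.

```python
import json

# Two entries copied from the sample response; the elided ones are left out.
SAMPLE = """
{ "vocab": 666,
  "toks": [
    { "tok": "nsfdvw", "user_freq_ppb": 816326.5, "rel_freq": 12144117.7175036 },
    { "tok": "sb44", "user_freq_ppb": 816326.5, "rel_freq": 386.915495384347 }
  ],
  "total_usages": 1225 }
"""


def summarize(response_text):
    """Derive per-token usage counts and average frequencies from a response."""
    data = json.loads(response_text)
    total = data["total_usages"]
    rows = []
    for entry in data["toks"]:
        # Convert parts per billion back to a plain fraction.
        user_freq = entry["user_freq_ppb"] / 1e9
        rows.append({
            "tok": entry["tok"],
            # Assumption: frequency is count / total_usages.
            "estimated_uses": round(user_freq * total),
            # rel_freq = user frequency / average frequency, so divide back.
            "average_freq_ppb": entry["user_freq_ppb"] / entry["rel_freq"],
        })
    return rows
```

Under that assumption, every token in the sample was used once by the user, and the differences in rel_freq come entirely from how rare each token is across Twitter as a whole.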