Saturday, March 10, 2012

Detour in taste wordclouds

I read Mining Twitter for consumer attitudes towards hotels in my feed of R-bloggers. That reminded me that I had intended to look at generating wordclouds for salt and MSG at some point. Salt, or rather sodium, is linked to hypertension, which in turn is linked to a number of diseases (http://en.wikipedia.org/wiki/Complications_of_hypertension). It is a topic for governments and health organisations, but I have the feeling it is not so much an issue for the public. MSG, or monosodium glutamate, is not an issue for governments or health organisations, but it has a bad name and is for some people linked to the Chinese restaurant syndrome. Luckily there was a nice post to follow: Generating Twitter Wordclouds in R.
Salt
Neither @Salt nor #Salt is useful when you are interested in salt taste. Hence the search is for #sodium.

library(twitteR) # searchTwitter()
library(plyr)    # laply()

sodium.tweets <- searchTwitter('#sodium',n=1500)
sodium.texts <- laply(sodium.tweets, function(x) x$getText())
head(sodium.texts)
[1] "#Citric Acid #Sodium Bicarbonate http://t.co/QgJxSlGT HealthAid Vitamin C 1000mg - Effervescent (Blackcurrant Flavour) - 20 Tablets"
[2] "I dnt understand metro I can go on Facebook an Twitter but I can't call or text anybody #sodium"                                    
[3] "Get the facts on #sodium:http://t.co/Djc9rTEl #BCHC @TheHSF"                                                                        
[4] "#Sodium: How to tame your salt habit now? http://t.co/eFTl8yI1"                                                                     
[5] "#lol #funny #insta #instafunny #haha #smile #meme #chemistry #joke #sodium  http://t.co/pX404RhQ"                                   
[6] "@Astroboii07 #sodium. Haha. Tas bisaya daw. i-sudyum. Hahaha.  @andiedote @krizhsexy @mjpatingo  #building"                         
At this point I found the blog twitter to wordcloud, so I restarted and used those functions. The original is from Using Text Mining to Find Out What @RDataMining Tweets are About. There was a small bit of editing. The require(tm) and require(wordcloud) calls within the functions did not work, so I loaded the libraries directly. The clouds also had some links in them, shown as 'httpt' with some more text attached (a link to a chemistry joke), so a function to remove those is added too.
library(tm)
library(wordcloud)

RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}

RemoveHTTP <- function(tweet) {
  gsub("http[[:alnum:][:punct:]]+", "", tweet)
}

generateCorpus = function(df, my.stopwords=c()) {
  #The following is cribbed and seems to do what it says on the can
  tw.corpus = Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  #normalise case
  tw.corpus = tm_map(tw.corpus, tolower)
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}

wordcloud.generate = function(corpus, min.freq=3) {
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing=TRUE)
  d = data.frame(word=names(v), freq=v)
  #Generate the wordcloud
  wc = wordcloud(d$word, d$freq, min.freq=min.freq)
  wc
}

tweets.grabber = function(searchTerm, num=500) {
  rdmTweets = searchTwitter(searchTerm, n=num, .encoding='UTF-8')
  tw.df = twListToDF(rdmTweets)
  tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))
  as.vector(sapply(tweets, RemoveHTTP))
}
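
To see what the two cleaning helpers do without fetching anything, here is a small check on a made-up tweet. The tweet text and the example.com link are invented for illustration, and the two functions are copies of the patterns above under hypothetical lower-case names so that the snippet runs on its own in base R.

```r
# Same regular expressions as RemoveAtPeople and RemoveHTTP above,
# repeated under illustrative names so this snippet is self-contained.
remove_at_people <- function(tweet) gsub("@\\w+", "", tweet)
remove_http <- function(tweet) gsub("http[[:alnum:][:punct:]]+", "", tweet)

# Made-up example tweet, not taken from the real search results
sample_tweet <- "@someone Get the facts on #sodium: http://example.com/abc ok"

# First strip the @mention, then strip the link
cleaned <- remove_http(remove_at_people(sample_tweet))
print(cleaned)
```

The mention and the link disappear while the hashtag and the remaining words stay, which is what keeps 'httpt' fragments out of the cloud.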



tweets=tweets.grabber('sodium',num=500)
tweets <- tweets[-308] # tweet in wrong locale
wordcloud.generate(generateCorpus(tweets,'sodium'),3)
The ugly line that removes tweet 308 is there because that tweet is in the wrong locale and gave an error. The error is not simple to resolve (see R tm package invalid input in 'utf8towcs'), so I removed the offending tweet:
Error in FUN(X[[308L]], ...) : 
  invalid input 'That was too much sodium 😞' in 'utf8towcs'
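
An alternative to dropping the tweet by index, which I did not use here, would be to strip the characters that cause the problem. The following is only a sketch, assuming it is acceptable to lose every non-ASCII character (including accented letters) for the purpose of a wordcloud. The vector tweets_demo is made up; its second entry mimics the tweet from the error message.

```r
# Made-up stand-in for the tweet vector; "\U0001F61E" is the emoji
# that appears in the error message above.
tweets_demo <- c("Low sodium soup", "That was too much sodium \U0001F61E")

# Convert to ASCII; sub = "" replaces every byte that cannot be
# converted with nothing instead of returning NA for the whole string.
tweets_ascii <- iconv(tweets_demo, from = "UTF-8", to = "ASCII", sub = "")

print(tweets_ascii)
```

The vector keeps its length, so no tweet index has to be hard-coded, but whether the result is good enough depends on the language of the tweets.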



From the cloud we learn that even within the sodium tweets fat is as prominent as salt, and the two are linked. Using grep('fat',tweets,value=TRUE,ignore.case=TRUE):
"I'm higher then a fat bitch sodium"
"Beware of Salt! How to protect your health and shrink your fat. "
'Low' turns up in advertisements and in tweets about behaviour.
"Mothers Quick Cooking Barley, 11-Ounce Unit (Pack of 12): Quick cooking. Good source of fiber. Low fat, sodium f... "
"Chef's Pride Beef Flavored Base, Low Sodium, 16-Ounce Tubs (Pack of 12):"

"I cut alcohol for about two months & started eating only natural/organic food. No processes junk & low sodium! I cook for myself"
Blood has positive and negative tweets.

"RT : Reduce blood pressure by paying attention to the sodium content in the food you buy. The salt you add is minimal in comparison."
" this is corned beef season!!!! I'm ready for the mass amounts of sodium and the alarming spike in my blood pressure"  
'Pressure', even though it is not large, is also mixed up with lighting.


"Hydroponic Indoor Grow Light Bulb Lamp - 1000 Watt High Output HPS - High Pressure Sodium:  "                                        


"RT : Obesity, a high-salt, high-fat diet, and lack of regular exercise can all amp up the blood pressure. "                          
MSG
MSG is another word which cannot be used in a Twitter search, since it is also an abbreviation of message. Hence the search is for glutamate. On top of glutamate, the words msg and monosodium had to be removed from the feed.

tweetsMSG=tweets.grabber('glutamate',num=1500)
tweetsMSG <- tweetsMSG[-591]
wordcloud.generate(generateCorpus(tweetsMSG,c('glutamate','msg','monosodium')),3)
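
Why the extra stopwords matter can be seen in a small base R sketch. The three texts below are made up and the splitting is much cruder than what tm does, so this only illustrates the idea: the search term and its synonyms would otherwise top the frequency table.

```r
# Made-up texts standing in for the fetched tweets
texts <- c("MSG is monosodium glutamate",
           "glutamate and stress",
           "stress linked to glutamate receptor")

# Lower-case, split on anything that is not a letter, drop empty strings
words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
words <- words[words != ""]

# Words to leave out, mirroring the my.stopwords argument used above
my_stopwords <- c("glutamate", "msg", "monosodium")

# Frequency tables with and without the extra stopwords
freq_all <- sort(table(words), decreasing = TRUE)
freq_kept <- sort(table(words[!words %in% my_stopwords]), decreasing = TRUE)

print(freq_all)
print(freq_kept)
```

In this toy example the search term is the most frequent word before filtering, and a content word takes its place afterwards.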
Stress is a bit of a surprise to me: "Loss of glutamate receptor linked to negative effects of chronic stress ".
The negative words are much smaller. They relate to the story that glutamate is an excitotoxin which passes the blood-brain barrier, with all the negative effects of such. Surprisingly, there is also a positive tweet in this context.
"How the Excitotoxins Glutamate and Aspartame Affect Our Health" 
"Glial cells may protect against excitotoxicity by transporting excess glutamate across the blood-brain barrier ". 
It is also said that glutamate is hidden.
"Hidden names for MSG - Hydrolyzed protein, glutamate, hydrolyzed soy, yeast extract, caseinate, spices, natural flavorings, vinegar powder". 
I feel sure these all have links which are stripped from the texts. It is clear there is an active group of people looking at MSG. To repeat, this does not so much come from health authorities; check for instance the WHO.
NationalNutritionMonth
NationalNutritionMonth is a word I encountered while examining the tweets. What do we learn about it?

tweetsNNM=tweets.grabber('#NationalNutritionMonth',num=1500)
wordcloud.generate(generateCorpus(tweetsNNM,'nationalnutritionmonth'),3)
From the cloud I understand that March is the national nutrition month. It is about healthy eating, nutrition and eating the right things. There is nothing about MSG, but there are some tweets about salt: "RT . It's #NationalNutritionMonth: how much fat, salt, and sugar should you eat? " Clearly this is pushed by health authorities, but that is obvious from the word NationalNutritionMonth already.
