Wednesday, October 16, 2013

Finding interested Hadoop users in the vicinity using twitteR

In an earlier blog article, we looked at how to get Tweets using Flume, put them into HDFS and then analyze them using Hive. Now, we will use R to get some more interesting data from Twitter.

It's nice to know people in our vicinity who share our interests. One way to find them is to see who has Tweeted about a topic of interest nearby. Twitter provides a location feature which attaches location information to Tweets as metadata. This feature is off by default in the Twitter settings; here is how to opt in.

R provides a nice library (twitteR) to extract Tweets, and one of its search options is to specify a geo location. Here is the API documentation for the twitteR package. Assuming R has been installed as described in the earlier blog, it's now time to install the twitteR package and get some Tweets.

Here are the steps.

- First, the `libcurl4-openssl-dev` package has to be installed.
sudo apt-get install libcurl4-openssl-dev
- Create a Twitter developer account, create a new application here, and note its `Consumer Key` and `Consumer Secret`.
- Install the twitteR package from the R shell along with the dependencies.
install.packages("twitteR", dependencies=TRUE)
- Load the twitteR library in the R shell and call the getTwitterOAuth function with the `Consumer Key` and the `Consumer Secret` obtained in the earlier step.
library(twitteR)
getTwitterOAuth("Consumer Key", "Consumer Secret")

- The previous step prints a URL; open it in a browser, authorize the application, and a PIN will be displayed.
- Copy the PIN and paste it back into the R console. We are now all set to get Tweets using the twitteR package.

- Go to Google Maps and find your current location. The latitude/longitude are embedded in the URL; in the example below they are the values after `!3d` and `!4d` near the end (41.794402 and -88.0803051):
!q=60532&data=!4m15!2m14!1m13!1s0x880e5123b7fe36d9%3A0xbedcc6bcaa223107!3m8!1m3!1d22176155!2d-95.677068!3d37.0625!3m2!1i1317!2i653!4f13.1!4m2!3d41.794402!4d-88.0803051

- Call the searchTwitter function with the topic of interest and a geocode made of the above latitude/longitude plus a search radius (5 miles here) to fetch the matching Tweets into the console.
searchTwitter('hadoop', geocode='41.794402,-88.0803051,5mi',  n=5000, retryOnRateLimit=1)
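Copying the coordinates out of the URL by hand is easy to get wrong, so here is a minimal R sketch of the last two steps. It assumes the Google Maps URL keeps the `!3d<lat>!4d<lon>` layout seen above, which Google may change at any time. `extractLatLong` and `makeGeocode` are hypothetical helpers, not part of twitteR; they only build the `latitude,longitude,radius` string that searchTwitter accepts as its `geocode` argument.

```r
# Hypothetical helpers (not part of twitteR).

# The searched place appears as "!3d<lat>!4d<lon>". The viewport centre in the
# same URL ("!2d<lon>!3d<lat>") is not followed by "!4d", so it does not match.
extractLatLong <- function(url) {
  pattern <- "!3d(-?[0-9.]+)!4d(-?[0-9.]+)"
  pairs <- regmatches(url, gregexpr(pattern, url))[[1]]
  if (length(pairs) == 0) stop("no !3d<lat>!4d<lon> pair found in the URL")
  last <- pairs[length(pairs)]  # if several pairs exist, use the last one
  c(lat = as.numeric(sub(pattern, "\\1", last)),
    lon = as.numeric(sub(pattern, "\\2", last)))
}

# Build the "latitude,longitude,radius" string; the radius unit is mi or km.
makeGeocode <- function(lat, lon, radius, unit = c("mi", "km")) {
  unit <- match.arg(unit)
  stopifnot(abs(lat) <= 90, abs(lon) <= 180, radius > 0)
  paste0(lat, ",", lon, ",", radius, unit)
}

mapsUrl <- "!q=60532&data=!4m15!2m14!1m13!1s0x880e5123b7fe36d9%3A0xbedcc6bcaa223107!3m8!1m3!1d22176155!2d-95.677068!3d37.0625!3m2!1i1317!2i653!4f13.1!4m2!3d41.794402!4d-88.0803051"
ll <- extractLatLong(mapsUrl)
geo <- makeGeocode(ll[["lat"]], ll[["lon"]], 5)
print(geo)  # [1] "41.794402,-88.0803051,5mi"

# With the authorized twitteR session from the earlier steps:
# searchTwitter('hadoop', geocode=geo, n=5000, retryOnRateLimit=1)
```

The searchTwitter call is left commented out because it needs the OAuth session; the helpers themselves run in a plain R shell.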
If you are providing some kind of service, you can now easily find people in your vicinity who are interested in it.
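To see who those nearby users are, the list returned by searchTwitter can be flattened with twitteR's `twListToDF` and counted per `screenName`. The sketch below assumes that data frame layout; `topUsers` is a hypothetical helper, and the twitteR calls are commented out since they need an authorized session.

```r
# Hypothetical helper: rank users by how many of the returned Tweets they posted.
# Expects a data frame with a screenName column, e.g. from twitteR's twListToDF().
topUsers <- function(tweetsDF, n = 10) {
  counts <- sort(table(tweetsDF$screenName), decreasing = TRUE)
  head(data.frame(user = names(counts),
                  tweets = as.integer(counts),
                  stringsAsFactors = FALSE), n)
}

# With an authorized twitteR session:
# library(twitteR)
# tweets <- searchTwitter('hadoop', geocode='41.794402,-88.0803051,5mi', n=5000, retryOnRateLimit=1)
# topUsers(twListToDF(tweets))
```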

1) Here is an interesting article on analyzing Facebook data using R.
