Using the Twitter API with Ruby

by Brian Rivard

Issue: Winter Jam

published in December 2009

Brian Rivard is the CTO at Semcasting, Inc. in Andover, Massachusetts. Brian has been working with Ruby on Rails since 2005 and has been doing software development for over 25 years. He holds BS and MS degrees in Computer Science from the University of Massachusetts, and an MBA degree from Northeastern University.

Brian lives on Plum Island off the coast of Massachusetts with his wife and has twin daughters in college. Visit his website at http://www.BrianRivard.com, follow him on Twitter at @brianrivard, or drop him an email at brian@rivard.us.

Introduction

In this article I will talk about some of the basics of using the Twitter API, how to call it with Ruby, and then work through a few examples to give you an idea of the power that is available to you as an application developer. There are countless applications out there taking advantage of the API, and people are inventing new uses for that vast collection of data every day.

API Documentation

The official Twitter API documentation can be found at http://apiwiki.twitter.com/Twitter-API-Documentation. The folks over at Twitter update it regularly so it is a good idea to keep a link to it handy. The API is very much a work in progress so expect to see updates and new capabilities frequently.

Overview

Currently, there are two separate Twitter APIs. This is for historical reasons and does not overly affect the programmer, with one big exception that I will describe later. The two APIs are:

  • Search API Methods
  • REST API Methods

Within the Search API, you get APIs to do searching and get trend data.

There are several classes of REST APIs:

  • Timeline Methods
  • Status Methods
  • User Methods
  • Direct Message Methods
  • Friendship Methods
  • Social Graph Methods
  • Account Methods
  • Favorite Methods
  • Block Methods
  • Saved Searches Methods
  • OAuth Methods
  • Help Methods

As you can see, there are many things you can do with the Twitter API. Some of these APIs require authentication, though many do not. But before we get into the details, there are some general things you should know about using the API.

Good Things to Know about Using the Twitter API

Twitter is popular right now. Very popular. And because of that there are a lot of developers out there putting a lot of stress on Twitter's servers. In order to keep things under control, Twitter enforces a limit of 150 API requests per hour per IP address (with a few exceptions). In the big scheme of things this should be enough for many applications, but if it is not, Twitter has a process by which you can get whitelisted and be allowed 20,000 API requests per hour. Twitter also imposes limits on how many direct messages, updates, and follow requests you can make during the course of a day. You can find an explanation of their policy in excruciating detail at http://apiwiki.twitter.com/Rate-limiting.

When requesting data using the Twitter API, you can get results returned to you in different formats: XML and JSON, along with RSS and Atom syndication formats. Not all formats are available for all APIs.

Some APIs take required and/or optional parameters. When passing parameters, be sure to convert them to UTF-8 and URL encode them. I will give you an example of a routine to do URL encoding in just a bit.

Certain APIs have limits as to how much data they return in a single request. Some also allow you to specify how much data you want in a single request using the rpp parameter. To get the additional data, you can make subsequent calls using the page parameter. Each API limits how much total data you can retrieve. For example, the search method of the Search API allows you to request up to 100 status updates per page and allows you to request multiple pages in order to retrieve a maximum of about 1500 status updates. These numbers differ per API.
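
As a rough illustration of how rpp and page work together, the sketch below lists the URLs you would request to collect a given number of status updates. The helper name search_page_urls is made up for this example, the query is assumed to be URL encoded already, and the limits of 100 per page and about 1500 in total are simply the figures quoted above, which may change.

```ruby
# Hypothetical helper (not part of the Twitter API): list the search URLs
# needed to collect `wanted` status updates, `rpp` per page.
# `query` is assumed to be URL encoded already.
def search_page_urls(query, wanted, rpp = 100)
  max_rpp   = 100    # per-page limit quoted above for the search method
  max_total = 1500   # approximate overall limit quoted above

  # Keep rpp between 1 and the per-page limit, and cap the total.
  rpp    = [[rpp, max_rpp].min, 1].max
  wanted = [wanted, max_total].min
  pages  = (wanted.to_f / rpp).ceil

  (1..pages).map do |page|
    "http://search.twitter.com/search.json?q=#{query}&rpp=#{rpp}&page=#{page}"
  end
end

puts search_page_urls('rails', 250)
```

Asking for 250 status updates at 100 per page yields three URLs, for pages 1 through 3. Remember that every page you fetch counts against your hourly rate limit.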

Using the Twitter API with Ruby

Enough of the general stuff. Let's get down to some of the nitty gritty. For purposes of this discussion, we will be requesting data in the JSON format, so you will need to install the json gem if you do not already have it installed:

gem install json

This makes it simple to read the various fields returned in the output of the call. We will start by creating a routine that takes a complete URL, makes the API request, and parses the JSON result into something we can use.

require 'open-uri'
require 'json'

  # Returns the data in JSON format from the specified URL
  # or nil if an error occurs.
  #
  def get_json(url)
    begin
      # Submit the request. The header name must be "User-Agent".
      # (On Ruby versions before 2.5, call open instead of URI.open.)
      f = URI.open(url, "User-Agent" => "YourAgentName")
      html = f.read
      f.close

      # Parse the JSON result.
      ret = JSON.parse(html)
    rescue
      # Return nil on an error.
      ret = nil
    end

    # Return the result.
    ret
  end

Then you can use this routine to call all of the Twitter APIs. For example, we can call the Twitter account/rate_limit_status API to find out how many API requests are available to us before the hourly limit is reached. To do this, you would simply make the following call:

result = get_json "http://twitter.com/account/rate_limit_status.json"

Which would return something like the following:

{
  "remaining_hits":150,
  "hourly_limit":150,
  "reset_time_in_seconds":1254538441,
  "reset_time":"Sat Oct 03 02:54:01 +0000 2009"
}

Then to display the results, we could do this:

minutes = (result['reset_time_in_seconds'].to_i - Time.now.to_i) / 60
puts "Current Time: #{Time.now}"
puts "Reset time: #{result['reset_time']}" +
     " (#{minutes} minutes from now)"
puts "Remaining hits: #{result['remaining_hits']}"
puts "Hourly limit: #{result['hourly_limit']}"

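In case you are curious, here is what happens if you request XML instead. Note that get_json cannot be used for this, since it would fail to parse the XML and return nil, so you would fetch the following URL directly:

http://twitter.com/account/rate_limit_status.xml

You would get back something similar to this: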

<?xml version="1.0" encoding="UTF-8"?>
<hash>
  <hourly-limit type="integer">150</hourly-limit>
  <reset-time-in-seconds type="integer">1254674626</reset-time-in-seconds>
  <reset-time type="datetime">2009-10-04T16:43:46+00:00</reset-time>
  <remaining-hits type="integer">148</remaining-hits>
</hash>

It is worth noting that this particular API is not subject to rate limiting. So you can use it whenever you need to make sure you have not hit the maximum number of calls for the hour, either proactively or in response to an error. Twitter prefers that your application police itself and stop hitting their servers once it has reached the rate limit. Abusing the Twitter servers by flooding them with too many API calls beyond your limit can result in your application being blocked.
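
One way an application might police itself is sketched below. The helper name seconds_to_wait is hypothetical; it only assumes a parsed rate_limit_status hash with the fields shown in the JSON output earlier, and it assumes the request for that hash succeeded (a nil result would need separate handling).

```ruby
# Hypothetical helper: given the parsed rate_limit_status hash, return how
# many seconds to wait before making another request (0 means go ahead).
def seconds_to_wait(status, now = Time.now)
  return 0 if status['remaining_hits'].to_i > 0

  wait = status['reset_time_in_seconds'].to_i - now.to_i
  wait > 0 ? wait : 0
end

# Sample data shaped like the response shown earlier, but with no hits left.
status = {
  'remaining_hits'        => 0,
  'hourly_limit'          => 150,
  'reset_time_in_seconds' => 1254538441
}

# Pretend "now" is ten minutes before the reset time.
now = Time.at(1254538441 - 600)
puts "Wait #{seconds_to_wait(status, now)} seconds before the next call"
```

In a real application you would pass in the hash returned by the rate_limit_status call and sleep (or queue the work) for the returned number of seconds.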

Let's look at another simple example. You can retrieve extended information about a Twitter user with the users/show API method. You can make this request in one of three ways: by embedding the Twitter screen name or Twitter ID of the user in the URL, by using the user_id parameter, or by using the screen_name parameter. For example, the URL for the first method may look like this:

http://twitter.com/users/show/railsmagazine.json

This particular API supports either JSON or XML formats. Here we have specified JSON as the extension. We could have also used a Twitter ID instead to get the same result:

http://twitter.com/users/show/16105820.json

The JSON output of either of these requests would look something like this:

{
  "profile_background_tile":false,
  "description":"The first free magazine dedicated to the Ruby on Rails community. Currently accepting article submissions for the premiere edition!",
  "profile_background_color":"EBEBEB",
  "url":"http://railsmagazine.com",
  "status":
  {
    "truncated":false,
    "in_reply_to_status_id":null,
    "source":"web",
    "created_at":"Fri Sep 04 05:05:30 +0000 2009",
    "favorited":false,
    "in_reply_to_user_id":null,
    "in_reply_to_screen_name":null,
    "geo":null,
    "id":3750899201,
    "text":"Rails Magazine #4 free at http://railsmagazine.com/issues/4. Interviews w/ @yukihiro_matz @dhh @wycats @tom_enebo tech articles & more"
  },
  "following":false,
  "notifications":false,
  "time_zone":"Eastern Time (US & Canada)",
  "favourites_count":1,
  "profile_sidebar_fill_color":"F3F3F3",
  "verified":false,
  "created_at":"Wed Sep 03 00:49:29 +0000 2008",
  "friends_count":0,
  "followers_count":464,
  "statuses_count":15,
  "profile_sidebar_border_color":"DFDFDF",
  "geo_enabled":false,
  "protected":false,
  "profile_image_url": "http://a3.twimg.com/profile_images/61608749/RailsMagazine100x100_normal.jpg",
  "profile_text_color":"333333",
  "location":"",
  "name":"Rails Magazine",
  "profile_background_image_url": "http://a1.twimg.com/profile_background_images/3591238/rormag_no_wires.png",
  "screen_name":"railsmagazine",
  "id":16105820,
  "utc_offset":-18000,
  "profile_link_color":"990000"
}

Since some Twitter IDs may also be valid screen names, you can avoid ambiguity by specifying the user using either the user_id or screen_name parameters:

http://twitter.com/users/show.json?user_id=16105820

or

http://twitter.com/users/show.json?screen_name=railsmagazine

and get back the same result.
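
If you build these URLs in code, a small helper can choose the unambiguous parameter for you. The following is only a sketch; the name user_show_url is made up for illustration, and it assumes that numeric IDs are passed as Integers and screen names as Strings.

```ruby
# Hypothetical helper: build an unambiguous users/show URL from either a
# numeric Twitter ID (Integer) or a screen name (String).
def user_show_url(user)
  param = user.is_a?(Integer) ? 'user_id' : 'screen_name'
  "http://twitter.com/users/show.json?#{param}=#{user}"
end

puts user_show_url(16105820)
puts user_show_url('railsmagazine')
```

Because the decision is based on the Ruby type, a screen name that happens to consist only of digits is still treated as a screen name as long as you pass it as a String.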

Searching with the Twitter Search API

Now let's move on to a more complicated example using the search API. This request returns status updates matching a number of optional parameters. Note that because search is part of the Search API, the URL is a little bit different. This will change in Version 2 of the Twitter API, but for now just remember that the base URL for Search APIs is http://search.twitter.com, not http://twitter.com.

For starters, the q parameter allows you to search for tweets containing specific words. If you wanted to search for status updates containing the word “rails” you could create a URL of the form:

http://search.twitter.com/search.json?q=rails

Since you probably do not want to collect status updates about people working on their decks or riding trains, you could make your search more specific by adding additional words. Search operators can be used with API queries. So to search for status updates containing the words “ruby” and “rails”, your URL would look like this:

http://search.twitter.com/search.json?q=ruby+AND+rails

Remember, your search parameters need to be URL encoded. Let's take a moment and write a routine that will do this work for us.

# URL encode a string.
#
def url_encode(string)
  # Just in case we want to encode something other than text.
  s = string.to_s

  chars =
  {
    '%' => '%25', ' ' => '%20', '!' => '%21',
    '*' => '%2A', '"' => '%22', '\'' => '%27',
    '(' => '%28', ')' => '%29', ';' => '%3B',
    ':' => '%3A', '@' => '%40', '&' => '%26',
    '=' => '%3D', '+' => '%2B', '$' => '%24',
    ',' => '%2C', '/' => '%2F', '?' => '%3F',
    '#' => '%23', '[' => '%5B', ']' => '%5D'
  }

  encoded_string = ''

  # Process each character.
  0.upto(s.size-1) do |i|
    # If this is a special character, replace
    # it with the proper encoding.
    if chars.has_key?(s[i,1])
      encoded_string += chars[s[i,1]].to_s

    # If not, just use the character.
    else
      encoded_string += s[i,1].to_s
    end
  end

  # Return the encoded string.
  encoded_string
end
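
The lookup table above covers the common reserved ASCII characters. As an alternative, Ruby's standard library includes ERB::Util.url_encode, which percent-encodes every byte outside a small safe set and therefore also handles non-ASCII UTF-8 text. The snippet below is a minimal sketch of that alternative, not a drop-in statement about what the Twitter API requires.

```ruby
require 'erb'

# ERB::Util.url_encode percent-encodes every byte except ASCII letters,
# digits, and the characters - _ . ~
puts ERB::Util.url_encode('Boston Bruins')   # the space becomes %20
puts ERB::Util.url_encode('ruby & rails?')   # & and ? are encoded as well
puts ERB::Util.url_encode("caf\u00E9")       # the UTF-8 bytes of the accented e are encoded
```

For plain ASCII input such as 'Boston Bruins', this produces the same result as the hand-written routine.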

There are several other parameters that are supported by the search API request. The page parameter allows you to get multiple pages of results. The since and until parameters allow you to specify the earliest and latest dates to search. Dates need to be of the form YYYY-MM-DD. A limitation of the date parameters is that you can only search back in time about a week and a half, and this amount varies depending on the load on Twitter's servers. Another useful parameter supported by this API is geocode, which lets you specify a latitude, longitude, and radius (in miles or kilometers). This allows you to specify an area around a spot on the globe and only search status updates from that area. You can also specify the language you are looking for with the lang parameter, as well as the number of results you want per page with the rpp parameter. By combining all of these parameters into your search API request, you can create very specific search patterns to suit your needs.

IMPORTANT CAVEAT: The user IDs you get from the Search API are different from those you get from the REST API. This goes back to the fact that the Search API was developed independently of the REST API. Luckily, you also get back the user's screen name, which is consistent over both APIs. So if you need the correct user ID of the author of a particular status update, you will need to do a screen_name-based lookup with the users/show REST API.
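
The lookup described above might be sketched as follows. The helper name rest_user_url is hypothetical, and the field names from_user (screen name) and from_user_id (Search API user ID) are assumptions about the shape of a search result entry; check the Search API documentation for the actual names. The sample entry is made up for illustration.

```ruby
# Hypothetical helper: given one entry from a Search API result, build the
# REST users/show URL that returns the author's REST API user record.
# Assumes the entry exposes the author's screen name under 'from_user'.
def rest_user_url(search_entry)
  "http://twitter.com/users/show.json?screen_name=#{search_entry['from_user']}"
end

# Made-up sample entry; the ID value is illustrative only and stands in
# for a Search API user ID that should not be used with the REST API.
entry = { 'from_user' => 'railsmagazine', 'from_user_id' => 12345, 'text' => '...' }
puts rest_user_url(entry)
```

The id field of the users/show response is then the REST API user ID for that author.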

Let's put this into practice. Suppose we want to search the Boston area (say, within 25 miles of latitude 42.3323, longitude -71.0167) for the 10 most recent status updates concerning the Boston Bruins, and we only want those that are in English. Our URL would look like this:

http://search.twitter.com/search.json?q=Boston%20Bruins&rpp=10&lang=en&geocode=42.3323%2C-71.0167%2C25mi

It is easy enough to write simple queries like this, but if we are going to be embedding this functionality into an application we will want to write a routine that allows us to easily do generic queries using any of the supported parameters. But before we do that, we will want to first create a routine to format multiple parameters properly in order to keep our code clean.

The following routine accepts a hash of parameter names and their values, and formats them into a search string, while properly URL encoding the values of the parameters:

# Format a hash of parameters in the form: ?param1=value1&param2=value2...
#
def format_params(params)
  ret = '?'
  params.each { |name, value| ret += "#{name}=#{url_encode(value)}&" if value }
  ret.chop
end

You can pass this routine a hash with any parameters you want, some of whose values may be nil, and it will format them properly. For example, you can write:

params = Hash.new
params['q'] = "Boston Bruins"
params['rpp'] = '10'
params['lang'] = 'en'
params['geocode'] = '42.3323,-71.0167,25mi' # lat,long,radius (in miles)

and then call:

format_params(params)

to get back the formatted search string. Put it together with the search URL and you have:

http://search.twitter.com/search.json?q=Boston%20Bruins&rpp=10&lang=en&geocode=42.3323%2C-71.0167%2C25mi
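
For comparison, here is a sketch of building the same query string with URI.encode_www_form from Ruby's standard library instead of a hand-written formatter. One difference to be aware of: it encodes spaces as + rather than %20, which servers generally treat the same way inside a query string. The page entry is included only to show how nil values can be dropped.

```ruby
require 'uri'

params = {
  'q'       => 'Boston Bruins',
  'rpp'     => '10',
  'lang'    => 'en',
  'page'    => nil,                       # nil values are dropped below
  'geocode' => '42.3323,-71.0167,25mi'
}

# Drop unset parameters, then let the standard library do the encoding.
query = URI.encode_www_form(params.reject { |_name, value| value.nil? })
puts "http://search.twitter.com/search.json?#{query}"
```

Whether you roll your own formatter or use the standard library is largely a matter of taste; the library version simply saves you from maintaining the table of special characters.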

Finally, let's write the generic search routine:

  # This routine performs a search for status updates containing
  # the specified words (or phrases) that satisfy the rest of the
  # (optional) parameters.
  #
  # Inputs:
  #   words - An Array of Strings containing words or
  #     phrases to search for. These will be properly
  #     URL encoded for the search.
  #   page - The page number. This can be a value from 1 to 15.
  #   start_date - A string of the form YYYY-MM-DD specifying
  #     the earliest date to search. Note: this can
  #     be no earlier than about 10 days prior to
  #     the current date.
  #   end_date - A string of the form YYYY-MM-DD specifying
  #     the latest date to search. Note: this can
  #     be no later than the current date.
  #   lat - A latitude in decimal format.
  #   long - A longitude in decimal format.
  #   radius_in_miles - A radius in miles.
  #
  # Outputs:
  #   JSON representation of the search result.
  #
  # Notes:
  #   - You may specify both a start and an end date, or just
  #     one or the other.
  #   - If you specify dates, the search results start at the
  #     most recent and work backwards.
  #   - If you want to search around a location, you must
  #     specify all three of lat, long, and radius in miles.
  #   - Only English status updates are returned.
  #   - Up to 100 status updates are returned.
  #
  def search(words,
             page = 1,
             start_date = nil,
             end_date = nil,
             lat = nil,
             long = nil,
             radius_in_miles = nil)

    encoded_words = words.map { |word| url_encode(word) }
   
    params = Hash.new
    params['rpp'] = '100'
    params['lang'] = 'en'
    params['page'] = page
    params['since'] = start_date
    params['until'] = end_date
   
    if lat && long && radius_in_miles
      params['geocode'] = "#{lat},#{long},#{radius_in_miles}mi"
    end
   
    get_json( "http://search.twitter.com/search.json#{format_params params}&q=#{encoded_words.join('+OR+')}")
  end
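
Since each call returns a single page, collecting everything means calling the routine once per page. The sketch below shows one way to do that. The helper name collect_results is made up, the 'results' key is an assumption about the shape of the Search API response, and the network is replaced by canned data so that the example runs on its own.

```ruby
# Hypothetical helper: walk through result pages until one comes back
# empty (or the page limit is reached) and gather all status updates.
# The block takes a page number and returns the parsed response hash,
# or nil on error. The 'results' key is an assumption about the shape
# of the Search API response.
def collect_results(max_pages = 15, &fetch_page)
  statuses = []

  1.upto(max_pages) do |page|
    response = fetch_page.call(page)
    break if response.nil?

    results = response['results'] || []
    break if results.empty?

    statuses.concat(results)
  end

  statuses
end

# Stand-in for the network: two pages of canned data, then nothing.
fake_pages = {
  1 => { 'results' => [{ 'text' => 'first' }, { 'text' => 'second' }] },
  2 => { 'results' => [{ 'text' => 'third' }] }
}

all = collect_results { |page| fake_pages[page] || { 'results' => [] } }
puts all.map { |status| status['text'] }.inspect
```

With the search routine above, the block would instead call search with your list of words and the page number. Keep in mind that each page costs one API request.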

Ruby Gems for Accessing the Twitter API

If coding your own access to the Twitter API does not appeal to you, there are a few gems available out there that encapsulate much of the functionality for you. Most notable of these are the aptly named Twitter gem, along with Twitter4R. You can find out more about these gems at http://twitter.rubyforge.org and http://twitter4r.rubyforge.org respectively. For a more complete list of Ruby gems for Twitter, see http://apiwiki.twitter.com/Libraries#Ruby.

Future of the Twitter API

The team over at Twitter have big plans for their API in the future, not the least of which is merging the Search and REST APIs. You can see the full V2 Roadmap at http://apiwiki.twitter.com/V2-Roadmap.

Conclusion

In this article, we looked at some of the potential and limits of the Twitter API and how to access it. However, we have barely scratched the surface of what you can do. I suggest going to the online Twitter API reference and digging more deeply. This is a powerful API that is made even more powerful by the imagination and creativity of developers like yourselves who create applications on top of the Twitter platform.

Developer Resources

Twitter does a good job of keeping developers informed of developments and changes to their API. There are several resources available to developers, such as the Twitter API Announcements Google group (http://groups.google.com/group/twitter-api-announce) and the Twitter Development Talk Google group (http://groups.google.com/group/twitter-development-talk). You can also follow the Twitter API user called @twitterapi on Twitter itself. And you can keep track of changes to the REST API at http://apiwiki.twitter.com/REST-API-Changelog.