What's a good way to select a random set of twitterers?

Consider Twitter users as "nodes" and the relation "u follows v" as "edges": this gives a graph from which I would like to select a random subset of users. I could be wrong, but from reading the API docs I think it's impossible to get a collection of users except by fetching the followers or friends of an already-known user. So, starting from myself and exploring the Twitter graph from there, what's a good way to select a random sample of (say) 100 users?

I. J. Kennedy asked Feb 6, 2010 at 3:08

7 Answers

I would use the numerical user ID. Generate a bunch of random numbers and fetch users based on them. If you hit a nonexistent ID, simply skip it.

The Twitter API wiki, for users/show:

id. The ID or screen name of a user.
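The approach above can be sketched in Python roughly as follows. `lookup` is a hypothetical stand-in for a users/show request (it returns a user record, or None when the ID does not exist), and `max_id` is an assumed upper bound on the ID range; neither comes from the Twitter API itself, so this is a sketch of the sampling logic rather than a working client.

```python
import random
from typing import Callable, List, Optional


def sample_users_by_random_id(
    k: int,
    max_id: int,
    lookup: Callable[[int], Optional[dict]],
    rng: random.Random,
    max_attempts: int = 1_000_000,
) -> List[dict]:
    """Draw random IDs in 1..max_id and keep those that resolve to a user.

    `lookup` is a hypothetical stand-in for a users/show request: it should
    return a user record for an existing ID and None for a nonexistent one.
    May return fewer than k users if max_attempts draws are used up.
    """
    tried = set()
    users = []
    for _ in range(max_attempts):
        if len(users) == k:
            break
        user_id = rng.randint(1, max_id)
        if user_id in tried:
            continue  # already looked this ID up; don't waste a request
        tried.add(user_id)
        user = lookup(user_id)
        if user is not None:  # a nonexistent ID is simply skipped
            users.append(user)
    return users


# Demonstration with a fake directory in which only even IDs "exist".
fake_directory = {i: {"id": i} for i in range(2, 1001, 2)}
sample = sample_users_by_random_id(10, 1000, fake_directory.get, random.Random(0))
print([u["id"] for u in sample])
```

Remembering which IDs were already tried keeps the sample free of duplicates and avoids sending the same request twice.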

answered Feb 6, 2010 at 3:58

Thanks. Do you know the range of numerical user ids?

Commented Feb 7, 2010 at 0:50

You could create a new account and see what ID it gets (the easiest way is to look at the RSS feed URL, which includes the user ID). My user ID is ~1200, so I guess they started at 1 (or near that).

Commented Feb 7, 2010 at 2:55

If you can figure out the structure of the IDs, this is probably a very good option. – Aryabhatta

Commented Feb 7, 2010 at 6:58

This will only work if the range of numerical IDs has no holes, or if the distribution of holes in the IDs is uniform across the range of IDs. If there is a non-uniform distribution of holes in the ID range, then generating random IDs and skipping invalid ones (holes) will result in a biased sample of users. Imagine there are more holes the higher you go in the ID range (non-uniform distribution of holes). If you select 100 random IDs in the range, your sample will be biased toward low-ID users. This could be a big problem if user ID correlates to some other user trait you care about.

Commented Feb 25, 2010 at 22:43
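One way to probe this concern is a small simulation. The sketch below uses a made-up ID space in which IDs above 500 are mostly holes, draws uniform random IDs while skipping holes, and counts how often each existing ID gets picked (with 55,000 draws over 550 existing IDs, each should come up about 100 times). In this simulation every existing ID is picked about equally often, and the low-ID share of the sample matches the low-ID share of the existing users. That suggests skipping holes does not by itself bias the sample relative to the users that exist; the sample over-represents low IDs only relative to the ID range. A bias could still appear if the set of IDs you can look up differs from the population you care about (for example, if deleted or suspended accounts show up as holes).

```python
import random
from collections import Counter


def sample_one(valid_ids, max_id, rng):
    """Draw uniform random IDs in 1..max_id until one exists (holes are skipped)."""
    while True:
        candidate = rng.randint(1, max_id)
        if candidate in valid_ids:
            return candidate


# Hypothetical ID space: every ID in 1..500 exists, but above 500 only every
# 10th ID exists, so holes are far more common at high IDs.
MAX_ID = 1000
low_ids = set(range(1, 501))
high_ids = set(range(510, 1001, 10))
valid_ids = low_ids | high_ids

rng = random.Random(42)
n_draws = 55_000
hits = Counter(sample_one(valid_ids, MAX_ID, rng) for _ in range(n_draws))

# How often, on average, was each existing ID picked in each region?
mean_low = sum(hits[i] for i in low_ids) / len(low_ids)
mean_high = sum(hits[i] for i in high_ids) / len(high_ids)

# Low-ID share of the sample vs. low-ID share of the existing users.
sample_low_share = sum(hits[i] for i in low_ids) / n_draws
population_low_share = len(low_ids) / len(valid_ids)

print(f"mean picks per low ID:  {mean_low:.1f}")
print(f"mean picks per high ID: {mean_high:.1f}")
print(f"low-ID share: sample {sample_low_share:.3f}, users {population_low_share:.3f}")
```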

This worked OK when Twitter IDs were 32-bit (pre-2015). They are now 64-bit, which makes this strategy far too inefficient.

Commented Jun 16, 2019 at 3:00
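A rough back-of-the-envelope check of this point: if existing IDs were spread evenly over the whole ID space (a simplification; real IDs are not evenly spread), each random draw hits a user with probability `n_existing / 2**bits`, so finding k users takes about k divided by that probability. The user count below (2**30, about 1.07 billion) is a hypothetical round number, not a real Twitter figure.

```python
def expected_lookups(id_bits: int, n_existing: int, k: int) -> float:
    """Expected random-ID lookups needed to find k existing users, assuming
    existing IDs are spread evenly over the 2**id_bits space and ignoring
    repeated draws (reasonable when k is tiny compared to n_existing)."""
    hit_rate = n_existing / 2**id_bits
    return k / hit_rate


n_users = 2**30  # hypothetical user count, for illustration only
print(f"32-bit IDs: {expected_lookups(32, n_users, 100):,.0f} lookups")  # 400
print(f"64-bit IDs: {expected_lookups(64, n_users, 100):,.0f} lookups")  # 1,717,986,918,400
```

Under these assumptions, moving from a 32-bit to a 64-bit space multiplies the expected number of lookups by 2**32.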

Twitter's streaming API has an endpoint called "Sample", which "returns a small random sample of all public statuses" (cf. https://dev.twitter.com/docs/api/1.1/get/statuses/sample).

The authors' Twitter IDs are returned with the tweets, so this gets you random active Twitter users.
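One property of this approach worth keeping in mind: the sample is drawn over tweets, not users, so users who tweet more are more likely to show up, and users who never tweet cannot show up at all. A minimal sketch, assuming (as a simplification, not a documented detail of the endpoint) that each public tweet lands in the sample independently with the same probability; the 1% rate is hypothetical:

```python
def appearance_probability(n_tweets: int, sample_rate: float) -> float:
    """Chance that at least one of a user's n_tweets is included, if each
    tweet is kept independently with probability sample_rate (an assumption)."""
    return 1 - (1 - sample_rate) ** n_tweets


rate = 0.01  # hypothetical 1% sample rate
for n in (0, 1, 10, 100):
    print(n, round(appearance_probability(n, rate), 4))
```

Under this model a user with 10 tweets is nearly ten times as likely to appear as a user with one, so the resulting user sample is weighted by activity.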

answered Jan 3, 2014 at 16:31

You can use GET statuses/sample to get a continuous stream of tweets posted while your code is running. You can then extract the user (tweeter) from each tweet received.

Here is Python code that does this using the python-twitter library:

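```python
import twitter

# The "account" file should contain four whitespace-separated values:
# consumer_key consumer_secret access_token_key access_token_secret
with open("account") as f:
    consumer_key, consumer_secret, access_token_key, access_token_secret = f.read().split()

api = twitter.Api(consumer_key=consumer_key,
                  consumer_secret=consumer_secret,
                  access_token_key=access_token_key,
                  access_token_secret=access_token_secret)

user_ids = set()  # a set drops duplicate user IDs automatically
for message in api.GetStreamSample():
    # The stream also carries non-tweet messages (e.g. deletion notices)
    # that have no "user" field; skip those.
    if 'user' not in message:
        continue
    user_ids.add(message['user']['id'])
    # Stop once we have 100 distinct users. Adjust this to any number.
    if len(user_ids) == 100:
        break

print(list(user_ids))
```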