Exploring Facebook Graph API with Python
In this post, I propose to explore the Facebook Graph API with Python. The use case is to gather some information from your friends’ posts (mine, in my case 😉) and analyse it to discover which words they use the most.
UPDATE: 2020-04-12
As of today, the method I used below no longer works, as Facebook has decided to further restrict access to its users’ data (not a bad thing, actually). Your app now needs to explicitly request access to your friends’ data. You can find out more here.
Setting up the working environment
To do this, I will use facebook-sdk. If you wish to follow along with the post, go ahead and install the sdk (after creating a virtual env of course):
python -m pip install facebook-sdk
Now that the sdk is installed, we can start working with it. I will skip the details of creating a Facebook app. To learn how to do that, see here.
After the app is created, you can get an access token from Facebook’s Graph Explorer tool. This will get you started quickly without having to generate a client access key and secret. Make sure you get the token for the right set of permissions. For this experiment, you only need public_profile, user_friends, user_status and user_posts.
After you get your access token, use the following code to get started with a configured facebook graph object:
import facebook
from getpass import getpass
token = getpass('access token from graph explorer:')
graph = facebook.GraphAPI(access_token=token, version="2.12")
The code will ask for your Graph Explorer access token as if it were a password (it won’t be printed in the console), and will then configure the GraphAPI object from the facebook-sdk package. Note that we specify the version of the official Facebook Graph API with version="2.12", so that the package knows which requests to make to Facebook to get what you need.
Warming up
Now we can start looking around. First, let’s see how to get the list of your friends:
friends = graph.get_connections(id='me', connection_name='friends')
You can see we use the get_connections method of the GraphAPI object. You have to specify the id of the person whose friends you want to get. In this case, you use me to specify that it’s for your own friends. The second parameter is connection_name, which is set to "friends". We will explain later why this parameter has that name.
To understand how the Facebook Graph API is organised, we have to go and rtfd. If you are a good guy (I have no doubt you are), you read it and saw that Facebook describes its API as made of nodes, edges (like any graph structure) and fields. Nodes are individual objects that have fields and are linked to other objects through edges, like this:
So for example, we could have this representation for Me and some of my friends:
If we make a request to an edge, say friends, we get a collection of nodes. And this request must be made from a node. Nothing really difficult to understand, then.
Since what we need in this session is to get our friends’ posts, we need the top node (that’s the id parameter of get_connections, with the value "me"), and we need to request the friends edge, or connection (that’s why the second parameter is called connection_name).
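To make the node/edge/field vocabulary concrete, here is a toy model of it in plain Python. It is only a sketch: the data is made up, and toy_get_connections is a hypothetical stand-in that mimics the shape of the real call, not the way facebook-sdk or Facebook implement it.

```python
# Made-up data: each node has fields and named edges pointing to other node ids.
nodes = {
    "me": {"fields": {"name": "Me"}, "edges": {"friends": ["f1", "f2"], "posts": []}},
    "f1": {"fields": {"name": "Alice"}, "edges": {"friends": ["me"], "posts": ["p1"]}},
    "f2": {"fields": {"name": "Bob"}, "edges": {"friends": ["me"], "posts": []}},
    "p1": {"fields": {"message": "hello world"}, "edges": {}},
}


def toy_get_connections(node_id, connection_name):
    """Follow one edge from one node and return the fields of the nodes it reaches."""
    targets = nodes[node_id]["edges"].get(connection_name, [])
    return {"data": [dict(nodes[target]["fields"], id=target) for target in targets]}


friends_of_me = toy_get_connections("me", "friends")
print(friends_of_me)
```

Starting from the "me" node and following the "friends" edge gives a collection of nodes, each carrying its own fields, which is the same idea as the real request above.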
Start building real stuff
To access our friends’ posts, we first need to request their names and ids, so that later on we can request the posts edge for each one of them. That is the logic of the following piece of code. We also add the possibility of storing the collected data in a database, ready to be used for the analysis.
Fine, here goes the code:
class Person(object):
    def __init__(self, name, facebook_id):
        self.name = name
        self.facebook_id = facebook_id
        
    def __repr__(self):
        return '<Person({name}, {facebook_id})>'.format(**self.__dict__)
import sqlite3
connection = sqlite3.connect('fb_people_db.sqlite3')
PEOPLE_TABLE = 'people'
def initialise_db(table, fields):
    cursor = connection.cursor()
    cursor.execute('create table if not exists {} ({})'.format(table, ','.join(fields)))
    connection.commit()
    cursor.close()
def save_people(people):
    save_in_db(PEOPLE_TABLE, [(person.name, person.facebook_id) for person in people])
    
def initialise_people_table():
    initialise_db(PEOPLE_TABLE, ['name', 'facebook_id'])
    
def save_in_db(table, objects):
    cursor = connection.cursor()
    cursor.executemany(
        'insert into {} values ({})'.format(table, ','.join('?' * len(objects[0]))),
        objects)
    connection.commit()
    cursor.close()
initialise_people_table()
people = [
    Person(friend['name'], friend['id'])
    for friend in friends['data']
]
save_people(people)
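If you want to check what this kind of code writes, here is a small self-contained sketch that uses the same executemany pattern. It assumes an in-memory database and made-up rows, so it does not touch fb_people_db.sqlite3 and needs no Facebook access.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
demo_connection = sqlite3.connect(":memory:")
demo_connection.execute("create table if not exists people (name, facebook_id)")

# Made-up rows standing in for the (person.name, person.facebook_id) tuples.
rows = [("Alice", "1001"), ("Bob", "1002")]
demo_connection.executemany("insert into people values (?, ?)", rows)
demo_connection.commit()

# Read the rows back to confirm they were stored.
saved = demo_connection.execute(
    "select name, facebook_id from people order by name"
).fetchall()
print(saved)
demo_connection.close()
```

The question marks are sqlite3 placeholders: the values are passed separately from the SQL text, which is why save_in_db builds one "?" per column.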
That’s a good start. You can copy/paste the following url that I tested for you into the Graph Explorer tool mentioned above to see the result. I use it below to get the fields to request for our purpose:
explorer_url = 'me/posts?fields=description,shares,type,caption,created_time,is_hidden,message,message_tags,name,properties,application,feed_targeting,from,is_popular,source,story,status_type,via,target,attachments{description,description_tags,type,url},comments{comment_count,id,like_count,user_likes},likes{name,id,username},reactions&include_hidden=true'
# Keep only the value of the "fields" query parameter (everything up to the next "&").
fields = explorer_url.split('fields=')[1].split('&')[0]
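As an alternative to splitting the string by hand, the standard library can parse the query string for you. The sketch below uses a shortened, made-up URL of the same shape; the names sample_url and sample_fields are only for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened stand-in for the Graph Explorer URL used in this post.
sample_url = "me/posts?fields=message,story,likes{name,id}&include_hidden=true"

# parse_qs returns a dict mapping each parameter name to a list of values.
query = parse_qs(urlsplit(sample_url).query)
sample_fields = query["fields"][0]
include_hidden = query["include_hidden"][0] == "true"
print(sample_fields)
```

This way each parameter is read by name, and the result does not depend on the order of the parameters in the URL.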
Now we can build up a Python dict with our friends’ id and name as key and the list of their posts as value:
friends_posts = {
    (person.facebook_id, person.name): graph.get_connections(
        id=person.facebook_id, connection_name='posts', fields=fields, include_hidden=True)['data']
    for person in people
}
You can see we use the same method as above, with two new parameters: fields, to specify the fields to return, and include_hidden, to ask for hidden items to be included.
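One thing to keep in mind is that Graph API responses are paginated, so the ['data'] list above only holds the first page of posts. Here is a minimal sketch of walking through every page, assuming each response carries a "data" list and a "paging" entry with an "after" cursor and a "next" link while more pages remain. The fetch_page callable and the fake pages are hypothetical stand-ins for real requests.

```python
def iter_all_items(fetch_page):
    """Yield the items of every page; fetch_page(after) is a hypothetical callable."""
    after = None
    while True:
        page = fetch_page(after)
        yield from page.get("data", [])
        paging = page.get("paging", {})
        # Assumption: the last page has no "next" link.
        if "next" not in paging:
            break
        after = paging["cursors"]["after"]


# Fake two-page result, keyed by the cursor used to request each page.
fake_pages = {
    None: {
        "data": [{"id": "p1"}, {"id": "p2"}],
        "paging": {"cursors": {"after": "c1"}, "next": "https://example.invalid/next"},
    },
    "c1": {"data": [{"id": "p3"}], "paging": {"cursors": {"after": "c2"}}},
}

all_ids = [item["id"] for item in iter_all_items(fake_pages.__getitem__)]
print(all_ids)
```

With the real API, fetch_page would wrap a call that passes the cursor along with the request; check the facebook-sdk documentation for the helpers it provides for this.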
Building further
Now let’s do some OO modeling for some of the nodes returned by the latest call to the Graph API. We use the excellent attrs package to make the code cleaner:
import attr
@attr.s
class Attachment(object):
    description = attr.ib()
    description_tags = attr.ib()
    type = attr.ib()
    url = attr.ib()
    
@attr.s
class Comment(object):
    id = attr.ib()
    like_count = attr.ib()
    user_likes = attr.ib()
    
@attr.s
class Like(object):
    id = attr.ib()
    liker_name = attr.ib()
    #liker_username = attr.ib()
@attr.s
class Post(object):
    id = attr.ib()
    description = attr.ib()
    type = attr.ib()
    caption = attr.ib()
    publication_date = attr.ib()
    is_hidden = attr.ib()
    message = attr.ib()
    message_tags = attr.ib()
    name = attr.ib()
    properties = attr.ib()
    author = attr.ib()
    source = attr.ib()
    story = attr.ib()
    status_type = attr.ib()
    attachments = attr.ib(default=attr.Factory(list))
    comments = attr.ib(default=attr.Factory(list))
    likes = attr.ib(default=attr.Factory(list))
    reactions = attr.ib(default=attr.Factory(list))
    
    @classmethod
    def from_record(cls, record):
        attachments = [
            Attachment(
                att.get('description', ''), 
                att.get('description_tags', []), 
                att.get('type', ''), 
                att.get('url', ''))
            for att in record.get('attachments', {}).get('data', {})
        ]
        
        comments = [
            Comment(
                c['id'],
                c.get('like_count', 0),
                c.get('user_likes', False))
            for c in record.get('comments', {}).get('data', {})
        ]
        
        likes = [
            Like(
                l['id'],
                l.get('name', l.get('username', '')))
            for l in record.get('likes', {}).get('data', {})
        ]
        
        return cls(
            record.get('id'),
            record.get('description'),
            record.get('type'),
            record.get('caption'),
            record.get('created_time'),
            record.get('is_hidden'),
            record.get('message', ''),
            record.get('message_tags', ''),
            record.get('name'),
            record.get('properties'),
            record.get('from'),
            record.get('source', ''),
            record.get('story', ''),
            record.get('status_type'),
            attachments,
            comments,
            likes,
            record.get('reactions', {})
        )
        
        
Now we have some record containers. We use them like this (building a Post object from the first post of a given friend):
post = Post.from_record(friends_posts[(friend_id, friend_name)][0])
And now comes the time for fun
Well, now comes the time for some analysis! There are very well-developed packages out there for that, perhaps the most famous being the nltk library for natural language analysis. But we won’t use any of them here. For now, let’s do some basic word counting with Python’s built-in batteries:
from collections import Counter
nbr_likes = len(post.likes)
word_occurrences = Counter(post.message.split())
The above snippet uses Counter to count the number of occurrences of each word in a single message, from the post we built previously.
Now let’s scale that snippet to all the messages of all our friends’ posts to get their total word occurrences:
total_word_occurences = Counter(
    word
    for posts in friends_posts.values()
    for record in posts
    for word in Post.from_record(record).message.split()
)
To rank the words from the most used to the least used, you simply call most_common on the Counter object:
total_word_occurences.most_common()
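Splitting on whitespace treats "Hello" and "hello," as different words. If that bothers you, here is a small sketch that lower-cases the text and drops punctuation before counting. The messages and the stop words are made up, and tokenize is a hypothetical helper rather than part of any library.

```python
import re
from collections import Counter

# Made-up messages standing in for post.message values.
sample_messages = ["Hello world, hello Python!", "Python is fun. The world agrees."]
stop_words = {"the", "is"}


def tokenize(text):
    """Lower-case the text and keep runs of word characters and apostrophes."""
    return re.findall(r"[\w']+", text.lower())


sample_counts = Counter(
    word
    for message in sample_messages
    for word in tokenize(message)
    if word not in stop_words
)
print(sample_counts.most_common(3))
```

The structure is the same as the Counter expression above; only the splitting step and the filter on stop words are different.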
Assembling Everything
Now we have a very basic app that can get our friends’ posts and show us the most common words used in them. I have recapitulated the entire program below for you to copy/paste as you wish (I know, I am a good guy). Be aware that all credit goes to the original authors of all the libraries I used here.
from collections import Counter
from getpass import getpass
import sqlite3
import attr
import facebook
connection = sqlite3.connect('fb_people_db.sqlite3')
PEOPLE_TABLE = 'people'
class Person(object):
    def __init__(self, name, facebook_id):
        self.name = name
        self.facebook_id = facebook_id
        
    def __repr__(self):
        return '<Person({name}, {facebook_id})>'.format(**self.__dict__)
@attr.s
class Attachment(object):
    description = attr.ib()
    description_tags = attr.ib()
    type = attr.ib()
    url = attr.ib()
    
@attr.s
class Comment(object):
    id = attr.ib()
    like_count = attr.ib()
    user_likes = attr.ib()
    
@attr.s
class Like(object):
    id = attr.ib()
    liker_name = attr.ib()
    #liker_username = attr.ib()
@attr.s
class Post(object):
    id = attr.ib()
    description = attr.ib()
    type = attr.ib()
    caption = attr.ib()
    publication_date = attr.ib()
    is_hidden = attr.ib()
    message = attr.ib()
    message_tags = attr.ib()
    name = attr.ib()
    properties = attr.ib()
    author = attr.ib()
    source = attr.ib()
    story = attr.ib()
    status_type = attr.ib()
    attachments = attr.ib(default=attr.Factory(list))
    comments = attr.ib(default=attr.Factory(list))
    likes = attr.ib(default=attr.Factory(list))
    reactions = attr.ib(default=attr.Factory(list))
    
    @classmethod
    def from_record(cls, record):
        attachments = [
            Attachment(
                att.get('description', ''), 
                att.get('description_tags', []), 
                att.get('type', ''), 
                att.get('url', ''))
            for att in record.get('attachments', {}).get('data', {})
        ]
        
        comments = [
            Comment(
                c['id'],
                c.get('like_count', 0),
                c.get('user_likes', False))
            for c in record.get('comments', {}).get('data', {})
        ]
        
        likes = [
            Like(
                l['id'],
                l.get('name', l.get('username', '')))
            for l in record.get('likes', {}).get('data', {})
        ]
        
        return cls(
            record.get('id'),
            record.get('description'),
            record.get('type'),
            record.get('caption'),
            record.get('created_time'),
            record.get('is_hidden'),
            record.get('message', ''),
            record.get('message_tags', ''),
            record.get('name'),
            record.get('properties'),
            record.get('from'),
            record.get('source', ''),
            record.get('story', ''),
            record.get('status_type'),
            attachments,
            comments,
            likes,
            record.get('reactions', {})
        )
    
def initialise_db(table, fields):
    cursor = connection.cursor()
    cursor.execute('create table if not exists {} ({})'.format(table, ','.join(fields)))
    connection.commit()
    cursor.close()
def save_people(people):
    save_in_db(PEOPLE_TABLE, [(person.name, person.facebook_id) for person in people])
    
def initialise_people_table():
    initialise_db(PEOPLE_TABLE, ['name', 'facebook_id'])
    
def save_in_db(table, objects):
    cursor = connection.cursor()
    cursor.executemany(
        'insert into {} values ({})'.format(table, ','.join('?' * len(objects[0]))),
        objects)
    connection.commit()
    cursor.close()
if __name__ == '__main__':
    import logging
    
    logging.basicConfig()
    logger = logging.getLogger('minage')
    logger.setLevel(logging.DEBUG)
    
    logger.info('Getting authentication')
    
    token = getpass('access token from graph explorer:')
    graph = facebook.GraphAPI(access_token=token, version="2.12")
    
    me = graph.get_object(id='me')
    
    logger.info('Getting some friends of: %s ...', me['name'])
    friends = graph.get_connections(id='me', connection_name='friends')
    logger.info('Initializing people database ...')
    initialise_people_table()
    people = [
        Person(friend['name'], friend['id'])
        for friend in friends['data']
    ]
    logger.info('Saving people ...')
    save_people(people)
    explorer_url = 'me/posts?fields=description,shares,type,caption,created_time,is_hidden,message,message_tags,name,properties,application,feed_targeting,from,is_popular,source,story,status_type,via,target,attachments{description,description_tags,type,url},comments{comment_count,id,like_count,user_likes},likes{name,id,username},reactions&include_hidden=true'
    # Keep only the value of the "fields" query parameter (everything up to the next "&").
    fields = explorer_url.split('fields=')[1].split('&')[0]
    
    logger.info('Retrieving all posts from friends ...')
    friends_posts = {
        (person.facebook_id, person.name): graph.get_connections(
            id=person.facebook_id, connection_name='posts', fields=fields, include_hidden=True)['data']
        for person in people
    }
    # Words to ignore, built once as a set instead of re-splitting the string for every word.
    excluded = set('la le les un des de the at une à et du a que pour dans I ! est sur'.split())
    total_word_occurences = Counter(
        word
        for posts in friends_posts.values()
        for record in posts
        for word in Post.from_record(record).message.split()
        if word not in excluded
    )
    print("These are the top words used by %s's friends:" % me['name'])
    print(total_word_occurences.most_common())
That’s all folks!