Getting Reddit comments using PRAW

In this section, you'll learn how to use the praw module to extract comments from a given Reddit thread. You'll also see how to fetch only the top-level comments.

From pypi: praw:

PRAW, an acronym for "Python Reddit API Wrapper", is a Python package that allows for simple access to Reddit's API. PRAW aims to be easy to use and internally follows all of Reddit's API rules. With PRAW there's no need to introduce sleep calls in your code. Give your client an appropriate user agent and you're set.

From wikipedia: API:

An application programming interface (API) is a connection between computers or between computer programs. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build such a connection or interface is called an API specification. A computer system that meets this standard is said to implement or expose an API. The term API may refer either to the specification or to the implementation.

Installation

You can install praw using the following commands:

# virtual environment
$ pip install praw

# normal environment
# use py instead of python3.13 for Windows
$ python3.13 -m pip install --user praw

I'd highly recommend using virtual environments to manage projects that use third-party modules. See the Installing modules and Virtual environments chapter from my Python introduction ebook if you are not familiar with installing modules.
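To confirm that the installation worked in the environment you intend to use, a quick check along these lines may help. This is only a minimal sketch using the standard library; the installed_version helper is a hypothetical name introduced here for illustration, not something provided by praw.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The package is not installed in the current environment
        return None


# Prints a version string if praw is installed here, otherwise None
print(installed_version("praw"))
```

If None is printed, the interpreter running the script is probably not the one you installed praw into, which is a common issue when mixing virtual and normal environments.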

Reddit app

First, log in to your Reddit account. Next, visit https://www.reddit.com/prefs/apps/ and click the "are you a developer? create an app..." button.

For this project, using the script option is enough. Two of the fields are mandatory:

name
redirect uri

The redirect uri isn't needed for this particular project though. As mentioned in Reddit's OAuth2 Quick Start Example guide, http://www.example.com/unused/redirect/uri can be used instead.

After filling in the details, you'll get a screen with details about the app, which you can update if needed. If applicable, you'll also get an email from Reddit.

Extracting comments

This section will give you an example of extracting comments from a particular discussion thread on Reddit. The code used is based on the Comment Extraction and Parsing tutorial from the documentation, which also notes that:

If you are only analyzing public comments, entering a username and password is optional.

The sample discussion thread used here is from the /r/booksuggestions subreddit. You can use this URL in the code or just the nsm98m id.

From the app you created in the previous section, you need to copy the client_id and client_secret details. You'll find the id at the top of the app details (usually 14 characters) and the secret field is clearly marked. With those details collected, here's how you can get all the comments:

>>> import praw

>>> reddit = praw.Reddit(
...     user_agent="Get Comments by /u/name",  # change 'name' to your username
...     client_id="XXX",                       # change 'XXX' to your id
...     client_secret="XXX",                   # change 'XXX' to your secret
... )

# use the url keyword argument if you want to pass a link instead of id
>>> submission = reddit.submission(id='nsm98m')
>>> submission.comments.replace_more(limit=None)
[]
# all comments are saved in a list here for illustration purposes
>>> comments = submission.comments.list()
# content of the first comment
>>> print(comments[0].body)
The Murder of Roger Ackroyd by Agatha Christie still has the
best twist I’ve ever read.
# fourth comment and so on
>>> comments[3].body
'The Silent Patient'

Use submission.comments instead of submission.comments.list() to fetch only the top-level comments.
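As a rough illustration of the difference, the sketch below collects the text of only the top-level comments. The top_level_bodies function is a hypothetical helper written for this example. It assumes it is given a submission object like the one created earlier, and it uses replace_more(limit=0), which discards the "load more comments" placeholders instead of fetching them, so some top-level comments may be skipped on large threads. Pass limit=None as in the earlier example if you need everything.

```python
def top_level_bodies(submission):
    """Return the text of each top-level comment of a PRAW submission."""
    # limit=0 removes the "load more comments" placeholders without
    # fetching them, so every remaining item has a body attribute
    submission.comments.replace_more(limit=0)
    # Iterating over submission.comments visits top-level comments only
    return [comment.body for comment in submission.comments]
```

With the reddit object from earlier, top_level_bodies(reddit.submission(id='nsm98m')) would give a list of strings, one per top-level comment.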

API secrets

As mentioned in Reddit's OAuth2 Quick Start Example guide:

You should NEVER post your client secret (or your reddit password) in public. If you create a bot, you should take steps to ensure that the bot's password and the app's client secret are secured against digital theft.

One way to avoid accidentally revealing API secrets online (for example, when publishing your code on GitHub) is to store them in a local secrets file instead of in the code. Add the name of that file to .gitignore so that it won't get committed to the repo.
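Here's one possible way to do this, shown as a minimal sketch and not the only approach. It assumes a JSON file named secrets.json (a hypothetical filename chosen for this example) that contains the client_id, client_secret and user_agent keys. The load_secrets function and the REQUIRED_KEYS tuple are likewise names made up for illustration.

```python
import json
from pathlib import Path

# Keys expected in the secrets file (an assumption made for this sketch)
REQUIRED_KEYS = ("client_id", "client_secret", "user_agent")


def load_secrets(path="secrets.json"):
    """Read API credentials from a local JSON file kept out of version control."""
    secrets = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early with a clear message if a credential is missing
    missing = [key for key in REQUIRED_KEYS if key not in secrets]
    if missing:
        raise KeyError(f"missing keys in {path}: {', '.join(missing)}")
    return secrets
```

With this in place, the credentials can be passed as keyword arguments using praw.Reddit(**load_secrets()), and a line containing secrets.json in the .gitignore file keeps the file out of the repo.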
