Getting Reddit comments using PRAW
In this section, you'll learn to use praw for extracting comments from a given Reddit thread. You'll also see how to fetch only the top-level comments.
From pypi: praw:
PRAW, an acronym for "Python Reddit API Wrapper", is a Python package that allows for simple access to Reddit's API. PRAW aims to be easy to use and internally follows all of Reddit's API rules. With PRAW there's no need to introduce sleep calls in your code. Give your client an appropriate user agent and you're set.
From wikipedia: API:
In computing, an application programming interface (API) is an interface that defines interactions between multiple software applications or mixed hardware-software intermediaries. It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow, etc. It can also provide extension mechanisms so that users can extend existing functionality in various ways and to varying degrees. An API can be entirely custom, specific to a component, or designed based on an industry-standard to ensure interoperability. Through information hiding, APIs enable modular programming, allowing users to use the interface independently of the implementation.
Installation
You can install praw using the following commands:
# virtual environment
$ pip install praw
# normal environment
# use py instead of python3.9 for Windows
$ python3.9 -m pip install --user praw
I'd highly recommend using virtual environments to manage projects that use third-party modules. See the Installing modules and Virtual environments chapter from my Python introduction ebook if you are not familiar with installing modules.
Reddit app
First, log in to your Reddit account. Next, visit https://www.reddit.com/prefs/apps/ and click the "are you a developer? create an app..." button.
For this project, using the script option is enough. Two of the fields are mandatory:
- name
- redirect uri
The redirect uri isn't needed for this particular project though. As mentioned in Reddit's OAuth2 Quick Start Example guide, http://www.example.com/unused/redirect/uri can be used instead.
After filling in the details, you'll get a screen with details about the app, which you can update if needed. If applicable, you'll also get an email from Reddit.
Extracting comments
This section gives an example of extracting comments from a particular discussion thread on Reddit. The code is based on the Comment Extraction and Parsing tutorial from the PRAW documentation, which also notes that:
If you are only analyzing public comments, entering a username and password is optional.
The sample discussion thread used here is from the /r/booksuggestions subreddit. You can use this URL in the code or just the nsm98m id.
From the app you created in the previous section, you need to copy the client_id and client_secret details. You'll find the id at the top of the app details (usually 14 characters) and the secret field is clearly marked. With those details collected, here's how you can get all the comments:
>>> import praw
>>> reddit = praw.Reddit(
...     user_agent="Get Comments by /u/name",  # change 'name' to your username
...     client_id="XXX",  # change 'XXX' to your id
...     client_secret="XXX",  # change 'XXX' to your secret
... )
# use url keyword argument if you want to pass a link instead of id
>>> submission = reddit.submission(id='nsm98m')
>>> submission.comments.replace_more(limit=None)
[]
# only first comment output is shown here
>>> for comment in submission.comments.list():
...     print(comment.body + '\n')
...
The Murder of Roger Ackroyd by Agatha Christie still has the
best twist I’ve ever read.
Use submission.comments instead of submission.comments.list() in the above for loop to fetch only the top-level comments.
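As a minimal sketch of that difference, the helper below collects comment texts either from the top level only or from the whole flattened tree. The function name comment_bodies and its top_level_only parameter are made up for illustration. It relies only on submission.comments, submission.comments.list() and comment.body as used above, and assumes replace_more(limit=None) has already been called on the submission.

```python
def comment_bodies(submission, top_level_only=False):
    """Return comment texts from a PRAW submission (hypothetical helper).

    Assumes submission.comments.replace_more(limit=None) was already called,
    so that no unexpanded "load more comments" placeholders remain.
    """
    # Iterating over submission.comments yields only the top-level comments,
    # whereas submission.comments.list() flattens the whole comment tree.
    if top_level_only:
        comments = submission.comments
    else:
        comments = submission.comments.list()
    return [comment.body for comment in comments]
```

With the submission object from the earlier session, comment_bodies(submission, top_level_only=True) would return only the direct replies to the post, while comment_bodies(submission) would also include nested replies.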
API secrets
As mentioned in Reddit's OAuth2 Quick Start Example guide:
You should NEVER post your client secret (or your reddit password) in public. If you create a bot, you should take steps to ensure that the bot's password and the app's client secret are secured against digital theft.
One way to avoid accidentally revealing API secrets online (for example, when publishing your code on GitHub) is to store them in a local secrets file. Add the name of that file to .gitignore so that it won't get committed to the GitHub repo.
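One possible way to do this, sketched below, is to keep the credentials in a small JSON file and read them at startup. The filename reddit_secrets.json, the key names and the load_secrets function are assumptions made for this example, not something prescribed by Reddit or PRAW.

```python
import json
from pathlib import Path


def load_secrets(path="reddit_secrets.json"):
    """Read client_id and client_secret from a local JSON file.

    The file name and layout are assumptions for this sketch, e.g.:
    {"client_id": "XXX", "client_secret": "XXX"}
    """
    secrets = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early with a clear message if a required key is missing
    missing = {"client_id", "client_secret"} - set(secrets)
    if missing:
        raise KeyError(f"missing keys in {path}: {sorted(missing)}")
    return secrets
```

The returned values could then be passed to praw.Reddit() in place of the hard-coded strings, for example client_id=secrets["client_id"], with reddit_secrets.json listed in .gitignore.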