Keeping Up with ArXiv using Sxolar
Overview & Motivation
Like many researchers, I use ArXiv to keep up with the latest research in my field. However, I find
it difficult to keep track of all the new papers that are posted each day. I have explored many of the existing tools
for tracking ArXiv, but I have not found one that meets my simple requirements. All I wanted was a tool that would let
me configure a set of queries, and send me periodic emails with the new papers that match those queries. Having found no
such tool, I decided to build my own. This post introduces sxolar, a Python library that
allows you to search ArXiv, and shows how to use it to keep up with the latest research in your field of interest.
Before getting into the simple but robust features of sxolar, let’s take a look at existing tools for tracking ArXiv.
Existing Tools for Tracking ArXiv
There are several existing tools for tracking ArXiv, but they all have their limitations. Some of the most popular tools include:
- ArXiv Email Alerts: ArXiv provides an email alert service that allows you to subscribe to new papers in specific categories. However, there is no control over the other elements of the search API, including authors, keywords, etc.
- ArXiv Sanity Preserver: A web-based tool that allows you to search ArXiv and get daily email alerts for new papers in your field of interest. However, it does not allow you to configure custom queries. Instead, it uses a machine learning model to recommend papers based on your reading history.
- iArXiv: Similar to ArXiv Sanity Preserver, iArXiv is a web-based tool that allows you to search ArXiv and get daily email alerts for new papers in your field of interest. However, it does not allow you to configure custom queries. Instead, it uses a machine learning model to recommend papers based on your reading history.
- ArXiv RSS Feeds: ArXiv provides RSS feeds for each category, which you can subscribe to in your favorite RSS reader. However, this requires you to manually check the feeds each day.
- ArXiv API: ArXiv provides an API that allows you to search for papers and get metadata about them. However, this requires you to write code to interact with the API.
Introducing Sxolar
Sxolar is a Python library that allows you to search ArXiv and get daily email alerts for new
papers in your field of interest. It is designed to be simple and easy to use, with a focus on customizability and
flexibility. It offers a command-line interface, and is easy to use with GitHub Actions for automated daily email
alerts. Similar to existing wrappers, sxolar uses the ArXiv API to search for papers, and offers the ability to search
by authors, keywords, categories, and more. What makes sxolar unique? sxolar can:
- Search by all fields (title, abstract, authors, etc.) as well as arbitrarily complex logical expressions of these fields (e.g., "title:quantum AND abstract:gravity").
- Build complex queries with syntactic sugar (e.g., (Title("quantum") & Abstract("gravity")).search()).
- Persist queries to simple configuration files for easy reuse and modification.
- Send email alerts with summaries of new papers that match your queries.
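To make the "syntactic sugar" idea concrete, here is a minimal sketch of how operator overloading can turn Python expressions into a query string. It is not sxolar's actual implementation: the Query class and the _field helper are hypothetical, the field prefixes simply follow the example string above, and the sketch only builds the string (it does not call the ArXiv API the way sxolar's .search() does).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A piece of a search expression, such as 'title:quantum' (hypothetical class)."""

    text: str

    def __and__(self, other: "Query") -> "Query":
        # Parenthesize so nested expressions keep their grouping.
        return Query(f"({self.text} AND {other.text})")

    def __or__(self, other: "Query") -> "Query":
        return Query(f"({self.text} OR {other.text})")

    def __str__(self) -> str:
        return self.text


def _field(prefix: str, term: str) -> Query:
    # Quote multi-word terms so they stay attached to their field prefix.
    value = f'"{term}"' if " " in term else term
    return Query(f"{prefix}:{value}")


def Title(term: str) -> Query:
    return _field("title", term)


def Abstract(term: str) -> Query:
    return _field("abstract", term)


def Author(name: str) -> Query:
    return _field("author", name)


if __name__ == "__main__":
    query = (Title("quantum") & Abstract("gravity")) | Author("Jane Doe")
    print(query)  # ((title:quantum AND abstract:gravity) OR author:"Jane Doe")
```

The appeal of this design is that Python's own operator precedence and parentheses do the grouping, so a long boolean query can be assembled from small named pieces instead of hand-edited strings.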
This library is relatively new, and I am actively developing it. If you have any feature requests or bug reports, please feel free to open an issue on the GitHub repository. I hope you find this tool useful; it has certainly made my life easier!
Setting Up Periodic Email Alerts with Sxolar
This post is focused on using sxolar to set up a periodic email digest of ArXiv papers in your field of
interest. To see more detailed documentation on the library, please visit the official documentation [1]. For the
purpose of this post, we’ll use my field of study (gravitational waves) to determine a sample query: searching for
papers released by the LIGO or Virgo scientific collaborations. The package also has a tutorial for setting up a
periodic digest [2]. This post will be divided into the following steps:
- Configuration of the queries and summary
- Setup of Google Mail Access
- Scheduling with GitHub Actions
To get started with sxolar, you will need to install the library. You can do this using pip:
pip install sxolar
Step 1: Configure the Queries and Summary
A Summary is a collection of Sections, each of which represents a query with related search parameters (such as time
period). The summary info can be persisted to a config file (e.g., summary.yaml).
The configuration file specifies one summary “LIGO Virgo Summary” with two sections: “LIGO: Recent 2 Weeks” and “Virgo: Recent 2 Months”. Each section specifies the authors, search terms, and trailing time period for the search query. The “trailing” field specifies the number of days or months to search back from the current date.
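As a rough illustration of what a trailing window means, the sketch below mirrors the two sections in a plain Python dictionary and computes the cutoff date for each. The key names (name, sections, trailing, num, unit) are hypothetical and are not claimed to match sxolar's config schema, and treating a month as 30 days is a simplifying assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory mirror of the summary described above. The key names
# are illustrative only and may differ from sxolar's real config schema.
summary = {
    "name": "LIGO Virgo Summary",
    "sections": [
        {"name": "LIGO: Recent 2 Weeks", "trailing": {"num": 14, "unit": "days"}},
        {"name": "Virgo: Recent 2 Months", "trailing": {"num": 2, "unit": "months"}},
    ],
}

# Assumption: a month is approximated as 30 days to keep the arithmetic simple.
DAYS_PER_UNIT = {"days": 1, "months": 30}


def trailing_cutoff(now: datetime, num: int, unit: str) -> datetime:
    """Return the earliest publication time that still falls inside the window."""
    return now - timedelta(days=num * DAYS_PER_UNIT[unit])


def is_recent(published: datetime, now: datetime, num: int, unit: str) -> bool:
    """True if a paper published at `published` belongs in the section."""
    return published >= trailing_cutoff(now, num, unit)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for section in summary["sections"]:
        window = section["trailing"]
        cutoff = trailing_cutoff(now, window["num"], window["unit"])
        print(f'{section["name"]}: papers published since {cutoff:%Y-%m-%d}')
```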
Testing the Configuration
The summary can be generated either using Python code or the command-line interface. The command line is relatively simple to use for testing a summary. For more detailed testing, the Python library can be used directly to generate a summary from the config file.
Step 2: Set Up Google Mail Access
To send email alerts, sxolar uses the smtplib library to send emails through a Gmail account. This is made possible
(and relatively secure) by using an app password. The library documentation includes instructions for setting up an app
password. To set up a Gmail account for sending emails, follow these steps:
- Go to App Passwords.
- Enter an app name, e.g. “SampleApp”.
- Click “Create”.
- Copy the generated app password.
The generated app password is a 16-character code that you will use to authenticate your application.
App password: "abcd efgh ijkl mnop"
Specifically, we will store the app password as a secret in the GitHub repository, so that the GitHub Actions workflow can read it and send emails on your behalf.
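For readers curious about what sending mail with an app password looks like at the smtplib level, here is a minimal sketch using only the standard library. It is not sxolar's own code: the function names and the GMAIL_APP_PASSWORD environment variable are hypothetical, and the addresses are the placeholders used elsewhere in this post.

```python
import os
import smtplib
from email.message import EmailMessage


def build_digest_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg: EmailMessage, sender: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP server over SSL (port 465)."""
    # Assumption: the spaces in the displayed app password are only visual
    # grouping, so they are removed before logging in.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(sender, app_password.replace(" ", ""))
        server.send_message(msg)


if __name__ == "__main__":
    # Hypothetical variable name; read the secret from the environment rather
    # than hard-coding it in the script.
    password = os.environ.get("GMAIL_APP_PASSWORD")
    message = build_digest_email(
        "myemail@gmail.com", "myrecipient@gmail.com", "Test digest", "Hello from the sketch."
    )
    if password:
        send_via_gmail(message, "myemail@gmail.com", password)
    else:
        print("GMAIL_APP_PASSWORD is not set; skipping send.")
```

Reading the password from the environment keeps it out of the repository, which is the same reason the scheduled job reads it from a repository secret.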
Step 3: Scheduling with GitHub Actions
To schedule the email alerts, we will use GitHub Actions. A sample workflow file (.github/workflows/sxolar.yml) runs the sxolar summary command weekly on Sunday at 8am EST (1pm UTC).
For a sample repository, see sxolar-template-run.
The workflow uses the config file configs/gwaves.yml to generate the summary, and sends the email to myrecipient@gmail.com from myemail@gmail.com. The email will have the subject “Sxolar Weekly Digest: GWwaves”. The app password is stored in the GitHub repository secrets as SXOLARGMAILAPPPASSWORD.
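GitHub Actions evaluates schedule expressions in UTC using POSIX cron syntax, so the local time has to be converted by hand. The sketch below shows that conversion for "Sunday at 8am EST"; the helper name weekly_cron_utc is hypothetical, and EST is modeled as a fixed UTC-5 offset, which means the job fires at 9am local time while daylight saving is in effect.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset EST (UTC-5); this deliberately ignores daylight saving time.
EST = timezone(timedelta(hours=-5), "EST")


def weekly_cron_utc(hour_local: int, minute: int, tz: timezone, cron_weekday: int) -> str:
    """Build a weekly cron expression in UTC.

    cron_weekday uses cron numbering: 0 = Sunday ... 6 = Saturday.
    """
    # 2024-01-07 is a Sunday, so adding cron_weekday days lands on the right day.
    local = datetime(2024, 1, 7, hour_local, minute, tzinfo=tz) + timedelta(days=cron_weekday)
    utc = local.astimezone(timezone.utc)
    # Python's weekday() is Monday=0 ... Sunday=6; cron is Sunday=0 ... Saturday=6.
    utc_cron_weekday = (utc.weekday() + 1) % 7
    return f"{utc.minute} {utc.hour} * * {utc_cron_weekday}"


if __name__ == "__main__":
    print(weekly_cron_utc(8, 0, EST, 0))  # 0 13 * * 0
```

The weekday is recomputed after the conversion because a late-evening local time can roll over into the next day in UTC.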
Conclusion
In this post, we introduced sxolar, a Python library that allows you to search ArXiv and get daily email alerts for
new papers in your field of interest. We discussed the limitations of existing tools for tracking ArXiv, and showed how
sxolar addresses these limitations. We walked through the process of setting up periodic email alerts with sxolar,
including configuring the queries and summary, setting up Google Mail access, and scheduling with GitHub Actions. I hope
you find this tool useful for keeping up with the latest research in your field of interest. If you have any feature
requests or bug reports, please feel free to open an issue on the GitHub repository. Happy reading!
References
[1] Sxolar: Scholars tools for ArXiv: Documentation