Converting Bandcamp Email Updates to an RSS Feed

I love music. I like to closely track bands and labels to learn about upcoming releases. Unfortunately, the type of music I listen to tends to stay more underground, so there isn’t any mainstream coverage of it, which just means I get to do all the work myself.

Fortunately, a majority of the bands and labels I am interested in are on bandcamp. Bandcamp makes it pretty easy to subscribe to artists and labels and get updates from them directly. Unfortunately, this all comes through email. While it is easy enough to filter to avoid cluttering my inbox, email isn’t my desired place for this information.

I run a private freshrss instance and curate several hundred RSS feeds. It syncs nicely between my various devices, and is just a great way to consume. My goal is to get all my music updates from bandcamp here, so that’d they be in the same place as where I get music reviews and follow some other underground music blogs.

Bandcamp doesn’t provide any rss feeds. There are some options, like using RSSHub and its bandcamp addon. However, many of the updates that come through email are private, and don’t show up in an artists’ or labels’ feed.

I decided, I’d go a different route, and just convert the emails I was getting into an RSS feed directly.

Step 1 – imapfilter

I already use imapfilter heavily for filtering and tagging my emails. I decided I could use it to pipe the bandcamp emails to a custom script that would convert it into an rss entry. Here is the relevant section:

function filter_bandcamp(messages)
   -- Grab anything from Bandcamp where the subject
   --  starts with "New". This indicates a message
   --  or updated rather than a receipt.
   -- I am taking these "mailing list" bandcamps
   --  messages and converting them to an rss feed.
   results = messages:contain_from('noreply@bandcamp.com') *
             messages:contain_subject('New')

   for _, mesg in ipairs(results) do
      mbox, uid = table.unpack(mesg)
      text = mbox[uid]:fetch_message()
      pipe_to('/opt/email2rss/email2rss', text)
   end

   -- delete the bandcamp messages
   results:delete_messages()

   return messages;
end

You’ll notice I don’t want all messages from “noreply@bandcamp.com”, since that would also include things like purchases and system notifications.

Step 2 – email2rss script

The email2rss script is a python program. I am using feedgen to generate the feed, and built in python libraries to parse the emails.

One issue you’ll notice immediately, is that this script runs once for every email message. For RSS, we want to create a continuous feed with all the entries. This means we have to insert the new entry into an existing file and have some persistence. The quickest/dirtiest method was to use python’s builtin pickle to serialize and deserialize the whole state. That way, I can quickly load the previous state, create a new entry, write out the rss file, then serialize and save the state to disk.

Here is the program in its entirety:

#!/usr/bin/env python3
#

import sys
import email.parser
import datetime
import pickle
import re
import os
from feedgen.feed import FeedGenerator

DATADIR=os.path.join('/', 'opt', 'email2rss', 'data')

def default_feed():
  fg = FeedGenerator()
  fg.id('https://feeds.line72.net/')
  fg.title('Bandcamp Updates')
  fg.description('Bandcamp Updates')
  fg.link(href = 'https://feeds.line72.net')
  fg.language('en')

  return fg

def get_feedgenerator():
  try:
    with open(os.path.join(DATADIR, 'feed.obj'), 'rb') as f:
      return pickle.load(f)
  except IOError:
    return default_feed()

def save_feedgenerator(fg):
  with open(os.path.join(DATADIR, 'feed.obj'), 'wb') as f:
    pickle.dump(fg, f)

def add_item(fg, msg, content):
  msg_id_header = msg.get('Message-ID')
  msg_id = re.match(r'^\<(.*?)@.*$', msg_id_header).group(1)
  sender = msg.get('From')
  subject = msg.get('Subject')

  fe = fg.add_entry()
  fe.id(f'https://feeds.line72.net/{msg_id}')
  fe.title(subject)
  fe.author(name = 'Bandcamp', email = 'noreply@bandcamp.com')
  fe.pubDate(datetime.datetime.utcnow().astimezone())
  fe.description(subject)
  fe.content(content, type = 'CDATA')

def go():
  fg = get_feedgenerator()

  parser = email.parser.Parser()
  msg = parser.parse(sys.stdin)
  for part in msg.walk():
    if part.get_content_type() == 'text/html':
      add_item(fg, msg, part.get_payload())
      break

  fg.rss_file(os.path.join(DATADIR, 'feed.rss'), pretty = True)

  save_feedgenerator(fg)

if __name__ == '__main__':
  go()

Step 3 – Host the Feed

I just have a simple nginx server running that hosts the feed.rss. Then I simply add this new feed to my freshrss instance.

Future Work

There are still some improvements that could be made to this. The history is going to grow out of control at some point, so I really should probably go through and delete old entries, maybe to keep a history of 100 or 500 entries.

The other possible issue (I haven’t run into it yet), is that the email2rss could be run simultaneously. If that is the case, then one entry will likely be lost. I really should probably have a lock around the feed.obj to keep a second instance from doing anything until the first was written out the new state.