Every year I receive a PDF document from our accountants containing tax forms (P11D) for our employees. Splitting that up and emailing it on by hand is tedious, and I live in fear of sending the wrong document to someone, so I’ve automated the process.
The key steps are to split the original file into one for each individual, rename those individual files after the relevant employee, and then email the file to that person.
Splitting uses a Python script to divide the original PDF into two-page chunks, as each individual form occupies two pages. I use PyPDF2 to do this; it can be installed using pip.
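```python
#!/usr/bin/env python3
# Uses the pre-3.0 PyPDF2 names (PdfFileReader, PdfFileWriter, getPage, ...);
# PyPDF2 3.0 replaced them with PdfReader, PdfWriter and snake_case methods.
from PyPDF2 import PdfFileWriter, PdfFileReader


def main():
    # Keep the source file open while pages are copied out of it.
    with open("p11d-2017.pdf", "rb") as source:
        inputpdf = PdfFileReader(source)
        # Each form occupies two consecutive pages.
        for i in range(0, inputpdf.numPages, 2):
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            output.addPage(inputpdf.getPage(i + 1))
            # Output files are named after the index of their first page.
            with open("%s-p11d.pdf" % i, "wb") as outputStream:
                output.write(outputStream)


if __name__ == '__main__':
    main()
```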
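Because the script takes pages in pairs, an odd page count would make the final `getPage(i+1)` call fail partway through. A minimal sketch of the grouping logic on its own, using a hypothetical helper `page_chunks` (not part of PyPDF2), shows one way to reject such a file before anything is written:

```python
def page_chunks(num_pages, pages_per_form=2):
    """Return a list of page-index tuples, one tuple per form.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    if num_pages % pages_per_form != 0:
        raise ValueError(
            "%d pages is not a multiple of %d" % (num_pages, pages_per_form)
        )
    return [
        tuple(range(start, start + pages_per_form))
        for start in range(0, num_pages, pages_per_form)
    ]


print(page_chunks(6))  # [(0, 1), (2, 3), (4, 5)]
```

Each tuple lists the zero-based page indices that belong to one employee's form.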
I had originally intended to name the files after the individual concerned, finding the name with the extractText method of PyPDF2, but this unfortunately fails to find the text in our PDFs and is documented as being unreliable. To resolve this I use a second stage, based on pdftotext. Using this I’ve written a shell script, rename.sh, which takes the name of a file to rename, extracts the text and searches it for the name of the relevant employee. Files are then copied to a new directory and named accordingly.
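```bash
#!/usr/bin/env bash
# Usage: rename.sh FILE
file=$1
# Take the text after "Name:" and let xargs trim the surrounding whitespace.
name=$(pdftotext "$file" - | grep Name: | cut -f2 -d: | xargs)
# Quote the paths so names containing spaces survive.
cp "$file" "$name.pdf"
```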
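The same lookup can be sketched in Python for anyone who would rather keep the whole pipeline in one language. This is an illustration, not a replacement for the script above: `extract_name` and `pdf_text` are hypothetical helpers, the sample text is made up, and the code assumes the form contains a line with a `Name:` label, just as the `grep`/`cut` pipeline does. Unlike the shell version, it stops at the first matching line.

```python
import subprocess


def extract_name(text):
    """Return the second colon-separated field of the first line
    containing 'Name:', stripped of whitespace, or None if absent."""
    for line in text.splitlines():
        if "Name:" in line:
            # Mirrors `cut -f2 -d:` followed by whitespace trimming.
            return line.split(":")[1].strip()
    return None


def pdf_text(path):
    """Run pdftotext on path and return its output.

    Requires the pdftotext binary to be on PATH.
    """
    result = subprocess.run(
        ["pdftotext", path, "-"], check=True, capture_output=True, text=True
    )
    return result.stdout


# Made-up sample text standing in for pdftotext output.
sample = "P11D 2017\nName: Fred Bloggs\nOther details\n"
print(extract_name(sample))  # Fred Bloggs
```

Returning `None` when no name is found makes it easy to stop before copying a file to an empty name.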
All that’s then left to do is to email the files to their owners. The file-to-email mapping is critical: you don’t want to email the wrong file to someone. Copious dry runs that email only me, followed by a dry run that emails the recipients without actually attaching the file, help to provide reassurance.
```python
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication

employees = {
    "Fred Bloggs": "fred@example.com",
    "Joe Smith": "joe@example.com",
}

me = 'sender@example.com'
# This needs to be an application specific password from gmail as we have 2FA enabled.
pwd = "elided"

s = smtplib.SMTP_SSL('smtp.gmail.com', 465)
s.ehlo()
s.login(me, pwd)

p11ds = employees.keys()
for p11d in p11ds:
    # Create the container (outer) email message.
    msg = MIMEMultipart()
    msg['Subject'] = 'P11D 2017'
    body = MIMEText(" Here's your P11D for 2017. No more excuses for not doing your tax return!\n\nGiles")
    msg.attach(body)

    # Look up the recipient and attach the PDF named after them.
    you = employees[p11d]
    p11dName = "%s.pdf" % p11d
    with open(p11dName, 'rb') as fp:
        pdf = MIMEApplication(fp.read())
    pdf.add_header('Content-Disposition', 'attachment', filename=p11dName)
    msg.attach(pdf)

    # This is the critical line! Drop for dry runs
    s.sendmail(me, you, msg.as_string())

s.quit()
```
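The two kinds of dry run can be made harder to get wrong by deciding the recipient and the attachment in one place. The sketch below is one possible arrangement, not the script as originally run: `build_message` and its `redirect_to` and `pdf_bytes` parameters are hypothetical names, and the body text and placeholder bytes are invented for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recipient, name, pdf_bytes=None, redirect_to=None):
    """Build one P11D email and return (address to send to, message).

    redirect_to: if set, the message is addressed there instead of to
                 the real recipient (dry run that emails only the sender).
    pdf_bytes:   if None, no attachment is added (dry run that emails
                 the recipient without the file).
    """
    to_addr = redirect_to or recipient
    msg = MIMEMultipart()
    msg["Subject"] = "P11D 2017"
    msg["From"] = sender
    msg["To"] = to_addr
    # Naming the intended recipient in the body makes a redirected
    # dry run easy to check by eye.
    msg.attach(MIMEText("P11D for %s (intended for %s)" % (name, recipient)))
    if pdf_bytes is not None:
        pdf = MIMEApplication(pdf_bytes, _subtype="pdf")
        pdf.add_header("Content-Disposition", "attachment",
                       filename="%s.pdf" % name)
        msg.attach(pdf)
    return to_addr, msg


# Dry run: the attachment is included but the mail goes to the sender.
to_addr, msg = build_message(
    "sender@example.com", "fred@example.com", "Fred Bloggs",
    pdf_bytes=b"%PDF-1.4 placeholder", redirect_to="sender@example.com",
)
print(to_addr, len(msg.get_payload()))
```

The returned address is what would be passed to `sendmail`, so the real send differs from a dry run only in the arguments given to `build_message`.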
As I’ve mentioned before, I use BBDB to maintain a simple employee database. This allows me to grab the name and email address data in the employees dictionary from BBDB using a custom record layout.
```elisp
(add-to-list 'bbdb-layout-alist '(short-email
                                  (order mail)
                                  (primary . t)
                                  (toggle . t)))

(defun bbdb-display-record-short-email (record layout fields)
  (let ((copy (copy-sequence record)))
    (bbdb-record-set-field copy 'organization '(""))
    (bbdb-display-record-one-line copy
                                  layout
                                  fields)))
```
There’s a bug in the BBDB 3.1.2 documentation for bbdb-layout-alist. It claims that “When you add a new layout FOO, you can write a corresponding layout function `bbdb-display-record-layout-FOO’”. Actually the corresponding function should be bbdb-display-record-FOO. Most of the hard work of my custom display format is delegated to the built-in bbdb-display-record-one-line, but I erase the organization field from the record passed, as there doesn’t seem to be a more elegant way of preventing it from being displayed.
Pressing *t in the *BBDB* buffer then toggles through the available layouts of all displayed records. Once short-email is displayed, the buffer can be copied, and a simple edit then produces the required format for the employees structure.
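That final edit could also be scripted. A rough sketch follows, assuming each copied line holds a name followed by whitespace and then a single email address; the exact output of the one-line layout may differ, so treat that format as an assumption. `parse_bbdb_lines` is a hypothetical helper.

```python
def parse_bbdb_lines(text):
    """Map names to email addresses from 'Name   email' lines.

    Assumes the last whitespace-separated token on each line is the
    email address and everything before it is the name.
    """
    employees = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or "@" not in parts[-1]:
            continue  # skip blank or unrecognised lines
        employees[" ".join(parts[:-1])] = parts[-1]
    return employees


# Made-up sample standing in for text copied from the *BBDB* buffer.
copied = """\
Fred Bloggs        fred@example.com
Joe Smith          joe@example.com
"""
print(parse_bbdb_lines(copied))
```

The resulting dictionary has the same shape as the employees structure used by the mailing script.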