Automatically downloading emails and extracting their attachments
I have an all-in-one printer that's also a scanner - specifically the Epson Ecotank 4750 (though annoyingly the automated document feeder doesn't support duplex). While it's a great printer (very eco-friendly, and the inks last for ages!), my biggest frustration with it is that it doesn't scan directly to an SMB file share (i.e. a Windows file share). It does support SANE though, which allows you to use it through a computer.
This is ok, but the ability to scan directly from the device itself without needing to use a computer was very convenient, so I set out to remedy this. The printer does have a cloud feature they call "Epson Connect", which allows one to upload to various cloud services such as Google Drive and Box, but I don't want to upload potentially sensitive data to such services.
Fortunately, there's a solution at hand - email! The printer in question also supports scanning to an email address. Once the scanning process is complete, it sends an email to the preconfigured address with the scanned page(s) attached. It's been far too long since my last post about email too, so let's do something about that.
Logging in to my email account just to pick up a scan is clunky and annoying though, so I decided to automate the process to resolve the issue. The plan is as follows:
- Obtain a fresh email address
- Use IMAP IDLE to instantly download emails
- Extract attachments and save them to the output directory
- Discard the email - both locally and remotely
As some readers may be aware, I run my own email server - hence the reason why I wrote this post about email previously, so I reconfigured it to add a new email address. Many other free providers exist out there too - just make sure you don't use an account you might want to use for anything else, since our script will eat any emails sent to it.
Steps 2, 3, and 4 there took some research and fiddling about, but in the end I cooked up a shell script solution that uses fetchmail, procmail (which is apparently unmaintained, so I should consider looking for alternatives), inotifywait, and munpack. I've also packaged it into a Docker container, which I'll talk about later in this post.
To illustrate how all of these fit together, let's use a diagram:
fetchmail uses IMAP IDLE to hold a connection open to the email server. When it receives notification of a new email, it instantly downloads it and spawns a new instance of procmail to handle it.
procmail writes the email to a temporary directory structure, which a separate script is watching with inotifywait. As soon as procmail finishes writing the new email to disk, inotifywait triggers and the email is unpacked with munpack. Any attachments found are moved to the output directory, and the original email discarded.
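In text form (in case the diagram above doesn't display for you), the flow looks something like this:

```
fetchmail (IMAP IDLE) ──▶ procmail ──▶ /tmp/maildir/Mail/new/
                                              │ create event
                                        inotifywait
                                              │
                                          munpack ──▶ output directory
```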
With this in mind, let's start drafting up a script. The first order of the day is configuring fetchmail. This is done using a .fetchmailrc file - I came up with this:
poll bobsrockets.com protocol IMAP port 993
user "bob@bobsrockets.com" with pass "PASSWORD_HERE"
idle
ssl
...where bob@bobsrockets.com is the email address you want to watch, bobsrockets.com is the domain part of said email address (everything after the @), and PASSWORD_HERE is the password required to login.
Save this somewhere safe with tight file permissions for later.
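Incidentally, the --mda flag we pass to fetchmail on the command line later in this post can also live in this file instead. A sketch of what that might look like (I haven't tested this variant myself):

```
poll bobsrockets.com protocol IMAP port 993
	user "bob@bobsrockets.com" with pass "PASSWORD_HERE"
	idle
	ssl
	mda "/usr/bin/procmail -m /srv/procmail.conf"
```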
The other configuration file we'll need is one for procmail. Let's do that one now:
CORRECTHOME=/tmp/maildir
MAILDIR=$CORRECTHOME/
:0
Mail/
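procmail recipes are rather cryptic, so here's the same file again with my own comments explaining each line:

```
CORRECTHOME=/tmp/maildir  # the base directory to hold emails in
MAILDIR=$CORRECTHOME/     # procmail delivers relative to this directory

:0      # a recipe with no conditions: matches every email
Mail/   # deliver to the Mail/ directory (trailing slash = maildir format)
```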
Replace /tmp/maildir with the temporary directory you want to use to hold emails in. Save this as procmail.conf for later too.
Now we have the mail config files written, we need to install some software. I'm using apt
on Debian (a minideb Docker container actually), so you'll need to adapt this for your own system if required.
sudo apt install ca-certificates fetchmail procmail inotify-tools mpack
# or, if you're using minideb:
install_packages ca-certificates fetchmail procmail inotify-tools mpack
fetchmail is for some strange reason extremely picky about the user account it runs under, so let's update the pre-created fetchmail user account to make it happy:
groupadd --gid 10000 fetchmail
usermod --uid 10000 --gid 10000 --home /srv/fetchmail fetchmail
mkdir -p /srv/fetchmail
chown fetchmail:fetchmail /srv/fetchmail
fetchmail now needs that config file we created earlier. Let's update the ownership and permissions on that:
chown 10000:10000 path/to/.fetchmailrc
chmod 0600 path/to/.fetchmailrc
If you're running on bare metal, move it to the /srv/fetchmail
directory now. If you're using Docker, keep reading, as I recommend that this file is mounted using a Docker volume to make the resulting container image more reusable.
Now let's start drafting a shell script to pull everything together. Let's start with some initial setup:
#!/usr/bin/env bash
if [[ -z "${TARGET_UID}" ]]; then
echo "Error: The TARGET_UID environment variable was not specified.";
exit 1;
fi
if [[ -z "${TARGET_GID}" ]]; then
echo "Error: The TARGET_GID environment variable was not specified.";
exit 1;
fi
if [[ "${EUID}" -ne 0 ]]; then
echo "Error: This Docker container must run as root because fetchmail is a pain, and to allow customisation of the target UID/GID (although all possible actions are run as non-root users)";
exit 1;
fi
dir_mail_root="/tmp/maildir";
dir_newmail="${dir_mail_root}/Mail/new";
target_dir="/mnt/output";
fetchmail_uid="$(id -u "fetchmail")";
fetchmail_gid="$(id -g "fetchmail")";
temp_dir="$(mktemp --tmpdir -d "imap-download-XXXXXXX")";
on_exit() {
rm -rf "${temp_dir}";
}
trap on_exit EXIT;
log_msg() {
echo "$(date -u +"%Y-%m-%d %H:%M:%S") imap-download: $*";
}
This script will run as root, and fetchmail runs as UID 10000 and GID 10000. The reasons for this are complicated (and mostly have to do with my weird network setup). We look for the TARGET_UID and TARGET_GID environment variables, as these define the uid:gid we'll be setting files to before writing them to the output directory.
We also determine the fetchmail UID/GID dynamically here, and create a second temporary directory to work with too (the reasons for which will become apparent).
Before we continue, we need to create the directory procmail writes new emails to. Not because procmail won't create it on its own (because it will), but because we need it to exist up-front so we can watch it with inotifywait:
mkdir -p "${dir_newmail}";
chown -R "${fetchmail_uid}:${fetchmail_gid}" "${dir_mail_root}";
We're running as root, but we'll want to spawn fetchmail (and other things) as non-root users. Technically, I don't think you're supposed to use sudo in non-interactive scripts, and it's also not present in my Docker container image. The alternative is the setpriv command, but using it is rather complicated and annoying.
It's more powerful than sudo, as it allows you to specify not only the UID/GID a process runs as, but also the capabilities the process will have too (e.g. binding to low port numbers). There's a nasty bug one has to work around if one is using Docker too, so given all this I've written a wrapper function that abstracts all of this complexity away:
# Runs a process as another user.
# Ref https://github.com/SinusBot/docker/pull/40
# $1 The UID to run the process as.
# $2 The GID to run the process as.
# $3-* The command (including arguments) to run
run_as_user() {
run_as_uid="${1}"; shift;
run_as_gid="${1}"; shift;
if [[ -z "${run_as_uid}" ]]; then
echo "run_as_user: No target UID specified.";
return 1;
fi
if [[ -z "${run_as_gid}" ]]; then
echo "run_as_user: No target GID specified.";
return 2;
fi
# Ref https://github.com/SinusBot/docker/pull/40
# WORKAROUND for `setpriv: libcap-ng is too old for "all" caps`, previously "-all" was used here
# create a list to drop all capabilities supported by current kernel
cap_prefix="-cap_";
caps="$cap_prefix$(seq -s ",$cap_prefix" 0 "$(cat /proc/sys/kernel/cap_last_cap)")";
setpriv --inh-caps="${caps}" --reuid "${run_as_uid}" --clear-groups --regid "${run_as_gid}" "$@";
return "$?";
}
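To make the capability-dropping step clearer, here's the list-building logic in isolation, using a made-up cap_last_cap value of 3 for brevity (real kernels report 40 or more):

```shell
# Build a comma-separated list of capabilities to drop: -cap_0,-cap_1,...
# Normally the upper bound comes from /proc/sys/kernel/cap_last_cap;
# here we hardcode 3 purely for illustration
cap_prefix="-cap_";
caps="$cap_prefix$(seq -s ",$cap_prefix" 0 3)";
echo "${caps}";   # → -cap_0,-cap_1,-cap_2,-cap_3
```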
With this in hand, we can now wrap fetchmail and procmail in a function too:
do_fetchmail() {
log_msg "Starting fetchmail";
while :; do
run_as_user "${fetchmail_uid}" "${fetchmail_gid}" fetchmail --mda "/usr/bin/procmail -m /srv/procmail.conf";
exit_code="$?";
if [[ "$exit_code" -eq 127 ]]; then
log_msg "setpriv failed, exiting with code 127";
exit 127;
fi
log_msg "Fetchmail exited with code ${exit_code}, sleeping 60 seconds";
sleep 60
done
}
In short this spawns fetchmail as the fetchmail user we configured above, and also restarts it if it dies. If setpriv fails, it returns an exit code of 127 - so we catch that and don't bother trying again, as the issue likely needs manual intervention.
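The restart-unless-127 pattern can be demonstrated in isolation, with a stub standing in for the real run_as_user / fetchmail invocation:

```shell
# Stub that fails like a broken setpriv would (exit code 127)
attempt() { return 127; }

while :; do
	attempt;
	exit_code="$?";
	if [[ "${exit_code}" -eq 127 ]]; then
		result="gave up with code ${exit_code}";
		break; # the real script calls `exit 127` here
	fi
	# any other exit code: wait and restart
	sleep 60;
done
echo "${result}";   # → gave up with code 127
```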
To finish the script, we now need to setup that inotifywait loop I mentioned earlier. Let's setup a shell function for that:
do_attachments() {
while :; do # : = infinite loop
# Wait for an update
# inotifywait's non-0 exit code forces an exit for some reason :-/
inotifywait -qr --event create --format '%:e %f' "${dir_newmail}";
# Process new email here
done
}
Processing new emails is not particularly difficult, but requires a sub loop because:
- More than 1 email could be written at a time
- Additional emails could slip through when we're processing the last one
while read -r filename; do
# Process each email
done < <(find "${dir_newmail}" -type f);
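The find + process substitution pattern there can be demonstrated against a throwaway directory (no inotifywait needed for the demo):

```shell
# Create a scratch "new mail" directory containing 2 fake emails
demo_dir="$(mktemp -d)";
touch "${demo_dir}/mail1" "${demo_dir}/mail2";

# Count the files the same way the real loop iterates over them
count=0;
while read -r filename; do
	count=$((count + 1));
done < <(find "${demo_dir}" -type f);

echo "Found ${count} emails";   # → Found 2 emails
rm -rf "${demo_dir}";
```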
Finally, we need to process each email we find in turn. Let's outline the steps we need to take:
- Move the email to that second temporary directory we created above (since the procmail directory might not be empty)
- Unpack the attachments
- chown the attachments and move them to the output directory
Let's do this in chunks. First, let's move it to the temporary directory:
log_msg "Processing email ${filename}";
# Move the email to a temporary directory for processing
mv "${filename}" "${temp_dir}";
The filename variable there holds the absolute path to the email in question, since we used find and passed it an absolute directory to list the contents of (as opposed to a relative path).
To find the filepath we moved it to, we need to do this:
filepath_temp="${temp_dir}/$(basename "${filename}")"
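To make the basename step concrete, here's a worked example with made-up paths:

```shell
# Hypothetical values for illustration only
temp_dir="/tmp/imap-download-abc1234";
filename="/tmp/maildir/Mail/new/msg.12345"; # what find hands us

# basename strips the directory part, leaving just the file's name
filepath_temp="${temp_dir}/$(basename "${filename}")";
echo "${filepath_temp}";   # → /tmp/imap-download-abc1234/msg.12345
```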
This is important for the next step, where we unpack it:
# Unpack the attachments
munpack -C "${temp_dir}" "${filepath_temp}";
Now that we've unpacked it, let's do a bit of cleaning up, by deleting the original email file and the .desc description files that munpack also generates:
# Delete the original email file and any description files
rm "${filepath_temp}";
find "${temp_dir}" -iname '*.desc' -delete;
Great! Now that we have the attachments sorted, all we need to do is chown them to the target UID/GID and move them to the right place.
chown -R "${TARGET_UID}:${TARGET_GID}" "${temp_dir}";
chmod -R a=rX,ug+w "${temp_dir}";
I also chmod the temporary directory to make sure that the permissions are correct, because otherwise the mv command is unable to read the directory's contents.
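To see what a=rX,ug+w actually does, here's a quick demonstration on a scratch directory - the capital X grants execute only on directories (which is what makes them traversable), not on regular files:

```shell
# Set up a scratch directory containing one fake attachment
demo="$(mktemp -d)";
touch "${demo}/attachment.pdf";

# Apply the same mode as the real script
chmod -R a=rX,ug+w "${demo}";

# Inspect the resulting octal modes (GNU stat)
dir_mode="$(stat -c '%a' "${demo}")";
file_mode="$(stat -c '%a' "${demo}/attachment.pdf")";
echo "dir=${dir_mode} file=${file_mode}";   # → dir=775 file=664

rm -rf "${demo}";
```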
Now to actually move all the attachments:
# Move the attachment files to the output directory
while read -r attachment; do
log_msg "Extracted attachment ${attachment}";
chmod 0775 "${attachment}";
run_as_user "${TARGET_UID}" "${TARGET_GID}" mv "${attachment}" "${target_dir}";
done < <(find "${temp_dir}" -type f);
This is rather overcomplicated because of an older design, but it does the job just fine.
With that done, we've finished the script. I'll include the whole script at the bottom of this post.
Dockerification
If you're running on bare metal, then you can skip to the end of this post. Because I have a cluster, I want to be able to run this thereon. Since said cluster works with Docker containers, it's natural to Dockerise this process.
The Dockerfile for all this is surprisingly concise:
(Can't see the above? View it on my personal Git server instead)
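In case the embed doesn't load for you either, here's a rough sketch of what such a Dockerfile might contain, reconstructed from the steps in this post (the real one on my Git server is authoritative - base image tag and layer layout here are my guesses):

```
FROM bitnami/minideb:bullseye

# Install dependencies and set up the picky fetchmail user, as above
RUN install_packages ca-certificates fetchmail procmail inotify-tools mpack \
	&& groupadd --gid 10000 fetchmail \
	&& usermod --uid 10000 --gid 10000 --home /srv/fetchmail fetchmail \
	&& mkdir -p /srv/fetchmail /mnt/output \
	&& chown fetchmail:fetchmail /srv/fetchmail

COPY procmail.conf /srv/procmail.conf
COPY run.sh /srv/run.sh

VOLUME [ "/mnt/fetchmailrc", "/mnt/output" ]

CMD [ "bash", "/srv/run.sh" ]
```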
To use this, you'll need the following files alongside it:
- The procmail.conf file from above
- The Bash script developed above, saved with the filename run.sh (link to it on my personal Git server)
It exposes the following Docker volumes:
- /mnt/fetchmailrc: The fetchmailrc file
- /mnt/output: The target output directory
All these files can be found in this directory on my personal Git server.
Conclusion
We've strung together a bunch of different programs to automatically download emails and extract their attachments. This is very useful for ingesting all sorts of different files. Things I haven't covered:
- Restricting it to certain source email addresses to handle spam
- Restricting the file types accepted (the file command is probably your friend)
- Disallowing large files (most 3rd party email servers do this automatically, but in my case I don't have a limit that I know of other than my hard disk space)
As always, this blog post is both a reference for my own use and a starting point for you if you'd like to do this for yourself.
If you've found this useful, please comment below! I find it really inspiring / motivating to learn how people have found my posts useful and what for.
Sources and further reading
- run.sh script
- Dockerfile
- Directory on my personal Git server
- setpriv bug that keeps hanging around in my infrastructure like a bad smell
- EmbedBox, which I used for the embeds in this post
run.sh script
(Can't see the above? Try this link, or alternatively this one (bash))