01 Apr 2023: Searching my email attachments with Ponymail and Tika

Moving my email archives to Ponymail went well.

One feature of Zoe that I had forgotten about was that it indexed some attachments. If you had an email with a PDF attachment, and that PDF contained plain text (it wasn't just a scan), then you could search on words in the PDF. That's super handy. Ponymail doesn't do that; in fact you can't search on any text in attachments, even if they are plain text (or things like patches). Let's fix that!

Remember how I said whenever I need some code I first look if there is an Apache community that has a project that does something similar? Well, Apache Tika is an awesome project that will return the plain text of pretty much whatever you throw at it. PDF? Sure. Patches? Definitely. Word docs? Yup. Images? Yes. Wait, images? Yes: Tika will go and use Tesseract to do an OCR of an image.

Okay, so let's add a field to the mbox index, attachmenttext, populate it with Tika, and search on it. For now if some text in an attachment matches your search query you'll see the result, but you won't know exactly where the text appears (perhaps later it could highlight which attachment it appears in).

I wrote a quick Python script that runs through all emails in Ponymail (or some search query subset), and if they have attachments runs all the attachments through Apache Tika, storing the plain texts in the attachmenttext field. We ignore anything that's already got something in that field, so we can just run this periodically rather than on import. Then a one-line patch to Ponymail also searches the attachmenttext field. 40,000 attachments and two hours later, it was all done and working.
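As a rough illustration of the approach described above (not the actual script), here is a minimal sketch. It assumes a Tika Server running locally on its default port, and it assumes each indexed email is a dict with an "attachments" list; that layout, the fetch_attachment callback, and the function names are all hypothetical. Only the attachmenttext field name comes from the text above.

```python
import urllib.request

# Assumption: a local Tika Server on its default port.
TIKA_URL = "http://localhost:9998/tika"


def tika_extract_text(data, url=TIKA_URL):
    """PUT raw attachment bytes to Tika Server and return the plain text."""
    req = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def attachment_text_update(doc, fetch_attachment, extract=tika_extract_text):
    """Return a partial update for one email document, or None to skip it.

    doc              -- hypothetical dict form of an indexed email
    fetch_attachment -- hypothetical callable returning an attachment's bytes
    extract          -- callable turning bytes into plain text
    """
    # Skip anything already processed, so the job can be re-run periodically.
    if doc.get("attachmenttext"):
        return None
    texts = []
    for att in doc.get("attachments") or []:
        try:
            text = extract(fetch_attachment(att))
        except OSError:
            # Network or server failure for this attachment: move on.
            continue
        if text.strip():
            texts.append(text.strip())
    if not texts:
        return None
    return {"attachmenttext": "\n".join(texts)}
```

The returned dict would then be written back to the email's document in the index as a partial update. Keeping the Tika call behind the extract parameter makes it easy to run the extraction in a sandbox, which matters when the attachments are untrusted.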

It's not ready for a PR yet; for Ponymail upstream we'd probably want the option of doing this at import. I chose not to, so that we can be deliberately careful, as parsing untrusted attachments is risky.

So there we have it; a way to search your emails including inside most attachments, outside the cloud, using Open Source projects and a few little patches.

28 Mar 2023: How Ponymail helped me keep my email archive searchable but out of the cloud.

I have a lot of historical personal email, back as far as 1991, and from time to time there's a need to find some old message. Although services like GMail would like you to keep all your mail in their cloud and pride themselves on searching, I'd rather keep my email archive offline and encrypted at rest unless there's some need to do a search. Indeed I use Google Takeout every month to remove all historic GMail messages. Until this year I used a tool called Zoe to keep my email archives searchable. You can import your emails, it uses Apache Lucene as a back end, and it gives you a nice web-based interface to find your mails. But Zoe has been unmaintained for over a decade and has mostly vanished from the net. It was time to replace it.

Whenever I need some open source project, the first place I look is whether there is an Apache Software Foundation community with a project along the same lines. And the ASF is all about communities communicating over email, so not only is there an ASF project with a solution, but that project is used to provide the web interface for all the archived ASF mailing lists too. "Ponymail Foal" is the project and lists.apache.org is where you can see it running. (Note that the Ponymail website refers to the old version of Pony Mail before "Foal".)

Internally the project is mostly Python, HTML and JavaScript, using Python scripts to import emails into Elasticsearch, so it's really straightforward to get up and running following the project instructions.

So I can just import the several hundred thousand email messages I have in assorted text mbox format files and be done? Well, nearly. It almost worked but it needed a few tweaks:

Ponymail wasn't able to parse a fair number of email messages. Analysing the mails led to only three root causes of mails not being able to be imported:

Bad "Content-Type" headers. Even my bank gets this wrong with the header Content-Type: text/html; charset="utf-8 charset=\"iso-8859-1\"". I just made the code ignore similar bad headers and try the fallbacks. Patch here

Messages with no text or HTML body and no attachments. These are fairly common; for example, a calendar entry might be sent as "Content-Type: text/calendar". I just made it so that if there is no recognised body it uses whatever the last section it found was, regardless of content type. Patch here

Google Chat messages from many years ago. These have nothing useful: no body, no To:, no Message-ID, no return address. Rather than note them as failures I just made the code ignore them completely. Since this is just a warning, no upstream patch has been prepared.
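To illustrate the first of those root causes, here is a minimal sketch of the "ignore the bad header and try the fallbacks" idea. It is not the actual patch: the function name and the choice of fallback charsets (utf-8, then iso-8859-1) are assumptions for illustration.

```python
import codecs
from typing import Optional


def decode_body(payload: bytes, declared_charset: Optional[str]) -> str:
    """Decode a message body, ignoring an unusable declared charset."""
    candidates = []
    if declared_charset:
        candidates.append(declared_charset.strip().strip('"'))
    # Assumed fallbacks, tried in order after the declared charset.
    candidates += ["utf-8", "iso-8859-1"]
    for name in candidates:
        try:
            codecs.lookup(name)  # raises LookupError for nonsense names
        except (LookupError, ValueError):
            continue  # bad header value: ignore it and try the next one
        try:
            return payload.decode(name)
        except UnicodeDecodeError:
            continue  # valid charset name, but the bytes don't fit it
    # Last resort: never fail the import over a body encoding.
    return payload.decode("utf-8", errors="replace")


# The charset value from the bank header quoted above.
bad = 'utf-8 charset="iso-8859-1"'
print(decode_body("café".encode("utf-8"), bad))
```

The point of the design is that a broken header downgrades to a best-effort decode rather than causing the whole message to be rejected.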

Handling List-Id's. Ponymail likes to sort mails by the List-Id which makes a lot of sense where you have the thousands of Apache lists. But with personal email, and certainly when you subscribe to various newsletters, or get bills, or spam that got into the archives then you end up with lots of list id's that are only used once or twice or are not useful. Working on open source projects there's lots of lists that I'm on that I want the email to get archived, but it would be nice if it was separated out in the Ponymail UI. So really I needed the ability to have an 'allow list' of list id's that I want to have separate, with everything else defaulting to a generic list id (being my email address where all those mails came into). Patch here

HTML email. Where an email contains only HTML and no text version, Ponymail will make and store a text conversion of the HTML, but sometimes, especially with those pesky bank emails, it's useful to be able to see the HTML with all the embedded images. Displaying HTML email in HTML isn't really a goal for the project, especially since you have to be really careful you don't end up parsing untrusted JavaScript, for example. And you might not want all those tracking images to suddenly start getting pinged. But I'd really like a button that you could use on selected emails to display them in HTML. Fortunately Ponymail stores a complete raw copy of the email, and my proof-of-concept worked, so this can be easy to add in the future.
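The List-Id 'allow list' tweak described above can be sketched roughly as follows. This is an illustration rather than the actual patch; the function name, the example list id, and the default id are all hypothetical.

```python
import re

# A List-Id header is an optional description followed by <list-id>.
_ANGLE = re.compile(r"<([^<>]+)>")


def effective_list_id(header_value, allow_list, default_list_id):
    """Keep a List-Id only if it is on the allow list, else use the default."""
    if header_value:
        match = _ANGLE.search(header_value)
        candidate = (match.group(1) if match else header_value).strip().lower()
        if candidate in allow_list:
            return candidate
    # Newsletters, bills, spam and mail with no List-Id all land here.
    return default_list_id


ALLOW = {"dev.httpd.apache.org"}   # hypothetical allow list
DEFAULT = "me.example.com"         # hypothetical catch-all id

print(effective_list_id("Apache HTTPD dev <dev.httpd.apache.org>", ALLOW, DEFAULT))
print(effective_list_id("Shop news <news.shop.example>", ALLOW, DEFAULT))
```

With this in place, only the lists I care about appear separately in the UI, and everything else is grouped under one generic id.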

Managing a personal email archive can be a daunting task, especially given the volume of email correspondence. However, with Ponymail, it's possible to take control of your email archive, keep it local and secure, and search through it quickly and efficiently using the power of Elasticsearch.

21 Feb 2012: Enterprise Linux 5.7 to 5.8 risk report

Red Hat Enterprise Linux 5.8 was released today (February 2012), seven months since the release of 5.7 in July 2011. So let's use this opportunity to take a quick look back over the vulnerabilities and security updates made in that time, specifically for Red Hat Enterprise Linux 5 Server.

Red Hat Enterprise Linux 5 is coming up to its fifth year since release, and is supported for another five years, until 2017.

Errata count

The chart below illustrates the total number of security updates issued for Red Hat Enterprise Linux 5 Server if you had installed 5.7, up to and including the 5.8 release, broken down by severity. It's split into two columns, one for the packages you'd get if you did a default install, and the other if you installed every single package (which is unlikely as it would involve quite a bit of manual effort to select every one). For a given installation, the number of package updates and vulnerabilities that affected you will depend on exactly what packages you have installed or removed.

Number of security errata between 5.7 and 5.8

So, for a default install, from release of 5.7 up to and including 5.8, we shipped 42 advisories to address 118 vulnerabilities. 4 advisories were rated critical, 13 were important, and the remaining 25 were moderate and low.

Or, for all packages, from release of 5.7 up to and including 5.8, we shipped 71 advisories to address 177 vulnerabilities. 7 advisories were rated critical, 16 were important, and the remaining 48 were moderate and low.

Critical vulnerabilities

The 7 critical advisories addressed 20 critical vulnerabilities across 4 different packages:

An update to OpenJDK 6 Java Runtime Environment (October 2011) where a web site hosting a malicious Java applet could potentially run arbitrary code as the user.

An update to the MIT krb5 telnet daemon (December 2011) where a remote attacker who can access the telnet port of a target machine could use this flaw to execute arbitrary code as root. Note that the krb5 telnet daemon is not installed or enabled by default, and the default firewall rules block remote access to the telnet port. This flaw did not affect the more commonly used telnet daemon distributed in the telnet-server package.

Updates to PHP and PHP 5.3 (February 2012) where a remote attacker could send a specially-crafted HTTP request to cause the PHP interpreter to crash or, possibly, execute arbitrary code. This flaw was caused by the fix for CVE-2011-4885.

Three updates to Firefox (August 2011, September 2011, November 2011) where a malicious web site could potentially run arbitrary code as the user running Firefox.

Updates to correct 19 out of the 20 critical vulnerabilities were available via Red Hat Network either the same day or the next calendar day after the issues were public. The update to krb5 took 2 calendar days because it was public on Christmas day.

Overall, for Red Hat Enterprise Linux 5 since release until 5.8, 98% of critical vulnerabilities have had an update to address them available from the Red Hat Network either the same day or the next calendar day after the issue was public.

Other significant vulnerabilities

Although not in the definition of critical severity, also of interest during this period were a couple of remote denial of service flaws that were easily exploitable:

A flaw in BIND, CVE-2011-4313, fixed by RHSA-2011:1458 (bind) and RHSA-2011:1459 (bind97). A remote attacker could use this flaw to cause BIND to crash.

A flaw in Apache HTTP Server, CVE-2011-3192, fixed by RHSA-2011:1245. A remote attacker could use this flaw to cause httpd to use an excessive amount of memory and CPU time.

In addition, updates to Firefox, NSS, and Thunderbird were made to blacklist a compromised Certificate Authority.

Previous update releases

To compare these statistics with previous update releases we need to take into account that the time between each update release is different. So looking at a default installation and calculating the number of advisories per month gives the following chart:

Errata per month for each update release
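As a worked example of that normalisation, using only the figures quoted earlier in this report (42 advisories for a default install, 71 for all packages, over the seven months between 5.7 and 5.8), the per-month rate is just the advisory count divided by the interval. The function name is only for illustration.

```python
def advisories_per_month(advisories, months):
    """Normalise an advisory count by the length of the release interval."""
    return advisories / months


# Figures quoted above for the 5.7 to 5.8 interval (seven months).
default_rate = advisories_per_month(42, 7)
all_packages_rate = advisories_per_month(71, 7)

print(f"default install: {default_rate:.1f} advisories per month")
print(f"all packages:    {all_packages_rate:.1f} advisories per month")
```

For the default install that works out at 6.0 advisories per month, and about 10.1 per month if every package is installed.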

This data is interesting to get a feel for the risk of running Enterprise Linux 5 Server, but isn't really useful for comparisons with other major versions, distributions, or operating systems -- for example, a default install of Red Hat Enterprise Linux 4AS did not include Firefox, but 5 Server does. You can use our public security measurement data and tools, and run your own custom metrics for any given Red Hat product, package set, timescales, and severity range of interest.

See also: 5.7, 5.6, 5.5, 5.4, 5.3, 5.2, and 5.1 risk reports.

31 Dec 2011: Making of the SONIK Gravitation music video

The inspiration for the Sonik video for Gravitation came from a local friend of ours, a talented and world-renowned photographer, Adrian Brannan. Ade is famous for his analogue photo collages (please give him a 'like' on his Facebook page):

We often wondered how the same effect would look if rendered with video. With video you've got the extra element of time, each segment of the mosaic can be running from a different starting point, with a different speed, and even a different direction. In addition the segments themselves can move over time. Would this end up with an effect that was just too much of a mess? Or would it give an effect that helps visualise the consequence of spacetime?

We started by taking several videos at three different locations over the period of a year with a Kodak Zi8 camera: a motorway bridge over the M74, just outside the Buchanan shopping center in Glasgow, and a bench in Strathclyde park. Lining up the images was done roughly by using lines drawn on acetate stuck over the camera screen.

The software to do the mosaic effect was hand-written. We used a simple scripting language, Perl, and the image library GD. On a relatively modern Linux PC running Fedora 16 we can render near real-time 720p HD even when handling 300 segments of mosaic. A simple language controls which parts of the screen come from which video, and the first half of the music video uses this with simple effects having just a few boxes overlaid:

Later in the video things get more complicated, using randomisation to pick the location and movement of each segment:

We used our scripts to create a number of ~13 second segments, then put them all together using kdenlive. The intro and outro were taken from a different video from a hotel room in London Victoria; the intro using a 'miniature' effect, and outro using the randomised segments applied to a single video.
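The per-segment timing idea described above (each mosaic segment running from its own starting point, at its own speed, and in its own direction) can be sketched as follows. This is an illustration in Python rather than the Perl and GD script we actually used; the Segment fields, function names, and size limits are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Segment:
    source: int      # which input video this segment is cut from
    x: int           # top-left corner of the segment on screen
    y: int
    w: int           # segment size in pixels
    h: int
    start: int       # frame of the source video shown at output frame 0
    speed: float     # source frames advanced per output frame
    direction: int   # +1 plays forwards, -1 plays backwards


def source_frame(seg, t, n_frames):
    """Source frame to show in this segment at output frame t (wraps round)."""
    return math.floor(seg.start + seg.direction * seg.speed * t) % n_frames


def random_segments(count, n_sources, width, height, n_frames, rng,
                    min_size=40, max_size=200):
    """Pick a random location and movement for each segment."""
    segments = []
    for _ in range(count):
        w = rng.randint(min_size, max_size)
        h = rng.randint(min_size, max_size)
        segments.append(Segment(
            source=rng.randrange(n_sources),
            x=rng.randint(0, width - w),
            y=rng.randint(0, height - h),
            w=w, h=h,
            start=rng.randrange(n_frames),
            speed=rng.choice([0.5, 1.0, 2.0]),
            direction=rng.choice([-1, 1]),
        ))
    return segments


# 300 segments over a 720p frame, drawn from three source videos.
segs = random_segments(300, 3, 1280, 720, n_frames=400, rng=random.Random(1))
```

Rendering an output frame is then a matter of copying, for each segment, the rectangle (x, y, w, h) from frame source_frame(seg, t, n_frames) of the chosen source video.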

The Perl script and a 5 frame example are available to download: 2011-sonik-vid-example.tar.bz2 (1.4M)

Watch the full video, or click through to YouTube to see it in HD:

18 Nov 2011: Vulnerability Acknowledgements for Red Hat online services

When we get notified of a security issue affecting a Red Hat product in advance we give an acknowledgement in the security advisory and in our CVE database.

We've now created a page to give acknowledgements to the companies and individuals that report issues in our online services, such as finding a cross-site scripting flaw in a Red Hat web site, or a vulnerability in OpenShift.

11 Nov 2011: When do we push most advisories?

We pushed an update to Flash Player for Red Hat Enterprise Linux Supplementary today, on a Friday, because it fixed Critical vulnerabilities. But we try not to push updates on a Friday unless they are critical and already public.

So let's take a look at the most common times and days we push advisories for Red Hat Enterprise Linux 4, 5, and 6 (including Supplementary) using a heatmap:

heatmap

The more advisories pushed for a given day and hour, the darker that section of the graph is. So the most popular times for pushing advisories are Tuesdays at 10am and 2pm Eastern US time, Fridays are pretty light for pushes, and there was nothing during the weekends. The spread of the graph shows that we push advisories when they are ready, rather than waiting for a fixed day and time, in order to reduce the risk to users.
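The counting behind a heatmap like this is straightforward. Here is a minimal sketch in Python (the graph itself was generated with R); the timestamps below are made-up examples, and they are assumed to already be in Eastern US time.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def push_counts(timestamps):
    """Count advisory pushes per (weekday, hour) cell of the heatmap."""
    return Counter((DAYS[ts.weekday()], ts.hour) for ts in timestamps)


# Hypothetical push times, for illustration only.
pushes = [
    datetime(2011, 11, 1, 10, 30),
    datetime(2011, 11, 8, 10, 5),
    datetime(2011, 11, 8, 14, 0),
    datetime(2011, 11, 11, 9, 0),
]

counts = push_counts(pushes)
for (day, hour), n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{day} {hour:02d}:00  {n}")
```

Each cell's count then maps to a shade: the higher the count, the darker the cell.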

All the data used to create this graph is available as part of our public metrics. Thanks to Sami Kerola for the R code on which I based my graph generation.

09 Aug 2011: Red Hat's Most Serious Flaw Types for 2010

A few weeks ago the 2011 update to the CWE/SANS Top 25 Most Dangerous Software Errors was published. As part of our contribution to this update we analysed the most severe vulnerabilities that affected Red Hat since the last update and mapped each one to the appropriate Common Weakness Enumeration (CWE) type.

The table below lists all vulnerabilities which have a CVSS score of 7 or more ('high'), that we fixed in any product during calendar year 2010.

The most common CWEs were:

Buffer Copy without Checking Size of Input (CWE-120): 8 vulnerabilities.
Race Condition (CWE-362): 5 vulnerabilities.

CVE	CWE	2011 top 25?	CVSS base score	Fixed in
CVE-2007-4567	CWE-476	no	7.8	Red Hat Enterprise Linux 5 (kernel)
CVE-2009-0778	CWE-770	no	7.1	Red Hat Enterprise Linux 5 (kernel)
CVE-2009-1385	CWE-191	no	7.1	Red Hat Enterprise Linux 5 (kernel)
CVE-2009-3080	CWE-129	no	7.2	Red Hat Enterprise Linux 3, 4, 5, MRG (kernel)
CVE-2009-3245	CWE-252	no	7.6	Red Hat Enterprise Linux 3, 4, 5 (openssl)
CVE-2009-3726	CWE-476	no	7.2	Red Hat Enterprise Linux 4, 5, MRG (kernel)
CVE-2009-4005	CWE-127	no	7.1	Red Hat Enterprise Linux 4 (kernel)
CVE-2009-4027	CWE-362	no	7.8	Red Hat Enterprise Linux 5 (kernel)
CVE-2009-4141	CWE-416	no	7.2	Red Hat Enterprise Linux 5, MRG (kernel)
CVE-2009-4212	CWE-191	no	10.0	Red Hat Enterprise Linux 3, 4, 5 (krb5)
CVE-2009-4272	CWE-764	no	7.8	Red Hat Enterprise Linux 5 (kernel)
CVE-2009-4273	CWE-78	yes	7.9	Red Hat Enterprise Linux 5 (systemtap)
CVE-2009-4537	CWE-120	yes	7.1	Red Hat Enterprise Linux 4, 5, MRG (kernel)
CVE-2009-4895	CWE-362	no	7.2	Red Hat Enterprise MRG (kernel)
CVE-2010-0008	CWE-606	no	7.8	Red Hat Enterprise Linux 4, 5 (kernel)
CVE-2010-0291	CWE-822	no	7.2	Red Hat Enterprise Linux 5 (kernel)
CVE-2010-0738	CWE-424	no	7.5	JBoss Enterprise Application Platform 4.2, 4.3
CVE-2010-0741	CWE-20	no	7.1	Red Hat Enterprise Linux 5 (kvm)
CVE-2010-1084	CWE-120	yes	7.2	Red Hat Enterprise Linux 5 (kernel)
CVE-2010-1086	CWE-20	no	7.8	Red Hat Enterprise Linux 4, 5 (kernel)
CVE-2010-1087	CWE-362	no	7.2	Red Hat Enterprise Linux 5 (kernel)
CVE-2010-1166	CWE-823	no	7.6	Red Hat Enterprise Linux 5 (xorg-x11-server)
CVE-2010-1173	CWE-120 *	yes	7.1	Red Hat Enterprise Linux 4, 5 (kernel)
CVE-2010-1188	CWE-416	no	7.8	Red Hat Enterprise Linux 3, 4, 5 (kernel)
CVE-2010-1436	CWE-120	yes	7.2	Red Hat Enterprise Linux 5 (kernel)
CVE-2010-1437	CWE-362	no	7.2	Red Hat Enterprise Linux 4, 5 (kernel)
CVE-2010-2063	CWE-823	no	7.5	Red Hat Enterprise Linux 3, 4, 5 (samba)
CVE-2010-2235	CWE-77	no	7.1	Red Hat Network Satellite Server 5.3 (cobbler)
CVE-2010-2240	CWE-788	no	7.2	Red Hat Enterprise Linux 3, 4, 5, MRG (kernel)
CVE-2010-2248	CWE-682	no	7.1	Red Hat Enterprise Linux 4, 5 (kernel)
CVE-2010-2492	CWE-805	no	7.2	Red Hat Enterprise Linux 5, 6 (kernel)
CVE-2010-2521	CWE-805	no	8.3	Red Hat Enterprise Linux 4, 5, MRG (kernel)
CVE-2010-2798	CWE-476	no	7.2	Red Hat Enterprise Linux 5 (kernel)
CVE-2010-2962	CWE-823	no	7.2	Red Hat Enterprise Linux 6, MRG (kernel)
CVE-2010-3069	CWE-129	no	8.3	Red Hat Enterprise Linux 3, 4, 5, 6 (samba)
CVE-2010-3081	CWE-131	yes	7.2	Red Hat Enterprise Linux 3, 4, 5, 6, MRG (kernel)
CVE-2010-3084	CWE-120	yes	7.2	Red Hat Enterprise Linux 6 (kernel)
CVE-2010-3301	CWE-129	no	7.2	Red Hat Enterprise Linux 6 (kernel)
CVE-2010-3302	CWE-120	yes	7.1	Red Hat Enterprise Linux 6 (openswan)
CVE-2010-3308	CWE-120	yes	7.1	Red Hat Enterprise Linux 6 (openswan)
CVE-2010-3432	CWE-805 *	no	7.8	Red Hat Enterprise Linux 4, 5, 6, MRG (kernel)
CVE-2010-3705	CWE-788	no	8.3	Red Hat Enterprise Linux 6, MRG (kernel)
CVE-2010-3708	CWE-77	no	7.5	JBoss Enterprise Application Platform 4.3, SOA Platform 4.2
CVE-2010-3752	CWE-78	yes	7.1	Red Hat Enterprise Linux 6 (openswan)
CVE-2010-3753	CWE-78	yes	7.1	Red Hat Enterprise Linux 6 (openswan)
CVE-2010-3847	CWE-426	no	7.2	Red Hat Enterprise Linux 5, 6 (glibc)
CVE-2010-3856	CWE-426	no	7.2	Red Hat Enterprise Linux 5, 6 (glibc)
CVE-2010-3864	CWE-362	no	7.6	Red Hat Enterprise Linux 6 (openssl)
CVE-2010-3904	CWE-822	no	7.2	Red Hat Enterprise Linux 5, 6 (kernel)
CVE-2010-4170	CWE-88	no	7.2	Red Hat Enterprise Linux 4, 5, 6 (systemtap)
CVE-2010-4179	CWE-862	yes	7.5	Red Hat Enterprise MRG (cumin)
CVE-2010-4344	CWE-120	yes	7.5	Red Hat Enterprise Linux 4, 5 (exim)

* - in both these cases the outcome is not a buffer overflow as the possible overflow is detected and instead converted into an abort (DoS)
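The "most common CWE" figures can be reproduced by tallying the CWE column of the table above. A minimal sketch, with the column transcribed by hand in row order (the function name is only for illustration):

```python
from collections import Counter

# CWE column of the table above, in row order (52 rows).
CWE_COLUMN = [
    476, 770, 191, 129, 252, 476, 127, 362, 416, 191, 764, 78, 120,
    362, 606, 822, 424, 20, 120, 20, 362, 823, 120, 416, 120, 362,
    823, 77, 788, 682, 805, 805, 476, 823, 129, 131, 120, 129, 120,
    120, 805, 788, 77, 78, 78, 426, 426, 362, 822, 88, 862, 120,
]


def most_common_cwes(cwes, n=2):
    """Return the n most frequent CWE ids with their counts."""
    return [(f"CWE-{cwe}", count) for cwe, count in Counter(cwes).most_common(n)]


print(most_common_cwes(CWE_COLUMN))
# prints [('CWE-120', 8), ('CWE-362', 5)]
```

The two entries marked with * are counted under the CWE shown in the table, even though in those cases the overflow is detected and turned into an abort.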

27 Jul 2011: Enterprise Linux 5.6 to 5.7 risk report

Red Hat Enterprise Linux 5.7 was released last week (July 2011), six months since the release of 5.6 in January 2011. So let's use this opportunity to take a quick look back over the vulnerabilities and security updates made in that time, specifically for Red Hat Enterprise Linux 5 Server.