mark :: blog


Moving my email archives to Ponymail went well

One feature I forgot that Zoe had was how it indexed some attachments.  If you have an email with a PDF attachment and that PDF had plain text in it (it wasn't just a scan) then you could search on words in the PDF.  That's super handy.  Ponymail doesn't do that; in fact you don't get to search on any text in attachments, even if they are plain text (or things like patches).  Let's fix that!

Remember how I said that whenever I need some code I first look for an Apache community with a project that does something similar?  Well, Apache Tika is an awesome project that will return the plain text of pretty much whatever you throw at it.  PDF? Sure.  Patches? Definitely.  Word docs? Yup.  Images? Yes.  Wait, images? Yes: Tika will go and use Tesseract to OCR an image.

Okay, so let's add a field to the mbox index, attachmenttext, populate it with Tika, and search on it.  For now if some text in an attachment matches your search query you'll see the result, but you won't know exactly where the text appears (perhaps later it could highlight which attachment it appears in).
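As a rough illustration of what "search on it" means, here is a minimal sketch of an Elasticsearch query body that includes the new field. Only attachmenttext comes from this post; the other field names and the build_query helper are assumptions made for the example, and the real one-line Ponymail patch may look quite different.

```python
def build_query(text, fields=("subject", "body", "attachmenttext")):
    """Build an Elasticsearch multi_match query body.

    "subject" and "body" are assumed field names for illustration;
    "attachmenttext" is the new field described in the post.
    """
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


if __name__ == "__main__":
    # The resulting dict could be passed as the body of a search request.
    print(build_query("invoice 2019"))
```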

I wrote a quick Python script that runs through all emails in Ponymail (or some search query subset), and if they have attachments runs all the attachments through Apache Tika, storing the plain texts in the attachmenttext field.  We ignore anything that's already got something in that field, so we can just run this periodically rather than on import.  Then a one-line patch to Ponymail also searches the attachmenttext field.  40,000 attachments and two hours later, it was all done and working.
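The per-message step of such a script could be sketched like this. It is not the actual script: it assumes a Tika server running locally on its default port 9998 (its /tika endpoint returns plain text when asked for text/plain), and the helper names and the dict standing in for the indexed document are made up for the example.

```python
import urllib.request
from email import message_from_bytes, policy

TIKA_URL = "http://localhost:9998/tika"  # assumption: local Tika server, default port


def attachments(raw):
    """Yield (filename, bytes) for each attachment in a raw RFC 822 message."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True)
        if payload:
            yield part.get_filename() or "unnamed", payload


def tika_text(data, url=TIKA_URL):
    """Send raw attachment bytes to Tika and return the extracted plain text."""
    req = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


def add_attachment_text(doc, raw, extract=tika_text):
    """Fill doc["attachmenttext"] unless it is already populated."""
    if doc.get("attachmenttext"):
        return doc  # already processed, so periodic re-runs are cheap
    texts = [extract(data) for _name, data in attachments(raw)]
    if texts:
        doc["attachmenttext"] = "\n".join(texts)
    return doc
```

Passing the extractor in as a parameter keeps the Tika call separate from the mail parsing, which also makes it easy to run the untrusted-attachment step somewhere isolated.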

It's not ready for a PR yet; for Ponymail upstream we'd probably want the option of doing this at import, although I chose not to so we can be deliberately careful, as parsing untrusted attachments is risky.

So there we have it; a way to search your emails including inside most attachments, outside the cloud, using Open Source projects and a few little patches.

I have a lot of historical personal email, back as far as 1991, and from time to time there's a need to find some old message.  Although services like GMail would like you to keep all your mail in their cloud and pride themselves on searching, I'd rather keep my email archive offline and encrypted at rest unless there's some need to do a search.  Indeed I use Google Takeout every month to remove all historic GMail messages.  Until this year I used a tool called Zoe for searchable email archives.  You can import your emails, it uses Apache Lucene as a back end, and it gives you a nice web-based interface to find your mails.  But Zoe has been unmaintained for over a decade and has mostly vanished from the net.  It was time to replace it.

Whenever I need some open source project my first place to look is if there is an Apache Software Foundation community with a project along the same lines.  And the ASF is all about communities communicating over Email, so not only is there an ASF project with a solution, but that project is used to provide the web interface for all the archived ASF mailing lists too.   "Ponymail Foal" is the project and is where you can see it running.  (Note that the Ponymail website refers to the old version of Pony Mail before "Foal")

Internally the project is mostly Python, HTML and JavaScript, using Python scripts to import emails into Elasticsearch, so it's really straightforward to get up and running following the project instructions.

So I can just import my several hundred thousand email messages I have in random text mbox format files and be done?  Well, nearly.  It almost worked but it needed a few tweaks:

  • Ponymail wasn't able to parse a fair number of email messages.  Analysing the mails led to only three root causes of mails not being able to be imported:
    • Bad "Content-Type" headers.  Even my bank gets this wrong with the header Content-Type: text/html; charset="utf-8 charset=\"iso-8859-1\"".  I just made the code ignore similar bad headers and try the fallbacks.   Patch here
    • Messages with no text or HTML body and no attachments.  These are fairly common for example a calendar entry might be sent as "Content-Type: text/calendar".  I just made it so that if there is no recognised body it just uses whatever the last section it found was, regardless of content type.  Patch here
    • Google Chat messages from many years ago.  These have nothing useful: no body, no To:, no message id, no return address.  Rather than note them as failures I just made the code ignore them completely.  Since this is just a warning, no upstream patch prepared.
  • Handling List-Id's.  Ponymail likes to sort mails by the List-Id, which makes a lot of sense where you have the thousands of Apache lists.  But with personal email, and certainly when you subscribe to various newsletters, or get bills, or spam that got into the archives, you end up with lots of list id's that are only used once or twice or are not useful.  Working on open source projects, there are lots of lists I'm on whose email I want archived, and it would be nice if those were separated out in the Ponymail UI.  So really I needed the ability to have an 'allow list' of list id's that I want to have separate, with everything else defaulting to a generic list id (being my email address where all those mails came into).  Patch here
  • HTML email.  Where an email contains only HTML and no text version, Ponymail will make and store a text conversion of the HTML, but sometimes, especially with those pesky bank emails, it's useful to be able to see the HTML with all the embedded images.  Displaying HTML email in HTML isn't really a goal for the project, especially since you have to be really careful you don't end up running untrusted JavaScript, for example.  And you might not want all those tracking images to suddenly start getting pinged.  But I'd really like a button that you could use on selected emails to display them in HTML.  Fortunately Ponymail stores a complete raw copy of the email, and my proof-of-concept worked, so this can be easy to add in the future.

Managing a personal email archive can be a daunting task, especially given the volume of email correspondence.  However, with Ponymail it's possible to take control of your email archive, keep it local and secure, and search through it quickly and efficiently using the power of Elasticsearch.
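The List-Id 'allow list' idea from the tweaks above can be sketched in a few lines. This is only an illustration of the approach, not the actual patch; the list ids, the default id, and the function name are all hypothetical.

```python
import re

# Hypothetical values for illustration only.
ALLOWED_LIST_IDS = {"dev.httpd.apache.org", "users.httpd.apache.org"}
DEFAULT_LIST_ID = "me.example.org"


def effective_list_id(header, allowed=ALLOWED_LIST_IDS, default=DEFAULT_LIST_ID):
    """Return the list id to file a message under.

    A List-Id header usually looks like 'Description <list.id.example.org>'.
    Ids on the allow list are kept; everything else (newsletters, bills,
    a missing header) falls back to the generic default id.
    """
    header = header or ""
    match = re.search(r"<([^<>]+)>", header)
    list_id = (match.group(1) if match else header).strip().lower()
    return list_id if list_id in allowed else default
```

For example, a message from an allowed project list keeps its own id, while a one-off newsletter ends up under the default id alongside ordinary personal mail.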

    My GPG key has lasted me well, over 18 years, but it's a v2 key and therefore no longer supported by newer versions of GnuPG. So it's time to move to a new one. I've made a transition statement available. If you signed my old key please consider signing the new one.

    I've written about why OpenSSL chose the disclosure method we did for the CCS Injection issues and how it went here

    Here is the timeline from my (OpenSSL) perspective for the recent CCS Injection (MITM) vulnerability as well as the other flaws being fixed today.

    SSL/TLS MITM vulnerability (CVE-2014-0224)

    DTLS recursion flaw (CVE-2014-0221)

    DTLS invalid fragment vulnerability (CVE-2014-0195)

    Anonymous ECDH denial of service (CVE-2014-3470)

    (All times UTC)

    Post copied from my original source on Google+

    We've had more than a few press enquiries at OpenSSL about the timeline of the CVE-2014-0160 (heartbleed) issue. Here's the OpenSSL view of the timeline: So to be clear, OpenSSL notified only the following organisations prior to the public release of the issue: Red Hat, SuSE, Debian, FreeBSD, AltLinux.

    (Originally posted in Google+ at )

    Note: Akamai note on their blog that they were given advance notice of this issue by the OpenSSL team. This is incorrect. They were probably notified directly by one of the vulnerability finders.

    Note: To see how this fits into the overall timeline of this issue see this article

    Here is a quick writeup of the protocol for the iKettle taken from my Google+ post earlier this month. This protocol allows you to write your own software to control your iKettle or get notifications from it, so you can integrate it into your desktop or existing home automation system.

    The iKettle is advertised as the first wifi kettle, available in UK since February 2014. I bought mine on pre-order back in October 2013. When you first turn on the kettle it acts as a wifi hotspot and they supply an app for Android and iPhone that reconfigures the kettle to then connect to your local wifi hotspot instead. The app then communicates with the kettle on your local network enabling you to turn it on, set some temperature options, and get notification when it has boiled.

    Once connected to your local network the device responds to ping requests and listens on two tcp ports, 23 and 2000. The wifi connectivity is enabled by a third party serial to wifi interface board and it responds similar to a HLK-WIFI-M03. Port 23 is used to configure the wifi board itself (to tell it what network to connect to and so on). Port 2000 is passed through to the processor in the iKettle to handle the main interface to the kettle.

    Port 2000, main kettle interface

    The iKettle wifi interface listens on tcp port 2000; all devices that connect to port 2000 share the same interface and therefore receive the same messages. The specification for the wifi serial board state that the device can only handle a few connections to this port at a time. The iKettle app also uses this port to do the initial discovery of the kettle on your network.


    Sending the string "HELLOKETTLE\n" to port 2000 will return "HELLOAPP\n". You can use this to check you are talking to a kettle (and if the kettle has moved addresses due to DHCP you could scan the entire local network looking for devices that respond in this way). You might receive other HELLOAPP messages at later points as other apps on the network connect to the kettle.
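A minimal sketch of that handshake check, using only the strings given above. The function names and the timeout are arbitrary choices for the example, and it assumes the reply arrives in a single read.

```python
import socket


def is_kettle_reply(reply):
    """True if the bytes received look like the kettle's greeting."""
    return reply.startswith(b"HELLOAPP")


def probe(host, port=2000, timeout=2.0):
    """Send HELLOKETTLE to host:port and report whether a kettle answered."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"HELLOKETTLE\n")
            return is_kettle_reply(sock.recv(64))
    except OSError:
        # Refused, unreachable, or timed out: not a kettle we can talk to.
        return False
```

Scanning a network for a kettle that changed address would then just be a loop calling probe() over the candidate addresses.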

    Initial Status

    Once connected you need to figure out if the kettle is currently doing anything, as you will have missed any previous status messages. To do this you send the string "get sys status\n". The kettle will respond with the string "sys status key=\n" or "sys status key=X\n" where X is a single character. Bitfields in character X tell you which buttons are currently active:

    Bit 6    Bit 5    Bit 4    Bit 3    Bit 2    Bit 1

    So, for example if you receive "sys status key=!" then buttons "100C" and "On" are currently active (and the kettle is therefore turned on and heating up to 100C).
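Decoding the key character can be sketched as below. Only the two bit meanings confirmed by the "!" example (bit 6 is "100C", bit 1 is "On") are named; the other bits are reported by number rather than guessed. The function names are made up for the example.

```python
PREFIX = "sys status key="

# Only these two meanings follow directly from the "!" example in the text.
KNOWN_BITS = {6: "100C", 1: "On"}


def active_bits(reply):
    """Return the set bit numbers (6 down to 1) from a status key reply."""
    if not reply.startswith(PREFIX):
        raise ValueError("not a status key reply: %r" % reply)
    key = reply[len(PREFIX):]
    if key.endswith("\n"):
        key = key[:-1]  # drop only the terminator; the key itself may be a space
    if not key:
        return []  # "sys status key=" with no character: nothing active
    value = ord(key[0])
    return [bit for bit in range(6, 0, -1) if value & (1 << (bit - 1))]


def describe(reply):
    """Map active bits to button names where known, else 'bit N'."""
    return [KNOWN_BITS.get(bit, "bit %d" % bit) for bit in active_bits(reply)]


if __name__ == "__main__":
    print(describe("sys status key=!\n"))
```

The character "!" is 0x21, binary 100001, so bits 6 and 1 are set, which matches the example above.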

    Status messages

    As the state of the kettle changes, either by someone pushing the physical button on the unit, using an app, or sending the command directly you will get async status messages. Note that although the status messages start with "0x" they are not really hex. Here are all the messages you could see:

    sys status 0x100     100C selected
    sys status 0x95      95C selected
    sys status 0x80      80C selected
    sys status 0x100     65C selected
    sys status 0x11      Warm selected
    sys status 0x10      Warm has ended
    sys status 0x5       Turned on
    sys status 0x0       Turned off
    sys status 0x8005    Warm length is 5 minutes
    sys status 0x8010    Warm length is 10 minutes
    sys status 0x8020    Warm length is 20 minutes
    sys status 0x3       Reached temperature
    sys status 0x2       Problem (boiled dry?)
    sys status 0x1       Kettle was removed (whilst on)

    You can receive multiple status messages given one action, for example if you turn the kettle on you should get a "sys status 0x5" and a "sys status 0x100" showing the "on" and "100C" buttons are selected. When the kettle boils and turns off you'd get a "sys status 0x3" to notify you it boiled, followed by a "sys status 0x0" to indicate all the buttons are now off.
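A small sketch of turning those async messages into readable text. The codes are kept as strings because they are not really hex. The 65C entry is left out because the list above shows 0x100 for both 100C and 65C, and the example does not try to guess which is right; the function name is made up.

```python
STATUS = {
    "0x100": "100C selected",
    "0x95": "95C selected",
    "0x80": "80C selected",
    # 65C omitted: the list above gives 0x100 for it as well as for 100C.
    "0x11": "Warm selected",
    "0x10": "Warm has ended",
    "0x5": "Turned on",
    "0x0": "Turned off",
    "0x8005": "Warm length is 5 minutes",
    "0x8010": "Warm length is 10 minutes",
    "0x8020": "Warm length is 20 minutes",
    "0x3": "Reached temperature",
    "0x2": "Problem (boiled dry?)",
    "0x1": "Kettle was removed (whilst on)",
}


def parse_status(line):
    """Return a description for a 'sys status 0x...' line, else None."""
    parts = line.strip().split()
    if len(parts) != 3 or parts[:2] != ["sys", "status"]:
        return None  # e.g. HELLOAPP or anything unrelated
    if parts[2].startswith("key="):
        return None  # reply to "get sys status", not an async message
    return STATUS.get(parts[2], "unknown status " + parts[2])
```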

    Sending an action

    To send an action to the kettle you send one or more action messages corresponding to the physical keys on the unit. After sending an action you'll get status messages to confirm them.

    set sys output 0x80      Select 100C button
    set sys output 0x2       Select 95C button
    set sys output 0x4000    Select 80C button
    set sys output 0x200     Select 65C button
    set sys output 0x8       Select Warm button
    set sys output 0x8005    Warm option is 5 mins
    set sys output 0x8010    Warm option is 10 mins
    set sys output 0x8020    Warm option is 20 mins
    set sys output 0x4       Select On button
    set sys output 0x0       Turn off
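Building the bytes for an action message could look like the sketch below. The codes are taken from the list above; the short button names, the function name, and the trailing newline (by analogy with "get sys status\n") are assumptions for the example.

```python
ACTIONS = {
    "100C": "0x80",
    "95C": "0x2",
    "80C": "0x4000",
    "65C": "0x200",
    "Warm": "0x8",
    "Warm 5 mins": "0x8005",
    "Warm 10 mins": "0x8010",
    "Warm 20 mins": "0x8020",
    "On": "0x4",
    "Off": "0x0",
}


def action_command(name):
    """Return the bytes to send to port 2000 for the named button.

    Assumes commands are newline terminated like the other strings
    in the protocol description. Raises KeyError for unknown names.
    """
    return ("set sys output %s\n" % ACTIONS[name]).encode("ascii")


if __name__ == "__main__":
    print(action_command("100C"))
```

The result can be written to an open socket on port 2000; the kettle should then confirm with the status messages described earlier.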

    Port 23, wifi interface

    The user manual for this wifi board is available online, so there's no need to repeat it here. The iKettle uses the device with the default password of "000000" and disables the web interface.

    If you're interested in looking at the web interface you can enable it by connecting to port 23 using telnet or nc, entering the password, then issuing the commands "AT+WEBS=1\n" then "AT+PMTF\n" then "AT+Z\n". You can then open the web interface on port 80 of the kettle and change or review the settings. I would not recommend you mess around with this interface: you could easily break the iKettle in a way that you can't easily fix. The interface gives you the option of uploading new firmware, but if you do this you could get into a state where the kettle processor can't correctly configure the interface and you're left with a broken kettle. Also the firmware is just for the wifi serial interface, not for the kettle control (the port 2000 stuff above), so there probably isn't much point.

    Missing functions

    The kettle processor knows the temperature but it doesn't expose that in any status message. I did try brute forcing the port 2000 interface using combinations of words in the dictionary, but I found no hidden features (and the folks behind the kettle confirmed there is no temperature read out). This is a shame since you could combine the temperature reading with time and figure out how full the kettle is whilst it is heating up. Hopefully they'll address this in a future revision.

    Security Implications

    The iKettle is designed to be contacted only through the local network. You don't want to be port forwarding to it through your firewall, for example, because the wifi serial interface is easily crashed by too many connections or bad packets. If you have access to a local network on which there is an iKettle you can certainly cause mischief by boiling the kettle, resetting it to factory settings, and probably even bricking it forever. However, the cleverly designed segmentation between the kettle control and wifi interface means it's pretty unlikely you can do something more serious like overriding safety (i.e. keeping the kettle element on until something physically breaks).

    "Before" and "After" video:

    We were looking for a cheap laser lighting effect for our weekend parties and saw one that looked impressive, the Lanta Quasar Buster 2, and for only £30 new. The unit has both a red and a green laser and a nice moving effect that looks like the beams split up and recombine again. It promised "sound activation" and we thought that meant it would do some clever sound to light effect, but it really does mean sound activation: it just turns itself on when it hears a sound, then off again when it's silent. So out of the box the laser has three modes; the first lets you just set the speed of the effect with the lasers constantly on, the second strobes the lasers on and off at a speed you can set, and the third is the useless sound activation mode.


    Warranty void if removed. I didn't technically "remove" the sticker though.


    Opening the unit showed that it was easily hackable; all the connections to the control panel were via connectors. One connector provides +5v to the cooling fan, another +5v to a separate board that handles powering the two lasers, another connects to the motor that turns the optics to produce the burst effect, and the final one has a logic level signal to tell the laser power board if the lasers should be on or off.

    Since the laser power board is completely separate we can just replace this control panel with one of our own and then we can control the laser on/off and the speed of the motor (actually we could control the direction too but it doesn't really make the effect look any better so I leave it as one direction). And we can always swap the original board back in the future.

    My new control board comprises an Arduino Pro Mini compatible board, a rotary encoder for setting the mode and levels, a mic with a simple opamp preamp, and an MSGEQ7 chip to do all the hard work of analysing the levels of various frequencies. The optics motor is now simply driven using a PWM output via a MOSFET I had spare.



    Rough source and circuit diagram are available from github; some components don't have values if it doesn't really matter and others (like the MOSFET) can be changed as I just used things I happened to have in my component boxes. I'm still playing with different effects in software to see what works best.

    You can read my Enterprise Linux 6.3 to 6.4 risk report on the Red Hat Security Blog.

    "for all packages, from release of 6.3 up to and including 6.4, we shipped 108 advisories to address 311 vulnerabilities. 18 advisories were rated critical, 28 were important, and the remaining 62 were moderate and low."

    "Updates to correct 77 of the 78 critical vulnerabilities were available via Red Hat Network either the same day or the next calendar day after the issues were public. The other one was in OpenJDK 1.60 where the update took 4 calendar days (over a weekend)."

    And if you are interested in how the figures were calculated, here is the working out:

    Note that we can't just use a date range because we pushed some RHSAs in the weeks before 6.4 that were not included in the 6.4 spin. These issues will get included when we do the 6.4 to 6.5 report (as anyone installing 6.4 will have got them when they first updated).

    So just after 6.4 before anything else was pushed that day:

    ** Product: Red Hat Enterprise Linux 6 server (all packages)
    ** Dates: 20101110 - 20130221 (835 days)
    ** 397 advisories (C=55 I=109 L=47 M=186 )
    ** 1151 vulnerabilities (C=198 I=185 L=279 M=489 )
    ** Product: Red Hat Enterprise Linux 6 Server (default installation packages)
    ** Dates: 20101110 - 20130221 (835 days)
    ** 177 advisories (C=11 I=71 L=19 M=76 )
    ** 579 vulnerabilities (C=35 I=133 L=159 M=252 )

    And we need to exclude errata released before 2013-02-21 but not in 6.4:

    RHSA-2013:0273 [critical, default]
    RHSA-2013:0275 [important, not default]
    RHSA-2013:0272 [critical, not default]
    RHSA-2013:0271 [critical, not default]
    RHSA-2013:0270 [moderate, not default]
    RHSA-2013:0269 [moderate, not default]
    RHSA-2013:0250 [moderate, default]
    RHSA-2013:0247 [important, not default]
    RHSA-2013:0245 [critical, default]
    RHSA-2013:0219 [moderate, default]
    RHSA-2013:0216 [important, default]
    Default vulns from above: critical:12 important:2 moderate:16 low:3
    Non-Default vulns from above: critical:4 important:2 moderate:5 low:0

    This gives us "Fixed between GA and 6.4 iso":

    ** Product: Red Hat Enterprise Linux 6 server (all packages)
    ** Dates: 20101110 - 20130221 (835 days)
    ** 386 advisories (C=51 I=106 L=47 M=182 )
    ** 1107 vulnerabilities (C=182 I=181 L=276 M=468 )
    ** Product: Red Hat Enterprise Linux 6 Server (default installation packages)
    ** Dates: 20101110 - 20130221 (835 days)
    ** 172 advisories (C=9 I=70 L=19 M=74 )
    ** 546 vulnerabilities (C=23 I=131 L=156 M=236 )

    And taken from the last report "Fixed between GA and 6.3 iso":

    ** Product: Red Hat Enterprise Linux 6 server (all packages)
    ** Dates: 20101110 - 20120620 (589 days)
    ** 278 advisories (C=33 I=78 L=31 M=136 )
    ** 796 vulnerabilities (C=104 I=140 L=196 M=356 )
    ** Product: Red Hat Enterprise Linux 6 Server (default installation packages)
    ** Dates: 20101110 - 20120620 (589 days)
    ** 134 advisories (C=6 I=56 L=15 M=57 )
    ** 438 vulnerabilities (C=16 I=110 L=126 M=186 )

    Therefore between 6.3 iso and 6.4 iso:

    ** Product: Red Hat Enterprise Linux 6 server (all packages)
    ** Dates: 20120621 - 20130221 (246 days)
    ** 108 advisories (C=18 I=28 L=16 M=46 )
    ** 311 vulnerabilities (C=78 I=41 L=80 M=112 )
    ** Product: Red Hat Enterprise Linux 6 Server (default installation packages)
    ** Dates: 20120621 - 20130221 (246 days)
    ** 38 advisories (C=3 I=14 L=4 M=17 )
    ** 108 vulnerabilities (C=7 I=21 L=30 M=50 )
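The "all packages" working out above can be replayed in a few lines. The numbers are copied from the tables; the helper name is made up, and treating the excluded vulnerability counts as "default plus non-default" is my reading of the two summary lines under the excluded errata list.

```python
def minus(a, b):
    """Subtract per-severity counts (C, I, L, M) of b from a."""
    return {sev: a[sev] - b[sev] for sev in a}


# GA up to 2013-02-21, all packages, before excluding anything.
raw_advisories = {"C": 55, "I": 109, "L": 47, "M": 186}
raw_vulns = {"C": 198, "I": 185, "L": 279, "M": 489}

# Severities of the 11 errata released before 2013-02-21 but not in 6.4.
excluded = ["C", "I", "C", "C", "M", "M", "M", "I", "C", "M", "I"]
excluded_advisories = {sev: excluded.count(sev) for sev in "CILM"}

# Excluded vulnerabilities: default plus non-default counts.
excluded_vulns = {"C": 12 + 4, "I": 2 + 2, "L": 3 + 0, "M": 16 + 5}

# "Fixed between GA and 6.4 iso"
ga_to_64_adv = minus(raw_advisories, excluded_advisories)
ga_to_64_vulns = minus(raw_vulns, excluded_vulns)

# "Fixed between GA and 6.3 iso", from the previous report.
ga_to_63_adv = {"C": 33, "I": 78, "L": 31, "M": 136}
ga_to_63_vulns = {"C": 104, "I": 140, "L": 196, "M": 356}

# Between 6.3 iso and 6.4 iso.
adv = minus(ga_to_64_adv, ga_to_63_adv)
vulns = minus(ga_to_64_vulns, ga_to_63_vulns)

if __name__ == "__main__":
    print(sum(adv.values()), "advisories", adv)
    print(sum(vulns.values()), "vulnerabilities", vulns)
```

Running it gives 108 advisories (C=18 I=28 L=16 M=46) and 311 vulnerabilities (C=78 I=41 L=80 M=112), matching the figures above.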

    Note: although we have 3 default criticals, they are in openjdk-1.6.0, but we only call Java issues critical if they can be exploited via a browser, and in RHEL6 the Java browser plugin is in the icedtea-web package, which isn't a default package. So that means on a default install you don't get Java plugins running in your browser, so really these are not default criticals in RHEL6 default at all.

    You can read my Enterprise Linux 6.2 to 6.3 risk report on the Red Hat Security Blog.
    "for all packages, from release of 6.2 up to and including 6.3, we shipped 88 advisories to address 233 vulnerabilities. 15 advisories were rated critical, 23 were important, and the remaining 50 were moderate and low."

    "Updates to correct 34 of the 36 critical vulnerabilities were available via Red Hat Network either the same day or the next calendar day after the issues were public. The Kerberos telnet flaw was fixed in 2 calendar days as the issue was published on Christmas day. The second PHP flaw took 4 calendar days (over a weekend) as the initial fix released upstream was incomplete."

    And if you are interested in how the figures were calculated, as always view the source of this blog entry.


    Hi! I'm Mark Cox. This blog gives my thoughts on security work, open source, home automation, and other topics.