About Me

Hi, my name is Tom Purl and this is where I share my experiences playing around with technology and other fun stuff.

Lessons Learned From Yet Another Jekyll Convert

Date published: 14 Dec 2011

I recently converted my blog hosted on wordpress.com to use the Jekyll static blog generator. I'm pretty happy with the results, and it was a fun process getting here, so I thought I would share my lessons learned.

Lesson #1 - The Docs Aren't Great

First, let me say that Jekyll was not intended to be a commercial product. Hell, it wasn't even designed to be a terribly popular project. It was a side-project of one of the Github creators, Tom Preston-Werner. So it's easy to understand why there isn't a ton of good documentation.

So what do we have? Well, we have a lot of great tutorials from bloggers who have learned to use Jekyll. After reading a few of the better ones, you should be on your way to rolling your own awesome blog. Here are some of the ones that I really liked:

Lesson #2 - There's No Official Jekyll Skeleton

So what questions are left unanswered? Well, for starters, how do you create a site?!? The Jekyll usage guide does a decent job showing you which files are necessary, but it doesn't actually tell you what those files need to contain. Instead, you're supposed to clone the source for someone else's site on Github (or Bitbucket or whatever) and then change it to fit your needs.

So here's basically how I created my "base" Jekyll site, which didn't include any blog content:

  1. I cloned Tom Preston-Werner's site from Github.
  2. I deleted his CSS files (because I wanted a site that looked very different).
  3. I deleted his blog entries.

I also had to install the following gems:

  • redcloth
  • RDiscount

That's it! Of course, I had a very ugly and empty blog at this point, but I had the bare essentials that I needed to start using Jekyll.

Lesson #3 - Converting HTML to Markdown Is Tricky

Remember, my blog was previously hosted on Wordpress.com, which meant that I had to save my posts as HTML. I'm not a huge fan of writing my blog posts using HTML, so I decided to switch to one of the other markup languages that Jekyll supports, Markdown.

This was great for all new posts, but did I really need to convert my old posts? Doesn't Jekyll support HTML too? Well yes, it does, but the HTML that I was able to extract using the converter that came with the Jekyll gem was pretty messy.

So I decided to make every blog post use Markdown, hell or high water. Not only would it make all of my posts compatible with Jekyll, but it would make it easier to edit or convert my old posts in the future.

The Script

I therefore wrote the following script to help:

This script basically does the following:

  1. It writes the YAML front matter to a .md file.
  2. It then tries to convert the HTML content to Markdown using html2text.py. This script does a very good job of converting HTML to Markdown, but it failed for me about 40% of the time.
  3. If html2text.py does fail, it converts the HTML using pandoc, which is much more reliable but worse at generating perfect output.
  4. It writes the Markdown output to the same .md file that contains your YAML front matter.
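
For illustration, here is a minimal sketch of those four steps. It is not the original htmlplusyml2mkd.sh: the function name convert_post, the front matter fields, and the rule of taking the title from the file name are all assumptions, and it expects html2text.py and pandoc to be on your PATH.

```shell
#!/bin/sh
# Hypothetical sketch of the four steps above, NOT the original script.
# Both converters can be overridden through the environment.
HTML2TEXT="${HTML2TEXT:-html2text.py}"
PANDOC="${PANDOC:-pandoc}"

convert_post() {
    html_file="$1"
    md_file="${html_file%.html}.md"
    # Assumption: the post title is simply the file name without .html.
    title="$(basename "$html_file" .html)"

    # Step 1: write the YAML front matter to the .md file.
    printf '%s\n' '---' 'layout: post' "title: \"$title\"" '---' '' > "$md_file"

    # Step 2: try html2text.py first.
    if body="$($HTML2TEXT "$html_file" 2>/dev/null)"; then
        # Step 4: append the Markdown to the same .md file.
        printf '%s\n' "$body" >> "$md_file"
    else
        # Step 3: fall back to pandoc, then step 4 again.
        $PANDOC -f html -t markdown "$html_file" >> "$md_file"
    fi
}
```

Calling convert_post some-post.html would leave some-post.md next to the HTML file, which is what the loop below expects.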

I stored all of my exported HTML files in a folder called _archivedposts. Here's how I generated my blog's content:

$ cd _archivedposts
$ for f in *.html; do ../htmlplusyml2mkd.sh "$f"; done
$ mv *.md ../_posts

Cleanup

Of course, neither html2text.py nor pandoc is perfect, so a lot of my blog posts were a little mixed up. Manually cleaning up every single one of my blog posts would have been a major waste of time and effort for me, so I did the following:

  1. I checked my Wordpress stats to see what my most "popular" posts were.
  2. I made a list of every post that had more than 50 page views (which ain't bad for my site).
  3. I manually made the final touches on those files and ignored the rest.

If you do find a post on my blog that looks a bit jumbled, then I apologize, but it just wasn't worth my time to fix it manually.

Lesson #4 - Creating Your Own Website From Scratch Is Fun

I used to spend hours every week in college manually tweaking the HTML and JavaScript in my web sites in the Sun lab. It was lots of fun creating something that I could share with the entire world using these new and exciting (at the time) technologies.

Then I got a little older and busier, and while I still loved to write on my web site, I didn't want to have to take care of every single aspect of it any more. So I started using tools like Plone and Wordpress to author content. They had nice little WYSIWYG editors, and someone else worried about things like style, usability, and performance.

Making my new blog from (near) scratch forced me to think outside of my usual box about those icky things, and I'm really glad it did. For starters, it gave me an opportunity to use the underused, more artistic part of my brain. Also, it gave me an opportunity to learn about new web standards and tools for managing and creating a modern web site, such as:

  • Chrome's "Developer Tools": Chrome has a built-in module that helps you do things like design and profile a web site. This tool was especially useful to me when I was tweaking my CSS.
  • Google Web Fonts: Did you know that there were web apps that did nothing but serve up pretty fonts that could be used by other web sites? Me neither, until I started looking into Typekit and Google Web Fonts.
  • Google Analytics: One of my favorite things about Wordpress.com is that they have a great statistics page that you can use to gauge the popularity of your blog. However, you can also use Google Analytics to gather the same basic statistics (and more) for your static blog. And you can do this all for free.

Lesson #5 - Converting Comments Is Hard

Simply put, Disqus choked every time I tried to convert my Wordpress.com comments over, so I just skipped this step. I hope that I don't offend anyone who's left a comment on my blog in the past, but I only had a handful in the first place.

Lesson #6 - There's A Jekyll Fork That Makes Some Of These Hard Things Easier

There are lots of Jekyll forks out there that take care of a lot of the gripes that you see above, but I'm sticking with the canonical copy for now to make things a little simpler.

Awestruct does look especially compelling to me, and it seems to be pretty well supported. Once I'm a little more comfortable with my new site and Jekyll in general, I'll give it a second look.

My Repo

I published the source for my blog here:

It's a little rough, but hopefully it's a good jumping off point.

Good luck!

Switch To Jekyll

Date published: 09 Dec 2011

For the 3 people who may notice, I switched my blog from Wordpress to Jekyll. The conversion has been a fun experience, because it gave me the opportunity to "roll my own" website instead of relying on the good people at Wordpress.com to do everything for me.

I hope to write a blog post about the entire conversion soon. But in the meantime, if you notice anything strange or have any problems, then I would really appreciate it if you would drop me a line.

Enjoy!

Exim + Gmail On Ubuntu

Date published: 17 Sep 2011

This tutorial shows you how to set up a light-weight mail server on your Ubuntu system that can send mail to host-only (e.g. tom) and remote (e.g. tom@tompurl.com) addresses using Gmail as your SMTP server.

So what the heck does that mean? We’re making it possible for you (and various programs on your computer) to do the following:

$ echo "Hello!" | mail -s "This is cool" tom # Sent to /var/mail/tom spool
$ echo "Hello!" | mail -s "This is cool" tom@tompurl.com # Sent to my Gmail account

So now you may be asking yourself “why would anyone want to do something like this on a desktop machine that isn’t a mail server? Can’t you just send email using programs like the Gmail web client and Thunderbird?” You certainly can, but it’s not always the best choice.

For example, if you wanted to send an email message from a shell script, the easiest way to do that is to use the mail command above. Also, your system may want to send you a message if something weird happens, like a failed cronjob. Without a working mail server like Exim installed and configured, those messages are going to end up in /dev/null. So let’s get started :)
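
As a rough illustration of the shell-script case, the sketch below runs a command and mails its output to you only when it fails. The function name run_and_notify and the MAILER variable are made up for this example; the only real dependency is the mail command shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: run a command and mail its output if it fails.
# MAILER is overridable so you can try the function before mail works.
MAILER="${MAILER:-mail}"

run_and_notify() {
    recipient="$1"
    shift

    # Capture stdout and stderr, remembering the exit status.
    status=0
    output="$("$@" 2>&1)" || status=$?

    if [ "$status" -ne 0 ]; then
        # Same form as the mail commands above: body on stdin, -s for subject.
        printf '%s\n' "$output" | $MAILER -s "FAILED ($status): $*" "$recipient"
    fi
    return "$status"
}
```

For example, run_and_notify tom /path/to/backup.sh (a placeholder path) would stay silent on success and drop a message into the local spool on failure.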

Prerequisites

This tutorial is designed to work with Ubuntu Linux 11.04, but it may work with other versions of Ubuntu and Debian Linux. Here are all of the pertinent software versions that I’m using:

exim4-base                4.74-1ubuntu1.2
exim4-config              4.74-1ubuntu1.2
exim4-daemon-light        4.74-1ubuntu1.2
libmailutils2             1:2.1+dfsg1-7build1
mailutils                 1:2.1+dfsg1-7build1
mutt                      1.5.21-2ubuntu3

I used this tutorial on using Exim with Gmail to set up outgoing mail. If my instructions below don’t work for you, then that tutorial may be able to help.

Software Installation

This part is super easy:

$ sudo apt-get install exim4-base mailutils mutt

Note: We’re using exim (the Debian default) as our mail server instead of postfix, which is the default mail server in the Ubuntu world. You probably don’t care, and for 99% of you it shouldn’t matter. I’m just pointing it out because this is an Ubuntu-centric tutorial.

The mailutils package gives you a lightweight version of the exim daemon along with the mail and mailx programs, which are pretty important if you ever want to be notified by your system when something strange happens.

Finally, we’re installing mutt, which is a mail reader that you can use in a console. Please note that you will need to install this program (or something similar) if you want to read mail that is sent to you by your system. Showing you how to use mutt is beyond the scope of this tutorial, but if you need some basic guidance, then I recommend My First Mutt.

Configuration

First, let’s configure exim with debconf using the following command:

$ sudo dpkg-reconfigure exim4-config

You will now be presented with a configuration wizard. Here’s what I chose:

  • Server Type
    • smarthost + smtp
  • System mail name
    • <your host name>
  • listening ip address
    • 127.0.0.1 ; ::1
  • Other destinations
    • <your host name>
  • machines to relay for
    • <blank>
  • smarthost ip address
    • smtp.gmail.com::587
  • Hide local mail name
    • No
  • DNS Queries
    • No
  • Delivery method
    • Mbox
  • Split config?
    • Yes
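
If you want to double-check what the wizard saved, the sketch below prints the relevant answers. It assumes that exim4-config stores them in /etc/exim4/update-exim4.conf.conf under dc_* keys, which is worth verifying on your own system. With the choices above you would expect values along the lines of dc_eximconfig_configtype='smarthost' and dc_smarthost='smtp.gmail.com::587'. The function name show_exim_choices is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the debconf answers that matter for this setup.
# Assumption: answers live in update-exim4.conf.conf as dc_* shell variables.
show_exim_choices() {
    conf="${1:-/etc/exim4/update-exim4.conf.conf}"
    grep -E "^dc_(eximconfig_configtype|local_interfaces|smarthost|use_split_config|localdelivery)=" "$conf"
}
```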

Next, execute the following command:

$ sudo chown root:Debian-exim /etc/exim4/passwd.client

The only step left is to specify your Gmail password. Open /etc/exim4/passwd.client and add something like this at the bottom of the file:

*.google.com:tom@tompurl.com:somethingClever

Of course, you’ll want to replace the email address and password :) Please note that this config works with normal Gmail accounts and accounts that use Google Apps For Your Domain (like mine).

Testing

Now let’s run a couple of simple tests:

# Please replace "me" with your user account name and verify in mutt
$ echo test | mail -s "test" me # Sends mail to /var/mail/me spool
# Please replace "me@gmail.com" with your actual Gmail address
$ echo test | mail -s "test" me@gmail.com # verify using Gmail

Conclusion

That’s it! I hope that I’ve been able to help a few other people

Installing Graphite On Ubuntu 10.04 LTS

Date published: 12 Aug 2011

10/27/11 Update - The instructions below work with version 0.9.8 of Graphite. A new version (0.9.9) has been released that requires a few more steps. I haven't had time to test out the new version myself yet, but I've been told that the following tutorial does a good job of showing you how to install the latest version.


This tutorial shows you how I installed Graphite, a fantastic tool for visualizing time-series data, on an Ubuntu 10.04 LTS system. The process is split up into 4 steps:

  • Installing and testing Graphite and Carbon in “dev” mode
  • Integrating Graphite with Apache
  • Making Carbon a managed service
  • Password-protecting your Graphite site

Installing In Dev Mode

By “dev” mode, I mean that we’re going to install, run and test Graphite and Carbon in a “quick-and-dirty” way. You will run all services using your personal account and you won’t integrate it with a web server (yet). So why am I doing this? Well, usually it takes less time to set up an app this way, which saves me time when evaluating new software. Also, I find that you learn a little more about the “guts” of a new application when you start this way. Of course, once you have evaluated Graphite and decided to install it on a separate system, you should skip the “Dev Mode” step and just install it as a managed service (which I explain later).

Installation

First, let’s install everything that we can using apt-get:

$ sudo apt-get install bzr python-cairo python-django

The bzr program will be used to download the Graphite source files. The other packages will support Graphite at runtime. Next, download the source and compile Graphite:

$ cd ~/src
$ bzr branch lp:graphite
$ cd graphite
$ python ./setup.py build
$ sudo python ./setup.py install

Note: The last step will install the executables under /opt/graphite.

Next, we’ll install Whisper, the custom database that Graphite uses:

$ cd ~/src/graphite/whisper
$ python ./setup.py build
$ sudo python ./setup.py install

Finally, let’s install Carbon. Carbon is an agent that listens for readings and writes them to the Whisper databases:

$ cd ~/src/graphite/carbon
$ python ./setup.py build
$ sudo python ./setup.py install

Now let’s configure Carbon:

$ cd /opt/graphite/conf
$ sudo cp carbon.conf.example carbon.conf
$ sudo cp storage-schemas.conf.example storage-schemas.conf

Please note that you will probably want to reconfigure the storage-schemas.conf file soon. We are using the defaults now because we just want to get a base system up-and-running.

Now, since we’re still in “dev” mode, let’s make our experience a little bit nicer by making your regular user account the owner of the /opt/graphite folder. This will make it easier for you to do things like change config options and restart services. Don’t worry, we’re going to fix this eventually:

$ cd /opt
$ sudo chown -R myid:myid graphite

Of course, you would replace the myid value with your login name. Now we are ready to initialize the Whisper database. Execute the following commands:

$ cd /opt/graphite
$ PYTHONPATH=`pwd`/webapp:`pwd`/whisper python ./webapp/graphite/manage.py syncdb

That last command will generate your initial databases and prompt you to create a Django user. This user account will allow you to log into Graphite, and it is a web application user that is managed by the Django library. I recommend creating the user, especially if you are not very familiar with Django as a framework.

Note: Like most Django apps, you can manage this user and add others later by visiting http://your-graphite-url:8080/admin

OK, There’s one more configuration step that you need to run. Execute the following:

$ echo DEBUG = True > /opt/graphite/webapp/graphite/local_settings.py

Testing

Now for the fun part. Let’s fire up the web UI:

$ cd /opt/graphite
$ PYTHONPATH=`pwd`/whisper ./bin/run-graphite-devel-server.py --libs=`pwd`/webapp/ /opt/graphite/

You should now be able to visit http://localhost:8080 and see a very nice web application. If you’re hosting this application on a VM or separate machine, then simply replace “localhost” with the IP address of that machine. The web app should now be running, but there’s not really any data yet. To get some, we need to do the following:

  1. Start carbon, which listens for data and writes it to the whisper databases
  2. Start feeding it some data using a test client.

Number 1 is pretty easy:

$ cd /opt/graphite
$ PYTHONPATH=`pwd`/whisper ./carbon/bin/carbon-cache.py --debug start

Now that your web app and data collection daemon are running, let’s start feeding it some data:

$ ~/src/graphite/examples/example-client.py

This script will create the following monitors in Graphite:

  • Graphite -> system -> loadavg_15min
  • Graphite -> system -> loadavg_1min
  • Graphite -> system -> loadavg_5min

Clicking on a monitor shows its values in the graph. Clicking on the same monitor again deselects it.

Note: If you’re not seeing any data immediately, don’t worry. Check it again in 5 minutes.

The example client writes data to Graphite once per minute, so you should start seeing results soon.
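
If you would rather feed Carbon from a shell script than from the example client, the sketch below shows the general idea. It assumes that Carbon's plaintext listener is on its usual port 2003 and accepts lines of the form "metric.path value timestamp". The function names and the metric name system.loadavg_1min are made up for this example, and send_loadavg relies on the Linux-only /proc/loadavg file and on nc being installed.

```shell
#!/bin/sh
# Hypothetical sketch: send one reading to Carbon's plaintext listener.
# Assumption: Carbon is listening on port 2003 of CARBON_HOST.
CARBON_HOST="${CARBON_HOST:-localhost}"
CARBON_PORT="${CARBON_PORT:-2003}"

# Build one line: <metric path> <value> <unix timestamp>.
# The timestamp defaults to "now" when it is not given.
format_metric() {
    printf '%s %s %s\n' "$1" "$2" "${3:-$(date +%s)}"
}

# Read the 1-minute load average and send it to Carbon.
send_loadavg() {
    read -r one _rest < /proc/loadavg
    format_metric system.loadavg_1min "$one" | nc -w 1 "$CARBON_HOST" "$CARBON_PORT"
}
```

Running send_loadavg from cron once a minute would roughly mimic what the example client does for a single metric.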

Integrating With Apache

Now that you know that Graphite and Carbon work, let’s make them both managed services. By that, I mean that I don’t want to have to start any daemons manually when I restart my system. Carbon and Graphite should just work. Also, Graphite will perform much better once it is hosted on an Apache instance.

Configuration

First, let’s install the dependencies:

$ sudo apt-get install apache2 libapache2-mod-wsgi

We’re going to run our Graphite instance as a virtual host. The preferred way of doing this on Debian-based Linux distributions (like Ubuntu) is to create a vhost file and then enable it using the Debian Apache helpers. Lucky for us, there’s an example vhost file called ~/src/graphite/examples/example-graphite-vhost.conf.

Execute the following commands:

$ cd ~/src/graphite/examples
$ cp example-graphite-vhost.conf graphite-vhost.conf

Now make the following changes:

  • Comment out the WSGISocketPrefix line. This value will be set in a different config file.
  • Change the @DJANGO_ROOT@ value to /usr/lib/pymodules/python2.6/django.
  • If you don’t know what value to use with your ServerName property, then just leave it as graphite.

Save your graphite-vhost.conf file and then deploy it using the following commands:

$ sudo cp graphite-vhost.conf /etc/apache2/sites-available
$ sudo a2ensite graphite-vhost.conf

That last command creates a symlink to your graphite-vhost.conf file in /etc/apache2/sites-enabled and then tells you if you need to restart Apache or simply reload it. Now let’s take care of setting the WSGISocketPrefix value:

  • Open the /etc/apache2/mods-available/wsgi.conf file using your favorite text editor.
  • Uncomment the WSGISocketPrefix line and leave the default value.

One last thing before we reload Apache. The /opt/graphite directory is still owned by your id. You need to change everything so that it is owned by the www-data user, which is the Apache user on Debian-based systems:

$ cd /opt
$ sudo chown -R www-data:www-data graphite

Now you can finally reload Apache using the following command:

$ sudo /etc/init.d/apache2 reload

Testing (And A Short ServerName Tutorial)

Now you should be able to visit your Graphite site using a URL that looks something like this:

If you know how the ServerName property in an Apache virtual host file works, then you will have no problem visiting the site, and you can jump to the next section. The rest of this section is for everyone else :)

If you don’t know how this property works, then you may try to test the Graphite site by visiting one of the following URL’s:

So why can’t you see your Graphite site? Apache cares about lots of things in your request header, but the following 3 are especially important:

  • The desired server IP address
  • The desired port
  • The ServerName value

It uses these three values to determine which vhost it will invoke for a request. Your request has parts one and two, but part three is simply graphite.ip.address. Your request will therefore be handled by the default vhost in Ubuntu, which displays the “it works” page. So we need to find a way to add the string graphite to our request header. The easiest way to do this is to actually make the URL http://graphite point at our Graphite server. Here’s how you can do that:

  1. Open up your hosts file on your client running the web browser
  2. Add the word “graphite” as an alias for the machine hosting Graphite

So, for example, let’s assume that you’re hosting Graphite on a machine that has an IP address of 10.0.0.100, and let’s assume that this machine already has an alias of “web”. Here’s what your host file looks like before the change:

10.0.0.100  web

And here’s what it should look like after the change:

10.0.0.100  web graphite

Note: Remember, we’re making these host file changes on the client, NOT the server.

Now, when you visit http://graphite, you should see the proper web site.
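
If the page still doesn't come up, the sketch below can help narrow things down. The first function checks whether a hosts file really contains the alias, and the second asks Apache for the vhost directly by sending the Host header with curl, which sidesteps the hosts file entirely. Both function names are made up for this example, and 10.0.0.100 is just the example address from above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the alias appears on a non-comment
# line of the hosts file (default /etc/hosts on the client).
has_host_alias() {
    alias_name="$1"
    hosts_file="${2:-/etc/hosts}"
    awk -v name="$alias_name" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # ignore trailing comments
                if ($i == name) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}

# Hypothetical helper: print the HTTP status Apache returns when the
# request carries the given ServerName in its Host header.
# Usage: probe_vhost 10.0.0.100 graphite
probe_vhost() {
    curl -s -o /dev/null -w '%{http_code}\n' -H "Host: $2" "http://$1/"
}
```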

Making Carbon A Managed Service

Now that the web app is running so well, let’s “fix” carbon so that we don’t have to manually start it each time we reboot the server. Carbon doesn’t come with an init script, so I’ve been using the following crude version:

#! /bin/sh
# /etc/init.d/carbon

# Some things that run always
touch /var/lock/carbon

GRAPHITE_HOME=/opt/graphite
CARBON_USER=www-data

# Carry out specific functions when asked to by the system
case "$1" in
    start)
        echo "Starting script carbon "
        su $CARBON_USER -c "cd $GRAPHITE_HOME && ./bin/carbon-cache.py start"
        ;;
    stop)
        echo "Stopping script carbon"
        su $CARBON_USER -c "cd $GRAPHITE_HOME && ./bin/carbon-cache.py stop"
        ;;
    *)
        echo "Usage: /etc/init.d/carbon {start|stop}"
        exit 1
        ;;
esac

exit 0

Save this file as /etc/init.d/carbon, and then update rc.d using this command:

$ sudo update-rc.d carbon defaults

That’s it! You can now manage your carbon process using this script, and it will be automatically restarted when you reboot your machine.

Password-Protecting Your Graphite Site

Let’s take stock of where we are:

  • You installed Graphite and Carbon
  • You integrated Graphite with Apache
  • You made Carbon a managed service

You now have everything necessary to run a “real” Graphite instance. If you don’t need anything else, then feel free to skip the rest of this tutorial. For my needs, however, I needed one more thing. I needed to host my Graphite site on the world wide web, and I didn’t want just anyone poking around in my system metrics. However, while Graphite may offer a Login link, it doesn’t give you the option of setting up a login page that can block non-authenticated access to the site.

Thankfully, there’s an easy way around this limitation. Apache gives you the ability to block non-authenticated access to a web site using the built-in security options. We’re going to manage security on our site using Basic authentication.

To do this, you first need to change your graphite-vhost.conf file. Add the following lines to the bottom of your file, just above the </VirtualHost> line:

# Set up .htaccess security so that I can protect the site online.
<Location "/">
    AuthType Basic
    AuthName "Under Construction"
    AuthUserFile /opt/graphite/sec/.mypasswds
    AuthGroupFile /opt/graphite/sec/.mygroups
    Require group managers
</Location>

Next, let’s create your AuthUserFile and your AuthGroupFile:

$ cd /opt/graphite
$ sudo mkdir sec
$ sudo htpasswd -c ./sec/.mypasswds some_user_name
(enter a strong password)
$ echo 'managers: some_user_name' | sudo tee -a ./sec/.mygroups
$ sudo chown -R www-data:www-data ./sec
$ sudo chmod 700 ./sec
$ sudo chmod 600 ./sec/.mypasswds ./sec/.mygroups
$ sudo /etc/init.d/apache2 reload

That’s it! Now restart your browser, and you should see a dialog box asking you to log in when you visit your Graphite site.

Note: This configuration is only good enough to keep out the riff-raff. If you have more robust security needs, then you will want to look into using SSL.
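
One easy mistake with this setup is listing a user in the group file who has no entry in the password file, in which case that user can never log in. The sketch below checks for that. The function name check_group_members is made up for this example, and it assumes the usual file layouts: "user:hash" lines in the password file and "group: user1 user2" lines in the group file.

```shell
#!/bin/sh
# Hypothetical helper: report group members with no password entry.
# Usage: check_group_members managers ./sec/.mypasswds ./sec/.mygroups
check_group_members() {
    group="$1"
    passwd_file="$2"
    group_file="$3"
    missing=0

    # Pull the space-separated member list for the requested group.
    members="$(sed -n "s/^$group:[[:space:]]*//p" "$group_file")"

    for user in $members; do
        if ! grep -q "^$user:" "$passwd_file"; then
            echo "missing from password file: $user"
            missing=1
        fi
    done
    return "$missing"
}
```

Because the files are only readable by www-data, you would likely need to run this with sudo.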

Conclusion

I hope that some people find this tutorial to be helpful. If you find any errors or you have any suggestions, then please feel free to point them out in the comments.

GNU Screen Sugar

Date published: 02 Aug 2011

I don’t usually advertise my Github projects on this blog, but I thought that this one might actually be kinda useful to you if you like GNU Screen: tompurl/Scrugar

Basically, it’s a couple of functions and aliases that I use to make it easier to use Screen. A few of my friends have found them to be pretty useful, so I decided to share them.