Tuesday, 28 June 2011

Apache 2.2: PAM authentication and SSL made easy.

The problem

If you run an Apache webserver and need to authenticate web users against system accounts with a central authentication service (LDAP, NIS, Kerberos), you previously had two basic choices:
  1. Use the specific authentication modules, e.g. auth_kerb or authnz_ldap
  2. Use auth_pam
I don't like option 1 - if you need to change your backend scheme (e.g. augment LDAP with Kerberos, or switch the other way) you end up with references to LDAP or Kerberos sprinkled everywhere. That is a matter of opinion though - if you do want to do direct authentication from Apache, you may still find elements below of use with a little adaptation.

It would also be cute to accept HTTP requests and redirect them to HTTPS, rather than just denying them with an SSLRequireSSL statement.

I am greatly in favour of PAM - it was designed to bring authentication into one place and it offers a lot of additional flexibility. I used to use auth_pam but it seems that the module is dead due to Apache 2.2 API changes.

However, there is a very nice alternative: authnz_external. authnz_external forms a link between Apache's authentication phase and an external program, which is handed the username and password on a pipe. All the program has to do is perform the authentication step and return a code to authnz_external indicating success or the mode of failure. pwauth is one such readily available program, but as the program is decoupled from Apache's API, it's pretty easy to write your own.
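For illustration, here is a minimal sketch of such a program in Python. The credentials check is a stub, purely for demonstration - substitute a call to your real backend (PAM, LDAP, a database...):

#!/usr/bin/env python
# Hypothetical external authenticator for authnz_external's "pipe" method:
# the login arrives on the first line of stdin, the password on the second.
# Exit status 0 signals success, non-zero signals failure.
import sys

login = sys.stdin.readline().rstrip('\n')
password = sys.stdin.readline().rstrip('\n')

# Stub check - replace with your real authentication step
if login == 'demo' and password == 'demo-password':
    sys.exit(0)
sys.exit(1)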

As it stands, pwauth authenticates via the PAM service "pwauth", which makes configuration a breeze. What authnz_external does not do is handle group membership, but it can be used in conjunction with authz_unixgroup to handle that.

Another problem is that you generally want to force HTTPS/SSL on for authenticated HTTP to protect against password sniffing. I'd like to present my solution, which seems flexible and not prone to accidental misconfiguration. This is based on a Debian 6 system, but it should be applicable to any Apache 2.2 installation and fairly easy to adapt.

Worked example

mkdir /etc/apache2/snippets

Add the following files and contents:

/etc/apache2/snippets/redirect-https
# Rewrite non SSL to SSL via 301 perm redirect
#
RewriteEngine on
#
# Case 1 redirect port 80 to SSL
RewriteCond %{HTTPS} !=on
RewriteCond %{SERVER_PORT} =80
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [R=301,L]
#
# Case 2 redirect port 8080 to SSL
RewriteCond %{HTTPS} !=on
RewriteCond %{SERVER_PORT} =8080
RewriteRule ^ https://%{SERVER_NAME}:8443%{REQUEST_URI} [R=301,L]

[Case 2 is optional and merely demonstrates how to handle alternative cases]

/etc/apache2/snippets/authload
# Set up authnz_external to pwauth
#
DefineExternalAuth auth_pam pipe /usr/sbin/pwauth

/etc/apache2/snippets/auth
# Set up auth and force user onto HTTPS
#
# Do the force to HTTPS
        Include /etc/apache2/snippets/redirect-https
#
# Set up auth external (uses pwauth, needs snippets/authload)
#
        AuthType Basic
        AuthBasicProvider external
        AuthExternal auth_pam
        AuthName "DDH at King's College London"
# Check unix (via NSS) groups
        AuthzUnixgroup on
# Here be magic - needs an env var "SSL_ON" set for all HTTPS connections
        Order Deny,Allow
        Deny from all
        Allow from env=!SSL_ON
# More magic - if non SSL, we allow with no auth, but redirect above then fires
# so no page served.
# Next time round, HTTPS connection fails the Allow test so falls back to Auth checks
        Satisfy any
# All you need is the appropriate "Require" directive after the Include of this snippet,
# because the Require will vary by vhost and/or location.

/etc/apache2/snippets/enablessl
# Enable SSL and set SSL_ON environment variable 
SSLEngine On
RewriteEngine on
RewriteRule ^ - [E=SSL_ON]

Usage is pretty easy:

In your vhost config:
<VirtualHost *:80>
        Include /etc/apache2/sites-available/yoursite.d/globalconfig
</VirtualHost>
<VirtualHost *:443>
        Include /etc/apache2/snippets/enablessl
        Include /etc/apache2/snippets/authload
        Include /etc/apache2/sites-available/yoursite.d/globalconfig
</VirtualHost>

and /etc/apache2/sites-available/yoursite.d/globalconfig
ServerName www.example.com
ServerAdmin webmaster@example.com
DocumentRoot /var/www/www.example.com
ErrorLog /var/log/apache2/www.example.com-error.log
CustomLog /var/log/apache2/www.example.com-access.log combined
<Directory /var/www/www.example.com>
    # Whatever
</Directory>
<Location />
    Include /etc/apache2/snippets/auth
    Require group group1 [... group2 etc]
# or
    Require user user1 [... user2 etc]
# and optionally to allow unauthenticated local access:
    Allow from 10.0.1.0/24
</Location>

Explanation

enablessl sets an Apache environment variable SSL_ON for any HTTPS connection (this is an Apache-internal variable, not an OS-level one). This variable is likely to make it through to CGI or WSGI scripts.
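For example, a hypothetical Python CGI script can pick it up from its process environment. Note that Apache renames such variables to REDIRECT_SSL_ON after an internal redirect, and the E=SSL_ON flag sets an empty value, so test for presence rather than truth:

#!/usr/bin/env python
# Hypothetical CGI script showing the SSL_ON variable arriving from Apache.
import os

print('Content-Type: text/plain')
print('')
if 'SSL_ON' in os.environ or 'REDIRECT_SSL_ON' in os.environ:
    print('Connection is HTTPS')
else:
    print('Connection is plain HTTP')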

authload sets up authnz_external (auth_pam here is merely a local identifier and can be anything, as long as you change all occurrences of it).

auth is the hard part. If a request arrives here with SSL_ON set, then the normal behaviour applies: the Auth settings are logically OR'd with any other Allow statements (courtesy of Satisfy any). If the request arrives here without SSL_ON set then we have a problem: we want the redirect rule to fire, but unfortunately Apache applies the Auth and Allow statements first. To get around this, we use the line Allow from env=!SSL_ON, which bypasses any other Allow and Auth rules and allows the request to proceed. This is counter-intuitive, as we do not actually serve the usual target of this request. Instead, this block is satisfied:
RewriteCond %{HTTPS} !=on
RewriteCond %{SERVER_PORT} =80
RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [R=301,L]
The last statement issues a permanent (301) redirect telling the browser to come back to the same URI, but over HTTPS.
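You can watch the whole dance with curl. Against a hypothetical protected vhost, the exchange should look something like this:

$ curl -I http://www.example.com/
HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/

$ curl -I https://www.example.com/
HTTP/1.1 401 Authorization Required
WWW-Authenticate: Basic realm="DDH at King's College London"

Supplying credentials (curl -u user:password) should then get you a 200.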

The <Location> block may be applied to one or more sub-URLs if desired.

Caveats

Don't forget to enable the relevant modules with a2enmod.
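On Debian, that is something like the following (module names may vary with your packaging):

a2enmod rewrite
a2enmod ssl
a2enmod authnz_external
a2enmod authz_unixgroup
/etc/init.d/apache2 restart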

It's a pretty stable solution, but you must be careful not to have a Satisfy All statement in the same scope, or the association between Auth and Allow will be changed from a logical-OR to a logical-AND, which will break the scheme.

Generally, you should be careful with any other Auth, Allow or Rewrite rules. Rewrite rules performing other tasks are fine, but they should come after these lines:
Include /etc/apache2/snippets/enablessl
Include /etc/apache2/snippets/authload

Allow statements should only come after Include /etc/apache2/snippets/auth.

Don't forget to set up /etc/pam.d/pwauth - this is too system-specific to cover fully here. You could start by copying one of the other services' configs to it, unless your OS has set it up for you.
You may want a trimmed-down config that avoids trying local passwd/shadow auth and only uses your external service.
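As a purely illustrative sketch, a minimal /etc/pam.d/pwauth using standard Unix passwords might look like the following; swap pam_unix.so for pam_ldap.so or pam_krb5.so (plus whatever options your site needs) to use only your external service:

# /etc/pam.d/pwauth - minimal illustrative example only
auth     required   pam_unix.so
account  required   pam_unix.so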

Be aware that pwauth is hard-coded to disallow UIDs below 500. This is a #define in the source, so it is pretty easy to change and rebuild if required.

I recommend testing pwauth on the command line with some test accounts to verify that it is doing what you think it should.
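pwauth follows the same protocol as authnz_external: the username and password arrive on consecutive lines of stdin, and the exit status reports the result, 0 meaning success. With a hypothetical test account:

$ printf 'testuser\ntestpassword\n' | /usr/sbin/pwauth
$ echo $?
0

A non-zero status indicates the mode of failure.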

auth_kerb

This is a rather special case. Some bright spark decided that the KrbStripRealm statement didn't belong, and that modification of the supplied username (i.e. stripping the @realm part) should really be handled by another, more general ID-mapping module. I agree with the reasoning, but until such a general mapping module actually exists (not that I could find one), it was a bit off in my opinion to remove it, making auth_kerb useless in a great many installations.

If this applies to you, you may find the authnz_external method above useful. What you will lose is the ability to handle GSSAPI authentication from browsers that support it. If that is important to you, people have reported being able to patch the KrbStripRealm option back in.

License

Use what you want. For the pedants amongst you, the above code snippets are licensed under the BSD licence - do what you like :)

Acknowledgements

This is born out of my work with the Department of Digital Humanities, King's College London, and credit is due in part to a number of blogs and group comments around the internet.

Friday, 3 June 2011

Fancy free fonts for your website

I guess this might be a well known fact but it wasn't for me...

http://www.google.com/webfonts

gives a quick and almost trivially simple way to jazz up your website with the same set of fonts that blogger.com offers.
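For example, pulling in a single font (Lobster here, purely as an example) is just a stylesheet link plus a font-family rule:

<link rel="stylesheet" type="text/css"
      href="http://fonts.googleapis.com/css?family=Lobster">
<style type="text/css">
  h1 { font-family: 'Lobster', cursive; }
</style>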

Wednesday, 1 June 2011

The future of website design is gadgets. Or is it?

There's almost no need to build custom websites with complex functionality these days, at least if you are a small company or a person with a personal website.

Indeed, many people who once upon a time might have dared venture out with Tripod or Geocities are quite happy with Facebook. Facebook covers the scenario of publishing something about yourself, popping up a few snaps and interacting with others by way of comments.

After discounting Facebook users and also "proper" companies like Amazon or Sainsburys who need a "real website" with complex functionality, there remains a group of people, including me, who want to maintain a couple of websites with real man's HTML and CSS, but also want a bit of dynamic content such as a front page with news items and readers comments or a calendar of interesting events.

Traditionally, we would have had to code such things ourselves - usually badly, usually ugly, often unfinished. I have a couple of sites like this: my own website, and one for the village I live in.

Don't get me wrong - I am not a web designer. I am a systems programmer. State of the art design for me is using a couple of Gimp artistic plugins on photos and abusing the not-yet-standard CSS colour gradient properties. On a good day, my HTML and CSS might just all pass the W3C validators, because that appeals to my sense of neatness as a programmer. On a really good day, the pages might look OK in everything from IE8 upwards, Firefox, Chrome, Safari and a text browser.

Thus, I find myself experimenting with the IFRAME and OBJECT HTML tags to embed other people's hard work into my sites. Case in point: this blog, hosted by Google's Blogger.com. I have two embedded blogs on the village site - one for the front page news items and one for bulletins from the local police. I have a couple of Google calendars too: one for the police again, as it makes sense to put crime reports on a calendar, and one for upcoming village events.

Google calendars are a joy to embed: they adapt themselves to whatever space you give them. The work involved is nothing more than using the Google "embed calendar" feature to set the display attributes, then pasting the generated code snippet into my site as an IFRAME or OBJECT. I set the display size and all is well.

It looks like my page has a calendar or "agenda" list, you can click it and it does what it's supposed to without caring one jot that it is part of a larger scheme.

Things aren't quite so easy with the blog though. That adapts its width nicely to suit the space it's given (especially if one hacks the blog template to achieve a fluid resizing model). But there's one thing blogs all have in common: they get longer. And longer. And then suddenly shorter as some magical archive date is passed.

Now we have the crux of the problem: IFRAMES don't dynamically resize very well. Well, sometimes they do, but not if they are contained in a DIV block that controls their placement on a fluid page layout.

So we have three choices, it seems:
  1. Declare the frame to be a "reasonable size". This works nicely, until the contained content overflows it. Then both the frame and the browser are likely to grow scrollbars and it really isn't a natural experience working two scrollbars at once to follow the content;
  2. Make the frame vastly oversized. This is better in some ways, leaving all the content at the mercy of the main browser scrollbar. But it looks silly when the reader gets to the end to find a screen or more's worth of empty space before the page footer.
  3. Pull some serious JavaScript-Fu. This seems to be the way everyone tries to handle the problem. Essentially it boils down to asking the frame how big it is (repeatedly as it may change as the reader clicks within it) and telling the container blocks to match that size with suitable padding.
Option 3 runs into a serious problem when the embedded content is in a different DNS domain to the container page. Allowing unfettered JavaScript shenanigans between two domains is considered a Bad Idea (TM) for a variety of reasons that could empty your bank account or see all your contacts signed up for a healthy dose of extra SPAM. So the designers made it difficult, on purpose, and with good reason.
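For completeness, the same-origin version of option 3 really is only a few lines (the src below is a placeholder):

<script type="text/javascript">
// Same-origin only: size the frame to fit its current content.
// The onload handler re-fires as the reader navigates within the frame.
function resizeFrame(frame) {
    var body = frame.contentWindow.document.body; // throws across domains
    frame.style.height = body.scrollHeight + 'px';
}
</script>
<iframe src="/blog/" width="100%" onload="resizeFrame(this)"></iframe>

Across domains, the contentWindow.document access above throws a security error, which is exactly the restriction just described.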

There are ways around this, involving putting a little JavaScript "server" on the target site (assuming you can) and having it tell the containing page's JavaScript the rendered size of the frame, so that the containing page can adjust itself. Having experimented with this, I can vouch for the fact that it is complicated and fragile, being easily upset by the semantics of the container blocks, such as DIVs on a two column page layout.

Some people on the forums I visited today suggested other solutions, such as server-side handling. For example, rather than embed a blog site, simply process the blog's XML feed and generate your own text for direct inclusion in the page.

That would work well for a number of use cases, mostly where you know you only want the reader to see the last few days' worth of entries. However, you lose the richness of the original site, such as the ease of browsing older archived material or leaving interactive comments.

You could implement that yourself, but at that point you are coming dangerously close to the amount of work you would have spent writing your own personal system from scratch.

But, here's a thought. And it's a crazy one: Wouldn't it be nice to have a page embedding mechanism where it is simple to tell it what you want it to do? You probably either want a fixed size (which may be relative to the browser window, other container or even absolute), or you want it to grow, vertically at least, to suit the content. Possibly, just, you may want to put some constraints on how big or how small it is allowed to go.

Call me naive, but it doesn't sound like a tall order to me, at least not for the browser makers nor the W3C standards body.

I certainly hope they see a need and get on with it - because, I believe that gadgets, to coin a Google term, are the way forward. I can see a future where significant sections of websites could be built quickly and simply out of embedded gadgets and content-blocks either written, or hosted by other sites while still maintaining the odd benefit of hosting your own site.

The whole idea brings a number of other issues, such as searchability and coherent Google indexing, but that's for another article.

Addendum

It occurred to me this morning, over coffee, that there may be a sensible compromise solution. Having concluded that a blog site is probably better left as a blog site without embedding, then what if:
  • We use the XML feed to present a list of recent titles, and perhaps the first paragraph of each, rendered server-side onto our main website.
  • For each article, we add in a link such as "Read full article".
  • Clicking the link takes the reader to a new browser tab or page which is nothing but the blog site - no embedding tricks.
This might very well be a good compromise solution. It has the advantage of keeping our main website alive with changing content which is good for Google search rankings and also for any Google custom search engines embedded within the website (Google does not, to my knowledge, introspect embedded object/iframe content when spidering a site).
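As a sketch of the server-side half, assuming the third-party Python feedparser module and a placeholder Blogger feed URL (adjust both to taste):

#!/usr/bin/env python
# Pull a blog's feed and emit a title-plus-summary digest with
# "Read full article" links pointing at the real blog.
import feedparser

feed = feedparser.parse('http://yourblog.blogspot.com/feeds/posts/default')
for entry in feed.entries[:5]:
    print('<h3>%s</h3>' % entry.title)
    print('<p>%s</p>' % entry.summary)  # may need trimming to a first paragraph
    print('<p><a href="%s">Read full article</a></p>' % entry.link)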