How To Convert Wikis Into Kindle Ebooks

This tutorial explains how to convert wikis into ebooks readable on the Amazon Kindle. The same approach can potentially be used for other web sites as well, but that is beyond the scope of this tutorial. Some knowledge of HTML and Perl is useful, but not required.

In addition to the software described below, you also need the MobiPocket Creator Publisher Edition to generate the ebook from the HTML files. As far as I know, this program is unfortunately available only for Windows.

Downloading the Wiki

First of all, you need to create a mirror of the wiki you want to convert on your own hard drive. I recommend using HTTrack, which is available in versions for most operating systems. Start it, type in the URL of the web site you want to convert, and wait until the site has been downloaded in its entirety (in the case of truly large web sites, this may even take several days - and if you attempt to do this with the entire Wikipedia, you are a braver man than I am…). Once it has finished, you should have a browseable version of the web site on your hard disk.

Writing the Perl script

In theory, you could build your ebook directly from the HTML files you just downloaded, as the MobiPocket Creator uses them as input files. In practice, this would be awkward, since most wikis and other web sites have extensive navigation bars and page borders that would make reading on the Kindle tedious. Take a look at Wikipedia, for instance - do you really want to click through all the navigation elements in the left sidebar before you reach the main entry?

Thus, you need to convert the HTML files from their current form into something more useful. Fortunately, this process can be automated in a variety of ways. I used Perl, a scripting language that was designed with text manipulation in mind and is thus ideal for this task. To use it, you must first install it on your computer. If you use Linux, the odds are that it is already installed. If you are using Windows, I recommend downloading and installing Cygwin, which provides a Linux-like environment for Windows and also includes Perl. You can download it from the Cygwin web site.

The next step is to write a Perl script for converting the HTML files. This is the tricky part, as each wiki is different. I have included a sample Perl script here which I have used for converting the TV Tropes Wiki. I will walk you through the script so that you can understand how it works without learning Perl and adjust the parts you need to alter for the wiki you want to convert.
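
To give a rough idea of what such a script does, here is a minimal sketch of the core step: keep only the text between two markers in the page and wrap it in a bare HTML skeleton. This is not the TV Tropes script. The marker strings and the function name strip_navigation are assumptions made up for illustration, and you would have to look at the HTML source of your own wiki to find markers that actually surround the article text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical markers (assumptions for this sketch): replace them with
# strings that really surround the main article text in your wiki's HTML.
my $START_MARKER = '<!-- start content -->';
my $END_MARKER   = '<!-- end content -->';

# Return a stripped-down page that contains only the title and the text
# between the two markers. If either marker is missing, the page is
# returned unchanged.
sub strip_navigation {
    my ($html) = @_;

    my $start = index($html, $START_MARKER);
    return $html if $start < 0;
    $start += length $START_MARKER;

    my $end = index($html, $END_MARKER, $start);
    return $html if $end < 0;

    my $body = substr($html, $start, $end - $start);

    # Reuse the original page title if there is one.
    my ($title) = $html =~ m{<title>(.*?)</title>}is;
    $title = 'Untitled' unless defined $title;

    return "<html><head><title>$title</title></head>\n"
         . "<body>$body</body></html>\n";
}

# Small demonstration on an inline sample page.
my $sample = <<'END_HTML';
<html><head><title>Sample Page</title></head>
<body>
<div id="sidebar"><a href="Main.html">Main Page</a></div>
<!-- start content -->
<p>The actual article text.</p>
<!-- end content -->
<div id="footer">Footer links</div>
</body></html>
END_HTML

print strip_navigation($sample);
```

The sidebar and footer of the sample page fall outside the markers, so they do not appear in the printed result. A real wiki page will usually need further clean-up on top of this.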

You will likely make a few mistakes at first, so you should put the script and a sample page from the wiki into a separate folder, where you can experiment until you get it right. You should also create a subfolder called "save" in the same folder - that's where the converted HTML files will end up. You can test the script by opening Cygwin, going to this folder, and typing the following on the command line:

perl <<name of conversion script>>.pl <<name of html file to be converted>>.html

For example, I tested this with:

perl <<name of conversion script>>.pl ActionAdventureTropes.html

and the script took the file "ActionAdventureTropes.html" from the folder, converted it, and put it into the "save" subfolder.
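
The behaviour described above (take a file name from the command line, convert the file, write the result into "save") can be sketched roughly as follows. Again, this is not the TV Tropes script itself. The function names convert_html and convert_file are invented for illustration, and the conversion step is only a placeholder that removes script blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Subfolder for the converted files; it must already exist.
my $SAVE_DIR = 'save';

# Placeholder conversion (an assumption for this sketch): drop <script>
# blocks. A real script would strip the wiki's navigation elements here.
sub convert_html {
    my ($html) = @_;
    $html =~ s{<script\b.*?</script>}{}gis;
    return $html;
}

# Read one HTML file, convert it, and write the result under the same
# file name into $SAVE_DIR. Returns the path of the file written.
sub convert_file {
    my ($in_path) = @_;

    open(my $in, '<', $in_path) or die "Cannot read $in_path: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close($in);

    my $out_path = File::Spec->catfile($SAVE_DIR, basename($in_path));
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";
    print {$out} convert_html($html);
    close($out) or die "Cannot close $out_path: $!";

    return $out_path;
}

# Convert every file named on the command line.
for my $file (@ARGV) {
    my $written = convert_file($file);
    print "Converted $file -> $written\n";
}
```

The original file is left untouched, so you can rerun the script on the same sample page as often as you like while adjusting the conversion step.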

