Read All About It!  Online Security and Paid Newspaper Content

by Yan Tan Tethera

These days, we're all used to being able to read newspapers online for free.

Apart from a select few, like The Wall Street Journal, which limit access for the most interesting articles to paid-up subscribers, most newspapers give their content away for free online.  However, by all indications, that's about to change.

For a few years after 2001, newspapers bizarrely blamed a "post-9/11 advertising slump" for their ever-decreasing sales.

However, it's become more obvious that the slow death of the newspaper is down to the scale of choice in the news marketplace.  The public can plug straight into the news they want to hear, whether it's geek updates from Slashdot, or a daily dose of hard-hitting, measured and factual reporting from Fox News.

Online news services, like BBC News, and broadcast news networks, like CNN, MSNBC, and Fox, are able to offer up-to-the-minute reports, making their once-a-day dead tree counterparts look woefully slow.

The fact that the quality of "instant" reporting is often severely lacking (reporter to Uri Geller: "And we're just hearing that your friend Michael Jackson has now been declared dead.  How does that make you feel?") is outweighed by the instantaneous nature of the excitement of seeing news as it happens, and being given the chance to select the news you want to see rather than having to wade through pages of stuff you don't care about.

So it is that more and more newspapers are considering turning to paid, online content to make up the shortfall.  Rupert Murdoch, whose monolithic News Corporation owns many major national papers across the world - including pioneers of paid content like The Wall Street Journal - ominously announced recently that "the current days of the Internet will soon be over."  He was referring to the fact that his stable of newspapers intends to switch to charging for online content, potentially even within the next twelve months.  The New York Times, Time, and others are also considering moving to a paid model.

How much of a success charging for news content online will be is a big question for the newspaper industry, especially given that other online news services will remain free.  Most notable of these is the BBC, which, as a state-funded corporation, is prohibited from charging for content within the U.K., and would probably resist charging residents of other countries in order to maintain its global influence.  The other major question for newspapers that plan to charge for their articles is how much focus they place on securing their content, and keeping the non-subscribers out.

In order to show how important security is - or ought to be - to paid content providers, I'm going to concentrate on one example of a website which is already charging for access.

Naturally, I'm going to precede it with a disclaimer: the following is for education only; to my understanding, the points made in this story don't break any rules, but do highlight the reasons why anyone providing paid-for content should implement at least basic security measures.  Personally, I respect websites which charge appropriate prices for exclusive content, and pay for what I use, and so should you.

As one of the oldest newspapers in the world, the history of The (London) Times stretches back to January 1, 1785.

The Times Archive, available at archive.timesonline.co.uk, includes every page of every issue between 1785 and its 200th anniversary in 1985.

For its first few weeks online, access to the archive was completely free, as a "taster" before the site switched to a subscription basis.  At the time, I was working on my master's thesis, which concerned aspects of post-war British political history, so this free access deal became very useful to my research in gaining contemporary views and reports.

Finding the articles I wanted was easy, with a full-text search returning loads of results; the service didn't even require users to sign up.

The only problem was that, once I'd found an article I was interested in, I was restricted to viewing only a small part of the page at a time.  The developers had implemented an unusual, JavaScript-based "viewer" within the results page, which let you read the article you were looking for, and pan around the rest of the page if you felt like it.

Of course, there was no obvious way to save a copy of the whole article for future reference, let alone the whole page.  The only way that I could see was to repeatedly use the "PrintScreen" key to capture bits of the article, and then mess around in Photoshop to join up the pieces.

Since I was planning to come back to the articles throughout my research, I resigned myself to this and started chopping and stitching screenshots of the articles.  Around three articles in, I realized copying and pasting small bits of pages in this way would take more time than it was worth, particularly when I was looking at researching 40-50 articles.

At that point, my geek instincts kicked in.

There had to be a better way.  Firing up the ever-useful Live HTTP Headers plugin in Firefox, I loaded an article and watched what the JavaScript viewer was loading.

I was able to determine that the viewer was loading a small piece of the page, with just the selected article visible.  But, if I clicked on the "Full Page" button, it downloaded a plain JPEG of the whole newspaper page in one go!

All it took was a quick look at Live HTTP Headers, and I could get the direct URL of the whole page JPEG.  This sped up my research considerably, meaning I could just download full pages, cut out the articles I needed, and refer back to them later.  I saved the JPEGs of the full pages I wanted, and over the next few days started cutting out the articles I was interested in.  So far, so good.

Then, the inevitable happened: without warning, the free trial period ended, and the Archive was closed to ordinary visitors.  With my research ongoing, I still needed to access more articles.

Of course, I could go down to the city's central library to browse back issues on microfilm, but knowing how much time this would take, I decided to try and find a way of continuing to view the articles online.

The first problem was finding a way of searching The Times Archive database without logging in.

To my surprise, this was pretty easy to solve: you can search without logging in.  The front page of The Times Archive lets you search the entire database from 1785-1985 and returns its results, complete with headline, date of publication, and a thumbnail showing the position of the article on the page.  This would prove really useful as a search tool, I thought, even if I wasn't successful and had to go and physically browse back issues at the city library.

The next problem - and one that posed a bigger challenge - was getting to the full page JPEGs.

Being a responsible computer user, I'd cleared my browser history since I last visited the archive, so I didn't have a record of the URLs I'd visited before.  A bit of detective work followed.

Returning to The Times Archive homepage, I found that selected "articles of the day" were still available to view for free.

It was the same JavaScript viewer, complete with a classic transparent.gif overlay to stop the vaguely curious from getting at the content through right-clicking.

Applying AdBlock to remove transparent.gif and refreshing the page, I found I still couldn't view the location of the image, so it was back to Live HTTP Headers.

Here, I found the "Full Page" function still worked, but now it returned a far smaller, unreadable JPEG, forcing you to zoom in to a selection to read it.  The URL of these images was (and still is) in the format: http://archive.timesonline.co.uk/archiveimg/free/1969/09/08/06/0FFO-1969-SEP08-006-12.jpg

Now, something struck me about that URL; something which indicated that future access to the archive might not be so difficult: the word free.

"Surely this isn't going to work," I thought, as I changed the word free to paid and tried again.

Guess what?  It did work.

The same image loaded up again.  To make sure it wasn't just an accident, I changed the word paid to a few other things, but got only error messages.

Sure enough, the only difference between free and paid content was the word free or paid in the URL!
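To make the shape of that image URL concrete, here is a minimal Python sketch that splits it into its parts.  The function name and the labels given to each path segment are illustrative guesses, not anything published by the site; in particular, reading the 06 segment as the page number is an assumption based on the 006 in the filename.

```python
from urllib.parse import urlsplit


def describe_image_url(url):
    """Split an archive image URL into labelled parts.

    The labels (tier, date, page) are assumptions inferred from the
    single example URL in the article, not a documented scheme.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 7:
        raise ValueError(f"unexpected path layout: {parts.path!r}")
    _prefix, tier, year, month, day, page, filename = segments
    return {
        "host": parts.netloc,
        "tier": tier,  # the client-visible access label
        "date": f"{year}-{month}-{day}",
        "page": int(page),  # assumed to be the page number
        "filename": filename,
    }


if __name__ == "__main__":
    example = (
        "http://archive.timesonline.co.uk/archiveimg/free/"
        "1969/09/08/06/0FFO-1969-SEP08-006-12.jpg"
    )
    for key, value in describe_image_url(example).items():
        print(f"{key}: {value}")
```

The point of the breakdown is that the access tier is just another path segment supplied by the browser, so the server has no basis for trusting it.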

I was still getting the small version of the page, though.  Then it struck me - I still had the saved full pages from the trial period.

I went back to them, and found the answer in the filenames: changing the suffix -12.jpg to -50.jpg would load the full-size, high-resolution JPEG.

Even if I hadn't had the saved pages on hand, I suspect this information could have been easily found by inspecting the JavaScript viewer's code, since it has to load the full-size full page when viewing free articles.

One last hurdle had to be overcome, and that was knowing what URL to go to for the exact page I wanted.

Because the unpaid search results returned only the date of the article, and not the page number, initially I found myself looking through every page of the newspaper until I found the one where the article was located.

Needless to say, this was time- and bandwidth-consuming.  Fortunately, hovering over the links in the public search results reveals the page number.

For example, the search result for the article "Computers: Machines that learn from mistakes," published on August 10, 1974, links to:

javascript:invokeArticleViewer('ARCHIVE-The_Times-1974-08-10-14','ARCHIVE-The_Times-1974-08-10-14-006','')

This shows that the article is on page 14 and that it's the sixth (006) article on the page.
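The reading of that link can be sketched in a few lines of Python.  This is only an illustration of how much the public search results give away: the function names are hypothetical, and the pattern is inferred from the single example above rather than from any documentation of the viewer.

```python
import re
from datetime import date

# Pattern inferred from the one example identifier in the article.
ARTICLE_ID = re.compile(
    r"ARCHIVE-The_Times-(\d{4})-(\d{2})-(\d{2})-(\d+)-(\d+)"
)


def parse_article_id(article_id):
    """Return the date, page and article number encoded in an identifier."""
    match = ARTICLE_ID.fullmatch(article_id)
    if match is None:
        raise ValueError(f"unrecognised identifier: {article_id!r}")
    year, month, day, page, article = (int(g) for g in match.groups())
    return {"date": date(year, month, day), "page": page, "article": article}


def parse_viewer_link(href):
    """Pull the second quoted argument out of an invokeArticleViewer link."""
    arguments = re.findall(r"'([^']*)'", href)
    if len(arguments) < 2:
        raise ValueError("expected at least two quoted arguments")
    return parse_article_id(arguments[1])


if __name__ == "__main__":
    link = (
        "javascript:invokeArticleViewer("
        "'ARCHIVE-The_Times-1974-08-10-14',"
        "'ARCHIVE-The_Times-1974-08-10-14-006','')"
    )
    print(parse_viewer_link(link))
```

Everything the parser recovers (date, page, position on the page) comes from the unauthenticated search results alone.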

From that information, anyone with half a clue can put together a direct URL to a JPEG of the full page article.

Surprisingly, for a site which also charges for content, it really is that simple.

Note that at no point in the process was any actual payment, access to paid areas, or even basic user registration required to find this information - it's all there, on the unpaid, public website.

So what lessons can be learned from this setup?

Certainly, leaving direct, open access to the content you intend to charge for is a serious flaw, but simply using the paid content system during the free trial period was arguably even more irresponsible and lazy on the part of the developers.

The way the system loads pages is so obvious it can be guessed in a few steps by anyone with a moderate familiarity with how a browser works: loading full, high-resolution pages directly, and changing their URL from free to paid depending on who's viewing them, could probably be figured out by a high school computer science student.

The measures taken by The Times Archive to hide their content from the non-paying public aren't even a good example of security through obscurity, in that they aren't obscure.

A short-term solution could be to only make the full search available to logged-in, paid-up subscribers, or not to reveal the full date and page of the article within the public search.

Replacing the word paid with something that can't be easily guessed, while still technically security through obscurity, would also be a short-term solution.  In the long term, the only real solution - as obvious as it sounds - would be to make sure the full pages are only visible to those who have logged in.
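One common way to get closer to that long-term fix is for the server to hand logged-in subscribers image links that carry a short-lived signature, and to refuse any request whose signature doesn't check out.  The following is a minimal sketch of that idea in Python, not a description of how The Times or anyone else actually does it; the secret key, the path layout, the parameter names, and the five-minute lifetime are all assumptions made for illustration.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; in practice this would be long,
# random, and never sent to the browser.
SECRET_KEY = b"replace-with-a-long-random-server-side-secret"


def sign_path(path, expires_at, key=SECRET_KEY):
    """Compute an HMAC-SHA256 signature over the path and expiry time."""
    message = f"{path}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_signed_url(path, ttl_seconds=300, now=None, key=SECRET_KEY):
    """Build a link for a subscriber who has already been authenticated."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    signature = sign_path(path, expires_at, key)
    return f"{path}?expires={expires_at}&sig={signature}"


def is_request_allowed(path, expires, sig, now=None, key=SECRET_KEY):
    """Check an incoming image request before serving the file."""
    now = int(time.time()) if now is None else now
    if now > expires:
        return False  # link has expired
    expected = sign_path(path, expires, key)
    # Constant-time comparison avoids leaking the signature byte by byte.
    return hmac.compare_digest(expected, sig)


if __name__ == "__main__":
    print(make_signed_url("/archiveimg/1969/09/08/06/page.jpg"))
```

With a scheme like this, changing the page number, the date, or the expiry time in the URL invalidates the signature, so there is no word in the address for a visitor to guess.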

Having completed my thesis, I haven't needed to further access the archive.

I should stress that, had the archive pages not been directly, easily and publicly accessible (as they remain), I would certainly have paid for the content.

Paid archives like this are goldmines to academics, researchers, and people who simply have a keen interest in history.  Like goldmines, though - and here comes the inevitable terrible analogy - they ought to be properly protected from public access.

Since launching its revamped website, The Times has become one of the more forward-looking newspapers when it comes to maintaining its online presence, embracing online chat, Twitter, and, yes, a comprehensive online archive of its historical back issues.

All that is to its credit.

If it decides to charge for the content which is currently free, then that's a business decision for News Corporation; I'm not going to second-guess corporations who have built their billions on running newspapers.  In my opinion, there will always be a place online for the more considered style of reporting found in quality newspapers like The Times, alongside the immediate and sometimes flawed reporting of rolling broadcast news, and the new angles offered by blogs and micro-blogging.

Some people might even be prepared to pay for access to this kind of content.  The Times Archive, however, is a perfect example of why those newspapers which do seek to reverse their business fortunes by charging for their content should take the security of that content seriously.
