Latest Publications

Nearby train stations now added to database

Each town and each venue now has nearby train stations listed. Sometime in the future I hope to expand this functionality so that a popup directions guide to the station can be displayed. I'll see how the site's users take to the current information first though.

ATMs and nearby places now up and running

As suggested in my blog just before New Year, I was keen to get lists of nearby places published for various places in the directory so people could look for supermarkets after going to the cinema, etc. This is now up and running for about half the directory. Similarly, I'm now adding nearby ATMs (cash machines) for venues as we speak. This will be rolled out in phases across the directory as well.

Pre-2011 update

Just a quick update. Working on 2 new ideas.

1) Working on extending the website to include pages on amenities close to other amenities. So all banks in the vicinity of a certain supermarket listed in the directory, or all restaurants next to a certain pub. I'm starting to get precise locations for all the amenities in the database in latitude/longitude format, and with this I can make this task possible. I thought it would be very useful for site visitors. So… you've spent all day shopping the Boxing Day and New Year sales and you want a restaurant next to a Sainsbury's store. Or… you've had a big night on the town and want to find a takeaway near the pub you're drinking your last drink in. Etc etc. Anyway. The code is working well and I'm going to upload the new functionality in the next few days.
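
For what it's worth, the core of the nearby-amenities idea is just a great-circle distance check over those latitude/longitude pairs. The site itself runs on PHP, but here's the idea sketched in Python (the `lat`/`lon`/`name` field names are placeholders, not my actual schema):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby(origin, amenities, limit_km):
    """Amenities within limit_km of origin, nearest first."""
    lat, lon = origin
    hits = [(haversine_km(lat, lon, a["lat"], a["lon"]), a["name"]) for a in amenities]
    return sorted(h for h in hits if h[0] <= limit_km)
```

In practice the database should pre-filter by a rough bounding box before computing exact distances, otherwise every page load scans the whole table.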

2) Ever so slightly related I guess. I’m getting data on ATM locations. I hope this too will be a very useful addition to the database.

Happy New Year

Quick update

Phase 2 has been a big success. The easier site navigation is paying dividends in the form of more search-engine-referred visits, and I think the general user experience has been much improved. I added banks into the mix the other day, which had been on my to-do list for a few weeks. Adding them was quite a time-intensive task, with a special script written to get all the proper details into the database. I'm now prepared to let the site tick over in its own automatic way, having satisfied myself that I've built enough of a base level of data to keep it going by itself.

I’m currently tinkering with an international spin-off version of the site to see whether I can scale things up a bit. More news soon!

Search function complete

Search function now complete! Works well. Nice and simple. I might look to add further "wider search" functionality in the future. For example, if you search for "akbar's" – a great Indian restaurant in Leeds – the search won't return anything, as the correct result does not have an apostrophe. Small stuff, but room for improvement. Far down my list of things yet to do, but nevertheless, on the list.
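
If I do get round to the wider search, the fix is basically to normalise both the query and the stored names before comparing, so "akbar's" and "akbars" match. A quick Python sketch of the idea (the real search is a PHP script; these function names are made up):

```python
import re

def normalise(text):
    """Lower-case and strip punctuation so "akbar's" and "akbars" compare equal."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower())

def wider_search(query, names):
    """Return every stored name whose normalised form contains the query."""
    q = normalise(query)
    return [n for n in names if q in normalise(n)]
```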

Phase 2 going well

Over the course of the last few days I've managed to knock the URL duplication issue on its head. I didn't actually go with the numbered amenity solution (e.g. tesco(2)-birmingham); instead I created a script that makes sure the street address for any amenity within a larger area (the town or city) is unique for that amenity name. So if there were more than one Tesco in Towcester, I'd have to make sure that each Tesco had a unique street address. If not, we suspend one entry in the database and mark it as a real duplicate. (There are some real duplicate amenities in the database and I'm constantly trying to eliminate them – I feel more on top of this than ever following some new preening code I wrote over the weekend, which scans the database every hour or so for new amenities that might have already been added in the past.) Eliminating the duplicate leaves us with a Tesco store with a unique street address but a possibly duplicated name in a possibly duplicated area name (i.e. more than one Tesco). So the street address is our unique identifier here – but how to get this unique information into the URL? Simple – bracket it and add it into the name string. At first I thought this might look clunky, but I think it looks good and is also useful to the user searching for the amenity.
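
The hourly preening scan boils down to grouping amenities by name, area and street address, and flagging the extras in any group for suspension. Roughly, in Python (my scripts are PHP, and the `id`/`name`/`area`/`address` keys are stand-ins for the real fields):

```python
from collections import defaultdict

def find_duplicates(amenities):
    """Group records by (name, area, street address); anything beyond the
    first entry in a group is a real duplicate to be suspended."""
    groups = defaultdict(list)
    for a in amenities:
        key = (a["name"].lower(), a["area"].lower(), a["address"].lower())
        groups[key].append(a["id"])
    # keep the first id in each group, return the rest for suspension
    return [dup for ids in groups.values() for dup in ids[1:]]
```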

Let's take an example –

In Edinburgh there are 3 Odeon cinemas in the database. If you're searching for one of these Odeon cinemas, you're likely living in Edinburgh. Agreed? Hope so… Anyway. You know that in Edinburgh there are multiple Odeons, so you're more likely to differentiate your search terms by adding the street address, e.g. Odeon Cinema West End Place Edinburgh. The URL becomes http://www.bigreddirectory.com/odeon_cinema_(120_west_end_place)-edinburgh which clearly displays the street address differentiation.
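
The URL construction itself is simple string work: underscore the spaces, and bracket the street address into the name only when the name is duplicated within the town. A Python sketch (the live slug rules are PHP and a bit more involved than this):

```python
def slugify(text):
    """Lower-case and swap spaces for underscores (sketch, not the live rules)."""
    return text.strip().lower().replace(" ", "_")

def amenity_url(name, address, town, needs_address=False):
    """Build /name-town, or /name_(address)-town when the amenity name is
    duplicated within the town."""
    slug = slugify(name)
    if needs_address:
        slug += "_({})".format(slugify(address))
    return "http://www.bigreddirectory.com/{}-{}".format(slug, slugify(town))
```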

So URL duplication stopped… What next?

As mentioned, I took out the mootools search box javascript function. It was great, really good looking, and worked a treat, but I didn't have time to trim the JS code, and as a result loading 150kb of mootools for each page was not economical. So I need a new search function. The mootools script was connected to a back end server side PHP JSON messaging script. I think I'm just going to make this the receiving page for a simple vanilla HTML form post input box on each page. Simple and straightforward.

Oh yes, also looking at a PHP dynamic XML sitemap for the search engines. I was keen to scour the internet and download some code that someone else had made freely available, but I can't see too much. I guess that makes sense. If it's dynamic then it's likely connected to a database containing page data (CMS etc.), so the code will likely need to be specific to each website. I have no issue with writing my own – it's just something that's realistically going to take an hour or two, and I need to prioritise at the moment.
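
For the record, the guts of a dynamic sitemap really are only a few lines once you have the page list out of the database. Something like this (Python rather than the PHP I'll actually write; `pages` stands in for the database query):

```python
from xml.sax.saxutils import escape

def sitemap_xml(pages):
    """Render an XML sitemap from (url, last_modified) tuples."""
    out = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for url, lastmod in pages:
        out.append("  <url><loc>{}</loc><lastmod>{}</lastmod></url>"
                   .format(escape(url), lastmod))
    out.append("</urlset>")
    return "\n".join(out)
```

The only real work is the query feeding it; the XML itself is trivial.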

The next largish mini-project I want to work on is adding more retail stores and their opening times to the database. The post office data addition has worked out really well – I think I now need to apply exactly the same process to high street stores like WH Smith and Borders etc. I'm also thinking deeply about adding another amenity category to the site. I always said I wanted 6 max and I'm at 7 already, so I think 2 need to be merged! Shops and supermarkets seem the best fit, but I guess a strong case could be made for takeaways and restaurants too. Or pubs and restaurants. Well, whatever. I'm thinking about adding banks. Banks are a nightmare for all of us. They're rarely open on weekends, open late on weekdays, and usually they close before 5. If the rest of the British public are like me, you take 30 minutes out of your working day to leave the office and go to the nearest bank to wait in a queue for 15 minutes to bank a cheque for 35 quid that someone sent you for an old mobile phone you sold on eBay. Bottom line – bank opening times are important. So are their addresses (and a map of their address) and phone number. So I might give it a go. The last thing I want is every location page to be full of 500 links to every business in the neighbourhood, so I'm keen to keep outgoing page links relevant to the location and within, say, a 125-link (just made that up) limit, but I do think banks are the very last category. As mentioned, there's definitely room for category merging with what we already have.

Good night

Post weekend update

Well, the bulk of the phase 2 update is done. It actually went pretty smoothly, I think. The things that I thought would take hours took minutes, but unfortunately the things that I thought would take minutes took hours. Anyway, what is now in existence looks and seems to work OK. The main index page is now drawing from a 1-hour-old cached copy to reduce load on the server (the index page is heavy because it displays loads of database stats). Mod_rewrite of URLs went relatively smoothly. I've just sorted out a couple of bugs (some weird escape characters in URLs and a trailing space problem with some names), but it's working well, and the 301 redirect from the old URLs is working well according to the logs too. Unfortunately – and this never entered my mind as I was planning the directory and URL structure – I'm now left with a duplication problem with some venues. Consider 2 Tesco supermarkets that are both in Dunfermline city centre. A fairly simple and probably fairly common situation in many towns across the British Isles. Well, my directory structure won't distinguish between them at the moment and will assign them both a URL of bigreddirectory.com/tesco-dunfermline. Clearly this needs to be sorted. The site will distinguish between the two stores if it detects that the stores are in different AREAS of Dunfermline. I don't know any areas in Dunfermline, but for London say Chelsea and Holborn, for example. I think the only way forward is to give the stores a number. So bigreddirectory.com/tesco_1-dunfermline. I don't think this is too bad. I think that's the neatest way to pump out the URL and probably the easiest to remember as well for anyone trying to memorise URLs on the site. I don't think tesco-127_aberdeen_street-dunfermline pulls it off.
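
The 1-hour cache on the index page is nothing fancier than checking a file's age before rebuilding it. In outline (a Python sketch; the real version is PHP, and `build` stands in for the heavy stats queries):

```python
import os
import time

def cached(path, ttl_seconds, build):
    """Serve the file at path if it is fresher than ttl_seconds, otherwise
    rebuild the page via build() and write it back to the cache file."""
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < ttl_seconds:
        with open(path) as f:
            return f.read()
    content = build()  # the expensive database work only happens here
    with open(path, "w") as f:
        f.write(content)
    return content
```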

What the site does do is take the full name of the company at the address, so in reality /tesco-dunfermline is actually tesco_stores_ltd-dunfermline, and most Tesco stores actually seem to include the type of supermarket (e.g. supermarket/metro/Tesco Extra) in the name, so I think the amount of duplication is probably rather low. That's the next largish task on the list of things to do.

Also need to write a new search script for the top right hand corner search bar. I did away with the mootools javascript search script. It was excellent and looked great, but I can't justify loading 150kb of javascript on each page when I'm trying to serve pages to the end user as fast as I possibly can. Maybe there was a lot of redundant code in there that could have been cut, but I don't have the time to go through it, so I binned the whole thing.

Phase 2 plan afterthoughts

Mmm.

So custom error pages made me think. (I just put them online, by the way.) My site doesn't have any included headers or footers. I don't really envisage the base pages expanding much from here (in fact by the end of phase 2 their number might well have contracted), but if I want to make sitewide cosmetic and possibly coding changes it's obviously going to be easier to use embedded headers and footers so that it only needs doing once. Anyway – another one to add to the list.

Oh, had a quick think about site structure today. The preliminary decision is that locations will follow the rule bigreddirectory.com/Hertfordshire/Hertford, and venue names are going to be hosted entirely in the root directory, so bigreddirectory.com/Tesco-Oldham. Why did I come up with this? Well, I think it makes logical sense but is also easier to remember if someone wanted to mentally take down the URL and return sometime. Multi-word venue names are going to have to follow this format I think: bigreddirectory.com/The_Red_Lion-St._Albans, with the hyphen splitting the name and town name and the underscores effectively acting as spaces between the words in the name and town name. I'll give it some more thought before I start recoding the website tomorrow, but I think it makes good sense.
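
Since the scheme needs to work in both directions (building a URL and decoding an incoming one), here's the round trip sketched in Python (the site is PHP; these helpers are illustrative only):

```python
def venue_slug(name, town):
    """Underscores stand in for spaces; a single hyphen splits name from town."""
    return "{}-{}".format(name.replace(" ", "_"), town.replace(" ", "_"))

def parse_venue_slug(slug):
    """Invert venue_slug: split on the first hyphen, restore the spaces."""
    name, town = slug.split("-", 1)
    return name.replace("_", " "), town.replace("_", " ")
```

One caveat to watch: a venue name containing a hyphen would break the split, so hyphens in names will need stripping or escaping before slugging.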

Phase 1 leads to phase 2

Around Easter last year I wrote down about 20 different ideas I'd had for a while for new websites, and also addons/upgrades to the ones I already manage. My first project was scheduled to take around 2 weeks. It ended up taking 4 months. This was my long overdue upgrade to my phishing scams website – MillerSmiles.co.uk. This was really the first time in a long while that I'd had a good stab at working with advanced (and ultimately automatic) server side data manipulation. I learned a heck of a lot doing it, and probably most importantly I realised the power of server side scripting – the power of combining "real time" processing that happens when webpage content construction is computed by the likes of PHP or ASP with latent behind-the-scenes data processing, using cron jobs for example. Anyway – cutting a long story short, I saw the potential for a site that could take advantage of this data processing on a large scale. MillerSmiles told me it was possible, and so with that I moved onto project 2. Somewhere down the list of 20 potential projects was "a site containing useful UK placename data such as postcodes, dialling codes and local pubs and shops". And so the idea of Big Red Directory was born.

As usual with my website development, I kind of just jumped in at the deep end. Phase 1 was to get something live and get it live soon. I started work on my very first web crawling bot at the end of the summer. This would be the information gatherer, taking bits of the web back to the back end of the website where they could be processed, parsed and pushed to a directory that would be useful for the site's visitors. It took a while. From spending my nights on holiday in Vancouver last November programming in the beautiful Bacchus bar at the Vancouver Wedgwood, to spending long nights over the Christmas period entranced in PHP wonderland, I managed to get something up and running.

It's really taken me by surprise how useful this data is. Having told friends and family about the site, it's already found its way onto bookmark lists. Over the last 2 months I've upgraded the web crawler to start searching for opening times and daily opening hours for each of the venues listed. This is still a work in progress and is ultimately going to form a large part of my work over the next couple of months, making sure the data is watertight. At the moment it's a little shaky but ultimately still useful. Some opening times that are being drawn in are just plain wrong. For example, the opening times for R Gilbertson antiques in Hastings are plain out of whack. I won't go into detail about how the bot works, but it essentially scours the web looking for keywords and venue names, and once it hits a website it deems an authority on the placename (which is very hopefully the domain name of the pub, shop, whatever) it looks for opening time data and sends it back to the site. I really need to get some kind of logging for the bot up and running soon; at the moment I liken the bot to a firecracker out of control. Well, to an extent.

What I am really pleased with recently is a bolt-on I made for the bot last week. The extra bit of code enabled the bot to successfully find all the branches and opening times of UK post offices. Now this really does work well. A couple of examples:

  • Ore, Hastings post office opening hours
  • Pett, Hastings post office opening hours

Look at the opening times for Pett post office. They even have a lunchtime closing time!! This is something now ready to take to the next level. I'm pretty pleased with the results for various supermarkets and big restaurant chains, e.g. Aintree McDonald's. I think the next stage is to start looking more at retail chains. I had a go at Boots and WHSmith. They seem to have worked OK. I think the next step is things like New Look, Homebase, B&Q etc.

Anyway, so that's really the summary of phase 1. Phase 2 is hopefully going to be a bit smoother. Phase 2 acknowledges that phase 1 involved getting the site up quickly to see whether there was any merit in aggressively pushing the site forward on a longer term basis. Well, the traffic stats suggest so. People obviously find the website useful and informative, and it clearly offers information that can't be readily seen elsewhere. I'm hoping that phase 2 development is something like a 10 degree trajectory as opposed to the 60 degrees of phase 1.

There are quite a few things I want to sort out, a lot of them actually on-page and not really big back end coding issues. I'm going to start moving forward with these this weekend. Some things which readily come to mind:

  • The big one is the structure of Big Red Directory. At the moment, at its deepest (and most pages are in this category), the directory is 5 directories deep. If the page is a pub or other venue it follows /rp/Pub/Place_Name/Pub_Name/Unique_ID. By the end of phase 2 I will have got rid of /rp and /Unique_ID. These are lazy programming pointers. /rp lets me know it's a venue as opposed to a placename, and the Unique_ID acts as a checksum – well, a unique ID, exactly as it says. We don't need either. Similarly, I want each directory to have a base. And the base shouldn't just be stuffed with 4000 links; they should be sorted alphabetically into different pages in groups of, say, 200 links. So, at the moment /rp/Pub/Place_Name gives a 404. This is bad. It should give a list of pubs for Place_Name. The real problem here is how to handle and distribute the directory structure over a mod_rewrite using the htaccess. What I currently envisage is a general handler page for all mod_rewrite requests. So… all the directory names are turned into an array and the handler will go through the array and determine whether it's a venue or placename or otherwise. I don't really think this should be too hard, although using a page handler might cause some ill side effects, which brings me onto point 2.
  • Page load time. It's not massive at the moment, but it's slower than I'd like. Most pages are taking about 1.5–2 secs to load. I think this is probably too slow, and my worry is that using a page handler to handle mod_rewrite page redistributions might add to the issue. I know what the problem is in general here. There's a lot of database usage on every page of the website. In other words, there's a lot of on-page processing as opposed to backend processing. Now I could make more things hardcoded, but this is going to destroy the much needed flexibility of the website. The solution is going to have to be more efficient code. And that's definitely possible. Phase 1 was always just about getting the site online and seeing whether the internet community had use for it, so there are portions of code that can be streamlined or cut altogether.
  • Simpler one – make a proper 404 error page. Needs a link back to homepage. We’ve got a big directory here, there’s plenty of room for mistyped URLs.
  • Sitemap. Need to create an XML sitemap. Shouldn’t be hard at all. In fact I think there are web based tools that can spew one out for you in a few seconds. Obviously need to get point 1 done first – sorting out the directory structure.
  • More efficient page interlinking. I like what I've done at the moment. For every pub I list other pubs in the area. I'm more on about anchor text here. I think instead of listing the name and full address of the place inside the link's a href tag, we can possibly just do the name with a high level location (i.e. not street number and name) and then just have the full address after that. I think that should make things a bit easier on the eye. Obviously very simple to implement across the site.
  • Start looking at how we can collate reviews and pictures of places, almost like Google Images. Now this is not really a concrete part of the plan. The last thing I want to do is just copy reviews and spam my own directory of useful information. I think the image thing might be useful though. Anyway, I need to think about this. I don't know whether it's worth doing at the moment or not.
  • More generally, and probably the biggest point – keep improving the robot. I have no specific aims here. It does a good job and has shown its worth with the Post Office data. I’ll just keep building it.
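
On the first bullet, the general handler I have in mind splits the rewritten path into segments and checks them against the database. A Python sketch of the dispatch logic (the real thing will be a PHP page behind an .htaccess rewrite rule; the `venues` and `places` sets here stand in for database lookups):

```python
def classify(path, venues, places):
    """Dispatch a rewritten URL path: a venue page, a place listing, or a 404."""
    segments = [s for s in path.strip("/").split("/") if s]
    if len(segments) == 1 and segments[0] in venues:
        return ("venue", segments[0])       # e.g. /Tesco-Oldham in the root
    if segments and segments[-1] in places:
        return ("place", segments[-1])      # e.g. /Hertfordshire/Hertford
    return ("404", None)                    # anything else is the error page
```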

Right, getting late now. Plenty to think about.

Good night.

Ore post office: http://www.bigreddirectory.com/rp/Shop/Hastings/Ore_post_office/222076

My blog is born

So this is a first for me. Having worked with the internet for 15 years now, I have never once written down my thoughts on web development in a blog. I did write a news service on a financial website I ran a few years ago, and come to think of it, many years ago I might have created an attempt at a personal homepage on the now sadly defunct GeoCities, but anyway, ramble I do… here is my blog for the Opening Times Big Red Directory website I'm currently developing. Putting thoughts, suggestions and musings into words, I find, helps construct longer term plans and goals. That's really the aim here – to set down my thoughts and realise what needs doing to the website at the end of the day.