To recurse or not to recurse? – Intelligent Agents for Blogs

At the moment, I’m working on a smaller version of my initial idea that should be far more manageable and much less daunting. Anyway, the spider part of the project is still important, and I’ve been thinking about whether it should be recursive or not.

If it were programmed recursively, the spider would look at a webpage and all the URLs in it, then process the first URL by calling itself on it. That call would look at the links on the new page and call itself on the first URL there, and so on and on and on. Every one of those calls would sit on the stack waiting for the one below it to finish, and since the web is effectively endless (and pages link back to each other, which would probably send it round in a loop anyway), none of them ever would. All the memory would get used up, my computer would explode, and I would be a sad bunny.
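To make the problem concrete, here is a minimal sketch of the recursive version, assuming the spider is written in Java. It crawls a hypothetical in-memory map (each page URL mapped to the URLs it links to) instead of fetching real pages, and the names `RecursiveCrawl`, `crawl` and `visit` are made up for illustration. It already keeps a `seen` set so that pages linking back to each other don't loop forever, but that doesn't fix the stack: each `visit` call stays on the stack until everything reachable from its link has been crawled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Recursive, depth-first crawl over a hypothetical in-memory link graph
 * (page URL -> URLs linked from that page). No real HTTP fetching is done.
 */
public class RecursiveCrawl {

    /** Returns the pages in the order the recursive crawl first reaches them. */
    public static List<String> crawl(Map<String, List<String>> links, String start) {
        List<String> order = new ArrayList<>();
        visit(links, start, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> links, String url,
                              Set<String> seen, List<String> order) {
        // Without this check, a page that links back to an earlier page
        // would send the crawl round in a loop forever.
        if (!seen.add(url)) {
            return;
        }
        order.add(url);
        for (String next : links.getOrDefault(url, List.of())) {
            // This call stays on the stack until everything reachable from
            // `next` has been crawled, so the stack grows with the length
            // of the longest chain of links followed.
            visit(links, next, seen, order);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("a", "d"),
                "c", List.of("d"),
                "d", List.of());
        System.out.println(crawl(links, "a")); // prints [a, b, d, c]
    }
}
```

On a long enough chain of links (tens of thousands of pages, each linking only to the next), this is likely to throw a `StackOverflowError` with a default-sized JVM thread stack, which is the sad-bunny scenario in miniature.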

So recursion isn’t the way, as great as it is: it’s only good if I have maybe a handful of things to look at. I need to plan for a heck of a lot of URLs in a page.

Therefore, I need a method that doesn’t use recursion. This pretty much means going through the page and keeping a list of the URLs I still have to visit, then working through that list one URL at a time. I’ll probably stick some processing in there to check from the address whether something is a blog, and then maybe something to check whether it’s a blog from the actual page content. Any non-blog pages can be discarded.
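Here is a rough sketch of that idea in Java, again under assumptions: `IterativeCrawl`, `looksLikeBlog`, `extractLinks` and `maxPages` are hypothetical names, the `pages` map (URL to HTML) stands in for actually downloading pages, the regex for pulling out `href` values is deliberately naive, and the "is it a blog?" address rule (the URL contains "blog" or "wordpress.com") is only a placeholder guess. The list of URLs still to visit lives in an `ArrayDeque` on the heap, so the call stack never grows no matter how many links turn up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IterativeCrawl {

    // Naive matcher for double-quoted href values; a real spider would
    // be better off with a proper HTML parser.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Hypothetical address check: only a rough guess at "is this a blog?". */
    public static boolean looksLikeBlog(String url) {
        String u = url.toLowerCase();
        return u.contains("blog") || u.contains("wordpress.com");
    }

    /** Pulls every double-quoted href value out of a page's HTML. */
    public static List<String> extractLinks(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    /**
     * Crawls without recursion. `pages` stands in for fetching (URL -> HTML).
     * Stops after maxPages pages so the crawl can't run forever.
     */
    public static List<String> crawl(Map<String, String> pages, String start, int maxPages) {
        Deque<String> toVisit = new ArrayDeque<>(); // URLs still to visit
        Set<String> seen = new HashSet<>();         // URLs already queued
        List<String> visited = new ArrayList<>();   // the start page is taken as given
        toVisit.add(start);
        seen.add(start);
        while (!toVisit.isEmpty() && visited.size() < maxPages) {
            String url = toVisit.poll(); // take from the front: breadth-first
            visited.add(url);
            // A check on the page content itself could go here (not implemented).
            String html = pages.getOrDefault(url, "");
            for (String link : extractLinks(html)) {
                // Non-blog links are simply discarded; seen.add() is false
                // for anything already queued, so nothing is visited twice.
                if (looksLikeBlog(link) && seen.add(link)) {
                    toVisit.add(link);
                }
            }
        }
        return visited;
    }
}
```

For example, if `http://alice.blogspot.com` links to `http://bob.wordpress.com` and `http://news.example.com`, the news link is dropped and Bob's page is queued. Taking URLs from the front of the deque gives breadth-first order; swapping `poll()` for `pollLast()` would make it depth-first while still keeping the to-do list off the call stack. Memory now grows with the number of queued URLs rather than with the depth of the link chain, and `maxPages` puts a hard cap on the whole thing.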

Another thing that occurred to me was that I could implement some kind of nice shiny graphical display of the related blogs using a library I used for my Team Java project. I should find out what the stance is on using libraries, and how much library use is allowed.
