January 2006 – Intelligent Agents for Blogs

Otherwise program may not be able to run within SoCS!

Also, look at JGraph.

~~Figure out way to use previous version of Java DK on departmental machines.~~ Departmental Machines all use Java 1.5blahblahblah. Will have to find laptop-carrying minion or just carry the infernal machine myself. 🙁

Rollback

After hours and hours hacking away at the spider code to try and make it work under the latest version of Java, it looks like it’ll be easier just to revert to the previous version of Java.

Which is slightly annoying, but at least it should all compile correctly without arguing about enum being a keyword.
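For reference, a minimal sketch of the kind of line that sets the newer compiler off. It assumes the spider uses the old habit of naming an `Enumeration` variable `enum`, which I haven't confirmed is the actual culprit; `EnumRename` and `countLinks` are made-up names for illustration.

```java
import java.util.Enumeration;
import java.util.Vector;

class EnumRename {

    // Counts the entries in a list of links using the old Enumeration idiom.
    static int countLinks(Vector<String> links) {
        // Older code often wrote:  Enumeration enum = links.elements();
        // From Java 5 onwards 'enum' is a reserved word, so that line no
        // longer compiles. Renaming the variable is enough to fix it.
        Enumeration<String> en = links.elements();
        int count = 0;
        while (en.hasMoreElements()) {
            en.nextElement();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Vector<String> links = new Vector<String>();
        links.add("http://blog.example/a");
        links.add("http://blog.example/b");
        System.out.println(countLinks(links));
    }
}
```

If there are only a few of these, renaming is cheap. If they are scattered all over the code, rolling back is probably still the quicker option.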

Manageability – Open Source Web Crawlers Written in Java

A list of various open source spiders written in Java – something to look at.

To recurse or not to recurse?

At the moment, I’m working on a smaller version of my initial idea that should be far more manageable and less completely daunting. Anyway, the spider part of the project is still important and I’ve been thinking about whether it should be recursive or not.

If it was programmed recursively, the spider would look at a webpage and all the urls in it and process the first url…calling itself to do so and then looking at the links on that page and then calling itself on the first url on that page…and so on and on and on and then my computer would explode because there would be so many calls on the stack just waiting for something to finish searching the internet, which would probably send it into a loop anyway and all the memory would get used up and I would then be a sad bunny.

So, recursion isn’t the way, as great as it is – it’s only good if I have maybe a handful of things to look at. I need to plan for a heck load of urls in a page.

Therefore, a method that would not use recursion is needed. This is pretty much just going through the page and keeping a list of the urls I have to visit. I’ll probably stick some processing in there to check whether something is a blog or not from the address and then maybe some stuff to check if it’s a blog from the actual page. Any non-blog pages can be discarded.
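A rough sketch of what that non-recursive version might look like. Everything here is an assumption for illustration: the `web` map stands in for actually fetching a page and pulling out its links, `looksLikeBlog` is a made-up address check, and the example addresses are invented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class WorklistSpider {

    // Hypothetical check on the address only; a real one needs more hints.
    static boolean looksLikeBlog(String url) {
        String u = url.toLowerCase();
        return u.contains("blog") || u.contains("wordpress") || u.contains("livejournal");
    }

    // Visits pages without recursion by keeping a list of URLs still to visit.
    // Stops once maxBlogs blogs have been collected.
    static List<String> crawl(String start, Map<String, List<String>> web, int maxBlogs) {
        Deque<String> toVisit = new ArrayDeque<String>();
        Set<String> seen = new HashSet<String>(); // stops the spider looping
        List<String> blogs = new ArrayList<String>();
        toVisit.add(start);
        seen.add(start);
        while (!toVisit.isEmpty() && blogs.size() < maxBlogs) {
            String url = toVisit.remove();
            if (!looksLikeBlog(url)) {
                continue; // non-blog pages are discarded, links and all
            }
            blogs.add(url);
            List<String> links = web.get(url);
            if (links == null) {
                links = Collections.emptyList();
            }
            for (String link : links) {
                if (seen.add(link)) { // add() returns false if already seen
                    toVisit.add(link);
                }
            }
        }
        return blogs;
    }

    // A tiny pretend web so the sketch runs without a network connection.
    static Map<String, List<String>> sampleWeb() {
        Map<String, List<String>> web = new HashMap<String, List<String>>();
        web.put("http://blog.example/a", Arrays.asList(
                "http://blog.example/b", "http://shop.example/", "http://blog.example/a"));
        web.put("http://blog.example/b", Arrays.asList(
                "http://blog.example/a", "http://blog.example/c"));
        web.put("http://shop.example/", Arrays.asList("http://blog.example/hidden"));
        return web;
    }

    public static void main(String[] args) {
        System.out.println(crawl("http://blog.example/a", sampleWeb(), 10));
    }
}
```

The `seen` set is what keeps the spider from going round in circles when two blogs link to each other, and the cap on collected blogs is a crude way to keep memory use under control.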

Another thing that occurred to me was that I could implement some kind of nice shiny graphical display of the related blogs by using a library that I used for my Team Java project. I should find out what the stance is on using libraries and how much library use is allowed.

More hate list

An unprecedented 3 entries in one day!

This entry isn’t project related though, just an addition to the hate list.

Why isn’t there a spellcheck option in WordPress?

It’s one of those things that is really handy and useful. Plus, it makes entries nicer to read.

Perhaps I can find a plugin for it.

WordPress versions

Apparently WordPress version 2.0 was just released, but this is not what baffles me. I’ve just set up another blog for my HCI group using the same easy “One-Click Install” method made available to me from my lovely hosting people.

You would think that the interface would be exactly the same, even though the two blogs look different on the outside. Somehow this is not the case. The two blogs appear to use the same version of WordPress, but this blog has a far less confusing “write post” interface (an interface that I think has slightly confused my team-mates). There are none of these expanding menus on the right to choose from users, categories and breakfast cereals to author your post with (the cereal thing wasn’t really a choice).

For at least the team-mate I live with, posting was a bit confusing. Fair enough, you pick “Write” from the menu and then “Write Post” (as a non-admin user, he doesn’t have the myriad of other options I am faced with), but then what? The rest of the options weren’t visible. Hopefully with time and exploration it’ll be easier. At least with my incredible admin powers, I can edit the entries into the right categories. On the other hand, I had to show one team-mate where to log in, and he had to explain over MSN Messenger to my other team-mate what to do. Putting the log-in options at the bottom of default templates wasn’t an inspired idea, and is counter-intuitive for people not used to blogging.

edit: It appears on checking the other blog that it’s a later version than this one. But why make it more difficult to navigate?

Search methods and things

Breadth-first search would result in computers exploding due to the size of the database before I could get anywhere with results.

Depth-first search would work better, but I may still need to impose a limit on how many links deep it goes.
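A minimal sketch of a depth-first search with a limit on depth, using an explicit stack rather than recursion. The `web` map again stands in for fetching pages, and `DepthLimitedSpider`, `Visit` and the single-letter page names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DepthLimitedSpider {

    // A URL paired with how many links away from the start page it is.
    static final class Visit {
        final String url;
        final int depth;

        Visit(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    // Depth-first search that never follows links beyond maxDepth.
    // Simplification: a page is marked as seen the first time it is found,
    // even if a shorter route to it turns up later.
    static List<String> crawl(String start, Map<String, List<String>> web, int maxDepth) {
        Deque<Visit> stack = new ArrayDeque<Visit>();
        Set<String> seen = new HashSet<String>();
        List<String> order = new ArrayList<String>();
        stack.push(new Visit(start, 0));
        seen.add(start);
        while (!stack.isEmpty()) {
            Visit v = stack.pop(); // most recently found page first
            order.add(v.url);
            if (v.depth == maxDepth) {
                continue; // at the limit: visit the page, ignore its links
            }
            List<String> links = web.get(v.url);
            if (links == null) {
                continue;
            }
            for (String link : links) {
                if (seen.add(link)) {
                    stack.push(new Visit(link, v.depth + 1));
                }
            }
        }
        return order;
    }

    // Pretend web: a links to b and c, b links to d, d links to e.
    static Map<String, List<String>> sampleWeb() {
        Map<String, List<String>> web = new HashMap<String, List<String>>();
        web.put("a", Arrays.asList("b", "c"));
        web.put("b", Arrays.asList("d"));
        web.put("d", Arrays.asList("e"));
        return web;
    }

    public static void main(String[] args) {
        System.out.println(crawl("a", sampleWeb(), 2));
    }
}
```

With a limit of 2, page `e` (three links from the start) is never visited. Swapping the stack for a queue would turn this into the breadth-first version, so the depth limit is the part doing the work of keeping things bounded.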

In other news, it occurred to me that podcasts have a very similar format to blogs (in fact, they are blogs, just with the addition of sound files in their feed), so I may need to find some way of dealing with this.

Also, Talkr is a service that turns text-based blogs into podcasts. Looks fairly interesting, but I doubt I’ll ever use it. I read all the blogs I subscribe to quicker than I could listen to them and I already have a hefty menu of podcasts that I listen to regularly.

Month: January 2006

Proxy Server and graphing