Proxy Server and graphing

Posted by Rachel on January 30th, 2006 — Posted in Notes, Project

For outgoing HTTP connections you must use the proxy server webcache port 3128.

See instructions

Otherwise the program may not be able to run within SoCS!
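A minimal sketch of how a Java program might send its HTTP connections through the proxy. It assumes the proxy is reachable under the short host name `webcache` (it may need a fully qualified name in practice), and the class and method names are made up for illustration. Note that `java.net.Proxy` only exists from Java 1.5 onwards; the system-property route also works on older versions.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

class ProxyConfig {
    // Assumption: the proxy host resolves under the short name "webcache".
    static final String PROXY_HOST = "webcache";
    static final int PROXY_PORT = 3128;

    /** Builds a Proxy object that can be passed to URL.openConnection(Proxy). */
    static Proxy webcacheProxy() {
        // createUnresolved defers the DNS lookup until a connection is opened.
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(PROXY_HOST, PROXY_PORT));
    }

    /** Alternative: set the JVM-wide properties that HttpURLConnection reads. */
    static void useSystemProperties() {
        System.setProperty("http.proxyHost", PROXY_HOST);
        System.setProperty("http.proxyPort", String.valueOf(PROXY_PORT));
    }
}
```

Either approach should be enough on its own: the `Proxy` object applies to a single connection, while the system properties apply to the whole JVM.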

Also, look at JGraph.

Figure out a way to use a previous version of the JDK on departmental machines. Departmental machines all use Java 1.5blahblahblah. Will have to find a laptop-carrying minion or just carry the infernal machine myself. :(

Manageability - Open Source Web Crawlers Written in Java

Posted by Rachel on January 23rd, 2006 — Posted in Bibliography, Notes, Project

Manageability - Open Source Web Crawlers Written in Java

A list of various open source spiders written in Java - something to look at.

To recurse or not to recurse?

Posted by Rachel on January 19th, 2006 — Posted in Notes, Project

At the moment, I’m working on a smaller version of my initial idea that should be far more manageable and less completely daunting. Anyway, the spider part of the project is still important and I’ve been thinking about whether it should be recursive or not.

If it was programmed recursively, the spider would look at a webpage and all the urls in it and process the first url…calling itself to do so and then looking at the links on that page and then calling itself on the first url on that page…and so on and on and on and then my computer would explode because there would be so many calls on the stack just waiting for something to finish searching the internet, which would probably send it into a loop anyway and all the memory would get used up and I would then be a sad bunny.

So, recursion isn’t the way, as great as it is - it’s only good if I have maybe a handful of things to look at. I need to plan for a heck load of urls in a page.

Therefore, a method that would not use recursion is needed. This is pretty much just going through the page and keeping a list of the urls I have to visit. I’ll probably stick some processing in there to check whether something is a blog or not from the address and then maybe some stuff to check if it’s a blog from the actual page. Any non-blog pages can be discarded.
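The non-recursive idea above could be sketched roughly as follows. This is only an illustration, not the project's actual code: `linksOf` (which returns the URLs found on a page) and `looksLikeBlog` (the address check) are hypothetical helpers supplied by the caller, `maxPages` is an added safety cap, and the sketch uses Java 8 functional interfaces for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

class IterativeSpider {
    /**
     * Visits pages starting from startUrl using a to-visit list instead of recursion.
     * linksOf and looksLikeBlog are hypothetical helpers passed in by the caller.
     */
    static Set<String> crawl(String startUrl,
                             Function<String, List<String>> linksOf,
                             Predicate<String> looksLikeBlog,
                             int maxPages) {
        Deque<String> toVisit = new ArrayDeque<>();  // URLs still to process
        Set<String> seen = new LinkedHashSet<>();    // URLs already queued once
        Set<String> visited = new LinkedHashSet<>(); // URLs processed, in order
        toVisit.add(startUrl);
        seen.add(startUrl);
        while (!toVisit.isEmpty() && visited.size() < maxPages) {
            String url = toVisit.poll();
            visited.add(url);
            for (String link : linksOf.apply(url)) {
                // Discard non-blog addresses and anything queued before,
                // so the spider cannot loop between pages forever.
                if (looksLikeBlog.test(link) && seen.add(link)) {
                    toVisit.add(link);
                }
            }
        }
        return visited;
    }
}
```

Because the pending URLs live in an ordinary collection on the heap, the call stack stays flat no matter how many links a page has. Taking URLs from the front of the list, as here, visits pages level by level; taking them from the back would follow one chain of links first.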

Another thing that occurred to me was that I could implement some kind of nice shiny graphical display of the related blogs by using a library that I used for my Team Java project. I should find out what the stance is on using libraries, and how much library use is allowed.

Search methods and things

Posted by Rachel on January 16th, 2006 — Posted in Miscellaneous, Notes, Project

Breadth-first search would result in computers exploding due to the size of the database before I could get anywhere with results.

Depth-first search would work better, but I may still need to impose a limit on how many links deep it goes.
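A rough sketch of a depth-first search with a depth limit, written with an explicit stack rather than recursion. The `linksOf` function is a hypothetical stand-in for whatever extracts the URLs from a page, and `maxDepth` is an assumed parameter; neither comes from the project itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class DepthLimitedSearch {
    /** A URL paired with how many links it is away from the start page. */
    static final class Node {
        final String url;
        final int depth;
        Node(String url, int depth) { this.url = url; this.depth = depth; }
    }

    /** Returns the URLs in the order they were visited, going at most maxDepth links down. */
    static List<String> search(String startUrl,
                               Function<String, List<String>> linksOf,
                               int maxDepth) {
        Deque<Node> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> order = new ArrayList<>();
        stack.push(new Node(startUrl, 0));
        seen.add(startUrl);
        while (!stack.isEmpty()) {
            Node current = stack.pop();
            order.add(current.url);
            if (current.depth == maxDepth) {
                continue; // limit reached: do not follow this page's links
            }
            List<String> links = linksOf.apply(current.url);
            // Push in reverse so the first link on the page is explored first.
            for (int i = links.size() - 1; i >= 0; i--) {
                String link = links.get(i);
                // A page keeps the depth at which it was first discovered.
                if (seen.add(link)) {
                    stack.push(new Node(link, current.depth + 1));
                }
            }
        }
        return order;
    }
}
```

With the limit in place the stack only ever holds pages along the current path plus their unvisited siblings, which is what keeps the memory use bounded.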

In other news, it occurred to me that podcasts have a very similar format to blogs (in fact, they are blogs, just with the addition of sound files in their feed), so I may need to find some way of dealing with this.

Also, Talkr is a service that turns text-based blogs into podcasts. Looks fairly interesting, but I doubt I’ll ever use it. I read all the blogs I subscribe to quicker than I could listen to them and I already have a hefty menu of podcasts that I listen to regularly.

Intelligent Agents pgs 2 + 3

Posted by Rachel on July 7th, 2005 — Posted in Notes, Project

  • 3 major phases of development in AI research.
    1. - formal problems (structured with well-defined problem boundaries) - emphasis on creating general “thinking machines” - sophisticated reasoning + search techniques
    2. - recognition that most successful AI projects had v. narrow problem domains + encoded specific problem knowledge. - specific domain knowledge added to more general reasoning systems led to expert systems. - rule-based expert systems (knowledge representations, knowledge engineering, advanced reasoning techniques). - computer workstations specifically developed for Lisp, Prolog + Smalltalk apps. -> featured powerful integrated development environments.
    3. - solving: machine vision + speech, natural language understanding + translation, common sense reasoning, robot control. - connectionism (neural networks for data mining, modelling + adaptive control) - genetic algorithms - alternative logic systems (fuzzy logic) - agents that move through network

from here