Following my MSc, I assisted a psychology PhD student with the technical aspects of her thesis. Her work analysed means to improve the internet for people with disabilities. As a high-level summary, the idea was to add an extra level of search filtering to Google results (unfortunately before the Google API was released), so that the many thousands of pages would be reduced to a short list people could actually use. I started paid employment before I could complete the work on this, although the source is available.
This project was written to be seen by others rather than to explore ideas, so I wrote structural documentation from the start.

Goals of the CAIN project

  • Improve the internet as a research tool, by lowering the barriers to entry;
  • Set up a service to implement the above point;
  • Explore ideas relating to perception and individualised workstations;
  • Ensure the service is adaptive, so the user and the current conditions are supported (lighting specifically);
  • Allow a post-graduate student to get a PhD;

Goals of CAIN webportal

  • Create a means to post-filter search engine results, to only report “readable” ones;
  • Create a web interface through which users may have more targeted searches;
  • Create a user profiling system to allow users to define what is “readable” for them;
  • Track users' usage, to allow repeat viewing and to allow users to correct their profiles;
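
The post-filtering goal above can be illustrated with a minimal sketch. This is not the CAIN code (CAINscript was written in Perl and the portal was never completed); it is a Python illustration, and the names `Profile`, `Result`, `post_filter`, `min_score` and `score`, as well as the example URLs, are hypothetical. It assumes each page already has a single readability score and that a profile reduces to one threshold.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical user profile: the lowest score this user finds readable."""
    min_score: float


@dataclass
class Result:
    """One search engine hit with a hypothetical readability score (higher is better)."""
    url: str
    score: float


def post_filter(results, profile):
    """Keep only results the profile counts as readable, preserving engine order."""
    return [r for r in results if r.score >= profile.min_score]


# Placeholder data, invented for the example.
results = [
    Result("http://example.org/a", 0.9),
    Result("http://example.org/b", 0.2),
    Result("http://example.org/c", 0.6),
]

readable = post_filter(results, Profile(min_score=0.5))
print([r.url for r in readable])
```

A real profile would hold several preferences rather than one number, but the shape stays the same: score first, then filter per user.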

Goals of CAINscript

  1. Be able to generate scores on the webpages with respect to how they would be perceived;
  2. Be able to download and parse webpages to support the previous statement;
  3. Support all common formats and protocols, ensuring that what is written may be evaluated;
  4. Be able to store the results of the computation;
  5. Non-goal: this specific project has nothing to do with the cognitive load or readability of the words themselves; there was a parallel project for that;
  6. Be a highly adaptive system, to support the heterogeneous highly distributed nature of the world wide web.

Practical requirements of CAINscript

  • Be able to understand HTML 3 and 4 (HTML 5 not having been written in 2004);
  • Be able to understand XHTML;
  • Be able to understand framesets;
  • Be able to understand iframes;
  • Be able to understand CSS as linked resources;
  • Be able to understand CSS as style elements;
  • Be able to understand CSS as style attributes;
  • Be able to cope with malformed HTML;
  • Be able to convert all colour spaces into a common representation;
  • Be able to compute all sizes on text into a common representation;
  • Be able to compute all positional/layout commands into a common representation;
  • Be able to extract “human importance” of sections of text from the above presentation cues;
  • Any information presented in Flash, Java applets or embedded binaries had to be ignored;
  • We aimed to make intelligent guesses about the content of image files from their context and how they were written (Google Inc makes similar guesses);
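
The colour and text-size requirements in the list above can be sketched as follows. This is an illustration in Python, not the original Perl implementation, and every name in it (`to_rgb`, `to_px`, `NAMED`, `PX_PER_UNIT`) is hypothetical. It assumes the common representations are an (r, g, b) tuple of 0-255 integers and a size in pixels, with 96 pixels per inch and a 16px parent font size as default assumptions. Only a small subset of the named colours is included.

```python
import re

# Small illustrative subset of the HTML 4 named colours.
NAMED = {"black": (0, 0, 0), "white": (255, 255, 255),
         "red": (255, 0, 0), "navy": (0, 0, 128)}

# Absolute units expressed in pixels, assuming 96 pixels per inch.
PX_PER_UNIT = {"px": 1.0, "pt": 96 / 72, "pc": 16.0,
               "in": 96.0, "cm": 96 / 2.54, "mm": 96 / 25.4}


def to_rgb(value):
    """Normalise a colour string to an (r, g, b) tuple of 0-255 ints."""
    v = value.strip().lower()
    if v in NAMED:
        return NAMED[v]
    m = re.fullmatch(r"#([0-9a-f]{3})", v)
    if m:  # short hex form: each digit is doubled
        return tuple(int(c * 2, 16) for c in m.group(1))
    m = re.fullmatch(r"#([0-9a-f]{6})", v)
    if m:
        h = m.group(1)
        return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
    m = re.fullmatch(
        r"rgb\(\s*(\d+)(%?)\s*,\s*(\d+)(%?)\s*,\s*(\d+)(%?)\s*\)", v)
    if m:  # integer or percentage channels, clamped to 255
        parts = [(int(m.group(i)), m.group(i + 1)) for i in (1, 3, 5)]
        return tuple(min(255, round(n * 255 / 100) if pct else n)
                     for n, pct in parts)
    raise ValueError(f"unrecognised colour: {value!r}")


def to_px(value, parent_px=16.0):
    """Normalise a text size to pixels; em and % are relative to parent_px."""
    m = re.fullmatch(r"([0-9]*\.?[0-9]+)(px|pt|pc|in|cm|mm|em|%)",
                     value.strip().lower())
    if not m:
        raise ValueError(f"unrecognised size: {value!r}")
    n, unit = float(m.group(1)), m.group(2)
    if unit == "em":
        return n * parent_px
    if unit == "%":
        return n * parent_px / 100
    return n * PX_PER_UNIT[unit]


print(to_rgb("#fc0"))   # (255, 204, 0)
print(to_px("1.5em"))   # 24.0
```

Malformed values raise `ValueError` here for brevity; a system that has to cope with malformed HTML would more likely fall back to a default value.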

I would like to add JavaScript interpretation to this list, as pages frequently use that language to alter their appearance. At the time, I didn't have an interpreter.


Update: as of 2013 there is a more mature open-source market, and JS interpretation could probably be achieved via SpiderMonkey, Node.js or similar. It would still be computationally hard to know exactly what a section of JS did without running it.


Implementation of website

I was capable of web development and my co-workers were not, so I mocked up a static interface prototype. This was to work out the use-case requirements for this service. The prototype was strict HTML 4.01 with good use of CSS.
I intended to make a “live” one once the script engine was completed, so that real activity was possible.

Implementation overview and context

At the time of doing this work, I was an academic and wrote an introduction to my aspect of CAIN. This is here, but note it is a seventeen-year-old document; some of the references may not function.

I was intending to help a fellow student, not sell a product, and was expecting the work to be continued by other parties. As such, the code needed a clear design, compliance with University habits, and architecture documentation as well as code documentation. Lastly, for my co-worker to make it operational, I needed administrator documentation. All of this needed to be coherent with the code base and the current requirements. As such, the grammar I chose is documented, a basic IDL is documented, and I presented reasoned arguments about features.
One of my activities was a twenty-minute presentation on how I was building my colleague's research theories. This I did much more smoothly than my own MSc presentation.

I pulled a few references out; you may find them useful:

There is more documentation, but it wasn't recoverable.

Technical requirements

In order to use CAINscript, Perl 5 and the following Perl libraries must be installed:
cain_version=1.1.0

  • Bundle::DBI - or some similar DBI package
  • Bundle::LWP
  • Data::Dumper - should already be installed
  • Sys::Hostname
  • HTML::Parser - part of LWP I think
  • Net::Ping
  • File::Spec
  • Getopt::Mixed

Remaining TODO

TODO list as of 2004-06-28. Please note the numbers aren't sequential, as I have already done some of these.

  • [1b] Force the tree structure to get passed around correctly.
  • [6] Other than interface design, I have done no work on the front end. DONE BY JOHN
  • [7] There is no thorough version of the user manual.
  • [8] SemanticFilter is not thorough enough yet (the execing functionality has little error checking as the responsibility is delegated to the SemanticFilter).
  • [10] Assuming all of the previous items are done, do some real-world testing.
  • [11] Tidy up the documentation, as it is not finished.
  • [99] All other details that spontaneously appear.


Install notes

Install notes for the development bundle of CAINscript

1) Email me for the source bundle file. This is a dead project, and I have removed it from public source directories.
Put the file in your home directory.

2) Decompress it

tar -xzf ./cain-script-dev.tar.gz
3) Copy the wrapper to a directory on your path

chmod 755 ~/cain/cain
chmod 755 ~/cain/Tester.pl
chmod 755 ~/cain/Main.pl
cp ~/cain/cain $some_directory_on_your_path
4) Ensure you have a recent version of Perl

perl -v
5) Use CPAN or similar to install the required libraries
(these are listed in requirements.txt).
To use CPAN:

perl -MCPAN -e shell

6) cd ~/cain

7) The database tests use some test data in the database.
The host, password etc. are controlled via cain.conf.
The tests read and write to a table called test2, as described:


mysql> use cain
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
 +----------------+
 | Tables_in_cain |
 +----------------+
 | test2          |
 +----------------+
1 row in set (0.00 sec)

mysql> desc test2;
 +-------+-------------+------+-----+---------+----------------+
 | Field | Type        | Null | Key | Default | Extra          |
 +-------+-------------+------+-----+---------+----------------+
 | uid   | int(11)     |      | PRI | NULL    | auto_increment |
 | str   | varchar(20) | YES  |     | NULL    |                |
 +-------+-------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)

mysql> 

8) Make any edits, then run it with the wrapper script (the one called 'cain', that you put on your path).
The wrapper script redirects the debugging output (on STDOUT) to ~/cain/cain.out

There are a small number of sample scripts in the 'ts' directory.
The interface prototype from Alice is in the 'web' directory.

Please also note that where a critical page is absent, it is because I didn't complete the work on it, as I started paid employment.


Notes on the CAIN project
