NewsGroup URL Gleaner


Welcome to the NewsGroup Gleaner

The Gleaner is a new way to surf selected web pages, based on URLs that appear in the headers and text of Usenet newsgroup articles.

The basic idea is this: specify a newsgroup by name, and the NewsGroup Gleaner reads all the articles in that newsgroup, saving any http: URLs that it encounters. The results are then written to a new web page, which is displayed.
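The gleaning step can be sketched roughly as follows. This is not the Gleaner's actual code: the class name UrlGleaner, its method names, and the regular expression are assumptions made for illustration, and the sketch uses modern Java library classes rather than anything available to a NetScape 3 applet.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original Gleaner sources.
public class UrlGleaner {
    private static final String SCHEME = "http://";

    // "http://" followed by a run of characters that are not whitespace
    // or common delimiters such as angle brackets, quotes, or ")".
    private static final Pattern HTTP_URL =
            Pattern.compile("http://[^\\s<>\"')]+");

    /** Returns the distinct http: URLs in the text, in order of first appearance. */
    public static List<String> extractUrls(String articleText) {
        Set<String> seen = new LinkedHashSet<>();
        Matcher m = HTTP_URL.matcher(articleText);
        while (m.find()) {
            // Drop sentence punctuation that happens to follow the URL.
            String url = m.group().replaceAll("[.,;:!?]+$", "");
            if (url.length() > SCHEME.length()) {
                seen.add(url);
            }
        }
        return new ArrayList<>(seen);
    }

    /** Renders the URLs as a minimal HTML page containing one link per URL. */
    public static String toHtmlPage(String groupName, List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html><head><title>URLs gleaned from ")
          .append(escape(groupName))
          .append("</title></head><body>\n<ul>\n");
        for (String url : urls) {
            String e = escape(url);
            sb.append("<li><a href=\"").append(e).append("\">")
              .append(e).append("</a></li>\n");
        }
        sb.append("</ul>\n</body></html>\n");
        return sb.toString();
    }

    // Escapes the characters that are special in HTML text and attributes.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Calling extractUrls on each article's headers and body, merging the results, and passing them to toHtmlPage would give a page of links of the kind described above. Duplicates are dropped here as a design choice; the original may well list them differently.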

Prerequisites: NetScape 3, Perl 5, Unix environment. This version runs under NetScape only, not as a standalone appletviewer applet.

The Problem: The Gleaner is primarily a Java applet, but unfortunately mere applets do not have sufficient power to actually read newsgroups. Instead, the applet communicates with a small proxy server, written in Perl, that serves as a relay between the Java applet and the news server.
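The applet's side of that relay might look something like the sketch below. The wire format between the applet and the Perl proxy is not documented on this page, so the exchange shown here (send the newsgroup name on one line, then read reply lines until a line holding only "." or end of stream) is purely an assumption, as are the class name RelayClient and its method names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client for the proxy; the real protocol may differ.
public class RelayClient {

    /**
     * Assumed exchange: write the group name as one line, then collect
     * reply lines until a line of "." or the end of the stream.
     */
    public static List<String> requestGroup(BufferedReader in, Writer out,
                                            String groupName) throws IOException {
        out.write(groupName + "\n");
        out.flush();
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.equals(".")) {
            lines.add(line);
        }
        return lines;
    }

    /** Opens a socket to the proxy on the given host and port, then runs the exchange. */
    public static List<String> requestGroup(String host, int port,
                                            String groupName) throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.ISO_8859_1));
             Writer out = new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.ISO_8859_1)) {
            return requestGroup(in, out, groupName);
        }
    }
}
```

Keeping the exchange in a method that takes plain readers and writers makes it possible to exercise the logic without a running proxy; the socket variant only adds the connection. Which port the proxy listens on is set in the Perl script and is not assumed here.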

Running the Proxy: You'll need to run a Perl script (located here) to get the Gleaner to work. Customize the name of the news server near the top of the script to match your environment; you may also need to change the first line to specify the location of your Perl interpreter.

Running the applet: Java is very particular about security with sockets and ports; an applet can't communicate with a process running on a machine other than the one where the applet resides. Therefore, the way to get the Gleaner applet to work is:

  1. Copy the following to some local disk, say /tmp: this page, the two Java classes, and the Perl script.
  2. Start a shell and run newspass, the Perl proxy, in it.
  3. Be sure you have enabled Java and JavaScript execution under your NetScape options.
  4. Tell NetScape to open the URL file:/tmp/glean.html (tailored to your environment). With luck, everything now resides on the same machine: the proxy server, the applet class files, and a copy of this page. It may actually run!

Notes: This is not industrial strength software! There are many bugs and inefficiencies in both the Perl script and the Java applet. One very weak point is that the socket and port status is not properly reset after the Gleaner runs, so you may need to purge NetScape's cache or even restart newspass for repeated gleanings. Other deficiencies are just too numerous to mention.