Working with the Proxy
Gathering Real Time Data from the Web
by D. Hyatt, I. Mirkin, and D. Donado
The Problem with Filters
The problem with accessing real-time data from the web at TJHSST is that our
web access is filtered. Any HTTP request to a remote site must go through a
separate machine called a proxy server, whose IP address is 151.188.17.247,
and must use port 8002 rather than the standard port 80 that most web servers
use. The proxy server compares each HTTP request against a list of banned
locations. If the request is acceptable, the proxy forwards it to the real
site on the Internet; otherwise it discards the request. This is handled
transparently once the proper manual proxy settings have been entered in a
web browser such as Netscape, but it does create special problems for dynamic
websites that wish to gather data from the web. Their requests must also be
routed through the proxy, since they too must be filtered.
The following examples show how to access a remote site through the
school's proxy server using PHP, Perl, or Java. Special thanks go to student
sysadmin Ilia Mirkin, who figured this out.
Remote Access with PHP
The following segment of code opens a socket connection to TJHSST's
filter, affectionately known as bigbrother.tjhsst.edu, on the required
port 8002, which now handles HTTP requests. The connection is then used
to echo a hypothetical web page called somepage.html, located in a
subdirectory called somedir at a theoretical site called www.somesite.edu.
Naturally, users will have to change these lines to access pages
and data from real sites!
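<?php
// Connect to the proxy rather than to the remote site itself.
$fp = fsockopen("bigbrother.tjhsst.edu", 8002);
if (!$fp) die("Could not connect to the proxy.");
// Request the page; the Host header names the real destination.
fputs($fp, "GET /somedir/somepage.html HTTP/1.0\r\n");
fputs($fp, "Host: www.somesite.edu\r\n\r\n");
// Echo the raw response (HTTP headers included) until the connection closes.
while (!feof($fp)) {
    echo fgets($fp, 128);
}
fclose($fp);
?>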
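For comparison, the same raw-socket technique might look like this in Java.
This is only a sketch and not part of the original examples: the class name
RawProxyFetch and its method names are made up for illustration, and the
proxy host, port, and request lines are simply copied from the PHP code
above. Run with no arguments, it only prints the request it would send; run
with any argument, it actually contacts the proxy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption, not from the article.
class RawProxyFetch {
    // Proxy host and port as given in the article.
    static final String PROXY_HOST = "bigbrother.tjhsst.edu";
    static final int PROXY_PORT = 8002;

    // Builds the same two-line request that the PHP example writes.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n\r\n";
    }

    // Sends the request to the proxy and returns the raw response,
    // HTTP headers included, read until the connection is closed.
    static String fetch(String request) throws IOException {
        try (Socket socket = new Socket(PROXY_HOST, PROXY_PORT)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[128];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        String request = buildRequest("www.somesite.edu", "/somedir/somepage.html");
        if (args.length == 0) {
            // Dry run: show the request without touching the network.
            System.out.print(request);
            return;
        }
        System.out.print(fetch(request));
    }
}
```

As with the PHP version, the host and path are placeholders that must be
changed to point at a real site.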
Remote Access with Perl
Perl has a very useful module for Internet access called LWP
(libwww-perl). Once it has been told about the proxy, it sends each
request there, and the host and full URL can all be given in one string.
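#!/usr/bin/perl
use LWP;    # loads LWP::UserAgent and HTTP::Request

# Create a user agent and route its http requests through the proxy.
$ua = LWP::UserAgent->new;
$ua->proxy(http => "http://bigbrother.tjhsst.edu:8002/");

# Ask for the page by its full URL and print it as a CGI response.
$req = HTTP::Request->new(GET => "http://www.somesite.edu/somedir/somepage.html");
$stuff = $ua->request($req);
print "Content-type: text/html\n\n";
print $stuff->content;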
Remote Access with Java
For Java, it is necessary to establish the route through the proxy so
that something like a Java servlet will be able to send its requests
for data. Danny Donato (TJ02) offers the following code to handle the
proxy settings in Java.
// Set these once, before the program opens any URL connections.
System.getProperties().put( "proxySet", "true" );
System.getProperties().put( "proxyHost", "bigbrother.tjhsst.edu" );
System.getProperties().put( "proxyPort", "8002" );
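To show where such settings fit into a complete program, here is a minimal
sketch. It is an illustration under stated assumptions, not the code used at
TJ: the class name ProxyFetch and its methods are hypothetical, and it sets
the property names http.proxyHost and http.proxyPort, which are the names
documented for the JDK's HTTP handler, rather than the unprefixed names in
the three lines above. Which set of names takes effect may depend on the
Java version in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

// Hypothetical helper class; the name is an assumption, not from the article.
class ProxyFetch {
    // Sets the JVM-wide proxy properties used for http:// URLs.
    static void useProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
    }

    // Reads a page line by line. The JVM's HTTP handler is expected to
    // send the request through the proxy configured by useProxy().
    static String fetch(String address) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(URI.create(address).toURL().openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // Proxy host and port as given in the article.
        useProxy("bigbrother.tjhsst.edu", 8002);
        if (args.length > 0) {
            // Pass a full URL, e.g. http://www.somesite.edu/somedir/somepage.html
            System.out.print(fetch(args[0]));
        }
    }
}
```

Because the properties are set for the whole Java virtual machine, a servlet
would need to set them only once, before its first outgoing request.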
If you have created some dynamic webpages at TJ and they no longer work,
try modifying your programs to route their HTTP requests through our
proxy.
Donald W. Hyatt
dhyatt@tjhsst.edu
June 11, 2001