Open Proxy Servers
Kevin Guthrie
ALA, January 2003
JSTOR – January 2003 2
Outline
• Background: what are “open proxies”?
• What’s the exposure?
• What happened?
• How was it done?
• Not an isolated case
• What to do
What Has Been Taken: 51,392 Articles from 11 Titles
Title                  # of articles   Pct. of Run
Sociology Journal 1            4,997           95%
Sociology Journal 2           11,340           87%
Economics Journal 3            5,514           77%
Sociology Journal 4              349           73%
Economics Journal 2              402           71%
Sociology Journal 5           14,537           65%
Economics Journal 3            3,619           55%
Statistics Journal 1           6,555           44%
Economics Journal 4              120            3%
Sociology Journal 6            3,728           23%
Economics Journal 4              231           <1%
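As a quick arithmetic check, the per-title counts can be summed to confirm the headline figure. This is only a sketch: the list name `rows` is an assumption made for illustration, and the titles are copied as they appear on the slide.

```python
# Per-title article counts as listed on the slide (labels copied as-is).
rows = [
    ("Sociology Journal 1", 4997),
    ("Sociology Journal 2", 11340),
    ("Economics Journal 3", 5514),
    ("Sociology Journal 4", 349),
    ("Economics Journal 2", 402),
    ("Sociology Journal 5", 14537),
    ("Economics Journal 3", 3619),
    ("Statistics Journal 1", 6555),
    ("Economics Journal 4", 120),
    ("Sociology Journal 6", 3728),
    ("Economics Journal 4", 231),
]

# Sum the counts and count the rows to compare with the slide title.
total = sum(count for _, count in rows)
print(len(rows), total)  # 11 51392
```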
Proxy Servers
A proxy server is a web server that acts as an intermediary or relay station between a
workstation user and the Internet.
[Diagram: a user's workstation (IP: 1.2.3.4) requests http://www.jstor.org/browse through proxy.inst.edu (IP: 2.3.4.5), which relays the request to www.jstor.org.]
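The relay can be sketched as a small simulation with no real networking. The class names `OriginServer` and `ProxyServer` are hypothetical, and the sketch assumes the content site authorizes requests by source IP address; the point is only that the site sees the proxy's address rather than the user's.

```python
from dataclasses import dataclass, field


@dataclass
class OriginServer:
    """Stands in for a content site such as www.jstor.org (hypothetical model)."""
    authorized_ips: set
    seen_ips: list = field(default_factory=list)

    def handle(self, source_ip: str, url: str) -> str:
        # The origin only sees the address that opened the connection.
        self.seen_ips.append(source_ip)
        if source_ip in self.authorized_ips:
            return f"200 OK {url}"
        return f"403 Forbidden {url}"


@dataclass
class ProxyServer:
    """Relays requests, so the origin sees the proxy's address."""
    ip: str
    origin: OriginServer

    def forward(self, user_ip: str, url: str) -> str:
        # user_ip is known to the proxy but is not passed on to the origin.
        return self.origin.handle(self.ip, url)


# Assumption: the site authorizes the institution's proxy address only.
origin = OriginServer(authorized_ips={"2.3.4.5"})
proxy = ProxyServer(ip="2.3.4.5", origin=origin)

direct = origin.handle("1.2.3.4", "http://www.jstor.org/browse")
relayed = proxy.forward("1.2.3.4", "http://www.jstor.org/browse")
print(direct)
print(relayed)
```

In this model the same user is refused when connecting directly and admitted when relayed, because only the proxy's address is on the site's authorized list.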
Proxy Servers: Common Reasons for Their Use
• Caching
• Remote access
• Usage tracking
• Controlled access
• Approved filtering
What is an “open” proxy server?
• A proxy server must be configured to specify who is authorized to use it, much as any web server is configured.
• When a proxy server is not set up with the appropriate access controls, anyone can access that machine and “assume its identity”
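A minimal sketch of the kind of access control involved, using Python's standard `ipaddress` module. The campus range `1.2.3.0/24` and the function names are assumptions made for illustration; real proxy software expresses the same idea in its own configuration syntax.

```python
import ipaddress

# Hypothetical campus address range; a real deployment would list its own.
CAMPUS_NETWORKS = [ipaddress.ip_network("1.2.3.0/24")]


def is_authorized(client_ip, allowed=CAMPUS_NETWORKS):
    """Return True only if the client address falls inside an allowed range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)


def is_open(allowed):
    """An allow-list that covers every address makes the proxy effectively open."""
    return any(network.prefixlen == 0 for network in allowed)


print(is_authorized("1.2.3.4"))   # campus workstation
print(is_authorized("9.9.9.9"))   # outside address
print(is_open([ipaddress.ip_network("0.0.0.0/0")]))
```

A proxy with no such check, or with an allow-list of `0.0.0.0/0`, relays requests for anyone who can reach it.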
“Open” Proxy Servers: How and Why Are They Created
• Some are organizational or departmental proxy servers incorrectly configured.
• Some are set up intentionally to provide access to restricted resources (probably for convenience).
• We believe many are set up accidentally as an unknown by-product of setting up a web server.
What’s the Exposure?
Search For Lists of Open Proxy Servers
Find Lists of Open Proxy Servers
Lists of Open Proxy Servers by Domain Type
A List of Open .edu Proxies
[The server hostnames have been edited to protect the institutions with open proxy servers listed on this page.]
What Happened and How it was Discovered
JSTOR Monitors Use
• We have triggers to alert us to unusual levels of usage activity
• We investigate when usage seems unusual
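One way such a trigger could work is a per-address daily download count compared against a limit. This is a sketch under stated assumptions, not JSTOR's actual monitoring: the threshold `DAILY_LIMIT`, the log format, and the sample data are all hypothetical.

```python
from collections import Counter

# Hypothetical threshold; the slides do not state the level JSTOR uses.
DAILY_LIMIT = 500


def flag_unusual_usage(download_log, daily_limit=DAILY_LIMIT):
    """Return (ip, day) pairs whose download count exceeds daily_limit.

    download_log is an iterable of (ip, day) tuples, one per article download.
    """
    counts = Counter(download_log)
    return sorted(key for key, n in counts.items() if n > daily_limit)


# Made-up sample log: one heavy source and one ordinary one.
log = [("2.3.4.5", "08-22")] * 2000 + [("5.6.7.8", "08-22")] * 12
flagged = flag_unusual_usage(log)
print(flagged)
```

A flagged pair is only a prompt for investigation; a legitimate campus proxy can also produce high counts.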
The Abuse: What Happened
• August 22nd to 27th: 13,413 articles are downloaded from Proxy #1.
• August 27th: we deny this IP address access to JSTOR.
• August 26th to September 4th: 3,859 articles are downloaded from Proxy #2 at a different participating site.
• September 4th: we deny the IP address of this second proxy.
The Abuse: What Happened
• It appeared the two abuse situations were related:
1. There was an overlap in journals downloaded, but not an overlap in articles downloaded.
2. Analysis of our log files showed that the URLs being downloaded via Proxy #2 were created through use at Proxy #1.
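The first observation amounts to a set comparison: the two download sets share journals but no individual articles, which is what one would expect if a single job were split across two proxies. The sketch below uses made-up records and a hypothetical `compare_sessions` helper; it is not the analysis JSTOR staff actually ran.

```python
def compare_sessions(downloads_a, downloads_b):
    """Each argument is an iterable of (journal, article_id) pairs."""
    a, b = set(downloads_a), set(downloads_b)
    # Journals that appear in both sessions.
    shared_journals = {journal for journal, _ in a} & {journal for journal, _ in b}
    # Individual articles that appear in both sessions.
    shared_articles = a & b
    return shared_journals, shared_articles


# Made-up sample records for illustration only.
proxy_1 = [("Sociology Journal 1", 101), ("Sociology Journal 1", 102)]
proxy_2 = [("Sociology Journal 1", 103), ("Economics Journal 2", 7)]

journals, articles = compare_sessions(proxy_1, proxy_2)
print(journals)
print(articles)
```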
The Abuse: The Pattern Continues
• Between August 27th and October 31st, downloads occurred from:
  – 27 open proxy servers at
  – 16 different sites
• As JSTOR staff denied each proxy server, the abuse moved on.
~51,000 articles downloaded from 11 journals
How Is It Done?
Automate The Process
• Download lists of open proxies
• Automate a process to probe each to see if there is access to restricted resources
• Identify a set of open proxy servers with such access and set them aside
• Automate a process to download content
• From the “confirmed” list, commence downloading
Not an Isolated Case
We have found web pages providing explicit instructions for others to help them exploit open proxies in order to download content.
Not an Isolated Case - Translations
– “The Bible for Downloading Journal Articles”
– “To be blunt about it, you find an overseas proxy. The institution that the proxy server belongs to has spent money to buy the electronic edition of some journal, and then you use this proxy, (so) of course you can download the entire text of that journal!”
– “I cannot deny that some servers can download complete texts from many journals, but please, everyone, let’s not grab onto the ones which are easy to use and use them madly. The result of doing so will be to hasten the death of that server! So when you are using them, it’s best to do so equitably!”
Questions & Discussion
What to do?
• Shibboleth: http://shibboleth.internet2.edu/
• DLF Certificates: http://www.diglib.org/architectures/digcert.htm
• Education
• Drive all campus access through a set of properly authenticated proxy servers
http://www.jstor.org/