eda095 urlconnections - lunds tekniska...
TRANSCRIPT
EDA095URLConnections
Pierre Nugues
Lund Universityhttp://cs.lth.se/home/pierre_nugues/
March 30, 2017
Covers: Chapter 7, Java Network Programming, 4th ed., Elliotte Rusty Harold
Pierre Nugues EDA095 URLConnections March 30, 2017 1 / 1
URLConnection
URLConnection represents a communication link between the applicationand a URL.It is created from a URL object that calls openConnection()
We have seen how to open a stream from a URL:
URL myDoc = new URL("http://cs.lth.se/");
InputStream is = myDoc.openStream();
These lines are equivalent to:
URL myDoc = new URL("http://cs.lth.se/");
URLConnection uc = myDoc.openConnection();
InputStream is = uc.getInputStream();
They are more complex than openStream() but more flexible.
Pierre Nugues EDA095 URLConnections March 30, 2017 2 / 1
Reading a URL
try {
URL myDoc = new URL("http://cs.lth.se/");
URLConnection uc = myDoc.openConnection();
InputStream is = uc.getInputStream();
BufferedReader bReader =
new BufferedReader(new InputStreamReader(is));
String line;
while ((line = bReader.readLine()) != null) {
System.out.println(line);
}
} catch (Exception e) { e.printStackTrace(); }
//ReadURL.javaNearly the same as in ViewHTML.java except that we have aURLConnection object instead of an InputStream object.
Pierre Nugues EDA095 URLConnections March 30, 2017 3 / 1
URLConnection
URLConnection enables the programmer to have more control over aconnection:
Access the header fields
Configure the client properties
Use more elaborate commands (POST, PUT for HTTP)
Pierre Nugues EDA095 URLConnections March 30, 2017 4 / 1
Reading the Header
The header is part of the HTTP protocol and consists of a list of pairs:parameter/value. There are two main methods to read it:
String getHeaderFieldKey(int n) // the parameter name of thenth header
String getHeaderField(int n) // the parameter of the nthheader
Headers have typical parameters: Date, Content type, etc.There are shortcuts to access them:
String getContentEncoding()
String getContentType()
long getDate(), etc.
Pierre Nugues EDA095 URLConnections March 30, 2017 5 / 1
Reading the Header (I)
Extracting the complete list:
try {
URL myDoc = new URL("http://cs.lth.se/");
URLConnection uc = myDoc.openConnection();
for (int i = 0; ; i++) {
String header = uc.getHeaderField(i);
if (header == null) break;
System.out.println(uc.getHeaderFieldKey(i) + ": "
+ header);
}
} catch (Exception e) {
e.printStackTrace();
}
// ReadHeader.java
Pierre Nugues EDA095 URLConnections March 30, 2017 6 / 1
Reading the Header (II)
Extracting selected parameters:
try {
URL myDoc = new URL("http://cs.lth.se/");
URLConnection uc = myDoc.openConnection();
System.out.println("Date: " + new Date(uc.getDate()));
System.out.println("Content type: " + uc.getContentType());
System.out.println("Content encoding: " +
uc.getContentEncoding());
System.out.println("Last modified: " + uc.getLastModified());
} catch (Exception e) {
e.printStackTrace();
}
// ReadHeader2.java
Pierre Nugues EDA095 URLConnections March 30, 2017 7 / 1
MIME
The MIME (Multipurpose Mail Internet Extensions) is a tag to identify thecontent type. RFC 2045 and 2046(http://tools.ietf.org/html/rfc2045)MIME defines a category and a format: a type and a subtypeUseful MIME types are text/html, text/plain, image/gif,image/jpeg, application/pdf, and so on.In ReadHeader.java, let’s replace
URL myDoc = new URL("http://cs.lth.se/");
with
URL myDoc = new URL("http://fileadmin.cs.lth.se/cs/Bilder/
Grundplatta-3.jpg");
HTTP servers should send a content type together with data. It is notalways present however.Sometimes, the client has to guess usingURLConnection.guessContentTypeFromStream()
Pierre Nugues EDA095 URLConnections March 30, 2017 8 / 1
Downloading Text (and Ignoring the Rest)
public ArrayList<URL> readURL(URL url) {
LinkGetter callback = null;
try {
URLConnection uc = url.openConnection();
String type = uc.getContentType().toLowerCase();
// We read only text pages
if ((type != null) && !type.startsWith("text/html")) {
System.out.println(url + " ignored. Type " + type);
return null;
}
ParserGetter kit = new ParserGetter();
HTMLEditorKit.Parser parser = kit.getParser();
...
}
Pierre Nugues EDA095 URLConnections March 30, 2017 9 / 1
Caches
When a web page is viewed multiple times, the images it can contain aredownloaded once and then retrieved from on a cache on the local machine.The cache can be controlled by HTTP headers.
Expires (HTTP 1.0)
Cache-control (HTTP 1.1): max-age, no-store
Last-modified
ETag: A unique identifier sent by the server. The identifier changeswhen the resource changes.
It is possible to write a cache manager in Java, the idea is: If you access aURL, check if it is in the cache before downloading it.See the book, pages 203–208.
Pierre Nugues EDA095 URLConnections March 30, 2017 10 / 1
Configuring the Parameters
A set of URLConnection methods enables a program to read and modifythe connection’s request parameters:
protected boolean connected //false
protected boolean doInput // true
protected boolean doOutput // false
protected URL url, etc.
The methods to read and modify the connection are:
URL getURL()
boolean getDoInput()
void setDoInput(boolean)
String getRequestProperty(String key)
void setRequestProperty(String key, String value)
etc.
Pierre Nugues EDA095 URLConnections March 30, 2017 11 / 1
Getting the Parameters
try {
URL myDoc = new URL("http://cs.lth.se/");
URLConnection uc = myDoc.openConnection();
System.out.println("URL: " + uc.getURL());
System.out.println("Do Input: " + uc.getDoInput());
System.out.println("Do Output: " + uc.getDoOutput());
uc.setDoOutput(true);
System.out.println("Do Output: " + uc.getDoOutput());
} catch (Exception e) {
e.printStackTrace();
}
// ReadParameters.java
Pierre Nugues EDA095 URLConnections March 30, 2017 12 / 1
Extracting the Server Type
URL url = new URL("http://" + address);
URLConnection uc = url.openConnection();
uc.setConnectTimeout(50);
uc.setReadTimeout(50);
uc.connect(); //This is necessary to start the timeout clock
String serverType = uc.getHeaderField("Server");
// Normally, it should call connect().
// See JNP 4, page 189.
// On a Mac, this does not seem to be systematic
Pierre Nugues EDA095 URLConnections March 30, 2017 13 / 1
Maps or Dictionaries
A data structure that enables to use any kind of index:
my_array[1] = 2
my_dict["word"] = 3
In Java, this in implemented with maps:
Map<String, Integer>
HashMap
TreeMap
and the methods put() and get().
Pierre Nugues EDA095 URLConnections March 30, 2017 14 / 1
Counting Words in Java: Useful Features
Useful instructions and data structures:
Map<String, Integer> count(String[] words) {
Map<String, Integer> counts = new TreeMap<>();
for (String word : words) {
if (counts.get(word) == null) {
counts.put(word, 1);
} else {
counts.put(word, counts.get(word) + 1);
}
}
return counts;
}
Pierre Nugues EDA095 URLConnections March 30, 2017 15 / 1
The input box of the Google page
<form action=/search name=f>...
<input type=hidden name=hl value=sv>
<input maxLength=256 size=55 name=q value="">
<input type=submit value="Google-sokning" name=btnG>
<input type=submit value="Jag har tur" name=btnI>
<input id=all type=radio name=meta value="" checked>
<label for=all> webben</label>
<input id=lgr type=radio name=meta value="lr=lang_sv" >
<label for=lgr> sidor pa svenska</label>
<input id=cty type=radio name=meta value="cr=countrySE" >
<label for=cty>sidor fran Sverige</label>...
</form>
Pierre Nugues EDA095 URLConnections March 30, 2017 16 / 1
Queries
The query Nugues to Google is a sequence of pairs (name, value)
https://www.google.se/search?
q=Nugues&ie=utf-8&oe=utf-8&gws_rd=cr
URISplitter extracts the query:
q=Nugues&ie=utf-8&oe=utf-8&gws_rd=cr
We can create a GET request using the URL constructor and send it toGoogle using openStream()
Google returns a 403 error: Forbidden.//GoogleQuery.java
Pierre Nugues EDA095 URLConnections March 30, 2017 17 / 1
Faking the Lizard
A naıve Google query from Java fails miserably:
Server returned HTTP response code: 403 for URL:
q=Nugues&ie=utf-8&oe=utf-8&gws_rd=cr
Google sets constraints on theuser agent.It is possible to remedy this.Just set a user agent corre-sponding to a known browser
uc.setRequestProperty("User-Agent", "Mozilla/5.0");
Pierre Nugues EDA095 URLConnections March 30, 2017 18 / 1
Using the POST Command
The two main commands of a HTTP client are GET and POST
GET sends parameters as an extension of a URL addressThey are visible to everybodyHow to send data with POST?Programs FormPoster.java and QueryString.java from Elliotte RustyHarold, Java Network Programming, pages 220–222 and page 153 areexamples of it.(http://www.cafeaulait.org/books/jnp3/examples/15/)They form a client that works in conjunction with a server and formats aquery.The query is sent back by the server
Pierre Nugues EDA095 URLConnections March 30, 2017 19 / 1
Formatting a Query
QueryString.java encodes and formats a query.The pairs of parameter names (keys) and values are separated with &
In the example, we send:
Name: Elliotte Rusty Harold
Email: [email protected]
Pierre Nugues EDA095 URLConnections March 30, 2017 20 / 1
Using the POST Command
Switching from GET to POST is done implicitly through setDoOutput()
URLConnection uc = url.openConnection();
uc.setDoOutput(true);
OutputStreamWriter out =
new OutputStreamWriter(uc.getOutputStream(), "UTF-8");
out.write(query.toString());
out.write("\r\n");
out.flush();
out.close();
The client header is sent automatically
Pierre Nugues EDA095 URLConnections March 30, 2017 21 / 1
Using the POST Command (II)
HttpURLConnection is a subclass designed to carry out HTTP interactionThe POST method is more explicit with it and the setRequestMethod()
HttpURLConnection uc =
(HttpURLConnection) url.openConnection();
uc.setRequestMethod("POST");
uc.setDoOutput(true);
(FormPoster2.java)
Pierre Nugues EDA095 URLConnections March 30, 2017 22 / 1
HttpURLConnection
Possible requests with HttpURLConnection are:
GET // default download
POST
PUT // Upload a file
DELETE // delete a file
HEAD //same as GET but return the header only
OPTIONS //lists the possible commands
TRACE // send back the header
This makes provision to manage HTTP protocol codes directly at the APIlevel.
Pierre Nugues EDA095 URLConnections March 30, 2017 23 / 1
DELETE
HttpURLConnection uc =
(HttpURLConnection) url.openConnection();
uc.setRequestMethod("DELETE");
uc.setDoOutput(true);
OutputStreamWriter out =
new OutputStreamWriter(uc.getOutputStream(), "UTF-8");
// The DELETE line, the Content-type header,
// and the Content-length headers are sent by the
// URLConnection.
// We just need to send the data
out.write("/none");
out.write("\r\n");
(Poster.java)
Pierre Nugues EDA095 URLConnections March 30, 2017 24 / 1
FTP
//URL url = new URL("ftp://username:[email protected]/file.zip;type=i");
URL url =
new URL("ftp://anonymous:Pierre.Nugues%[email protected]/pub/");
URLConnection uc = url.openConnection();
InputStream is = uc.getInputStream();
BufferedReader bReader =
new BufferedReader(new InputStreamReader(is));
String line;
while ((line = bReader.readLine()) != null) {
System.out.println(line);
}
//ReadURLftp.java
Pierre Nugues EDA095 URLConnections March 30, 2017 25 / 1
REST Architecture
REST – representation state transfer – An a posteriori model of the web:clients, servers, and HTTPRESTful architecture implicitly means: the client-server transactions basedon three standards:
HTTP:
Transfer protocol of the webOn top of TCP/IPPairs of requests from clients and responses from servers
URI/URLs:
A way to name and address objects on the net
HTML/XML
Pierre Nugues EDA095 URLConnections March 30, 2017 26 / 1
REST Methods
Most web servers use databases to store data.REST transactions are essentially database operations.In the context of REST, we reuse HTTP methods with a differentmeaning.This defines the interaction protocol or API.CRUD is another name of the same concept.The CRUD operations are mapped onto HTTP methods.
CRUD names HTTP methods
Create POSTRead (Retrieve) GETUpdate PUTDelete DELETE
See: http:
//en.wikipedia.org/wiki/Create,_read,_update_and_delete.
Pierre Nugues EDA095 URLConnections March 30, 2017 27 / 1
REST Methods: POST
Doodle is a popular planning service for meetings: http://doodle.com/
Doodle uses a REST API:http://doodle.com/xsd1/RESTfulDoodle.pdf
We create a meeting (poll in Doodle) with POST
Client Server
POST /polls −→Data in the message body
←− 201 CreatedContent-Location:Ischgtq77kunavkcu
Pierre Nugues EDA095 URLConnections March 30, 2017 28 / 1
REST Methods: GET
We retrieve a meeting with GET
Client Server
GET /polls/Ischgtq −→←− The poll encoded in XML
according to schema poll.xsd.
Pierre Nugues EDA095 URLConnections March 30, 2017 29 / 1
REST Methods: PUT
We update a meeting with PUT
Client Server
PUT /polls/Ischgtq −→Data in the message body
←− 200 OK
Pierre Nugues EDA095 URLConnections March 30, 2017 30 / 1
REST Methods: DELETE
We delete a meeting with DELETE
Client Server
DELETE /polls/Ischgtq −→←− 204 No Content
Pierre Nugues EDA095 URLConnections March 30, 2017 31 / 1
HTTP Methods and REST
Amazon S3 is another example (from RESTful web services, Chap. 3,O’Reilly)It uses two types of objects: buckets (a folder or a collection) and objectsand four methods: GET, HEAD, PUT, and DELETE
GET HEAD PUT DELETE
Bucket List content Create bucket Delete bucketObject Get data Get metadata Set values Delete object
and metadata and metadata
One example among others. . .
Pierre Nugues EDA095 URLConnections March 30, 2017 32 / 1
REST and Sesame
Sesame extends the REST protocol to manage graphs:
GET: Fetches statements from the repository.
PUT: Updates data in the repository, replacing any existing data withthe supplied data. The data supplied with this request is expected tocontain an RDF document in one of the supported RDF formats.
DELETE: Deletes statements from the repository.
POST: Performs updates on the data in the repository. The datasupplied with this request is expected to contain either an RDFdocument or a special purpose transaction document. In case of theformer, the statements found in the RDF document will be added tothe repository. In case of the latter, the updates specified in thetransaction document will be executed.
Pierre Nugues EDA095 URLConnections March 30, 2017 33 / 1
REST Examples
Get all repositories (tuple query):
curl -X GET -H "Accept: application/sparql-results+xml"
http://vm33.cs.lth.se/openrdf-sesame/repositories
SPARQL queries is also straightforward.A SELECT query: SELECT ?s ?p ?o WHERE {?s ?p ?o} (tuple query):
curl -X GET -H "Accept: application/sparql-results+json"
http://vm33.cs.lth.se/openrdf-sesame/repositories/rosetta_tests
?query=SELECT+%3fs+%3fp+%3fo+WHERE+%7b%3fs+%3fp+%3fo%7d
Pierre Nugues EDA095 URLConnections March 30, 2017 34 / 1
REST In Practice
Few programmers would build a REST application from scratch.There are plenty of tools available:
Reference:
JSR 311, JAX-RS: The Java API for RESTful Web Services,http://jsr311.java.net/
Tools:
cURL, a command line tool, http://curl.haxx.se/Poster, a plugin module for Firefox, https://addons.
mozilla.org/en-US/firefox/addon/poster/
soapUI, http://www.soapui.org/
Implementation:
Jersey, http://jersey.java.net/
Pierre Nugues EDA095 URLConnections March 30, 2017 35 / 1
Poster
Pierre Nugues EDA095 URLConnections March 30, 2017 36 / 1