eda095 urlconnections - lunds tekniska...

36
EDA095 URLConnections Pierre Nugues Lund University http://cs.lth.se/home/pierre_nugues/ March 30, 2017 Covers: Chapter 7, Java Network Programming,4 th ed., Elliotte Rusty Harold Pierre Nugues EDA095 URLConnections March 30, 2017 1/1

Upload: others

Post on 25-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

EDA095URLConnections

Pierre Nugues

Lund Universityhttp://cs.lth.se/home/pierre_nugues/

March 30, 2017

Covers: Chapter 7, Java Network Programming, 4th ed., Elliotte Rusty Harold

Pierre Nugues EDA095 URLConnections March 30, 2017 1 / 1

Page 2: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

URLConnection

URLConnection represents a communication link between the applicationand a URL.It is created from a URL object that calls openConnection()

We have seen how to open a stream from a URL:

URL myDoc = new URL("http://cs.lth.se/");

InputStream is = myDoc.openStream();

These lines are equivalent to:

URL myDoc = new URL("http://cs.lth.se/");

URLConnection uc = myDoc.openConnection();

InputStream is = uc.getInputStream();

They are more complex than openStream() but more flexible.

Pierre Nugues EDA095 URLConnections March 30, 2017 2 / 1

Page 3: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Reading a URL

try {

URL myDoc = new URL("http://cs.lth.se/");

URLConnection uc = myDoc.openConnection();

InputStream is = uc.getInputStream();

BufferedReader bReader =

new BufferedReader(new InputStreamReader(is));

String line;

while ((line = bReader.readLine()) != null) {

System.out.println(line);

}

} catch (Exception e) { e.printStackTrace(); }

//ReadURL.javaNearly the same as in ViewHTML.java except that we have aURLConnection object instead of an InputStream object.

Pierre Nugues EDA095 URLConnections March 30, 2017 3 / 1

Page 4: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

URLConnection

URLConnection enables the programmer to have more control over aconnection:

Access the header fields

Configure the client properties

Use more elaborate commands (POST, PUT for HTTP)

Pierre Nugues EDA095 URLConnections March 30, 2017 4 / 1

Page 5: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Reading the Header

The header is part of the HTTP protocol and consists of a list of pairs:parameter/value. There are two main methods to read it:

String getHeaderFieldKey(int n) // the parameter name of thenth header

String getHeaderField(int n) // the parameter of the nthheader

Headers have typical parameters: Date, Content type, etc.There are shortcuts to access them:

String getContentEncoding()

String getContentType()

long getDate(), etc.

Pierre Nugues EDA095 URLConnections March 30, 2017 5 / 1

Page 6: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Reading the Header (I)

Extracting the complete list:

try {

URL myDoc = new URL("http://cs.lth.se/");

URLConnection uc = myDoc.openConnection();

for (int i = 0; ; i++) {

String header = uc.getHeaderField(i);

if (header == null) break;

System.out.println(uc.getHeaderFieldKey(i) + ": "

+ header);

}

} catch (Exception e) {

e.printStackTrace();

}

// ReadHeader.java

Pierre Nugues EDA095 URLConnections March 30, 2017 6 / 1

Page 7: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Reading the Header (II)

Extracting selected parameters:

try {

URL myDoc = new URL("http://cs.lth.se/");

URLConnection uc = myDoc.openConnection();

System.out.println("Date: " + new Date(uc.getDate()));

System.out.println("Content type: " + uc.getContentType());

System.out.println("Content encoding: " +

uc.getContentEncoding());

System.out.println("Last modified: " + uc.getLastModified());

} catch (Exception e) {

e.printStackTrace();

}

// ReadHeader2.java

Pierre Nugues EDA095 URLConnections March 30, 2017 7 / 1

Page 8: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

MIME

The MIME (Multipurpose Mail Internet Extensions) is a tag to identify thecontent type. RFC 2045 and 2046(http://tools.ietf.org/html/rfc2045)MIME defines a category and a format: a type and a subtypeUseful MIME types are text/html, text/plain, image/gif,image/jpeg, application/pdf, and so on.In ReadHeader.java, let’s replace

URL myDoc = new URL("http://cs.lth.se/");

with

URL myDoc = new URL("http://fileadmin.cs.lth.se/cs/Bilder/

Grundplatta-3.jpg");

HTTP servers should send a content type together with data. It is notalways present however.Sometimes, the client has to guess usingURLConnection.guessContentTypeFromStream()

Pierre Nugues EDA095 URLConnections March 30, 2017 8 / 1

Page 9: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Downloading Text (and Ignoring the Rest)

public ArrayList<URL> readURL(URL url) {

LinkGetter callback = null;

try {

URLConnection uc = url.openConnection();

String type = uc.getContentType().toLowerCase();

// We read only text pages

if ((type != null) && !type.startsWith("text/html")) {

System.out.println(url + " ignored. Type " + type);

return null;

}

ParserGetter kit = new ParserGetter();

HTMLEditorKit.Parser parser = kit.getParser();

...

}

Pierre Nugues EDA095 URLConnections March 30, 2017 9 / 1

Page 10: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Caches

When a web page is viewed multiple times, the images it can contain aredownloaded once and then retrieved from on a cache on the local machine.The cache can be controlled by HTTP headers.

Expires (HTTP 1.0)

Cache-control (HTTP 1.1): max-age, no-store

Last-modified

ETag: A unique identifier sent by the server. The identifier changeswhen the resource changes.

It is possible to write a cache manager in Java, the idea is: If you access aURL, check if it is in the cache before downloading it.See the book, pages 203–208.

Pierre Nugues EDA095 URLConnections March 30, 2017 10 / 1

Page 11: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Configuring the Parameters

A set of URLConnection methods enables a program to read and modifythe connection’s request parameters:

protected boolean connected //false

protected boolean doInput // true

protected boolean doOutput // false

protected URL url, etc.

The methods to read and modify the connection are:

URL getURL()

boolean getDoInput()

void setDoInput(boolean)

String getRequestProperty(String key)

void setRequestProperty(String key, String value)

etc.

Pierre Nugues EDA095 URLConnections March 30, 2017 11 / 1

Page 12: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Getting the Parameters

try {

URL myDoc = new URL("http://cs.lth.se/");

URLConnection uc = myDoc.openConnection();

System.out.println("URL: " + uc.getURL());

System.out.println("Do Input: " + uc.getDoInput());

System.out.println("Do Output: " + uc.getDoOutput());

uc.setDoOutput(true);

System.out.println("Do Output: " + uc.getDoOutput());

} catch (Exception e) {

e.printStackTrace();

}

// ReadParameters.java

Pierre Nugues EDA095 URLConnections March 30, 2017 12 / 1

Page 13: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Extracting the Server Type

URL url = new URL("http://" + address);

URLConnection uc = url.openConnection();

uc.setConnectTimeout(50);

uc.setReadTimeout(50);

uc.connect(); //This is necessary to start the timeout clock

String serverType = uc.getHeaderField("Server");

// Normally, it should call connect().

// See JNP 4, page 189.

// On a Mac, this does not seem to be systematic

Pierre Nugues EDA095 URLConnections March 30, 2017 13 / 1

Page 14: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Maps or Dictionaries

A data structure that enables to use any kind of index:

my_array[1] = 2

my_dict["word"] = 3

In Java, this in implemented with maps:

Map<String, Integer>

HashMap

TreeMap

and the methods put() and get().

Pierre Nugues EDA095 URLConnections March 30, 2017 14 / 1

Page 15: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Counting Words in Java: Useful Features

Useful instructions and data structures:

Map<String, Integer> count(String[] words) {

Map<String, Integer> counts = new TreeMap<>();

for (String word : words) {

if (counts.get(word) == null) {

counts.put(word, 1);

} else {

counts.put(word, counts.get(word) + 1);

}

}

return counts;

}

Pierre Nugues EDA095 URLConnections March 30, 2017 15 / 1

Page 16: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Google

The input box of the Google page

<form action=/search name=f>...

<input type=hidden name=hl value=sv>

<input maxLength=256 size=55 name=q value="">

<input type=submit value="Google-sokning" name=btnG>

<input type=submit value="Jag har tur" name=btnI>

<input id=all type=radio name=meta value="" checked>

<label for=all> webben</label>

<input id=lgr type=radio name=meta value="lr=lang_sv" >

<label for=lgr> sidor pa svenska</label>

<input id=cty type=radio name=meta value="cr=countrySE" >

<label for=cty>sidor fran Sverige</label>...

</form>

Pierre Nugues EDA095 URLConnections March 30, 2017 16 / 1

Page 17: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Queries

The query Nugues to Google is a sequence of pairs (name, value)

https://www.google.se/search?

q=Nugues&ie=utf-8&oe=utf-8&gws_rd=cr

URISplitter extracts the query:

q=Nugues&ie=utf-8&oe=utf-8&gws_rd=cr

We can create a GET request using the URL constructor and send it toGoogle using openStream()

Google returns a 403 error: Forbidden.//GoogleQuery.java

Pierre Nugues EDA095 URLConnections March 30, 2017 17 / 1

Page 18: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Faking the Lizard

A naıve Google query from Java fails miserably:

Server returned HTTP response code: 403 for URL:

q=Nugues&ie=utf-8&oe=utf-8&gws_rd=cr

Google sets constraints on theuser agent.It is possible to remedy this.Just set a user agent corre-sponding to a known browser

 uc.setRequestProperty("User-Agent", "Mozilla/5.0");

Pierre Nugues EDA095 URLConnections March 30, 2017 18 / 1

Page 19: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Using the POST Command

The two main commands of a HTTP client are GET and POST

GET sends parameters as an extension of a URL addressThey are visible to everybodyHow to send data with POST?Programs FormPoster.java and QueryString.java from Elliotte RustyHarold, Java Network Programming, pages 220–222 and page 153 areexamples of it.(http://www.cafeaulait.org/books/jnp3/examples/15/)They form a client that works in conjunction with a server and formats aquery.The query is sent back by the server

Pierre Nugues EDA095 URLConnections March 30, 2017 19 / 1

Page 20: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Formatting a Query

QueryString.java encodes and formats a query.The pairs of parameter names (keys) and values are separated with &

In the example, we send:

Name: Elliotte Rusty Harold

Email: [email protected]

Pierre Nugues EDA095 URLConnections March 30, 2017 20 / 1

Page 21: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Using the POST Command

Switching from GET to POST is done implicitly through setDoOutput()

URLConnection uc = url.openConnection();

uc.setDoOutput(true);

OutputStreamWriter out =

new OutputStreamWriter(uc.getOutputStream(), "UTF-8");

out.write(query.toString());

out.write("\r\n");

out.flush();

out.close();

The client header is sent automatically

Pierre Nugues EDA095 URLConnections March 30, 2017 21 / 1

Page 22: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Using the POST Command (II)

HttpURLConnection is a subclass designed to carry out HTTP interactionThe POST method is more explicit with it and the setRequestMethod()

HttpURLConnection uc =

(HttpURLConnection) url.openConnection();

uc.setRequestMethod("POST");

uc.setDoOutput(true);

(FormPoster2.java)

Pierre Nugues EDA095 URLConnections March 30, 2017 22 / 1

Page 23: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

HttpURLConnection

Possible requests with HttpURLConnection are:

GET // default download

POST

PUT // Upload a file

DELETE // delete a file

HEAD //same as GET but return the header only

OPTIONS //lists the possible commands

TRACE // send back the header

This makes provision to manage HTTP protocol codes directly at the APIlevel.

Pierre Nugues EDA095 URLConnections March 30, 2017 23 / 1

Page 24: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

DELETE

HttpURLConnection uc =

(HttpURLConnection) url.openConnection();

uc.setRequestMethod("DELETE");

uc.setDoOutput(true);

OutputStreamWriter out =

new OutputStreamWriter(uc.getOutputStream(), "UTF-8");

// The DELETE line, the Content-type header,

// and the Content-length headers are sent by the

// URLConnection.

// We just need to send the data

out.write("/none");

out.write("\r\n");

(Poster.java)

Pierre Nugues EDA095 URLConnections March 30, 2017 24 / 1

Page 25: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

FTP

//URL url = new URL("ftp://username:[email protected]/file.zip;type=i");

URL url =

new URL("ftp://anonymous:Pierre.Nugues%[email protected]/pub/");

URLConnection uc = url.openConnection();

InputStream is = uc.getInputStream();

BufferedReader bReader =

new BufferedReader(new InputStreamReader(is));

String line;

while ((line = bReader.readLine()) != null) {

System.out.println(line);

}

//ReadURLftp.java

Pierre Nugues EDA095 URLConnections March 30, 2017 25 / 1

Page 26: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST Architecture

REST – representation state transfer – An a posteriori model of the web:clients, servers, and HTTPRESTful architecture implicitly means: the client-server transactions basedon three standards:

HTTP:

Transfer protocol of the webOn top of TCP/IPPairs of requests from clients and responses from servers

URI/URLs:

A way to name and address objects on the net

HTML/XML

Pierre Nugues EDA095 URLConnections March 30, 2017 26 / 1

Page 27: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST Methods

Most web servers use databases to store data.REST transactions are essentially database operations.In the context of REST, we reuse HTTP methods with a differentmeaning.This defines the interaction protocol or API.CRUD is another name of the same concept.The CRUD operations are mapped onto HTTP methods.

CRUD names HTTP methods

Create POSTRead (Retrieve) GETUpdate PUTDelete DELETE

See: http:

//en.wikipedia.org/wiki/Create,_read,_update_and_delete.

Pierre Nugues EDA095 URLConnections March 30, 2017 27 / 1

Page 28: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST Methods: POST

Doodle is a popular planning service for meetings: http://doodle.com/

Doodle uses a REST API:http://doodle.com/xsd1/RESTfulDoodle.pdf

We create a meeting (poll in Doodle) with POST

Client Server

POST /polls −→Data in the message body

←− 201 CreatedContent-Location:Ischgtq77kunavkcu

Pierre Nugues EDA095 URLConnections March 30, 2017 28 / 1

Page 29: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST Methods: GET

We retrieve a meeting with GET

Client Server

GET /polls/Ischgtq −→←− The poll encoded in XML

according to schema poll.xsd.

Pierre Nugues EDA095 URLConnections March 30, 2017 29 / 1

Page 30: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST Methods: PUT

We update a meeting with PUT

Client Server

PUT /polls/Ischgtq −→Data in the message body

←− 200 OK

Pierre Nugues EDA095 URLConnections March 30, 2017 30 / 1

Page 31: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST Methods: DELETE

We delete a meeting with DELETE

Client Server

DELETE /polls/Ischgtq −→←− 204 No Content

Pierre Nugues EDA095 URLConnections March 30, 2017 31 / 1

Page 32: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

HTTP Methods and REST

Amazon S3 is another example (from RESTful web services, Chap. 3,O’Reilly)It uses two types of objects: buckets (a folder or a collection) and objectsand four methods: GET, HEAD, PUT, and DELETE

GET HEAD PUT DELETE

Bucket List content Create bucket Delete bucketObject Get data Get metadata Set values Delete object

and metadata and metadata

One example among others. . .

Pierre Nugues EDA095 URLConnections March 30, 2017 32 / 1

Page 33: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST and Sesame

Sesame extends the REST protocol to manage graphs:

GET: Fetches statements from the repository.

PUT: Updates data in the repository, replacing any existing data withthe supplied data. The data supplied with this request is expected tocontain an RDF document in one of the supported RDF formats.

DELETE: Deletes statements from the repository.

POST: Performs updates on the data in the repository. The datasupplied with this request is expected to contain either an RDFdocument or a special purpose transaction document. In case of theformer, the statements found in the RDF document will be added tothe repository. In case of the latter, the updates specified in thetransaction document will be executed.

Pierre Nugues EDA095 URLConnections March 30, 2017 33 / 1

Page 34: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST Examples

Get all repositories (tuple query):

curl -X GET -H "Accept: application/sparql-results+xml"

http://vm33.cs.lth.se/openrdf-sesame/repositories

SPARQL queries is also straightforward.A SELECT query: SELECT ?s ?p ?o WHERE {?s ?p ?o} (tuple query):

curl -X GET -H "Accept: application/sparql-results+json"

http://vm33.cs.lth.se/openrdf-sesame/repositories/rosetta_tests

?query=SELECT+%3fs+%3fp+%3fo+WHERE+%7b%3fs+%3fp+%3fo%7d

Pierre Nugues EDA095 URLConnections March 30, 2017 34 / 1

Page 35: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

REST In Practice

Few programmers would build a REST application from scratch.There are plenty of tools available:

Reference:

JSR 311, JAX-RS: The Java API for RESTful Web Services,http://jsr311.java.net/

Tools:

cURL, a command line tool, http://curl.haxx.se/Poster, a plugin module for Firefox, https://addons.

mozilla.org/en-US/firefox/addon/poster/

soapUI, http://www.soapui.org/

Implementation:

Jersey, http://jersey.java.net/

Pierre Nugues EDA095 URLConnections March 30, 2017 35 / 1

Page 36: EDA095 URLConnections - Lunds tekniska högskolafileadmin.cs.lth.se/cs/Education/EDA095/2017/lectures/...Reading the Header The header is part of the HTTP protocol and consists of

Poster

Pierre Nugues EDA095 URLConnections March 30, 2017 36 / 1