
Web Services and Open Data

Sébastien Tixeuil sebastien.tixeuil@lip6.fr

Thanks to Lélia Blin, Quentin Bramas, Fabien Mathieu

Web Services

What is a Web Service?

A Web Service is a method of communication between two programs over the Web.

HTTP is the typical protocol used to communicate via Web Services.

What is a Web Service?

Request

Response

Client <-- HTTP --> Server

What is a Web Service?

Request

Response

Client <-- HTTP --> Server

XML request:

<id>5</id>

XML response:

<note id='5'>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget the dinner</body>
</note>

What is a Web Service?

Request

Response

Client <-- HTTP --> Server

SOAP request:

<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header>
  </soap:Header>
  <soap:Body>
    <m:GetStockPrice xmlns:m="http://www.example.org/stock/Surya">
      <m:StockName>IBM</m:StockName>
    </m:GetStockPrice>
  </soap:Body>
</soap:Envelope>

SOAP response (truncated):

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <ResponseHeader xmlns="https://www.google.com/apis/ads/publisher/v201508">
      <requestId>xxxxxxxxxxxxxxxxxxxx</requestId>
      <responseTime>1063</responseTime>
    </ResponseHeader>
  </soap:Header>
  <soap:Body>
    <getAdUnitsByStatementResponse xmlns="https://www.google.com/apis/ads/publisher/v201508">
      <rval>
        <totalResultSetSize>1</totalResultSetSize>
        <startIndex>0</startIndex>
        <results>
          <id>2372</id>
          <name>RootAdUnit</name>
          <description></description>
          <targetWindow>TOP</targetWindow>
          <status>ACTIVE</status>
          <adUnitCode>1002372</adUnitCode>
          <inheritedAdSenseSettings>
            <value>
              <adSenseEnabled>true</adSenseEnabled>
              <borderColor>FFFFFF</borderColor>
              ...

What is a Web Service?

Request

Response

Client <-- HTTP --> Server

URL-encoded request:

order=date&limit=2

JSON response:

{ "data": [
    { "id": 1001, "name": "Jim" },
    { "id": 1002, "name": "Matt" } ] }
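The URL-encoded form above is easy to produce and consume with Python's standard library; a quick sketch:

```python
from urllib.parse import urlencode, parse_qs

# Build the query string shown above from a dict
query = urlencode({"order": "date", "limit": 2})
print(query)   # order=date&limit=2

# Parse it back (each value comes back as a list of strings)
params = parse_qs(query)
print(params)  # {'order': ['date'], 'limit': ['2']}
```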

API Business Model

REST Web API

A web service using simpler REpresentational State Transfer (REST) based communication.

A request is just an HTTP method applied to a URI. The response is typically JSON or XML.

Example:

GET : http://pokeapi.co/api/v1/pokemon/25

GET is the HTTP method; the URI represents a resource: http://pokeapi.co is the base URL of the API, v1 is the API version, and pokemon/25 identifies the resource.

REST Web API Call Example

HTTP request headers:

GET /api/v1/pokemon/25/ HTTP/1.1
Host: pokeapi.co
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch

HTTP response headers:

HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Fri, 08 Jan 2016 13:10:08 GMT
Content-Type: application/json
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept
X-Frame-Options: SAMEORIGIN
Cache-Control: s-maxage=360, max-age=360

HTTP response body:

{ "name": "Pikachu", "attack": 55,
  "abilities": [
    { "name": "static", "resource_uri": "/api/v1/ability/9/" },
    { "name": "lightningrod", "resource_uri": "/api/v1/ability/31/" } ] }
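The response body arrives as plain text; a sketch of decoding it with the standard json module (the payload is the abridged one shown above):

```python
import json

# The (abridged) JSON response body, as received text
body = '''{ "name": "Pikachu", "attack": 55,
  "abilities": [
    { "name": "static",       "resource_uri": "/api/v1/ability/9/" },
    { "name": "lightningrod", "resource_uri": "/api/v1/ability/31/" } ] }'''

pokemon = json.loads(body)  # text -> nested dicts and lists
print(pokemon["name"])      # Pikachu
print([a["name"] for a in pokemon["abilities"]])  # ['static', 'lightningrod']
```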


REST: Architectural Properties

• Simplicity of a uniform interface

• Modifiability of components to meet changing needs (even while the application is running)

• Visibility of communication between components by service agents

• Portability of components by moving program code with the data

• Reliability in the resistance to failure at the system level in the presence of failures within components, connectors, or data

REST: Architectural Constraints

• Client-server architecture

• Statelessness

• Cacheability

• Layered system

• Code on demand (optional)

• Uniform interface

Resources

Command based (ex: Flicker Api): GET: https://api.flickr.com/services/rest/?method=flickr.galleries.getList&user_id=XX POST: https://api.flickr.com/services/rest/?method=flickr.galleries.addPhoto&gallery_id=XX

Resources

• ex: Facebook Graph Api: GET: /{photo-id} to retrieve the info of a photo GET: /{photo-id}/likes to retrieve the people who like it POST: /{photo-id} to update the photo DELETE : /{photo-id} to delete the photo

URI/Resource based:

• ex: Google Calendar Api: GET: /calendars/{calendarId} to retrieve the info of a calendar PUT: /calendars/{calendarId} to update a calendar DELETE : /calendars/{calendarId} to delete a calendarPOST: /calendars to create a calendar GET: /calendars/{calendarId}/events/{eventId}

Response

HTTP status codes:
• 200: OK
• 3xx: redirection
• 404: not found (4xx: something went wrong with what you tried to access)
• 5xx: server error
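The status classes above map directly onto code; a small helper function as a sketch (hypothetical, not part of any library):

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to its class, as listed above."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "non-standard"

print(status_class(200))  # success
print(status_class(404))  # client error
print(status_class(503))  # server error
```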

API response examples:

• Flickr:
{ "stat": "fail", "code": 1, "message": "User not found" }
{ "galleries": { ... }, "stat": "ok" }

• Google Calendar:
{ "error": { "code": 403, "message": "User Rate Limit Exceeded" } }
{ "kind": "calendar#events", "summary": ..., "description": ...

Response

Content-Type:

• text/plain
• text/html
• text/xml or application/xml
• application/json
• image/png
• ...

Client-side HTTP

HTTP Requests

from requests import *

manga = "http://lelscano.com"
r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")
print(f"Response headers: {r.headers}")


Request status is 200,
Content length is 53111 bytes,
Request encoding is UTF-8,
Text size is 53105 chars.


Response headers: {'Date': 'Wed, 04 Nov 2020 14:40:27 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=da1986d3c036d3d4b0dfdbf3f16812e5f1604500827; expires=Fri, 04-Dec-20 14:40:27 GMT; path=/; domain=.lelscan.net; HttpOnly; SameSite=Lax, mobile_lelscan=0; expires=Thu, 05-Nov-2020 14:40:27 GMT; Max-Age=86400; path=lelscan.net', 'Vary': 'Accept-Encoding', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '06354cc73b000032c20c30b000000001', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report?s=KFPpQxY2A5IilAqwG6j1BXgoJEskCp%2BkW7uCp0z63eYihMbvUnyfBx7abOP6nhy%2B5H1KHR51De457l7y84Ois4b3gD5D1Fi15RrJmklRlavxKwGsFBw3fA%3D%3D"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"report_to":"cf-nel","max_age":604800}', 'Server': 'cloudflare', 'CF-RAY': '5ecf171ec88e32c2-CDG', 'Content-Encoding': 'gzip'}

HTTP Requests

from requests import *

manga = "http://lelscano.com"
r = get(manga)

print(f"Request status is {r.status_code},\n"
      f"Content length is {len(r.content)} bytes,\n"
      f"Request encoding is {r.encoding},\n"
      f"Text size is {len(r.text)} chars.")
print(f"Response headers: {r.headers}")
print(f"{r.text}")

Beginning of r.text:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>One Piece lecture en ligne scan</title>
<meta name="description" content="One Piece Lecture en ligne, tous les scan One Piece." />
<meta name="lelscan" content="One Piece" />
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1" />
<meta http-equiv="Content-Language" content="fr" />
<meta name="keywords" content="One Piece lecture en ligne, lecture en ligne One Piece, scan One Piece, One Piece scan, One Piece lel, lecture en ligne One Piece, Lecture, lecture, scan, chapitre, chapitre One Piece, lecture One Piece, lecture Chapitre One Piece, mangas, manga, One Piece, One Piece fr, One Piece france, scans, image One Piece " />
<meta name="subject" content="One Piece lecture en ligne scan" />
<meta name="identifier-url" content="https://lelscan.net" />
<meta property="og:image" content="/mangas/one-piece/thumb_cover.jpg" />
<meta property="og:title" content="Lecture en ligne One Piece scan" />
<meta property="og:url" content="/lecture-ligne-one-piece.php" />
<meta property="og:description" content="One Piece lecture en ligne - lelscan" />
<link rel="alternate" type="application/rss+xml" title="flux rss" href="/rss/rss.xml" />
<link rel="icon" type="image" href="/images/icones/favicon.ico" />
<style type="text/css" media="screen">

Stream Downloading

from pathlib import *
from requests import *

def stream_download(source_url, dest_file):
    r = get(source_url, stream=True)
    dest_file = Path(dest_file)
    with open(dest_file, "wb") as f:
        for chunk in r.iter_content(chunk_size=8192):
            if chunk:
                f.write(chunk)

img = "http://ftp.crifo.org/debian-cd/current/amd64/iso-dvd/debian-10.6.0-amd64-DVD-1.iso"

stream_download(source_url=img, dest_file="debian1.iso")

Elementary String Parsing

Split

s = "Python is a great language\n but Erlang is pretty cool too"

l = s.split()
print(l)
l2 = s.split('a')
print(l2)
l3 = s.split('\n')
print(l3)
l4 = s.split('an')
print(l4)


['Python', 'is', 'a', 'great', 'language', 'but', 'Erlang', 'is', 'pretty', 'cool', 'too']

['Python is ', ' gre', 't l', 'ngu', 'ge\n but Erl', 'ng is pretty cool too']

['Python is a great language', ' but Erlang is pretty cool too']

['Python is a great l', 'guage\n but Erl', 'g is pretty cool too']

Join

s4 = 'an'.join(l4)
print(s4)
s3 = '\n'.join(l3)
print(s3)
s2 = 'a'.join(l2)
print(s2)
s1 = ' '.join(l)
print(s1)


Python is a great language
 but Erlang is pretty cool too
Python is a great language
 but Erlang is pretty cool too
Python is a great language
 but Erlang is pretty cool too
Python is a great language but Erlang is pretty cool too

Regular Expressions

Regular Expressions

a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters that have special meanings are: . ^ $ * + ? { [ ] \ | ( )
. (a period) -- matches any single character except newline '\n'
\w -- (lowercase w) matches a "word" character: a letter or digit or underscore [a-zA-Z0-9_]. \W matches any non-word character.
\b -- boundary between word and non-word
\s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]. \S (uppercase S) matches any non-whitespace character.
\t, \n, \r -- tab, newline, return
\d -- decimal digit [0-9]
$ -- match the end of the string
\ -- inhibit the "specialness" of a character. For example, use \. to match a period or \\ to match a backslash. If you are unsure whether a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.

Regular Expressions

[ ] -- set of possible characters
| -- or
{n} -- exactly n occurrences
( ) -- create group
+ -- at least one occurrence
* -- zero or more occurrences
? -- zero or one occurrence
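A short demonstration of some of these meta-characters with Python's re module (the sample string is made up for illustration):

```python
import re

s = "Order #42 shipped on 2020-11-04 to user_7"

# \d+ -- one or more decimal digits
print(re.findall(r'\d+', s))           # ['42', '2020', '11', '04', '7']
# \w+ -- "word" characters: letters, digits, underscore
print(re.findall(r'\w+', s)[:2])       # ['Order', '42']
# {n} -- exactly n occurrences: a 4-digit year
print(re.search(r'\d{4}', s).group())  # 2020
```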

Regular Expressions: Extract Email Information

sebastien.tixeuil@lip6.fr

([^@]+)@([^@]+)

[^@] matches a character that is not the at symbol @; + asks for at least one of them; the parentheses capture each part as a group.

import re

m = re.match('([^@]+)@([^@]+)', 'sebastien.tixeuil@lip6.fr')
print(m.group(1))
print(m.group(2))


sebastien.tixeuil

lip6.fr

Extracting Information with Regular Expressions


from requests import *
from re import *

r = get('https://www.lip6.fr/recherche/team_membres.php?acronyme=NPA')
print(findall('(26-00/([0-9]{3}))', r.text))


[('26-00/103', '103'), ('26-00/112', '112'), ('26-00/122', '122'), ('26-00/109', '109'), ('26-00/111', '111'), ('26-00/108', '108'), ('26-00/103', '103'), ('26-00/107', '107'), ('26-00/126', '126'), ('26-00/105', '105'), ('26-00/105', '105'), ('26-00/115', '115'), ('26-00/128', '128'), ('26-00/114', '114'), ('26-00/113', '113'), ('26-00/224', '224'), ('26-00/410', '410'), ('26-00/412', '412'), ('26-00/230', '230'), ('26-00/216', '216'), ('26-00/119', '119'), ('26-00/119', '119'), ('26-00/116', '116'), ('26-00/132', '132'), ('26-00/102', '102'), ('26-00/120', '120'), ('26-00/116', '116'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/104', '104'), ('26-00/102', '102'), ('26-00/102', '102'), ('26-00/132', '132'), ('26-00/102', '102'), ('26-00/104', '104'), ('26-00/420', '420'), ('26-00/120', '120'), ('26-00/120', '120'), ('26-00/132', '132'), ('26-00/119', '119'), ('26-00/119', '119')]

JSON Parsing

JSON

from json import *
from socket import *

print(dumps(['aéçèà',1234,[2,3,4,5,6]]))
print(loads('["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]'))

s = socket(AF_INET,SOCK_STREAM)
try:
    print(dumps(s))
except TypeError:
    print("this data does not seem serializable with JSON")


["a\u00e9\u00e7\u00e8\u00e0", 1234, [2, 3, 4, 5, 6]]
['aéçèà', 1234, [2, 3, 4, 5, 6]]
this data does not seem serializable with JSON

JSON Files

from json import *

data = {}
data['people'] = []
data['people'].append({
    'name': 'Mark',
    'website': 'facebook.com',
})
data['people'].append({
    'name': 'Larry',
    'website': 'google.com',
})
data['people'].append({
    'name': 'Tim',
    'website': 'apple.com',
})

with open('data.txt', 'w') as outfile:
    dump(data, outfile)

{"people": [{"name": "Mark", "website": "facebook.com"}, {"name": "Larry", "website": "google.com"}, {"name": "Tim", "website": "apple.com"}]}

data.txt

with open('data.txt') as infile:
    data = load(infile)
    for p in data['people']:
        print('Name: ' + p['name'])
        print('Website: ' + p['website'])
        print('')


Name: Mark Website: facebook.com

Name: Larry Website: google.com

Name: Tim Website: apple.com

XML Parsing

XML Example

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

XML Example 2

<?xml version="1.0"?>
<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
  </country>
  <country name="Panama">
    <rank>68</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>

countryXML.xml


XML Parsing

With xml.etree.ElementTree

xml.etree.ElementTree loads the whole file; you can then navigate the tree structure.

import xml.etree.ElementTree as ET
tree = ET.parse('countryXML.xml')

XML Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()
print(root.tag)
print(root.attrib)
print(root[0][1].text)

for child in root:
    print(child.tag, child.attrib)

for n in root.iter('neighbor'):
    print(n.attrib)

'data'
{}
'2008'
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}

XML Parsing

import xml.etree.ElementTree as ET

tree = ET.parse('countryXML.xml')
root = tree.getroot()
# Or, from a string: root = ET.fromstring(country_data_as_string)

print("---------------country")
for child in root:
    print(child.tag, child.attrib)

print("---------------Rank:")
for rank in root.iter('rank'):
    print(rank.text)

print("---------------neighbors")
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)

print("---------------neighbors name")
for neighbor in root.iter('neighbor'):
    print(neighbor.get('name'))

print("---------------country and neighbors")
for child in root:
    print("the neighbors of", child.get('name'), ":")
    for neighbor in child.iter('neighbor'):  # iterate the country, not the root
        print(neighbor.get('name'))
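ElementTree can also build XML, not only parse it; a minimal sketch that recreates part of countryXML.xml in memory, with no file needed:

```python
import xml.etree.ElementTree as ET

# Build a small tree in memory
data = ET.Element('data')
country = ET.SubElement(data, 'country', name='Singapore')
ET.SubElement(country, 'rank').text = '4'
ET.SubElement(country, 'neighbor', name='Malaysia', direction='N')

# Serialize the tree back to bytes
xml_bytes = ET.tostring(data)
print(xml_bytes.decode())
```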

CSV Parsing

CSV File

name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date
Argentina,ARG,ARS,2.5,1,,2000-04-01
Australia,AUS,AUD,2.59,1.68,,2000-04-01
Brazil,BRA,BRL,2.95,1.79,,2000-04-01
Britain,GBR,GBP,1.9,0.632911392,,2000-04-01
Canada,CAN,CAD,2.85,1.47,,2000-04-01
Chile,CHL,CLP,1260,514,,2000-04-01
China,CHN,CNY,9.9,8.28,,2000-04-01
Czech Republic,CZE,CZK,54.37,39.1,,2000-04-01
Denmark,DNK,DKK,24.75,8.04,,2000-04-01
Euro area,EUZ,EUR,2.56,1.075268817,,2000-04-01
Hong Kong,HKG,HKD,10.2,7.79,,2000-04-01
Hungary,HUN,HUF,339,279,,2000-04-01
Indonesia,IDN,IDR,14500,7945,,2000-04-01
Israel,ISR,ILS,14.5,4.05,,2000-04-01
Japan,JPN,JPY,294,106,,2000-04-01
Malaysia,MYS,MYR,4.52,3.8,,2000-04-01
Mexico,MEX,MXN,20.9,9.41,,2000-04-01
New Zealand,NZL,NZD,3.4,2.01,,2000-04-01
Poland,POL,PLN,5.5,4.3,,2000-04-01
Russia,RUS,RUB,39.5,28.5,,2000-04-01

CSV Parsing

from csv import *

with open('big-mac-source-data.csv', newline='') as csvfile:
    r = reader(csvfile, delimiter=',', quotechar='|')
    for row in r:
        if row[0] == "France":
            print(str(row[0]) + ',' + str(row[3]) + ',' + str(row[6]))

France,3.5,2011-07-01
France,3.6,2012-01-01
France,3.6,2012-07-01
France,3.6,2013-01-01
France,3.9,2013-07-01
France,3.8,2014-01-01
France,3.9,2014-07-01
France,3.9,2015-01-01
France,4.1,2015-07-01
France,4.1,2016-01-01
France,4.1,2016-07-01
France,4.1,2017-01-01
France,4.1,2017-07-01
France,4.2,2018-01-01
France,4.2,2018-07-01
France,4.2,2019-01-01
France,4.2,2019-07-09
France,4.2,2020-01-14
France,4.2,2020-07-01
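For experimenting without the big-mac-source-data.csv file on disk, the csv module also reads from in-memory strings; a sketch with the same column layout (the dollar_ex value 0.69 and the Japan row are made up for illustration):

```python
import csv
import io

raw = """name,iso_a3,currency_code,local_price,dollar_ex,GDP_dollar,date
France,FRA,EUR,3.5,0.69,,2011-07-01
Japan,JPN,JPY,294,106,,2000-04-01
"""

# DictReader maps each row to the header names, so no index arithmetic
for row in csv.DictReader(io.StringIO(raw)):
    if row["name"] == "France":
        print(row["name"], row["local_price"], row["date"])  # France 3.5 2011-07-01
```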

HTML Parsing

Beautiful Soup

Make a soup (a navigable version of a string). Browse a soup:
soup.find("tag") / soup.tag (returns soup)
soup.find_all("tag") / soup("tag") (returns list)
soup.find("tag", {'attr_name': 'attr_value'})
soup.contents (list of children)

Extract text:
soup.decode_contents(): returns soup as string
soup.encode_contents(): returns soup as bytes
soup.text: returns soup as tagless string
soup['attr_name']: returns attribute value
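Assuming bs4 is installed, these operations can be tried offline on an inline HTML snippet (the snippet and its hal-00000000 URL are made up to mimic the publications page used next):

```python
from bs4 import BeautifulSoup

html = '''<ul>
  <li class="D700"><strong>A. Author</strong>:
      <a href="https://hal.archives-ouvertes.fr/hal-00000000">Some Paper</a></li>
  <li class="other"><a href="https://example.org">Not a publication</a></li>
</ul>'''

soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed

li = soup.find("li", {"class": "D700"})    # first tag matching tag + attribute
print(li.find("a")["href"])                # attribute value of a child tag
print(li.strong.text)                      # tagless text of a child
print(len(soup("li")))                     # soup("tag") is find_all: 2 items
```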

Make a Soup

from requests import *
from bs4 import BeautifulSoup as bs

news = "https://www.lip6.fr/production/publications-type.php?id=-1&annee=2020&type_pub=ART"
r = get(news)
soup = bs(r.text, features="lxml")


Browse Soup and Extract Text

print(soup.find('li', {'class': 'D700'}))

<li class="D700"><strong>L. Amorim Reis, A. Murillo Piedrahita, S. Rueda Rodríguez, N. Castro Fernandes, D. Scherly Varela de Medeiros, M. Dias De Amorim, D. Ferrazani Mattos</strong> : “<a href="https://hal.archives-ouvertes.fr/hal-02569404">Unsupervised and Incremental Learning Orchestration for Cyber-Physical Security</a>”, Transactions on emerging telecommunications technologies, (Wiley-Blackwell) [Amorim Reis 2020]</li>

for p in soup.find_all('li', {'class': 'D700'}):
    print(p.find('a')['href'])

https://hal.archives-ouvertes.fr/hal-02569404 https://hal.archives-ouvertes.fr/hal-02945354 https://hal.archives-ouvertes.fr/hal-02986029 https://hal.archives-ouvertes.fr/hal-02980298 https://hal.archives-ouvertes.fr/hal-02985997 https://hal.archives-ouvertes.fr/hal-02443135 https://hal.archives-ouvertes.fr/hal-02911665 https://hal.archives-ouvertes.fr/hal-02931632 https://hal.archives-ouvertes.fr/hal-02527916 https://hal.archives-ouvertes.fr/hal-02955863 https://hal.archives-ouvertes.fr/hal-02984494 https://hal.archives-ouvertes.fr/hal-02945921 https://hal.archives-ouvertes.fr/hal-02906806 https://hal.archives-ouvertes.fr/hal-02985461 https://hal.archives-ouvertes.fr/hal-02400963 https://hal.archives-ouvertes.fr/hal-02929626 https://hal.archives-ouvertes.fr/hal-01805478 https://hal.archives-ouvertes.fr/hal-02682005 https://hal.archives-ouvertes.fr/hal-02568587

Some Websites have Python Library!

Wikipedia

from wikipedia import *

r = page("Python (programming language)")

print(r.summary)


Python is an interpreted, high-level and general-purpose programming language. Created by Guido van Rossum and first released in 1991, Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.Python was created in the late 1980s as a successor to the ABC language. Python 2.0, released in 2000, introduced features like list comprehensions and a garbage collection system with reference counting. Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3. The Python 2 language was officially discontinued in 2020 (first planned for 2015), and "Python 2.7.18 is the last Python 2.7 release and therefore the last Python 2 release." No more security patches or other improvements will be released for it. With Python 2's end-of-life, only Python 3.6.x and later are supported. Python interpreters are available for many operating systems. A global community of programmers develops and maintains CPython, a free and open-source reference implementation. A non-profit organization, the Python Software Foundation, manages and directs resources for Python and CPython development.

Google Scholar

from scholarly import *

s = next(scholarly.search_author("Sebastien Tixeuil"))

print(s.interests)


['Algorithms & Theory', 'Computer Networks', 'Distributed Computing']