Web servers Guntis Bārzdiņš Artūrs Lavrenovs

Post on 26-Dec-2015


What web servers do

● Implement the HTTP protocol
● Listen for HTTP requests from browsers
● Try to fulfill them with static content from the file system
● Modern web servers also
– Forward dynamic content requests to other systems
– Do lots of useful tasks using modules
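The static-content step can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular server implements it; the function name serve_static and the choice of status codes are assumptions made for this example.

```python
from pathlib import Path


def serve_static(doc_root: Path, url_path: str) -> tuple[int, bytes]:
    """Return (status, body) for a GET request answered from doc_root."""
    root = doc_root.resolve()
    # Strip the leading slash so the path is taken relative to the root.
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that escape the document root (e.g. "/../etc/passwd").
    if target != root and root not in target.parents:
        return 403, b"Forbidden"
    if not target.is_file():
        return 404, b"Not Found"
    return 200, target.read_bytes()
```

A real server would wrap this in HTTP parsing, headers and connection handling; the sketch covers only the mapping from URL path to file.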

What are some of the web servers

C10K problem

● Dan Kegel, 1999
● Web servers should handle ten thousand clients simultaneously
● Operating system kernel limitations
● Operating system provided functionality
● Web server design flaws

C10K problem solution – OS kernel

● The open source nature of Unix kernels made it possible to quickly identify the C10K bottlenecks and fix them

● Networking-related algorithms and data structures in Unix kernels were originally implemented with complexities such as O(n) or O(n^2); these were reduced to O(1) or O(n)

● As a result, the networking capabilities of Unix kernels are virtually limitless in practice (bounded by hardware resources)

C10K - OS functionality

● Implemented new scalable I/O event notification mechanisms (epoll – Linux, kqueue – *BSD)
– Better performance than traditional poll/select
– Can receive all pending events using one system call

● AIO - The POSIX asynchronous I/O (AIO) interface allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background). The application can elect to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all.
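A small sketch of the event notification idea, using Python's standard selectors module, whose DefaultSelector picks epoll or kqueue where the platform provides them. The socket pairs merely stand in for client connections, and the helper name ready_messages is hypothetical.

```python
import selectors
import socket


def ready_messages(n_pairs: int = 3) -> list[bytes]:
    """Register several sockets and collect their data via select() calls."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on *BSD
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    for i, (reader, writer) in enumerate(pairs):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)
        writer.send(f"msg {i}".encode())  # simulate a client sending data

    received: dict[int, bytes] = {}
    while len(received) < n_pairs:
        # One call reports every registered socket that is readable now.
        for key, _mask in sel.select(timeout=1):
            received[key.data] = key.fileobj.recv(1024)

    for reader, writer in pairs:
        sel.unregister(reader)
        reader.close()
        writer.close()
    sel.close()
    return [received[i] for i in range(n_pairs)]
```

The point of the design is that the cost of one select() call depends on the number of ready sockets rather than on the number of registered ones, which is what poll/select could not offer.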

C10K – web server design

● Non-blocking I/O for networking and disk
– Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion
● Many threads
– Use all available CPU cores to achieve maximum concurrency, avoid locking data structures
● Each thread serves many requests
– Don't create a thread per request; reuse threads, and while a non-blocking action completes, process other requests
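The "each thread serves many requests" design might look roughly like this. The sketch uses Python's asyncio event loop as a stand-in for the loop a web server would implement itself; the names handle, one_client and demo, and the line-based echo protocol, are assumptions for illustration only.

```python
import asyncio


async def handle(reader, writer):
    """Serve one connection; every await hands the thread to other clients."""
    line = await reader.readline()
    writer.write(b"echo: " + line)
    await writer.drain()
    writer.close()


async def one_client(port: int, i: int) -> bytes:
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(f"client {i}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo(n_clients: int = 50) -> list[bytes]:
    """Run a server and n_clients concurrent clients in a single thread."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await asyncio.gather(
            *(one_client(port, i) for i in range(n_clients))
        )
```

Calling asyncio.run(demo()) serves all clients from one thread; a multi-core server would run one such loop per core.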

C10M problem – Next decade

● 10 million concurrent connections per server
● Current Unix kernels can't handle that
– Application thread locks in kernel
– Hardware drivers (NIC)
– Memory management
● Solution: a new generation of high-load Unix kernels
– 1 main application per server
– Minimize the number of system calls
– Minimize kernel work

Dynamic content

● Web servers can't create dynamic content themselves

● We need an application written in some programming language

● We need some method for the web server to communicate with the application
– CGI
– Apache modules
– FastCGI, SCGI, ...
– WSGI, PSGI, JSGI, ...

CGI - Common Gateway Interface

● Oldest method of getting dynamic content from web servers

● For each browser request, the web server defines a set of environment variables derived from the request and the server configuration
● The web server starts the application in the prepared environment
● It sends POST data (if any) as standard input
● It waits for standard output from the executed file and returns it to the browser

CGI application

● Can be ANY script or binary file executable in UNIX

● No libraries required
● Use request information from environment variables
● Or ignore it completely if not needed
● Process standard input if needed
● Output additional HTTP headers and then the generated document body on standard output
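As a rough sketch of such an application, here is a Python function that follows the same contract: request data from environment variables, POST data from standard input, headers and body to standard output. The function name cgi_response is hypothetical, and CONTENT_LENGTH is a standard CGI variable that these slides do not list.

```python
def cgi_response(environ, stdin) -> str:
    """Build the text a CGI program would print for one request."""
    method = environ.get("REQUEST_METHOD", "GET")
    posted = ""
    if method == "POST":
        # CONTENT_LENGTH (assumed present for POST) says how much to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        posted = stdin.read(length)
    lines = [
        "Content-Type: text/plain",  # HTTP headers come first
        "",                          # an empty line ends the headers
        f"method: {method}",
        f"query: {environ.get('QUERY_STRING', '')}",
        f"posted: {posted}",
    ]
    return "\n".join(lines) + "\n"


# As a real CGI script one would finish with:
#   import os, sys
#   sys.stdout.write(cgi_response(os.environ, sys.stdin))
```

Passing the environment and input stream as arguments keeps the function testable without a web server.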

CGI environment variables

● REQUEST_METHOD: name of HTTP method

● PATH_INFO: path suffix, if appended to URL after program name and a slash

● PATH_TRANSLATED: corresponding full path as supposed by server, if PATH_INFO is present

● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi

● QUERY_STRING: the part of the URL after the ? character. The query string may be composed of name=value pairs separated with ampersands (such as var1=val1&var2=val2...) when used to submit form data transferred via the GET method, as defined by HTML application/x-www-form-urlencoded

● REMOTE_HOST: host name of the client, unset if server did not perform such lookup

● REMOTE_ADDR: IP address of the client (dot-decimal)

● Variables passed by user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain values of corresponding HTTP headers

● Only a few more
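Decoding QUERY_STRING into name=value pairs can be sketched with Python's standard urllib.parse.parse_qs. The wrapper name form_fields is hypothetical; the environment is passed in as a plain dictionary.

```python
from urllib.parse import parse_qs


def form_fields(environ) -> dict[str, list[str]]:
    """Decode application/x-www-form-urlencoded data from QUERY_STRING."""
    # parse_qs splits on "&", splits each pair on "=", and undoes
    # percent-encoding; each name maps to a list because names may repeat.
    return parse_qs(environ.get("QUERY_STRING", ""))
```

For example, form_fields({"QUERY_STRING": "var1=val1&var2=val2"}) gives one single-element list per variable.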

CGI example

#!/bin/bash
# Headers first, then an empty line, then the document body
echo "Content-type: text/plain"
echo ""
echo "Hello world!"
echo "Today is: $(date)"

CGI issues

● Each request forces the creation of a new process; process creation and destruction carry a big overhead
● All script files must be interpreted on each request, another big overhead
● Not scalable
● Not suitable for modern web servers
● Still widely used in embedded systems (e.g. Wi-Fi router web management console), which serve only occasional requests

FastCGI

● Multiple processes started
● Web server communicates over Unix sockets or TCP
● Each process serves many requests
● Good performance
● Complete separation of web server and dynamic content system
● Great scalability – put FastCGI processes across a server farm

Other communication methods

● Integrate the dynamic content generation system with the web server process (Apache modules)
● CGI derivatives (SCGI)
● *SGI interfaces implement a programming-language-specific method of communication between the web server and the application (WSGI – Python, PSGI – Perl)
● Proxy requests to applications that implement communication via HTTP
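To make the WSGI case concrete, here is a minimal sketch of a WSGI application in Python. The callable signature (environ, start_response) is what WSGI defines; the name app and the response text are made up for this example.

```python
def app(environ, start_response):
    """A WSGI application: the server calls it once per request."""
    # environ carries CGI-style variables such as PATH_INFO.
    body = f"Hello from {environ['PATH_INFO']}\n".encode()
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]  # an iterable of byte strings


# To try it with the reference server from the standard library:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

Unlike CGI, the application stays loaded in a long-running process, so the interpreter start-up cost is paid once rather than per request.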

LAMP

● Linux Apache MySQL PHP
● Most common web server stack
● Simple to install and configure
● Simple to develop web applications
● Acceptable performance and security

Apache

● One of the oldest web servers
● Still actively developed
● Most popular web server today and in recorded web server history
● Highly configurable and extensible using modules
● All in one solution
● Runs on many OS, most often on Unix servers

PHP

● One of the most popular web application programming languages
● Easy to learn (which invites bad coding practices)
● Interpreted language
● Functions from Unix libraries and tools
● Huge amount of ready applications, libraries and modules

MySQL

● Unix distributions are moving towards MariaDB (a MySQL fork) after MySQL's acquisition by Oracle
● Fast relational DB implementation
● Fairly easy to use
● Different storage engines (faster without transactions, slower with, memory based, etc.)
● Query caching
● User quotas

Historical installation

● Acquire source files for all required software (Apache MySQL PHP)

● Acquire all dependencies and install them
● Configure make files via ./configure
● Compile everything
● Configure each piece of software so it works with the others
● Use it

Modern installation

● Use OS package manager
– root@server# apt-get install libapache2-mod-php5 apache2 php5 mysql-server
● Use it

Simple web site example

● Create database user, database, table structure and maybe some data
● Using MySQL command prompt accessed by
– $ mysql -u root -p
– > CREATE DATABASE `example` COLLATE 'utf8_general_ci';
– > CREATE TABLE `posts` (...)
– > CREATE USER 'example'@'localhost' IDENTIFIED BY PASSWORD '…'
– > GRANT ... ON `example`.* TO 'example'@'localhost';
– > INSERT INTO `posts` (`title`, `info`) VALUES ('a', 'a');

Simple web site example II

● Or be lazy and use some web interface like phpMyAdmin or Adminer
– Download single file adminer.php
– Drop it into /var/www
– Navigate your browser to http://localhost/adminer.php
– Do all the tasks in browser without really knowing SQL

Simple web site example III

● Create file example.php in /var/www
● Write your HTML and PHP code inside
– Connect to database
– Select data
– Show data
● Your simple web site is ready
● Navigate your browser to http://localhost/example.php
● Enjoy result
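The connect / select / show steps can be sketched as follows. Note the substitutions: this is Python rather than the PHP of example.php, and it uses an in-memory SQLite database instead of MySQL so that it runs without a server. Only the posts table with its title and info columns comes from the slides; the function name render_posts and the HTML layout are assumptions.

```python
import sqlite3
from html import escape


def render_posts(conn) -> str:
    """Select all posts and show them as an HTML list."""
    rows = conn.execute("SELECT title, info FROM posts").fetchall()
    # Escape values so that stored text cannot inject markup into the page.
    items = "".join(
        f"<li><b>{escape(title)}</b>: {escape(info)}</li>"
        for title, info in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"


# Connect (SQLite stand-in for the MySQL database "example").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, info TEXT)")
conn.execute("INSERT INTO posts (title, info) VALUES ('a', 'a')")
page = render_posts(conn)
```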

Simple web site example - Source

Simple web site example - Result

● From http://localhost/example.php

nginx

● Contestant for 2nd place in web server rating
● Event-driven
● High-performance (thousands req/s)
● Small memory footprint per request
● Efficient CPU usage
● Advanced configuration and functionality via modules
● Often used as FrontEnd to big websites
● CloudFlare built on top of it

High-load web systems

● A big dynamic web site can't reside on a single server
● Some strategy is needed for splitting the load across multiple web servers
● One possible strategy
– One entry point “FrontEnd” which receives all requests and can handle the load (e.g., Varnish, nginx)
– Backends process requests from FrontEnd (nginx, Apache)

What is Varnish?

● Proxy server
– Reverse
– Caching
– Programmable
● Load balancer
● Dynamic content generator
● Tools – logging, debugging, monitoring

Why Varnish?

● Fantastic performance even on low-end servers – 1,000 to 10,000 requests per second per server is the norm
● C + GOOD C programmers
● Takes advantage of the Unix architecture
● After tuning, tens of thousands of requests per second; 100k/s exceeded in testing
● Free open source software
● Request-oriented domain-specific configuration/programming language VCL
● Almost everything needed for a high-load web site, in one package

Caching

● Generating any dynamic web page is very slow – depending on the environment, hundreds or thousands of times slower than returning static content
● A low-end server can generate a couple of hundred such dynamic pages per second
● Any development framework makes dynamic page generation tens or hundreds of times slower still (especially Java EE, Zend Framework)
● That leaves only a few tens of requests per second
● Rough math: 100 x 100 = 10,000 times slower than a static page

Caching II

● Idea – ideally, dynamic content would be returned with performance similar to static pages
● We can store those pages that are the same for all users and do not change significantly within a given period of time
● Using a hard disk is slow; good practice is to use only RAM or a server SSD to store all cached content
● A caching strategy has to be designed for each specific case, and it can be very subjective
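The caching idea, keeping a generated page in RAM for a fixed time and regenerating it only after that time has passed, can be sketched as a small time-to-live cache. This illustrates the principle only and is not how Varnish is implemented; the class name TTLCache and its interface are assumptions.

```python
import time


class TTLCache:
    """Keep generated pages in memory for ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self.store = {}     # url -> (expires_at, page)

    def get(self, url: str, generate):
        """Return the cached page, calling generate(url) only on a miss."""
        now = self.clock()
        hit = self.store.get(url)
        if hit and hit[0] > now:
            return hit[1]       # fresh copy: no dynamic generation
        page = generate(url)    # slow path: build the page
        self.store[url] = (now + self.ttl, page)
        return page
```

With a 60 second lifetime, a page requested 1,000 times a second is generated once a minute instead of 60,000 times.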

Varnish caching

● By request address (full, or a regular expression) one can define which requests to cache, how long to cache a particular element, or not to cache it – the standard caching approach practically everywhere
● Users – Facebook, Twitter, WikiLeaks, ThePirateBay
● Developed in Norway
● Advertised as able to speed up page delivery from 300 to thousands of times, i.e., to only about 10 times slower than static content
● Fast compared with other caching approaches
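Choosing a cache lifetime by request address might be sketched like this. The URL patterns and lifetimes in RULES are invented for illustration, and the function name ttl_for is hypothetical; in Varnish the equivalent decisions are written in VCL.

```python
import re

# (pattern, seconds to cache); 0 means "do not cache". First match wins.
# All patterns and numbers here are illustrative assumptions.
RULES = [
    (re.compile(r"^/admin/"), 0),              # never cache the admin area
    (re.compile(r"\.(js|css|png)$"), 3600),    # static assets: one hour
    (re.compile(r"^/news/"), 60),              # news pages: one minute
]


def ttl_for(url: str, default: int = 0) -> int:
    """Return how long a response for url may be cached, in seconds."""
    for pattern, ttl in RULES:
        if pattern.search(url):
            return ttl
    return default
```

Rule order matters: /admin/site.js matches the first rule and is not cached, although it also looks like a static asset.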

DSL VCL

● Simple syntax (similar to C) that is translated to C and then compiled to machine code
● =, ==, !=, ~, !~, !, &&, ||, +, "string"
● if () {} else {}, set, unset, return
● 9 subroutines, which are the different stages of processing each request and where its handling can be influenced
● Only predefined objects - client, server, req, bereq, beresp, obj, resp

sub vcl_recv {
    # Look up GET requests for .js files in the cache
    if (req.request == "GET" && req.url ~ "\.js$") {
        return (lookup);
    }
}

VCL processing architecture

Integration

● A fixed caching time may not be optimal
● Content may change more often than the configured time – users get stale information
● Or less often – servers do unnecessary work
● Solution – the server has to be told that the content needs to be refreshed

acl purge { "192.168.0.0"/24; }

sub vcl_recv {
    if (req.request == "PURGE") {
        # Only clients listed in the purge ACL may invalidate content
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}
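The same purge logic, written out in Python to make the control flow explicit. The 192.168.0.0/24 network and the 405 and 200 answers mirror the VCL; the 404 answer for content that is not cached, the dictionary used as a cache, and the function name handle_purge are assumptions.

```python
import ipaddress

PURGE_ACL = ipaddress.ip_network("192.168.0.0/24")


def handle_purge(cache: dict, client_ip: str, url: str) -> tuple[int, str]:
    """Remove url from cache if the client is allowed to purge."""
    if ipaddress.ip_address(client_ip) not in PURGE_ACL:
        return 405, "Not allowed."
    if url in cache:
        del cache[url]
        return 200, "Purged."
    # Assumed behavior: the VCL in the slides only covers a cache hit.
    return 404, "Not in cache."
```

The application that edits content would send such a PURGE request right after saving, so the next visitor triggers a fresh page.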

Dynamic content generation with ESI

● Web pages often consist of blocks that change at different rates
● Or there is a small block of information specific to each user (for example, “Hello, Jānis Bērziņš | You have [0] new messages”)
● We can load it after the page has loaded, using JSON, or generate the content on Varnish

<TABLE><TR><esi:include src="sveiks.html"/></TR>
<TR><TD><esi:include src="index.html"/></TD>
<TD><esi:include src="article.html"/></TD></TR>
</TABLE>

● Varnish parses the <esi> tags and puts the elements together; all elements are configured and cached independently
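What the ESI assembly step does can be approximated in a few lines. This sketch handles only self-closing esi:include tags with a double-quoted src attribute; the names ESI_TAG and assemble are hypothetical, and real ESI processing in Varnish covers more cases.

```python
import re

# Matches <esi:include src="..."/> and captures the src value.
ESI_TAG = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page: str, fetch) -> str:
    """Replace every esi:include tag with the fragment fetch(src) returns."""
    # fetch would normally consult the cache first and a backend on a miss,
    # which is why each fragment can have its own cache lifetime.
    return ESI_TAG.sub(lambda match: fetch(match.group(1)), page)
```

For instance, with fragments stored in a dictionary, assemble(template, fragments.__getitem__) yields the complete page.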

Load balancing

● One address can be served by several backends
● Different URLs can be served by different backends
● Monitoring
● Disconnecting dead servers (restart, upgrade, repair)
● Reconnecting revived servers (and new ones)
● In practice this means a pile of CHEAP desktop-grade hardware can be used for dynamic content generation
● Adding one more frontend gives high but cheap fault tolerance
● If we use NoSQL or otherwise obtain a replicated database, expensive servers are not needed at all
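A minimal sketch of round-robin load balancing with dead backends skipped and revived or new ones added back. Round robin is just one possible policy, chosen here for brevity; the class name RoundRobin and its methods are assumptions, and the health checks that would call mark_down and mark_up are left out.

```python
class RoundRobin:
    """Hand out backends in turn, skipping those marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend not in self.backends:
            self.backends.append(backend)  # a newly added server
        self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```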

Varnish usage in Latvia

$ curl -I www.tvnet.lv

● HTTP/1.1 200 OK

● Server: Apache

● Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT

● Expires: Wed, 07 Nov 2012 20:10:08 GMT

● Cache-Control: max-age=60

● Vary: Accept-Encoding

● Content-Type: text/html; charset=UTF-8

● Content-Length: 185924

● Date: Wed, 07 Nov 2012 20:10:15 GMT

● X-Varnish: 2025605055 2025545136

● Age: 67

● Via: 1.1 varnish

● Connection: keep-alive

● $ curl -I www.delfi.lv

● HTTP/1.1 200 OK

● X-Fe-Node: nuffy

● Content-type: text/html; charset=utf-8

● Server: lighttpd/1.4.31 (PLD Linux)

● Content-Length: 159097

● Date: Wed, 07 Nov 2012 20:20:58 GMT

● X-Varnish: 734492112 734450241

● Age: 58

● Via: 1.1 varnish

● Connection: keep-alive

Non-standard uses - WAF

● Programmability makes non-standard uses possible, for example a WAF (web application firewall)
● Define the addresses and methods of accepted requests as precisely as possible
– req.url ~ "^/topic/([0-9])$" rather than "^/topic"
– req.request == "GET"
● At the end use return(error);
● Restrict access to the backend servers (or disconnect them from the internet)
● Attackers now attack the frontend, so we protect it
● Does not help against logic (and many other) vulnerabilities
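The allowlist approach, accepting only precisely described requests and rejecting everything else, could be sketched as below. The single rule reuses the method and pattern from the slide; the names ALLOWED and allow are hypothetical.

```python
import re

# (method, URL pattern) pairs that are let through; anything else is denied.
ALLOWED = [
    ("GET", re.compile(r"^/topic/([0-9])$")),
]


def allow(method: str, url: str) -> bool:
    """Return True only if the request matches an explicit rule."""
    return any(
        method == allowed_method and pattern.search(url) is not None
        for allowed_method, pattern in ALLOWED
    )
```

The anchors matter: with the looser "^/topic" a request such as /topic/7/../admin would pass, while the precise pattern rejects it.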

New trend

● The web application is the central thing
● Develop the application in some framework
● No separate web server; it is now just a part of the application (a library from the framework in use)
● Extremely customizable
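A sketch of this arrangement using only Python's standard library, where the HTTP server is a module imported by the application rather than a separate program. Python's http.server is meant for development rather than production and stands in here for whatever server library a framework ships; the names App and demo are assumptions.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class App(BaseHTTPRequestHandler):
    """The application itself answers HTTP requests."""

    def do_GET(self):
        body = f"You asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo() -> bytes:
    """Start the embedded server, make one request to it, shut it down."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), App)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/hello")
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()
        server.server_close()
```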

Current situation

● The standard web development solution is an HTTP server plus some classic dynamic content generation system (PHP, ASP, Python, etc.); it has problems:
– Long-running requests and persistent connections
– Number of clients that can be served simultaneously
– Compatibility with other technologies
– Possibilities for future development

Event-driven programming frameworks

● Neither the idea nor its implementations are new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)
● Not widespread in web solutions
● Solve the problems of the standard technologies
● Reactor pattern, C10K problem
● Let web programmers build network solutions

Node.js

● A set of libraries for building network solutions in the JavaScript programming language; runs on the V8 engine
● Performance evaluation of JavaScript engines
● New related technologies – Socket.IO, CoffeeScript
● Problematic aspects - package management, application hosting