Web Technologies

Typical Web Usage
1. User interacts with graphical browser
2. Browser submits HTTP requests to server
2.1. Request relayed by proxy
3. Server returns HTTP reply
3.1. Proxy caches response
4. Browser displays HTML results
Components of Web Technology
We are primarily interested in the parts that have implications for (reliable) distributed computing:
(HTML)
XML
(URLs)
HTTP
Proxy Servers
Core Web Technologies
HTML (HyperText Markup Language)
Defines a standard set of special textual indicators (markup) specifying how a Web page's words and images should be displayed by the Web browser
Technologies for Supporting Remote Clients
The original intent of the core Web technologies was to enable linking and sharing of documents
It was quickly realized that, by wrapping local information systems so that they expose their presentation layer as HTML documents, one could leverage the core Web technologies to support clients distributed across the Internet.
HTML
 HyperText Markup Language
 Text format for publishing hypertexts on the World Wide Web
 Based on Standard Generalized Markup Language (SGML; ISO 8879) (as
is XML)
 Created in 1991, HTML 2.0 in 1995 (60 pages), HTML 4.0 in 1997 and HTML 4.01 (> 350 pages) in 1999, now work on XHTML
 Representation rather than presentation – sort of...
 HTML is not XML
 E.g., <br>: start tag required, end tag forbidden
 XHTML: HTML in XML
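To see that HTML is not XML in practice, the following minimal Python sketch (standard library only; the TagCollector class and the parses_as_xml helper are illustrative names, not part of any API) feeds the same fragment to a lenient HTML parser and to an XML parser. The HTML form with a bare <br> is rejected by the XML parser, while the XHTML-style empty-element tag <br/> is accepted.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the start tags seen by Python's lenient HTML parser."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(text: str) -> bool:
    """True if text is a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


html_style = "<p>line one<br>line two</p>"    # HTML: <br> has no end tag
xhtml_style = "<p>line one<br/>line two</p>"  # XHTML: empty-element tag

collector = TagCollector()
collector.feed(html_style)
collector.close()
print(collector.tags)              # ['p', 'br']  (HTML parser accepts <br>)
print(parses_as_xml(html_style))   # False  (</p> does not match open <br>)
print(parses_as_xml(xhtml_style))  # True
```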
XML
 Extensible Markup Language
 Extensible
 XML is a framework for defining languages tailored to application domains
 Markup
 XML documents are made up of entities
 Entity data contains intermingled character data and markup
 No fixed set of markup tags
 An example...
 Reference
 http://www.w3.org/TR/2004/REC-xml-20040204/
<?xml version="1.0" encoding="UTF-8"?>
<patient id="301174-...">
  <name>
    Klaus Marius Hansen
  </name>
  <status>
    Admitted
  </status>
  <medicine>
    <item>
      <dose>100</dose>
      <kind>Aspirin</kind>
    </item>
    <item>
      <dose>50</dose>
      <kind>Ibuprofen</kind>
    </item>
  </medicine>
</patient>
Parts labeled in the example: XML declaration (<?xml ... ?>), attribute (id="301174-..."), element name (e.g., name), element end (markup) tag (e.g., </name>), character data (e.g., Klaus Marius Hansen)
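An XML parser turns the markup of the patient example into a tree of elements, attributes, and character data. A minimal sketch using Python's standard xml.etree.ElementTree module; the whitespace inside elements is compacted here, and medication_list is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The patient example (the elided id "301174-..." is kept as-is).
PATIENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<patient id="301174-...">
  <name>Klaus Marius Hansen</name>
  <status>Admitted</status>
  <medicine>
    <item><dose>100</dose><kind>Aspirin</kind></item>
    <item><dose>50</dose><kind>Ibuprofen</kind></item>
  </medicine>
</patient>"""


def medication_list(xml_bytes: bytes) -> list:
    """Return (kind, dose) pairs for every <item> element in the document."""
    root = ET.fromstring(xml_bytes)
    return [(item.findtext("kind").strip(), int(item.findtext("dose")))
            for item in root.iter("item")]


root = ET.fromstring(PATIENT_XML.encode("utf-8"))
print(root.tag, root.get("id"))          # element name and attribute value
print(root.findtext("name").strip())     # character data
print(medication_list(PATIENT_XML.encode("utf-8")))
```

Passing bytes rather than a str lets the parser honor the encoding named in the XML declaration.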
XML Well-Formedness and Validity
 Which patient documents are regarded as describing patients?
 The valid ones
 Have a reference to a document describing legal documents
 E.g., using XML Schemas
 Fulfil the requirements in these
 Are well-formed
 Well-formed patients...
 Match the "document" production of the XML spec
 Including that start and end tags match and that element tags are properly nested
 + other well-formedness constraints in the spec
<?xml version="1.0" encoding="UTF-8"?>
<p:patient id="301174-..."
    xmlns:p="http://ehr.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ehr.org patient.xsd">
  <name>
    Klaus Marius Hansen
  </name>
  <medicine>
    <item>
      <dose>100</dose>
      <kind>Aspirin</kind>
    </item>
    <item>
      <dose>50</dose>
      <kind>Ibuprofen</kind>
    </item>
  </medicine>
  <status>
    Admitted
  </status>
</p:patient>
Labeled in the example: the namespace (xmlns:p="http://ehr.org") and the location of the schema (xsi:schemaLocation)
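Well-formedness can be checked without any schema: a conforming XML parser must refuse documents that break it. The sketch below, assuming Python's standard ElementTree parser and a hypothetical helper well_formedness_error, tries a few fragments. ElementTree parses with namespace processing enabled, so an undeclared prefix such as p: is also reported.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text: str):
    """Return None if text is well-formed XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return None
    except ET.ParseError as err:
        return str(err)


samples = {
    "properly nested": "<item><dose>50</dose><kind>Ibuprofen</kind></item>",
    "improperly nested": "<item><dose>50<kind></dose>Ibuprofen</kind></item>",
    "unmatched end tag": "<item><dose>50</dos></item>",
    "two root elements": "<name>A</name><name>B</name>",
    "undeclared prefix": "<p:patient/>",  # namespace constraint: p is never bound
}

for label, text in samples.items():
    error = well_formedness_error(text)
    print(f"{label}: {'well-formed' if error is None else error}")
```

Validity is a separate, stronger check against the referenced schema (patient.xsd); the standard library has no XML Schema validator, so that step is not shown here.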
Patient XML Schema Example
[Figure: diagram of the schema (Altova XMLSpy syntax)]
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:p="http://ehr.org"
    targetNamespace="http://ehr.org">
  <xs:element name="patient" type="p:patient_type"/>
  <xs:complexType name="patient_type">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="medicine">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="dose" type="xs:int"/>
                  <xs:element name="kind" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="status"/>
    </xs:sequence>
    <xs:attribute name="id" use="required"/>
  </xs:complexType>
</xs:schema>
XML Schema Constructs
 Constructs
 A complex type definition
 attribute declarations describe which attributes may or must appear
 element references describe which sub-elements may or must appear, how many, and in which order
 A simple type definition
 defines a set of strings to be used as attribute values or
character data
 A global element declaration
 associates element names with types
 (in the patient example, the complex types of medicine and item were inlined in their element declarations, whereas the patient element refers to the named type p:patient_type)
 Validity
 An element is valid according to a given schema if the rules of its associated element type are satisfied
 A document is valid if all its elements are valid
Complex Types
 Attribute declarations
 E.g., <xs:attribute name="id" type="xs:string" use="required"/>
 Content of one of the following content model kinds
 Empty content
 Simple content
 <simpleContent>...</simpleContent>
 Only character data
 Regexp content
 <sequence> ... </sequence>
 <choice> ... </choice>
 <all> ... </all>
e.g., with <xs:element name="item" minOccurs="0" maxOccurs="unbounded"/>
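The "regexp content" idea can be made concrete: a sequence content model is a regular expression over the names of an element's children, and an element is valid when its own rules hold and all its children are valid. The following is a hand-written sketch of the patient schema above, not a real XML Schema validator; the dictionaries and function names are hypothetical, namespaces follow the schema (global patient in http://ehr.org, local elements unqualified), and the xs:int check covers only the lexical form, not the value range.

```python
import re
import xml.etree.ElementTree as ET

PATIENT = "{http://ehr.org}patient"  # global element in targetNamespace http://ehr.org

# Hand-written encoding of patient.xsd: each <xs:sequence> becomes a regular
# expression over the child element names (each name followed by a space).
CONTENT_MODELS = {
    PATIENT: re.compile(r"name medicine status "),
    "medicine": re.compile(r"(item )*"),  # minOccurs="0" maxOccurs="unbounded"
    "item": re.compile(r"dose kind "),
}
SIMPLE_CONTENT = {"name", "dose", "kind"}       # xs:string / xs:int: no child elements
REQUIRED_ATTRIBUTES = {PATIENT: {"id"}}         # use="required"
INT_LEXICAL = re.compile(r"\s*[+-]?[0-9]+\s*")  # xs:int lexical form only


def element_is_valid(elem: ET.Element) -> bool:
    """Check this element's type rules, then (recursively) those of its children."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        child_names = "".join(child.tag + " " for child in elem)
        if model.fullmatch(child_names) is None:
            return False
    if elem.tag in SIMPLE_CONTENT and len(elem) > 0:
        return False
    if not REQUIRED_ATTRIBUTES.get(elem.tag, set()) <= set(elem.attrib):
        return False
    if elem.tag == "dose" and INT_LEXICAL.fullmatch(elem.text or "") is None:
        return False
    return all(element_is_valid(child) for child in elem)


def document_is_valid(xml_text: str) -> bool:
    """Valid if the root is the declared global element and every element is valid."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return root.tag == PATIENT and element_is_valid(root)


valid_doc = ('<p:patient xmlns:p="http://ehr.org" id="301174-...">'
             '<name>Klaus Marius Hansen</name>'
             '<medicine><item><dose>100</dose><kind>Aspirin</kind></item></medicine>'
             '<status>Admitted</status></p:patient>')
# Same content, but <status> before <medicine>: violates the sequence order.
wrong_order = ('<p:patient xmlns:p="http://ehr.org" id="301174-...">'
               '<name>Klaus Marius Hansen</name><status>Admitted</status>'
               '<medicine><item><dose>100</dose><kind>Aspirin</kind></item></medicine>'
               '</p:patient>')
print(document_is_valid(valid_doc))    # True
print(document_is_valid(wrong_order))  # False
```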
Namespaces
XML languages are typically assigned to
namespaces
 <xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:p="http://ehr.org"
targetNamespace="http://ehr.org">
 XML Schema
Uses namespaces itself to distinguish XML Schema
constructs from the language being defined
Allows a namespace (targetNamespace) to be assigned to the language being defined
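What identifies an element is the namespace URI plus its local name, not the prefix. A small sketch with Python's ElementTree, which reports expanded names in {uri}local form; the documents and the expanded_names helper are illustrative. The prefixes p and ehr give the same root name, while a default namespace declaration also puts the unprefixed child into the namespace (which the patient schema above, with unqualified local elements, would not accept).

```python
import xml.etree.ElementTree as ET

doc_p = '<p:patient xmlns:p="http://ehr.org" id="1"><name>A</name></p:patient>'
doc_ehr = '<ehr:patient xmlns:ehr="http://ehr.org" id="1"><name>A</name></ehr:patient>'
# A default namespace (no prefix) applies to unprefixed descendants as well.
doc_default = '<patient xmlns="http://ehr.org" id="1"><name>A</name></patient>'


def expanded_names(doc: str):
    """Return the expanded names of the root element and of its children."""
    root = ET.fromstring(doc)
    return root.tag, [child.tag for child in root]


for doc in (doc_p, doc_ehr, doc_default):
    print(expanded_names(doc))
# ('{http://ehr.org}patient', ['name'])
# ('{http://ehr.org}patient', ['name'])
# ('{http://ehr.org}patient', ['{http://ehr.org}name'])
```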
Other XML Technologies
 Namespaces
 Linking: XLink
 Addressing parts of documents: XPath
 Transformation: XSL
 Querying: XQuery
 RPC: WSDL and SOAP
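As a taste of XPath-style addressing, Python's ElementTree implements a limited subset of XPath location paths (child steps, the descendant step //, and simple predicates). A minimal sketch on a compacted patient document; full XPath, XSL, and XQuery need other tools.

```python
import xml.etree.ElementTree as ET

PATIENT_XML = """<patient id="301174-...">
  <name>Klaus Marius Hansen</name>
  <medicine>
    <item><dose>100</dose><kind>Aspirin</kind></item>
    <item><dose>50</dose><kind>Ibuprofen</kind></item>
  </medicine>
</patient>"""

root = ET.fromstring(PATIENT_XML)

kinds = [k.text for k in root.findall("./medicine/item/kind")]    # child steps
all_doses = [int(d.text) for d in root.findall(".//dose")]        # any descendant
aspirin_item = root.find("./medicine/item[kind='Aspirin']")       # child-text predicate
second_item = root.find("./medicine/item[2]")                     # position, 1-based

print(kinds)                           # ['Aspirin', 'Ibuprofen']
print(all_doses)                       # [100, 50]
print(aspirin_item.findtext("dose"))   # 100
print(second_item.findtext("kind"))    # Ibuprofen
```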
XML is Bloated
 An XML encoding is large
 What to do (in particular for RPC using XML)?
 The W3C has not decided what to do...
 Compress/decompress XML?
 Compression is expensive (often more than decompression)
 Introduce gateways between XML and other formats?
 E.g., using XML only for interoperable messaging
 May introduce a single point of failure
 Another approach
 Sacrifice self-description
 Create a mapping to a more efficient format
 More efficient to serialize and deserialize, and more compact on the wire
 Allow applications to choose between formats
 Such a format could be described by ASN.1
 Formal language for describing messages
exchanged in a distributed system
 Used heavily in telecommunications
 More next time...
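The size trade-off can be illustrated with a rough sketch. This is not ASN.1: Python's struct module stands in for a compact, non-self-describing binary format, and the record layout (2-byte item count; per item a 4-byte dose, a 1-byte name length, and the UTF-8 name) is a hypothetical choice for illustration. zlib shows the compression option.

```python
import struct
import zlib
from xml.sax.saxutils import escape


def to_xml(items) -> bytes:
    """Self-describing encoding: every value is wrapped in named tags."""
    body = "".join(f"<item><dose>{dose}</dose><kind>{escape(kind)}</kind></item>"
                   for dose, kind in items)
    return f"<medicine>{body}</medicine>".encode("utf-8")


def pack_items(items) -> bytes:
    """Compact encoding; the reader must already know this layout."""
    out = bytearray(struct.pack("!H", len(items)))          # item count
    for dose, kind in items:
        name = kind.encode("utf-8")                         # names up to 255 bytes
        out += struct.pack("!iB", dose, len(name)) + name   # dose, length, name
    return bytes(out)


def unpack_items(data: bytes):
    """Inverse of pack_items."""
    (count,) = struct.unpack_from("!H", data, 0)
    offset = 2
    items = []
    for _ in range(count):
        dose, name_len = struct.unpack_from("!iB", data, offset)
        offset += struct.calcsize("!iB")
        items.append((dose, data[offset:offset + name_len].decode("utf-8")))
        offset += name_len
    return items


items = [(100, "Aspirin"), (50, "Ibuprofen")]
xml_bytes = to_xml(items)
binary = pack_items(items)
print("XML:", len(xml_bytes), "bytes; zlib-compressed XML:",
      len(zlib.compress(xml_bytes)), "bytes; packed:", len(binary), "bytes")
```

For these two items the XML encoding is 120 bytes and the packed form 28 bytes. The packed bytes are meaningless without the layout, which is exactly the self-description being sacrificed.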
HTTP
 HyperText Transfer Protocol
 RPC-style interface to web servers
 Messages represented as human-readable ASCII strings
 May contain encoded information (e.g., Quoted-Printable or
Base64)
 MIME types in Content-Type: and Accept: headers
 Typically runs on TCP/IP, port 80 as default
 But may use other reliable transports
 Behavior
 HTTP/1.0 (RFC 1945) behavior
 Open socket, request, response, close socket
 HTTP/1.1 (RFC 2616) behavior
 Persistent connections
 Good for user
 Good for network
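The difference between HTTP/1.0-style and persistent connections can be observed locally. A minimal sketch using Python's http.server and http.client on 127.0.0.1 (the handler and function names are hypothetical): the server opts into HTTP/1.1 keep-alive, and it records the client-side port of each request, so two requests arriving on one reused TCP connection show the same port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # http.server defaults to HTTP/1.0 (close after reply)
    timeout = 5                    # don't wait forever on an idle connection
    client_ports = []              # client-side port of each request's connection

    def do_GET(self):
        KeepAliveHandler.client_ports.append(self.client_address[1])
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length lets the client find the end of the body without
        # the server closing the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def two_requests_one_connection():
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        bodies = []
        for path in ("/a", "/b"):
            conn.request("GET", path)
            bodies.append(conn.getresponse().read())  # read fully before reuse
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return bodies, KeepAliveHandler.client_ports


bodies, ports = two_requests_one_connection()
print(bodies, ports)  # both requests arrived from the same client port
```

Setting protocol_version to "HTTP/1.0" instead makes the server close the socket after each reply, and http.client then opens a new connection (with a new client port) for the second request.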
Core Web Technologies
HTTP (HyperText Transfer Protocol)
generic, stateless protocol
governs the transfer of files across a network
developed at CERN (European Organization for Nuclear Research), where the name WWW was also coined; the W3C was founded later
can also provide access, via proxies and gateways, to systems using SMTP, FTP, and other protocols
was designed to support hypertext
Core Web Technologies
Exchanged information can be static or dynamic
Every resource accessible over the Web has a URL (Uniform Resource Locator)
The HTTP mechanism is based on the client/server model, typically using TCP/IP sockets
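A URL bundles everything a client needs to reach a resource: scheme, host, port, path, and optional query and fragment. A short sketch with Python's urllib.parse; the URL and its parameters are hypothetical. The second part shows how GET request parameters are typically encoded into the query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical URL, for illustration only.
url = "http://www.example.org:8080/shop/items?category=books&sort=price#reviews"
parts = urlsplit(url)
print(parts.scheme)           # http
print(parts.hostname)         # www.example.org
print(parts.port)             # 8080
print(parts.path)             # /shop/items
print(parse_qs(parts.query))  # {'category': ['books'], 'sort': ['price']}
print(parts.fragment)         # reviews (never sent to the server)

# Encoding GET parameters into the query part of a URL.
search_url = urlunsplit(("http", "www.example.org", "/search",
                         urlencode({"q": "web services", "page": 2}), ""))
print(search_url)             # http://www.example.org/search?q=web+services&page=2
```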
Core Web Technologies
 Since version 1.1, HTTP makes persistent connections the default, to minimize the overhead associated with opening and closing connections.
 Typical methods on the server side are:
• OPTIONS
 send information about the communication options
• GET
 retrieve a document, or a document produced by a program
• POST
 Append or attach information
• PUT
 Store information
• DELETE
 Delete the resource indicated in the request
Core Web Technologies
Another limitation: HTTP is stateless
• It does not store information between requests
• There is no indication of any relationship between two different requests
 Cookies, small data structures that a Web server asks the HTTP client to store on the local machine, are used to maintain state information
e.g., cookies can store the recently viewed items in a Web shop
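The Web shop example can be sketched with Python's http.cookies module. The function view_item, the cookie name recently_viewed, the limit of three items, and the dash-separated encoding are all hypothetical choices. The server keeps no state: the list lives in the cookie, which the browser sends back with every later request.

```python
from http.cookies import SimpleCookie

MAX_RECENT = 3  # hypothetical: how many recently viewed items to remember


def view_item(cookie_header: str, item_id: str) -> str:
    """Server side of one request: return the Set-Cookie line for the response.

    cookie_header is the value of the request's Cookie: header ("" if the
    browser sent none). Item ids are assumed not to contain "-".
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    recent = jar["recently_viewed"].value.split("-") if "recently_viewed" in jar else []
    if item_id in recent:
        recent.remove(item_id)
    recent.insert(0, item_id)  # most recently viewed first
    jar["recently_viewed"] = "-".join(recent[:MAX_RECENT])
    jar["recently_viewed"]["path"] = "/"
    return jar["recently_viewed"].output()  # "Set-Cookie: recently_viewed=...; Path=/"


def cookie_sent_back(set_cookie_line: str) -> str:
    """Mimic the browser: return only the name=value pair on the next request."""
    return set_cookie_line.split(":", 1)[1].split(";")[0].strip()


h1 = view_item("", "42")                     # first visit: no Cookie header
h2 = view_item(cookie_sent_back(h1), "17")
h3 = view_item(cookie_sent_back(h2), "42")   # 42 moves back to the front
print(h1)
print(h3)
```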
HTTP Messages (1)
 HTTP-message    = Request | Response            ; HTTP/1.1 messages
 generic-message = start-line
                    *(message-header CRLF)
                    CRLF
                    [ message-body ]
 start-line      = Request-Line | Status-Line
 message-header  = field-name ":" [ field-value ] ; the Host: header is mandatory in HTTP/1.1 requests
 Request-Line    = Method SP Request-URI SP HTTP-Version CRLF
 Method          = "OPTIONS"                     ; Section 9.2
                    | "GET"                      ; Section 9.3
                    | "HEAD"                     ; Section 9.4
                    | "POST"                     ; Section 9.5
                    | "PUT"                      ; Section 9.6
                    | "DELETE"                   ; Section 9.7
                    | "TRACE"                    ; Section 9.8
                    | "CONNECT"                  ; Section 9.9
                    | extension-method
(Section numbers refer to RFC 2616.)
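The request grammar above maps directly onto bytes on the wire. A minimal sketch (hypothetical helper names and host; no networking) that serializes a request as Request-Line, header lines each ending in CRLF, an empty line, and an optional body, and splits it apart again. It is deliberately lenient, for example it accepts any method token, as the extension-method rule allows.

```python
CRLF = "\r\n"


def build_request(method, request_uri, headers, body=b""):
    """Request-Line, *(message-header CRLF), CRLF, [ message-body ]."""
    lines = [f"{method} {request_uri} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body  # caller supplies Content-Length for a body


def parse_request(raw: bytes):
    """Split a request into (method, uri, version, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    request_line, *header_lines = head.decode("ascii").split(CRLF)
    method, uri, version = request_line.split(" ")  # Method SP Request-URI SP HTTP-Version
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    if version == "HTTP/1.1" and "host" not in headers:
        raise ValueError("HTTP/1.1 requests must carry a Host header")
    return method, uri, version, headers, body


raw = build_request("GET", "/index.html", {"Host": "www.example.org", "Accept": "text/html"})
print(raw)  # b'GET /index.html HTTP/1.1\r\nHost: www.example.org\r\nAccept: text/html\r\n\r\n'
print(parse_request(raw)[:3])  # ('GET', '/index.html', 'HTTP/1.1')
```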

An Example GET Interaction
HTTP Messages (2)
 HTTP-message = Request | Response            ; HTTP/1.1 messages
 Response     = Status-Line                   ; Section 6.1
                *(( general-header            ; Section 4.5
                  | response-header           ; Section 6.2
                  | entity-header ) CRLF)     ; Section 7.1
                CRLF
                [ message-body ]              ; Section 7.2
 Status-Line  = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
 Status-Code  =
      ; Informational
        "100"  ; Section 10.1.1: Continue
      | "101"  ; Section 10.1.2: Switching Protocols
      ; Success
      | "200"  ; Section 10.2.1: OK
      | "201"  ; Section 10.2.2: Created
      | "202"  ; Section 10.2.3: Accepted
      | ...
      ; Redirection
      | "300"  ; Section 10.3.1: Multiple Choices
      | "301"  ; Section 10.3.2: Moved Permanently
      | ...
      ; Client Error
      | "400"  ; Section 10.4.1: Bad Request
      | "401"  ; Section 10.4.2: Unauthorized
      | "402"  ; Section 10.4.3: Payment Required
      | "403"  ; Section 10.4.4: Forbidden
      | "404"  ; Section 10.4.5: Not Found
      | "405"  ; Section 10.4.6: Method Not Allowed
      | ...
      ; Server Error
      | "500"  ; Section 10.5.1: Internal Server Error
      | "501"  ; Section 10.5.2: Not Implemented
      | ...
      | extension-code
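The first digit of a status code defines its class. RFC 2616 requires clients to understand the class of any status code and to treat an unrecognized code like the x00 code of its class. A sketch of parsing a Status-Line and classifying the code (hypothetical helper names); Python's http.HTTPStatus supplies the standard reason phrases.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}


def parse_status_line(line: str):
    """Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF"""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)  # reason may contain spaces
    return version, int(code), reason


def status_class(code: int) -> str:
    """Classify by first digit, so unknown extension codes still get a class."""
    return CLASSES.get(code // 100, "unknown")


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, reason, status_class(code))  # HTTP/1.1 404 Not Found Client Error
print(HTTPStatus(code).phrase)                    # Not Found
print(status_class(299))                          # Success (an unregistered 2xx code)
```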
HTTP Methods
 GET
 Retrieves the resource identified by the request URI. May encode request
parameters in URI
 HEAD
 Identical to GET except that a message-body must not be returned. E.g., for testing links for validity, accessibility, and recent modification
 POST
 Requests that the server accept the entity enclosed in the request as a new subordinate of the request URI, e.g., to annotate existing resources, append to a database, ...
 PUT
 Requests that the enclosed entity be stored under the request URI
 DELETE
 Removes the resource identified by the request URI
 OPTIONS
 Returns the HTTP methods the server supports
 CONNECT
 Reserved for proxies that can dynamically switch to a tunnel
 TRACE
 Returns the header fields sent with the TRACE request, e.g., for testing
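The method semantics above can be summarized by a toy dispatcher. This is a sketch, not a real server: a hypothetical in-memory dictionary plays the resource store, handle returns (status code, headers, body), POST names new subordinate resources with a simple counter, and TRACE and CONNECT are answered with 405 Method Not Allowed.

```python
import itertools

store = {}                  # hypothetical resource store: request URI -> entity body
_ids = itertools.count(1)   # names for subordinate resources created by POST
ALLOWED = "OPTIONS, GET, HEAD, POST, PUT, DELETE"


def handle(method: str, uri: str, body: str = ""):
    """Return (status_code, headers, body) for one request against `store`."""
    if method in ("GET", "HEAD"):
        if uri not in store:
            return 404, {}, ""
        entity = store[uri]
        headers = {"Content-Length": str(len(entity.encode("utf-8")))}
        return 200, headers, ("" if method == "HEAD" else entity)  # HEAD: no body
    if method == "PUT":     # store the enclosed entity under the request URI
        created = uri not in store
        store[uri] = body
        return (201 if created else 200), {}, ""
    if method == "POST":    # accept the entity as a new subordinate of the URI
        child = f"{uri.rstrip('/')}/{next(_ids)}"
        store[child] = body
        return 201, {"Location": child}, ""
    if method == "DELETE":
        if uri not in store:
            return 404, {}, ""
        del store[uri]
        return 200, {}, ""
    if method == "OPTIONS":
        return 200, {"Allow": ALLOWED}, ""
    return 405, {"Allow": ALLOWED}, ""  # Method Not Allowed


print(handle("PUT", "/docs/readme", "hello"))  # (201, {}, '')
print(handle("HEAD", "/docs/readme"))          # (200, {'Content-Length': '5'}, '')
print(handle("GET", "/docs/readme"))           # (200, {'Content-Length': '5'}, 'hello')
print(handle("POST", "/notes", "first note"))  # (201, {'Location': '/notes/1'}, '')
print(handle("DELETE", "/docs/readme"))        # (200, {}, '')
print(handle("GET", "/docs/readme"))           # (404, {}, '')
```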
Proxies
 An intermediary program which
acts as both a client and a server
 Caching
 E.g., GET cacheable
 E.g., HEAD not cacheable
(well, sort of)
 Because the Web is stateless, stale caches are a problem
 Age: the sum of the time resident in caches and the time in transit on the network
 Used for reliable cache expiration
 But a proxy MAY still return a stale resource, with a warning
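The age bookkeeping is spelled out in RFC 2616, section 13.2.3. The sketch below transcribes that calculation and the freshness test of section 13.2.4 into Python; the function names are ours and all times are in seconds.

```python
def current_age(date_value, age_value, request_time, response_time, now):
    """Age calculation of RFC 2616, section 13.2.3."""
    apparent_age = max(0, response_time - date_value)       # clock skew can make it negative
    corrected_received_age = max(apparent_age, age_value)   # Age header from earlier caches
    response_delay = response_time - request_time          # time spent on the network
    corrected_initial_age = corrected_received_age + response_delay
    resident_time = now - response_time                     # time resident in this cache
    return corrected_initial_age + resident_time


def is_fresh(freshness_lifetime, age):
    """RFC 2616, section 13.2.4: fresh while freshness_lifetime > current_age."""
    return freshness_lifetime > age


# Origin Date at t=100; request sent at t=101; response (Age: 5) received at t=103.
age = current_age(date_value=100, age_value=5, request_time=101, response_time=103, now=113)
print(age)                # 17
print(is_fresh(20, age))  # True  (e.g. max-age=20)
print(is_fresh(15, age))  # False (stale)
```

When the response is stale but a proxy serves it anyway, RFC 2616 requires a Warning header with code 110 ("Response is stale").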
More on Reliability and Web Technology
Mostly focused on security
Authentication and confidentiality
Secure Sockets Layer (SSL)
Privacy
We will get back to this next time, on Web services
Web Browsers
One of the first problems: Web browsers were originally intended only to display static documents returned by HTTP calls
It was difficult to build sophisticated, application-specific clients for Web browsers
Applets
One answer to this problem: applets
Java programs that can be embedded in an HTML document
When the document is downloaded, the program is executed by the JVM and presented in the browser; sending the client code as an applet thus turns the browser into a client
• Limitation: the client code must be downloaded
• Advantage: complex client functionality becomes possible
CGI (Common Gateway Interface)
Web servers must be able to serve up content from dynamic sources
How can a Web server respond to a request by invoking an application that automatically generates the document to be returned?
One of the first approaches to this problem was CGI, a standard mechanism that enables HTTP servers to interface with external applications, which can serve as "gateways" to the local information system
CGI
 How does CGI work?
 It assigns programs to URLs, so that when a URL is invoked, the corresponding program is executed
 CGI programs often serve as an interface between a database and a Web server, allowing users to submit complex queries over the DB through predefined URLs
 When the Web server receives a request for the URL, it runs the program, which acts as a client of the database and submits the query; the query is executed, and the result is packed into an HTML document that is returned to the remote browser
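A minimal sketch of such a CGI program in Python, under stated assumptions: an in-memory SQLite table stands in for the local information system (its name, columns, and second row are hypothetical), and the query arrives in the QUERY_STRING environment variable, as CGI specifies. The program writes a header block, an empty line, and the HTML document to standard output, which the Web server relays to the browser.

```python
#!/usr/bin/env python3
import html
import os
import sqlite3
import sys
from urllib.parse import parse_qs


def make_database():
    """Hypothetical stand-in for the local information system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE patients (name TEXT, status TEXT)")
    db.executemany("INSERT INTO patients VALUES (?, ?)",
                   [("Klaus Marius Hansen", "Admitted"), ("Jane Doe", "Discharged")])
    return db


def respond(environ, db) -> str:
    """Turn one CGI request (its environment) into the CGI response text."""
    query = parse_qs(environ.get("QUERY_STRING", ""))  # e.g. "status=Admitted"
    status = query.get("status", [""])[0]
    # Parameterized query: the URL parameter never becomes part of the SQL text.
    rows = db.execute("SELECT name FROM patients WHERE status = ?", (status,)).fetchall()
    items = "".join(f"<li>{html.escape(name)}</li>" for (name,) in rows)
    body = f"<html><body><ul>{items}</ul></body></html>"
    return f"Content-Type: text/html\n\n{body}"  # headers, empty line, document


if __name__ == "__main__":
    # The Web server starts this program once per request.
    sys.stdout.write(respond(os.environ, make_database()))
```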
Servlets
Performance: CGI programs involve a certain overhead
A separate process for each instance takes time and requires a context switch in the operating system
Multiple requests result in multiple processes
To avoid this overhead, Java servlets can be used instead
The idea is exactly the same as in CGI programs, but the implementation differs.
Servlets
How do they work?
Execution and result are the same, but servlets are invoked directly by embedding servlet-specific information within an HTTP request
 They run as threads of the Java server process; moreover, they run as part of the Web server
 This eliminates the per-request process overhead
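The difference between the two execution models can be sketched in Python (an analogy, not Java servlets; the function names are hypothetical). The CGI-style variant starts a new interpreter process per request, while the servlet-style variant hands each request to a thread of the one long-running process. Timings depend on the machine, but process start-up typically dominates.

```python
import os
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# Per-request "program" for the CGI-style variant: reports its own process id.
CHILD_PROGRAM = "import os; print(os.getpid())"


def cgi_style(n_requests):
    """Start one new OS process per request, as a CGI-enabled server does."""
    pids = []
    for _ in range(n_requests):
        result = subprocess.run([sys.executable, "-c", CHILD_PROGRAM],
                                capture_output=True, text=True, check=True)
        pids.append(int(result.stdout))
    return pids


def servlet_style(n_requests):
    """Handle each request on a thread inside the one long-running server process."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda _: os.getpid(), range(n_requests)))


for handle_requests in (cgi_style, servlet_style):
    start = time.perf_counter()
    pids = handle_requests(5)
    elapsed = time.perf_counter() - start
    print(f"{handle_requests.__name__}: {elapsed:.3f} s, "
          f"ran inside the server process: {set(pids) == {os.getpid()}}")
```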
Summary
Web technology underlies Web services
HTTP is the basic transport
XML is a cornerstone of Web services definitions