Web Technologies
Typical Web Usage
1. User interacts with graphical browser
2. Browser submits HTTP requests to server
2.1. Request relayed by proxy
3. Server returns HTTP reply
3.1. Proxy caches response
4. Browser displays HTML results
Components of Web Technology
We are primarily interested in the parts
that have implications for (reliable)
distributed computing
(HTML)
XML
(URLs)
HTTP
Proxy Servers
Core Web Technologies
HTML (HyperText Markup Language)
Defines a standard set of special textual
indicators (markup) specifying how a Web page's
words and images should be displayed by the web
browser
Technologies for Supporting Remote
Clients
Original intent of the core Web technologies:
enable linking and sharing of documents
It was quickly realized that, by wrapping local
information systems so that their presentation
layer is exposed as HTML documents, one could leverage
the core Web technologies to support clients that are
distributed across the Internet.
HTML
HyperText Markup Language
Text format for publishing hypertexts on the World Wide Web
Based on Standard Generalized Markup Language (SGML; ISO 8879) (as
is XML)
Created in 1991, HTML 2.0 in 1995 (RFC 1866, 60 pages), HTML 4.0 in 1997,
HTML 4.01 (> 350 pages) in 1999, now work on XHTML
Representation rather than presentation – sort of...
HTML is not XML
E.g., <br>: start tag required, end tag forbidden
XHTML: HTML in XML
XML
Extensible Markup Language
Extensible
XML is a framework for defining
languages tailored to application
domains
Markup
XML documents are made up of
entities
Entity data contains intermingled
character data and markup
No fixed set of markup tags
Reference
http://www.w3.org/TR/2004/REC-xml-20040204/
An example...
<?xml version="1.0" encoding="UTF-8"?>
<patient id="301174-...">
  <name>
    Klaus Marius Hansen
  </name>
  <status>
    Admitted
  </status>
  <medicine>
    <item>
      <dose>100</dose>
      <kind>Aspirin</kind>
    </item>
    <item>
      <dose>50</dose>
      <kind>Ibuprofen</kind>
    </item>
  </medicine>
</patient>
Parts labelled in the example: XML declaration (the first line), attribute (id),
element name (e.g., patient), character data (e.g., Admitted),
element end tag (e.g., </status>)
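As a minimal sketch of how an application might read the patient document above, the following uses Python's standard `xml.etree.ElementTree` parser. The function name `summarize_patient` and the shape of the returned dictionary are assumptions made for illustration; they are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The patient document from the slide (the id is elided in the source).
PATIENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<patient id="301174-...">
  <name>
    Klaus Marius Hansen
  </name>
  <status>
    Admitted
  </status>
  <medicine>
    <item>
      <dose>100</dose>
      <kind>Aspirin</kind>
    </item>
    <item>
      <dose>50</dose>
      <kind>Ibuprofen</kind>
    </item>
  </medicine>
</patient>
"""


def summarize_patient(xml_text):
    """Parse a patient document and return its parts as plain Python data."""
    root = ET.fromstring(xml_text)
    # Each <item> contributes a (dose, kind) pair.
    items = [
        (int(item.findtext("dose")), item.findtext("kind"))
        for item in root.iter("item")
    ]
    return {
        "element": root.tag,                         # element name
        "id": root.get("id"),                        # attribute value
        "name": root.findtext("name").strip(),       # character data
        "status": root.findtext("status").strip(),
        "medicine": items,
    }
```

Calling `summarize_patient(PATIENT_XML)` yields the element name, the `id` attribute, and the character data of the child elements with surrounding whitespace stripped.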
XML Well-Formedness and Validity
Which patient documents are
regarded as describing patients?
The valid ones
Have a reference to a
document describing legal
documents
E.g., using XML Schemas
Fulfil the requirements in
these
Are well-formed
Well-formed patients...
Match the "document"
production of the XML spec
Including that start and
end tags match and that
element tags are properly
nested
+ other well-formedness
constraints in the spec
Example (xmlns:p declares the namespace, xsi:schemaLocation gives the location of the schema)
<?xml version="1.0" encoding="UTF-8"?>
<p:patient id="301174-..."
    xmlns:p="http://ehr.org"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ehr.org patient.xsd">
  <name>
    Klaus Marius Hansen
  </name>
  <medicine>
    <item>
      <dose>100</dose>
      <kind>Aspirin</kind>
    </item>
    <item>
      <dose>50</dose>
      <kind>Ibuprofen</kind>
    </item>
  </medicine>
  <status>
    Admitted
  </status>
</p:patient>
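Well-formedness as described above (matching start and end tags, proper nesting) can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which is a non-validating parser: it reports well-formedness errors but does not check a document against a schema. The helper name `is_well_formed` and the sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


# Start and end tags match and are properly nested.
GOOD = "<patient><name>Klaus Marius Hansen</name></patient>"
# End tags appear in the wrong order, so the elements are not nested.
BADLY_NESTED = "<patient><name>Klaus Marius Hansen</patient></name>"
# The root element is never closed.
UNCLOSED = "<patient><name>Klaus Marius Hansen</name>"
```

Checking validity against `patient.xsd` would need a schema-aware validator, which is outside the Python standard library.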
Patient XML Schema Example
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:p="http://ehr.org"
    targetNamespace="http://ehr.org">
  <xs:element name="patient" type="p:patient_type"/>
  <xs:complexType name="patient_type">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="medicine">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="dose" type="xs:int"/>
                  <xs:element name="kind" type="xs:string"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="status"/>
    </xs:sequence>
    <xs:attribute name="id" use="required"/>
  </xs:complexType>
</xs:schema>
XML Schema Constructs
Constructs
A complex type definition
attribute declarations: describe which attributes may or
must appear
element references: describe which sub-elements may
or must appear, how many, and in which order
A simple type definition
defines a set of strings to be used as attribute values or
character data
A global element declaration
associates element names with types
(in the patient example, the complex type definitions of medicine
and item are inlined in their element declarations)
Validity
An element is valid according to a given schema if the rules of
its associated element type are satisfied
A document is valid if all its elements are valid
Complex Types
Attribute declarations
E.g., <xs:attribute name="id" type="xs:string" use="required"/>
Content of one of the following content model kinds
Empty content
Simple content
<simpleContent>...</simpleContent>
Only character data
Regexp content
<sequence> ... </sequence>
<choice> ... </choice>
<all> ... </all>
e.g., with <element name="item"
minOccurs="0" maxOccurs="unbounded"/>
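To illustrate why this content model kind is called "regexp content", the sketch below encodes the child-element rules of the patient schema as ordinary regular expressions over child element names: a `<sequence>` becomes concatenation and `minOccurs="0" maxOccurs="unbounded"` becomes `*` (a `<choice>` would become `|`). This is only an analogy worked out by hand, not how a real schema validator is implemented; the table `CONTENT_MODELS` and the function `children_match` are hypothetical names.

```python
import re

# Each child element name is written as a word followed by one space, so a
# list of children becomes a string such as "name medicine status ".
CONTENT_MODELS = {
    # <sequence> of name, medicine, status: plain concatenation
    "patient": re.compile(r"name medicine status "),
    # item with minOccurs="0" maxOccurs="unbounded": Kleene star
    "medicine": re.compile(r"(item )*"),
    # <sequence> of dose, kind
    "item": re.compile(r"dose kind "),
}


def children_match(element_name, child_names):
    """Check a list of child element names against the element's model."""
    word = "".join(name + " " for name in child_names)
    return CONTENT_MODELS[element_name].fullmatch(word) is not None
```

With this encoding, a patient whose children appear in the order name, status, medicine is rejected, because a sequence fixes the order.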
Namespaces
XML languages are typically assigned to
namespaces
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:p="http://ehr.org"
targetNamespace="http://ehr.org">
XML Schema
Uses namespaces itself to distinguish XML Schema
constructs from the language being defined
Allows namespace assignments of the language being
defined
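A small sketch of what namespace assignment means for a consumer: the prefix is only a local abbreviation, and what identifies the element is the pair of namespace URI and local name. Python's standard `xml.etree.ElementTree` exposes this pair as `{uri}local-name`. The constants and helper names below are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

EHR_NS = "http://ehr.org"

# The same element written with two different prefixes bound to one URI.
DOC_P = ('<p:patient xmlns:p="http://ehr.org">'
         '<name>Klaus Marius Hansen</name></p:patient>')
DOC_Q = ('<q:patient xmlns:q="http://ehr.org">'
         '<name>Klaus Marius Hansen</name></q:patient>')


def root_tag(xml_text):
    """Return the expanded name of the root element."""
    return ET.fromstring(xml_text).tag


def patient_name(xml_text):
    """Return the patient's name, accepting only ehr.org patients."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{EHR_NS}}}patient":
        raise ValueError("not a patient in the ehr.org namespace")
    # The unprefixed <name> child is in no namespace in these documents.
    return root.findtext("name")
```

Both `DOC_P` and `DOC_Q` have the root tag `{http://ehr.org}patient`, while a plain `<patient>` without a namespace declaration is a different element.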
Other XML Technologies
Namespaces
Linking: XLink
Addressing parts of documents: XPath
Transformation: XSL
Querying: XQuery
RPC: WSDL and SOAP
XML is Bloated
An XML encoding is large
What to do (in particular for RPC using XML)?
Undecided what actually to do in W3C...
Compress/decompress XML?
Compression is expensive (often more than
decompression)
Place a gateway between XML and other
formats?
E.g., using XML only for interoperable
messaging
May introduce a single point of failure
Another approach
Sacrifice self-description
Create mapping to a more efficient format
More efficient to serialize and deserialize and
on the wire
Allow applications to choose between formats
Such a format could be described by ASN.1
Formal language for describing messages
exchanged in a distributed system
Used heavily in telecommunications
More next time...
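To make the size trade-off concrete, the sketch below encodes the medicine list from the patient example once as XML and once in a compact binary layout. The binary layout is a hypothetical, hand-rolled format invented for this illustration (it is not ASN.1 or any standard encoding), and it gives up self-description: a reader must know the layout in advance.

```python
import struct
import zlib

ITEMS = [(100, "Aspirin"), (50, "Ibuprofen")]


def encode_xml(items):
    """Self-describing encoding: every value is wrapped in named tags."""
    parts = ["<medicine>"]
    for dose, kind in items:
        parts.append(f"<item><dose>{dose}</dose><kind>{kind}</kind></item>")
    parts.append("</medicine>")
    return "".join(parts).encode("utf-8")


def encode_binary(items):
    """Hypothetical compact layout: 1-byte item count, then per item a
    2-byte unsigned dose, a 1-byte name length, and the UTF-8 name."""
    out = [struct.pack("!B", len(items))]
    for dose, kind in items:
        name = kind.encode("utf-8")
        out.append(struct.pack("!HB", dose, len(name)) + name)
    return b"".join(out)


def decode_binary(data):
    """Inverse of encode_binary."""
    (count,) = struct.unpack_from("!B", data, 0)
    offset = 1
    items = []
    for _ in range(count):
        dose, length = struct.unpack_from("!HB", data, offset)
        offset += 3
        items.append((dose, data[offset:offset + length].decode("utf-8")))
        offset += length
    return items


def sizes(items):
    """Byte counts for the plain XML, compressed XML, and binary forms."""
    xml_bytes = encode_xml(items)
    return {
        "xml": len(xml_bytes),
        "xml_compressed": len(zlib.compress(xml_bytes)),
        "binary": len(encode_binary(items)),
    }
```

`sizes(ITEMS)` lets the three options from the slides be compared directly; on such a short message the compressed size depends on the compressor's fixed overhead, so no particular ratio should be read into it.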
HTTP
HyperText Transfer Protocol
RPC-style interface to web servers
Messages represented as user-readable ASCII strings
May contain encoded information (e.g., Quoted-Printable or
Base64)
MIME types in Content-Type: and Accept: headers
Typically runs on TCP/IP, port 80 as default
But may use other reliable transports
Behavior
HTTP/1.0 (RFC 1945) behavior
Open socket, request, response, close socket
HTTP/1.1 (RFC 2616) behavior
Persistent connections
Good for user
Good for network
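The difference between the two behaviors can be observed with a small local experiment. The sketch below starts a throwaway HTTP/1.1 server on the loopback interface using Python's standard `http.server`, then sends two GET requests over a single `http.client.HTTPConnection`. The handler class, the function name, and the response body are assumptions for illustration. If the connection is persistent, both requests leave from the same local TCP port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    # Announcing HTTP/1.1 makes http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # The client needs the length to know where the response ends
        # without the server closing the socket.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection():
    """Return [(status, body, local_port), ...] for two GETs on one socket."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        results = []
        for _ in range(2):
            conn.request("GET", "/")
            response = conn.getresponse()
            body = response.read()
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body, local_port))
        return results
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

With HTTP/1.0 behavior, each request would need its own socket and therefore its own TCP handshake.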
Core Web Technologies
HTTP (HyperText Transfer Protocol)
generic, stateless protocol
governs the transfer of files across a network
developed at CERN (the European Organization for Nuclear
Research), where the name WWW was also coined
(later: W3C)
supports access to SMTP, FTP and other protocols
was designed to support hypertext
Core Web Technologies
Exchanged information can be static or dynamic
Every resource accessible over the Web has a
URL (Uniform Resource Locator)
The HTTP mechanism is based on the client/server model,
typically using TCP/IP sockets
Core Web Technologies
Since version 1.1, HTTP requires servers to support
persistent connections, to minimize the overhead associated
with opening and closing connections.
Typical methods on the server side are:
• OPTIONS
send information about the communication options
• GET
retrieve document or document produced by a program
• POST
Append or attach information
• PUT
Store information
• DELETE
Delete the resource indicated in the request
Core Web Technologies
Another limitation: HTTP is stateless
• Does not provide storing of information between requests
• No indication of any relationship between two different
requests
Cookies, small data structures that a web server
requests the HTTP client to store on the local machine,
are used to maintain state information
e.g., cookies store recently viewed items in a web shop
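The cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name `last_viewed`, the item id, and the three helper functions are hypothetical; a real browser keeps a cookie jar with expiry and domain rules, which this sketch leaves out.

```python
from http.cookies import SimpleCookie


def server_set_cookie(item_id):
    """Server side: build the response header asking the client to store
    the most recently viewed item."""
    cookie = SimpleCookie()
    cookie["last_viewed"] = item_id
    cookie["last_viewed"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def client_cookie_header(set_cookie_line):
    """Client side: remember name=value and send it back on the next
    request (attributes such as Path are not echoed)."""
    _, _, value = set_cookie_line.partition(":")
    jar = SimpleCookie()
    jar.load(value.strip())
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
    return "Cookie: " + pairs


def server_read_cookie(cookie_line):
    """Server side: recover the state from the request header."""
    _, _, value = cookie_line.partition(":")
    jar = SimpleCookie()
    jar.load(value.strip())
    return {name: morsel.value for name, morsel in jar.items()}
```

The protocol itself stays stateless: the state travels with every request, and the server reconstructs it from the `Cookie:` header each time.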
HTTP Messages (1)
HTTP-message    = Request | Response ; HTTP/1.1 messages
generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]
start-line      = Request-Line | Status-Line
message-header  = field-name ":" [ field-value ] ; "Host:" is mandatory
Request-Line    = Method SP Request-URI SP HTTP-Version CRLF
Method          = "OPTIONS" ; Section 9.2
                | "GET"     ; Section 9.3
                | "HEAD"    ; Section 9.4
                | "POST"    ; Section 9.5
                | "PUT"     ; Section 9.6
                | "DELETE"  ; Section 9.7
                | "TRACE"   ; Section 9.8
                | "CONNECT" ; Section 9.9
                | extension-method
An Example GET Interaction
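A GET interaction can be sketched at the message level without any network access. The code below builds a request that follows the Request-Line and message-header rules of the grammar, and parses a canned response. The host `www.example.org`, the path, the sample response, and both function names are assumptions for illustration; the parser is deliberately simplified and ignores details such as folded or repeated headers.

```python
CRLF = "\r\n"


def build_get_request(host, path):
    """Build a minimal HTTP/1.1 GET request as text."""
    lines = [
        f"GET {path} HTTP/1.1",   # Request-Line: Method SP Request-URI SP HTTP-Version
        f"Host: {host}",          # mandatory in HTTP/1.1
        "Accept: text/html",
    ]
    # Each header ends with CRLF; an empty line ends the header section.
    return CRLF.join(lines) + CRLF + CRLF


def parse_response(raw):
    """Split a response into status line fields, headers, and body."""
    head, _, body = raw.partition(CRLF + CRLF)
    status_line, *header_lines = head.split(CRLF)
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "status": int(code),
        "reason": reason,
        "headers": headers,
        "body": body,
    }


# A canned reply, standing in for what a server might send back.
BODY = "<html><body>Hello</body></html>"
SAMPLE_RESPONSE = CRLF.join([
    "HTTP/1.1 200 OK",
    "Content-Type: text/html",
    f"Content-Length: {len(BODY)}",
    "",
    BODY,
])
```

`parse_response(SAMPLE_RESPONSE)` returns status 200 with the HTML body, which is what the browser would then display.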
HTTP Messages (2)
HTTP-message = Request | Response ; HTTP/1.1 messages
Response     = Status-Line               ; Section 6.1
               *(( general-header        ; Section 4.5
                | response-header        ; Section 6.2
                | entity-header ) CRLF)  ; Section 7.1
               CRLF
               [ message-body ]          ; Section 7.2
Status-Line  = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Status-Code  =
               ; Informational
               "100" ; Section 10.1.1: Continue
             | "101" ; Section 10.1.2: Switching Protocols
               ; Success
             | "200" ; Section 10.2.1: OK
             | "201" ; Section 10.2.2: Created
             | "202" ; Section 10.2.3: Accepted
             ...
               ; Redirection
             | "300" ; Section 10.3.1: Multiple Choices
             | "301" ; Section 10.3.2: Moved Permanently
             ...
               ; Client Error
             | "400" ; Section 10.4.1: Bad Request
             | "401" ; Section 10.4.2: Unauthorized
             | "402" ; Section 10.4.3: Payment Required
             | "403" ; Section 10.4.4: Forbidden
             | "404" ; Section 10.4.5: Not Found
             | "405" ; Section 10.4.6: Method Not Allowed
             ...
               ; Server Error
             | "500" ; Section 10.5.1: Internal Server Error
             | "501" ; Section 10.5.2: Not Implemented
             ...
             | extension-code
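Because the first digit of the Status-Code determines the class of the response, a client can react sensibly even to codes it does not know. A minimal sketch, assuming a hypothetical helper `parse_status_line` that follows the Status-Line rule above:

```python
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def parse_status_line(line):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' and
    classify the code by its first digit."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"Status-Code must be three digits: {code!r}")
    status_class = STATUS_CLASSES.get(code[0], "Unknown")
    return version, int(code), reason, status_class
```

For example, an extension code such as 499 is not listed in the grammar, but it still falls into the Client Error class.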
HTTP Methods
GET
Retrieves the resource identified by the request URI. May encode request
parameters in URI
HEAD
Identical to GET except that a message-body must not be returned. E.g.,
for testing validity, recent modification, accessibility of links
POST
Requests that the server accept the entity enclosed in the request as a new
subordinate of the URI, e.g., to annotate resources, append to a database, ...
PUT
Requests for a resource to be stored under the URI
DELETE
Removes the resource identified by the request URI
OPTIONS
Returns the HTTP methods the server supports
CONNECT
Reserved for proxies that can dynamically switch to a tunnel
TRACE
Returns the header fields sent with the TRACE request, e.g., for testing
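The method semantics above can be sketched as a dispatch function over an in-memory resource store. This is a toy model under stated assumptions: the function `handle`, the dictionary store, the naive naming of POSTed subordinates, and the choice of status codes among those the specification permits are all illustrative, and CONNECT and TRACE are left unimplemented.

```python
def handle(store, method, uri, body=None):
    """Apply one request to store (a dict mapping URI to content) and
    return (status_code, response_body)."""
    if method == "GET":
        return (200, store[uri]) if uri in store else (404, None)
    if method == "HEAD":
        # Same status as GET, but a message-body must not be returned.
        return (200, None) if uri in store else (404, None)
    if method == "PUT":
        # Store the enclosed entity under exactly this URI.
        created = uri not in store
        store[uri] = body
        return (201 if created else 200, None)
    if method == "POST":
        # Accept the entity as a new subordinate of the URI. The numbering
        # is naive and may collide after deletions.
        prefix = uri.rstrip("/") + "/"
        children = [key for key in store if key.startswith(prefix)]
        new_uri = f"{prefix}{len(children) + 1}"
        store[new_uri] = body
        return 201, new_uri
    if method == "DELETE":
        if uri in store:
            del store[uri]
            return 204, None
        return 404, None
    if method == "OPTIONS":
        return 200, "GET, HEAD, POST, PUT, DELETE, OPTIONS"
    return 501, None  # Not Implemented
```

Note the difference between PUT and POST: with PUT the client chooses the URI of the stored resource, while with POST the server chooses the URI of the new subordinate.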
Proxies
An intermediary program which
acts as both a client and a server
Caching
E.g., GET cacheable
E.g., HEAD not cacheable
(well, sort of)
The Web is stateless, so stale
caches are a problem
Age: sum of time resident at
caches + time on network
Used for reliable cache
expiration
But a proxy MAY still return
a stale resource with a
warning
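The age rule above can be sketched as a small calculation. This is a simplification of the HTTP/1.1 age algorithm: it adds the age reported by upstream caches, the time spent on the network, and the time resident in this cache, and it leaves out the correction based on the Date header. The cache entry layout, the function names, and the `allow_stale` switch are assumptions for illustration; warn-code 110 is the HTTP/1.1 code for a stale response.

```python
STALE_WARNING = '110 - "Response is stale"'


def current_age(age_value, request_time, response_time, now):
    """Age in seconds: upstream age + time on network + time resident here."""
    time_on_network = response_time - request_time
    time_resident = now - response_time
    return age_value + time_on_network + time_resident


def serve_from_cache(entry, now, allow_stale=False):
    """Return (body, headers); body is None if the entry has expired and
    stale responses are not allowed, so the caller must revalidate."""
    age = current_age(entry["age_value"], entry["request_time"],
                      entry["response_time"], now)
    headers = {"Age": str(int(age))}
    if age < entry["max_age"]:
        return entry["body"], headers          # still fresh
    if allow_stale:
        headers["Warning"] = STALE_WARNING     # stale, but flagged as such
        return entry["body"], headers
    return None, headers
```

An entry received with `Age: 10` after a 2-second round trip and held for 28 seconds has an age of 40 seconds, so it is fresh under a 60-second lifetime.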
More on Reliability and Web
Technology
Mostly focussed on security
Authentication and confidentiality
Secure Sockets Layer (SSL)
Privacy
Will get back to this next time on web
services
Web Browsers
One of the first problems: Web browsers
were originally intended only to display
static documents returned by HTTP calls
Difficult to build sophisticated application-specific
clients for web browsers
Applets
One answer to this problem: applets
Java programs that can be embedded in an
HTML document
When the document is downloaded, the program
is executed by the JVM and presented in the
browser; sending the client code as an applet
turns the browser into an application-specific client
• Limitation: the code must be downloaded
• Advantage: complexity (sophisticated client logic becomes possible)
CGI (Common Gateway Interface)
Web servers must be able to serve up
content from dynamic sources
How can a Web server respond to a request by
invoking an application that will automatically
generate a document to be returned?
One of the first approaches to solving this problem
was CGI, a standard mechanism that enables HTTP
servers to interface with external applications, which
can serve as "gateways" to the local information
system
CGI
How does CGI work?
It assigns programs to URLs, so that when the URL is
invoked, the program is executed
CGI programs often serve as an interface between a
database and a Web server, allowing users to submit
complex queries to the DB through predefined URLs
When the Web server receives a request for the URL, it runs
a program that acts as a client of the database, submits
the query, and packs the result into an HTML
document that is returned to the remote browser
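The steps above can be sketched as a CGI-style program. Under CGI, the web server passes request data to the program through environment variables such as `QUERY_STRING`, and the program writes headers, a blank line, and the document to standard output. In the sketch below the dictionary `PATIENTS` stands in for the database, and the patient id `42`, the function name `cgi_response`, and the HTML layout are hypothetical. The function returns the output as a string so that it can be inspected; a deployed script would write it to standard output.

```python
import html
from urllib.parse import parse_qs

# Stand-in for the local information system (hypothetical data).
PATIENTS = {"42": ("Klaus Marius Hansen", "Admitted")}


def cgi_response(environ):
    """Produce the CGI output for a request such as /patient?id=42."""
    # The server places the part of the URL after '?' in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    patient_id = query.get("id", [""])[0]
    record = PATIENTS.get(patient_id)     # the "database query"
    if record is None:
        status = "404 Not Found"
        content = "<p>No such patient</p>"
    else:
        name, state = record
        status = "200 OK"
        content = f"<p>{html.escape(name)}: {html.escape(state)}</p>"
    # CGI output: header lines, an empty line, then the document.
    return (
        f"Status: {status}\r\n"
        "Content-Type: text/html\r\n"
        "\r\n"
        f"<html><body>{content}</body></html>"
    )
```

A deployed script would end with something like `sys.stdout.write(cgi_response(os.environ))`, and the web server would start it as a new process for each request.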
Servlets
Performance: CGI programs involve a
certain overhead
A separate process for each instance takes time and
requires a context switch in the operating system
Multiple requests result in multiple processes
To avoid this overhead, Java servlets can
be used instead
The idea is exactly the same as in CGI programs, but
the implementation differs.
Servlets
How do they work?
Execution and result are the same, but servlets
are invoked directly by embedding servlet-specific information within an HTTP request
They run as threads of the Java server process
and, moreover, as part of the Web server,
which eliminates the per-request process overhead
Summary
Web technology underlies web services
HTTP is the basic transport
XML is a cornerstone of web services definition