It was only in 1989 that Tim Berners-Lee wrote the first proposal for the World Wide Web, which proposed a radically different way of sharing information on a global scale, built on the existing infrastructure of the internet.
And in that very short time, we’ve gone from nothing to 2 and 1/2 billion users and over 600 million Web pages. And both of those statistics are changing, going up all the time. We’ve built the largest information infrastructure in human history in just that short space of time. In this lecture, what I’d like to consider is two questions about that. The first one is, how on Earth did we get from there to here?
And very briefly, where exactly is here that we are at the moment? We’ve got some clues already from the previous lecture. So we know that the Web had a history. It didn’t come from nowhere. The Web was linked to technologies that existed before 1989.
The internet, of course, was really important–microchips, the personal computer, file transfer protocols. And it was also linked to much broader technologies that were shaping our modern world –mass production, electricity, the cables that provided the internet. But as well as technological innovations that enabled us to develop the Web, it’s important to recognize that it was linked to a cultural history. As we’ve heard in the previous lecture, it wasn’t the first way of thinking about a global information infrastructure.
And indeed, if you read science fiction at all, go and have a look at William Gibson’s book Neuromancer, which was written in 1984. And you’ll find it almost impossible to imagine that that book was written before the Web existed, because there it is, in 1984, in this book. The Web also had a history that was tied in with economics and with social change. So we need to think about the postwar economic boom.
We need to think about electronics. We need to think about the Cold War. We also need to think about mass higher education and the way in which science was funded in the postwar period. So the Web had a history–a technological, a social, an economic, and a political history in terms of where it came from. In 1989, Tim Berners-Lee made a very specific proposal to use HTTP, HTML, and URLs or URIs to share information and to navigate information on a global scale.
At the very beginning, or so the story goes, Tim Berners-Lee kept a notebook, in which he decided he would write down every time a new Website appeared on the internet. And he got to 20 and decided that perhaps he would stop doing it, because it was getting a bit difficult to keep up with it all. You imagine the notebook he’d need now–over 600 million Websites and counting, 2 and 1/2 billion people and counting.
And you know what? The main users of the Web are not physicists. So how did we get from there to here? A popular way of understanding science and understanding technical innovation is to imagine that innovations take off because they’re very clever and because they’re designed to achieve certain outcomes. So one answer to that question, how did we get from there to here, would be to say, well, it was designed to do that, and it’s a really clever technology.
I’m afraid I think that the answer to that is, no, that’s not how we got from there to here. And there’s three different things I’d like you to think about, which underline my reason for saying no. The first one is that technology on its own is not enough. However clever, however innovative something is, technologies don’t happen on their own. They happen because people use them.
And people use them or don’t use them depending on the circumstances of their lives, depending on their motivations, depending on all kinds of social and economic factors. So for the World Wide Web, a really obvious point: people need to be able to read and write. If we don’t have mass literacy, no one’s going to use the Web, or at least not on the scale that we’re used to.
We need disposable incomes. If people can’t pay for access to the internet, if they can’t buy computers, they’re not going to use the World Wide Web. Slightly more complicated, this, but it needed a range of use values. So if all you could do on the Web was share physics datasets, not very many people would be using it. All the physicists might be, but nobody else would be using it. And it also needed an open model. If the Web had been copyrighted, if we had to pay every time we wanted to use it, would it look like it looks today? I really don’t think that it would. And those of you who watched the opening ceremony of the Olympics last year in 2012 might remember Tim Berners-Lee being present at that ceremony, with a message flashing around the Olympic Stadium in London, saying, this is for everyone.
And that has been a really important decision, I would say almost as important as the technologies themselves, in shaping how we got from there to here. So that’s the first reason. The second reason why we can’t just say that this was an inevitable outcome of the technology that was developed is because the Web we have now, even in technical terms, is not the Web we had in 1989.
In 1989, or 1990, I suppose, to be more accurate, you could put static Web pages up–text, no visuals. And the only people really who could put Websites up were those who had quite high-level technical skills to be able to do that. All of that changed as we moved into a second generation of the Web, what people have called Web 2.0, where it started to look much nicer. You could have visuals, you could have dynamic Web pages.
All of it became much fancier, much more interesting and engaging. But also really importantly, Web 2.0 is used to describe a phase of the Web where user-generated content became possible. So it wasn’t just a relatively small number of people with high technical skills who could put information on the Web. All of us, you, me, anybody with access to the Web, could put their information out there, whether on Facebook or Twitter or through blogging. There’s a whole range of ways in which people can share information, share their photographs, share their life histories, sell their products, be on eBay, whatever it is. User-generated content is driving the Web, or has driven the Web to a large extent, in terms of that growth in the recent period. And it’s not stopping there. People now are talking about Web 3.0, and that’s something we’ll talk about later on. But that is going to change again how the Web is and how we’re using it. The third reason why we can’t simply say, oh, the Web grew because it was a great technology, is because we’ve had to work very, very hard to make the Web what it is today.
Some of you will have heard of an organisation called W3C, the World Wide Web Consortium. The World Wide Web Consortium is an organisation that develops protocols and guidelines to ensure the stability of the Web and the continued growth of the Web. It’s an organisation that brings together governments, businesses, academics, a whole range of people who negotiate long and hard over how to enable the Web to continue to function in a stable, reliable, and sustainable kind of way. And it’s really important to know that at W3C, there are two underpinning values. The first is, the Web is for everyone. The second is, the Web is for everything. It has to be possible to use the Web on any kind of device, not on one that’s produced by one company or another company or a particular kind of device, but on any kind of device.
And again, you can imagine if that hadn’t been the case, the Web might look very different today to how it does. W3C isn’t the only organisation that’s doing all that hard work to try and hold the Web together. But it’s a very powerful organisation, and it has as its vision–I think it’s important to say this–a commitment to participation, knowledge sharing, and trust. And that’s not easy. That’s really, really hard work–the effort, the energy, that it takes to hold the Web together.
So we’ve gone from physicists sharing data to eBay, Twitter, through all of those mechanisms I’ve just described. And the Web, what it is and what it’s become, is really, really complicated. And that’s why we need Web science to help us to understand it.
The Web has changed the world, to be sure, but the world has also changed the Web over that period of the last 25 years. People have taken it, they’ve used it, they’ve transformed it, and all kinds of unexpected things have happened. And we really don’t know what’s going to happen next. Where are we at the moment? The last question I want to consider is, where are we now? Two and a half billion users is amazing. But we’re heading for 7 and 1/2 billion people in the world.
Most people in the world don’t use the Web. What’s going to happen when more people start using the Web? What are the consequences of the fact that most people are excluded, or at least not included, from the Web? Many of those people are in countries outside of the West, but it’s estimated that 15 million people in the UK have never used the Web. So we have to be really careful when we talk about the Web and we say it’s changed all our lives and we all use the Web, because we don’t.
And we need to think very carefully about the consequences of that. Where we are now is also not guaranteed. The Web that we have now is not inevitable. The work that I’ve described to hold it together, the things that make the Web what it is today, are not guaranteed. And we need to be very careful to consider if and how we want to keep the Web the way it is now, and what might happen if changes are allowed, changes, for example, in the way that governments access our data, in terms of questions about privacy, governance of the Web, corporate ownership of parts of the Web, and so on.
The Web that we have now is not guaranteed. Lastly, the Web will not stand still. The Web is going to change. There’s no doubt about that. I think all that effort couldn’t hold it still if we wanted it to. So we all need to take responsibility for the Web for understanding where it is, where it might go in the future, and our part in that. And those are the challenges that Web Science faces.
Component-based programming has become more popular than ever. Hardly an application is built today that does not involve leveraging components in some form, usually from different vendors. As applications have grown more sophisticated, the need to leverage components distributed on remote machines has also grown.
An example of a component-based application is an end-to-end e-commerce solution. An e-commerce application residing on a Web farm needs to submit orders to a back-end Enterprise Resource Planning (ERP) application. In many cases, the ERP application resides on different hardware and might run on a different operating system.
The Microsoft Distributed Component Object Model (DCOM), a distributed object infrastructure that allows an application to invoke Component Object Model (COM) components installed on another server, has been ported to a number of non-Windows platforms. But DCOM has never gained wide acceptance on these platforms, so it is rarely used to facilitate communication between Windows and non-Windows computers. ERP software vendors often create components for the Windows platform that communicate with the back-end system via a proprietary protocol.
Some services leveraged by an e-commerce application might not reside within the datacenter at all. For example, if the e-commerce application accepts credit card payment for goods purchased by the customer, it must elicit the services of the merchant bank to process the customer’s credit card information. But for all practical purposes, DCOM and related technologies such as CORBA and Java RMI are limited to applications and components installed within the corporate datacenter. Two primary reasons for this are that by default these technologies leverage proprietary protocols and these protocols are inherently connection oriented.
Clients communicating with the server over the Internet face numerous potential barriers to communicating with the server. Security-conscious network administrators around the world have implemented corporate routers and firewalls to disallow practically every type of communication over the Internet. It often takes an act of God to get a network administrator to open ports beyond the bare minimum.
If you’re lucky enough to get a network administrator to open up the appropriate ports to support your service, chances are your clients will not be as fortunate. As a result, proprietary protocols such as those used by DCOM, CORBA, and Java RMI are not practical for Internet scenarios.
The other problem, as I said, with these technologies is that they are inherently connection oriented and therefore cannot handle network interruptions gracefully. Because the Internet is not under your direct control, you cannot make any assumptions about the quality or reliability of the connection. If a network interruption occurs, the next call the client makes to the server might fail.
The connection-oriented nature of these technologies also makes it challenging to build the load-balanced infrastructures necessary to achieve high scalability. Once the connection between the client and the server is severed, you cannot simply route the next request to another server.
Developers have tried to overcome these limitations by leveraging a model called stateless programming, but they have had limited success because the technologies are fairly heavy and make it expensive to reestablish a connection with a remote object.
Because the processing of a customer’s credit card is accomplished by a remote server on the Internet, DCOM is not ideal for facilitating communication between the e-commerce client and the credit card processing server. As in an ERP solution, a third-party component is often installed within the client’s datacenter (in this case, by the credit card processing solution provider). This component serves as little more than a proxy that facilitates communication between the e-commerce software and the merchant bank via a proprietary protocol.
Do you see a pattern here? Because of the limitations of existing technologies in facilitating communication between computer systems, software vendors have often resorted to building their own infrastructure. This means resources that could have been used to add improved functionality to the ERP system or the credit card processing system have instead been devoted to writing proprietary network protocols.
In an effort to better support such Internet scenarios, Microsoft initially adopted the strategy of augmenting its existing technologies, including COM Internet Services (CIS), which allows you to establish a DCOM connection between the client and the remote component over port 80. For various reasons, CIS was not widely accepted.
It became clear that a new approach was needed. So Microsoft decided to address the problem from the bottom up. Let’s look at some of the requirements the solution had to meet in order to succeed.
- Interoperability The remote service must be able to be consumed by clients on other platforms.
- Internet friendliness The solution should work well for supporting clients that access the remote service from the Internet.
- Strongly typed interfaces There should be no ambiguity about the type of data sent to and received from a remote service. Furthermore, datatypes defined by the remote service should map reasonably well to datatypes defined by most procedural programming languages.
- Ability to leverage existing Internet standards The implementation of the remote service should leverage existing Internet standards as much as possible and avoid reinventing solutions to problems that have already been solved. A solution built on widely adopted Internet standards can leverage existing toolsets and products created for the technology.
- Support for any language The solution should not be tightly coupled to a particular programming language. Java RMI, for example, is tightly coupled to the Java language. It would be difficult to invoke functionality on a remote Java object from Visual Basic or Perl. A client should be able to implement a new Web service or use an existing Web service regardless of the programming language in which the client was written.
- Support for any distributed component infrastructure The solution should not be tightly coupled to a particular component infrastructure. In fact, you shouldn’t be required to purchase, install, or maintain a distributed object infrastructure just to build a new remote service or consume an existing service. The underlying protocols should facilitate a base level of communication between existing distributed object infrastructures such as DCOM and CORBA.
Given the title of this book, it should come as no surprise that the solution Microsoft created is known as Web services. A Web service exposes an interface to invoke a particular activity on behalf of the client. A client can access the Web service through the use of Internet standards.
Web Services Building Blocks
The following graphic shows the core building blocks needed to facilitate remote communication between two applications.
Let’s discuss the purpose of each of these building blocks. Because many readers are familiar with DCOM, I will also mention the DCOM equivalent of each building block.
- Discovery The client application that needs access to functionality exposed by a Web service needs a way to resolve the location of the remote service. This is accomplished through a process generally termed discovery. Discovery can be facilitated via a centralized directory as well as by more ad hoc methods. In DCOM, the Service Control Manager (SCM) provides discovery services.
- Description Once the end point for a particular Web service has been resolved, the client needs sufficient information to properly interact with it. The description of a Web service encompasses structured metadata about the interface that is intended to be consumed by a client application as well as written documentation about the Web service including examples of use. A DCOM component exposes structured metadata about its interfaces via a type library (typelib). The metadata within a component’s typelib is stored in a proprietary binary format and is accessed via a proprietary application programming interface (API).
- Message format In order to exchange data, a client and a server have to agree on a common way to encode and format the messages. A standard way of encoding data ensures that data encoded by the client will be properly interpreted by the server. In DCOM, messages sent between a client and a server are formatted as defined by the DCOM Object RPC (ORPC) protocol.
Without a standard way of formatting the messages, developing a toolset to abstract the developer from the underlying protocols is next to impossible. Creating an abstraction layer between the developer and the underlying protocols allows the developer to focus more on the business problem at hand and less on the infrastructure required to implement the solution.
- Encoding The data transmitted between the client and the server needs to be encoded into the body of the message. DCOM uses a binary encoding scheme to serialize the data contained by the parameters exchanged between the client and the server.
- Transport Once the message has been formatted and the data has been serialized into the body of the message, the message must be transferred between the client and the server over some transport protocol. DCOM supports a number of proprietary protocols bound to a number of network protocols such as TCP, SPX, NetBEUI, and NetBIOS over IPX.
Web Services Design Decisions
Let’s discuss some of the design decisions behind these building blocks for Web services.
Choosing Transport Protocols
The first step was to determine how the client and the server would communicate with each other. The client and the server can reside on the same LAN, but the client might potentially communicate with the server over the Internet. Therefore, the transport protocol must be equally suited to LAN environments and the Internet.
As I mentioned earlier, technologies such as DCOM, CORBA, and Java RMI are ill suited for supporting communication between the client and the server over the Internet. Protocols such as Hypertext Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP) are proven Internet protocols. HTTP defines a request/response messaging pattern for submitting a request and getting an associated response. SMTP defines a routable messaging protocol for asynchronous communication. Let’s examine why HTTP and SMTP are well suited for the Internet.
HTTP-based Web applications are inherently stateless. They do not rely on a continuous connection between the client and the server. This makes HTTP an ideal protocol for high-availability configurations such as load-balanced Web farms. If the server that handled the client’s original request becomes unavailable, subsequent requests can be automatically routed to another server without the client knowing or caring.
Almost all companies have an infrastructure in place that supports SMTP. SMTP is well suited for asynchronous communication. If service is disrupted, the e-mail infrastructure automatically handles retries. Unlike with HTTP, you can pass SMTP messages to a local mail server that will attempt to deliver the mail message on your behalf.
The other significant advantage of both HTTP and SMTP is their pervasiveness. Employees have come to rely on both e-mail and their Web browsers, and network administrators have a high comfort level supporting these services. Technologies such as network address translation (NAT) and proxy servers provide a way to access the Internet via HTTP from within otherwise isolated corporate LANs. Administrators will often expose an SMTP server that resides inside the firewall. Messages posted to this server will then be routed to their final destination via the Internet.
In the case of credit card processing software, an immediate response is needed from the merchant bank to determine whether the order should be submitted to the ERP system. HTTP, with its request/response message pattern, is well suited to this task.
Most ERP software packages are not capable of handling large volumes of orders that can potentially be driven from the e-commerce application. In addition, it is not imperative that the orders be submitted to the ERP system in real time. Therefore, SMTP can be leveraged to queue orders so that they can be processed serially by the ERP system.
If the ERP system supports distributed transactions, another option is to leverage Microsoft Message Queue Server (MSMQ). As long as the e-commerce application and the ERP system reside within the same LAN, connectivity via non-Internet protocols is less of an issue. The advantage MSMQ has over SMTP is that messages can be placed and removed from the queue within the scope of a transaction. If an attempt to process a message that was pulled off the queue fails, the message will automatically be placed back in the queue when the transaction aborts.
Choosing an Encoding Scheme
HTTP and SMTP provide a means of sending data between the client and the server. However, neither specifies how the data within the body of the message should be encoded. Microsoft needed a standard, platform-neutral way to encode data exchanged between the client and the server.
Because the goal was to leverage Internet-based protocols, Extensible Markup Language (XML) was the natural choice. XML offers many advantages, including cross-platform support, a common type system, and support for industry-standard character sets.
Binary encoding schemes such as those used by DCOM, CORBA, and Java RMI must address compatibility issues between different hardware platforms. For example, different hardware platforms have different internal binary representation of multi-byte numbers. Intel platforms order the bytes of a multi-byte number using the little endian convention; many RISC processors order the bytes of a multi-byte number using the big endian convention.
XML avoids binary encoding issues because it uses a text-based encoding scheme that leverages standard character sets. This matters also because some transport protocols, such as SMTP, can carry only text-based messages.
Binary methods of encoding, such as those used by DCOM and CORBA, are cumbersome and require a supporting infrastructure to abstract the developer from the details. XML is much lighter weight and easier to handle because it can be created and consumed using standard text-parsing techniques.
In addition, a variety of XML parsers are available to further simplify the creation and consumption of XML documents on practically every modern platform. XML is lightweight and has excellent tool support, so XML encoding allows incredible reach because practically any client on any platform can communicate with your Web service.
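As a minimal sketch of this point, here is a hypothetical order message encoded as XML and parsed back using a standard parser (Python's stdlib is used for brevity; the element names `Order`, `CustomerId`, and `Amount` are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Build a hypothetical order message as an XML document.
order = ET.Element("Order")
ET.SubElement(order, "CustomerId").text = "1001"
ET.SubElement(order, "Amount").text = "49.95"

# Serialize to plain text -- any platform that can parse text can read it.
wire = ET.tostring(order, encoding="unicode")
print(wire)  # <Order><CustomerId>1001</CustomerId><Amount>49.95</Amount></Order>

# The receiving platform parses the same text back with any XML parser.
parsed = ET.fromstring(wire)
print(parsed.find("Amount").text)  # 49.95
```

No agreement on byte ordering, structure packing, or proprietary marshaling formats is needed; both sides only need a text parser.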
Choosing a Formatting Convention
It is often necessary to include additional metadata with the body of the message. For example, you might want to include information about the type of services that a Web service needs to provide in order to fulfill your request, such as enlisting in a transaction or routing information. XML provides no mechanism for differentiating the body of the message from its associated data.
Transport protocols such as HTTP provide an extensible mechanism for header data, but some data associated with the message might not be specific to the transport protocol. For example, the client might send a message that needs to be routed to multiple destinations, potentially over different transport protocols. If the routing information were placed into an HTTP header, it would have to be translated before being sent to the next intermediary over another transport protocol, such as SMTP. Because the routing information is specific to the message and not the transport protocol, it should be a part of the message.
Simple Object Access Protocol (SOAP) provides a protocol-agnostic means of associating header information with the body of the message. Every SOAP message must define an envelope. The envelope has a body that contains the payload of the message and a header that can contain metadata associated with the message.
SOAP imposes no restrictions on how the message body can be formatted. This is a potential concern because without a consistent way of encoding the data, it is difficult to develop a toolset that abstracts you from the underlying protocols. You might have to spend a fair amount of time getting up to speed on the Web service’s interface instead of solving the business problem at hand.
What was needed was a standard way of formatting a remote procedure call (RPC) message and encoding its list of parameters. This is exactly what Section 7 of the SOAP specification provides. It describes a standard naming convention and encoding style for procedure-oriented messages.
Because SOAP provides a standard format for serializing data into an XML message, platforms such as ASP.NET and Remoting can abstract away the details for you.
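The envelope/header/body structure can be sketched as follows. The SOAP 1.1 envelope namespace is real; the `RouteTo`, `SubmitOrder`, and `OrderId` elements are hypothetical names invented for this example (Python's stdlib XML support is used only to make the sketch runnable):

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)

# The envelope holds a header (message-level metadata, such as routing
# information that must survive a change of transport protocol) and a
# body (the payload of the call itself).
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
ET.SubElement(header, "RouteTo").text = "urn:example:erp"

body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, "SubmitOrder")
ET.SubElement(call, "OrderId").text = "42"

wire = ET.tostring(envelope, encoding="unicode")
print(wire)
```

Because the routing data lives in the SOAP header rather than in an HTTP header, an intermediary can forward the same message over SMTP without translating anything.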
Choosing Description Mechanisms
SOAP provides a standard way of formatting messages exchanged between the Web service and the client. However, the client needs additional information in order to properly serialize the request and interpret the response. XML Schema provides a means of creating schemas that can be used to describe the contents of a message.
XML Schema provides a core set of built-in datatypes that can be used to describe the contents of a message. You can also create your own datatypes. For example, the merchant bank can create a complex datatype to describe the content and structure of the body of a message used to submit a credit card payment request.
A schema contains a set of datatype and element definitions. A Web service uses the schema not only to communicate the type of data that is expected to be within a message but also to validate incoming and outgoing messages.
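As a sketch, here is what a minimal schema for the merchant bank's payment request might look like. The XML Schema namespace and the built-in `string` and `decimal` datatypes are standard; the `PaymentRequest`, `CardNumber`, and `Amount` names are hypothetical. Python's stdlib parser is used only to show that the schema is itself an ordinary XML document (full validation requires a schema-aware processor, which the stdlib does not provide):

```python
import xml.etree.ElementTree as ET

# A hypothetical schema describing a credit card payment request.
# It combines built-in datatypes (string, decimal) into a complex type.
schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PaymentRequest">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="CardNumber" type="xs:string"/>
        <xs:element name="Amount" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_text)
print(schema.tag)  # {http://www.w3.org/2001/XMLSchema}schema
```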
A schema alone does not provide enough information to effectively describe a Web service, however. The schema does not describe the message patterns between the client and the server. For example, a client needs to know whether to expect a response when an order is posted to the ERP system. A client also needs to know over what transport protocol the Web service expects to receive requests. Finally, the client needs to know the address where the Web service can be reached.
This information is provided by a Web Services Description Language (WSDL) document. WSDL is an XML document that fully describes a particular Web service. Tools such as ASP.NET WSDL.exe and Remoting SOAPSUDS.exe can consume WSDL and automatically build proxies for the developer.
As with any component used to build software, a Web service should also be accompanied by written documentation for developers who program against the Web service. The documentation should describe what the Web service does, the interfaces it exposes, and some examples of how to use it. Good documentation is especially important if the Web service is exposed to clients over the Internet.
Choosing Discovery Mechanisms
Once you’ve developed and documented a Web service, how can potential clients locate it? If the Web service is designed to be consumed by a member of your development team, your approach can be pretty informal, such as sharing the URL of the WSDL document with your peer a couple of cubicles down. But when potential clients are on the Internet, advertising your Web service effectively is an entirely different story.
What’s needed is a common way to advertise Web services. Universal Description, Discovery, and Integration (UDDI) provides just such a mechanism. UDDI is an industry-standard centralized directory service that can be used to advertise and locate Web services. UDDI allows users to search for Web services using a host of search criteria, including company name, category, and type of Web service.
Web services can also be advertised via DISCO, a proprietary XML document format defined by Microsoft that allows Web sites to advertise the services they expose. DISCO defines a simple protocol for facilitating a hyperlink style of locating resources. The primary consumer of DISCO is Microsoft Visual Studio .NET. A developer can target a particular Web server and navigate through the various Web services exposed by the server.
What’s Missing from Web Services?
You might have noticed that some key items found within a distributed component infrastructure are not defined by Web services. Two of the more noticeable omissions are a well-defined API for creating and consuming Web services and a set of component services, such as support for distributed transactions. Let’s discuss each of these missing pieces.
- Web service-specific API Most distributed component infrastructures define an API to perform such tasks as initializing the runtime, creating an instance of a component, and reflecting the metadata used to describe the component. Because most high-level programming languages provide some degree of interoperability with C, the API is usually exposed as a flat set of C method signatures. RMI goes so far as to tightly couple its API with a single high-level language, Java.
In an effort to ensure that Web services are programming language-agnostic, Microsoft has left it up to individual software vendors to bind support for Web services to a particular platform. I will discuss two Web service implementations for the .NET platform, ASP.NET and Remoting, later in the book.
- Component services The Web services platform does not provide many of the services commonly found in distributed component infrastructures, such as remote object lifetime management, object pooling, and support for distributed transactions. These services are left up to the distributed component infrastructure to implement.
Some services, such as support for distributed transactions, can be introduced later as the technology matures. Others, such as object pooling and possibly object lifetime management, can be considered an implementation detail of the platform. For example, Remoting defines extensions to provide support for object lifetime management, and Microsoft Component Services provides support for object pooling.
Component-based programming has proven to be a boon to developer productivity, but some services cannot be encapsulated by a component that resides within the client’s datacenter. Legacy technologies such as DCOM, CORBA, and Java RMI are ill-suited to allowing clients to access services over the Internet, so Microsoft found it necessary to start from the bottom and build an industry-standard way of accessing remote services.
Web services is an umbrella term that describes a collection of industry-standard protocols and services used to facilitate a baseline level of interoperability between applications. The industry support that Web services has received is unprecedented. Never before have so many leading technology companies stepped up to support a standard that facilitates interoperability between applications, regardless of the platform on which they are run.
One of the contributing factors to the success of Web services is that they’re built on existing Internet standards such as XML and HTTP. As a result, any system capable of parsing text and communicating via a standard Internet transport protocol can communicate with a Web service. Companies can also leverage the investment they have already made in these technologies.