The Internet


Very Brief History

The need to move information or data from one physical host computer to another quickly arose as a fundamental problem facing the computing industry. In the early 1960s, driven perhaps most of all by government and defense department needs, computer scientists took up the task of finding ways to connect remote computers. ARPANET, a network funded by the U.S. Advanced Research Projects Agency, was an experimental network that would allow research and development sites to exchange information. The design constraints behind ARPANET included:

  1. Continued operation even if physical sections of the network were lost.
  2. Simple strategy for the addition/deletion of nodes.
  3. Minimal impact on the network when adding or deleting nodes.
  4. Allow computers of different types to communicate easily.

The Internet evolved from the solution proposed by Vinton Cerf and Robert Kahn in the early 1970s. Their model took a three-pronged approach: 1) a network system with multiple routes of transmission, 2) a method of transmitting information in packets rather than as a steady stream, and 3) common ways of linking incompatible hardware.
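The second element of the model, sending information in packets rather than as a steady stream, can be illustrated with a minimal Python sketch. The packet size and the (sequence number, payload) layout are assumptions chosen for illustration; this is not the actual TCP/IP packet format. Because each packet carries its own sequence number, packets can travel by different routes and arrive out of order, and the receiver can still rebuild the message.

```python
import random

PACKET_SIZE = 8  # hypothetical payload size in bytes, for illustration only


def to_packets(message: bytes, size: int = PACKET_SIZE) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, payload) pairs."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message from packets that may have arrived in any order."""
    # Sorting by sequence number restores the original order.
    return b"".join(payload for _, payload in sorted(packets))


if __name__ == "__main__":
    msg = b"Packets may take different routes across the network."
    packets = to_packets(msg)
    random.shuffle(packets)  # simulate out-of-order arrival over multiple routes
    assert reassemble(packets) == msg
    print(len(packets), "packets reassembled correctly")
```

A real network must also detect lost or duplicated packets; this sketch only shows why numbering packets makes out-of-order delivery harmless.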

This last requirement led to the development of the TCP/IP (Transmission Control Protocol/Internet Protocol) network protocol, which quickly became the standard for ARPANET. The conversion of research networks to a TCP/IP base was completed in 1983, and ARPANET became the backbone of the new Internet. In the beginning, there were 213 registered hosts.

The mid-1980s saw the creation of federally funded supercomputing sites around the country, with similar activity in Europe. These supercomputer centers were similar in kind and philosophy to the high-energy physics centers around the world (CERN, in Switzerland, is prominent in this regard). A supercomputer center is a platform that supports the experimental activities of many researchers separated by space and time. As at its high-energy physics counterparts, an "experiment" produces enormous amounts of data that must be "moved" from one location to another. Thus, supercomputing brings the added requirement of providing remote connectivity for researchers. There was also an early realization that such connectivity could (should) support collaboration within the scientific community.

From its meager beginnings, the Internet infrastructure now interconnects over 10 million computers (9.5 million in January 1996, up from 4.9 million in 1995) with an estimated 20 million users (after writing this line I feel the need to put in a counter similar to those used for lotteries, or else the values are quickly out of date), and the growth rate is not linear. There are 10 terabytes of information publicly available. The problem we face is providing our users with access to that information. Physical access is clearly solved, though we are moving very rapidly toward performance problems. The Internet and other network configurations provide the physical layer.

Resource Discovery

Information Retrieval Services
FTP

The File Transfer Protocol (FTP) allows you to connect to a remote computer (host) using an FTP program on your machine, browse a list of available files, retrieve files, and navigate the directory structure of the host system. One difficulty is knowing which file formats are compatible across platforms. FTP servers tend to store files in compressed form, so a file must be uncompressed after it is transferred before it can be viewed.
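The steps described above (connect, navigate, browse, retrieve, uncompress) can be sketched with Python's standard `ftplib` module. This is a minimal sketch under stated assumptions: the host `ftp.example.org`, the directory `/pub/docs`, and the file `README.gz` are hypothetical, and the file is assumed to be gzip-compressed (many archives of the period also used other formats such as `.Z` or `.zip`).

```python
import gzip
import io
from ftplib import FTP


def fetch_and_uncompress(ftp, directory: str, filename: str) -> bytes:
    """Navigate to a directory, retrieve a file, and gunzip it if it ends in .gz.

    `ftp` is expected to be a logged-in ftplib.FTP object (or any object
    offering the same cwd/nlst/retrbinary methods).
    """
    ftp.cwd(directory)  # navigate the host's directory structure
    # Browse the file list; some servers return full paths, so keep only the base name.
    names = {name.rsplit("/", 1)[-1] for name in ftp.nlst()}
    if filename not in names:
        raise FileNotFoundError(f"{filename} not found in {directory}")
    buffer = io.BytesIO()
    ftp.retrbinary(f"RETR {filename}", buffer.write)  # binary-mode transfer
    data = buffer.getvalue()
    # Assumption: files ending in .gz were compressed with gzip on the server.
    return gzip.decompress(data) if filename.endswith(".gz") else data


def main() -> None:
    # Hypothetical host and path, shown only as an example of usage.
    with FTP("ftp.example.org") as ftp:
        ftp.login()  # no arguments: anonymous login
        text = fetch_and_uncompress(ftp, "/pub/docs", "README.gz")
        print(text[:200])
```

Calling `main()` against a real anonymous FTP host would perform the transfer; note that the sketch still needs the exact directory and file name up front, which is precisely the limitation discussed next.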

Our ability to retrieve the information we need from an FTP server depends heavily on prior knowledge of the site's contents. This usually requires precise knowledge of the file name, even though we can connect (anonymously) and "look" around. File and directory names are at the discretion of the owner.

Figure: FTP Client/Server Model

FTP is an example of a client-server system. You use a client program on your system to connect to the server running on the remote host. The server coordinates activity between the client and the host operating system. Clients developed for PCs move us away from the cryptic world of workstation operating systems by providing interactive browsers that use a point-and-click strategy to navigate the directory space.
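The server's role of coordinating between the client and the host operating system can be sketched as a toy command handler. This is a hypothetical illustration, not a real FTP server: the class `ToyFileServer` is invented here, and although `LIST`, `CWD`, and `RETR` are genuine FTP command names, the real protocol's separate control and data connections and its numeric reply codes are omitted. The point is that each client command becomes an ordinary operating-system call, confined to the directory tree the server is willing to share.

```python
import os


class ToyFileServer:
    """Hypothetical sketch: translate FTP-like client commands into host OS calls."""

    def __init__(self, root: str) -> None:
        self.root = os.path.realpath(root)  # only this tree is served
        self.cwd = self.root

    def _resolve(self, path: str) -> str:
        # Resolve relative to the current directory and refuse to leave the root.
        target = os.path.realpath(os.path.join(self.cwd, path))
        if os.path.commonpath([target, self.root]) != self.root:
            raise PermissionError("path outside the served directory")
        return target

    def handle(self, line: str):
        command, _, arg = line.partition(" ")
        command = command.upper()
        if command == "LIST":  # list the current directory
            return sorted(os.listdir(self.cwd))
        if command == "CWD":  # change directory
            target = self._resolve(arg)
            if not os.path.isdir(target):
                raise NotADirectoryError(arg)
            self.cwd = target
            return self.cwd
        if command == "RETR":  # read a file's bytes
            with open(self._resolve(arg), "rb") as f:
                return f.read()
        raise ValueError(f"unsupported command: {command}")
```

A point-and-click PC client hides these commands behind a graphical browser, but underneath it sends the same kind of requests.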

Gopher

Gopher organizes information into a hierarchy in which intermediate nodes are directories, or indexes, and leaf nodes are documents. Gopher provides a menu-based interface to resources, which usually makes it easier to retrieve the information you want. Gopher can also display documents. Originally, Gopher services were all text-based, but in recent years they have moved to support more media formats.
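The hierarchy described above, menus as intermediate nodes and documents as leaves, can be modeled as a small tree. This is a minimal sketch; the class names `Menu` and `Document` and the numbered display with a trailing `/` on directories are assumptions about how a text-based client might present menus, not part of any specification.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Leaf node: a document the user can display."""
    title: str
    text: str


@dataclass
class Menu:
    """Intermediate node: a directory whose items are Menus or Documents."""
    title: str
    items: list = field(default_factory=list)

    def show(self) -> list[str]:
        # Numbered lines; directories are marked with a trailing "/" (an assumed convention).
        return [f"{i}. {item.title}{'/' if isinstance(item, Menu) else ''}"
                for i, item in enumerate(self.items, start=1)]

    def select(self, choice: int):
        # Menu numbers start at 1, as a user would type them.
        return self.items[choice - 1]
```

Selecting a `Menu` descends one level in the hierarchy; selecting a `Document` reaches a leaf that can be displayed.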

A major advantage of Gopher servers was their ability to provide automatic links between one another. When a user selects an item from a Gopher server menu, the user may be moved to a different Gopher server somewhat transparently.
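What makes these links possible is that every Gopher menu line names the server that holds the item. Under the format described in RFC 1436, a menu line is a one-character item type, a display string, a selector, a host name, and a port, separated by tabs, and a menu ends with a line containing a single period. The sketch below parses such a menu and follows an item; the sample host names used with it are hypothetical.

```python
import socket
from typing import NamedTuple


class GopherItem(NamedTuple):
    item_type: str  # e.g. "0" = text file, "1" = directory (menu), per RFC 1436
    display: str    # text shown to the user in the menu
    selector: str   # string sent to the server to fetch the item
    host: str       # server holding the item; may differ from the current server
    port: int


def parse_menu(raw: str) -> list[GopherItem]:
    """Parse a Gopher menu (RFC 1436 layout) into items."""
    items = []
    for line in raw.split("\r\n"):
        if line == ".":  # a lone period ends the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        display, selector, host, port = fields[:4]  # ignore any extra fields
        items.append(GopherItem(line[0], display, selector, host, int(port)))
    return items


def fetch(item: GopherItem, timeout: float = 10.0) -> bytes:
    """Follow a menu item by connecting to whichever server it names."""
    with socket.create_connection((item.host, item.port), timeout=timeout) as s:
        s.sendall(item.selector.encode() + b"\r\n")
        chunks = []
        while chunk := s.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Because `fetch` simply uses the host and port stored in the item, moving to a different server requires nothing special from the user.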

Information Search Services
Veronica

Gopher eases the burden of navigation by allowing menu-based links from one site to another. Easing the user's burden in navigation, however, does not necessarily ease the burden of finding what is needed. The proliferation of information servers (FTP or Gopher) posed an additional problem: where is what I need?

A service called Veronica was added to Gopher that allows the user to search Gopherspace. A Veronica database is built from the menus of Gopher servers around the world. Since Gopher menus are usually more descriptive than file names, it can be easier to find the information you are seeking. A Veronica search returns a menu of items that match the string the user submitted.
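A Veronica-style search can be sketched as a scan over collected menu titles. This is a minimal sketch: the `MenuEntry` record is hypothetical, and the rule that every query word must appear in the title (case-insensitive) is an assumption, not Veronica's documented matching behavior. The result is itself a list of menu entries, mirroring the way Veronica returns a Gopher menu.

```python
from typing import NamedTuple


class MenuEntry(NamedTuple):
    title: str     # descriptive menu text collected from a Gopher server
    host: str      # server that offers the item
    selector: str  # how to fetch the item from that server


def veronica_search(index: list[MenuEntry], query: str) -> list[MenuEntry]:
    """Return entries whose menu title contains every word of the query.

    Assumption: simple case-insensitive substring matching on each word.
    """
    words = query.lower().split()
    return [entry for entry in index
            if all(word in entry.title.lower() for word in words)]
```

Searching menu titles rather than file names is what gives the better hit rate described above: "Campus Library Hours" is easier to match than a file named `lbhrs.txt`.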

With the new service comes a new problem: maintaining the database. Electronic resources tend to be very dynamic, changing regularly. Providing a searchable database of Gopherspace material therefore requires some mechanism to update the central database automatically or semi-automatically.
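One possible automatic update strategy is to re-harvest each server's menus periodically and replace that server's entries wholesale. This is a hypothetical sketch: the `harvest` function stands in for whatever crawler walks a server's menus, and the policy of keeping stale entries when a server is temporarily unreachable is a design assumption, not a description of how Veronica was actually maintained.

```python
from typing import Callable


def refresh_index(index: dict[str, list[str]],
                  servers: list[str],
                  harvest: Callable[[str], list[str]]) -> dict[str, list[str]]:
    """Rebuild the central index one server at a time.

    `harvest(host)` is a hypothetical crawler that returns a server's menu titles.
    Servers no longer listed in `servers` are dropped from the new index.
    """
    updated = {}
    for host in servers:
        try:
            updated[host] = harvest(host)  # fresh copy of this server's menus
        except OSError:
            # Assumption: keep the old entries if the server is unreachable,
            # so a brief outage does not remove it from the index.
            if host in index:
                updated[host] = index[host]
    return updated
```

Replacing each server's entries as a unit keeps the index consistent with what the server offered at harvest time; the trade-off is that entries are only as fresh as the last successful harvest.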

Archie

As the Internet grew, locating a resource by finding a machine that contained it became more and more difficult. First, as more nodes were added, users found it increasingly hard to maintain a current list of available hosts. Second, as individuals "organized" their materials, remembered locations became unreliable. A strategy was needed that allowed users who knew what they wanted to specify its name and have a search engine return a list of possible sites. Archie provides exactly this functionality.

Archie creates a central index of files available on various FTP sites (only those sites that have agreed to participate). Periodically, Archie connects to these sites and downloads lists of available files. The lists obtained are merged on the server and indexed.
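The merge step can be sketched as building one dictionary keyed by file name from the per-site listings. This is a minimal sketch; the input layout (a mapping from FTP host to the full path names it reported) and the choice to key the index on the base file name are assumptions for illustration.

```python
from collections import defaultdict


def build_archie_index(site_listings: dict[str, list[str]]) -> dict[str, list[tuple[str, str]]]:
    """Merge per-site file listings into one index keyed by file name.

    site_listings maps an FTP host to the full path names it reported,
    e.g. {"ftp.a.edu": ["/pub/x/tool.tar", ...]} (hypothetical data).
    Each index entry lists every (host, path) where that file name appears.
    """
    index = defaultdict(list)
    for host, paths in site_listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]  # index on the file name only
            index[name].append((host, path))
    return dict(index)
```

Keying on the file name means the same file mirrored at several sites collects all of its locations under one entry.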

An Archie search begins when the user provides a string to be matched against the file names in the index. The server will return the complete path name to any file that matches the string provided (there are usually some user preferences that allow you to specify exact match or substring match).
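The exact-versus-substring preference can be sketched as a single flag on the search. This is a minimal sketch under stated assumptions: the index is a dictionary from file name to a list of (host, full path) pairs, and the `ignore_case` option is an added illustration rather than something the text above describes.

```python
def archie_search(index: dict[str, list[tuple[str, str]]], pattern: str,
                  exact: bool = False,
                  ignore_case: bool = True) -> list[tuple[str, str]]:
    """Return (host, full path) for every indexed file whose name matches.

    exact=True requires the whole file name to match; otherwise the pattern
    may appear anywhere in the name (substring match).
    Assumption: ignore_case is an extra option added for illustration.
    """
    needle = pattern.lower() if ignore_case else pattern
    hits = []
    for name in sorted(index):  # sorted for a stable, readable result order
        candidate = name.lower() if ignore_case else name
        matched = (candidate == needle) if exact else (needle in candidate)
        if matched:
            hits.extend(index[name])
    return hits
```

A substring search for `xv` would return both `xv-3.10.tar.Z` and `xv.README`, while an exact search returns only files named precisely as typed.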

Figure: Archie Client/Server Model

WAIS

With Archie there is an index of file names and machine names. With Veronica there is an index of Gopher menus and machine names. Clearly, the success of our search depends on the naming strategy of those operating the servers. WAIS (Wide Area Information Server) is also an indexing server, but with a real difference: WAIS keeps a keyword index over the content of the documents. The user provides the word or words she deems important, and the WAIS engine returns a list of documents, their locations, and a computed value that is essentially a probability that each document meets the user's request.
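The difference between indexing names and indexing content can be sketched with an inverted index from each word to the documents containing it. This is a minimal sketch: the tokenizer and the score (the fraction of query words a document contains, between 0.0 and 1.0) are simple stand-ins chosen for illustration, not the relevance scoring WAIS itself used, and the document locations in any sample data are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_content_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: keyword -> set of document locations containing it."""
    index = defaultdict(set)
    for location, text in docs.items():
        for word in tokenize(text):
            index[word].add(location)
    return dict(index)


def wais_search(index: dict[str, set[str]], query: str) -> list[tuple[str, float]]:
    """Rank documents by the fraction of query words they contain.

    Assumption: this fraction is a stand-in for WAIS's computed score.
    """
    words = set(tokenize(query))
    if not words:
        return []
    hits = defaultdict(int)
    for word in words:
        for location in index.get(word, ()):
            hits[location] += 1
    scored = [(location, count / len(words)) for location, count in hits.items()]
    # Highest score first; ties broken by location for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Because the index is built from document text, a search succeeds even when the file and menu names give no hint of the content.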

Information Communication Services
E-mail

E-mail was an early Internet service. It provides a quick method for communicating across vast distances.

UseNet

Internet newsgroups are online discussions centered on a broad topical area that defines the group. Many sites carry 5000 newsgroups. An article written at a particular site is stored and then forwarded to machines that carry that newsgroup. These machines in turn send the article on to yet other machines until, after a few hours, it has been posted throughout the world.
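The store-and-forward spread of an article can be sketched as a flood through a graph of news feeds. Each UseNet article carries a unique Message-ID, and a host that already holds an article with that ID does not store or forward it again; the sketch relies on that idea, while the feed graph, host names, and the breadth-first order are assumptions chosen for illustration.

```python
from collections import deque


def propagate(feeds: dict[str, list[str]], origin: str, message_id: str) -> list[str]:
    """Flood an article from `origin` through a news-feed graph.

    feeds maps each news host to the hosts it forwards articles to
    (hypothetical topology). Returns hosts in the order they first stored it.
    """
    history: dict[str, set[str]] = {host: set() for host in feeds}
    received = []
    queue = deque([origin])
    while queue:
        host = queue.popleft()
        if message_id in history.setdefault(host, set()):
            continue  # duplicate arriving by another route: drop it
        history[host].add(message_id)  # store the article
        received.append(host)
        queue.extend(feeds.get(host, []))  # forward to neighbouring hosts
    return received
```

Because every host checks its history before storing, the article reaches each machine once even when the feed graph contains cycles.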

You can find a newsgroup on practically any topic; the list of Internet Newsgroups confirms this.

IRC

Internet Relay Chat allows multiple individuals to converse simultaneously by typing their comments.

The Future?

Current activity on the internet is pushing for desktop-to-desktop conferencing with video, audio, and collaborative applications.


Reading

Press, L. (1995). McLuhan Meets the Net. Communications of the ACM, 38, pp. 15-20.

