Back to the Basics — Do you know what happens when you type the address of a website?

Have you ever wondered how the internet infrastructure is all connected? How, just by knowing the name of a website, a web page hosted on some server somewhere shows up on your laptop screen in a matter of seconds?

It is very interesting! Let us dig in step by step and see what happens at each stage. My goal is to understand, or picture, what happens behind the scenes when I type the address of a web page, say “medium.com”.

Here are some questions to be explored —

  • What happens when I enter the website address?
  • Who will map my domain name to the server?

The Domain Name System (DNS) is the Transmission Control Protocol/Internet Protocol (TCP/IP) facility that lets you use names rather than numbers to refer to host computers.

Understanding DNS names

DNS is a name service that provides a standardized system for naming TCP/IP hosts, as well as a way to look up the IP address of a host given its DNS name. For example, if you use DNS to look up the name www.ebay.com, you get back the IP address of an eBay web host (the exact address varies over time and by location). Thus, DNS allows you to access the eBay website by using the DNS name www.ebay.com instead of the site’s IP address.

Domain and Domain Names

To provide a unique DNS name for every host computer on the internet, DNS uses a time-tested technique: divide and conquer. DNS uses a hierarchical naming system that is similar to how folders are organized hierarchically on a Windows computer. Instead of folders, DNS organizes its names into domains. Each domain includes all the names that appear directly beneath it in the DNS hierarchy.

A few additional details about DNS names:

  • DNS names are not case sensitive
  • The name of each DNS node can be up to 63 characters long (not including the dot) and can include letters, numbers, and hyphens
  • A subdomain is a domain that is beneath an existing domain
  • DNS is a hierarchical naming system that is similar to the hierarchical folder system used by Windows
  • The DNS tree can be up to 127 levels deep
  • In practice, the DNS tree is shallow but very broad
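
The naming rules above can be checked mechanically. Here is a minimal sketch in Python; the function name `is_valid_name` is made up for illustration, and the extra rule that a label may not start or end with a hyphen is an assumption taken from common host name conventions rather than from the list above.

```python
import re

# One label: 1 to 63 letters, digits, or hyphens.
# Assumption: no leading or trailing hyphen (a common host name convention).
LABEL_RE = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

def is_valid_name(name):
    """Return True if every dot-separated label in the name follows the rules."""
    if name.endswith("."):          # a trailing dot stands for the root domain
        name = name[:-1]
    labels = name.split(".")
    # The tree can be at most 127 levels deep.
    return len(labels) <= 127 and all(LABEL_RE.match(label) for label in labels)

print(is_valid_name("www.ebay.com"))          # a well-formed name
print(is_valid_name("bad_label.ebay.com"))    # underscore is not allowed here
print(is_valid_name("a" * 64 + ".com"))       # label longer than 63 characters
```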

Fully Qualified Domain names

If a domain name ends with a trailing dot, that trailing dot represents the root domain, and the domain name is said to be a fully qualified domain name (also known as an FQDN). A fully qualified domain name is also called an absolute name. A domain name that does not end with a trailing dot may be interpreted in the context of some other domain; such names are called relative names.

DNS and URLs

Whenever you use a web browser to navigate to a resource on the Internet, you use a Uniform Resource Locator (URL) to find your way.

A URL can consist of five distinct parts: scheme://authority/path?query#fragment

  • Scheme — Identifies the protocol used to access the resource. The scheme is followed by a colon (:). For web pages, the scheme is http or https.
  • Authority — Despite the odd name, the authority portion generally consists of the host name that DNS resolves. It can have additional elements: the host name can be preceded by a username followed by the at symbol (@), and it can be followed by a port number separated from the host by a colon.
  • Path — The path identifies a specific resource on the host server. It can be the name of a file or, more commonly, a file system path that lists one or more folders separated by slashes (/). The path may end with a file name, but that is not required.
  • Query — The query part of a URL is optional but very useful. It provides additional information to the server. The query begins with a question mark (?) and consists of one or more key-value pairs in the form key=value.
  • Fragment — The fragment portion of a URL is used less commonly than the other elements but is still useful from time to time. It refers to a specific part of the resource.
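
To see the five parts in practice, here is a small sketch that uses Python's standard `urllib.parse` module. The URL itself is made up for illustration; only the host name part is what DNS ends up resolving.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, invented for this example.
url = "http://doug@www.mywebsite.com:8080/docs/index.html?lang=en&page=2#intro"

parts = urlsplit(url)

scheme = parts.scheme            # protocol, without the colon
authority = parts.netloc         # username, host name, and port together
username = parts.username        # the part before the at symbol
hostname = parts.hostname        # the name that DNS resolves
port = parts.port                # the number after the colon
path = parts.path                # resource on the host server
query = parse_qs(parts.query)    # key=value pairs as a dictionary of lists
fragment = parts.fragment        # specific part of the resource

for label, value in [("scheme", scheme), ("authority", authority),
                     ("host", hostname), ("port", port), ("path", path),
                     ("query", query), ("fragment", fragment)]:
    print(f"{label:10} {value}")
```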

Top-Level Domains

A top-level domain appears immediately beneath the root domain. Top-level domains come in two categories: generic domains and geographic domains.

Generic Domains

Generic domains are the popular top-level domains that you see most often on the internet.

  • com — Commercial organizations
  • edu — Educational Institutions
  • gov — Government Institutions
  • int — International treaty Organizations
  • mil — Military Institutions
  • net — Network Providers
  • org — Noncommercial organizations
  • aero — Aerospace Industry
  • biz — Business
  • coop — Cooperatives
  • info — Information Sites
  • museum — Museums
  • name — Individual Users
  • pro — Professional Organizations

Geographic Domains

Although the generic top-level domains are open to anyone, U.S. companies and organizations dominate them. An additional set of top-level domains corresponds to country designations.

us — United States, in — India, am — Armenia, ae — United Arab Emirates, jp — Japan, and so on.

The Hosts File

When only a few devices were connected to the network, hosts were tracked using a hosts file. DNS was then invented to solve the problem of keeping track of the mapping between IP addresses and host names as the network grew.

  • The hosts file is still in use — For small networks, a hosts file may still be the easiest way to provide name resolution for the network’s computers. A hosts file can coexist with DNS. By default, the hosts file is checked before DNS is used, so you can use it to override DNS.
  • The hosts file is the precursor to DNS — DNS was devised to circumvent the limitations of the hosts file.

The hosts file is a simple text file that contains lines that match IP addresses with host names. You can edit the hosts file with any text editor, including Notepad. On Windows the file is located in c:\windows\system32\drivers\etc\ and on Unix it is /etc/hosts.

TCP/IP implementations are typically installed with a starter hosts file. The starter file ends with lines that show the host mapping for the name localhost, mapped to the IP address 127.0.0.1. The IP address 127.0.0.1 is the standard loopback address, so this entry allows a computer to refer to itself by using the name localhost.

After this entry, there is one more entry that defines the standard IPv6 loopback address (::1).

You can add your own entries. For example:

192.168.168.201 server1.mywebsite.com

When this line is added to the hosts file, any application that requests the IP address of the host name server1.mywebsite.com gets back 192.168.168.201.

Note that even if your network uses DNS, every client still has a hosts file.
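
Since the hosts file is just lines of "IP address, then names", it is easy to read with a few lines of Python. This is only a sketch: the function name `parse_hosts` and the sample text are made up, and the choice to let the first entry for a name win is an assumption of this example, not a statement about any particular operating system.

```python
def parse_hosts(text):
    """Return a dict that maps each host name to the IP address on its line."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()               # first field is the address
        for name in names:
            table.setdefault(name.lower(), ip)  # assumption: first entry wins
    return table

# Made-up sample content in the usual hosts file layout.
SAMPLE = """
# sample hosts file
127.0.0.1        localhost
::1              localhost
192.168.168.201  server1.mywebsite.com
"""

hosts = parse_hosts(SAMPLE)
print(hosts["server1.mywebsite.com"])
print(hosts["localhost"])
```

To try it on a real machine, you could pass the contents of /etc/hosts to the same function.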

Understanding DNS servers and Zones

A DNS server is a computer that runs DNS server software, helps to maintain the DNS database, and responds to DNS name resolution requests from other computers. Although there are many DNS implementations, the two most popular are BIND and the Windows DNS service. BIND runs on Unix-based computers, and Windows DNS runs on Windows computers.

The key to understanding how DNS servers work is to realize that the DNS database (the list of all the domains, subdomains, and host mappings) is a massively distributed database. No single DNS server contains the entire DNS database. Instead, authority over different parts of the database is delegated to different servers throughout the internet.

Zones

To simplify the management of the DNS database, the entire DNS namespace is divided into zones, and the responsibility for each zone is delegated to a particular DNS server.

The main reason to delegate authority for zones to separate servers is so that different groups can administer different parts of the namespace. For example, one group can be responsible for administering the France portion of a domain and another for the US portion.

A primary zone — The master copy of a zone. The data for a primary zone is stored in the local database of the DNS server that hosts the primary zone. Only one DNS server can host a particular primary zone. Any updates to the zone must be made to the primary zone.

A secondary zone — A read-only copy of a zone. When a server hosts a secondary zone, the server doesn’t maintain its own master copy of the zone data. Instead, it obtains its copy of the zone from the zone’s primary server by using a process called zone transfer. Secondary servers periodically check primary servers to see whether their secondary zone data is still current. If not, a zone transfer is initiated to update the secondary zone.

Primary and Secondary Servers

Each DNS server is responsible for one or more zones.

  • Primary server for a zone — The DNS server hosts a primary zone. The data for this zone is stored in files on the DNS server. Every zone must have one primary server.
  • Secondary server for a zone — The DNS server obtains the data for a secondary zone from a primary server. Every zone should have at least one secondary server. That way, if the primary server goes down, the domain defined by the zone can still be accessed via the secondary server or servers.

A secondary server should be on a different subnet than the zone’s primary server. If the primary and secondary servers are on the same subnet, both servers will be unavailable if the router that controls the subnet goes down.

Note that a single DNS server can be the primary server for some zones and a secondary server for other zones. A server is said to be authoritative for the primary and secondary zones that it hosts because it can provide definitive answers for queries against those zones.

Root Servers

The core of DNS comprises the root servers, which are authoritative for the root of the entire DNS namespace. The main function of the root servers is to provide the addresses of the DNS servers that are responsible for each of the top-level domains. These servers in turn can provide the DNS server addresses for the subdomains beneath the top-level domains.

The root servers are a major part of the glue that holds the Internet together. As you can imagine, the load on these servers is heavy throughout the day and night. There are 13 root server names (a.root-servers.net through m.root-servers.net), and the machines behind them are located throughout the world.

DNS servers learn how to reach the root servers by consulting a root hints file that is located on the server. On Unix/Linux, this file is known as named.root and can be found at /etc/named.root, although the exact location depends on the installation.

Caching

DNS servers keep a cache of query results. The next time the same name is requested, it is resolved from the cache without having to query all those other name servers again.

Cached DNS data is given a relatively short expiration time. The expiration value for DNS data is called the TTL (Time To Live) and is specified in seconds.
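
The idea of a cache whose entries expire after a TTL can be sketched in a few lines of Python. The class name `DnsCache` and its methods are made up for this illustration and do not correspond to any real DNS server's internals.

```python
import time

class DnsCache:
    """A tiny name-to-address cache whose entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock, handy for experiments
        self._entries = {}       # name -> (ip, expiry time)

    def put(self, name, ip, ttl):
        """Remember an answer for ttl seconds."""
        self._entries[name.lower()] = (ip, self._clock() + ttl)

    def get(self, name):
        """Return the cached address, or None if it is missing or expired."""
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: the next lookup must ask DNS again
            return None
        return ip

cache = DnsCache()
cache.put("www.mywebsite.com", "64.71.129.102", ttl=3600)
print(cache.get("www.mywebsite.com"))   # served from the cache
print(cache.get("unknown.example"))     # never cached, so None
```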

Understanding DNS queries

When a DNS client needs to resolve a DNS name to an IP address, it uses a library routine — a resolver — to handle the query. The resolver takes care of sending the query message over the network to the DNS server, receiving and interpreting the response and informing the client of the results of the query.

A DNS client can make two basic types of queries: recursive and iterative.

  • Recursive Queries — When a client issues a recursive DNS query, the server must reply with either the IP address of the requested host name or an error message indicating that the host name doesn’t exist. If the server doesn’t have the information, it asks another DNS server for the IP address. When the first server finally gets the IP address, it sends it back to the client. If the server determines that the information doesn’t exist, it returns the error message.
  • Iterative Queries — When a server receives an iterative query, it returns the IP address of the requested host name if it knows the address. If the server doesn’t know the address, it returns a referral, which is simply the address of a DNS server that should know. The client can then issue an iterative query to the server to which it is referred.

DNS clients normally issue recursive queries to a DNS server. If the server knows the answer, it replies directly to the client. If not, the server issues an iterative query to a DNS server that it thinks should know the answer. If the original server gets an answer from the second server, it returns the answer to the client. If it gets a referral instead, it issues an iterative query to a third server, and so on, until it gets either an answer or an error, which it then returns to the client.
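
The interplay between the two query types is easier to follow as a toy simulation. Everything in this sketch is hypothetical: the server names, the `SERVERS` table, and both functions exist only for illustration, and no real network traffic is involved.

```python
# Hypothetical in-memory "servers". Each one knows some answers and/or
# which other server to refer to for a given domain suffix.
SERVERS = {
    "root": {"referrals": {"com.": "com-server"}},
    "com-server": {"referrals": {"mywebsite.com.": "ns1"}},
    "ns1": {"answers": {"www.mywebsite.com.": "64.71.129.102"}},
}

def iterative_query(server, name):
    """Answer from local data, or refer the caller to a server that should know."""
    data = SERVERS[server]
    if name in data.get("answers", {}):
        return ("answer", data["answers"][name])
    for suffix, next_server in data.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return ("referral", next_server)
    return ("error", "no such name")

def recursive_query(name, start="root", max_hops=10):
    """Do the legwork for a client: follow referrals until an answer or an error."""
    server = start
    path = [server]
    for _ in range(max_hops):
        kind, value = iterative_query(server, name)
        if kind != "referral":
            return kind, value, path
        server = value               # ask the server we were referred to
        path.append(server)
    return "error", "too many referrals", path

print(recursive_query("www.mywebsite.com."))
print(recursive_query("www.example.org."))
```

The client only ever sees the final answer or error; the list of servers in the result shows the referrals that were followed on its behalf.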

Zone Files and Resource Records

Each DNS zone is defined by a zone file (also known as a DNS database or a master file). A zone file consists of one or more resource records. Creating and updating the resource records that make up the zone files is one of the primary tasks of a DNS administrator.

The resource records are written as simple text lines with the following fields:

Owner TTL Class Type RDATA

These fields must be separated from each other by one or more spaces.

  • Owner — The name of the DNS domain or the host that the record applies to. This is usually specified as a fully qualified name or a simple host name.
  • TTL — Also known as Time to Live; the number of seconds that the record should be retained in a server’s cache before it is invalidated. If you omit the TTL value for a resource record, a default TTL is obtained from the start of authority (SOA) record.
  • Class — Defines the protocol to which the record applies. You should always specify IN, for internet protocol. If you omit the class field, the last class field that you specified explicitly is used. As a result, you will sometimes see zone files that specify IN only on the first resource record and then allow it to default to IN on all subsequent records.
  • Type — The resource record type. The most commonly used resource record types are described in the sections that follow. Unlike the class field, the type field cannot be omitted.
  • RDATA — Resource record data that is specific to each record type.

A resource record normally fits on a single line. If it needs more than one line, you can use parentheses to continue it across lines. You can also include comments: a comment begins with a semicolon (;) and continues to the end of the line.
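
To make the field layout concrete, here is a rough Python sketch that splits a single-line resource record into its fields. The function name `parse_record` is made up, and the sketch makes simplifying assumptions: the owner and type are always present, the record sits on one line without parentheses, and no quoted text contains a semicolon.

```python
# Record classes recognized by this sketch; IN is the one you normally see.
CLASSES = {"IN", "CH", "HS"}

def parse_record(line):
    """Split one resource record line into owner, TTL, class, type, and RDATA."""
    fields = line.split(";", 1)[0].split()    # strip the comment, split on spaces
    owner, rest = fields[0], fields[1:]
    ttl = None
    rr_class = None
    if rest and rest[0].isdigit():            # optional TTL in seconds
        ttl = int(rest.pop(0))
    if rest and rest[0].upper() in CLASSES:   # optional class
        rr_class = rest.pop(0).upper()
    rr_type, rdata = rest[0].upper(), rest[1:]
    return {"owner": owner, "ttl": ttl, "class": rr_class,
            "type": rr_type, "rdata": rdata}

print(parse_record("www 3600 IN A 64.71.129.102 ; web server"))
print(parse_record("mywebsite.com. IN MX 10 mail2.mywebsite.com."))
```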

SOA Record

Every zone must begin with an SOA record, which names the zone and provides default information for the zone.

The RDATA section of an SOA record contains the following fields: the authoritative name server, the responsible person, the serial (version) number, and the refresh, retry, expire, and minimum TTL intervals. These fields are positional, which means you must include a value for all of them and list them in the order specified.

  • The email address of the person responsible for the zone is given in DNS format, not the normal email format. For example, doug@mywebsite.com is listed as doug.mywebsite.com
  • The serial number should be incremented every time you change the zone file.

mywebsite.com. IN SOA (
ns1.mywebsite.com. ; authoritative name server
doug.mywebsite.com. ; responsible person
148 ; version number
3600 ; refresh (1 hour)
600 ; retry (10 minutes)
86400 ; expire (1 day)
3600 ; minimum TTL (1 hour)
)
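
The conversion of the responsible person's email address into DNS format is mechanical, so here is a small Python sketch of it. The function name `email_to_rname` is made up for illustration. Escaping a dot in the part before the at symbol with a backslash follows the convention of the DNS master file format, and the trailing dot makes the result a fully qualified name.

```python
def email_to_rname(email):
    """Convert an email address to the DNS format used in an SOA record."""
    local, domain = email.rsplit("@", 1)
    local = local.replace(".", "\\.")     # a dot in the mailbox name is escaped
    return f"{local}.{domain.rstrip('.')}."

print(email_to_rname("doug@mywebsite.com"))
print(email_to_rname("doug.lowe@mywebsite.com"))   # hypothetical address
```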

NS Records

Name Server (NS) records identify the name servers that are authoritative for the zone. Every zone must have at least one NS record. Using two or more NS records is better, so that if the first name server is unavailable, the zone will still be accessible.

The owner field should be either the fully qualified domain name for the zone, with a trailing dot, or an at symbol (@). The RDATA consists of just one field: the fully qualified domain name of the name server.

mywebsite.com. IN NS ns1.mywebsite.com.
mywebsite.com. IN NS ns2.mywebsite.com.

A Record

Address (A) records provide the IP addresses for each of the hosts that you want to make accessible via DNS. In an A record, you usually list just the host name in the owner field, thus allowing DNS to add the domain name to derive the fully qualified name for the host. The RDATA field for the A record is the IP address of the host.

The following lines define various hosts for the mywebsite.com domain

doug IN A 192.168.168.200
server1 IN A 192.168.168.201
debbie IN A 192.168.168.202
printer1 IN A 192.168.168.203
router1 IN A 207.126.127.129
www IN A 64.71.129.102

Because DNS appends the domain name to the owner field, we don’t need to write the fully qualified name for each host.

CNAME Record

A canonical name (CNAME) record creates an alias for a fully qualified domain name. When a user attempts to access a domain name that is actually an alias, the DNS system substitutes the real domain name, known as the canonical name, for the alias. The owner field in the CNAME record provides the name of the alias that you want to create. The RDATA field then provides the canonical name, that is, the real name of the host.

ftp.mywebsite.com. IN A 207.126.127.132
files.mywebsite.com. IN CNAME ftp.mywebsite.com.

Here the host name of the FTP server at 207.126.127.132 is ftp.mywebsite.com. The CNAME record allows users to access this host at files.mywebsite.com if they prefer.

PTR Record

A pointer (PTR) record is the opposite of an address record. It provides the fully qualified domain name for a given IP address. The owner field should specify the reverse lookup domain name, and the RDATA field specifies the fully qualified domain name.

102.129.71.64.in-addr.arpa. IN PTR www.mywebsite.com.

PTR records don’t usually appear in normal domain zones. Instead, they appear in special reverse lookup zones.

MX records

Mail Exchange (MX) records identify the mail servers for a domain. The owner field provides the domain name that users address mail to. The RDATA section of the record has two fields. The first is a priority number used to determine which mail server to use when several are available. The second is the fully qualified domain name of the mail server itself.

mywebsite.com. IN MX 0 mail1.mywebsite.com.
mywebsite.com. IN MX 10 mail2.mywebsite.com.

Here the priority numbers are 0 and 10. The server with the lowest number, mail1, is tried first.

The server name specified in the RDATA section should be an actual host name, not an alias created by a CNAME record. Although some mail servers can handle MX records that point to CNAMEs, not all can. As a result, you shouldn’t specify an alias in an MX record.

Be sure to create reverse lookup records for your mail servers. Some mail servers won’t accept mail from a server that doesn’t have valid reverse lookup entries.
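
How a sending mail server might use the priority numbers can be sketched in Python as a simple sort: the lowest number is tried first. The list and the function name `delivery_order` are made up for this example, and real mail software typically adds extra behavior, such as picking randomly among servers with equal priority.

```python
# (priority, mail server) pairs, as they might come back from an MX lookup.
mx_records = [
    (10, "mail2.mywebsite.com."),
    (0, "mail1.mywebsite.com."),
]

def delivery_order(records):
    """Return the mail servers in the order they should be tried."""
    return [host for _, host in sorted(records, key=lambda record: record[0])]

print(delivery_order(mx_records))
```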

Reverse Lookup Zones

Normal DNS queries ask a name server to provide the IP address that corresponds to a fully qualified domain name. This kind of query is a forward lookup. A reverse lookup is the opposite of a forward lookup: it returns the fully qualified domain name of a host based on its IP address.

Reverse lookups are possible because of a special domain called the in-addr.arpa domain, which provides a separate fully qualified domain name for every possible IP address on the internet. To enable a reverse lookup for a particular IP address, all you have to do is create a PTR record in the reverse lookup zone. The PTR record maps the in-addr.arpa domain name for the address to the host’s actual domain name.

The technique used to create the reverse domain name for a given IP address is pretty clever. It creates subdomains beneath the in-addr.arpa domain by using the octets of the IP address, listed in reverse order, because that correlates the network portions of the IP address with the subdomain structure of DNS names.

  • Each of the 256 possible values for the first octet of an IP address has a subdomain beneath the in-addr.arpa domain. For example, any IP address that begins with 207 can be found in the 207.in-addr.arpa domain.
  • Within this domain, each of the possible values for the second octet can be found as a subdomain of the first octet’s domain. Thus, any address that begins with 207.126 can be found in the 126.207.in-addr.arpa domain.
  • The same holds true for the third octet, so any address that begins with 207.126.67 can be found in the 67.126.207.in-addr.arpa domain.
  • By the time you get to the fourth octet, you have pinpointed a specific host. The fourth octet completes the fully qualified reverse domain name. Thus, 207.126.67.129 is mapped to 129.67.126.207.in-addr.arpa.
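
The octet reversal described above takes only a couple of lines of Python. The function name `reverse_name` is made up for this sketch, which handles IPv4 addresses only. Python's standard `ipaddress` module offers a `reverse_pointer` attribute that produces the same name without the trailing dot, so it is used here as a cross-check.

```python
import ipaddress

def reverse_name(ip):
    """Build the fully qualified in-addr.arpa name for an IPv4 address."""
    octets = ip.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa."

name = reverse_name("207.126.67.129")
print(name)

# Cross-check against the standard library (it omits the trailing dot).
print(ipaddress.ip_address("207.126.67.129").reverse_pointer)
```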