Course 4

System Administration & IT Infrastructure Services

Sys admin role · Network & infrastructure services · Software & platform services · Directory services · Data recovery & backups

1 · What is System Administration?

The Role of a Sys Admin

System administration is the field in IT responsible for maintaining reliable, multi-user computing environments — the hardware, networks, and services that a company depends on.
IT infrastructure encompasses hardware, networking, and services such as email, file storage, and web hosting. These typically run on servers.
Machines that consume server resources are called clients. The sys admin manages the servers so clients can reliably reach them.
Common server form factors:
- Tower servers — look like desktop PCs; standalone units
- Rack servers — slim units that slide into a rack; space-efficient for data centers
- Blade servers — ultra-thin modules that share a common chassis for power and networking
KVM switch (Keyboard-Video-Mouse) — allows one keyboard, monitor, and mouse to control many servers. Critical when the network is down and remote access is unavailable.
Cloud computing — remote servers and data centers managed by a third party (e.g., AWS, Azure, GCP). The company rents compute resources instead of owning physical hardware.

Sys Admin Responsibilities

Policy & security — in small companies the sys admin often sets organizational policies and owns computer security decisions end-to-end.
Managing services — installing, configuring, updating, patching, and maintaining servers and the services running on them.
Hardware lifecycle — four-stage process every piece of hardware goes through:
1. Procurement — purchasing hardware, often in bulk from a vendor
2. Deployment — setting up and configuring the hardware for use
3. Maintenance — routine updates, patches, and repairs
4. Retirement — decommissioning, wiping data, disposing or recycling
Batch updates — rolling out security patches to many machines at once during scheduled maintenance windows.
Vendor lifecycle — understanding manufacturer support windows so you know when hardware or software reaches end-of-life and bulk replacements need to be planned.
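
Once each asset's end-of-support date is recorded, the vendor-lifecycle check is easy to automate. Below is a minimal Python sketch built on a hypothetical in-memory inventory; the asset names, models, and dates are invented for illustration. In practice the data would come from your asset-management system or from vendor notices.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    model: str
    end_of_support: date  # vendor's published end-of-support date


def needs_replacement(assets, today, horizon_days=180):
    """Return (name, days_left) for assets whose support ends within the horizon.

    A negative days_left means support has already ended. Results are sorted
    so the most urgent replacements come first.
    """
    flagged = []
    for asset in assets:
        days_left = (asset.end_of_support - today).days
        if days_left <= horizon_days:
            flagged.append((asset.name, days_left))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical inventory for illustration only.
INVENTORY = [
    Asset("web-01", "rack-server-gen8", date(2025, 3, 31)),
    Asset("db-01", "rack-server-gen10", date(2027, 6, 30)),
    Asset("file-01", "tower-server-gen7", date(2024, 12, 1)),
]

if __name__ == "__main__":
    for name, days_left in needs_replacement(INVENTORY, today=date(2025, 1, 1)):
        print(f"{name}: {days_left} days of vendor support left")
```

Flagging a window ahead of the deadline, rather than on the deadline itself, leaves time to plan bulk procurement.
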
Troubleshooting — reading logs is step 1; gathering information from the user (customer service) is step 2. The goal is to reproduce and isolate the problem before acting.

Applying Changes Safely

Admin privilege mindset — admin rights give you enormous power. Key rules: respect privacy, think before you type, minimize time spent in elevated accounts, document everything.
Session logging — record exactly what you did:
- Linux: script session.log records everything shown in the terminal to session.log, including raw ANSI escape codes; type exit to stop recording
- Windows PowerShell: Start-Transcript / Stop-Transcript — plain-text output
Rollback plan — before changing anything, know how to undo it. Document the rollback steps as part of the change record.
IT Change Management — a standardized process for planning, communicating, and implementing technical changes. A change record typically documents:
- Person / team responsible
- Priority and urgency
- Description and purpose of the change
- Scope and systems affected
- Date, time, and expected duration
- Rollback / backout plan
- Risk level and anticipated impact
- Resources and training needed
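
A change record like this maps naturally onto a small data type that refuses to pass review without the essentials. The Python sketch below is hedged. The field names and validation rules are assumptions made for illustration, not a standard change-management schema. The rules are: which text fields are mandatory, a non-empty scope, and a positive duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Free-text fields that must be filled in before a change can be approved.
REQUIRED_TEXT = ("owner", "priority", "description", "rollback_plan", "risk")


@dataclass
class ChangeRecord:
    owner: str           # person or team responsible
    priority: str        # priority and urgency
    description: str     # what is changing and why
    scope: list          # systems affected
    start: datetime      # scheduled start of the change
    duration: timedelta  # expected duration
    rollback_plan: str   # how to back the change out
    risk: str            # risk level and anticipated impact
    resources: str = ""  # resources or training needed (optional in this sketch)

    @property
    def end(self):
        """Scheduled end of the maintenance window."""
        return self.start + self.duration

    def problems(self):
        """Names of fields that make this record incomplete (empty list = OK)."""
        missing = [name for name in REQUIRED_TEXT if not getattr(self, name).strip()]
        if not self.scope:
            missing.append("scope")
        if self.duration <= timedelta(0):
            missing.append("duration")
        return missing


if __name__ == "__main__":
    change = ChangeRecord(
        owner="infra-team",
        priority="high",
        description="Apply OpenSSH security patch to web tier",
        scope=["web-01", "web-02"],
        start=datetime(2025, 1, 10, 22, 0),
        duration=timedelta(hours=1),
        rollback_plan="",
        risk="low",
    )
    print(change.problems())  # ['rollback_plan']
```
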
Production vs test environment — never test in production. Production is the live environment end users rely on. Use a test environment (usually a VM mirroring production settings) to validate changes before rolling them out.
Reproduction case — when troubleshooting, document: (1) the steps taken that led to the issue, (2) the unexpected result, (3) the expected result. Always reproduce on a test instance, not production.

Review Questions

What are the three most common server form factors?
What is a KVM switch and why is it particularly useful for a sys admin?
List the four stages of the hardware lifecycle in order.
What is the difference between production and a test environment?
Which Linux command records a terminal session to a log file?
What is the Windows PowerShell equivalent of Linux's script command?
Name four things that should be included in a change management record.
What is a reproduction case and what three things does it document?

Answers

Tower servers, rack servers, and blade servers.
A KVM switch lets one keyboard, monitor, and mouse control multiple servers — critical when the network is down and remote access isn't available.
Procurement → Deployment → Maintenance → Retirement.
Production is the live environment end users depend on. A test environment mirrors production settings but is isolated, so changes can be validated safely before going live.
script session.log records the terminal session (including raw ANSI escape codes) until you type exit.
Start-Transcript (and Stop-Transcript to end it). Output is plain text.
Any four of: responsible person/team, change priority, description and purpose, scope, affected systems, date/time/duration, rollback plan, risk level, anticipated impact, resources/training needed.
A reproduction case documents: (1) the steps taken to reach the issue, (2) the unexpected result observed, (3) the expected result. It must always be reproduced on a test instance.

2 · Network & Infrastructure Services

IT Infrastructure Overview

IT infrastructure has four main pillars: physical (hardware), network, software, and directory services.
Cloud delivery models — what the provider manages for you:
- IaaS (Infrastructure as a Service) — raw compute, storage, networking (e.g., Amazon EC2, Linode, Azure VMs)
- NaaS (Network as a Service) — networking infrastructure delivered over the cloud
- SaaS (Software as a Service) — fully managed applications (e.g., Gmail, Dropbox)
- PaaS (Platform as a Service) — runtime environment for developers (e.g., Heroku, Google App Engine)
- DaaS (Directory as a Service) — cloud-hosted directory (e.g., Azure AD, now Microsoft Entra ID; JumpCloud)

Physical Infrastructure: Servers & Remote Access

Servers usually run a server edition of an operating system, most often Linux or Windows Server, rather than a desktop OS.
Dedicated hardware — one service per physical machine. Better performance, but costly and underutilizes resources.
Virtualized instances — multiple services share one physical machine via a hypervisor. Maximizes resource usage and allows live migration during hardware maintenance.
OpenSSH — the most popular remote access tool on Linux. Requires an SSH client on your machine and an SSH server on the target.
- Install: sudo apt-get install openssh-client (on your machine) and sudo apt-get install openssh-server (on the target)
- Connect: ssh user@ipaddress

Network Services

Service	What it does	Notes
FTP	File Transfer Protocol — transfers files between machines	No encryption; primarily used to share web content
SFTP	SSH File Transfer Protocol (often called Secure FTP); a separate protocol that moves files over an encrypted SSH connection	Encrypted; use this instead of plain FTP
TFTP	Trivial FTP — simplified file transfer	No encryption, no authentication; used for PXE boot
PXE boot	Boot a machine from the network instead of local storage	Uses TFTP to deliver the OS image
NTP	Network Time Protocol — synchronizes clocks across machines	Install NTP server, connect all machines to it
Intranet	Private internal network within a company	Common in large enterprises
Proxy server	Intermediary between clients and the internet	Provides privacy, content filtering, access control
DNS server	Resolves domain names to IP addresses	Own DNS server lets you control name resolution internally
DHCP	Assigns IP addresses automatically to machines on the network	Eliminates manual IP assignment

OS Imaging & Mass Deployment

When a company is growing fast, manually setting up each new computer one-by-one is too slow. The solution is disk imaging — create one perfectly configured machine (the golden image), capture a snapshot of its entire disk, then flash that snapshot onto any number of new machines in minutes.
A golden image (also called a base image or reference image) contains:
- The operating system, fully installed and activated
- Standard company software (Office, antivirus, VPN client, etc.)
- Company-wide settings and security baselines
- Any drivers needed for your hardware fleet
Once a machine is imaged and domain-joined, it picks up its remaining computer and user settings from Group Policy (GPOs) automatically at startup and logon.

How PXE Boot Powers Mass Deployment

PXE (Preboot eXecution Environment) — a machine boots from the network instead of its local disk. It contacts a DHCP server which tells it where to find a TFTP server, then downloads a boot loader and OS image over the network. No USB drive or DVD needed.
The full flow for flashing a new computer:
1. Technician plugs the machine into the network and powers it on.
2. Machine sends a PXE broadcast — DHCP replies with the TFTP server address.
3. Machine downloads a lightweight boot loader via TFTP.
4. Boot loader pulls the golden image from the deployment server.
5. Image is written to the local disk. Machine reboots into a fully configured OS.
TFTP (Trivial FTP) is used here because it is extremely simple and requires no authentication — ideal for bare-metal machines that have no OS or credentials yet.

Deployment Tools

Tool	Platform	What it does
WDS (Windows Deployment Services)	Windows Server	Hosts and delivers Windows images over PXE; built into Windows Server
MDT (Microsoft Deployment Toolkit)	Windows	Extends WDS — automates driver injection, app installs, and domain join; uses answer files
Clonezilla	Linux / cross-platform	Open-source disk cloning tool; can push one image to many machines simultaneously
Foreman / Cobbler	Linux	Open-source provisioning servers that manage PXE, DHCP, DNS, and kickstart files in one place

Unattended Installs & Answer Files

An unattended install automates every prompt that would normally require a human to click through during OS setup — language, timezone, disk partitioning, product key, admin password, and more.
Windows — uses an XML answer file called unattend.xml (created with the Windows System Image Manager tool). MDT reads this file during deployment and answers every setup question automatically.
Linux — uses a kickstart file (Red Hat / CentOS) or a preseed file (Debian / Ubuntu). The file is referenced by the PXE boot loader and drives a fully automated install.
Combined with PXE, an unattended install means a technician can rack a machine, plug in a network cable, and walk away — the machine configures itself end-to-end.
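
On Debian/Ubuntu, a preseed file is just a list of lines of the form d-i <question> <type> <value>. The minimal Python sketch below renders one from a table of answers. The question names come from Debian's example preseed file, but the values are placeholders, and the exact set of questions varies by release. Check your release's example-preseed.txt before relying on any of them.

```python
# Each answer is (question, type, value); the installer reads them as
# "d-i <question> <type> <value>" lines. Values here are placeholders.
ANSWERS = [
    ("debian-installer/locale", "string", "en_US"),
    ("keyboard-configuration/xkb-keymap", "select", "us"),
    ("clock-setup/utc", "boolean", "true"),
    ("time/zone", "string", "US/Eastern"),
    ("partman-auto/method", "string", "regular"),
]


def render_preseed(answers, owner="d-i"):
    """Render (question, type, value) triples as preseed lines."""
    lines = []
    for question, qtype, value in answers:
        # Each answer must fit on one line of the preseed file.
        if "\n" in str(value):
            raise ValueError(f"value for {question} must be a single line")
        lines.append(f"{owner} {question} {qtype} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_preseed(ANSWERS), end="")
```

The generated file would then be served (for example over HTTP) and referenced from the PXE boot loader's kernel command line with preseed/url=.
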

Scenario — rapidly growing startup: You just received 50 identical laptops for new hires starting Monday. Instead of spending a week manually setting up each one, you:

1. Build a golden image on one laptop with Windows, company apps, and security settings.
2. Upload the image to WDS/MDT running on your deployment server.
3. PXE-boot all 50 laptops simultaneously; each pulls the image over the network.
4. Unattend.xml handles all setup prompts automatically.
5. Machines reboot, join the domain, and pull GPOs. Total hands-on time: ~30 minutes.

Managing System Services (Daemons)

A daemon is a background process (service) that runs continuously. Daemons have one or more config files that control their behavior.
Services are usually configured to start automatically at boot, so they come back on their own after every restart.
Config files for installed services live in the /etc directory on Linux.

Linux command	What it does
`service ntp status`	Check whether the NTP daemon is running
`sudo service ntp stop`	Stop the NTP daemon
`sudo service ntp restart`	Stop then start NTP (calls both stop and start)
`sudo date -s "YYYY-MM-DD HH:MM:SS"`	Manually set the system date/time (to test NTP)
`sudo service --status-all`	List all services: `[+]` active, `[-]` inactive, `[?]` unknown
`sudo apt install vsftpd`	Install the vsftpd FTP server (starts automatically)
`lftp localhost`	Connect to the local FTP server via the lftp client
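
The output of service --status-all is easy to check from a script, for example to confirm that vsftpd or ntp came up after an install. The small Python sketch below parses the [ + ] / [ - ] / [ ? ] lines. It assumes the usual SysV-style output format, which can differ slightly between distributions. The helper names are hypothetical.

```python
import re
import subprocess

# Matches lines such as " [ + ]  ntp" produced by `service --status-all`.
STATUS_LINE = re.compile(r"^\s*\[\s*([+?-])\s*\]\s+(\S+)")
MEANING = {"+": "active", "-": "inactive", "?": "unknown"}


def parse_status_all(text):
    """Map service name -> 'active' / 'inactive' / 'unknown'; other lines are ignored."""
    services = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            symbol, name = match.groups()
            services[name] = MEANING[symbol]
    return services


def current_status():
    """Run the real command (Linux with the `service` wrapper installed)."""
    result = subprocess.run(
        ["service", "--status-all"], capture_output=True, text=True, check=False
    )
    # Some implementations print the [ ? ] lines on stderr, so parse both streams.
    return parse_status_all(result.stdout + result.stderr)
```

For example, current_status().get("vsftpd") == "active" is a quick post-install check.
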

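The NTP daemon (ntpd) slews, or gradually adjusts, the clock only when the offset is under 128 ms, correcting at most 0.5 ms per second (500 ppm). Larger offsets are corrected by stepping the clock instead. Offsets beyond the 1000 s panic threshold make ntpd exit unless it was started with -g. After a large manual change like the date -s test above, either set the time manually or restart the service; Debian/Ubuntu start ntpd with -g, which allows one large initial correction.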
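
The slew rate explains why even small corrections take a while. ntpd's maximum slew rate of 500 ppm means 0.5 ms of correction per second of real time, so removing a 100 ms offset takes about 200 seconds. The short Python sketch below works through that arithmetic, using the 128 ms and 0.5 ms/s figures from the text as constants. The function name is hypothetical.

```python
STEP_THRESHOLD_S = 0.128  # offsets below this are slewed (128 ms)
MAX_SLEW_RATE = 500e-6    # 500 ppm: 0.5 ms of correction per second


def plan_correction(offset_s):
    """Return ('slew', seconds_needed) or ('too large to slew', None)."""
    magnitude = abs(offset_s)
    if magnitude < STEP_THRESHOLD_S:
        return "slew", magnitude / MAX_SLEW_RATE
    return "too large to slew", None


if __name__ == "__main__":
    for offset in (0.010, 0.100, -0.050, 5.0):
        print(offset, plan_correction(offset))
```
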

Windows PowerShell command	What it does
`Get-Service wuauserv`	Check if Windows Update service is running
`Get-Service wuauserv | Format-List *`	Show full details of the service
`Stop-Service wuauserv`	Stop a service (admin only)
`Start-Service wuauserv`	Start a service (admin only)

Windows service configuration is stored in the registry. The Services Management Console (services.msc) provides a GUI alternative to PowerShell commands.

Configuring Network Services with Dnsmasq

Dnsmasq is a lightweight package that provides DNS, DHCP, TFTP, and PXE services in a single program — ideal for small networks or lab environments.
Install: sudo apt install dnsmasq — comes with sensible defaults and starts immediately.

Command	What it does
`dig www.example.com @localhost`	Query the local DNS server to test resolution
`sudo service dnsmasq stop`	Stop dnsmasq
`sudo dnsmasq -d -q`	Run dnsmasq in debug mode, logging all queries
`sudo dnsmasq -d -q -H myhosts.txt`	Run with a custom hosts file mapping names to IPs
`sudo dnsmasq -d -q -C dhcp.conf`	Start dnsmasq with a DHCP config file
`ip address show <interface>`	Show the IP address of a network interface
`sudo dhclient -v <interface>`	Request a DHCP lease on the client side (-v prints the DHCP exchange)

Dnsmasq caches DNS responses, speeding up repeat queries.
NXDomain — the DNS response when a domain does not exist.
DHCP client and server typically run on separate machines. The client requests an IP lease with dhclient; on the server, the config file passed to dnsmasq (dhcp.conf in the table above) defines the address range and options.
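
For repeatable labs, it can help to generate the dnsmasq DHCP config rather than hand-edit it. The hedged Python sketch below writes a dhcp.conf-style file like the one passed with -C above. The interface name, address range, router, TFTP root, and boot file are hypothetical placeholders. The option names are standard dnsmasq settings: interface, bind-interfaces, dhcp-range, dhcp-option, enable-tftp, tftp-root, and dhcp-boot.

```python
import ipaddress


def render_dhcp_conf(interface, start, end, lease="12h", router=None,
                     pxe_boot_file=None, tftp_root=None):
    """Build a minimal dnsmasq DHCP config, optionally with PXE/TFTP."""
    first = ipaddress.IPv4Address(start)
    last = ipaddress.IPv4Address(end)
    if first > last:
        raise ValueError("range start must not be after range end")
    lines = [
        f"interface={interface}",  # serve DNS/DHCP only on this interface
        "bind-interfaces",
        f"dhcp-range={first},{last},{lease}",  # address pool and lease time
    ]
    if router:
        lines.append(f"dhcp-option=option:router,{ipaddress.IPv4Address(router)}")
    if pxe_boot_file:
        if not tftp_root:
            raise ValueError("PXE boot needs a tftp_root directory")
        # Built-in TFTP server plus the boot file that PXE clients request.
        lines += ["enable-tftp", f"tftp-root={tftp_root}", f"dhcp-boot={pxe_boot_file}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dhcp_conf("eth0", "192.168.1.100", "192.168.1.200",
                           router="192.168.1.1", pxe_boot_file="pxelinux.0",
                           tftp_root="/srv/tftp"), end="")
```

Save the output as dhcp.conf and start dnsmasq exactly as in the table: sudo dnsmasq -d -q -C dhcp.conf.
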

Troubleshooting Network Services

Can't resolve a domain name? Work through this sequence:
1. ping 8.8.8.8 — ping a known IP (like Google's DNS) to confirm basic connectivity
2. nslookup <domain> — check DNS resolution; returns the name server and IP for the domain
3. Compare the returned IP against what you expect — a mismatch points to a DNS misconfiguration
If ping to an IP succeeds but DNS resolution fails, the problem is DNS-specific, not network-wide.
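
The same triage can be scripted. In the minimal Python sketch below, diagnose() encodes the decision rule from the steps above, and resolve() uses the standard socket.getaddrinfo call. ping_ok() shells out to ping with Linux (iputils) flags, which differ on macOS and Windows. All three function names are hypothetical.

```python
import socket
import subprocess


def ping_ok(ip, timeout_s=2):
    """True if one ICMP echo to `ip` succeeds (Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False,
    )
    return result.returncode == 0


def resolve(name):
    """IPv4 addresses `name` resolves to, or an empty set if resolution fails."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def diagnose(ip_reachable, name_resolves):
    """Apply the rule: IP works but names fail -> DNS-specific problem."""
    if not ip_reachable:
        return "network: no basic connectivity; check link, IP config, gateway"
    if not name_resolves:
        return "dns: IP connectivity works but resolution fails; check the DNS server"
    return "ok: connectivity and DNS both work; look at the service itself"
```

Typical use: diagnose(ping_ok("8.8.8.8"), bool(resolve("google.com"))). A "dns" result is the cue to run nslookup and compare the answer with the address you expect.
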

Review Questions

What does IaaS stand for and give two real-world examples?
What is the key difference between FTP and SFTP?
Why would you use TFTP instead of SFTP?
What is a daemon?
Where are service config files stored on Linux?
Within what clock-offset range will the NTP daemon gradually slew (auto-correct) the system time?
What does Dnsmasq provide in a single package?
What command tests DNS resolution against a local dnsmasq instance?
You can ping Google's IP but can't reach google.com by name. What type of problem is this and what command helps diagnose it?
What is the difference between a dedicated server and a virtualized instance?

Answers

Infrastructure as a Service — raw compute, storage, and networking managed by a provider. Examples: Amazon EC2, Azure Virtual Machines, Linode.
FTP transmits data, including passwords, in plain text. SFTP is a separate protocol that runs over SSH, so the whole session, credentials included, is encrypted.
TFTP is used for PXE booting — its simplicity (no auth, no encryption) makes it lightweight enough to deliver OS boot images over the network before an OS is loaded.
A daemon is a background process that runs continuously, usually starting at boot. It has config files an admin uses to control its behavior.
The /etc directory.
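Offsets under 128 ms are slewed gradually (at most 0.5 ms per second). Larger offsets are stepped instead, and offsets beyond the 1000 s panic threshold make ntpd exit unless it was started with -g, so after a big jump set the time manually or restart the daemon.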
DNS, DHCP, TFTP, and PXE services.
dig www.example.com @localhost
It's a DNS problem, not a network problem. Use nslookup google.com to check what the DNS server returns.
A dedicated server runs one service on one physical machine (better performance, higher cost, underutilized resources). A virtualized instance runs multiple services on one machine via a hypervisor (efficient, flexible, supports live migration).

3 · Software & Platform Services

Communication Services

IRC (Internet Relay Chat) — an open chat protocol with free, open-source server and client software; a self-hosted alternative to Slack or Teams for internal team communication.
XMPP (Extensible Messaging and Presence Protocol) — popular open standard for instant messaging. Used under the hood by many chat systems.
For email you can either run your own mail server or pay an email service provider (Google Workspace, Microsoft 365). Self-hosting gives full control; a provider reduces operational burden.
DNS records for email:
- A record — maps a hostname to an IP address
- MX record — Mail Exchange; tells the internet which server accepts email for a domain

Protocol	Direction	Key behaviour
SMTP	Sending only	The only protocol for sending email — client → server and server → server
POP3	Receiving	Downloads email to the local device and, by default, deletes it from the server; private/offline model
IMAP	Receiving	Downloads to device but keeps a copy on the server — synced across multiple devices

Spam mitigation — three DNS-based mechanisms that verify the sender is who they claim to be:
- SPF (Sender Policy Framework) — lists which IP addresses are authorized to send email for your domain
- DKIM (DomainKeys Identified Mail) — cryptographically signs outgoing messages so receivers can verify they weren't tampered with
- DMARC (Domain-Based Message Authentication, Reporting & Conformance) — ties SPF and DKIM together and tells receivers what to do with mail that fails (quarantine, reject, or do nothing)
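
All three mechanisms are published as ordinary DNS TXT records, for example v=spf1 ip4:192.0.2.0/24 -all for SPF and v=DMARC1; p=quarantine for DMARC. The deliberately small Python sketch below shows how a receiving server reads them. It evaluates only the ip4, ip6, and all terms of SPF, with no include, a, mx, or DNS lookups, and it only parses DMARC tags. DKIM is skipped entirely because it needs the message itself and signature cryptography. Treat it as an illustration of the logic in RFC 7208 (SPF) and RFC 7489 (DMARC), not as a mail filter.

```python
import ipaddress

SPF_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check(record, sender_ip):
    """Evaluate an SPF record's ip4/ip6/all terms against a sender IP.

    Toy subset: include, a, mx, redirect and DNS lookups are not handled;
    unknown terms are skipped.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in SPF_RESULTS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")  # ip6 values keep their colons
        mechanism = mechanism.lower()
        if mechanism in ("ip4", "ip6") and ip in ipaddress.ip_network(value, strict=False):
            return SPF_RESULTS[qualifier]
        if mechanism == "all":
            return SPF_RESULTS[qualifier]
    return "neutral"  # nothing matched


def parse_dmarc(record):
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags
```

The p tag from parse_dmarc() is the policy the text describes. It can be none, quarantine, or reject, and it applies to mail that fails SPF/DKIM alignment.
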
User Productivity Services — business licensing for productivity suites (Office 365, Google Workspace) is separate from personal licensing. Enterprise versions include admin controls, compliance tooling, and user management.

Security Services — HTTPS & TLS

Any service you run must guarantee to users that their information is handled securely — especially anything accepting credentials or personal data.
HTTPS — secure version of HTTP. Encrypts all communication between the browser and the web server so data cannot be read or tampered with in transit.
TLS (Transport Layer Security) — the cryptographic protocol that powers HTTPS. It is the current standard for securing network communication.
SSL (Secure Sockets Layer) — the predecessor to TLS; TLS 1.0 was a lightly revised SSL 3.0. All SSL versions are deprecated, as are TLS 1.0 and 1.1, so new systems should use TLS 1.2 or 1.3.
To enable HTTPS you need a TLS certificate issued by a trusted Certificate Authority (CA). The CA vouches that the certificate belongs to the domain it claims to represent. Let's Encrypt is a free, automated CA widely used for this purpose.
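
On the client side, trusting a CA-issued certificate means two things: verifying the server's certificate against the system's trusted CAs, and refusing obsolete protocol versions. The hedged Python sketch below uses the standard ssl module to do both. The TLS 1.2 floor is a common policy choice, not something the text prescribes. peer_certificate() is a hypothetical helper that needs network access to a real HTTPS host.

```python
import socket
import ssl


def make_client_context():
    """TLS client context: system CA bundle, cert + hostname checks, TLS 1.2+."""
    ctx = ssl.create_default_context()  # sets CERT_REQUIRED and check_hostname=True
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # never negotiate SSL or TLS 1.0/1.1
    return ctx


def peer_certificate(host, port=443, timeout=5.0):
    """Connect, complete the handshake, and return (tls_version, certificate dict).

    Raises ssl.SSLCertVerificationError if the certificate is not signed by a
    trusted CA or does not match `host`.
    """
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.getpeercert()
```

After version, cert = peer_certificate("example.com"), the cert["issuer"] and cert["notAfter"] entries show which CA vouched for the site and when the certificate expires.
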

File Services

Choosing the right file system or sharing protocol depends on who needs to access the files and from what OS.

Technology	What it is	Best for
FAT32	Old filesystem format	Cross-platform USB drives; limited by 4 GB max file size and 8 TB volume size
NFS (Network File System)	Protocol for sharing files over a network	Linux/Unix environments; fast and native on Linux, slow on Windows
Samba / SMB	Open-source implementation of the SMB protocol	Windows-heavy or mixed environments; integrates with Active Directory
NAS (Network Attached Storage)	Dedicated storage device with a stripped-down OS	Simple, high-capacity shared file storage for a team or office

On Linux, an NFS client mounts a remote export using host:/path syntax, e.g. sudo mount host:/nfs /mnt/shared.
Mobile device file sync — always sync mobile data to a cloud server or local IT infrastructure. If the device is lost, data is not lost with it.

Print Services

Large organizations need a print server to manage a fleet of printers centrally — handling queues, drivers, and permissions from one place.
CUPS (Common Unix Printing System) — the standard print server on Linux and macOS. Most modern OSes ship with a print server included.
When provisioning new computers, have printer drivers and default settings (orientation, quality, tray, duplex) pre-staged so users can print immediately.
Cloud print management services let both users and admins manage printers through a web interface — useful for distributed offices.

Language	Creator	Device-dependent?	Notes
PCL (Printer Control Language)	HP	Yes	Optimized for speed on HP hardware; output may differ across printers
PostScript (PS)	Adobe	No	Device-independent; output is identical on any PS-compatible printer — preferred for graphics/publishing

Platform Services — Web Servers & Databases

Platform services give developers an environment to build and deploy software without managing the underlying hardware or OS — this is the PaaS model.
A web server is a program (or the machine running it) that listens for HTTP requests and responds with the requested files or data.

Web Server	Model	Notes
Apache HTTP Server	Process- or thread-based, chosen via its MPM (prefork, worker, or the event-driven event MPM)	Most widely deployed; after install, the default page is reachable at http://localhost; extensive module ecosystem
NGINX	Asynchronous, non-blocking	Handles high concurrency efficiently; also used as a reverse proxy and load balancer
Microsoft IIS	Windows-native	Tight Windows integration; preferred in Microsoft-stack environments

Load balancer — sits in front of multiple servers and spreads incoming traffic across them using a strategy such as round robin or least connections. Can be hardware or software. Prevents any single server from being overwhelmed and enables horizontal scaling.
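
Round robin is the simplest way a load balancer spreads requests. Each new request goes to the next server in the list, and any server that failed a health check is skipped. The minimal Python sketch below uses hypothetical backend names. Production balancers such as NGINX or cloud load balancers also offer weighted and least-connections strategies.

```python
class RoundRobinBalancer:
    """Hand out backends in turn, skipping ones marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        if not self.backends:
            raise ValueError("need at least one backend")
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        self.healthy.discard(backend)  # e.g. after a failed health check

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

With three backends, four picks return web-1, web-2, web-3, web-1. If web-2 is then marked down, the rotation continues over only the remaining two.
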
Databases — store, retrieve, and manage structured data. The two most popular open-source relational databases are MySQL and PostgreSQL.

Troubleshooting Platform Services — HTTP Status Codes

When something goes wrong with a web service, HTTP status codes are your first diagnostic signal — they tell you immediately whether the problem is on the client side or the server side.

Range	Category	Common examples
1xx	Informational	100 Continue — server received the request headers, client should proceed
2xx	Success	200 OK — request succeeded and response contains the result
3xx	Redirection	301 Moved Permanently, 302 Found (temporary redirect)
4xx	Client error	400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found
5xx	Server error	500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable

4xx = the client sent a bad request. 5xx = the server failed to fulfill a valid request. Start your investigation on whichever side the code points to.
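
The first digit is all you need to decide which side to investigate. The Python sketch below classifies a code and looks up its standard reason phrase with the standard library's http.HTTPStatus. The status_of() helper is a hypothetical convenience built on urllib. urllib follows redirects, so it reports the final status rather than an intermediate 3xx.

```python
import urllib.error
import urllib.request
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def classify(code):
    """Return (category, reason phrase) for an HTTP status code."""
    category = CATEGORIES.get(code // 100)
    if category is None:
        raise ValueError(f"{code} is not a valid HTTP status code")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # inside a valid range but not a registered code
        phrase = "(unregistered code)"
    return category, phrase


def status_of(url, timeout=5):
    """Fetch `url` and return its final status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:  # raised for 4xx/5xx responses
        return err.code
```

For the 403 case in the review questions, classify(403) returns ("client error", "Forbidden").
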

Managing Cloud Services

SaaS — the software is fully pre-configured; you choose from a small set of admin options. You don't manage the underlying infrastructure at all.
IaaS — you configure and manage your services on rented hardware. Start small; scale as you understand your actual usage patterns.
Cloud deployment models:
- Public — infrastructure owned and operated by a provider, shared across many customers (AWS, GCP, Azure)
- Private — cloud infrastructure run entirely by your own company; more control, higher cost
- Hybrid — a mix of public and private; sensitive workloads stay on-premises, burst capacity moves to public cloud
Regions — cloud providers organize their data centers into geographic regions. Choose regions close to your users to minimize latency; use multiple regions for redundancy.
Load balancer — in a cloud context, ensures each VM instance receives a balanced share of queries as traffic scales up.
Autoscaling — automatically increases or reduces VM capacity in response to actual demand. Ideal for workloads with variable traffic (seasonal spikes, marketing events).
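
Most autoscalers follow some form of target tracking. They pick a target utilization and resize the group so that average utilization moves back toward it. The Python sketch below implements the proportional rule ceil(current × utilization / target), clamped to a minimum and maximum. A similar formula is used by Kubernetes' Horizontal Pod Autoscaler. Real cloud autoscalers also apply cooldowns and stabilization windows, which are left out here. The function name and parameters are hypothetical.

```python
def desired_instances(current, utilization_pct, target_pct, minimum=1, maximum=10):
    """Proportional target-tracking rule: ceil(current * utilization / target), clamped."""
    if current < 1 or target_pct <= 0:
        raise ValueError("need at least one instance and a positive target")
    # Integer ceiling division avoids floating-point surprises (6.0000001 -> 7).
    desired = -(-current * utilization_pct // target_pct)
    return max(minimum, min(maximum, desired))
```

For example, 4 instances running at 90% against a 60% target scale out to 6. The same 4 instances at 30% scale in to 2.
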
Cloud infrastructure avoids large upfront capital investment — you pay for what you use. This makes it especially cost-effective when demand varies significantly throughout the year.

Review Questions

What is the only email protocol used for sending email?
What is the key difference between POP3 and IMAP?
Name the three DNS-based spam mitigation mechanisms and describe what each one does.
What is the difference between SSL and TLS?
What do you need to enable HTTPS on a web server?
When would you choose Samba over NFS for file sharing?
What is a NAS device?
What is the difference between PCL and PostScript?
What does CUPS stand for and what OS is it used on?
A user reports they're getting a 403 when visiting your internal web app. Is this a client-side or server-side error? What does it mean?
What is the difference between a Public, Private, and Hybrid cloud?
What does autoscaling do and when is it most useful?
Which web server is best suited for high-concurrency async workloads?

Answers

SMTP (Simple Mail Transfer Protocol).
POP3 downloads email to your local device and by default removes it from the server, so mail lives in one place (a private, offline model). IMAP downloads email but keeps a copy on the server, so your inbox stays in sync across multiple devices.
SPF — lists which IP addresses are allowed to send email for a domain. DKIM — cryptographically signs outgoing messages so receivers can verify they weren't modified. DMARC — ties SPF and DKIM together and tells receivers what to do with mail that fails verification (quarantine, reject, or allow).
SSL (Secure Sockets Layer) is the deprecated predecessor; TLS (Transport Layer Security) is the current standard, and TLS 1.0 was a lightly revised SSL 3.0. Never use SSL for new systems; prefer TLS 1.2 or 1.3.
A TLS certificate issued by a trusted Certificate Authority (CA). The CA vouches that the certificate legitimately belongs to your domain.
When the environment is Windows-heavy or mixed Windows/Linux. Samba (SMB) integrates natively with Windows and Active Directory. NFS is fast on Linux but slow on Windows.
A Network Attached Storage device is a dedicated storage appliance running a stripped-down OS designed solely for file delivery — essentially a plug-and-play shared file server.
PCL (HP) is device-dependent — output can vary across different printers. PostScript (Adobe) is device-independent — output looks identical on any PS-compatible printer, making it preferred for graphics and publishing work.
Common Unix Printing System, the standard print server on Linux and other Unix-like systems, including macOS.
Client-side error (4xx range). 403 Forbidden means the server understood the request but is refusing to fulfill it — typically a permissions problem. Check whether the user's account has access to that resource.
Public — infrastructure shared across customers and operated by a provider. Private — infrastructure owned and operated entirely by your own company. Hybrid — a mix of both, where sensitive workloads stay on-premises and burst capacity or less-sensitive services use the public cloud.
Autoscaling automatically adds or removes compute instances in response to actual demand. It's most useful for workloads with variable traffic — seasonal spikes, flash sales, or marketing campaigns — so you only pay for capacity when you need it.
NGINX — its asynchronous, non-blocking architecture handles large numbers of concurrent connections efficiently.

4 · Directory Services

Introduction to Directory Services

A directory server is a lookup service that maps network resources (users, groups, devices, phone numbers) to their network addresses — like a phone book for your IT infrastructure.
Replication — directory data is copied across multiple physically distributed servers but still appears as one unified system for queries. Benefits:
- No single point of failure — if one replica goes down, others serve requests
- Reduced latency — clients query the geographically nearest replica
Data is organized hierarchically like a filesystem: company domain → department OUs → individual objects. Organizational Units (OUs) can contain objects or other OUs.
The sysadmin owns setup, configuration, and ongoing maintenance of the directory.
Standard directory protocols come from the X.500 family: DAP, DSP, DISP, DOP. These are complex and heavyweight.
LDAP (Lightweight Directory Access Protocol) is the simplified, practical alternative used everywhere today. It gives clients network access to directory data.
Two dominant implementations: Microsoft Active Directory and OpenLDAP (open-source).

Centralized Management & AAA

Centralized management — a single service that issues instructions and enforces policy across the entire IT infrastructure. Without it, each machine is an island with its own local accounts.
Directory services provide centralized AAA:
- Authentication — prove who you are
- Authorization — determine what you're allowed to do
- Accounting — log what you actually did
RBAC (Role-Based Access Control) — permissions are assigned to roles, not individuals. A user's access is determined by their role in the org.
When someone changes roles, leaves, or joins — update their group membership. No need to touch individual machine accounts.
Centralized policy can be as simple as logon scripts that map drives or printers when a user signs in.
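
RBAC keeps permission checks simple because they go through roles, never through individual users. The minimal Python sketch below shows this. The role names, permission strings, and the RoleDirectory class are all hypothetical. In a real directory, roles would be groups and permissions would be ACLs or policies.

```python
# Hypothetical roles and permissions for illustration.
ROLE_PERMISSIONS = {
    "sales": {"crm:read", "crm:write", "printer:floor2"},
    "engineering": {"git:read", "git:write", "build:run"},
    "helpdesk": {"directory:reset-password", "crm:read"},
}


class RoleDirectory:
    def __init__(self, role_permissions):
        self.role_permissions = role_permissions
        self.user_roles = {}

    def assign(self, user, *roles):
        """Set a user's roles (new joiner or role change)."""
        unknown = set(roles) - set(self.role_permissions)
        if unknown:
            raise KeyError(f"unknown roles: {sorted(unknown)}")
        self.user_roles[user] = set(roles)

    def remove(self, user):
        """Leaver: dropping the user's roles revokes every permission at once."""
        self.user_roles.pop(user, None)

    def permissions(self, user):
        perms = set()
        for role in self.user_roles.get(user, ()):
            perms |= self.role_permissions[role]
        return perms

    def allowed(self, user, permission):
        return permission in self.permissions(user)
```

Moving someone from sales to engineering is a single assign() call. No per-machine account needs to change.
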

LDAP — Protocol & Notation

LDAP is the wire protocol used to read and write data in a directory service. Both Active Directory and OpenLDAP speak LDAP.
Every directory entry has a distinguished name (DN) — a unique, comma-separated path that identifies its location in the hierarchy:

Abbreviation	Meaning	Example
`dn`	Distinguished Name — full path	`dn: uid=jane,ou=sales,dc=example,dc=com`
`cn`	Common Name — object's name	`cn=Jane Smith`
`ou`	Organizational Unit	`ou=engineering`
`dc`	Domain Component	`dc=example,dc=com`
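
A DN reads from the most specific component on the left to the domain on the right. The simplified Python sketch below splits a DN into its components, rebuilds the domain name from the dc parts, and finds the containing entry. It keeps backslash-escaped commas inside values and does not split multi-valued RDNs joined with +. For production code, prefer your LDAP library's own DN parser. The function names are hypothetical.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, leftmost RDN first.

    Backslash-escaped commas (e.g. cn=Smith\\, Jane) stay inside the value,
    escapes included. Multi-valued RDNs (joined with '+') are not split.
    """
    rdns, current, escaped = [], [], False
    for ch in dn:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ",":
            rdns.append("".join(current))
            current = []
        else:
            current.append(ch)
    rdns.append("".join(current))
    pairs = []
    for rdn in rdns:
        attr, sep, value = rdn.partition("=")
        if not sep:
            raise ValueError(f"malformed RDN: {rdn!r}")
        pairs.append((attr.strip().lower(), value.strip()))
    return pairs


def domain_of(dn):
    """Join the dc= components, e.g. dc=example,dc=com -> example.com."""
    return ".".join(value for attr, value in parse_dn(dn) if attr == "dc")


def parent_of(dn):
    """Drop the leftmost RDN: the container the entry lives in."""
    return ",".join(f"{attr}={value}" for attr, value in parse_dn(dn)[1:])
```

For dn: uid=jane,ou=sales,dc=example,dc=com, the domain is example.com and the parent container is ou=sales,dc=example,dc=com.
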

Common LDAP operations: add entry, modify entry, delete entry, search.
The Bind operation authenticates a client to the directory server before it can perform operations. Three authentication modes:
- Anonymous — no credentials; read-only public data
- Simple — username and password in plain text (use only over TLS)
- SASL (Simple Authentication and Security Layer) — pluggable auth; typically uses TLS and supports Kerberos
Kerberos — a network authentication protocol used to prove identity without sending passwords over the wire. Widely used with Active Directory.
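
After a successful Bind, most lookups are Search operations driven by a filter string such as (&(objectClass=person)(uid=jane)). Any user input placed in that filter must be escaped. Otherwise a value like *)(uid=* could widen the search, which is known as LDAP injection. The Python sketch below applies the escapes required by RFC 4515. The attribute names are assumptions: uid is common in OpenLDAP, while Active Directory typically uses sAMAccountName. Many LDAP client libraries ship an equivalent helper, and using one of those is preferable when available.

```python
# Characters RFC 4515 requires to be escaped inside a filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value):
    """Escape user input before embedding it in an LDAP search filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def user_search_filter(username, attribute="uid"):
    """Filter for a person entry by login name (attribute name is an assumption)."""
    return f"(&(objectClass=person)({attribute}={escape_filter_value(username)}))"
```
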

Active Directory (AD)

Active Directory is Microsoft's native directory service for Windows environments. It speaks LDAP, so it can interact with non-Windows hosts as well.
ADAC (Active Directory Administrative Center) is the primary GUI for managing AD objects.
Key structural concepts:
- Domain — the basic administrative boundary; a collection of computers, users, and resources
- Forest — one or more domains that share a schema and trust each other
- Organizational Unit (OU) — a folder-like container inside a domain; used to organize objects and delegate admin control
- Domain Controller (DC) — the server running AD; it authenticates logins and enforces policy
- FSMO (Flexible Single Master Operation) — certain AD tasks require one authoritative DC to avoid conflicts (PDC Emulator, RID Master, etc.)
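- SAM (Security Account Manager) — the local account database (usernames and password hashes) on each Windows machine; on a DC, domain accounts live in the AD database (NTDS.dit) instead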

Built-in Group	Scope	Notes
Domain Admins	Domain	Full control of the domain — not for daily use
Enterprise Admins	Forest	Full control across all domains in the forest
Domain Users	Domain	All standard user accounts
Domain Computers	Domain	All computers joined to the domain
Domain Controllers	Domain	All DCs in the domain

Never use Domain Admin or Enterprise Admin accounts for day-to-day work — use a standard account and elevate only when necessary.

AD has three group scopes — choose based on what you're grouping and where it's used:
- Domain Local — assign permissions to a specific resource (e.g., a shared folder)
- Global — group accounts by role within a domain (e.g., "RnD" global group contains "Researchers" sub-group)
- Universal — group global roles across an entire forest
Security groups can contain user accounts, computer accounts, or other security groups — used to grant access to resources.
Distribution groups are email-only — they cannot be used to assign permissions.
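
Because security groups can nest, a user's effective access depends on every group reachable through membership. This matches the common AGDLP pattern: accounts go into global groups, global groups go into domain local groups, and the domain local groups receive the permissions. The Python sketch below computes effective membership with a breadth-first walk. It is safe against membership cycles. The group and user names are hypothetical and loosely follow the RnD/Researchers example above.

```python
from collections import deque

# Hypothetical membership: group -> direct members (users or other groups).
MEMBERS = {
    "Research-Share-RW": {"RnD"},         # domain local: holds the folder permission
    "RnD": {"Researchers", "alice"},      # global: role group
    "Researchers": {"bob", "carol"},      # global: nested sub-group
}


def effective_groups(principal, members):
    """All groups `principal` belongs to, directly or through nesting."""
    # Invert the mapping: member -> groups that directly contain it.
    parents = {}
    for group, direct in members.items():
        for member in direct:
            parents.setdefault(member, set()).add(group)
    found, queue = set(), deque([principal])
    while queue:
        current = queue.popleft()
        for group in parents.get(current, ()):
            if group not in found:  # the visited set also stops cycles
                found.add(group)
                queue.append(group)
    return found
```

Here bob never appears in Research-Share-RW directly. He still gets its access through Researchers → RnD.
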
Admins should never know a user's password. Only reset when you've absolutely verified the request is legitimate. One person, one authenticator.
Tip: ADAC shells out to PowerShell. Watch the command history pane to see the exact commands it runs — then script those for bulk operations.

PowerShell command	What it does
`Add-Computer -DomainName 'example.com' -Server 'dc1'`	Join a computer to the domain via command line
`Get-ADForest`	View forest info including functional level
`Get-ADDomain`	View domain info including functional level (year/version)

Group Policy Objects (GPOs)

A GPO is a set of policies and preferences that can be applied to a group of AD objects (users, computers, or both).
GPOs must be linked to a domain, site, or OU to take effect — creating a GPO alone does nothing.
Each GPO has two sections:
- Computer Configuration — applied when the machine boots
- User Configuration — applied when the user logs on
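Policies — settings that are re-applied at every background refresh (by default about every 90 minutes plus a random offset of up to 30 minutes, or every 5 minutes on DCs); users cannot override them.
Preferences — settings that act as templates; users can change them after they are applied, although by default a preference is reapplied at the next refresh unless it is set to apply only once.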
GPO data is stored in the SYSVOL folder, which is replicated to all DCs.
The Windows Registry is the hierarchical database where GPO policy settings ultimately land on the client machine.

Before editing any GPO, back it up first. If something breaks, restoring from a backup is much faster than manually reversing settings. Always test changes in a non-production OU first.

GPO Processing Order (LSDOU)

GPOs are applied in this order — later wins when there are conflicts:

1. Local GPO — settings stored directly on the machine
2. Site-linked GPOs
3. Domain-linked GPOs
4. OU-linked GPOs — most specific container is applied last (and wins)

Within an OU (or any other container), each linked GPO has a link order. The GPO with the highest link-order number is applied first and has the lowest precedence; link order 1 is applied last and wins.
An upstream GPO can be set to Enforced to prevent child OUs from overriding it.
An OU can be set to Block Inheritance to ignore GPOs from parent containers (unless a parent GPO is Enforced).
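The processing rules above can be modelled in a few lines. This is a simplified Python sketch, not how the Windows Group Policy client is implemented: each GPO is a dictionary of settings (names and values are hypothetical), later GPOs overwrite earlier ones, and it ignores security filtering, WMI filters, loopback processing, and the separate Computer and User sections. Enforced GPOs are applied after all others, with the one linked highest in the hierarchy applied last, so it wins and cannot be blocked.

```python
from dataclasses import dataclass


@dataclass
class GPO:
    name: str
    settings: dict
    enforced: bool = False


@dataclass
class Container:
    """A site, domain, or OU with its linked GPOs."""
    name: str
    links: list  # index 0 is link order 1 (highest precedence within this container)
    block_inheritance: bool = False


def resultant_settings(local_gpo: GPO, containers: list) -> tuple[dict, list]:
    """containers must be ordered site, domain, then OUs from the top down to the target's OU."""
    order = [local_gpo]  # L: processed first, so it has the lowest precedence
    for i, container in enumerate(containers):
        # Block Inheritance on any container below this one hides its non-enforced GPOs.
        blocked = any(below.block_inheritance for below in containers[i + 1:])
        # Highest link-order number first, so link order 1 is applied last and wins.
        for gpo in reversed(container.links):
            if not gpo.enforced and not blocked:
                order.append(gpo)
    # Enforced GPOs go last and ignore Block Inheritance; walking from the most
    # specific container upward means the highest-linked enforced GPO is applied last.
    for container in reversed(containers):
        for gpo in reversed(container.links):
            if gpo.enforced:
                order.append(gpo)
    result = {}
    for gpo in order:
        result.update(gpo.settings)  # later GPOs overwrite conflicting settings
    return result, [gpo.name for gpo in order]
```

With a hypothetical Sales OU that links a GPO allowing USB storage and a domain-level Enforced baseline that blocks it, the baseline wins. Setting `block_inheritance` on the Sales OU drops the non-enforced site and domain GPOs but still applies the enforced one.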

GPO Tooling & Troubleshooting

Tool / Command	What it does
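`gpupdate /force /sync`	Reapply all GPO settings on the client immediately (not just changed ones); `/sync` also makes the next boot or logon process policy synchronously, which some settings require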
`gpresult /R`	Summary of which GPOs applied or were denied
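`gpresult /z`	Super-verbose RSOP output (`/v` gives standard verbose output)
`gpresult /s <host> /user <user> /r`	Remote RSOP summary for a specific machine and user (`/u` and `/p` only supply the credentials to connect with)
`gpedit.msc`	Local Group Policy editor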
GPMC	Group Policy Management Console — full forest-wide GPO view
RSOP report	Resultant Set of Policy — the combined effect of all applied GPOs
Group Policy Modeling	Predicts which policies will apply before you make changes
`w32tm /resync`	Force time resync (fix Kerberos auth failures caused by clock skew)
`Resolve-DnsName -Type SRV -Name _ldap._tcp.dc._msdcs.DOMAIN.NAME`	Verify DC SRV records in DNS (key AD health check)

Kerberos authentication fails if the client clock differs from the DC by more than 5 minutes (the default tolerance). If users suddenly can't log in after a power event, check the clock first and resync it with w32tm /resync.
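A minimal Python sketch of the comparison behind that rule, assuming the default 5-minute tolerance and timezone-aware timestamps. The function name is hypothetical, and in a real domain the KDC performs this check itself. Skew counts in either direction, so a client that is 6 minutes fast fails just like one that is 6 minutes slow.

```python
from datetime import datetime, timedelta, timezone

# Default maximum clock skew tolerated by Kerberos in an AD domain.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_tolerance(client_time: datetime, dc_time: datetime,
                              max_skew: timedelta = MAX_SKEW) -> bool:
    """True if the client clock is close enough to the DC clock for Kerberos to work."""
    return abs(client_time - dc_time) <= max_skew


dc_now = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(within_kerberos_tolerance(dc_now + timedelta(minutes=4), dc_now))  # True
print(within_kerberos_tolerance(dc_now - timedelta(minutes=6), dc_now))  # False
```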

GPO Troubleshooting Checklist

Check the GPO scope — is it linked to the right OU/domain?
Check security filtering — does the target user/computer have Read and Apply Group Policy permissions on the GPO?
Check Group Policy delegation settings
Is the relevant config section (Computer or User) enabled?
Confirm the LSDOU processing order — is a higher-priority GPO overriding?
Ensure the GPO link is enabled (not just created)
Is an upstream GPO set to Enforced, blocking your override?
Is the affected OU set to Block Inheritance?
Is Loopback processing enabled? (changes how User Config applies to shared machines)
Check WMI filters — an unexpected filter may exclude the target
Verify your expectations match the GPO setting's actual behavior

MDM (Mobile Device Management) is the equivalent of GPOs for mobile devices and cloud-enrolled devices. It can enforce policies and trigger a remote wipe if a device is lost or stolen.

OpenLDAP

OpenLDAP is the most popular open-source LDAP implementation. It runs on most operating systems and operates similarly to Active Directory's LDAP layer.
Two management interfaces:
- phpLDAPadmin — web-based GUI; easiest for browsing and editing entries
- CLI tools — scriptable; required for automation

Command	What it does
`sudo apt-get install slapd ldap-utils`	Install the OpenLDAP server (`slapd`) and CLI utilities
`sudo dpkg-reconfigure slapd`	Run the interactive setup wizard (domain, admin password, etc.)
`ldapadd`	Add an entry from an LDIF file
`ldapmodify`	Modify an existing entry using an LDIF file
`ldapdelete`	Delete an entry by DN
`ldapsearch`	Search the directory and return matching entries

LDIF (LDAP Data Interchange Format) — plain-text files that describe directory entries or changes. The workflow is: write an LDIF file → run the appropriate ldap* command. That's it.

Sample LDIF — adding a user entry:

dn: uid=jane,ou=people,dc=example,dc=com
objectClass: inetOrgPerson
cn: Jane Smith
sn: Smith
uid: jane
mail: jane@example.com
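Writing LDIF by hand gets tedious when onboarding several people at once, so it is often generated. A minimal Python sketch that produces add records in the same shape as the sample above; the function name and the `ou=people,dc=example,dc=com` base are assumptions carried over from the example. It does not handle values that LDIF requires to be base64-encoded (for example, values starting with a space or containing non-ASCII characters).

```python
def user_ldif(uid: str, given_name: str, surname: str, mail: str,
              base_dn: str = "ou=people,dc=example,dc=com") -> str:
    """Return one LDIF add record for an inetOrgPerson user (plain ASCII values only)."""
    lines = [
        f"dn: uid={uid},{base_dn}",
        "objectClass: inetOrgPerson",
        f"cn: {given_name} {surname}",  # inetOrgPerson requires cn and sn
        f"sn: {surname}",
        f"uid: {uid}",
        f"mail: {mail}",
    ]
    return "\n".join(lines) + "\n"


# Records in one LDIF file are separated by a blank line.
new_hires = [("jane", "Jane", "Smith", "jane@example.com"),
             ("raj", "Raj", "Patel", "raj@example.com")]
print("\n".join(user_ldif(*hire) for hire in new_hires))
```

The output could be saved as a file such as `jane.ldif` and loaded with `ldapadd -x -D cn=admin,dc=example,dc=com -W -f jane.ldif`, assuming that admin DN was created during `dpkg-reconfigure slapd` and that `ou=people` already exists.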

What is a directory server and what does it map?
What two benefits does directory replication provide?
What does AAA stand for in the context of centralized management?
What does the LDAP Bind operation do?
Name the three LDAP authentication modes and describe each briefly.
What is the difference between a Security Group and a Distribution Group in Active Directory?
What are the three AD group scopes and when would you use each?
What is LSDOU and why does it matter?
A GPO exists and is linked but isn't applying to a user. Name three things you'd check first.
What clock-skew limit causes Kerberos authentication to fail, and what command fixes it?
What is an LDIF file and how is it used with OpenLDAP?
What is the difference between a GPO Policy setting and a GPO Preference setting?

A directory server is a lookup service that maps network resources (users, groups, devices) to their network addresses — like a searchable phone book for IT infrastructure.
Replication eliminates single points of failure (if one server goes down, replicas keep serving) and reduces latency (clients query the nearest replica).
Authentication (prove identity), Authorization (determine what's allowed), Accounting (log what was done).
The Bind operation authenticates the client to the directory server, establishing the identity the client will act under for subsequent operations.
Anonymous — no credentials, read-only public access; Simple — username and password (plaintext, use only over TLS); SASL — pluggable auth layer supporting Kerberos and TLS for secure authentication.
Security Groups can be granted permissions to resources. Distribution Groups are email-only — they cannot be used to assign access rights.
Domain Local — assign permissions to a specific resource; Global — group accounts into a role within a domain; Universal — group global roles across an entire forest.
LSDOU is the GPO processing order: Local → Site → Domain → OU. Later settings override earlier ones when there are conflicts, so the most specific (OU-linked) GPO wins.
Check the GPO link is enabled; check security filtering (Read + Apply permissions); check whether an upstream GPO is set to Enforced and overriding it.
More than 5 minutes of clock skew causes Kerberos to fail. Fix it with w32tm /resync.
An LDIF (LDAP Data Interchange Format) file is a plain-text file that describes a directory entry or a change to one. You write the LDIF, then pass it to a command like ldapadd or ldapmodify.
Policies are enforced settings re-applied at every background refresh (about every 90 minutes by default); users cannot override them. Preferences are template defaults; users can change them after the GPO applies them.

▶

5 · Data Recovery & Backups

Complete

Planning for Data Recovery

Data recovery — the process of restoring data after an unexpected event involving data loss or corruption. The main objective is to resume normal operations as soon as possible.
The most important technique is backing up regularly. Restoring from backup is always the best solution when one exists.
Backup systems should prioritize important/essential data and account for future growth.
On-site backups — data is physically nearby, low bandwidth to access. Risk: a local disaster destroys all copies.
Off-site backups — data is safer across multiple locations. Requires security, encryption both in transit (TLS preferred) and at rest, and sufficient bandwidth. Best practice is to maintain both on-site and off-site backups.
Long-term archival — magnetic tape is the standard medium: inexpensive and long-lasting.
Hard disks will fail eventually — redundancy and regular backups are non-negotiable.
User backups — can be handled via SaaS solutions (Dropbox, iCloud, etc.) that are easy for users to configure.
OS built-in tools:
- macOS → Time Machine
- Windows → Backup & Restore
- Linux → rsync — a file-transfer utility designed to efficiently copy and sync files, transferring only changed data
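Part of rsync's efficiency comes from skipping files that look unchanged: by default it compares file size and modification time (its "quick check"), and over the network it sends only the changed parts of files that differ. The Python sketch below imitates just the first half, copying whole files that are new or whose size or mtime differ. It illustrates the idea and is not a replacement for rsync; it never deletes files at the destination, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def quick_sync(src: Path, dst: Path) -> list[str]:
    """One-way sync: copy files that are new or whose size/mtime differ. Returns copied paths."""
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        st = path.stat()
        if target.exists():
            tst = target.stat()
            # Quick check: same size and same mtime (to the second) => treat as unchanged.
            if tst.st_size == st.st_size and int(tst.st_mtime) == int(st.st_mtime):
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 preserves mtime, so the next run can skip it
        copied.append(rel.as_posix())
    return copied
```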
Regular testing — backups must be tested regularly. Restoration procedures must be documented and exercised at least once a year (disaster recovery testing).

Recovery Objectives

Term	Definition	Example
RPO — Recovery Point Objective	Maximum acceptable amount of data loss measured in time — how far back can you afford to go?	RPO of 4 h means backups must run at least every 4 h; losing up to 4 h of data is tolerable
RTO — Recovery Time Objective	Maximum acceptable downtime — how quickly must systems be back online?	RTO of 2 h means you must restore service within 2 h of the incident
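To make the two objectives concrete, here is a minimal Python sketch that scores a single incident against them, using the 4 h RPO and 2 h RTO from the table's examples. The timestamps and function name are hypothetical. Data loss is measured from the last good backup to the incident; downtime runs from the incident to the moment service is restored.

```python
from datetime import datetime, timedelta


def score_incident(last_backup: datetime, incident: datetime, restored: datetime,
                   rpo: timedelta, rto: timedelta) -> dict:
    data_loss = incident - last_backup  # changes made after the last backup are gone
    downtime = restored - incident      # time users were without the service
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


report = score_incident(
    last_backup=datetime(2024, 3, 1, 8, 0),
    incident=datetime(2024, 3, 1, 11, 30),
    restored=datetime(2024, 3, 1, 14, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
print(report["rpo_met"], report["rto_met"])  # True False
```

Here 3.5 h of lost data is within the RPO, but 2.5 h of downtime misses the RTO.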

Backup Types

Full backup — copies all data every time. Simple to restore but slow and storage-heavy for largely-static data.
Differential backup — copies everything changed since the last full backup. Saves storage vs. repeated full backups; restore requires last full + latest differential.
Incremental backup — copies only data changed since the last backup of any type. Most storage-efficient; restore requires last full + every incremental in order.
Best practice: infrequent full backups combined with more frequent differential or incremental backups.
Compression — backups can be compressed to save space, but not all data types compress well. Compressed files must be decompressed before restoration.
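The restore requirements for each backup type can be expressed as a short Python sketch. Given a chronological list of backup labels (a hypothetical representation), it works backwards from the backup you want to restore and collects every set you would need. It also covers an incremental taken after a differential, which depends on that differential and its full backup.

```python
def restore_chain(backups: list[str], target: int) -> list[int]:
    """Indexes of the backups needed, oldest first, to restore the state in backups[target].

    backups is a chronological list of "full", "diff" (differential) or "inc" (incremental).
    """
    chain = []
    i = target
    while True:
        chain.append(i)
        kind = backups[i]
        if kind == "full":
            break
        if kind == "diff":
            # A differential holds everything since the last full, so only that full is also needed.
            fulls = [j for j in range(i) if backups[j] == "full"]
            if not fulls:
                raise ValueError("differential with no earlier full backup")
            i = fulls[-1]
        elif kind == "inc":
            # An incremental holds only changes since the previous backup of any type.
            if i == 0:
                raise ValueError("incremental with no earlier full backup")
            i -= 1
        else:
            raise ValueError(f"unknown backup type: {kind!r}")
    return chain[::-1]


# A week of backups: full on day 0, then one backup per day.
print(restore_chain(["full", "inc", "inc", "inc", "inc"], 4))      # [0, 1, 2, 3, 4]
print(restore_chain(["full", "diff", "diff", "diff", "diff"], 4))  # [0, 4]
```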

The 3-2-1 rule: keep 3 copies of data, on 2 different media types, with 1 copy stored off-site.
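A minimal Python sketch of the rule as a check over an inventory of copies. The `Copy` fields and media labels are hypothetical, and the sketch counts the live production data as one of the three copies, which is the usual reading of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    where: str     # description, e.g. "office file server"
    media: str     # e.g. "disk", "tape", "cloud object storage"
    offsite: bool


def follows_3_2_1(copies: list[Copy]) -> bool:
    return (
        len(copies) >= 3                          # 3 copies of the data
        and len({c.media for c in copies}) >= 2   # on at least 2 media types
        and any(c.offsite for c in copies)        # at least 1 copy off-site
    )


plan = [
    Copy("office file server", "disk", offsite=False),  # live data
    Copy("on-site backup NAS", "disk", offsite=False),
    Copy("tape at a second site", "tape", offsite=True),
]
print(follows_3_2_1(plan))  # True
```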

RAID Levels

RAID (Redundant Array of Independent Disks) is an inexpensive way to create large storage with redundancy — but it is a storage solution, not a backup solution. A RAID array does not protect against accidental deletion, ransomware, or site disasters.

Level	Name	Min Drives	Fault Tolerance	Notes
RAID 0	Striping	2	None — any drive failure = total loss	Best performance; no redundancy
RAID 1	Mirroring	2	1 drive	Exact copy on all drives; 50% storage efficiency
RAID 5	Striping + Distributed Parity	3	1 drive	Parity distributed across all drives; good balance of speed, capacity, redundancy
RAID 6	Striping + Double Parity	4	2 drives	Like RAID 5 but calculates two sets of parity; safer for large arrays
RAID 10	Mirroring + Striping (1+0)	4	1 per mirrored pair	Mirrors first, then stripes the mirrored sets; best performance + redundancy, highest cost
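The capacity trade-offs in the table follow from simple arithmetic. This Python sketch assumes identical drives and ignores hot spares and filesystem overhead; the function name is hypothetical. For RAID 10 it reports the guaranteed tolerance of one failure, although the array can survive more if each failure hits a different mirrored pair. For RAID 1 with more than two drives, each extra mirror adds one more tolerated failure.

```python
def raid_summary(level: str, drives: int, drive_tb: float) -> tuple[float, int]:
    """Return (usable TB, drive failures always survived) for an array of identical drives."""
    minimum = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == "0":
        return drives * drive_tb, 0          # striping only, no redundancy
    if level == "1":
        return drive_tb, drives - 1          # every drive holds the same data
    if level == "5":
        return (drives - 1) * drive_tb, 1    # one drive's worth of parity
    if level == "6":
        return (drives - 2) * drive_tb, 2    # two drives' worth of parity
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return drives / 2 * drive_tb, 1          # half the drives are mirror copies


for level in ("0", "5", "6", "10"):
    print(f"RAID {level}: {raid_summary(level, 4, 2.0)}")
# RAID 0: (8.0, 0)
# RAID 5: (6.0, 1)
# RAID 6: (4.0, 2)
# RAID 10: (4.0, 1)
```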

Disaster Recovery Plans

A disaster recovery plan (DRP) is a collection of documented procedures for reacting to an emergency or disaster scenario, with the goal of minimizing disruption to business operations.
A DRP covers three categories of measures:
1. Preventative — actions taken to reduce the likelihood of a disaster
2. Detection — monitoring and alerting systems that identify an incident quickly
3. Corrective / Recovery — steps to restore operations after an incident
Risk assessment — identify high-risk systems; pay special attention to systems that lack redundancy.
Determine which data/systems are highest priority and have a data recovery plan ready for each.
Verify that all operational documentation is current and accessible.
Define and test detection and alert measures so the team knows about an incident immediately.
Document and test recovery procedures in detail.

DR Site Type	Description	Recovery Time
Hot site	Fully operational duplicate environment, always running and in sync	Minutes
Warm site	Hardware and connectivity ready; data must be restored from backup before going live	Hours
Cold site	Physical space and power only; all equipment and data must be provisioned from scratch	Days–weeks

Post-Mortems

A post-mortem is created after an incident, outage, or event — or at the end of a project — to analyze what happened and how to improve.
Purpose: to learn, not to punish or shame. Understanding is the goal.
Standard post-mortem structure:
1. Brief summary — what the incident was, how long it lasted, what the impact was, and how it was resolved
2. Detailed timeline — every significant event and every attempt to fix the issue, with exact dates, times, and time zones
3. Root cause — the underlying reason the incident occurred and what can be learned from it
4. Resolution & recovery — detailed account of the steps taken to fix the problem and restore service
5. Action items — concrete steps to prevent the same scenario in the future
Highlight both what went wrong and what went well — cover everything.
Post-mortems also follow the end of a project to capture lessons learned even when nothing went wrong.

What is the difference between RPO and RTO?
What does the 3-2-1 backup rule say?
What is the difference between a differential and an incremental backup?
Why is RAID not considered a backup solution?
What is the standard medium for long-term archival storage and why?
Describe the three types of disaster recovery sites.
What are the five sections of a standard post-mortem report?
What does rsync do on Linux?
How often should disaster recovery procedures be formally tested?
What are the three categories of measures a disaster recovery plan covers?

RPO (Recovery Point Objective) = the maximum acceptable amount of data loss measured in time. RTO (Recovery Time Objective) = the maximum acceptable downtime — how quickly systems must be restored after an incident.
Keep 3 copies of data, on 2 different media types, with 1 copy stored off-site.
Differential: copies everything changed since the last full backup. Incremental: copies only what changed since the last backup of any type — most storage-efficient but requires the full + every incremental to restore.
RAID provides storage redundancy but doesn't protect against accidental deletion, ransomware, or a site-wide disaster. It's a storage solution — all copies live together and can be destroyed together.
Magnetic tape — it is inexpensive and long-lasting compared to hard drives, making it cost-effective for large volumes of rarely-accessed archive data.
Hot site — fully operational duplicate, minutes to fail over. Warm site — hardware ready, data must be restored from backup, hours to bring online. Cold site — physical space and power only, everything must be provisioned from scratch, days to weeks.
1) Brief summary, 2) Detailed timeline, 3) Root cause, 4) Resolution & recovery, 5) Action items to prevent recurrence.
A file-transfer utility that efficiently copies and syncs files, transferring only data that has changed since the last sync.
At least once a year.
Preventative measures, Detection measures, and Corrective/Recovery measures.

▶

6 · Final Project

Complete

How to use these case studies

Read the company snapshot and the problems identified — then try to work out your own approach before revealing mine.
The analysis is my own, written during the course. It isn't a model answer or AI-generated; it's how I thought through each scenario at the time. Your approach is most likely different, so don't take mine as gospel! Lots of different approaches work!
Each point is tagged § Section N to show which part of the course I was drawing on.

Scenario 1 — Network Funtime Company

Factor	Current state
Industry	Open source software
Employees	~100 (engineers, designers, HR, sales)
Hardware	HR buys the cheapest available laptop for each hire — every machine is a different model, nothing is labeled or inventoried
Onboarding	HR hands a blank laptop to the new hire; employee installs their own OS and software
IT support	HR is the de-facto IT contact — loses several hours per new hire
Passwords	No requirements and no recovery system — lost passwords result in a full reimage
Services	Cloud SaaS (email, docs, sheets) + Slack; no in-house infrastructure

Problems identified

Procurement is reactive and inefficient — one laptop at a time at the cheapest price, no vendor relationship, no bulk deal
Every employee has different hardware — impossible to standardize support, drivers, or images
No asset inventory — no way to audit what the company owns or track machine lifecycle
Employees self-installing software risks bloatware, misconfiguration, and security holes
HR is doing IT work and burning hours per hire — not scalable and not their job
No password policy and no recovery path — lost credential = wipe and start over
Cloud service access handed out ad-hoc with no central directory or SSO

Establish a vendor relationship and standardize hardware by role. Engineers and designers need stronger specs than sales; negotiate two or three laptop tiers with a vendor and buy in bulk. Saves money and makes support predictable. §1 Hardware Lifecycle
Create an asset inventory. Label every machine and log it in a directory — track owner, model, age, maintenance history, and lifecycle stage. Nothing disappears silently. §4 Directory Services
Build pre-configured disk images per department role. A developer image ships with the right OS, dev tools, and security baselines pre-installed. A designer image ships with creative tools. IT flashes a new hire's laptop in minutes — no self-installs, no surprises. §2 OS Imaging
Move IT onboarding off HR's plate. IT owns onboarding. New employees submit support tickets — HR is no longer the IT contact. Saves HR hours per hire and improves response quality. §1 Sys Admin Responsibilities
Implement a directory service with password policy and self-service recovery. Active Directory or a cloud equivalent enforces complexity requirements and lets IT reset credentials without reimaging the machine. §4 Active Directory
Provision cloud services through the directory (SSO). Instead of HR handing out individual logins, integrate SaaS tools with a central identity provider so accounts are created and revoked automatically. §4 Directory Services
Audit SaaS costs once the above is in place. You need a baseline before you can tell if you're overspending. Without seeing a balance sheet now, defer this until operations are stable. §3 Managing Cloud Services

Scenario 2 — W.D. Widgets

Factor	Current state
Industry	Widget sales (client-facing, revenue-generating)
Employees	80–100 now; expecting hundreds of new hires within a year
Hardware	Ordered directly from a business vendor; one or two spare machines kept on hand — solid process
OS & directory	Windows only; managed via Windows Active Directory
Onboarding	IT manually sets up each machine and installs a long list of sales applications — time consuming
File server	All customer data on a single file server mapped to each salesperson's machine; folder creator owns it — anyone can delete anything
Backups	None
IT support	Email only — no ticketing system
Services	Fully in-house: email server, local software, instant messenger — no cloud

Problems identified

Manual machine setup with long app installs will not scale to hundreds of new hires — IT will be a bottleneck
File server has no access controls — any employee can permanently delete shared customer data
Zero backups — a single failure wipes everything, with no recovery path
Email-only IT support will be overwhelmed as headcount grows; no prioritization or tracking
In-house everything at rapid scale means IT team needs to grow or infrastructure will buckle

Build a pre-configured sales disk image and deploy it via PXE. One image per role ships with the OS and every required sales application pre-installed. New machines image themselves over the network — no manual installs, no room for human error. At hundreds of hires a year, this is the only viable path. §2 OS Imaging & PXE
Lock down the file server with GPOs immediately. Salespeople should have read/write access but not delete. Assign delete/ownership permissions only to department leads. This protects customer data from accidental or malicious wipes. §4 Group Policy Objects
Implement a disaster recovery plan — starting with backups. With no backups on a single server, one hardware failure ends the business. Set up on-site and off-site backup servers on a regular schedule, and run recovery drills. The company has revenue — use it. §5 Data Recovery & RAID
Replace email support with a priority ticketing system. At scale, email is untrackable and unprioritizable. A ticketing system logs every request, tracks resolution, and lets IT triage by severity. §1 Sys Admin Responsibilities
Grow the IT team. Running in-house email, IM, software, and file servers at hundreds of employees requires dedicated headcount. Understaffed IT on in-house infra is a single point of failure. §1 Sys Admin Responsibilities

Scenario 3 — Dewgood (Non-Profit)

Factor	Current state
Industry	Local non-profit, not planning to grow
Employees	~50
Hardware	Purchased from a physical retail store on the day the hire starts — whatever is in stock that day
OS & directory	Windows Active Directory — but departing employees are never disabled
Onboarding	IT logs them in, installs software, maps the file server to their machine manually
Infrastructure	Single server running file services and email; company website also hosted on this same server
Website	Static single-page site (mission, contact info) — goes down frequently, nobody knows how to fix it
Backups	Nightly backups to a disk that IT takes home every day
IT support	Open source ticketing system exists but nobody uses it — employees contact IT directly

Problems identified

Day-of retail procurement is the most expensive and least reliable option — whatever's in stock wins
Departing employees stay active in AD — security risk and directory clutter
Backup disk goes home with IT every night — one lost or damaged drive means data loss
Single server for files, email, and website — no redundancy; one crash takes everything offline
Static website going down on shared hardware is an unmonitored single point of failure
Ticketing system exists but has zero adoption — IT support is informal and untracked

Improve procurement — give HR advance notice and find cheaper sourcing. Ask HR to flag hires before the start date so you can shop around. Refurbished hardware or a small vendor relationship beats retail on price every time. At 50 employees with a tight budget, every dollar matters. §1 Hardware Lifecycle
Create a disk image for new hires — but keep the human touch. At this size, face-to-face onboarding is a genuine benefit — a small company values that relationship. Still, use a pre-built image so you're not manually installing software and mapping drives every time. Flash, then walk them through it together. §2 OS Imaging
Create an AD offboarding group and enforce account disabling on departure. Departed employees with active accounts are a security hole. A simple policy — HR notifies IT, IT disables the account that day — closes it immediately. §4 Active Directory
Move backups to a proper offsite location — not your home. A personal residence is not a secure storage environment. Set up a second drive at a different physical location (a trusted partner's office, a rented storage unit) and automate the rotation. Nightly frequency is fine; the storage location is the problem. §5 Disaster Recovery
Move the static website to a free cloud host. The site is a single static HTML page with no backend. Services like GitHub Pages, Netlify, or Cloudflare Pages host it for free on infrastructure far more reliable than a single shared server — no hardware to maintain, no mysterious downtime. Free up the server for what actually needs to be on-site. §3 Platform Services
Invest in getting the ticketing system adopted. At 50 employees, informal IT is manageable but tickets add real value: they log patterns, track resolution time, and let you prioritize. Make the case kindly — explain the benefit to employees, not just IT. If the current system is too confusing, explore a simpler alternative or ask finance for a small budget. §1 Sys Admin Responsibilities
Single server is acceptable at this scale — but monitor it closely. Schedule maintenance windows (late nights, weekends), set up monitoring so you know before something fails, and document a recovery procedure so that if the server does go down, you have a plan. §2 Managing System Services
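The Dewgood offboarding recommendation can be backed up with a periodic audit. A minimal Python sketch, assuming you have exported the usernames of enabled AD accounts (for example with `Get-ADUser -Filter 'Enabled -eq $true'`) and a list of current staff from HR. The function name, the service-account exception list, and the sample names are hypothetical. Anything it returns is a candidate for `Disable-ADAccount` after a human check.

```python
def accounts_to_review(enabled_accounts: set[str], hr_roster: set[str],
                       service_accounts: frozenset[str] = frozenset()) -> list[str]:
    """Enabled directory accounts that match nobody currently employed."""
    # Service accounts are not people, so they are excluded from the comparison.
    return sorted(enabled_accounts - hr_roster - service_accounts)


enabled = {"asmith", "bjones", "cwu", "svc-backup"}
roster = {"asmith", "cwu"}
print(accounts_to_review(enabled, roster, frozenset({"svc-backup"})))  # ['bjones']
```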