System Administration & IT Infrastructure Services
Sys admin role · Network & infrastructure services · Software & platform services · Directory services · Data recovery & backups
1 · What is System Administration?
The Role of a Sys Admin
- System administration is the field in IT responsible for maintaining reliable, multi-user computing environments — the hardware, networks, and services that a company depends on.
- IT infrastructure encompasses hardware, networking, and services such as email, file storage, and web hosting. These typically run on servers.
- Machines that consume server resources are called clients. The sys admin manages the servers so clients can reliably reach them.
- Common server form factors:
- Tower servers — look like desktop PCs; standalone units
- Rack servers — slim units that slide into a rack; space-efficient for data centers
- Blade servers — ultra-thin modules that share a common chassis for power and networking
- KVM switch (Keyboard-Video-Mouse) — allows one keyboard, monitor, and mouse to control many servers. Critical when the network is down and remote access is unavailable.
- Cloud computing — remote servers and data centers managed by a third party (e.g., AWS, Azure, GCP). The company rents compute resources instead of owning physical hardware.
Sys Admin Responsibilities
- Policy & security — in small companies the sys admin often sets organizational policies and owns computer security decisions end-to-end.
- Managing services — installing, configuring, updating, patching, and maintaining servers and the services running on them.
- Hardware lifecycle — four-stage process every piece of hardware goes through:
- Procurement — purchasing hardware, often in bulk from a vendor
- Deployment — setting up and configuring the hardware for use
- Maintenance — routine updates, patches, and repairs
- Retirement — decommissioning, wiping data, disposing or recycling
- Batch updates — rolling out security patches to many machines at once during scheduled maintenance windows.
- Vendor lifecycle — understanding manufacturer support windows so you know when hardware or software reaches end-of-life and bulk replacements need to be planned.
- Troubleshooting — reading logs is step 1; gathering information from the user (customer service) is step 2. The goal is to reproduce and isolate the problem before acting.
Applying Changes Safely
- Admin privilege mindset — admin rights give you enormous power. Key rules: respect privacy, think before you type, minimize time spent in elevated accounts, document everything.
- Session logging — record exactly what you did:
- Linux: `script session.log` captures the session in ANSI format; exit with `exit`
- Windows PowerShell: `Start-Transcript` / `Stop-Transcript` produce plain-text output
- Rollback plan — before changing anything, know how to undo it. Document the rollback steps as part of the change record.
- IT Change Management — a standardized process for planning, communicating, and implementing technical changes. A change record typically documents:
- Person / team responsible
- Priority and urgency
- Description and purpose of the change
- Scope and systems affected
- Date, time, and expected duration
- Rollback / backout plan
- Risk level and anticipated impact
- Resources and training needed
- Production vs test environment — never test in production. Production is the live environment end users rely on. Use a test environment (usually a VM mirroring production settings) to validate changes before rolling them out.
- Reproduction case — when troubleshooting, document: (1) the steps taken that led to the issue, (2) the unexpected result, (3) the expected result. Always reproduce on a test instance, not production.
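The change-record fields listed above can be captured as a small data structure that refuses to call a change "ready" until a rollback plan and a test-environment run exist. This is only a sketch: the class name, field names, and sample values are hypothetical, not a standard schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ChangeRecord:
    """Illustrative change record; field names are assumptions for this sketch."""
    owner: str
    priority: str
    description: str
    systems_affected: List[str]
    start: datetime
    expected_duration: timedelta
    rollback_plan: str
    risk_level: str
    tested_in_staging: bool = False

    def problems(self) -> List[str]:
        """Return the reasons this change is not ready (empty list means ready)."""
        issues = []
        if not self.rollback_plan.strip():
            issues.append("missing rollback plan")
        if not self.tested_in_staging:
            issues.append("not validated in a test environment")
        if not self.systems_affected:
            issues.append("scope is empty")
        return issues


# Hypothetical change: patching a daemon on two file servers.
record = ChangeRecord(
    owner="infra-team",
    priority="medium",
    description="Patch NTP daemon on file servers",
    systems_affected=["fs01", "fs02"],
    start=datetime(2024, 1, 15, 22, 0),
    expected_duration=timedelta(minutes=30),
    rollback_plan="",
    risk_level="low",
)
print(record.problems())
```

Filling in `rollback_plan` and setting `tested_in_staging` to `True` empties the list, which mirrors the rule that nothing reaches production without an undo path and a test run.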
- What are the three most common server form factors?
- What is a KVM switch and why is it particularly useful for a sys admin?
- List the four stages of the hardware lifecycle in order.
- What is the difference between production and a test environment?
- Which Linux command records a terminal session to a log file?
- What is the Windows PowerShell equivalent of Linux's `script` command?
- Name four things that should be included in a change management record.
- What is a reproduction case and what three things does it document?
- Tower servers, rack servers, and blade servers.
- A KVM switch lets one keyboard, monitor, and mouse control multiple servers — critical when the network is down and remote access isn't available.
- Procurement → Deployment → Maintenance → Retirement.
- Production is the live environment end users depend on. A test environment mirrors production settings but is isolated, so changes can be validated safely before going live.
- `script session.log` records the session in ANSI format until you type `exit`.
- `Start-Transcript` (and `Stop-Transcript` to end it). Output is plain text.
- Any four of: responsible person/team, change priority, description and purpose, scope, affected systems, date/time/duration, rollback plan, risk level, anticipated impact, resources/training needed.
- A reproduction case documents: (1) the steps taken to reach the issue, (2) the unexpected result observed, (3) the expected result. It must always be reproduced on a test instance.
2 · Network & Infrastructure Services
IT Infrastructure Overview
- IT infrastructure has four main pillars: physical (hardware), network, software, and directory services.
- Cloud delivery models — what the provider manages for you:
- IaaS (Infrastructure as a Service) — raw compute, storage, networking (e.g., Amazon EC2, Linode, Azure VMs)
- NaaS (Network as a Service) — networking infrastructure delivered over the cloud
- SaaS (Software as a Service) — fully managed applications (e.g., Gmail, Dropbox)
- PaaS (Platform as a Service) — runtime environment for developers (e.g., Heroku, Google App Engine)
- DaaS (Directory as a Service) — cloud-hosted directory (e.g., Azure AD, JumpCloud)
Physical Infrastructure: Servers & Remote Access
- Servers run their own specific operating systems (often Linux or Windows Server).
- Dedicated hardware — one service per physical machine. Better performance, but costly and underutilizes resources.
- Virtualized instances — multiple services share one physical machine via a hypervisor. Maximizes resource usage and allows live migration during hardware maintenance.
- OpenSSH — the most popular remote access tool on Linux. Requires an SSH client on your machine and an SSH server on the target.
- Install: `sudo apt-get install openssh-client` (on your machine) / `sudo apt-get install openssh-server` (on the target)
- Connect: `ssh user@ipaddress`
Network Services
| Service | What it does | Notes |
|---|---|---|
| FTP | File Transfer Protocol — transfers files between machines | No encryption; primarily used to share web content |
| SFTP | SSH File Transfer Protocol: a separate file-transfer protocol that runs over an SSH connection | Encrypted; use this instead of plain FTP |
| TFTP | Trivial FTP — simplified file transfer | No encryption, no authentication; used for PXE boot |
| PXE boot | Boot a machine from the network instead of local storage | Uses TFTP to deliver the OS image |
| NTP | Network Time Protocol — synchronizes clocks across machines | Install NTP server, connect all machines to it |
| Intranet | Private internal network within a company | Common in large enterprises |
| Proxy server | Intermediary between clients and the internet | Provides privacy, content filtering, access control |
| DNS server | Resolves domain names to IP addresses | Own DNS server lets you control name resolution internally |
| DHCP | Assigns IP addresses automatically to machines on the network | Eliminates manual IP assignment |
OS Imaging & Mass Deployment
- When a company is growing fast, manually setting up each new computer one-by-one is too slow. The solution is disk imaging — create one perfectly configured machine (the golden image), capture a snapshot of its entire disk, then flash that snapshot onto any number of new machines in minutes.
- A golden image (also called a base image or reference image) contains:
- The operating system, fully installed and activated
- Standard company software (Office, antivirus, VPN client, etc.)
- Company-wide settings and security baselines
- Any drivers needed for your hardware fleet
- Once a machine is imaged, it is domain-joined and receives any remaining per-user settings via Group Policy (GPO) automatically at login.
How PXE Boot Powers Mass Deployment
- PXE (Preboot eXecution Environment) — a machine boots from the network instead of its local disk. It contacts a DHCP server which tells it where to find a TFTP server, then downloads a boot loader and OS image over the network. No USB drive or DVD needed.
- The full flow for flashing a new computer:
- Technician plugs the machine into the network and powers it on.
- Machine sends a PXE broadcast — DHCP replies with the TFTP server address.
- Machine downloads a lightweight boot loader via TFTP.
- Boot loader pulls the golden image from the deployment server.
- Image is written to the local disk. Machine reboots into a fully configured OS.
- TFTP (Trivial FTP) is used here because it is extremely simple and requires no authentication — ideal for bare-metal machines that have no OS or credentials yet.
Deployment Tools
| Tool | Platform | What it does |
|---|---|---|
| WDS (Windows Deployment Services) | Windows Server | Hosts and delivers Windows images over PXE; built into Windows Server |
| MDT (Microsoft Deployment Toolkit) | Windows | Extends WDS — automates driver injection, app installs, and domain join; uses answer files |
| Clonezilla | Linux / cross-platform | Open-source disk cloning tool; can push one image to many machines simultaneously |
| Foreman / Cobbler | Linux | Open-source provisioning servers that manage PXE, DHCP, DNS, and kickstart files in one place |
Unattended Installs & Answer Files
- An unattended install automates every prompt that would normally require a human to click through during OS setup — language, timezone, disk partitioning, product key, admin password, and more.
- Windows: uses an XML answer file called `unattend.xml` (created with the Windows System Image Manager tool). MDT reads this file during deployment and answers every setup question automatically.
- Linux: uses a kickstart file (Red Hat / CentOS) or a preseed file (Debian / Ubuntu). The file is referenced by the PXE boot loader and drives a fully automated install.
- Combined with PXE, an unattended install means a technician can rack a machine, plug in a network cable, and walk away — the machine configures itself end-to-end.
- Example: imaging 50 new laptops
- Build a golden image on one laptop with Windows, company apps, and security settings.
- Upload the image to WDS/MDT running on your deployment server.
- PXE-boot all 50 laptops simultaneously — each pulls the image over the network.
- Unattend.xml handles all setup prompts automatically.
- Machines reboot, join the domain, and pull GPOs. Total hands-on time: ~30 minutes.
Managing System Services (Daemons)
- A daemon is a background process (service) that runs continuously. Daemons have one or more config files that control their behavior.
- Services are usually configured to start automatically at boot and on restart.
- Config files for installed services live in the `/etc` directory on Linux.
| Linux command | What it does |
|---|---|
| `service ntp status` | Check whether the NTP daemon is running |
| `sudo service ntp stop` | Stop the NTP daemon |
| `sudo service ntp restart` | Stop then start NTP (calls both stop and start) |
| `sudo date -s "YYYY-MM-DD HH:MM:SS"` | Manually set the system date/time (to test NTP) |
| `sudo service --status-all` | List all services: [+] active, [-] inactive, [?] unknown |
| `sudo apt install vsftpd` | Install the vsftpd FTP server (starts automatically) |
| `lftp localhost` | Connect to the local FTP server via the lftp client |
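The `[+]` / `[-]` / `[?]` markers from `service --status-all` are easy to turn into a lookup table when you need to check many services at once. The sketch below parses pasted output rather than running the command; the sample text and the function name are hypothetical, and the exact spacing of real output may differ by distribution.

```python
import re

# Matches lines such as " [ + ]  cron" or "[-] ntp"; spacing is treated as optional.
STATUS_LINE = re.compile(r"^\s*\[\s*([+\-?])\s*\]\s+(\S+)\s*$")
STATE_NAMES = {"+": "active", "-": "inactive", "?": "unknown"}


def parse_status_all(output):
    """Map each service name to 'active', 'inactive', or 'unknown'."""
    services = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            symbol, name = match.groups()
            services[name] = STATE_NAMES[symbol]
    return services


# Hypothetical sample output, pasted in so the sketch runs without root access.
sample = """\
 [ + ]  cron
 [ - ]  ntp
 [ ? ]  hwclock.sh
"""
statuses = parse_status_all(sample)
print(statuses)
```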
| Windows PowerShell command | What it does |
|---|---|
| `Get-Service wuauserv` | Check if the Windows Update service is running |
| `Get-Service wuauserv \| Format-List *` | Show full details of the service |
| `Stop-Service wuauserv` | Stop a service (admin only) |
| `Start-Service wuauserv` | Start a service (admin only) |
Configuring Network Services with Dnsmasq
- Dnsmasq is a lightweight package that provides DNS, DHCP, TFTP, and PXE services in a single program — ideal for small networks or lab environments.
- Install: `sudo apt install dnsmasq`. It comes with sensible defaults and starts immediately.
| Command | What it does |
|---|---|
| `dig www.example.com @localhost` | Query the local DNS server to test resolution |
| `sudo service dnsmasq stop` | Stop dnsmasq |
| `sudo dnsmasq -d -q` | Run dnsmasq in debug mode, logging all queries |
| `sudo dnsmasq -d -q -H myhosts.txt` | Run with a custom hosts file mapping names to IPs |
| `sudo dnsmasq -d -q -C dhcp.conf` | Start dnsmasq with a DHCP config file |
| `ip address show <interface>` | Show the IP address of a network interface |
| `sudo dhclient -i <interface> -v` | Request a DHCP lease on the client side |
- Dnsmasq caches DNS responses, speeding up repeat queries.
- NXDomain — the DNS response when a domain does not exist.
- DHCP client and server typically run on separate machines. The client uses `dhclient` to request an IP lease; the server uses `dhcp.conf` to define address ranges and options.
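To make the role of `dhcp.conf` concrete, here is a sketch that renders a small dnsmasq config file as text. The option names (`interface`, `bind-interfaces`, `dhcp-range`, `enable-tftp`, `tftp-root`, `dhcp-boot`) are dnsmasq settings; the interface name, address range, lease time, and paths are assumptions chosen for illustration.

```python
def render_dnsmasq_conf(interface, range_start, range_end, lease="12h",
                        boot_file=None, tftp_root=None):
    """Build the text of a minimal dnsmasq DHCP config (optionally with PXE/TFTP)."""
    lines = [
        f"interface={interface}",      # only serve on this interface
        "bind-interfaces",
        f"dhcp-range={range_start},{range_end},{lease}",
    ]
    if boot_file and tftp_root:
        # PXE: hand clients a boot file name and serve it over the built-in TFTP server.
        lines += ["enable-tftp", f"tftp-root={tftp_root}", f"dhcp-boot={boot_file}"]
    return "\n".join(lines) + "\n"


# Hypothetical lab values; adjust to your own network.
conf = render_dnsmasq_conf("eth_srv", "192.168.1.50", "192.168.1.150",
                           boot_file="pxelinux.0", tftp_root="/srv/tftp")
print(conf)
```

Writing the returned text to `dhcp.conf` and starting dnsmasq with `-C dhcp.conf` would be the next step on a real lab server.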
Troubleshooting Network Services
- Can't resolve a domain name? Work through this sequence:
- `ping 8.8.8.8`: ping a known IP (like Google's DNS) to confirm basic connectivity
- `nslookup <domain>`: check DNS resolution; returns the name server and IP for the domain
- Compare the returned IP against what you expect; a mismatch points to a DNS misconfiguration
- If ping to an IP succeeds but DNS resolution fails, the problem is DNS-specific, not network-wide.
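The decision logic in this sequence can be sketched as a small function. The two probe helpers are stand-ins, not equivalents of `ping` and `nslookup`: one opens a TCP connection to port 53 and the other asks the system resolver. All function names here are hypothetical.

```python
import socket


def can_reach_ip(ip, port=53, timeout=2.0):
    """True if a TCP connection to ip:port succeeds (a rough stand-in for ping)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(name):
    """True if the system resolver returns an address for name."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def diagnose(ip_reachable, name_resolves):
    """Turn the two probe results into the conclusion from the checklist."""
    if not ip_reachable:
        return "network problem: no basic connectivity"
    if not name_resolves:
        return "DNS problem: connectivity works but name resolution fails"
    return "connectivity and DNS both work; compare the returned IP with the expected one"


# Live use would look like: diagnose(can_reach_ip("8.8.8.8"), can_resolve("example.com"))
print(diagnose(True, False))
```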
- What does IaaS stand for and give two real-world examples?
- What is the key difference between FTP and SFTP?
- Why would you use TFTP instead of SFTP?
- What is a daemon?
- Where are service config files stored on Linux?
- Within what clock-offset range will the NTP daemon automatically correct the system time?
- What does Dnsmasq provide in a single package?
- What command tests DNS resolution against a local dnsmasq instance?
- You can ping Google's IP but can't reach google.com by name. What type of problem is this and what command helps diagnose it?
- What is the difference between a dedicated server and a virtualized instance?
- Infrastructure as a Service — raw compute, storage, and networking managed by a provider. Examples: Amazon EC2, Azure Virtual Machines, Linode.
- FTP transmits data in plain text (no encryption). SFTP runs over an SSH connection, so the transfer is encrypted.
- TFTP is used for PXE booting — its simplicity (no auth, no encryption) makes it lightweight enough to deliver OS boot images over the network before an OS is loaded.
- A daemon is a background process that runs continuously, usually starting at boot. It has config files an admin uses to control its behavior.
- The `/etc` directory.
- Within 128 ms. If the offset is larger, NTP won't auto-correct; you must manually set the time first.
- DNS, DHCP, TFTP, and PXE services.
- `dig www.example.com @localhost`
- It's a DNS problem, not a network problem. Use `nslookup google.com` to check what the DNS server returns.
- A dedicated server runs one service on one physical machine (better performance, higher cost, underutilized resources). A virtualized instance runs multiple services on one machine via a hypervisor (efficient, flexible, supports live migration).
3 · Software & Platform Services
Communication Services
- IRC (Internet Relay Chat) — free, open-source chat protocol. A self-hosted alternative to Slack or Teams for internal team communication.
- XMPP (Extensible Messaging and Presence Protocol) — popular open standard for instant messaging. Used under the hood by many chat systems.
- For email you can either run your own mail server or pay an email service provider (Google Workspace, Microsoft 365). Self-hosting gives full control; a provider reduces operational burden.
- DNS records for email:
- A record — maps a hostname to an IP address
- MX record — Mail Exchange; tells the internet which server accepts email for a domain
| Protocol | Direction | Key behaviour |
|---|---|---|
| SMTP | Sending only | The only protocol for sending email — client → server and server → server |
| POP3 | Receiving | Downloads email to local device and removes it from the server — private/offline model |
| IMAP | Receiving | Downloads to device but keeps a copy on the server — synced across multiple devices |
- Spam mitigation — three DNS-based mechanisms that verify the sender is who they claim to be:
- SPF (Sender Policy Framework) — lists which IP addresses are authorized to send email for your domain
- DKIM (DomainKeys Identified Mail) — cryptographically signs outgoing messages so receivers can verify they weren't tampered with
- DMARC (Domain-Based Message Authentication, Reporting & Conformance) — ties SPF and DKIM together and tells receivers what to do with mail that fails (quarantine, reject, or do nothing)
- User Productivity Services — business licensing for productivity suites (Office 365, Google Workspace) is separate from personal licensing. Enterprise versions include admin controls, compliance tooling, and user management.
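As an illustration of how the SPF mechanism listed above works, the sketch below checks whether a sending IP is covered by the `ip4:` / `ip6:` entries of an SPF record. It is deliberately simplified: `include:`, `a`, `mx`, `redirect`, and qualifiers other than `+` are ignored, and the record and addresses are documentation-range examples, not real data.

```python
import ipaddress


def spf_allows(spf_record, sender_ip):
    """Return True if sender_ip matches an ip4:/ip6: mechanism in the record.

    Simplified on purpose: other SPF mechanisms and qualifiers are skipped.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split():
        term = term.lstrip("+")  # '+' (pass) is the default qualifier
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == network.version and ip in network:
                return True
    return False


# Hypothetical TXT record for a domain.
record = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.10 -all"
print(spf_allows(record, "192.0.2.45"))
print(spf_allows(record, "203.0.113.9"))
```

A receiving mail server performs a fuller version of this check against the TXT record published in the sender's DNS zone.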
Security Services — HTTPS & TLS
- Any service you run must guarantee to users that their information is handled securely — especially anything accepting credentials or personal data.
- HTTPS — secure version of HTTP. Encrypts all communication between the browser and the web server so data cannot be read or tampered with in transit.
- TLS (Transport Layer Security) — the cryptographic protocol that powers HTTPS. It is the current standard for securing network communication.
- SSL (Secure Sockets Layer) — the predecessor to TLS. SSL v3.0 was effectively renamed TLS v1.0. SSL is deprecated and should never be used for new systems.
- To enable HTTPS you need a TLS certificate issued by a trusted Certificate Authority (CA). The CA vouches that the certificate belongs to the domain it claims to represent. Let's Encrypt is a free, automated CA widely used for this purpose.
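From the client side, "trusted CA" and "no SSL" translate into a few settings on the TLS context. The sketch below uses Python's standard `ssl` module; the TLS 1.2 floor is an assumption for the example, and `negotiated_version` is a hypothetical helper that is defined but not called, so the block runs without network access.

```python
import socket
import ssl

# Default client context: verifies the server certificate against trusted CAs
# and checks that the certificate matches the hostname.
context = ssl.create_default_context()
# Refuse anything older than TLS 1.2 (this floor is an assumption for the sketch).
context.minimum_version = ssl.TLSVersion.TLSv1_2


def negotiated_version(host, port=443):
    """Connect to host and return the TLS version string that was negotiated."""
    with socket.create_connection((host, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)
```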
File Services
- Choosing the right file system or sharing protocol depends on who needs to access the files and from what OS.
| Technology | What it is | Best for |
|---|---|---|
| FAT32 | Old filesystem format | Cross-platform USB drives; limited by 4 GB max file size and 8 TB volume size |
| NFS (Network File System) | Protocol for sharing files over a network | Linux/Unix environments; fast and native on Linux, slow on Windows |
| Samba / SMB | Open-source implementation of the SMB protocol | Windows-heavy or mixed environments; integrates with Active Directory |
| NAS (Network Attached Storage) | Dedicated storage device with a stripped-down OS | Simple, high-capacity shared file storage for a team or office |
- On Linux, NFS clients mount a remote share using the host path, e.g., `mount host:/nfs /mnt/shared`.
- Mobile device file sync: always sync mobile data to a cloud server or local IT infrastructure. If the device is lost, data is not lost with it.
Print Services
- Large organizations need a print server to manage a fleet of printers centrally — handling queues, drivers, and permissions from one place.
- CUPS (Common Unix Printing System) — the built-in Linux print server. Most modern OSes ship with a print server included.
- When provisioning new computers, have printer drivers and default settings (orientation, quality, tray, duplex) pre-staged so users can print immediately.
- Cloud print management services let both users and admins manage printers through a web interface — useful for distributed offices.
| Language | Creator | Device-dependent? | Notes |
|---|---|---|---|
| PCL (Printer Control Language) | HP | Yes | Optimized for speed on HP hardware; output may differ across printers |
| PostScript (PS) | Adobe | No | Device-independent; output is identical on any PS-compatible printer — preferred for graphics/publishing |
Platform Services — Web Servers & Databases
- Platform services give developers an environment to build and deploy software without managing the underlying hardware or OS — this is the PaaS model.
- A web server is a program (or the machine running it) that listens for HTTP requests and responds with the requested files or data.
| Web Server | Model | Notes |
|---|---|---|
| Apache HTTP Server | Process/thread per request (event-driven config available) | Most widely deployed; runs on localhost by default; extensive module ecosystem |
| NGINX | Asynchronous, non-blocking | Handles high concurrency efficiently; also used as a reverse proxy and load balancer |
| Microsoft IIS | Windows-native | Tight Windows integration; preferred in Microsoft-stack environments |
- Load balancer — sits in front of multiple servers and distributes incoming traffic evenly. Can be hardware or software. Prevents any single server from being overwhelmed and enables horizontal scaling.
- Databases — store, retrieve, and manage structured data. The two most popular open-source relational databases are MySQL and PostgreSQL.
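The simplest way a load balancer can spread traffic evenly is round-robin: hand each new request to the next server in the pool and wrap around at the end. The sketch below shows only that policy; the class name and backend names are hypothetical, and real balancers also track health and connection counts.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._pool = cycle(list(backends))

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._pool)


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = [balancer.next_backend() for _ in range(5)]
print(assignments)  # ['web1', 'web2', 'web3', 'web1', 'web2']
```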
Troubleshooting Platform Services — HTTP Status Codes
- When something goes wrong with a web service, HTTP status codes are your first diagnostic signal — they tell you immediately whether the problem is on the client side or the server side.
| Range | Category | Common examples |
|---|---|---|
| 1xx | Informational | 100 Continue — server received the request headers, client should proceed |
| 2xx | Success | 200 OK — request succeeded and response contains the result |
| 3xx | Redirection | 301 Moved Permanently, 302 Found (temporary redirect) |
| 4xx | Client error | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found |
| 5xx | Server error | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable |
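The first digit of the status code is all you need for the client-side versus server-side triage described above. A minimal sketch, assuming hypothetical helper names `classify_status` and `describe`, with reason phrases taken from Python's standard `http.HTTPStatus`:

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def classify_status(code):
    """Return the category for an HTTP status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CATEGORIES[code // 100]


def describe(code):
    """Combine the standard reason phrase with the category."""
    return f"{code} {HTTPStatus(code).phrase}: {classify_status(code)}"


print(describe(403))
print(describe(503))
```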
Managing Cloud Services
- SaaS — the software is fully pre-configured; you choose from a small set of admin options. You don't manage the underlying infrastructure at all.
- IaaS — you configure and manage your services on rented hardware. Start small; scale as you understand your actual usage patterns.
- Cloud deployment models:
- Public — infrastructure owned and operated by a provider, shared across many customers (AWS, GCP, Azure)
- Private — cloud infrastructure run entirely by your own company; more control, higher cost
- Hybrid — a mix of public and private; sensitive workloads stay on-premises, burst capacity moves to public cloud
- Regions — cloud providers organize their data centers into geographic regions. Choose regions close to your users to minimize latency; use multiple regions for redundancy.
- Load balancer — in a cloud context, ensures each VM instance receives a balanced share of queries as traffic scales up.
- Autoscaling — automatically increases or reduces VM capacity in response to actual demand. Ideal for workloads with variable traffic (seasonal spikes, marketing events).
- Cloud infrastructure avoids large upfront capital investment — you pay for what you use. This makes it especially cost-effective when demand varies significantly throughout the year.
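One common way to express an autoscaling rule is target tracking: pick a target utilization and size the fleet so the average moves toward it. The sketch below is a generic illustration, not any provider's algorithm; the 60% CPU target and the instance limits are assumptions.

```python
import math


def desired_instances(current, avg_cpu, target_cpu=60.0,
                      min_instances=1, max_instances=10):
    """Scale the instance count so average CPU moves toward target_cpu."""
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(4, 90.0))  # 6: traffic spike, scale out
print(desired_instances(4, 15.0))  # 1: quiet period, scale in
print(desired_instances(8, 95.0))  # 10: capped at max_instances
```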
- What is the only email protocol used for sending email?
- What is the key difference between POP3 and IMAP?
- Name the three DNS-based spam mitigation mechanisms and describe what each one does.
- What is the difference between SSL and TLS?
- What do you need to enable HTTPS on a web server?
- When would you choose Samba over NFS for file sharing?
- What is a NAS device?
- What is the difference between PCL and PostScript?
- What does CUPS stand for and what OS is it used on?
- A user reports they're getting a 403 when visiting your internal web app. Is this a client-side or server-side error? What does it mean?
- What is the difference between a Public, Private, and Hybrid cloud?
- What does autoscaling do and when is it most useful?
- Which web server is best suited for high-concurrency async workloads?
- SMTP (Simple Mail Transfer Protocol).
- POP3 downloads email to your local device and removes it from the server — it's a private, offline model. IMAP downloads email but keeps a copy on the server, so your inbox stays in sync across multiple devices.
- SPF — lists which IP addresses are allowed to send email for a domain. DKIM — cryptographically signs outgoing messages so receivers can verify they weren't modified. DMARC — ties SPF and DKIM together and tells receivers what to do with mail that fails verification (quarantine, reject, or allow).
- SSL (Secure Sockets Layer) is the deprecated predecessor. TLS (Transport Layer Security) is the current standard. SSL v3.0 was effectively renamed TLS v1.0. Never use SSL for new systems.
- A TLS certificate issued by a trusted Certificate Authority (CA). The CA vouches that the certificate legitimately belongs to your domain.
- When the environment is Windows-heavy or mixed Windows/Linux. Samba (SMB) integrates natively with Windows and Active Directory. NFS is fast on Linux but slow on Windows.
- A Network Attached Storage device is a dedicated storage appliance running a stripped-down OS designed solely for file delivery — essentially a plug-and-play shared file server.
- PCL (HP) is device-dependent — output can vary across different printers. PostScript (Adobe) is device-independent — output looks identical on any PS-compatible printer, making it preferred for graphics and publishing work.
- Common Unix Printing System — the built-in print server on Linux.
- Client-side error (4xx range). 403 Forbidden means the server understood the request but is refusing to fulfill it — typically a permissions problem. Check whether the user's account has access to that resource.
- Public — infrastructure shared across customers and operated by a provider. Private — infrastructure owned and operated entirely by your own company. Hybrid — a mix of both, where sensitive workloads stay on-premises and burst capacity or less-sensitive services use the public cloud.
- Autoscaling automatically adds or removes compute instances in response to actual demand. It's most useful for workloads with variable traffic — seasonal spikes, flash sales, or marketing campaigns — so you only pay for capacity when you need it.
- NGINX — its asynchronous, non-blocking architecture handles large numbers of concurrent connections efficiently.
4 · Directory Services
Introduction to Directory Services
- A directory server is a lookup service that maps network resources (users, groups, devices, phone numbers) to their network addresses — like a phone book for your IT infrastructure.
- Replication — directory data is copied across multiple physically distributed servers but still appears as one unified system for queries. Benefits:
- No single point of failure — if one replica goes down, others serve requests
- Reduced latency — clients query the geographically nearest replica
- Data is organized hierarchically like a filesystem: company domain → department OUs → individual objects. Organizational Units (OUs) can contain objects or other OUs.
- The sysadmin owns setup, configuration, and ongoing maintenance of the directory.
- Standard directory protocols come from the X.500 family: DAP, DSP, DISP, DOP. These are complex and heavyweight.
- LDAP (Lightweight Directory Access Protocol) is the simplified, practical alternative used everywhere today. It gives clients network access to directory data.
- Two dominant implementations: Microsoft Active Directory and OpenLDAP (open-source).
Centralized Management & AAA
- Centralized management — a single service that issues instructions and enforces policy across the entire IT infrastructure. Without it, each machine is an island with its own local accounts.
- Directory services provide centralized AAA:
- Authentication — prove who you are
- Authorization — determine what you're allowed to do
- Accounting — log what you actually did
- RBAC (Role-Based Access Control) — permissions are assigned to roles, not individuals. A user's access is determined by their role in the org.
- When someone changes roles, leaves, or joins — update their group membership. No need to touch individual machine accounts.
- Centralized policy can be as simple as logon scripts that map drives or printers when a user signs in.
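The RBAC idea above, permissions attached to roles and users attached to roles, fits in a few lines. The role names, permissions, and users below are hypothetical; a real directory would hold this data as groups and group memberships.

```python
# Permissions belong to roles, never directly to people.
ROLE_PERMISSIONS = {
    "helpdesk": {"reset_password", "view_tickets"},
    "sysadmin": {"reset_password", "view_tickets", "manage_servers"},
    "sales": {"view_crm"},
}

# Users only carry role memberships.
user_roles = {"jane": {"sales"}, "omar": {"helpdesk"}}


def is_allowed(user, permission):
    """True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in user_roles.get(user, set())
    )


print(is_allowed("omar", "manage_servers"))  # False
user_roles["omar"] = {"sysadmin"}            # role change: one membership update
print(is_allowed("omar", "manage_servers"))  # True
```

Note that the promotion touched a single membership entry and no per-machine accounts, which is the point of managing access through roles.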
LDAP — Protocol & Notation
- LDAP is the wire protocol used to read and write data in a directory service. Both Active Directory and OpenLDAP speak LDAP.
- Every directory entry has a distinguished name (DN) — a unique, comma-separated path that identifies its location in the hierarchy:
| Abbreviation | Meaning | Example |
|---|---|---|
| `dn` | Distinguished Name: full path | `dn: uid=jane,ou=sales,dc=example,dc=com` |
| `cn` | Common Name: object's name | `cn=Jane Smith` |
| `ou` | Organizational Unit | `ou=engineering` |
| `dc` | Domain Component | `dc=example,dc=com` |
- Common LDAP operations: add entry, modify entry, delete entry, search.
- The Bind operation authenticates a client to the directory server before it can perform operations. Three authentication modes:
- Anonymous — no credentials; read-only public data
- Simple — username and password in plain text (use only over TLS)
- SASL (Simple Authentication and Security Layer) — pluggable auth; typically uses TLS and supports Kerberos
- Kerberos — a network authentication protocol used to prove identity without sending passwords over the wire. Widely used with Active Directory.
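To see how a distinguished name encodes a path through the hierarchy, the sketch below splits one into its attribute/value pairs and rebuilds the DNS domain from the `dc` components. It is a simplification that assumes no escaped commas and no multi-valued components; the function names are hypothetical.

```python
def parse_dn(dn):
    """Split a DN into (attribute, value) pairs, most specific entry first.

    Simplified: assumes values contain no escaped commas.
    """
    pairs = []
    for rdn in dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        pairs.append((attr.lower(), value))
    return pairs


def domain_of(dn):
    """Join the dc components into a dotted domain name."""
    return ".".join(value for attr, value in parse_dn(dn) if attr == "dc")


dn = "uid=jane,ou=sales,dc=example,dc=com"
print(parse_dn(dn))
print(domain_of(dn))  # example.com
```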
Active Directory (AD)
- Active Directory is Microsoft's native directory service for Windows environments. It speaks LDAP, so it can interact with non-Windows hosts as well.
- ADAC (Active Directory Administrative Center) is the primary GUI for managing AD objects.
- Key structural concepts:
- Domain — the basic administrative boundary; a collection of computers, users, and resources
- Forest — one or more domains that share a schema and trust each other
- Organizational Unit (OU) — a folder-like container inside a domain; used to organize objects and delegate admin control
- Domain Controller (DC) — the server running AD; it authenticates logins and enforces policy
- FSMO (Flexible Single Master Operation) — certain AD tasks require one authoritative DC to avoid conflicts (PDC Emulator, RID Master, etc.)
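- SAM (Security Account Manager): stores usernames and hashed passwords for local accounts on a Windows machine; domain accounts are stored in the AD database on the domain controllers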
| Built-in Group | Scope | Notes |
|---|---|---|
| Domain Admins | Domain | Full control of the domain — not for daily use |
| Enterprise Admins | Forest | Full control across all domains in the forest |
| Domain Users | Domain | All standard user accounts |
| Domain Computers | Domain | All computers joined to the domain |
| Domain Controllers | Domain | All DCs in the domain |
- AD has three group scopes — choose based on what you're grouping and where it's used:
- Domain Local — assign permissions to a specific resource (e.g., a shared folder)
- Global — group accounts by role within a domain (e.g., "RnD" global group contains "Researchers" sub-group)
- Universal — group global roles across an entire forest
- Security groups can contain user accounts, computer accounts, or other security groups — used to grant access to resources.
- Distribution groups are email-only — they cannot be used to assign permissions.
- Admins should never know a user's password. Only reset when you've absolutely verified the request is legitimate. One person, one authenticator.
- Tip: ADAC shells out to PowerShell. Watch the command history pane to see the exact commands it runs — then script those for bulk operations.
| PowerShell command | What it does |
|---|---|
| `Add-Computer -DomainName 'example.com' -Server 'dc1'` | Join a computer to the domain via command line |
| `Get-AdForest` | View forest info including functional level |
| `Get-AdDomain` | View domain info including functional level (year/version) |
Group Policy Objects (GPOs)
- A GPO is a set of policies and preferences that can be applied to a group of AD objects (users, computers, or both).
- GPOs must be linked to a domain, site, or OU to take effect — creating a GPO alone does nothing.
- Each GPO has two sections:
- Computer Configuration — applied when the machine boots
- User Configuration — applied when the user logs on
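- Policies — settings that are re-applied at every Group Policy refresh (at boot, at logon, and in the background every 90 minutes by default with a random offset of up to 30 minutes); users cannot override them.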
- Preferences — settings that act as templates; users can change them after the fact.
- GPO data is stored in the SYSVOL folder, which is replicated to all DCs.
- The Windows Registry is the hierarchical database where GPO policy settings ultimately land on the client machine.
GPO Processing Order (LSDOU)
- GPOs are applied in this order — later wins when there are conflicts:
- Local GPO — settings stored directly on the machine
- Site-linked GPOs
- Domain-linked GPOs
- OU-linked GPOs — most specific container is applied last (and wins)
- Within an OU, each GPO has a link order. Higher link-order number = applied first = lowest precedence. The GPO applied last wins.
- An upstream GPO can be set to Enforced to prevent child OUs from overriding it.
- An OU can be set to Block Inheritance to ignore GPOs from parent containers (unless a parent GPO is Enforced).
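The precedence rules above can be sketched as a small resolver. This is a simplified model and not how Windows implements it: the `GpoLink` class, the setting names, and the single OU level are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Processing order: Local, then Site, then Domain, then OU (LSDOU).
LEVELS = {"local": 0, "site": 1, "domain": 2, "ou": 3}


@dataclass
class GpoLink:
    name: str
    level: str               # "local", "site", "domain", or "ou"
    link_order: int = 1      # 1 = highest precedence within its container
    enforced: bool = False
    settings: dict = field(default_factory=dict)


def resultant_settings(links, block_inheritance=False):
    """Merge settings so that the GPO applied last wins each conflict."""
    if block_inheritance:
        # The target OU ignores site/domain links unless they are Enforced.
        links = [g for g in links if g.level in ("local", "ou") or g.enforced]
    # Non-enforced links: LSDOU order, higher link-order number applied first.
    normal = sorted((g for g in links if not g.enforced),
                    key=lambda g: (LEVELS[g.level], -g.link_order))
    # Enforced links are applied afterwards, most specific first, so the
    # enforced GPO highest in the hierarchy is applied last and wins.
    enforced = sorted((g for g in links if g.enforced),
                      key=lambda g: (-LEVELS[g.level], -g.link_order))
    result = {}
    for gpo in normal + enforced:
        result.update(gpo.settings)
    return result


links = [
    GpoLink("Local", "local", settings={"wallpaper": "default", "lock_minutes": 30}),
    GpoLink("Domain Baseline", "domain", settings={"lock_minutes": 15, "usb": "allowed"}),
    GpoLink("Sales A", "ou", link_order=2, settings={"wallpaper": "sales-a"}),
    GpoLink("Sales B", "ou", link_order=1, settings={"wallpaper": "sales-b", "lock_minutes": 10}),
]
print(resultant_settings(links))
```

In this example the OU-linked GPO with link order 1 decides both `wallpaper` and `lock_minutes`, while `usb` survives from the domain GPO because nothing later conflicts with it. Marking "Domain Baseline" as enforced would make its `lock_minutes` value win instead.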
GPO Tooling & Troubleshooting
| Tool / Command | What it does |
|---|---|
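| `gpupdate /force /sync` | Reapply all GPO settings on the client now; `/sync` makes the next boot or logon process policy synchronously |
| `gpresult /R` | Summary of which GPOs applied or were denied |
| `gpresult /z` | Verbose RSOP output |
| `gpresult /s <host> /user <user> /r` | Remote RSOP for a specific machine and user (`/u` and `/p` supply the credentials to connect with, if needed) |
| `gpedit.msc` | Local Group Policy editor |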
| GPMC | Group Policy Management Console — full forest-wide GPO view |
| RSOP report | Resultant Set of Policy — the combined effect of all applied GPOs |
| Group Policy Modeling | Predicts which policies will apply before you make changes |
| `w32tm /resync` | Force time resync (fix Kerberos auth failures caused by clock skew) |
| `Resolve-DnsName -Type SRV -Name _ldap._tcp.dc._msdcs.DOMAIN.NAME` | Verify DC SRV records in DNS (key AD health check) |
GPO Troubleshooting Checklist
- Check the GPO scope — is it linked to the right OU/domain?
- Check security filtering — does the target user or computer have both the Read and the Apply Group Policy permissions on the GPO?
- Check Group Policy delegation settings
- Is the relevant config section (Computer or User) enabled?
- Confirm the LSDOU processing order — is a higher-priority GPO overriding?
- Ensure the GPO link is enabled (not just created)
- Is an upstream GPO set to Enforced, blocking your override?
- Is the affected OU set to Block Inheritance?
- Is Loopback processing enabled? (changes how User Config applies to shared machines)
- Check WMI filters — an unexpected filter may exclude the target
- Verify your expectations match the GPO setting's actual behavior
OpenLDAP
- OpenLDAP is a widely used open-source LDAP implementation. It runs on most major operating systems and operates similarly to Active Directory's LDAP layer.
- Two management interfaces:
- phpLDAPadmin — web-based GUI; easiest for browsing and editing entries
- CLI tools — scriptable; required for automation
| Command | What it does |
|---|---|
| `sudo apt-get install slapd ldap-utils` | Install the OpenLDAP server (slapd) and CLI utilities |
| `sudo dpkg-reconfigure slapd` | Run the interactive setup wizard (domain, admin password, etc.) |
| `ldapadd` | Add an entry from an LDIF file |
| `ldapmodify` | Modify an existing entry using an LDIF file |
| `ldapdelete` | Delete an entry by DN |
| `ldapsearch` | Search the directory and return matching entries |
- LDIF (LDAP Data Interchange Format) — plain-text files that describe directory entries or changes. The workflow is: write an LDIF file → run the appropriate `ldap*` command. That's it.

    dn: uid=jane,ou=people,dc=example,dc=com
    objectClass: inetOrgPerson
    cn: Jane Smith
    sn: Smith
    uid: jane
    mail: jane@example.com
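For bulk work, LDIF like the entry above is usually generated rather than typed. The following is a minimal sketch of that idea; `to_ldif` is a hypothetical helper, and it assumes plain ASCII values (real LDIF requires base64 encoding for some values, which this sketch does not handle).

```python
def to_ldif(dn, attributes):
    """Render one directory entry as LDIF text (ASCII values only)."""
    lines = [f"dn: {dn}"]
    for name, values in attributes.items():
        if isinstance(values, str):
            values = [values]  # allow single values as well as lists
        for value in values:
            lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


entry = to_ldif(
    "uid=jane,ou=people,dc=example,dc=com",
    {
        "objectClass": "inetOrgPerson",
        "cn": "Jane Smith",
        "sn": "Smith",
        "uid": "jane",
        "mail": "jane@example.com",
    },
)
print(entry)
```

The output could be saved to a file such as `jane.ldif` and loaded with something like `ldapadd -x -D "cn=admin,dc=example,dc=com" -W -f jane.ldif`, where the admin DN is an assumption that depends on how slapd was configured.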
Review Questions
- What is a directory server and what does it map?
- What two benefits does directory replication provide?
- What does AAA stand for in the context of centralized management?
- What does the LDAP Bind operation do?
- Name the three LDAP authentication modes and describe each briefly.
- What is the difference between a Security Group and a Distribution Group in Active Directory?
- What are the three AD group scopes and when would you use each?
- What is LSDOU and why does it matter?
- A GPO exists and is linked but isn't applying to a user. Name three things you'd check first.
- What clock-skew limit causes Kerberos authentication to fail, and what command fixes it?
- What is an LDIF file and how is it used with OpenLDAP?
- What is the difference between a GPO Policy setting and a GPO Preference setting?
Answers
- A directory server is a lookup service that maps network resources (users, groups, devices) to their network addresses — like a searchable phone book for IT infrastructure.
- Replication eliminates single points of failure (if one server goes down, replicas keep serving) and reduces latency (clients query the nearest replica).
- Authentication (prove identity), Authorization (determine what's allowed), Accounting (log what was done).
- The Bind operation authenticates the client to the directory server, establishing the identity the client will act under for subsequent operations.
- Anonymous — no credentials, read-only public access; Simple — username and password (plaintext, use only over TLS); SASL — pluggable auth layer supporting Kerberos and TLS for secure authentication.
- Security Groups can be granted permissions to resources. Distribution Groups are email-only — they cannot be used to assign access rights.
- Domain Local — assign permissions to a specific resource; Global — group accounts into a role within a domain; Universal — group global roles across an entire forest.
- LSDOU is the GPO processing order: Local → Site → Domain → OU. Later settings override earlier ones when there are conflicts, so the most specific (OU-linked) GPO wins.
- Check the GPO link is enabled; check security filtering (Read + Apply permissions); check whether an upstream GPO is set to Enforced and overriding it.
- More than 5 minutes of clock skew causes Kerberos to fail. Fix it with `w32tm /resync`.
- An LDIF (LDAP Data Interchange Format) file is a plain-text file that describes a directory entry or a change to one. You write the LDIF, then pass it to a command like `ldapadd` or `ldapmodify`.
- Policies are enforced settings re-applied at every Group Policy refresh (every 90 minutes by default, plus a random offset); users cannot override them. Preferences are template defaults that users can change after the GPO applies them.
5 · Data Recovery & Backups
Planning for Data Recovery
- Data recovery — the process of restoring data after an unexpected event involving data loss or corruption. The main objective is to resume normal operations as soon as possible.
- The most important technique is backing up regularly. Restoring from backup is always the best solution when one exists.
- Backup systems should prioritize important/essential data and account for future growth.
- On-site backups — data is physically nearby, so restores are fast and need no WAN bandwidth. Risk: a local disaster destroys all copies.
- Off-site backups — data is safer because copies live in more than one location. Requires physical security, encryption (TLS preferred for data in transit), and sufficient bandwidth. Best practice is to maintain both on-site and off-site backups.
- Long-term archival — magnetic tape is the standard medium: inexpensive and long-lasting.
- Hard disks will fail eventually — redundancy and regular backups are non-negotiable.
- User backups — can be handled via SaaS solutions (Dropbox, iCloud, etc.) that are easy for users to configure.
- OS built-in tools:
- macOS → Time Machine
- Windows → Backup & Restore
- Linux → `rsync` — a file-transfer utility designed to efficiently copy and sync files, transferring only changed data
- Regular testing — backups must be tested regularly. Restoration procedures must be documented and exercised at least once a year (disaster recovery testing).
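To make "transferring only changed data" concrete for `rsync`: by default it skips any file whose size and modification time already match on the destination. The sketch below imitates just that first check with hypothetical file metadata; it is not rsync's implementation and ignores the delta transfer rsync performs inside changed files.

```python
def files_to_transfer(source, destination):
    """Return paths that are missing on the destination or differ in size/mtime."""
    return sorted(
        path for path, meta in source.items()
        if destination.get(path) != meta
    )


# Hypothetical metadata: path -> (size in bytes, modification timestamp)
source = {
    "report.doc": (2048, 1_700_000_500),
    "photo.jpg": (900_000, 1_690_000_000),
    "notes.txt": (120, 1_700_000_900),
}
destination = {
    "report.doc": (2048, 1_700_000_100),   # older copy, so it is re-sent
    "photo.jpg": (900_000, 1_690_000_000),  # identical, so it is skipped
}
print(files_to_transfer(source, destination))  # ['notes.txt', 'report.doc']
```

A real run would look something like `rsync -a source/ backup/`, with the directory names chosen here only as placeholders.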
Recovery Objectives
| Term | Definition | Example |
|---|---|---|
| RPO — Recovery Point Objective | Maximum acceptable amount of data loss measured in time — how far back can you afford to go? | RPO of 4 h means losing up to 4 h of data is tolerable, so backups must run at least every 4 h |
| RTO — Recovery Time Objective | Maximum acceptable downtime — how quickly must systems be back online? | RTO of 2 h means you must restore service within 2 h of the incident |
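The two objectives reduce to simple comparisons, which a short sketch can show. The function names and the example figures are hypothetical, and the model assumes the worst-case data loss equals the gap between backups.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst-case data loss is one full backup interval."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time, rto):
    """Service must be back within the tolerated downtime."""
    return estimated_restore_time <= rto


rpo = timedelta(hours=4)
rto = timedelta(hours=2)

print(meets_rpo(timedelta(hours=6), rpo))      # False: backups are too far apart
print(meets_rpo(timedelta(hours=1), rpo))      # True
print(meets_rto(timedelta(minutes=90), rto))   # True: restore fits in the window
```

The restore-time estimate is only trustworthy if it comes from an actual restore drill, which is one reason regular testing matters.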
Backup Types
- Full backup — copies all data every time. Simple to restore but slow and storage-heavy for largely-static data.
- Differential backup — copies everything changed since the last full backup. Saves storage vs. repeated full backups; restore requires last full + latest differential.
- Incremental backup — copies only data changed since the last backup of any type. Most storage-efficient; restore requires last full + every incremental in order.
- Best practice: infrequent full backups combined with more frequent differential or incremental backups.
- Compression — backups can be compressed to save space, but not all data types compress well. Compressed files must be decompressed before restoration.
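The restore requirements for each backup type can be worked out mechanically. This sketch follows the definitions given above (a differential covers everything since the last full; an incremental covers everything since the last backup of any type). The backup labels are hypothetical.

```python
def restore_chain(backups):
    """Given (label, kind) pairs in chronological order, return the
    backups needed, in order, to restore the most recent state."""
    chain = []
    for label, kind in backups:
        if kind == "full":
            chain = [label]              # a full backup starts a new chain
        elif kind == "differential":
            if not chain:
                raise ValueError("differential backup without a prior full")
            chain = [chain[0], label]    # full + latest differential only
        elif kind == "incremental":
            if not chain:
                raise ValueError("incremental backup without a prior full")
            chain.append(label)          # every incremental is required
        else:
            raise ValueError(f"unknown backup kind: {kind}")
    return chain


incremental_week = [("sun", "full"), ("mon", "incremental"),
                    ("tue", "incremental"), ("wed", "incremental")]
differential_week = [("sun", "full"), ("mon", "differential"),
                     ("tue", "differential"), ("wed", "differential")]

print(restore_chain(incremental_week))   # ['sun', 'mon', 'tue', 'wed']
print(restore_chain(differential_week))  # ['sun', 'wed']
```

The trade-off is visible in the output: incrementals store less each night but need the whole chain intact, while differentials grow over the week but restore from just two backups.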
RAID Levels
| Level | Name | Min Drives | Fault Tolerance | Notes |
|---|---|---|---|---|
| RAID 0 | Striping | 2 | None — any drive failure = total loss | Best performance; no redundancy |
| RAID 1 | Mirroring | 2 | 1 drive in a two-drive mirror (all but one drive in general) | Exact copy on all drives; 50% storage efficiency with two drives |
| RAID 5 | Striping + Distributed Parity | 3 | 1 drive | Parity distributed across all drives; good balance of speed, capacity, redundancy |
| RAID 6 | Striping + Double Parity | 4 | 2 drives | Like RAID 5 but calculates two sets of parity; safer for large arrays |
| RAID 10 | Mirroring + Striping (1+0) | 4 | 1 per mirrored pair | Mirrors first, then stripes the mirrored sets; best performance + redundancy, highest cost |
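The capacity and fault-tolerance figures in the table follow from simple arithmetic. The sketch below is a hypothetical helper that assumes equal-sized drives and, for RAID 10, two-way mirrors; it reports the number of failures the array is guaranteed to survive, not the best case.

```python
def raid_summary(level, drives, drive_tb):
    """Return usable capacity (TB) and guaranteed drive-failure tolerance."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} drives")
    if level == 10 and drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")

    if level == 0:
        usable, tolerance = drives * drive_tb, 0         # striping only
    elif level == 1:
        usable, tolerance = drive_tb, drives - 1         # every drive is a copy
    elif level == 5:
        usable, tolerance = (drives - 1) * drive_tb, 1   # one drive of parity
    elif level == 6:
        usable, tolerance = (drives - 2) * drive_tb, 2   # two drives of parity
    else:  # RAID 10
        usable, tolerance = (drives // 2) * drive_tb, 1  # one failure per pair
    return {"usable_tb": usable, "guaranteed_failures": tolerance}


print(raid_summary(5, 4, 2))   # {'usable_tb': 6, 'guaranteed_failures': 1}
print(raid_summary(10, 4, 2))  # {'usable_tb': 4, 'guaranteed_failures': 1}
```

With four 2 TB drives, RAID 5 yields more usable space than RAID 10, which is the capacity side of the cost trade-off noted in the table.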
Disaster Recovery Plans
- A disaster recovery plan (DRP) is a collection of documented procedures for reacting to an emergency or disaster scenario, with the goal of minimizing disruption to business operations.
- A DRP covers three categories of measures:
- Preventative — actions taken to reduce the likelihood of a disaster
- Detection — monitoring and alerting systems that identify an incident quickly
- Corrective / Recovery — steps to restore operations after an incident
- Risk assessment — identify high-risk systems; pay special attention to systems that lack redundancy.
- Determine which data/systems are highest priority and have a data recovery plan ready for each.
- Verify that all operational documentation is current and accessible.
- Define and test detection and alert measures so the team knows about an incident immediately.
- Document and test recovery procedures in detail.
| DR Site Type | Description | Recovery Time |
|---|---|---|
| Hot site | Fully operational duplicate environment, always running and in sync | Minutes |
| Warm site | Hardware and connectivity ready; data must be restored from backup before going live | Hours |
| Cold site | Physical space and power only; all equipment and data must be provisioned from scratch | Days–weeks |
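One way to read the table is to start from the RTO and pick the least elaborate site type that can still meet it. The cutoffs below (one hour and one day) are illustrative assumptions that roughly match the "minutes / hours / days" column, and the idea that a cold site is the cheapest option is also an assumption rather than something stated above.

```python
from datetime import timedelta


def site_type_for(rto):
    """Pick the least elaborate DR site that can plausibly meet the RTO."""
    if rto < timedelta(hours=1):
        return "hot"    # only an always-running duplicate recovers in minutes
    if rto < timedelta(days=1):
        return "warm"   # hardware is ready; restoring data takes hours
    return "cold"       # provisioning from scratch takes days to weeks


print(site_type_for(timedelta(minutes=15)))  # hot
print(site_type_for(timedelta(hours=8)))     # warm
print(site_type_for(timedelta(days=5)))      # cold
```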
Post-Mortems
- A post-mortem is created after an incident, outage, or event — or at the end of a project — to analyze what happened and how to improve.
- Purpose: to learn, not to punish or shame. Understanding is the goal.
- Standard post-mortem structure:
- Brief summary — what the incident was, how long it lasted, what the impact was, and how it was resolved
- Detailed timeline — every significant event and every attempt to fix the issue, with exact dates, times, and time zones
- Root cause — the underlying reason the incident occurred and what can be learned from it
- Resolution & recovery — detailed account of the steps taken to fix the problem and restore service
- Action items — concrete steps to prevent the same scenario in the future
- Highlight both what went wrong and what went well — cover everything.
- Post-mortems also follow the end of a project to capture lessons learned even when nothing went wrong.
Review Questions
- What is the difference between RPO and RTO?
- What does the 3-2-1 backup rule say?
- What is the difference between a differential and an incremental backup?
- Why is RAID not considered a backup solution?
- What is the standard medium for long-term archival storage and why?
- Describe the three types of disaster recovery sites.
- What are the five sections of a standard post-mortem report?
- What does `rsync` do on Linux?
- How often should disaster recovery procedures be formally tested?
- What are the three categories of measures a disaster recovery plan covers?
Answers
- RPO (Recovery Point Objective) = the maximum acceptable amount of data loss measured in time. RTO (Recovery Time Objective) = the maximum acceptable downtime — how quickly systems must be restored after an incident.
- Keep 3 copies of data, on 2 different media types, with 1 copy stored off-site.
- Differential: copies everything changed since the last full backup. Incremental: copies only what changed since the last backup of any type — most storage-efficient but requires the full + every incremental to restore.
- RAID provides storage redundancy but doesn't protect against accidental deletion, ransomware, or a site-wide disaster. It's a storage solution — all copies live together and can be destroyed together.
- Magnetic tape — it is inexpensive and long-lasting compared to hard drives, making it cost-effective for large volumes of rarely-accessed archive data.
- Hot site — fully operational duplicate, minutes to fail over. Warm site — hardware ready, data must be restored from backup, hours to bring online. Cold site — physical space and power only, everything must be provisioned from scratch, days to weeks.
- 1) Brief summary, 2) Detailed timeline, 3) Root cause, 4) Resolution & recovery, 5) Action items to prevent recurrence.
- A file-transfer utility that efficiently copies and syncs files, transferring only data that has changed since the last sync.
- At least once a year.
- Preventative measures, Detection measures, and Corrective/Recovery measures.
6 · Final Project
How to use these case studies
- Read the company snapshot and the problems identified — then try to work out your own approach before revealing mine.
- The analysis is my own, written during the course. It isn't a model answer or AI-generated — it's how I thought through each scenario at the time. Your approach is most likely different so don't take mine as gospel! Lots of different approaches work!
- Each point is tagged § Section N to show which part of the course I was drawing on.
Scenario 1 — Network Funtime Company
| Factor | Current state |
|---|---|
| Industry | Open source software |
| Employees | ~100 (engineers, designers, HR, sales) |
| Hardware | HR buys cheapest available laptop per hire — every machine is a different model, nothing is labeled or inventoried |
| Onboarding | HR hands a blank laptop to the new hire; employee installs their own OS and software |
| IT support | HR is the de-facto IT contact — loses several hours per new hire |
| Passwords | No requirements and no recovery system — lost passwords result in a full reimage |
| Services | Cloud SaaS (email, docs, sheets) + Slack; no in-house infrastructure |
Problems identified
- Procurement is reactive and inefficient — one laptop at a time at the cheapest price, no vendor relationship, no bulk deal
- Every employee has different hardware — impossible to standardize support, drivers, or images
- No asset inventory — no way to audit what the company owns or track machine lifecycle
- Employees self-installing software risks bloatware, misconfiguration, and security holes
- HR is doing IT work and burning hours per hire — not scalable and not their job
- No password policy and no recovery path — lost credential = wipe and start over
- Cloud service access handed out ad-hoc with no central directory or SSO
My approach
- Establish a vendor relationship and standardize hardware by role. Engineers and designers need stronger specs than sales; negotiate two or three laptop tiers with a vendor and buy in bulk. Saves money and makes support predictable. §1 Hardware Lifecycle
- Create an asset inventory. Label every machine and log it in a directory — track owner, model, age, maintenance history, and lifecycle stage. Nothing disappears silently. §4 Directory Services
- Build pre-configured disk images per department role. A developer image ships with the right OS, dev tools, and security baselines pre-installed. A designer image ships with creative tools. HR flashes a new hire in minutes — no self-installs, no surprises. §2 OS Imaging
- Move IT onboarding off HR's plate. IT owns onboarding. New employees submit support tickets — HR is no longer the IT contact. Saves HR hours per hire and improves response quality. §1 Sys Admin Responsibilities
- Implement a directory service with password policy and self-service recovery. Active Directory or a cloud equivalent enforces complexity requirements and lets IT reset credentials without reimaging the machine. §4 Active Directory
- Provision cloud services through the directory (SSO). Instead of HR handing out individual logins, integrate SaaS tools with a central identity provider so accounts are created and revoked automatically. §4 Directory Services
- Audit SaaS costs once the above is in place. You need a baseline before you can tell if you're overspending. Without seeing a balance sheet now, defer this until operations are stable. §3 Managing Cloud Services
Scenario 2 — W.D. Widgets
| Factor | Current state |
|---|---|
| Industry | Widget sales (client-facing, revenue-generating) |
| Employees | 80–100 now; expecting hundreds of new hires within a year |
| Hardware | Ordered directly from a business vendor; one or two spare machines kept on hand — solid process |
| OS & directory | Windows only; managed via Windows Active Directory |
| Onboarding | IT manually sets up each machine and installs a long list of sales applications — time consuming |
| File server | All customer data on a single file server mapped to each salesperson's machine; folder creator owns it — anyone can delete anything |
| Backups | None |
| IT support | Email only — no ticketing system |
| Services | Fully in-house: email server, local software, instant messenger — no cloud |
Problems identified
- Manual machine setup with long app installs will not scale to hundreds of new hires — IT will be a bottleneck
- File server has no access controls — any employee can permanently delete shared customer data
- Zero backups — a single failure wipes everything, with no recovery path
- Email-only IT support will be overwhelmed as headcount grows; no prioritization or tracking
- In-house everything at rapid scale means IT team needs to grow or infrastructure will buckle
My approach
- Build a pre-configured sales disk image and deploy it via PXE. One image per role ships with the OS and every required sales application pre-installed. New machines image themselves over the network — no manual installs, no room for human error. At hundreds of hires a year, this is the only viable path. §2 OS Imaging & PXE
- Lock down the file server immediately. Salespeople should have read/write access but not delete, granted through AD security groups on the shared folders (GPOs can help push those permissions out). Assign delete/ownership permissions only to department leads. This protects customer data from accidental or malicious wipes. §4 Group Policy Objects
- Implement a disaster recovery plan — starting with backups. With no backups on a single server, one hardware failure ends the business. Set up on-site and off-site backup servers on a regular schedule, and run recovery drills. The company has revenue — use it. §5 Data Recovery & RAID
- Replace email support with a priority ticketing system. At scale, email is untrackable and unprioritizable. A ticketing system logs every request, tracks resolution, and lets IT triage by severity. §1 Sys Admin Responsibilities
- Grow the IT team. Running in-house email, IM, software, and file servers at hundreds of employees requires dedicated headcount. Understaffed IT on in-house infra is a single point of failure. §1 Sys Admin Responsibilities
Scenario 3 — Dewgood (Non-Profit)
| Factor | Current state |
|---|---|
| Industry | Local non-profit, not planning to grow |
| Employees | ~50 |
| Hardware | Purchased from a physical retail store on the day the hire starts — whatever is in stock that day |
| OS & directory | Windows Active Directory — but departing employees are never disabled |
| Onboarding | IT logs them in, installs software, maps the file server to their machine manually |
| Infrastructure | Single server running file services and email; company website also hosted on this same server |
| Website | Static single-page site (mission, contact info) — goes down frequently, nobody knows how to fix it |
| Backups | Nightly backups to a disk that IT takes home every day |
| IT support | Open source ticketing system exists but nobody uses it — employees contact IT directly |
Problems identified
- Day-of retail procurement is the most expensive and least reliable option — whatever's in stock wins
- Departing employees stay active in AD — security risk and directory clutter
- Backup disk goes home with IT every night — one lost or damaged drive means data loss
- Single server for files, email, and website — no redundancy; one crash takes everything offline
- Static website going down on shared hardware is an unmonitored single point of failure
- Ticketing system exists but has zero adoption — IT support is informal and untracked
My approach
- Improve procurement — give HR advance notice and find cheaper sourcing. Ask HR to flag hires before the start date so you can shop around. Refurbished hardware or a small vendor relationship beats retail on price every time. At 50 employees with a tight budget, every dollar matters. §1 Hardware Lifecycle
- Create a disk image for new hires — but keep the human touch. At this size, face-to-face onboarding is a genuine benefit — a small company values that relationship. Still, use a pre-built image so you're not manually installing software and mapping drives every time. Flash, then walk them through it together. §2 OS Imaging
- Create an AD offboarding group and enforce account disabling on departure. Departed employees with active accounts are a security hole. A simple policy — HR notifies IT, IT disables the account that day — closes it immediately. §4 Active Directory
- Move backups to a proper offsite location — not your home. A personal residence is not a secure storage environment. Set up a second drive at a different physical location (a trusted partner's office, a rented storage unit) and automate the rotation. Nightly frequency is fine; the storage location is the problem. §5 Disaster Recovery
- Move the static website to a free cloud host. The site is a single static HTML page with no backend. Services like GitHub Pages, Netlify, or Cloudflare Pages host it for free and are far more reliable than a single overloaded on-site server, with no hardware to maintain and no mysterious downtime. Free up the server for what actually needs to be on-site. §3 Platform Services
- Invest in getting the ticketing system adopted. At 50 employees, informal IT is manageable but tickets add real value: they log patterns, track resolution time, and let you prioritize. Make the case kindly — explain the benefit to employees, not just IT. If the current system is too confusing, explore a simpler alternative or ask finance for a small budget. §1 Sys Admin Responsibilities
- Single server is acceptable at this scale — but monitor it closely. Schedule maintenance windows (late nights, weekends) and set up monitoring so you know before something fails. If the server does go down, you have a plan. §2 Managing System Services