2 videos 📅 2025-01-29 09:00:00 Africa/Blantyre
1:23:30
2025-01-29 11:33:43
3:10:55
2025-01-30 10:51:38

Course recordings hosted on the DaDesktop training platform

Visit NobleProg websites for related courses

Summary

Overview

This session is a technical course on MongoDB deployment, focusing on sharding, replica sets, config servers, and troubleshooting common setup errors. The instructor guides learners through hands-on configuration of a MongoDB sharded cluster, including creating data directories, starting config and shard servers, initiating replica sets, enabling sharding, and using mongos as a query router. The session also includes a significant segment on contract review and employment terms, unrelated to the technical content, indicating a mixed-format recording (course + personal/administrative discussion). The core instructional goal is to equip learners with practical skills to deploy and manage a production-grade MongoDB sharded environment, emphasizing logical sequencing, port management, indexing requirements for shard keys, and the distinction between replica sets and deprecated master-slave replication.

Topic (Timeline)

1. MongoDB Setup Troubleshooting and Log Monitoring [00:00:06.130 - 00:03:45.780]

  • Learners encounter connection failures to MongoDB due to misconfigured paths or missing dependencies.
  • Instructor guides installation of lnav (log analyzer) via apt install lnav.
  • Demonstrates tailing the MongoDB log file: /var/log/mongodb/mongod.log.
  • Identifies configuration override error: MongoDB config override environment variable set to 1, causing startup issues.
  • Uses Ctrl+C to interrupt processes and systemctl status to check service state.
  • Emphasizes correct file path syntax: /var/log/mongodb/, not /var/mongo/.
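The log-monitoring workflow above can be sketched as a few shell commands; the paths are the Debian/Ubuntu defaults used in the course, and service names may differ on other distributions:

```shell
# Install the lnav log analyzer (Debian/Ubuntu)
sudo apt install lnav

# Check whether the mongod service is running and see why it may have failed
systemctl status mongod

# Follow the MongoDB log live -- the correct path is /var/log/mongodb/, not /var/mongo/
tail -f /var/log/mongodb/mongod.log

# Or browse the same log interactively with lnav (q to quit, Ctrl+C to interrupt)
lnav /var/log/mongodb/mongod.log
```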

2. Authentication and Password Configuration for Sharding [00:03:45.780 - 00:08:29.250]

  • Instructor takes over learner’s session to resolve authentication failure during sharding setup.
  • Identifies root cause: user failed to authenticate with correct credentials.
  • Guides user to locate password change instructions in “Exercise Day One” documentation.
  • Instructs use of mongo --host <host> --port <port> --username admin --password <new_password> to authenticate.
  • Reinforces that authentication errors are self-explanatory: “It was actually telling you that it needs authentication.”
  • Confirms successful login after applying correct password from documentation.
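The authentication step can be sketched as below; host, port, and password are placeholders to be filled in from the "Exercise Day One" documentation, and --authenticationDatabase admin is an assumption (the usual home of the admin user), not something confirmed in the session:

```shell
# Authenticate with the admin credentials from the exercise documentation.
# <host>, <port>, and <new_password> are placeholders, not literal values.
mongo --host <host> --port <port> \
      --username admin \
      --password <new_password> \
      --authenticationDatabase admin   # assumed: admin user defined in the admin database
```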

3. Sharded Cluster Architecture: Config Servers and Shard Initialization [00:08:29.510 - 00:17:35.170]

  • Clarifies that creating data directories (/data/db, /data/rs1, etc.) is necessary but insufficient.
  • Emphasizes that config servers must be explicitly started on dedicated ports (e.g., 27019, 27020, 27021).
  • Demonstrates starting config server instances with mongod --configsvr --port <port> --dbpath <path>.
  • Guides user to connect to a config server using mongo --port <port> and initiate replica set with rs.initiate().
  • Confirms successful replica set formation with rs.status() showing one primary and two secondaries.
  • Notes that learners mistakenly attempted to initiate replica set on port 20 instead of 22 (correct primary).
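A sketch of the config-server sequence described above. The replica set name cfgrs, the /data/cfg* directories, and the localhost hostnames are illustrative assumptions; only the ports come from the session:

```shell
# Data directories for the three config server members
mkdir -p /data/cfg1 /data/cfg2 /data/cfg3

# Start each config server member on its own port; since MongoDB 3.4 the
# config servers must themselves form a replica set (hence --replSet)
mongod --configsvr --replSet cfgrs --port 27019 --dbpath /data/cfg1 --fork --logpath /data/cfg1/mongod.log
mongod --configsvr --replSet cfgrs --port 27020 --dbpath /data/cfg2 --fork --logpath /data/cfg2/mongod.log
mongod --configsvr --replSet cfgrs --port 27021 --dbpath /data/cfg3 --fork --logpath /data/cfg3/mongod.log

# Connect to one member and initiate the set with all three members
mongo --port 27019 --eval 'rs.initiate({
  _id: "cfgrs",
  configsvr: true,
  members: [
    { _id: 0, host: "localhost:27019" },
    { _id: 1, host: "localhost:27020" },
    { _id: 2, host: "localhost:27021" }
  ]
})'

# rs.status() should now show one PRIMARY and two SECONDARY members
mongo --port 27019 --eval 'rs.status()'
```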

4. Shard Replica Set Configuration and Cluster Integration [00:17:37.670 - 00:23:14.790]

  • Learners create shard replica sets on ports 27022, 27023, 27024, 27033.
  • Instructor confirms rs.status() output: 27022 (primary), 27023/24/33 (secondaries).
  • Identifies critical error in instructions: learners were instructed to connect to port 20 (invalid) instead of initiating on 27022.
  • Corrects documentation: remove erroneous step to log into port 20; initiate replica set on first shard node (27022).
  • Confirms config server replica set is separate from shard replica sets.
  • Verifies that mongos (query router) is not yet running; must be started after config and shard servers.
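The shard replica set steps can be sketched the same way; the set name shard1rs and the data paths are assumptions, while the ports are the ones used in the session (a fourth member on 27033 would follow the same pattern):

```shell
# Data directories for the shard members
mkdir -p /data/rs1 /data/rs2 /data/rs3

# Start the shard members (--shardsvr marks them as shard nodes)
mongod --shardsvr --replSet shard1rs --port 27022 --dbpath /data/rs1 --fork --logpath /data/rs1/mongod.log
mongod --shardsvr --replSet shard1rs --port 27023 --dbpath /data/rs2 --fork --logpath /data/rs2/mongod.log
mongod --shardsvr --replSet shard1rs --port 27024 --dbpath /data/rs3 --fork --logpath /data/rs3/mongod.log

# Initiate on the FIRST shard node (27022), not on the invalid "port 20"
mongo --port 27022 --eval 'rs.initiate({
  _id: "shard1rs",
  members: [
    { _id: 0, host: "localhost:27022" },
    { _id: 1, host: "localhost:27023" },
    { _id: 2, host: "localhost:27024" }
  ]
})'
```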

5. mongos Router Configuration and Sharding Enablement [00:23:14.790 - 00:29:09.840]

  • Starts mongos process with mongos --configdb <config_repl_set_name>/<host:port> pointing to config server replica set.
  • Connects to mongos using mongo --port 27010 (the port chosen for mongos in this setup; mongos listens on 27017 by default unless started with --port).
  • Enables sharding on target database: sh.enableSharding("<db_name>").
  • Creates hashed shard key on collection: sh.shardCollection("<db_name>.<collection>", { <field>: "hashed" }).
  • Notes prerequisite: must create index on shard key field before sharding (db.collection.createIndex({<field>: "hashed"})).
  • Confirms sharding status with sh.status(): shows shards, balancer state, chunk distribution, and config database metadata.
  • Identifies documentation typo: “shard two replica set” should be “shard one replica set” for consistency.
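The router and sharding-enablement steps might look as follows; cfgrs, shard1rs, mydb, mycoll, and the userId field are illustrative placeholders, not names confirmed by the recording:

```shell
# Start mongos, pointing it at the config server replica set
mongos --configdb cfgrs/localhost:27019,localhost:27020,localhost:27021 \
       --port 27010 --fork --logpath /data/mongos.log

# Connect to mongos (not to a shard node) and enable sharding
mongo --port 27010 --eval '
  sh.addShard("shard1rs/localhost:27022");                          // register the shard replica set
  sh.enableSharding("mydb");                                        // enable sharding on the database
  db.getSiblingDB("mydb").mycoll.createIndex({ userId: "hashed" }); // index the shard key first
  sh.shardCollection("mydb.mycoll", { userId: "hashed" });          // then shard the collection
  sh.status();                                                      // shards, balancer, chunk distribution
'
```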

6. Monitoring and Differentiating mongos from Replica Set Nodes [00:29:09.840 - 00:34:55.910]

  • Clarifies confusion between mongostat (monitoring tool) and mongo shell.
  • Instructs to run mongostat --host localhost:27010 (on mongos router) vs. mongostat --host localhost:27022 (on shard node).
  • Explains key diagnostic difference: mongostat output on mongos shows no replica set name; on shard node, it shows “shard one replica set”.
  • Reinforces that mongos is not a replica set — it’s a stateless router that queries config servers to route operations.
  • Uses db.isMaster() on mongos to confirm it’s not a replica set member.
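The diagnostic comparison can be sketched like this; the 27010/27022 ports follow the cluster layout above, and the exact mongostat column set varies by version:

```shell
# Against the mongos router: no replica set name appears in the output
mongostat --host localhost:27010

# Against a shard member: the output includes the replica set name and node role
mongostat --host localhost:27022

# From the shell, confirm mongos is a router rather than a replica set member --
# the reply carries msg: "isdbgrid" and no setName field
mongo --port 27010 --eval 'db.isMaster()'
```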

7. Replica Set Fundamentals: Config Server vs. Shard Server [00:34:55.910 - 00:39:16.140]

  • Distinguishes between config server replica set (stores metadata) and shard replica set (stores user data).
  • Confirms config server uses ports 27019–27021; shard replica sets use higher ports (e.g., 27022+).
  • Notes that mongos does not require a replica set — it only needs connection to config server replica set.
  • Emphasizes that replica sets provide high availability and automatic failover, unlike deprecated master-slave replication.
  • Mentions lunch break at 12:30, pausing technical instruction.

8. Contract Review and Employment Terms Discussion [00:39:16.140 - 01:04:18.460]

  • Session shifts to personal contract negotiation with employer (CyberFox).
  • Learner reviews fixed-term contract terms: 12-month duration, probation (3 months), 30-day notice period.
  • Disputes clause requiring daily reporting to CyberFox office; clarifies initial understanding was 2–3 days/week at client site (Investech), rest remote.
  • Confirms understanding: “I work based on what the client wants” — remote unless client requires in-office presence.
  • Notes non-compete clause: 6-month restriction from working with CyberFox clients after separation.
  • Discusses confidentiality, intellectual property, data protection, and damages for premature contract termination.
  • Learner seeks clarification on “place of work” clause, asserting misalignment with verbal agreement.

9. Sharding Prerequisites: Indexing and Port Conflicts [01:04:24.220 - 01:13:42.620]

  • Learner fails to shard collection due to missing index on shard key field.
  • Instructor demonstrates: db.collection.createIndex({ _id: "hashed" }) must precede sh.shardCollection().
  • Identifies port conflicts: ports 27017, 27019 already in use by default or config servers; new shard replicas must use unused ports (e.g., 27040, 27041, 27042).
  • Reinforces: create data directories first (mkdir -p /data/rs1, etc.), then start mongod with --shardsvr, --port, --dbpath, and --replSet.
  • Confirms replica set initiation on first shard node (e.g., rs.initiate() on port 27040).
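A sketch of the port-conflict check and the indexing prerequisite; the set name shard2rs and the /data/rs4+ directories are illustrative (chosen so they do not collide with directories already in use), and ss/lsof availability depends on the system:

```shell
# See which ports existing mongod/mongos processes already hold before choosing new ones
ss -ltnp | grep mongo          # alternative: sudo lsof -iTCP -sTCP:LISTEN | grep mongo

# Create fresh data directories, then start the new shard members on unused ports
mkdir -p /data/rs4 /data/rs5 /data/rs6
mongod --shardsvr --replSet shard2rs --port 27040 --dbpath /data/rs4 --fork --logpath /data/rs4/mongod.log
mongod --shardsvr --replSet shard2rs --port 27041 --dbpath /data/rs5 --fork --logpath /data/rs5/mongod.log
mongod --shardsvr --replSet shard2rs --port 27042 --dbpath /data/rs6 --fork --logpath /data/rs6/mongod.log

# Initiate on the first shard node, then -- via mongos -- create the hashed
# index on the shard key BEFORE sharding the collection
mongo --port 27040 --eval 'rs.initiate()'
mongo --port 27010 --eval '
  db.getSiblingDB("mydb").mycoll.createIndex({ _id: "hashed" });
  sh.shardCollection("mydb.mycoll", { _id: "hashed" });
'
```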

10. Replica Set vs. Master-Slave Replication and Failover [01:13:44.780 - 01:20:12.610]

  • Clarifies that master-slave replication is deprecated; replica sets are the modern standard with automatic failover.
  • Explains that in master-slave, manual intervention is required to promote slave to master if primary fails.
  • In replica sets, election protocol automatically promotes a secondary to primary if the primary becomes unreachable.
  • Emphasizes that replica sets support data durability via write concern: with a write concern such as w: "majority", the primary waits for acknowledgment from secondaries before confirming the write.
  • Uses analogy: Oracle (master-slave) vs. MongoDB (replica set) — same concept, but MongoDB adds automation and durability.
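The write-concern behavior described above can be demonstrated with a sketch like the following (the database and collection names are placeholders):

```shell
# Insert through mongos with a majority write concern: the primary only
# acknowledges once a majority of replica set members have persisted the write
mongo --port 27010 --eval '
  db.getSiblingDB("mydb").mycoll.insertOne(
    { name: "durability-test" },
    { writeConcern: { w: "majority", wtimeout: 5000 } }
  )
'
```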

11. Sharded Cluster Architecture Deep Dive [01:20:12.610 - 01:31:18.290]

  • Visualizes architecture: config servers (metadata) → mongos (router) → shard replica sets (data).
  • Explains chunk distribution: data split into ranges (e.g., A–D, E–H) across shards; config server tracks which shard holds which range.
  • When mongos receives a query, it consults config server to route to correct shard.
  • Notes that replica sets within shards ensure each shard’s data is highly available (3+ nodes per shard).
  • Emphasizes scalability: 10+ physical servers can be configured as 2 config servers + 8 shards (each a 3-node replica set) + 1 mongos.
  • Reinforces: “Cluster” = group of servers working together; “shard” = distributed data partition; “replica set” = data redundancy within shard.
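To see this routing in practice, one might inspect chunk placement from mongos; mydb and mycoll are placeholders:

```shell
# sh.status() shows the registered shards, the balancer state, and which
# chunk ranges live on which shard
mongo --port 27010 --eval 'sh.status()'

# Per-collection view: how documents and chunks are spread across shards
mongo --port 27010 --eval 'db.getSiblingDB("mydb").mycoll.getShardDistribution()'
```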

12. Post-Course Access, Next Steps, and Wrap-up [01:31:18.290 - 02:50:00.080]

  • Confirms learners retain access to virtual machine and GitHub repository for self-paced practice.
  • Encourages learners to revisit replica set and sharding concepts, especially in context of Docker/Kubernetes.
  • Offers to schedule follow-up session to reinforce concepts with larger data sets.
  • Notes learner fatigue; acknowledges E2 and Winnie as engaged participants.
  • Concludes with personal and political discussion on South African military deployment in DRC, conscription, and national service — unrelated to technical content.
  • Ends with unrelated audio fragments (personal conversation, podcast-style banter) — not part of course.

Appendix

Key Principles

  • Sharding requires indexing: A hashed index on the shard key must be created before calling sh.shardCollection().
  • Config servers are replica sets: Must be started with --configsvr and initialized with rs.initiate(); they store metadata, not user data.
  • mongos is a router: It is not a database; it routes queries to shards based on config server metadata. Do not run rs.initiate() on mongos.
  • Port isolation: Avoid default ports (27017) and config server ports (e.g., 27019) when creating shard servers. Use unique, non-conflicting ports.
  • Replica sets > master-slave: Modern MongoDB uses replica sets for automatic failover and data redundancy; master-slave is deprecated.

Tools Used

  • apt install lnav — log file analyzer
  • mongod — MongoDB daemon (for config servers and shards)
  • mongos — MongoDB query router
  • mongo — MongoDB shell (for connecting to instances)
  • rs.initiate() — initiates a replica set
  • sh.enableSharding() — enables sharding on a database
  • sh.shardCollection() — shards a collection
  • sh.status() — displays sharding configuration
  • rs.status() — displays replica set status
  • mongostat — real-time performance monitor (run on mongos or shard nodes)

Common Pitfalls

  • Forgetting to create data directories before starting mongod.
  • Using ports already in use (e.g., 27017, 27019) for shard servers.
  • Attempting to shard a collection without a pre-created index on the shard key.
  • Logging into a shard node to run sh.status() — must connect to mongos.
  • Misinterpreting mongostat output: presence of “replica set” label indicates connection to a shard node, not mongos.
  • Confusing config server replica set with shard replica set — they serve different purposes.

Practice Suggestions

  • Recreate the entire sharded cluster from scratch using different port numbers.
  • Simulate a primary node failure: shut down the primary shard node and observe automatic failover via rs.status().
  • Use mongostat to compare output when connected to mongos vs. a shard node.
  • Create a 3-shard cluster with 2 config servers and 1 mongos, then insert 10,000+ documents to observe chunk migration.
  • Review MongoDB’s official sharding documentation to compare with the steps followed in this course.
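The failover exercise above could be run roughly like this, assuming the primary sits on port 27022 with its data in /data/rs1 (both assumptions, matching the layout described earlier):

```shell
# List each member's state (PRIMARY/SECONDARY) to locate the current primary
mongo --port 27022 --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr))'

# Cleanly shut down the primary's mongod
mongod --shutdown --dbpath /data/rs1

# From a surviving member, watch a secondary get elected PRIMARY
mongo --port 27023 --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr))'
```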