Mongo DB for Administrators


language: EN

                  WEBVTT
Faith, please log in to the database again.
So you need to run mongosh first.
Then you can run that stuff.
So run mongosh, and then you can rerun the commands.
Yep.
So you can rerun whatever you want to run.
Hi Kunkulani, after that delete, do I need to continue?
Yes, continue.
I heard Austin saying that we need to copy something from the top again.
No, no, no.
When you drop the database.
So the next one, number two, the last command.
Yeah, it's the DB.
When you drop the database,
you go to the first one.
Then you go to 4.0.
Then you copy that and paste it again.
So yeah.
Yeah.
So for you, Faith, you can still do 2.1 up until the dropping of the database.
After dropping the database, you go back
to number one, 4.1.
Then you'll be able to read the data.
Just a quick question from me.
Yes.
Is course a collection?
So if I wanted to sort or find all the subjects in a course, would it be db.course.find?
So if you log into your database and run show collections, is it listed as one of the collections?
So what you can do is just go to your terminal.
Yes.
What were you asking, Kunkulani?
Okay.
Sorry.
I didn't realize I was on mute.
Yeah.
You were asking.
You can continue.
I was asking if course was a collection.
Let's say I wanted to list all the courses.
Would it be db.course.find and then .sort?
No.
I don't think course is a collection.
But what you can do, while you're still in the university database, is run show collections
and then you'll be able to see if it is actually a collection.
It might actually be data in an array that's part of a document, right?
So there's courses, there's students, right?
So now when you want to find a specific course, you say db.course, which is the
collection name, and then find whichever course you want.
I'm trying to list the courses to see what's available.
Okay.
So just do db.course.find and then your brackets.
So just do your brackets and then a semicolon: db.course.find();
Yeah.
Okay.
So this is the data that's there.
So it's got a course name.
It's got the credits, it's got the duration.
What's her name?
Color fellow.
You don't need to type mongosh again because you are already in mongosh anyway.
So you just need to go down and rerun that script.
Don't worry about the user.
Just go and create the collection first.
Just go up a bit.
You probably need to create the collection first, which is students, and then run that script.
So you're already in the database.
So you don't have to worry.
So just create that.
Yeah.
And then you can run the script.
Yeah.
Yeah.
You can paste.
Hi, do I need to complete 2.4?
2.4.
Yeah.
You can do it if you want.
It's just creating a user, so you can do it.
That's not a problem.
Okay.
Yeah.
We haven't activated authentication.
So it's just like the document edits you've been doing.
You won't really see the effect of the user because we haven't gotten to the point of
authenticating.
After running the stats, what are we looking at?
It just looks like metrics.
What is that?
Just enlarge that.
And then let's go to the top.
So it will show you different information.
Just up a bit.
So it shows things like the creation strings, access patterns, and metadata.
Some of this stuff is not really important unless you're troubleshooting.
And this is when you're troubleshooting deep, deep stuff.
Right.
So it says here it's WiredTiger, which is the storage engine that
comes with MongoDB.
And then it also shows things like your autocommit and your backups.
Did you even do any backups?
Were there any incremental backups?
You can see where it says backup, total modified incremental blocks with
compressed data.
Because you didn't do any incremental backups,
it shows zero.
Right.
So it's just information related to different things, in essence.
Right.
Your block manager: how many blocks have been allocated?
How many blocks are free?
You know, things like that.
File allocation unit size, which is 4 KB (4,096 bytes).
So it's just statistics about a whole lot of things in relation to the
university database.
Right.
And to whichever collection you asked about, when you say, I want stats about
this specific collection.
Let's go down a bit.
Some transaction stats.
Did you do any rollbacks? You know, things like rollback-to-stable history
store keys that would have been swept in non-dryrun mode.
So you can use it while troubleshooting, when you want to see
what could be happening with a specific collection.
Right.
In terms of backups, for example: if there were backups, when
was the last one done?
How many backups were done? When it comes to transactions,
how many transactions were done?
How many stable keys were removed?
How many stable keys were restored?
You know, it's that type of stats, but this would be when you're
doing advanced MongoDB, because now, beyond the
performance level that you're looking at, where you've managed to index
your fields and all that stuff,
you want to look deeper into the collection.
Why is there always a problem with this specific collection when
you're collecting data?
Let's go have a look at that.
That's when you can use this.
Sharded.
It's not sharded.
We're going to get to a point where we shard it.
The size, in terms of the size of the data, and then
the count: how many documents are there?
If you remember, we added 2,000 students.
Total index size.
As we said, when you index a field,
we are saying that's the frequently accessed or frequently
queried data.
It takes all that information and puts it on the side.
It creates a separate file with the indexes so that
when you query on the indexed field,
MongoDB doesn't need to go and look for that information in the
whole set of documents, right?
Because when there's no index, what the database does, what
MongoDB does, is go and scan the whole collection, right?
But if you've got, let's say, ID number as your index, it will just
go to the index, look for that specific ID number, and then
pull out the document for that ID number.
It doesn't need to scan the
whole collection to look for that one specific ID number,
if that makes sense.
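To make the scan-versus-index difference concrete, here is a small Python simulation. It is not MongoDB code. It counts how many documents are examined with no index compared with an index lookup. A Python dict stands in for the index here (MongoDB's real indexes are B-trees), and the `id_number` field and the 2,000 students are made-up sample data.

```python
# Conceptual sketch: collection scan vs. index lookup (not MongoDB internals).
# The documents and the "id_number" field are hypothetical sample data.

def collection_scan(docs, field, value):
    """Check every document, like a query that has no usable index."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = {}
    for pos, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(pos)
    return index


def index_lookup(docs, index, value):
    """Jump straight to the matching documents through the index."""
    positions = index.get(value, [])
    return [docs[pos] for pos in positions], len(positions)


students = [{"id_number": f"ID{n:05d}", "name": f"Student {n}"} for n in range(2000)]
idx = build_index(students, "id_number")

found_scan, scanned = collection_scan(students, "id_number", "ID01999")
found_idx, examined = index_lookup(students, idx, "ID01999")
print(scanned, examined)  # 2000 1
```

Both approaches return the same document. The scan has to look at all 2,000 documents, and the index lookup looks at one.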
Happy with that?
I think I'm done on my side.
All right, cool.
It's 12 o'clock anyway, so we can take our lunch break, and then
we'll continue at 1 o'clock for those that are done.
Thank you.
When are we done?
After doing number two?
Yes, number two up until you get to 2.4.
Then you'll be done.
Yeah.
All right.
So if you're done with 2.4, then yeah, you can take your
lunch, then we'll be back at 1.
Thank you.
Hello guys, are we back?
Are we all back?
Thumbs up if you're back.
I can see Austin is back.
Faith, Caula, Phil, are you back?
Okay, let's give them probably five minutes.
Are we all back?
Yes.
Okay, Faith is back.
Colorful.
Are you back?
Okay, cool.
Okay.
Any questions on what we did earlier on,
before we went for lunch?
Okay, I'll take silence as no questions.
No questions.
That's fine.
So let's continue with some theory.
So now I want to get into monitoring.
We want to go through a few tools that you can use to
monitor our database, MongoDB.
So firstly, let's talk about mongostat, which is a command-line
tool that provides statistics about the MongoDB instance, right?
Its basic usage is just mongostat, and you can then
add some options.
So for example, if you want to monitor a remote host,
you can add the remote host.
Be mindful that when you add a host, you need to add the specific
port, and then you can set the refresh interval:
how often do you want the statistics to refresh?
It could be five seconds.
It could be 10 seconds, you know, and in the case of this
example, it's connecting to a remote host, it's going to
display five rows of statistics, and it's going to refresh
every one second.
So it refreshes every second and shows five rows,
and then you can also customize
your output, right?
When you customize the output, for example with -o, you can specify which fields you
want to see. The --discover option is different: it reports on all the members of a
replica set or sharded cluster. But we're going to look at it and
see what it shows us.
The key metrics that you need to look at
when it comes to mongostat are
insert, which is your insert operations per
second, then your query operations per second, your update
operations per second, and your delete operations per second.
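The per-second columns come from differencing cumulative counters. The sketch below takes two snapshots of operation counters and divides the change by the elapsed seconds. It assumes the counters are cumulative, like the `opcounters` section of `db.serverStatus()`. The snapshot values are invented, and mongostat's exact calculation may differ in detail.

```python
# Sketch: derive per-second operation rates from two snapshots of cumulative
# counters (the idea behind mongostat's insert/query/update/delete columns).
# The snapshot values below are made up for illustration.

def per_second_rates(before, after, interval_s):
    """Return operations per second for each counter between two snapshots."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return {op: (after[op] - before[op]) / interval_s for op in before}


snap_t0 = {"insert": 1000, "query": 5000, "update": 800, "delete": 50}
snap_t1 = {"insert": 1005, "query": 5120, "update": 800, "delete": 52}

rates = per_second_rates(snap_t0, snap_t1, interval_s=1)
for op, rate in rates.items():
    print(f"{op:>6}: {rate:.0f}/s")
```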
So with this information about your operations, right,
you can see if there is something that is
delaying, right?
Let's say it's an update.
You can then ask yourself why an update
is delaying, right?
Or if it's just a normal query, a query that only
needs to pull information,
why is it delayed?
Is it because it's doing a collection scan?
Is it because the query that's running
is not querying any field that's got an index on it,
right?
You can see all that stuff.
You can also get the number of commands executed
per second.
This is useful where you've got an application
running with a high
volume of database queries.
You can also see the percentage of dirty cache,
which is data in the
cache that has been modified but not yet written to disk,
and the percentage of the cache that is used,
and then network in and network out,
right?
All those metrics can give you
a few pointers, right, about what
you need to look at. If it's a networking issue,
then your network traffic in might not make
sense compared to your network traffic out, for
example, right?
If the cache being used is about
to be full, things are probably going to be
slower, you know, things like that.
If your delete is taking forever to happen, why is
it taking forever to happen?
Is it because it's doing a full scan, or is
it because the query is filtering on
fields that are not indexed?
So those are some of the things that you can
look at. Or could it be a
disk IO situation where the disk is probably
failing?
So now the IO is bad.
Latency is now very, very high, and you want
IO latency to be low: reads and writes should be
very quick rather than taking forever.
So these metrics can
point you in the different directions
that you can look at.
That's usually what you use mongostat for,
right? And then WiredTiger uses
memory for caching data and indexes,
and the key metric to monitor there is the
working set.
So the working set is the portion of data and
indexes actively used by the application, right?
When you go to the memory usage, right,
there's that part that actually holds the
indexes and the data that's actively being used
by the application.
Ideally, the working set should fit in memory,
right?
And then you've got the cache usage,
right, which is the percentage
of memory used by the
WiredTiger cache. WiredTiger is the storage
engine used by MongoDB.
Usually when you install MongoDB, it comes
with WiredTiger.
But in essence, when you see the percentage
of memory used by that cache, you can
determine:
okay, is there a problem or a blockage in
terms of the usage, right?
If the percentage is too high for the memory
that's available, you need to check whether
there's too much dirty cache, or whether you
need to increase the amount of memory
available because it's probably failing to
handle the load, right?
You can also check memory usage by running
db.serverStatus() and then looking for
"bytes currently in the cache" and "maximum
bytes configured", right? If you configured
2,000 megabytes, right, and your cache is sitting
at 1,900, then you need to ask where
the dirty data is sitting. Out
of that 1,900, is 500 megabytes dirty? Then you need
to worry about a high level of dirty cache,
right?
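Here is the arithmetic behind the cache example, written as a Python sketch. The dictionary keys mirror field names in `db.serverStatus().wiredTiger.cache`, but the byte values are not real output. They reuse the class example: 2,000 MB configured, 1,900 MB in the cache, and an assumed 500 MB of that dirty. Both percentages are taken relative to the configured maximum.

```python
# Sketch: cache "used" and "dirty" percentages from WiredTiger cache stats.
# Keys mirror db.serverStatus().wiredTiger.cache field names; the values are
# the hypothetical numbers from the class example, not real server output.
MB = 1024 * 1024

cache_stats = {
    "maximum bytes configured": 2000 * MB,
    "bytes currently in the cache": 1900 * MB,
    "tracked dirty bytes in the cache": 500 * MB,
}


def cache_percentages(stats):
    """Return (used %, dirty %) relative to the configured cache size."""
    max_bytes = stats["maximum bytes configured"]
    used = 100 * stats["bytes currently in the cache"] / max_bytes
    dirty = 100 * stats["tracked dirty bytes in the cache"] / max_bytes
    return used, dirty


used_pct, dirty_pct = cache_percentages(cache_stats)
print(f"used: {used_pct:.1f}%  dirty: {dirty_pct:.1f}%")  # used: 95.0%  dirty: 25.0%
```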
You can also use mongostat to monitor
your cache. As I said on the previous slide,
the key metrics let you
see things like your dirty cache and
your used cache, so you can use
mongostat to see the dirty
and used fields, right? Then let's look at
IO performance, which is your
input/output, your reads and
writes, right?
It's a very critical area of performance when it
comes to MongoDB, because MongoDB is
meant for heavy read/write workloads,
right? The data it handles comes
mostly from applications
that have unstructured data, right, and they read
and write a lot.
So the key metrics that you need to
look at are disk latency, which is the time taken
for any read or write operation, right,
and then the input/output operations
per second (IOPS).
So disk latency is how long it
takes to read and how long it takes to
write. If it is taking long, then there
might be an issue with the disk itself,
right?
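The same snapshot-differencing idea gives IOPS and average latency for a disk. The Python sketch below applies it to a pair of cumulative disk counters, which is roughly how iostat works from the kernel's disk statistics. The counter names (`reads`, `writes`, `op_time_ms`) and all the numbers are hypothetical.

```python
# Sketch: IOPS and average latency from two snapshots of cumulative disk
# counters. Counter names and values are hypothetical; op_time_ms is the
# total milliseconds spent on completed operations.

def disk_metrics(before, after, interval_s):
    """Return (IOPS, average latency in ms per operation) for the interval."""
    ops = (after["reads"] + after["writes"]) - (before["reads"] + before["writes"])
    op_time_ms = after["op_time_ms"] - before["op_time_ms"]
    iops = ops / interval_s
    avg_latency_ms = op_time_ms / ops if ops else 0.0
    return iops, avg_latency_ms


t0 = {"reads": 10_000, "writes": 4_000, "op_time_ms": 50_000}
t1 = {"reads": 10_600, "writes": 4_400, "op_time_ms": 52_000}

iops, latency = disk_metrics(t0, t1, interval_s=5)
print(f"IOPS: {iops:.0f}, avg latency: {latency:.1f} ms")  # IOPS: 200, avg latency: 2.0 ms
```

Rising average latency at a steady IOPS points toward the disk itself. Rising IOPS points toward more data being written or read.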
That's one possible issue.
Number two, it could be writing a whole
lot of data, right?
So these are metrics that
give you pointers toward where
to look: is it our storage, right? Is it
our disk, the hardware part of it,
right?
Do we need to change disks and upgrade
to SSDs? If they're already SSDs, then what
could be happening?
Let's look at the disks
themselves, right?
And how do you check IO performance?
You can run db.serverStatus()
and then look at wiredTiger.concurrentTransactions.
So your concurrent transactions will
tell you if there's a
delay in terms of the writing or
the reading, right?
So you look at your read and write
ticket usage.
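Ticket usage can be checked with a small calculation. The Python sketch below works on a document shaped like `db.serverStatus().wiredTiger.concurrentTransactions` on older MongoDB versions, with `out`, `available`, and `totalTickets` per operation type. Newer versions may report this in a different section, so treat the shape as an assumption. The numbers and the 90% warning threshold are invented.

```python
# Sketch: read/write ticket utilisation from a concurrentTransactions-style
# document. The shape follows older serverStatus output (an assumption for
# newer versions); the numbers and the 90% threshold are hypothetical.

concurrent_transactions = {
    "read": {"out": 12, "available": 116, "totalTickets": 128},
    "write": {"out": 120, "available": 8, "totalTickets": 128},
}


def ticket_usage(section):
    """Percentage of tickets currently checked out, per operation type."""
    return {kind: 100 * t["out"] / t["totalTickets"] for kind, t in section.items()}


usage = ticket_usage(concurrent_transactions)
for kind, pct in usage.items():
    flag = "  <- nearly exhausted" if pct >= 90 else ""
    print(f"{kind}: {pct:.1f}% of tickets in use{flag}")
```

When almost all write tickets are checked out, new writes queue up. That queueing shows up as the write delay described above.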
That's one way that you can
look at it, and then you've got
iostat, which you use to monitor
your disk IO, right?
It can be used to track your disk
activity and show how the disk
is performing and whatnot.
Any questions on monitoring? We
have a practical for some
monitoring.
So any questions on that? Okay, cool,
silence is no questions.
Now let's also talk about monitoring
with the web console, right?
So MongoDB has two editions: there
is the Enterprise edition, which is
the paid one, and then there's
the Community edition.
Now the Community edition doesn't come
with a web interface.
So you have to look at the open-source
web interfaces that are out there,
right?
But when it comes to the
Enterprise edition of MongoDB,
right,
it comes with MongoDB Ops Manager,
right?
And MongoDB Ops Manager,
or whichever one you
use, comes with features like
real-time monitoring.
So you can see metrics
in real time, such as your
operations, your memory usage, and your
network activity.
You can also set up alerts for
critical metrics, right:
high CPU usage, low
memory, and so on. It can also help
with analyzing query
performance, right?
You can see the
slow-running queries.
You can identify whether they need
to be improved, whether you need to do
any indexing to
improve them, or whether there are things
like summing or aggregation
that need to happen,
and how you can improve those. And then
you can also do your
backup and restoration from the
web console, right?
So: real-time monitoring, alerts,
and performance analysis.
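An alert for a critical metric is essentially a threshold check on each metrics sample. The Python sketch below shows only that idea. The metric names and the thresholds (85% CPU, 512 MB free memory) are hypothetical, and this is not how Ops Manager alerts are configured.

```python
# Sketch of threshold-based alerting on a metrics sample.
# Metric names and thresholds are hypothetical, for illustration only.
CPU_ALERT_PERCENT = 85      # alert when CPU usage goes above this
MIN_FREE_MEMORY_MB = 512    # alert when free memory drops below this


def check_alerts(sample):
    """Return a list of alert messages for one metrics sample."""
    alerts = []
    if sample["cpu_percent"] > CPU_ALERT_PERCENT:
        alerts.append(f"High CPU usage: {sample['cpu_percent']}%")
    if sample["free_memory_mb"] < MIN_FREE_MEMORY_MB:
        alerts.append(f"Low memory: {sample['free_memory_mb']} MB free")
    return alerts


busy_sample = {"cpu_percent": 92, "free_memory_mb": 300}
quiet_sample = {"cpu_percent": 40, "free_memory_mb": 4096}

alerts_now = check_alerts(busy_sample)
print(alerts_now)  # ['High CPU usage: 92%', 'Low memory: 300 MB free']
print(check_alerts(quiet_sample))  # []
```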
You can also incorporate
external tools, right,
like Prometheus and Grafana,
Nagios, and, off the top of my head, New
Relic.
I think AppDynamics or Datadog
also does that; it's got a
component that you can
install.
So those tools can also
be incorporated as part of
monitoring, and you also have
that web console that you can
use for your real-time
monitoring, your alerts, your
performance analysis, and all
that stuff.
Any questions on monitoring?
Let's talk about indexing, right?
And what is an index, right?
It's basically a data structure
that helps improve the
speed of data retrieval.
So what happens is, when you
create an index on a field, let's
say your ID, right,
it takes that ID field and
places it somewhere outside
the whole set of data, right? And
what it does is reference
the documents that are related to that
value.
So let's say you work
in a bank and you index
customers' ID numbers. It's going to
be easy: the database doesn't need to go through a full scan
of the whole collection
to get that ID number. It
just goes into the index,
which is a separate
file where that data is
placed, and then it's easy to
retrieve anything that's related
to that specific index value, right?
So how do indexes work?
MongoDB uses a B-tree, a balanced
tree, to store indexes.
So you've got parent nodes
and then you've got child nodes,
right?
Each index entry
contains the value from the indexed
field and a pointer to the
corresponding document, right?
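A flat sorted list shows why sorted keys make lookups cheap. This Python sketch is a simplified stand-in for a B-tree index. Each entry is a pair of (indexed value, pointer to document), kept in sorted order and searched with binary search (`bisect`). A real B-tree spreads these entries across parent and child nodes; the flat list only illustrates sorted keys plus pointers. The `age` field and the documents are made up.

```python
import bisect

# Simplified stand-in for a B-tree index: a sorted list of
# (indexed value, pointer to document) entries searched with binary search.
# The "age" field and documents are hypothetical sample data.

documents = {
    101: {"_id": 101, "age": 19},
    102: {"_id": 102, "age": 23},
    103: {"_id": 103, "age": 21},
    104: {"_id": 104, "age": 23},
    105: {"_id": 105, "age": 30},
}

# Each entry holds the indexed field value and a pointer (here, the _id).
age_index = sorted((doc["age"], doc_id) for doc_id, doc in documents.items())
keys = [age for age, _ in age_index]


def range_lookup(low, high):
    """Return documents with low <= age <= high using the sorted index."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [documents[doc_id] for _, doc_id in age_index[start:end]]


print([d["_id"] for d in range_lookup(21, 23)])  # [103, 102, 104]
```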
And then when a query is
executed, it uses the index
to quickly locate the
relevant documents.
So because you are indexing the ID
number, the moment you punch in that
ID number it quickly finds the document,
because it already
has the references that lead
to that document. Compare that with
not indexing the ID
number: then it needs
to go through the whole
collection to look for the
ID number among all the
documents that are in there.
Right, and then there are types
of indexes.
So firstly, there's the
single field index.
We're going to talk about
it, but basically that's how
you create it:
a single field is indexed, right?
And then there's the
compound index, where you can
have many
fields, right, multiple fields
that you
index together. And
then at the same time you
can also create what you
call a multikey index. A
multikey index is for arrays,
including arrays of nested documents, right? For example,
subjects: you can index
subjects because there's a list of things
under it.
There's a whole lot of
subjects in it.
I think it was Color fellow
who was trying to list the
courses, if I'm
not mistaken.
So this can be an example
where you index
courses, right, and then it
will be easy for you to
reference the
courses. Or it could be
subjects, right, where you've
got your student and
then the subjects below. If
you index subjects, it
means that you'll be able
to index everything that's
under it, which is an
array, a list of
subjects, or nested
documents, where nested
documents are documents embedded inside another document, right?
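The difference between compound and multikey entries shows up in what gets stored. This Python sketch mimics the index entries for each type. A compound index on (surname, first_name) gets one tuple key per document. A multikey index on the array field `subjects` gets one entry per array element, so one document can appear several times. Field names and data are hypothetical, and this shows the idea rather than MongoDB's storage format.

```python
# Sketch of index entries for a compound index and a multikey index.
# Field names ("surname", "first_name", "subjects") and data are hypothetical.

students = [
    {"_id": 1, "surname": "Jones", "first_name": "Amy", "subjects": ["Maths", "Physics"]},
    {"_id": 2, "surname": "Jones", "first_name": "Ben", "subjects": ["History"]},
    {"_id": 3, "surname": "Smith", "first_name": "Cara", "subjects": ["Maths"]},
]

# Compound index on (surname, first_name): one entry per document, keyed by a
# tuple, so entries sort by surname first and then by first_name.
compound_index = sorted(((s["surname"], s["first_name"]), s["_id"]) for s in students)

# Multikey index on the array field "subjects": one entry per array element.
multikey_index = sorted((subject, s["_id"]) for s in students for subject in s["subjects"])

# Which students take Maths? Read the matching entries from the multikey index.
maths_students = [doc_id for subject, doc_id in multikey_index if subject == "Maths"]

print(compound_index)
print(multikey_index)
print(maths_students)  # [1, 3]
```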
And then you've got your
text index.
It supports search queries on
string content.
Let's say you've got the name, and you
want to index the name,
which is text,
right?
It then becomes easy
for you to
search your text.
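The core idea of a text index is an inverted index. This Python sketch lowercases string content, splits it into words, and maps each word to the documents that contain it. Matching any query word mirrors the default behaviour of MongoDB's `$text` search. Unlike MongoDB's text indexes, this sketch does no stemming or stop-word removal, so "database" does not match "Databases". The course titles are made up.

```python
import re
from collections import defaultdict

# Sketch of an inverted index, the idea behind a text index.
# No stemming or stop words, unlike MongoDB; course titles are hypothetical.

courses = {
    1: "Introduction to Databases",
    2: "Advanced Database Administration",
    3: "Introduction to Networking",
}

inverted = defaultdict(set)
for doc_id, title in courses.items():
    for word in re.findall(r"[a-z0-9]+", title.lower()):
        inverted[word].add(doc_id)


def text_search(query):
    """Return ids of documents containing any word in the query."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        result |= inverted.get(word, set())
    return result


print(sorted(text_search("introduction")))  # [1, 3]
```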
You also have geospatial
indexes, which support
your geospatial queries on
coordinates.
These ones are used mainly
by your Ubers and your
Bolts, your e-hailing
platforms, or anything that's
delivery-related.
They store geospatial
coordinate data, right,
where you can
index
coordinates, right, to
use them maybe for
delivery or for pickup or
for whatever.
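A typical geospatial question is "which drivers are within 5 km of this pickup point?" This Python sketch answers it by brute force: it checks every driver using the haversine great-circle distance. A 2dsphere index lets MongoDB answer such queries without checking every document; this code only shows the distance calculation behind it. Points are written as (longitude, latitude), the order GeoJSON uses. The coordinates and driver names are made up.

```python
import math

# Brute-force "drivers within N km" using the haversine formula.
# Coordinates and driver names are hypothetical; points are (longitude, latitude).

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (longitude, latitude) points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


pickup = (28.0473, -26.2041)
drivers = {
    "driver-1": (28.0500, -26.2000),
    "driver-2": (28.1000, -26.1500),
    "driver-3": (28.1881, -25.7461),
}

nearby = sorted(
    name for name, (lon, lat) in drivers.items()
    if haversine_km(*pickup, lon, lat) <= 5.0
)
print(nearby)  # ['driver-1']
```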
Then you've got what you
call a hashed index, right?
A hashed index is mainly
used for sharding, when
you're doing sharding,
right?
So you index the hash
of the field's value,
right?
The contents of the field are
stored in a hashed
format, right?
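Hashing helps sharding because it spreads out even sequential values, like increasing student numbers. This Python sketch hashes each value and places it by modulo. MongoDB computes its own 64-bit hash and assigns hashed values to chunks by range. Here, 64 bits of an MD5 digest and modulo-by-shard-count placement are stand-ins for illustration only. The shard count of 3 is hypothetical.

```python
import hashlib

# Sketch: hashing spreads sequential values across shards.
# MD5 and modulo placement are stand-ins, not MongoDB's chunk assignment;
# NUM_SHARDS is hypothetical.

NUM_SHARDS = 3


def shard_for(value):
    """Pick a shard from 64 bits of the value's hash."""
    digest = hashlib.md5(str(value).encode()).digest()
    as_int = int.from_bytes(digest[:8], "big")
    return as_int % NUM_SHARDS


counts = [0] * NUM_SHARDS
for student_number in range(1, 2001):
    counts[shard_for(student_number)] += 1

print(counts)  # a roughly even split of the 2,000 values
```

If the shard were chosen from the raw increasing numbers by range instead, every new student would land on the same shard.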
That's especially
good when you're doing sharding,
but we'll get to the point
where we actually
talk about sharding
and then explain
hashed indexing. And
then you've got the TTL index,
which automatically
removes documents after
a specified time, right?
So the document should
expire after 3,600
seconds, or after so many
years, or whatever, right?
But usually a TTL index is
for when you don't really need
to store the document for
too long, right?
And you don't want the
manual part of having
to log into the database
to actually do some manual
work; then you can
use a TTL
index, right?
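TTL behaviour means documents whose timestamp is older than `expireAfterSeconds` (3,600 here) get removed. This Python sketch does that removal in a single pass. MongoDB does it with a periodic background task, so real deletion is not instantaneous. The `createdAt` field name and the session data are hypothetical.

```python
from datetime import datetime, timedelta

# Sketch of a TTL expiry pass. The "createdAt" field and the data are
# hypothetical; MongoDB runs removal in the background, not on demand.

EXPIRE_AFTER_SECONDS = 3600


def purge_expired(docs, now, field="createdAt", ttl=EXPIRE_AFTER_SECONDS):
    """Keep only documents newer than now minus ttl seconds."""
    cutoff = now - timedelta(seconds=ttl)
    return [d for d in docs if d[field] > cutoff]


now = datetime(2024, 1, 1, 12, 0, 0)
sessions = [
    {"_id": 1, "createdAt": now - timedelta(minutes=90)},  # older than one hour
    {"_id": 2, "createdAt": now - timedelta(minutes=30)},
    {"_id": 3, "createdAt": now - timedelta(seconds=10)},
]

remaining = purge_expired(sessions, now)
print([d["_id"] for d in remaining])  # [2, 3]
```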
How do you manage indexes?
So there's the creation
of the index, right?
So it's db, dot, the
collection, and then the
index operation that you
want to do, right?
So in the example there,
you've got db.students,
students being the
collection, and then you
want to create an index
on name, right,
under students: db.students.createIndex({ name: 1 }).
So when you go into the
students collection,
if you look at it
from the
relational database side,
it's a table.
Then you want to use name
as an index, right?
You can also
list indexes, which
is getIndexes(), and then
you can also
drop an index with dropIndex(), right?
You can also
rebuild indexes, right?
Let's say some
complications
have happened.
You had multikey or
you had compound indexes.
Now you have to break
them, and now you want
to rebuild the
indexes on the whole collection, right?
It's a matter of just
running db, dot, the
collection name, and
then .reIndex(), right?
Now, the B-tree structure.
So a B-tree is used
to store indexes,
as I said, right?
It's a balanced tree
that allows efficient
insertion, deletion,
and search operations, right?
And each node in the
B-tree contains multiple
keys, right,
and pointers to child nodes, right?
So your keys are the indexed values,
and the entries ultimately point to
the documents that you
need to reference, right?
Indexes are also stored
in separate data files
within the dbPath
directory, right?
So you can set that up.
You can configure where
you want your indexes to be.
You can set that up, right?
And each index entry
contains two things.
It contains the indexed
field value, right?
Let's say you are indexing
ID, right?
And then a pointer to
the corresponding document,
which means whatever
document the
index value is
connected to, right?
Then there's index selectivity, which refers to how unique the values in the indexed field are.
Highly selective fields have mostly unique values, which obviously results in better query performance.
And then there is low selectivity, where a field has got many duplicate values.
So when indexing, try to avoid low-selectivity fields, where you've got many duplicate values, right?
Don't use those.
Rather use high-selectivity fields, where the values are unique, right?
Unique values could be your user ID, for example, right?
It's going to be unique.
Or an email.
It's meant to be unique.
There's no way you'll find two people with the same email who receive their mail separately.
I don't think there's any setup like that.
Two emails might differ by just one character, you know?
And then there's also index cardinality, which refers to the number of unique values in the indexed field, right?
High-cardinality fields are very, very good: things like your email, your ID number, your passport number, because they are unique.
They are never the same.
You never find two people with the same email.
You never find two people with the same passport number.
One has to be fake.
If you find two people with the same ID number, one has to be fake.
So those are the high-cardinality candidates for indexing, right?
Low-cardinality fields are things like gender.
You've got a million males, right?
You don't want to be using that, because if you index gender, you won't benefit that much.
It will still need to scroll through the whole lot: if there are 1,000 males, right, it needs to scroll through all 1,000 of them.
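One common way to put numbers on selectivity and cardinality is sketched below. This is an illustration, not how MongoDB's query planner measures fields: it assumes cardinality is the count of distinct values and selectivity is that count divided by the number of documents. The 1,000 sample users are hypothetical, mirroring the email and gender example above.

```python
def cardinality(docs, field):
    """Number of distinct values the field takes across the documents."""
    return len({doc[field] for doc in docs})

def selectivity(docs, field):
    """Distinct values divided by total documents: 1.0 means every value is unique."""
    return cardinality(docs, field) / len(docs)

# Hypothetical sample data: 1,000 users, each with a unique email, all male.
users = [{"email": f"user{i}@example.com", "gender": "male"} for i in range(1000)]

print(cardinality(users, "email"), selectivity(users, "email"))    # 1000 1.0
print(cardinality(users, "gender"), selectivity(users, "gender"))  # 1 0.001
```

An index on email narrows a lookup to one document; an index on gender still leaves all 1,000 matching documents to read.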
Indexes also consume space, right?
Because remember, when you create an index, it creates separate data, right, kept apart from the main collection.
And when you then use that index, obviously it points back to the documents.
So indexes consume additional storage, and how much depends on the size of the indexed field itself and the number of documents in the collection.
What indexing is meant to do is bring the data close by, so that MongoDB doesn't have to go through a whole scan when it needs to find a document.
So you can imagine that bringing the data close by creates that extra layer, that index information and all that stuff, right?
Some best practices for indexing.
Frequently queried fields are the most important.
That's number one.
The fields that are queried frequently are the ones you need to look at indexing, right?
Also create compound indexes, right, for queries that need to filter on multiple fields.
But do it wisely.
Don't just do it.
You can't just index a field because you feel it will be needed for multiple things and whatnot.
Is it frequently accessed?
That's number one.
Number two, do you really need to index it?
Because sometimes it's not wise.
Compound indexes can hurt performance when there are a whole lot of documents or a whole lot of fields they need to go through.
So you need to be careful about that.
The other thing is don't over-index, right?
Over-indexing slows down your write operations, right, and it also consumes storage.
Then monitor your indexes, right?
It's very, very important to monitor indexes, whether you run index stats (the $indexStats aggregation stage) or you have a tool that does real-time monitoring.
You need to be very, very careful of that.
And then there's the covered query.
A covered query is one MongoDB can answer from the index alone: every field in the filter and every field it returns is part of the index, so it doesn't need to read the documents at all.
For example, you can create a compound index that holds all the fields a frequent query needs, for the benefit of performance.
So use covered queries where you can; that's using indexes in a very wise way.
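The covered-query rule above can be sketched as a small Python check. This is a simplification under stated assumptions: it ignores multikey (array) fields, embedded documents and other cases where MongoDB cannot cover a query, and the index and field names are hypothetical. The main trap it shows is `_id`, which is returned unless the projection excludes it.

```python
def is_covered(index_fields, filter_fields, returned_fields):
    """Simplified rule: a query is covered when every field it filters on and
    every field it returns is part of the index. _id is returned unless the
    projection excludes it, so include "_id" in returned_fields in that case."""
    indexed = set(index_fields)
    return set(filter_fields) <= indexed and set(returned_fields) <= indexed

# Hypothetical index, like createIndex({ name: 1, age: 1 }).
index = ["name", "age"]

# find({ name: "Faith" }, { name: 1, age: 1, _id: 0 }): answered from the index
print(is_covered(index, ["name"], ["name", "age"]))         # True
# Same query without _id: 0, so _id comes back and the documents must be read
print(is_covered(index, ["name"], ["name", "age", "_id"]))  # False
# Returning a non-indexed field (email) also breaks coverage
print(is_covered(index, ["name"], ["name", "email"]))       # False
```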
Now, single-field indexes, right?
That's where a single field in a collection is used as the index, right?
It helps speed up queries where you need to filter, sort or aggregate.
How does it work?
It creates a B-tree structure from the indexed field and then points to the relevant corresponding documents, right?
An example is creating an index on name in the students collection, db.students.createIndex({ name: 1 }), right?
1 indicates ascending order, -1 indicates descending order, right?
Some use cases are when you need to filter or sort by a single field: finding all students with a specific name, or sorting students by age, right?
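Because index entries are kept in sorted order, a range filter or a sort on the indexed field amounts to reading a contiguous slice of entries. The Python sketch below models that, assuming a sorted list of (age, position) pairs stands in for an index like { age: 1 }; the students are hypothetical. Reading the slice backwards gives descending order, which is why a single-field index can serve sorts in either direction.

```python
import math
from bisect import bisect_left, bisect_right

# Hypothetical students; the "index" is the sorted list of (age, position) entries.
students = [{"name": "Mina", "age": 23}, {"name": "Faith", "age": 20},
            {"name": "Kumbulani", "age": 21}, {"name": "Zandri", "age": 25}]
age_index = sorted((s["age"], pos) for pos, s in enumerate(students))

def age_range(lo, hi, direction=1):
    """Names of students with lo <= age <= hi, read off the sorted index.
    direction=1 walks the slice forwards (ascending), -1 backwards (descending)."""
    start = bisect_left(age_index, (lo,))             # first entry with age >= lo
    end = bisect_right(age_index, (hi, math.inf))     # just past the last age <= hi
    entries = age_index[start:end]
    if direction == -1:
        entries = entries[::-1]
    return [students[pos]["name"] for _, pos in entries]

print(age_range(20, 23))      # ['Faith', 'Kumbulani', 'Mina']
print(age_range(20, 23, -1))  # ['Mina', 'Kumbulani', 'Faith']
```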
And then you also have compound indexes, which combine multiple fields in a collection.
They help with your filtering, sorting and aggregation too.
And the order of the fields in the index matters, right?
Queries can use the index if they include a prefix of the indexed fields, right?
For example, an index on name and age, { name: 1, age: 1 }, can be used for queries on name, or on name and age, but not on age alone, if that makes sense.
So in this order, you've got name and then age.
You can query using the name, or using the name and the age, right, as your condition.
But you can't use just the age, right?
Use cases: when you need to filter or sort by multiple fields, for example, finding all students with a specific name and age range.
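The prefix rule can be written down as a short Python function. It is a simplified model of the rule stated above, not MongoDB's query planner: it only considers which fields appear in the query's conditions and ignores sort-only index use. The field names follow the name/age example.

```python
def usable_prefix(index_keys, query_fields):
    """Return the leading index fields a query can use: the longest prefix of
    index_keys whose fields all appear in the query's conditions."""
    prefix = []
    for key in index_keys:
        if key not in query_fields:
            break
        prefix.append(key)
    return prefix

index_keys = ["name", "age"]  # like createIndex({ name: 1, age: 1 })
print(usable_prefix(index_keys, {"name"}))         # ['name']
print(usable_prefix(index_keys, {"name", "age"}))  # ['name', 'age']
print(usable_prefix(index_keys, {"age"}))          # [] -> no usable prefix
```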
And then geospatial indexes, which are mostly for maps, used by your Ubers and whatnot.
They support two types: 2d, for flat two-dimensional coordinates, your latitude and longitude, and 2dsphere, right, for spherical geometry: GeoJSON objects and all that stuff.
How does it work?
It uses specialized data structures, for example geohashing, to index the geospatial data.
Queries can then find documents near a point, right, or intersecting a line, right?
An example is creating a 2dsphere index on a location field: db.places.createIndex({ location: "2dsphere" }), places being the collection.
And then the use cases: location-based queries, finding nearby restaurants, for example, or finding all places within a certain distance from a point.
So when you do a Google search, this is some of the indexing that happens behind the scenes.
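The question a 2dsphere index answers, "which places are within X km of this point?", can be computed by brute force with the haversine formula. The Python sketch below does exactly that, checking every document, which is the work the index is there to avoid. It assumes a spherical Earth with a mean radius of 6371 km, GeoJSON-style [longitude, latitude] coordinates, and hypothetical place names. In MongoDB itself you would express this with operators such as $near or $geoWithin rather than computing distances yourself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two [longitude, latitude] points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_km(places, center, max_km):
    """Brute-force 'near' query: names of places within max_km of center, nearest first."""
    lon, lat = center
    hits = []
    for place in places:
        plon, plat = place["location"]["coordinates"]
        d = haversine_km(lon, lat, plon, plat)
        if d <= max_km:
            hits.append((d, place["name"]))
    return [name for _, name in sorted(hits)]

# Hypothetical documents shaped like GeoJSON points in a places collection.
places = [
    {"name": "cafe_b", "location": {"type": "Point", "coordinates": [0.0, 0.05]}},
    {"name": "far_town", "location": {"type": "Point", "coordinates": [1.0, 0.0]}},
    {"name": "cafe_a", "location": {"type": "Point", "coordinates": [0.0, 0.01]}},
]
print(within_km(places, (0.0, 0.0), 10))  # ['cafe_a', 'cafe_b']
```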
Some best practices.
Single-field indexes are for queries that filter or sort by a single field, right?
Compound indexes are for when you need to filter or sort by multiple fields.
Geospatial indexes are for location-based queries.
And then avoid over-indexing.
Too many indexes can be very difficult to manage, right?
They can slow down some operations.
They also consume storage: if you're not careful enough, right, they will take up a lot of space.
And then also monitor your index usage, right?
Then there's something called the query profiler, right?
The query profiler is basically used to identify suboptimal queries.
And what are these suboptimal queries?
They are queries that take a long time to execute.
They consume excess CPU, memory, or disk I/O.
They return large amounts of data unnecessarily and do not use indexes effectively, right?
Some common causes:
Full collection scans, where a query has to go through the whole collection, right, instead of using indexes.
An example is querying on a non-indexed field, right?
Then inefficient index usage, where queries use indexes but still perform poorly due to low selectivity, the type of field you've chosen: there are many duplicate values, right?
That's number one.
Incorrect index order in compound indexes, right?
Say you start with the age and then the name, in the example we had, right?
Now it's going to sift through everybody who is 20 years old to look for the name Kumbulani, right?
Whereas if you do it the other way round, it looks for Kumbulani, which is indexed, and then for age 20.
So it becomes easy, you know?
It doesn't have to go through a whole lot.
And then large result sets, where queries return a large number of documents, right, which causes high network and memory usage.
That's another common cause of suboptimal queries.
And then complex aggregation pipelines, where you need to do a whole lot of summing and this and that, you know?
When they are complex, they need more resources to run.
So those are some of the common causes of suboptimal queries, right?
And then how to use the query profiler.
It's a built-in MongoDB tool, right?
It helps you identify slow, inefficient queries and collects detailed information about them: execution time, index usage, documents scanned and all that stuff.
First you need to enable the profiler, right?
The profiler has three profiling levels.
0 is off, which is the default.
1 logs slow operations, right, meaning those above a threshold in milliseconds.
And 2 logs all operations.
Logging all operations helps when you really need to troubleshoot, but normally you can leave it at logging slow operations.
You set the profiling level like this, right?
For example here, db.setProfilingLevel(1, { slowms: 100 }) sets level 1 and logs anything slower than 100 milliseconds.
So you're saying our target time for query execution is 100 milliseconds, right?
Then you run that, and it should tell you which queries are running slow at the end of the day, right?
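The three profiling levels boil down to one decision per operation: does it get written to system.profile? The Python sketch below models that decision. It is an assumption-laden simplification: it ignores the profiler's sampleRate and filter options and the fact that slow operations also go to the server log regardless of level.

```python
def would_profile(level, millis, slowms=100):
    """Sketch of which operations end up in system.profile at each level.
    0: profiler off; 1: only operations slower than slowms; 2: everything."""
    if level == 0:
        return False
    if level == 1:
        return millis > slowms
    if level == 2:
        return True
    raise ValueError("profiling level must be 0, 1 or 2")

# Like db.setProfilingLevel(1, { slowms: 100 }):
print(would_profile(1, 150))  # True, slower than 100 ms
print(would_profile(1, 50))   # False, fast enough to skip
```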
You can also verify the profiling level with db.getProfilingStatus().
After that, you can change anything you want to change, right?
The profile data, right, is stored in the system.profile collection.
So if you want to see anything, go into db.system.profile.find(), and then you can sort it, right, with a limit of 10, for example db.system.profile.find().sort({ ts: -1 }).limit(10).
Some key fields in the profiler data:
op, the operation type.
ns, the namespace, which is made up of the database and the collection.
There was a time that we did statistics, I think it was Color Fellow.
If you looked at the very bottom, there were a few lines that were not part of the brackets.
You would find somewhere it says ns, and it will have the database name and the collection on it, right?
Then millis, the execution time in milliseconds, and planSummary, the query plan summary, you know?
COLLSCAN for a collection scan, IXSCAN for an index scan.
It will show you that.
keysExamined, the number of index keys examined; docsExamined, the number of documents examined; and nreturned, the number of documents returned.
So you need to be mindful of those.
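Those fields are enough to triage profiler output by hand. The Python sketch below flags entries that look suboptimal. The thresholds (100 ms, more than 10 documents examined per document returned) are assumptions chosen for illustration, not MongoDB rules, and the sample entries are invented, though they use the real system.profile field names listed above.

```python
def flag_suboptimal(entries, slowms=100, max_ratio=10):
    """Return (ns, reasons) for profile entries that look suboptimal.
    Heuristics (assumptions, not MongoDB rules): slow execution, a collection
    scan, or examining many more documents than were returned."""
    flagged = []
    for e in entries:
        reasons = []
        if e["millis"] > slowms:
            reasons.append("slow")
        if e.get("planSummary", "").startswith("COLLSCAN"):
            reasons.append("collection scan")
        returned = max(e.get("nreturned", 0), 1)  # avoid dividing by zero
        if e.get("docsExamined", 0) / returned > max_ratio:
            reasons.append("examined far more docs than returned")
        if reasons:
            flagged.append((e["ns"], reasons))
    return flagged

# Entries shaped like documents from db.system.profile (values invented).
profile = [
    {"op": "query", "ns": "university.students", "millis": 250,
     "planSummary": "COLLSCAN", "keysExamined": 0, "docsExamined": 50000, "nreturned": 3},
    {"op": "query", "ns": "university.students", "millis": 2,
     "planSummary": "IXSCAN { name: 1 }", "keysExamined": 3, "docsExamined": 3, "nreturned": 3},
]
for ns, reasons in flag_suboptimal(profile):
    print(ns, reasons)
# university.students ['slow', 'collection scan', 'examined far more docs than returned']
```

Only the first entry is flagged: the second one used an index scan and examined exactly as many documents as it returned.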
Any questions when it comes to these topics?
Okay, seems no questions.
I think we can go to our practical
and then move over to the next one,
which is early.
Where is my machine?
Okay, there it is.
Where is that?
Where is my window now?
Okay, there we are.
Cool.
So we've done one, two.
Three, as I said, is going to be optional.
And then what we need to look at is the security part.
Now let me do a bit of explaining, right?
Please be mindful of this.
This part you don't need to do, because it's setting up MongoDB from scratch.
You've done that already, right?
And then you come to creating the user in mongosh.
You created a user earlier on.
But also be mindful that in number two, there's a point where you actually deleted the user.
Are we together?
I think there's a point where you created a user, and then it was deleted as part of exercise number two, right?
So if you deleted it, you need to recreate it.
Now there's this part that I want to explain, right?
You see it says edit /etc/mongod.conf, right?
So you go to your terminal and log out of the DB if you are logged into it, right?
Then, whether you use Vim or Nano, it's up to you.
I use Vim.
So you go to that location, /etc/mongod.conf, right?
If you open it, it's got different sections.
For example, the storage part is sorted already, right?
This is where you store your data.
If you are using a specific engine, you can put it there and add some details to it.
Then there's where to write the logging data, right?
The destination is file, and logAppend is true, so it doesn't remove anything.
It just appends on top.
And then the path to it, right, which is your mongod.log.
Most applications have their log files sitting in /var/log, right?
So they'll have /var/log and then a directory that holds a specific file, right?
You also have your network interfaces and how the process runs, the process management.
And then when you come to security, to enable it, you need to remove the hash in front of security, then go down a line and indent with one space, not a tab, right?
And I think it's security, if I'm not mistaken.
Authorization, sorry.
Authorization, and then set it to true, right?
This is how it should be, right?
Then you save your file, and you need to restart Mongo for that to take effect, right?
Otherwise it will not take effect.
So you come and restart it, right, which is good.
Sometimes, if it fails to restart, all you need to do is check the status, because sometimes there will be an issue within your... oops, it actually failed.
Okay, that's weird.
MongoDB server, let's see: failed, CONTROL, error during global initialization.
Status, restart MongoDB.
Let me have a look at mongod.conf again.
So, security.
Is it true or enabled?
I think it's enabled.
Yeah, it's enabled, that's fine.
Enabled, and then restart it, and then status: okay, it's happy now.
So just be mindful of that.
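The value that let the restart succeed was `enabled`: security.authorization takes the strings `enabled` or `disabled`, which fits the failed restart when it was set to true. The Python sketch below shows a trimmed example of the relevant mongod.conf section and a tiny checker for that one setting. The checker is a hypothetical helper for this two-level layout only, not a YAML parser, and the dbPath shown is just an example value.

```python
CONF = """\
storage:
  dbPath: /var/lib/mongodb
security:
  authorization: enabled
"""

def authorization_setting(conf_text):
    """Return the value of security.authorization from a simple mongod.conf,
    or None if it is missing. Handles only two-level YAML; not a general parser."""
    in_security = False
    for line in conf_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue                                   # skip blanks and comments
        if not line.startswith((" ", "\t")):           # a top-level key
            in_security = line.strip() == "security:"
        elif in_security:
            key, _, value = line.strip().partition(":")
            if key == "authorization":
                return value.strip()
    return None

value = authorization_setting(CONF)
assert value in ("enabled", "disabled"), f"unexpected value {value!r}"
print(value)  # enabled
```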
Now, after you restart it, everything you need to do requires authorization, so you need to log on.
Be mindful: I didn't create the user here; I didn't create any user at all.
But in essence, when you need to log on, right, since you've already created the admin user, you probably don't need to do this whole process.
If you just specify the authentication database, admin, it will ask you for admin, because that's the only user you have.
But where you already have different users in the admin database, you need to do this, and be mindful of the password you used earlier to create the user.
Happiness, understood?
Are we understood with that?
Sorry, no one's answering.
Mina joined later, so I am trying to catch up.
There is a recording, so it's okay.
Yeah, I'll check it out again.
It's just I'm struggling to connect to where you are now.
Oh, is it?
Yeah.
I'm sharing a screen.
I'm not sure if you'll be able to see that.
Okay, I can see that.
Oh, we're not going on this link, right?
It's only the other one that we're working on.
Which one?
The link that there's one that I sent in the chat, right?
It's for you to create your virtual machine,
like the one that I have on the screen.
And then what do you call this?
That GitHub link I'm going to share also
so that you can be able to go into it
and then be able to follow number one, two, three as you go.
Okay, then.
All right, cool.
No problem.
I actually wanted to explain number.
So when it comes to secure deployment,
this part really works when you've got three different machines.
You're not using local host.
So enabling the SSH and binding to localhost are already done.
But enabling SSL or TLS you don't really need to do, because it's just one machine that we're using.
So it wouldn't make sense.
But the rest of the stuff, I think people can be able to do it.
Just there, when you were explaining where to enable authorization, I think you went quiet, or you were going too fast.
Are those steps included as part of this?
Say that again?
No, I'm saying the part you're showing us now, where you add something under security.
So you're adding authorization.
Yes.
Is this part of the steps that we have here, or something on the side?
It's part of the steps.
Now, the question would be on number one,
there was creation of a user.
Right.
On number two, there was a part where you create a user and delete it.
Now, if you created a user and deleted it on the number one,
where you created the user or created a second user
and you deleted it on number two,
then you still need to recreate the user here.
Right.
So you run mongosh, you create your user,
and then you enable security in the mongod.conf file.
Right.
And then restart your MongoDB,
and then you'll be able to use this.
Got it?
Okay.
Now, got it.
Okay.
Got it.
So just one thing:
number three there, the web console, we're not doing that?
The web console takes a bit of time to install.
That's what I'm thinking about.
I would rather you guys can do it at your spare time,
whether it's later on in the week or whatever,
because you still have access to this virtual machine, I think,
for a few weeks, if I'm not mistaken.
If it's not a month or two months, I'm not mistaken.
But you can always do it on your own, on your virtual machine.
It takes a lot of time to set up.
That's why I sort of left it on the side.
That's why I said number three, don't really do it,
because it takes a bit of time.
That's number one.
Number two, Ops Manager is easy to install, but it's enterprise,
which means there's payment involved.
Now, when you then go to the community edition
and you want to install the open source of it,
they are not easy to install.
That's why I'm saying it takes some time.
Oh, okay.
How can I not take that?
On number four, the only one that you need to skip is this part,
the networking part.
Why I say networking or this part is because
when you use TLS and SSL authentication,
you need to use it amongst two nodes, right?
MongoDB nodes, where you've got,
it's not one machine like this.
It's not on localhost.
It's got different other machines available.
And you want to add that machine to the cluster.
That's when you can be able to use your TLS,
because then there's that secure communication
between the nodes in the cluster.
That's why I'm saying you don't really need to do it,
because you won't be able to see the effect of it,
if that makes sense.
Okay, I know that makes sense.
Yeah.
But the rest of the stuff, you should be able to do it.
So if you do number four,
and then when you go to number five,
it's creation of what you call these indexes.
Geospatial, you might not be able to get any results,
because you are not doing any location-based,
but it's to help for information purposes,
to know how it is done.
And then you can then be able to look at the rest of the stuff,
suboptimal queries,
looking at profiling,
analyzing the profile data,
and some advanced techniques around covered queries,
running a covered query,
and that should be all.
And then after you're done with number four and number five,
you can then look at exercise day one.
I'm just going to use the bathroom quickly.
I'll be back in a sec.
Quick question, guys.
Are we supposed to install,
are we supposed to set up MongoDB on Ubuntu?
Because I'm getting an error here.
Yes, you're supposed to,
but did you exit the database first?
If you're still inside the city database,
you must exit the database first.
Oh, that would explain it.
Yeah, just say exit, and then you just do that.
Should be fine.
Oh, okay.
Yeah, because that's just getting some error.
Okay, exit.
Okay, I'm back.
Anyone with a question or getting errors?
Or breaking stuff?
Faith, you don't need to set up again MongoDB.
Don't worry about that.
Because it's already set up.
Okay, should I?
Yeah, you can go from number two.
So resetting up doesn't need,
you don't need to worry about that.
Okay, so we can skip the first-
The number one, yes, and then go to number two.
Okay.
Where you've got the authentication part.
Who is it?
Asanda, didn't I say I don't want to be disturbed?
About what?
Go and sleep.
Go and sleep, your time is slow.
Is anybody else struggling to exit the config file
after adding security with authorization enabled?
I'm trying to write and quit.
I'm getting error E212.
Let's have a look.
But after the security, right?
Enter, enter, let's see something.
Let me see where it should be.
I almost said it's fine, I didn't get that error.
So you're able to quit and write?
Please, please press enter.
Wait, are we supposed to unhash security?
Yes.
Oh, okay.
And then do escape quit without saving.
I want to see something quickly.
That's you, color fella.
Use the override, so use :q!, Q and the exclamation mark.
You're not using sudo.
Right.
Yeah, that's why it should be good to go now.
I'm getting an authentication issue when I log in after restarting.
Who is that now?
It's cool fella.
Okay.
Let's see.
Okay, go, let's see.
You've authorized that.
That's fine.
Have you been running?
Have you been running mongosh as the student user all this while?
Okay, I used sudo.
Yeah.
Did you just go up?
Just go up again.
And the status says everything is okay, correct?
So just do a status.
Okay, everything is fine.
Control C.
And then try and log in from here.
Let's see.
mongosh -u something, something.
Then enter, and authentication failed.
Okay.
Just remove the -u admin -p and the secure admin password.
The password in that.
Just, yeah.
Remove that.
Even the user.
Yeah.
Remove the user.
Enter.
So there's probably something with your password.
I'm assuming so.
Okay.
Yeah.
So if you created a user in number one, I think the password is actually different.
The other password should be admin123, I think, from number one, if I'm not mistaken.
So if that user was still there, it would definitely be able to log in.
But now because you're not specifying the password, then it's just going direct and just looking for the admin user.
Just a reminder, you said we can skip the network section, right?
Yeah, you can skip the networking part.
And then go on to number four and so on.
Yeah.
And continue.
I'm sorry.
So after skipping networking, we go to number four, right?
Yes, number four.
Yeah, because I'm at number four there.
I've run the first one and just go through it again here.
I've run the first one here whereby I go to the terminal.
Yes.
Right.
I've run this one.
This one.
Where is it?
Yeah, this one.
I think this is an output.
So but it has a set as the output there.
And that output that has been set in it.
Insert, query, update: that's from the command that you used.
No.
So log out of your database.
Log out of your database.
Exit.
Just to exit.
Yeah.
No, I'm typing.
It's not showing.
OK.
Why?
So I just refresh it sometimes.
Refresh it.
Disconnecting.
OK.
Refresh the page and then exit from there and then run it.
So what you can also do is remove the watch.
Just run mongostat.
And that's it.
You don't need to point it anywhere.
So to start, just run mongostat and that's it.
Yeah.
OK.
So let me try.
Let me refresh the page.
All right.
That's right.
Colorful, you're not supposed to run mongostat while you're inside the database shell.
Please exit and then run it.
What's her name again?
Zandri, let me share the link.
You don't really need to log into GitHub.
OK.
Thanks.
Are you able to type?
Are you able to tap on mine here?
OK.
OK.
Just one second.
I'll get to you just now.
If Zandri, if you can just copy that and then paste it into the web browser for the virtual machine, then you should be good to go.
And let's see interactive.
Oopsie.
Where did I go now?
OK.
There we are.
Interactive and oops.
What's going on?
You broke your machine, my friend.
Oh, but you said you can break it.
Yeah, you can break it.
So you can always restart it.
Is it?
Yeah, we can restart it.
Just one second before we restart it.
Let's see.
And then so we can close this terminal basically.
Because I also don't know why it's hanging.
OK, so I was saying this, right?
Where is that now?
So I was saying, mongostat.
Did you reboot this machine at some point?
No.
That's weird.
OK.
Cool.
Since when has it been running?
Yeah, it's been from 12.
Yeah, 12 or 8.
12 or 8, yeah.
But also remember, I think these servers are two hours ahead, if I'm not mistaken.
I think they use the UK time.
Yeah, but if that's the case, then it just restarted now, yeah.
It's two hours.
Which is very weird.
Is this the password?
Is that the right password?
Yeah, OK, cool.
It's the correct password.
So what was the issue there?
Why was it like that?
Why was it hanging?
Honestly, I don't know.
I don't know.
I can't tell you anything because I didn't do any troubleshooting to it.
But yeah, so you can see the number of inserts, queries, updates, deletes, getmores, commands,
dirty, used, flushes, you know, and all that stuff.
Network incoming, network outgoing.
It's steady, so there isn't much to worry about.
Yeah, but that's basically how it will show.
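The insert, query, update, delete, getmore and command columns are rates: mongostat samples the server's cumulative operation counters and shows the change per interval. The Python sketch below shows roughly that arithmetic. It assumes two snapshots shaped like serverStatus().opcounters taken one second apart; the counter values are invented, and the real tool adds details (such as separating replicated operations) that this ignores.

```python
def per_second_rates(before, after, seconds):
    """Turn two snapshots of cumulative opcounters into per-second rates,
    roughly what mongostat's insert/query/update/delete/getmore/command columns show."""
    return {op: (after[op] - before[op]) / seconds for op in before}

# Two hypothetical snapshots of serverStatus().opcounters taken 1 second apart.
t0 = {"insert": 100, "query": 40, "update": 10, "delete": 0, "getmore": 0, "command": 500}
t1 = {"insert": 103, "query": 45, "update": 10, "delete": 0, "getmore": 0, "command": 502}
print(per_second_rates(t0, t1, 1))
# {'insert': 3.0, 'query': 5.0, 'update': 0.0, 'delete': 0.0, 'getmore': 0.0, 'command': 2.0}
```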
Or you can ask for five rows, one second apart (mongostat -n 5 1).
Oops.
I need to see the time.
I need to set two rows.
It's doing one second.
Two, four, six.
OK, cool.
So you can beat.
Oopsie.
So: --config, the path to a configuration file; host; any SSL; any authentication;
Kerberos; URI; stat options.
So you've got -o for the fields to show, for custom output.
--rowcount, --discover.
--discover discovers nodes and displays stats for all of them, if there are any.
Use HTTP instead of a raw DB connection.
--all shows all optional fields.
--json outputs JSON.
--interactive gives an interactive display.
So let's say we do --all.
OK.
Not unless the same thing.
And then.
Discover.
To check if there are any nodes.
You're moving a little too fast now.
Sorry, I was just told.
mongostat says authorization failed.
OK, I'm coming there.
Yeah, I'm coming.
So if let's say there was many nodes.
Right.
If it was a cluster, it would show information about each and every node, if that makes sense.
Right.
So here, because it's just localhost, it's only going to show us localhost.
But let's say if there was three other servers, then it would list all those and be able to tell the stats of each of them.
If that makes sense.
Yeah.
Who was saying they are stuck somewhere?
The Mongo stats.
The credentials are failing.
So I'm getting an authorization failed.
OK.
All right.
Cool.
I see.
Faith got it wrong.
I tried to follow what you were helping.
I'm still.
All right.
That's fine.
Yet.
OK.
What?
I tried to remove the credentials.
OK.
Just hold on.
What you can also do is because we're not sure of your credentials.
Remember.
Oh, it's on.
It's on view only.
Let me change it to interactive.
Remember, there was a time that I said just use admin.
Just OK.
Where's my scroll?
Just do it like this.
OK.
Cool.
Because the password, right?
We're not sure if you use admin one or used secure admin.
We need to check that so we can continue.
OK.
Why is it failing now?
mongostat says authentication failed.
Let me do this.
I'm going to switch off authentication first.
Hold on.
Let me just go back.                

on 2025-01-29
