Disk storage has exploded in the last 40 years. These days, even a terabyte drive is considered small. There is one downside, though. The more stuff you have, the harder it is to find it. Linux provides numerous tools to find files when you can’t remember their name. Each has plusses and minuses, and choosing between them is often difficult.
Definitions
Different tools work differently to find files. There are several ways you might look for a file:
- Find a file if you know its name but not its location.
- Find a file when you know some part of its name.
- Find a file that contains something.
- Find a file with certain attributes (e.g., larger than 100 kB).
You might combine these, too. For example, it is reasonable to query all PDF files created in the last week that are larger than 100 kB.
There are plenty of different types of attributes. Some file systems support tags, too. So, you might have a PERSONAL tag to mark files that apply to you personally. Unfortunately, tool support for tags is somewhat lacking, as you’ll see later.
Another key point is how up-to-date your search results are. If you sift through terabytes of files for each search, that will be slow. If you keep an index, that’s fast, but the index will quickly be out of date. Do you periodically refresh the index? Do you watch the entire file system for changes and then update the index? Different tools do it differently.
Find
The most common tool is, in fact, no tool at all. The find command just does what you would do. It does directory listings and searches through them for whatever you want. The most common way to use the command is:
find . -name 'hackaday.txt' -print
You can probably leave off the -print as that’s the default action. However, find can do much more: it can filter by dates and attributes, and even execute commands on the file names it finds, which can be dangerous.
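As a rough sketch of those filters, the script below builds a scratch directory and runs the combined query from the Definitions section: PDFs from the last week that are larger than 100 kB. It assumes GNU find, touch, and head. The file names are made up, and it matches on modification time rather than creation time, since creation time isn't something find can test everywhere.

```shell
# Build a throwaway directory so the example is safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/docs/old"

# A 200 KiB file with a .pdf name, modified just now: should match.
head -c 204800 /dev/zero > "$demo/docs/big.pdf"
# A one-byte file: fails the size test.
printf 'x' > "$demo/docs/small.pdf"
# A large file last modified 30 days ago: fails the age test.
head -c 204800 /dev/zero > "$demo/docs/old/stale.pdf"
touch -d '30 days ago' "$demo/docs/old/stale.pdf"

# -iname: case-insensitive name match
# -mtime -7: modified less than 7 days ago
# -size +100k: larger than 100 KiB
matches=$(find "$demo" -type f -iname '*.pdf' -mtime -7 -size +100k)
echo "$matches"

# -exec ... {} + hands the matches to a single command invocation.
# Stick to read-only commands like ls until the match list looks right.
find "$demo" -type f -iname '*.pdf' -size +100k -exec ls -l {} +
```

Running the bare find first and adding -exec only once the list looks right is a cheap guard against the dangerous case.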
There’s no index to build and store, which is nice, but that also means it can be slow. If you do a find / you’ll get a search across the entire file system. However, find is fast for reasonable directory depths.
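One way to keep find inside a reasonable depth is to tell it where to stop. This is a small sketch assuming GNU find; the directory layout is invented for the demonstration.

```shell
# Scratch tree with one shallow and one deeply nested copy of the file.
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c/d"
touch "$demo/a/hackaday.txt" "$demo/a/b/c/d/hackaday.txt"

# -xdev: stay on the starting file system (skip other mounts).
# -maxdepth 2: look no deeper than two levels below the start directory.
shallow=$(find "$demo" -xdev -maxdepth 2 -name 'hackaday.txt')
echo "$shallow"

# Without the depth limit, both copies turn up.
deep=$(find "$demo" -xdev -name 'hackaday.txt' | wc -l)
echo "matches without a depth limit: $deep"
```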
If you are lazy, you can ask a website to generate your find commands for you. If you want a faster, more modern find, try fd, which is called fd-find on Ubuntu; you execute it with fdfind.
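Because the binary name differs between distros, a script that wants fd has to check for both names. The sketch below does that and falls back to plain find if neither is installed. The file names are invented, and the exact output format of fd may vary between versions.

```shell
# Scratch directory with one file we want and one we don't.
demo=$(mktemp -d)
touch "$demo/hackaday.txt" "$demo/notes.md"

# Check fdfind first: on Debian and Ubuntu the name fd can belong to another package.
if command -v fdfind >/dev/null 2>&1; then
    found=$(fdfind --extension txt hackaday "$demo")
elif command -v fd >/dev/null 2>&1; then
    found=$(fd --extension txt hackaday "$demo")
else
    # Neither is installed, so do the same search with find.
    found=$(find "$demo" -type f -name '*hackaday*.txt')
fi
echo "$found"
```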
Locate/Rlocate/Mlocate/Plocate
If you use find a lot on entire filesystems, you’ll eventually tire of waiting for it to search everywhere. What then? Well, you aren’t the first one to get tired of it, so back in the dawn of Unix, the locate command appeared. The idea is simple: Periodically the updatedb command builds at least one index file then locate searches that index. You can create multiple indices, say one for user files, one for system files, or maybe one for a network drive produced on the network drive’s local machine.
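To make the index-then-search split concrete, here is a sketch that builds a private index for one directory tree and queries it. It assumes the mlocate or plocate flavor of updatedb and locate. Other implementations use different flags, so the script skips the demo if the command fails. The paths are scratch files made up for the example.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/files/projects"
touch "$demo/files/projects/Hackaday.TXT"
db="$demo/private.db"
hits=""

# -l 0: mark the index as not needing visibility checks, so an ordinary user can build and read it.
# -o: where to write the index; -U: the directory tree to index.
# The empty prune settings override /etc/updatedb.conf, which often skips /tmp and tmpfs.
if updatedb -l 0 -o "$db" -U "$demo/files" \
        --prunepaths '' --prunefs '' --prune-bind-mounts no 2>/dev/null; then
    # -d selects the index to search; -i ignores case.
    hits=$(locate -d "$db" -i 'hackaday.txt')
    echo "$hits"
else
    echo "no mlocate/plocate-style updatedb here; skipping" >&2
fi
```

Keeping one index per tree like this is how you would maintain separate indices for user files, system files, or a network drive.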
There have been many improved versions of locate, rlocate and mlocate among them, and the latest appears to be plocate. If you want to use locate, you should probably use this version, which is very similar to the original. There are options to search without case comparison, for example. You can use regular expressions, limit the search to the file name (and not the path), and control the output format to some extent.
No matter what version you use, you should look at /etc/updatedb.conf and try to control the indexing process. For example, you might not want to index remote filesystems. Dropping the index for transient files like browser caches is also good.
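For a sense of what that file looks like, here is an illustrative fragment using the variable names read by the mlocate and plocate versions of updatedb. The values are examples, not any distro's defaults, and /mnt/nas is a hypothetical mount point.

```shell
# /etc/updatedb.conf (illustrative values only)

# Skip bind mounts so the same files are not indexed twice.
PRUNE_BIND_MOUNTS="yes"
# File system types to skip, such as network mounts.
PRUNEFS="nfs nfs4 cifs fuse.sshfs"
# Directory names to skip wherever they appear (.cache covers most browser caches).
PRUNENAMES=".git .cache"
# Absolute paths to skip entirely.
PRUNEPATHS="/tmp /var/tmp /mnt/nas"
```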
Of course, locate and its sister commands can only find what you’ve indexed. If you index once a month, you will have trouble finding recent files. You can always reissue the index command manually, but still. In addition, locate doesn’t look inside your files or help you with attribute searches.
There was a time when nearly every Linux system had some form of locate preinstalled. These days, many distros make you install it manually and have a GUI-based search as the default. If you want to use a GUI with locate-like tools, there are a few options. Krusader, one of the KDE file managers, can perform locate searches. There is also catfish. However, the GUIs often can’t handle all the options that locate provides.
Baloo
If you use KDE, then you certainly have seen Baloo. This is the default KDE file indexer. It is very powerful but also very intrusive. Early versions were infamous for chewing up huge amounts of resources while indexing large files. Worse, there were few ways to control what it was doing.
Honestly, I use Baloo, but I have a set of scripts that only allows it to index while my computer is idle and in the wee hours of the morning. Is that still necessary? I don’t know. I’m afraid to unleash Baloo on my system.
So why use Baloo? It integrates perfectly with KDE. It also indexes file system tags and, if you don’t turn it off, file contents. It uses KDE’s metadata extractors to look inside files like archives, for example.
You can use the baloosearch: kio to get a search from many places inside KDE. Normally, you search the Baloo database from Dolphin or KRunner, but there are command line tools, too. The balooctl program gives you some options for working with the database and the daemon. The baloosearch tool lets you find files from the command line. The database can be large, so even a query can take a long time. Remember that Baloo indexes content, so you will sometimes see a result that doesn’t appear to match in the file name. That probably means the search string appears in the file. You can see more about what Baloo knows about a file using the balooshow program with the -x option.
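A quick tour of those three tools might look like the sketch below. The run_if_present helper is made up for this example so the script degrades gracefully on machines without Baloo, and notes.txt is a hypothetical file. Newer Plasma releases may install the tools under suffixed names such as balooctl6, so adjust the names to whatever your system has.

```shell
# Hypothetical helper: run a command only if it is installed,
# otherwise say so and return a non-zero status.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipping: $1 is not installed" >&2
        return 127
    fi
}

# Is the indexer running, and how far along is it?
run_if_present balooctl status || true

# Search the index; hits can come from file names or file contents.
run_if_present baloosearch hackaday || true

# Show what Baloo has recorded about one file.
run_if_present balooshow -x "$HOME/notes.txt" || true
```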
The query language is very complete. For example, you can search for MP3 files from a particular album or images with a certain aspect ratio. You can also use operators like the less than or greater than sign.
You definitely want to configure Baloo. I’ve found that any remote file system or loop in the file system will bring it to its knees.
Recoll
Recoll is another file searcher that can either update its index periodically or watch the file system constantly. Like Baloo, it can decode several file types natively and with external programs. It is actively developed and tries to dig through as much as possible (although indexing inside tar files is off by default).
As noted on the program’s homepage, Recoll will index an MS Word document stored as an attachment to an e-mail message inside a Thunderbird folder archived in a Zip file. Wow.
Other Programs
There are some other search programs that are either obscure or were popular at one time but are less popular today:
- Tracker and MetaTracker
- Beagle
- SearchMonkey
- Angry Search
- FSearch
Of course, there are doubtless many more. Do you use a program we missed? Let us know in the comments. An example of a remote file system you might want to exclude from indexing? Hackaday. Want to build your own system? Be sure you know about incron and the file system watches.
Linux Fu: Where’s That Darn File?
Source: Manila Flash Report