One of the reasons we stress the value of consistent and predictable directory and file naming conventions is that working in this way enables you to use the computer to select files based on the characteristics of their filenames. For example, if you have a set of files whose names begin with a four-digit year and you only want to work with files from ‘2017’, you can select just those. Or if ‘journal’ appears in the filename of every file that holds data about journals, you can use the computer to select just those files, then do something with them. Equally, using plain text formats means that you can go further and select files or elements of files based on characteristics of the data within those files.
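As a minimal sketch of this kind of selection, the Python snippet below picks files by a leading year and by a word in the name. The filenames are hypothetical, invented only for illustration, and plain string tests are used here rather than anything more powerful.

```python
# Hypothetical filenames, invented for illustration only.
filenames = [
    "2016-journal-usage.tsv",
    "2017-journal-usage.tsv",
    "2017-ebook-loans.tsv",
    "2018-journal-usage.tsv",
]

# Select files whose names begin with the year 2017.
from_2017 = [name for name in filenames if name.startswith("2017")]

# Select files with 'journal' anywhere in the name.
journals = [name for name in filenames if "journal" in name]

print(from_2017)  # ['2017-journal-usage.tsv', '2017-ebook-loans.tsv']
print(journals)   # ['2016-journal-usage.tsv', '2017-journal-usage.tsv', '2018-journal-usage.tsv']
```

This only works because every filename follows the same convention; a file named `usage_journal_17.tsv` would be missed by the first test.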
A powerful means of selecting based on file characteristics is to use regular expressions, often abbreviated to regex. A regular expression is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, i.e. “find and replace”-like operations. For those who have not met this term before, a string is a contiguous sequence of symbols or values, for example a word, a date, a set of numbers such as a phone number, or an alphanumeric value such as a repository identifier.
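To make the idea concrete, here is a small sketch using Python's built-in `re` module. The sample text and the date format are assumptions made up for this example; the pattern `\d{4}-\d{2}-\d{2}` means "four digits, a hyphen, two digits, a hyphen, two digits".

```python
import re

# Hypothetical sample text, invented for illustration only.
text = "Received 2017-03-14, revised 2017-04-02."

# Find: every substring shaped like a YYYY-MM-DD date.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(dates)  # ['2017-03-14', '2017-04-02']

# Find and replace: capture year, month and day in groups,
# then rewrite each date as DD/MM/YYYY.
rewritten = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
print(rewritten)  # Received 14/03/2017, revised 02/04/2017.
```

Note that the pattern checks only the shape of the string, not whether it is a valid date: `9999-99-99` would match as well.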
See Data Intro - Regular Expressions for the bulk of this teaching.