Hi there
A nifty way of finding which tags are used in a file (especially useful for XML files, where the tags can be anything) is to use "grep" to extract the tags, sort them (with "sort", what else?) and remove duplicates with "uniq":
Code:
pabloa:~$ grep -ohe "<[^/][^> ]*[ |>]" *.xml |sort|uniq
<city>
<country>
<description
<language>
<metadata
<title>
<topic>
<value>
<?xml
<year>
It's so useful that I'm going to make an alias for it. Here we are using a few very nice features of the "grep" command:
- -o: output only the matching bit instead of the whole line
- -h: don't output the file name where the pattern was found
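- -e: take the next argument as the pattern to search for (because it expects an argument, it has to be the last flag of the three when they are bundled together as "-ohe", otherwise grep takes the letters that follow it as the pattern)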
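To see what each flag contributes, here is a small sketch that runs the same pattern with and without -h. The two sample files and their contents are made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do):

```shell
# Throwaway sample files (hypothetical content, for illustration only).
tmpdir=$(mktemp -d)
printf '<a x="1"><b>text</b></a>\n' > "$tmpdir/sample.xml"
printf '<a><c>text</c></a>\n' > "$tmpdir/other.xml"

# -o only: each match is printed on its own line, but since several files
# are searched, grep prefixes every match with "filename:".
with_names=$(grep -o -e "<[^/][^> ]*[ |>]" "$tmpdir"/*.xml)

# -o and -h: same matches, without the file name prefix.
without_names=$(grep -o -h -e "<[^/][^> ]*[ |>]" "$tmpdir"/*.xml)

printf '%s\n' "--- with file names ---" "$with_names"
printf '%s\n' "--- without file names ---" "$without_names"

rm -r "$tmpdir"
```

Without -h, the file name prefix would make identical tags from different files look different to "sort" and "uniq", so the duplicates would not be removed.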
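The pattern used ("<[^/][^> ]*[ |>]") can be explained in words like this: "anything starting with a '<', followed by any character other than '/' (so we avoid closing tags), followed by any run of characters that are neither a space nor a '>', up to (and including) a space or a '>'". Strictly speaking, the '|' inside the final brackets is a literal character, not an "or", so a '|' is accepted there too. Since the match stops at the first space, tags with attributes (such as "<description" above) show up without the closing '>'.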
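As for the alias, one possible sketch is a shell function rather than an alias, so that the files can be passed as arguments. The name "xmltags" is just a placeholder I picked, and "sort -u" is swapped in for "sort | uniq" since it does both steps at once:

```shell
# Hypothetical helper name; takes one or more files as arguments.
xmltags() {
    # Same pattern as the one-liner above; "sort -u" sorts and removes
    # duplicates in a single step.
    grep -ohe "<[^/][^> ]*[ |>]" "$@" | sort -u
}
```

Put it in your ~/.bashrc (or equivalent) and call it as "xmltags *.xml".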
Improve at your leisure and enjoy at your pleasure!
Cheers.
P.