All tags in XML or HTML files

**pabloa** · 06-24-2011

Hi there

A nifty way of finding out which tags are used in a file (especially useful for XML files, where the tags can be anything) is to use "grep" to extract the tags, sort them (with "sort", what else?) and remove the duplicates with "uniq":

Code:

pabloa:~$ grep -ohe "<[^/][^> ]*[ |>]" *.xml | sort | uniq
<city>
<country>
<description 
<language>
<metadata 
<title>
<topic>
<value>
<?xml 
<year>

It's so useful that I'm going to make an alias for it. Here we are using a few very nice features of the "grep" command:

-o: output only the matching bit instead of the whole line
-h: don't output the file name where the pattern was found
-e: the argument that follows is the pattern to search for (because -e takes an argument, it has to be the last flag of the three; otherwise the letters after it are read as the pattern)
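
For the alias idea, a shell function is probably easier than a real alias, because the file names have to go in the middle of the pipeline. A minimal sketch, where the name "xmltags" is made up for illustration and the grep options are exactly the three listed above:

```shell
# Hypothetical helper name; the pipeline is the one from the first post.
# -o: print only the match, -h: no file names, -e: the next argument is the pattern.
xmltags() {
    grep -ohe "<[^/][^> ]*[ |>]" "$@" | sort | uniq
}
```

It would then be called as "xmltags *.xml", or with any other list of files.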

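The pattern used ("<[^/][^> ]*[ |>]") can be explained in words like this: "a '<', followed by any character other than '/' (so we skip closing tags), followed by any run of characters that are neither a space nor a '>', up to and including a space or a '>'". Note that inside square brackets the '|' is a literal character, not an "or", so the last bracket also accepts a '|'; "[ >]" would be enough. The pattern also picks up declarations such as "<?xml", as the output above shows.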
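
One possible refinement, offered as a sketch rather than as part of the original recipe: since the match includes the trailing space or '>', a tag written with attributes ("<description ") and the same tag written without them ("<description>") show up as two different lines. Stripping the delimiters with "sed" merges them, and "uniq -c" then counts how often each name occurs. The function name "tagcount" is made up for illustration.

```shell
# Hypothetical helper: list tag names with their number of occurrences.
# grep extracts the opening tags as before; sed removes the leading "<",
# the trailing space, "|" or ">", and a "/" left over from tags like "<br/>";
# the final sort puts the most frequent names first.
tagcount() {
    grep -ohe "<[^/][^> ]*[ |>]" "$@" |
        sed -e 's/^<//' -e 's/[ |>]$//' -e 's/\/$//' |
        sort | uniq -c | sort -rn
}
```

Called as "tagcount *.xml", it prints one line per tag name, each preceded by its count.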

Improve at your leisure and enjoy at your pleasure!

Cheers.
P.

**James Dayton** · 06-24-2011

Pabloa,

Very neat script indeed! Thank you for sharing! XML massaging is unavoidable these days in software localization, in order to get CAT tools to properly digest the wide variety of structures found in XML. I will try to share my scripts too.

Best wishes,
James
