Regexes
From Ggl's wiki
Note: I tested the regexes below with Python's re module.
Hostnames are defined in RFC 952. A hostname is a list of labels separated by dots.
A simple regex for a single label that respects the requirements of the hosts(5) man page ("Host names may contain only alphanumeric characters, minus signs ("-"), and periods ("."). They must begin with an alphabetic character and end with an alphanumeric character.") and the standards RFC 952 and RFC 1123 (which allows a label to start with a digit) is:
^(?:[a-z0-9]{1,63}|[a-z0-9][-a-z0-9]{0,61}[a-z0-9])$
So nothing forbids hostnames like example---name or example.some---irrelevant--name.com.
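A minimal sketch of how the label regex above could be applied to a full hostname with Python's re module. The helper name is_valid_hostname and the split-on-dots approach are assumptions made for illustration, not part of the original notes:

```python
import re

# Label regex from above: 1 to 63 characters, alphanumeric at both ends,
# hyphens allowed only in the middle.
LABEL_RE = re.compile(r'^(?:[a-z0-9]{1,63}|[a-z0-9][-a-z0-9]{0,61}[a-z0-9])$')


def is_valid_hostname(name):
    """Hypothetical helper: every dot-separated label must match LABEL_RE."""
    return all(LABEL_RE.match(label) for label in name.split('.'))


if __name__ == '__main__':
    for candidate in ('example---name',
                      'example.some---irrelevant--name.com',
                      '-example',
                      'example-'):
        print(candidate, is_valid_hostname(candidate))
```

Consecutive hyphens inside a label are accepted, while a leading or trailing hyphen is rejected.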
The big picture is a list of names separated by dots ('.'). I start with this high-level regex, then define the inner regex, and finally integrate the inner regex into the high-level one.
Match a hostname:
'^[a-z0-9]+(?:\.?[a-z0-9]+)*$'
This matches alphanumeric words separated by a dot ('.').
Each name is more than a single alphanumeric word: it can be several words separated by hyphens. Match hyphen ('-') separated words:
'^[a-z0-9]+(?:-?[a-z0-9]+)*$'
Final regex (I removed the start-of-line '^' and end-of-line '$' anchors from the inner regex before integrating it):
'^([a-z0-9]+(?:-?[a-z0-9]+)*(?:\.?[a-z0-9]+(?:-?[a-z0-9]+)*)*)$'
Note: if you need to use the groups, just remove the non-capturing marks '?:' from the regex:
^([a-z0-9]+(-?[a-z0-9]+)*(\.?([a-z0-9]+(-?[a-z0-9]+)*))*)$
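A short sketch of what the capturing version gives you in Python. The regex is written here as a raw string with a single backslash before the dot, and the helper name last_label is hypothetical. Keep in mind that Python's re keeps only the last iteration of a repeated group, so the groups inside the '*' repetitions do not list every label:

```python
import re

# Capturing variant of the hostname regex (raw string, '\.' is a literal dot).
GROUPED_RE = re.compile(
    r'^([a-z0-9]+(-?[a-z0-9]+)*(\.?([a-z0-9]+(-?[a-z0-9]+)*))*)$'
)


def last_label(name):
    """Hypothetical helper: return the last dot-separated label, or None."""
    m = GROUPED_RE.match(name)
    if m is None:
        return None
    # group(4) holds the label captured by the last iteration of the
    # repeated dot group; it is None for a single-label hostname, in which
    # case the whole match, group(1), is the label.
    return m.group(4) or m.group(1)


if __name__ == '__main__':
    print(last_label('example.some-name.localdomain'))
    print(last_label('example'))
```

To get every label, splitting the matched string on '.' is simpler than relying on the groups.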
I simplified the inner regex this way:
[a-z0-9]+(-?[a-z0-9]+)*
Should match:
example example-name example-name.com example.some-name.localdomain
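The list above can be checked against the final regex with a few lines of Python. This is only a sketch: the helper name check and the extra negative examples are my own additions for illustration.

```python
import re

# Final regex from above, as a raw string.
HOSTNAME_RE = re.compile(
    r'^([a-z0-9]+(?:-?[a-z0-9]+)*(?:\.?[a-z0-9]+(?:-?[a-z0-9]+)*)*)$'
)

SHOULD_MATCH = [
    'example',
    'example-name',
    'example-name.com',
    'example.some-name.localdomain',
]

# Assumed negative examples, kept short on purpose.
SHOULD_NOT_MATCH = ['-example', 'example..com', 'example--name']


def check(names):
    """Hypothetical helper: map each name to True if the regex matches it."""
    return {name: HOSTNAME_RE.match(name) is not None for name in names}


if __name__ == '__main__':
    print(check(SHOULD_MATCH))
    print(check(SHOULD_NOT_MATCH))
```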
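However, the final regex is very CPU-intensive on input that almost matches. Because the separators are optional ('-?' and '\.?'), the engine can split a run of alphanumeric characters in exponentially many ways before it gives up (catastrophic backtracking). For example: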
'example.some-name.com-'
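One possible way around the slowdown, sketched below, is to make the separators mandatory ('-' and '\.' instead of '-?' and '\.?'). This is my own variant, not one of the regexes tested above. It accepts the same strings, since an optional separator followed by more alphanumerics only ever extended the previous word, but each run of alphanumerics can now be matched in only one way, so a failing input is rejected quickly. The names SAFE_HOSTNAME_RE and is_hostname are hypothetical.

```python
import re

# Assumed variant: same shape as the final regex, but each '-' and '.'
# separator is mandatory, which removes the ambiguous ways of splitting
# a run of alphanumeric characters.
SAFE_HOSTNAME_RE = re.compile(
    r'^[a-z0-9]+(?:-[a-z0-9]+)*(?:\.[a-z0-9]+(?:-[a-z0-9]+)*)*$'
)


def is_hostname(name):
    """Hypothetical helper: True if name matches SAFE_HOSTNAME_RE."""
    return SAFE_HOSTNAME_RE.match(name) is not None


if __name__ == '__main__':
    print(is_hostname('example.some-name.localdomain'))
    print(is_hostname('example.some-name.com-'))
```

Like the final regex above, this variant does not enforce the 63-character label limit and does not accept consecutive hyphens.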

