I have to parse a complicated string format. Is implementing an automaton a sensible approach?
I am struggling with a particularly obnoxious string format that I have to parse. The strings can contain substrings that denote a variable property which has to be resolved. Imagine something like "thisexamplestringcontainsa[variable_property]". Also, these properties can be arbitrarily nested, and they can have different meanings depending on the context. If [variable_property] is in fact not a valid name of a variable (which of course has to be decided at runtime), it becomes a normal part of the entire string and remains unchanged, verbatim. Consequently, there are no invalid strings, and the number of opening square brackets does not need to match the number of closing brackets! So this]is[a[valid]]][exampletoo! is a valid input as well. There are more rules, but this should give you an idea.
So, at the moment I am unsure how to approach this. My first tries have ended in an incredible mess of ifs and elses, and I noticed more and more that the solution should probably incorporate some sort of state concept. Now I am thinking more and more about using an automaton for this. However, I have only encountered automata as purely theoretical constructs; I have never come across an actual implementation. Furthermore, automata are traditionally used to validate a word, i.e. to determine whether it belongs to a formally defined language. Needless to say, it is difficult for me to come up with a formal definition of this language.
How would you approach this? Do you think implementing an automaton is a sane approach? How would you model it from an OO design point of view? The project is in C#, if that makes a difference. Or would you suggest something entirely different?
/Edit: My description may have been a bit misleading, so here are more details: the problem for me is to find the properties in the right order (from innermost to outermost). Once I have identified the next property to resolve, the actual substitution with its final value is relatively easy.
Let's take the example above, and I'll give a step-by-step example of what should happen. The full input string is: this]is[a[valid]]][exampletoo!
The first closing bracket and the last opening bracket are normal characters, since they don't enclose anything. The same goes for all characters that are not between a matching bracket pair. That leaves the part [a[valid]]]. The innermost property has to be resolved first, which is [valid]. The brackets enclose the property's identifying string, so valid is the name of the property to resolve. Let's say this string does in fact identify a property and gets replaced with its actual value, let's say foo. The identifying string including the brackets gets replaced, so [valid] becomes foo. Now we have to look at [afoo]]. Let's pretend afoo does not identify a property, which leaves this substring unchanged (including the brackets). Finally, the second closing bracket after afoo has no matching opening bracket and is therefore just a character. After processing is complete, the entire string would read: this]is[afoo]][exampletoo!
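Purely as an illustration of the walk-through above, here is a minimal sketch of a single left-to-right pass that keeps a stack of open-bracket positions. It covers only the simplified format described here, not the additional rules. The names BracketResolver, Resolve and properties are made up for this sketch, and the dictionary is an assumed stand-in for whatever decides at runtime whether a name is a property. It also assumes that a substituted value is never re-scanned for brackets.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BracketResolver
{
    // 'properties' is a hypothetical stand-in for the real runtime lookup.
    public static string Resolve(string input, IReadOnlyDictionary<string, string> properties)
    {
        var output = new StringBuilder(input.Length);
        // Positions in 'output' of every '[' that has not been closed yet.
        var openPositions = new Stack<int>();

        foreach (char c in input)
        {
            if (c == '[')
            {
                openPositions.Push(output.Length);
                output.Append(c);
            }
            else if (c == ']' && openPositions.Count > 0)
            {
                // Closes the innermost open bracket. Everything inside it
                // has already been resolved, so the name is final here.
                int start = openPositions.Pop();
                string name = output.ToString(start + 1, output.Length - start - 1);
                if (properties.TryGetValue(name, out string value))
                {
                    output.Length = start;   // drop "[name"
                    output.Append(value);    // and put the value in its place
                }
                else
                {
                    output.Append(c);        // unknown name: keep "[name]" verbatim
                }
            }
            else
            {
                // Ordinary character, or a ']' with no matching '['.
                output.Append(c);
            }
        }
        return output.ToString();
    }
}
```

With a dictionary that maps valid to foo, calling Resolve on this]is[a[valid]]][exampletoo! should yield this]is[afoo]][exampletoo!, matching the steps above. Because the innermost pair is always the one on top of the stack, the "innermost first" order falls out of the scan without any searching.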
I hope this example makes things a bit clearer. Please keep in mind that I have simplified the string format here! It is just meant to give you an idea of the difficulties I am facing. I don't expect any working code; I am just looking for answers that give me ideas on how to approach this problem. Since the parsing has to be done for many thousands of strings, the solution must have reasonable performance.
How about plain old recursion? It seems to fit here.
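To make the suggestion concrete, here is a rough sketch of what a recursive version might look like for the simplified format in the question. Each opening bracket starts a nested call that returns once it meets its closing bracket or runs out of input. RecursiveResolver, Resolve, ParseBody and properties are hypothetical names, and the dictionary is only an assumed placeholder for the real runtime lookup.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RecursiveResolver
{
    public static string Resolve(string input, IReadOnlyDictionary<string, string> properties)
    {
        int pos = 0;
        // Top level: not inside any bracket, so a ']' here is a plain character.
        return ParseBody(input, ref pos, false, properties, out _);
    }

    // Reads from 'pos' up to the ']' that closes the current bracket (when
    // 'nested' is true) or to the end of input. 'closed' reports which one.
    private static string ParseBody(string s, ref int pos, bool nested,
        IReadOnlyDictionary<string, string> properties, out bool closed)
    {
        var text = new StringBuilder();
        closed = false;
        while (pos < s.Length)
        {
            char c = s[pos++];
            if (c == '[')
            {
                // Resolve the inner part first; it comes back fully substituted.
                string inner = ParseBody(s, ref pos, true, properties, out bool innerClosed);
                if (innerClosed && properties.TryGetValue(inner, out string value))
                {
                    text.Append(value);
                }
                else
                {
                    // Unknown name or unmatched '[': keep it verbatim.
                    text.Append('[').Append(inner);
                    if (innerClosed) text.Append(']');
                }
            }
            else if (c == ']' && nested)
            {
                closed = true;
                return text.ToString();
            }
            else
            {
                text.Append(c);
            }
        }
        return text.ToString();
    }
}
```

For the question's example and a dictionary mapping valid to foo, this should return this]is[afoo]][exampletoo!. One caveat: the recursion depth equals the bracket nesting depth, so extremely deep nesting could exhaust the call stack, in which case an explicit stack would be the safer choice.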