Java Regular Expression - Nested Object?

I am trying to parse the string below using a regular expression.

properties={

    Prop1={boolean=null, string=null, byte=null, propertyType=integer, integer=1},

    Prop2={boolean=null, string=null, byte=null, propertyType=integer, integer=1},

    Prop3={boolean=null, string=null, byte=null, propertyType=integer, integer=1},

    Prop4={boolean=null, string=null, byte=null, propertyType=integer, integer=1},

}

OUTPUT:

Group 1 = Prop1={boolean=null, string=null, byte=null, propertyType=integer, integer=1}

Group 2 = Prop2={boolean=null, string=null, byte=null, propertyType=integer, integer=1}

Group 3 = Prop3={boolean=null, string=null, byte=null, propertyType=integer, integer=1}

Group 4 = Prop4={boolean=null, string=null, byte=null, propertyType=integer, integer=1}

2 Answers

  • 3 months ago

    vvvvvvvvvvvvvvvv

  • 6 months ago

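    You can't use repetition effectively with capturing groups: in Java, a repeated group keeps only the text from its last iteration, so you need one group per property.  The simplest Java-style regex I can see that works here, just to get the details, is: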

        String regex =

            "\\s*properties=\\{" +  // leading section for "properties={"

            "\\s*(\\w+=\\{[^}]*\\})," + // accept "keyword={anything},"

            "\\s*(\\w+=\\{[^}]*\\})," + //  [repeat 4 times]

            "\\s*(\\w+=\\{[^}]*\\})," +

            "\\s*(\\w+=\\{[^}]*\\})," +

            "\\s*\\}"; // trailing section to accept "}"

    That doesn't attempt to recognize the individual attributes for each property, and will only match exactly 4 properties.  It needs quite a bit of work to be useful.  Note that each \ escape in the regex needs to be doubled to \\ in the string literal.  The \s* accepts 0 or more whitespace characters (spaces, tabs or newlines).  The \w+ accepts 1 or more "word" characters (letters, digits or underscores) that make up the property name.  The [^}]* pattern between \{ and \} matches 0 or more non-right-brace characters.

    You can test that with something like:

        // Needs Pattern, Matcher and MatchResult from java.util.regex.

        Pattern pat = Pattern.compile(regex);

        Matcher matcher = pat.matcher(input_string);

        if (matcher.matches()) {

            MatchResult m = matcher.toMatchResult();

            for (int g=1; g<=m.groupCount(); ++g) {

                System.out.println("Group " + g + " : " + m.group(g));

            }

        }

        else System.out.println("No match.");

    You'll need input_string to be a pasted copy of the string in your example.
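
    If the number of properties isn't fixed at 4, one way around the group limitation is to match a single "name={...}" entry and call find() in a loop.  This is only a rough sketch: the class and method names (PropertyFinder, findProperties) are made up for illustration, and it assumes the inner braces never contain further braces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class PropertyFinder {
    // One "name={...}" entry whose body contains no braces at all.
    private static final Pattern ENTRY = Pattern.compile("(\\w+)=\\{([^{}]*)\\}");

    // Returns every innermost "name={...}" entry found in the input, in order.
    static List<String> findProperties(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ENTRY.matcher(input);
        while (m.find()) {          // each call moves on to the next entry
            found.add(m.group());   // the whole "name={...}" text
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "properties={\n"
            + "    Prop1={boolean=null, string=null, byte=null, propertyType=integer, integer=1},\n"
            + "    Prop2={boolean=null, string=null, byte=null, propertyType=integer, integer=1},\n"
            + "}";
        List<String> props = findProperties(input);
        for (int i = 0; i < props.size(); i++) {
            System.out.println("Group " + (i + 1) + " = " + props.get(i));
        }
    }
}
```

    The outer "properties={" never matches here because its body contains a left brace, so only the inner entries are reported.  The trade-off is that find() silently skips anything it doesn't recognize, so malformed input is not rejected the way matches() would reject it.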

    A regex isn't very good at parsing expressions with arbitrary repetition, and it is especially bad at parsing expressions with arbitrary nesting.  With only two levels, though, you could take each m.group(g) string and use a different regex to parse out the individual attribute names and values for just one property.
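
    As a sketch of that second step, something like the following could split one property string into its attribute names and values.  The names AttributeParser and parseAttributes are invented for the example, and it assumes values never contain commas or braces.  Note that "null" comes back as the text "null", not as a Java null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class AttributeParser {
    // A single "name=value" pair; the value runs up to the next comma or brace.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=([^,{}]*)");

    // Takes one property string such as "Prop1={a=1, b=2}" and returns
    // its attributes in the order they appear.
    static Map<String, String> parseAttributes(String property) {
        int open = property.indexOf('{');
        int close = property.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("expected name={...}: " + property);
        }
        String body = property.substring(open + 1, close);
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(body);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2).trim());
        }
        return attrs;
    }

    public static void main(String[] args) {
        String prop = "Prop1={boolean=null, string=null, byte=null, propertyType=integer, integer=1}";
        for (Map.Entry<String, String> e : parseAttributes(prop).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```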

    I think a top-down parser is about as easy to write, a whole lot easier to read and debug, and is *certainly* a lot more flexible. 
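
    For comparison, here is roughly what a small top-down (recursive descent) parser for this format might look like.  The grammar is my guess from the one example in the question (a pair is name=value, and a value is either a bare token or a brace-enclosed, comma-separated list of pairs, with a trailing comma allowed), and NestedParser is a made-up name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser; the grammar is assumed from the example input.
class NestedParser {
    private final String text;
    private int pos = 0;

    private NestedParser(String text) { this.text = text; }

    // Parses "name=value" and returns a one-entry map. Nested values are
    // Map<String, Object>; bare tokens such as "null" or "1" stay Strings.
    static Map<String, Object> parse(String text) {
        NestedParser p = new NestedParser(text);
        Map<String, Object> result = new LinkedHashMap<>();
        p.parsePair(result);
        p.skipSpace();
        if (p.pos != text.length()) throw p.error("unexpected trailing text");
        return result;
    }

    // pair := name '=' value
    private void parsePair(Map<String, Object> into) {
        skipSpace();
        int start = pos;
        while (pos < text.length()
                && (Character.isLetterOrDigit(text.charAt(pos)) || text.charAt(pos) == '_')) {
            pos++;
        }
        if (pos == start) throw error("expected a name");
        String name = text.substring(start, pos);
        skipSpace();
        if (peek() != '=') throw error("expected '='");
        pos++;
        into.put(name, parseValue());
    }

    // value := '{' pair (',' pair)* ','? '}'  |  bare token
    private Object parseValue() {
        skipSpace();
        if (peek() == '{') {
            pos++;
            Map<String, Object> map = new LinkedHashMap<>();
            skipSpace();
            while (peek() != '}') {
                parsePair(map);
                skipSpace();
                if (peek() == ',') { pos++; skipSpace(); }
                else if (peek() != '}') throw error("expected ',' or '}'");
            }
            pos++; // consume the closing brace
            return map;
        }
        int start = pos;
        while (pos < text.length() && ",{}".indexOf(text.charAt(pos)) < 0) pos++;
        return text.substring(start, pos).trim();
    }

    private char peek() { return pos < text.length() ? text.charAt(pos) : '\0'; }

    private void skipSpace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos);
    }

    public static void main(String[] args) {
        String input = "properties={\n"
            + "    Prop1={boolean=null, string=null, byte=null, propertyType=integer, integer=1},\n"
            + "}";
        System.out.println(parse(input));
    }
}
```

    Because parseValue() calls parsePair() and vice versa, the same code handles any depth of nesting and any number of properties, and bad input fails with a position instead of just "No match."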
