Superpower: match a string with parser only if it begins a line

When parsing with Superpower, how can I match a string only if it is the first thing on a line?

For example, I need to match the A colon in "A: Hello Goodbye\n" but not in "Goodbye A: Hello\n"

Using your example here, I would change your ActorParser and NodeParser definitions to this:

public readonly static TokenListParser<Tokens, Node> ActorParser =
    from name in NameParser
    from colon in Token.EqualTo(Tokens.Colon)
    from text in TextParser
    select new Node {
        Actor = name + colon.ToStringValue(),
        Text = text
    };

public readonly static TokenListParser<Tokens, Node> NodeParser =
    from node in ActorParser.Try()
        .Or(TextParser.Select(text => new Node { Text = text }))
    select node;

I suspect there is a bug in Superpower: I'm not sure why the first parser in NodeParser needs a Try() when it is chained with an Or(), but without it the parser throws an error.

Also, your validation of input[1] is incorrect (probably just a copy-paste issue). It should check against "Goodbye A: Hello", not "Hello A: Goodbye".

Unless RegexOptions.Multiline is set, ^ matches the beginning of a string regardless of whether it is at the beginning of a line.

You can probably use inline (?m) to turn on multiline:

static TextParser<Unit> Actor { get; } =
  from start in Span.Regex(@"(?m)^[A-Za-z][A-Za-z0-9_]*:")
  select Unit.Value;
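
To see what the multiline flag changes, here is a small sketch that uses only System.Text.RegularExpressions, not Superpower. It assumes the `*` form of the actor pattern quoted in the comments, and the names LineStartDemo and CountMatches are made up for illustration. It only shows how .NET treats ^ on a whole string; it says nothing about which slice of the input Superpower hands to the regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineStartDemo
{
    // Actor pattern from this thread (the `*` variant); the name is hypothetical.
    private const string ActorPattern = @"^[A-Za-z][A-Za-z0-9_]*:";

    // Counts how many actor prefixes the pattern finds under the given options.
    public static int CountMatches(string input, RegexOptions options)
    {
        return Regex.Matches(input, ActorPattern, options).Count;
    }

    public static void Main()
    {
        const string input = "A: Hello Goodbye\nGoodbye A: Hello\nB: Hi\n";

        // Without Multiline, ^ only matches at the very start of the string.
        Console.WriteLine(CountMatches(input, RegexOptions.None));      // prints 1

        // With Multiline, ^ also matches right after each line break,
        // so "A:" and "B:" match but the mid-line "A:" does not.
        Console.WriteLine(CountMatches(input, RegexOptions.Multiline)); // prints 2
    }
}
```

Writing (?m) at the front of the pattern has the same effect as passing RegexOptions.Multiline.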

I have actually done something similar, but I do not use a Tokenizer.

private static string _keyPlaceholder;

private static TextParser<MyClass> Actor { get; } =
    Span.Regex("^[A-Za-z][A-Za-z0-9_]*:")
        .Then(x =>
             {
                 // Remember the matched key (e.g. "A:") for the final Select.
                 _keyPlaceholder = x.ToStringValue();
                 // Consume the rest of the input as the value.
                 return Character.AnyChar.Many();
             })
        .Select(value => new MyClass { Key = _keyPlaceholder, Value = new string(value) });

I have not tested this; I just wrote it out from memory. For the input "A: Hello Goodbye", the above parser should produce the following:

myClass.Key = "A:"
myClass.Value = " Hello Goodbye"

Comments
  • Are you trying to parse multiple lines of text like "A: Hello Goodbye"? And what is your expected output? Key/value pairs, e.g. Key = "A" and Value = "Hello Goodbye"? Also, do you expect "Goodbye A: Hello" to fail parsing?
  • I guess that depends on whether it's the tokenizer or the parser. If the tokenizer (which I think is the better solution), then I'd want anything that matches the above regex to be a token.
  • It really depends on your expected output. What data are you trying to extract out of this?
  • By way of context, each command in the language is a single line (ended by a line-break), and certain characters/strings have special meaning if they start the line, but not if they occur later. So if it happens in the parser, then it might return an Actor object which contains the string "A:", followed by a FreeText object which contains the string "Hello Goodbye". In the second case, the whole thing would be FreeText("Goodbye A: Hello") since the Actor parser would fail.
  • I think I understand, but to build a parser like this, you'd need to provide a more comprehensive example. Could you update the question to include that, along with the classes you'd want the output parsed into?
  • Thanks for the update. I'm accepting this though what I realize I really need is the tokenizer version, which I've posted here along with test-cases...
  • passing the RegexOptions.Multiline option doesn't fix the problem: Span.Regex(@"^[A-Za-z][A-Za-z0-9_]*:", RegexOptions.Multiline)
  • Hmmm -- if multiline doesn't solve it then most likely the Span you are receiving is a slice that's not what you think it is (doesn't correspond to a line). Try breaking on your code and inspect the span. If that doesn't solve your problem, then post a minimal working example that demonstrates the failure, so we can run it and help you sort out the problem.
  • Ok, so seems that if the line is "1 abc:" and Ignore(Span.WhiteSpace) is set, then the tokenizer consumes the first token ('1'), then ignores the white space as directed, then sees the "abc:" as starting from position 0, thus matching. But what I want is to only match "abc:" if it is the first token ... How to do this?
  • You can't do that from inside the tokenizer because it only sees the remainder after previous tokens have been processed. Probably it would help if you explained more of what you are trying to do at a higher level, with an example of the full input you are expecting and the exact behavior you want to accomplish. The act of tokenizing breaks the input into multiple tokens based on rules; if you want to select a particular token you would do so after the tokenizer is done.
  • Can you post a minimal demonstration program that compiles and executes to exhibit the behavior you are describing?
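
For anyone landing here, the behavior described in the comments (an actor prefix counts only at the very start of a line, everything else is free text) can be sketched without Superpower by handling the input one line at a time. This is only an illustration under that assumption: LineSplitSketch, ParseLine and ParseAll are hypothetical names, and the tuple stands in for the Actor/FreeText objects mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineSplitSketch
{
    // \A anchors at the start of the single line passed to ParseLine,
    // so a mid-line "A:" can never be taken for an actor.
    private static readonly Regex ActorPrefix =
        new Regex(@"\A[A-Za-z][A-Za-z0-9_]*:");

    // Returns the leading actor (or null) and the remaining free text.
    public static (string Actor, string Text) ParseLine(string line)
    {
        Match m = ActorPrefix.Match(line);
        if (m.Success)
        {
            return (m.Value, line.Substring(m.Length).Trim());
        }
        return (null, line.Trim());
    }

    // Splits the input on line breaks and parses each non-empty line.
    public static List<(string Actor, string Text)> ParseAll(string input)
    {
        var nodes = new List<(string Actor, string Text)>();
        string[] lines = input.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            nodes.Add(ParseLine(line));
        }
        return nodes;
    }

    public static void Main()
    {
        foreach (var node in ParseAll("A: Hello Goodbye\nGoodbye A: Hello\n1 abc:\n"))
        {
            string actor = node.Actor ?? "(none)";
            Console.WriteLine("Actor=" + actor + ", Text=" + node.Text);
        }
    }
}
```

Here "1 abc:" comes back as free text with no actor, which is the case the tokenizer with Ignore(Span.WhiteSpace) got wrong. The same per-line check could in principle be applied to a token list after tokenizing, by looking only at the first token of each line.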