Does Java have ambiguous syntax that needs more information about an identifier?

NOTICE: This question is not about "Java does not have pointers".

In C, the code identifier1 * identifier2 is ambiguous, with two possible meanings:

  1. If identifier1 is a type, then this is a pointer declaration.
  2. If identifier1 is a variable, then this is a multiplication.

The problem is that I cannot choose the right production when building the syntax tree. I checked Clang's code, and it seems that Clang has to move type checking (using a symbol table) into the parsing phase (correct me if I'm wrong).
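To make the problem concrete, here is a minimal sketch of a parser that has to ask a symbol table before it can pick a production for "id * id". It is not Clang's actual implementation; the class SymbolTableParser, the enum Kind, and the methods declareType and classify are hypothetical names invented for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a parser for a C-like language that consults a
// table of known type names while deciding how to parse "id * id ;".
class SymbolTableParser {
    enum Kind { POINTER_DECLARATION, MULTIPLICATION }

    // Names that earlier declarations (e.g. typedefs) introduced as types.
    private final Set<String> typeNames = new HashSet<>();

    void declareType(String name) {
        typeNames.add(name);
    }

    // Chooses a production for "<left> * <something> ;" by checking whether
    // the left identifier is currently known to be a type name.
    Kind classify(String left) {
        return typeNames.contains(left)
                ? Kind.POINTER_DECLARATION
                : Kind.MULTIPLICATION;
    }

    public static void main(String[] args) {
        SymbolTableParser parser = new SymbolTableParser();
        System.out.println(parser.classify("foo")); // MULTIPLICATION
        parser.declareType("foo"); // as if "typedef int foo;" had been seen
        System.out.println(parser.classify("foo")); // POINTER_DECLARATION
    }
}
```

The same token sequence gets a different production depending on declarations seen earlier, which is exactly the information a pure token-stream parser does not have.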

Then I checked the code of javac (OpenJDK). It seems that no semantic analysis is involved in the parsing phase; the parser can build an AST using only the tokens.

So I'm curious whether Java has the same ambiguous-syntax problem: if the parser doesn't know an identifier's type, can it fail to choose the right production?

Or, more generally: does Java have any syntactic ambiguity where a parser cannot choose a production without information beyond the token stream?

Tokenization is always context-sensitive in programming languages. However, Java does not have operators that are this sensitive. You can chain tokens in such a way that the sequence looks ambiguous on its own, but only as part of a larger syntactic statement that resolves it:

A < B can be part of either public class A < B > { ... } or if (A < B) { ... }. The first is a generic class definition; the second is a comparison.
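As a small illustration, the following file uses the character sequence A < B in both roles. The names AngleBracketDemo and compare are hypothetical; only the shape of the two constructs matters.

```java
// Hypothetical demo class; the names are chosen only for illustration.
class AngleBracketDemo {
    static String compare(int A, int B) {
        // Inside the condition of an if statement, "A < B" is a relational
        // expression: the parser is already reading an expression.
        if (A < B) {
            return "less";
        }
        return "not less";
    }

    public static void main(String[] args) {
        System.out.println(compare(1, 2));
    }
}

// After the "class" keyword and the class name, "<" can only open a type
// parameter list, so here "A < B >" declares a generic class.
class A < B > {
}
```

In both places the tokens that come before A < B already tell the parser which construct it is reading, so no type information is needed.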

This is just the first example off the top of my head, but I presume there are more. However, Java's operators are very narrowly defined and cannot be overloaded (as they can in C++). Also, unlike C/C++, Java has only one accessor operator (the dot: .), with one exception (since Java 8, the double colon ::). C++ has a bunch of them, so Java is much less chaotic.

To the specific question of whether Java is always syntactically decidable: yes. A well-implemented compiler can always decide what kind of token is present, based on the token stream.

I don't think Java has this problem, as Java is strongly typed. Also, Java does not support pointers, so there is no chance of the above issue. I hope this answers your question.

An expression like foo.bar.bla.i cannot be parsed in a meaningful way using the syntax alone. Each of foo, bar and bla can be part of the package name, a static variable (this one does not apply to foo), or the name of an inner class.

Example:

// Main.java (default package)
public class Main {
    public static void main(String[] args) {
        System.out.println(foo.bar.bla.i);
    }
}

// foo/bar.java
package foo;
public class bar {

    public static class bla {
        public static int i = 42;
    }

    // Uncomment the next line to make bla a static variable as well:
//  public static NotBla bla = new NotBla();
    public static class NotBla {
        public static int i = 21;
    }
}

This prints 42 when the static variable bla is commented out and 21 when it is not.
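For readers who want to try this without setting up a package, here is a condensed single-file variant of the same idea. The classes ObscuringDemo, WithoutField and WithField are hypothetical stand-ins for the two states of foo.bar (field commented out, field present).

```java
// Hypothetical single-file variant of the example above.
class ObscuringDemo {
    public static void main(String[] args) {
        // Here bla can only be the nested class, so i is bla.i.
        System.out.println(WithoutField.bla.i); // prints 42
        // Here bla is both a nested class and a field; the field wins,
        // so i is read through a NotBla reference.
        System.out.println(WithField.bla.i);    // prints 21
    }
}

class WithoutField {
    static class bla {
        static int i = 42;
    }
}

class WithField {
    static class bla {
        static int i = 42;
    }

    static class NotBla {
        static int i = 21;
    }

    // A field with the same simple name as the nested class above.
    static NotBla bla = new NotBla();
}
```

The two expressions have the same shape (Name.bla.i), yet what bla denotes can only be decided once the members of the enclosing class are known.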

Your question cannot be answered easily; this depends on the production rules you have. You say:

there are two productions:
<pointer> ::= * {<type-qualifier>}* {<pointer>}?
or
<multiplicative-expression> ::= <multiplicative-expression> * <cast-expression>

But this is not the only possible parser!

In C, the statement

foo * bar;

which could be either the declaration of a pointer named bar to type foo or the multiplication of foo by bar, can be parsed from the token stream:

identifier_or_type ASTERISK identifier_or_type SEMICOLON

and the rest is up to the parser's "business logic". So there is no ambiguity at the parser level at all here; the logic behind the rule makes the difference between the two cases.
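A rough sketch of that idea, assuming whitespace-separated tokens for simplicity: the parser produces one neutral node for the statement, and a separate step decides what it means. The names DeferredResolution, StarStatement, Meaning, parse and resolve are all hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of "one neutral rule, decide the meaning afterwards".
class DeferredResolution {
    enum Meaning { POINTER_DECLARATION, MULTIPLICATION }

    // Neutral node for "identifier_or_type ASTERISK identifier_or_type SEMICOLON".
    static final class StarStatement {
        final String left;
        final String right;

        StarStatement(String left, String right) {
            this.left = left;
            this.right = right;
        }
    }

    // Parses exactly four whitespace-separated tokens: id * id ;
    static StarStatement parse(String source) {
        String[] tokens = source.trim().split("\\s+");
        if (tokens.length != 4 || !tokens[1].equals("*") || !tokens[3].equals(";")) {
            throw new IllegalArgumentException("expected: id * id ;");
        }
        return new StarStatement(tokens[0], tokens[2]);
    }

    // The "business logic": interprets the node using known type names.
    static Meaning resolve(StarStatement node, Set<String> typeNames) {
        return typeNames.contains(node.left)
                ? Meaning.POINTER_DECLARATION
                : Meaning.MULTIPLICATION;
    }

    public static void main(String[] args) {
        StarStatement node = parse("foo * bar ;");
        System.out.println(resolve(node, Set.of()));      // MULTIPLICATION
        System.out.println(resolve(node, Set.of("foo"))); // POINTER_DECLARATION
    }
}
```

Note that parse never needs the set of type names; only resolve does. Whether deferring the decision like this is acceptable for a real grammar is exactly what the comments below argue about.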

Comments
  • I don't quite understand the question: Java doesn't have pointers, so there can't be an ambiguity here, since * is always multiplication.
  • I don't think so
  • @SanderDeDycker I think the OP is talking in general, not just about *. In other words, are there any symbols that can cause ambiguity while parsing the source that can only be solved by knowing the types used in the context.
  • Some operators are overloaded and may briefly confuse a programmer, e.g., var1 + var2 might be addition if var1 = 1 and var2 = 2, or it might be concatenation if var1 = "a" and var2 = "b". In the mixed case (var1 = "a" and var2 = 2), the result is a string. However, the result of the + operator is based on the types involved, and these are known at compile time, so there is no ambiguity. In the case of objects, Long + Long produces a long. But Long + null will not compile unless you specify whether it should be Long or String.
  • @VLAZ But neither Java nor C support operator overloading?
  • In the template example, if I look ahead further, I can work out whether this is a template declaration or a comparison, right? Can I think of it this way: in Java, there is no ambiguity such that, even given the whole sentence, the parser still cannot choose a production?
  • You could think like this: In Java, there is no ambiguity of syntax, at least to my knowledge. It should always be decidable for the compiler, what kind of language element a token represents. However, there can be an ambiguity of semantics, if the compiler cannot decide on a method to be called, because two methods have ambiguous headers. This can happen with lambda-expressions and the ::-operator.
  • Nice point, but I think this is a scope-priority problem, and with or without the comment, foo.bar.bla is just a chain of scopes at the parser level, right?
  • @reavenisadesk: I don't understand your point. Scoping (as in "Where does this "reference to x" really point to?") comes after parsing (i.e. after the abstract syntax tree has been built) and is indeed one solution to circumvent the problem. And that's exactly the answer to the question: you cannot parse correctly without additional information (for example from scoping). You cannot declare a Java grammar with rules like this: FullQualifiedClassName := (PackageName '.')? ClassName; PackageName := ID ('.' ID)*;.
  • I don't think so. By parsing, I mean building an AST in which every node is definitely a particular production. With what you mentioned, the parser still doesn't know which one to choose.
  • @reavenisadesk There is only one production here, what should it choose from?
  • No, there are two productions: <pointer> ::= * {<type-qualifier>}* {<pointer>}? or <multiplicative-expression> ::= <multiplicative-expression> * <cast-expression>
  • @reavenisadesk My point is exactly that this does not have to be the parser. The rule in the answer above is an unambiguous parsing rule for both cases, the logic behind the rule makes the difference between the two cases. This removes the ambiguity from the parser level.
  • No, if you really write a parser, especially an LL(k) one, you won't just leave "id * id" as an undetermined node, because in the more general situation both a pointer declaration and a multiplication may contain non-terminals that need further parsing. I get your point that "id * id" can be parsed, but I don't think anyone would consider it OK to leave this statement undetermined in the parsing phase.