When are spaces (or parentheses) required in C during compilation?

I am learning how compilation works, and my final goal is to write a mini C compiler. I am still at the beginning of this project. While working on the scanner and parser parts that build the AST, I realized that a space (or parentheses) is required in expressions like i+ +4, i+(+4), i- -4, or i-(-4). Otherwise, in the expression i--4 (for example), -- is interpreted as the unary operator -- and an error is raised. I understand the reason perfectly; that is not the question. The question is the following: I naively thought that spaces were not very important in C except for code readability. But now I wonder: are there other examples like the ones described above?

I had to fix some old code and change

#define ALT_7      (0xfe+OFFSET)

to

#define ALT_7      (0xfe +OFFSET)

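The reason is that 0xfe+OFFSET is a single preprocessing number (pp-number) token, not three tokens as one might naively think: the pp-number grammar lets e, E, p, or P be followed by a sign (so that constants like 1e+5 work), and the hex digit e at the end of 0xfe triggers that rule. The old compiler split it into three tokens, but the new one treated it as one, which is not a valid numeric constant.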
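To see why the whole string becomes one token, here is a minimal sketch of how a lexer might scan a pp-number, assuming the C11 grammar (a digit, or '.' followed by a digit, then any run of digits, letters, '_' and '.', where e, E, p, or P may also be followed by + or -). The function name pp_number_len is hypothetical, and the sketch handles ASCII only (no universal character names, no C23 digit separators).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the pp-number that starts
   at s, or 0 if s does not start a pp-number. ASCII only. */
size_t pp_number_len(const char *s)
{
    size_t n;

    assert(s != NULL);

    /* A pp-number starts with a digit, or with '.' followed by a digit. */
    if (isdigit((unsigned char)s[0]))
        n = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        n = 2;
    else
        return 0;

    for (;;) {
        char prev = s[n - 1];
        char c = s[n];

        /* e, E, p or P may be followed by a sign, so "fe+" keeps going. */
        if ((c == '+' || c == '-') &&
            (prev == 'e' || prev == 'E' || prev == 'p' || prev == 'P'))
            n++;
        /* Digits, letters, '_' and '.' always extend the pp-number. */
        else if (isalnum((unsigned char)c) || c == '_' || c == '.')
            n++;
        else
            return n;
    }
}
```

With the space, the scan stops after 0xfe, leaving + and OFFSET as separate tokens. A constant such as 0x1f+2 is not affected, because f is not one of the exponent letters.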

There are likely more cases on the preprocessor side of things, but they are more obscure (as is the whole subject of C/C++ preprocessing).

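Lexers in most languages are based on greedy regular expressions: a token is as long as it can be. The C standard states this rule explicitly: the next preprocessing token is the longest sequence of characters that could form one.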

If ++ can be interpreted as the ++ operator (from left to right) it won't be lexed as two pluses. If inta can be interpreted as an identifier, it won't be interpreted as int followed by a, etc.

i+ +4 needs whitespace, parentheses, or something else between the two + signs; otherwise the lexer greedily consumes them, left to right, as ++.
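The greedy ("maximal munch") rule can be sketched as a tiny lexer that, at each position, tries the longest punctuator first. This is an illustrative toy, not a real C scanner: it knows only identifiers, runs of decimal digits, and a small subset of punctuators, and the names next_token and lex are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* A small subset of C punctuators, listed longest first so that the
   first match found is also the longest one (maximal munch). */
static const char *const punctuators[] = {
    "<<=", ">>=", "->", "++", "--", "+=", "-=", "<<", ">>",
    "+", "-", "<", ">", "=", "(", ")", ";",
};

/* Hypothetical helper: copies the next token of *src into buf, advances
   *src past it and returns 1; returns 0 at end of input. Whitespace is
   skipped, but it still ends the previous token. */
int next_token(const char **src, char *buf, size_t cap)
{
    const char *p = *src;
    size_t len = 0;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return 0;

    if (isalpha((unsigned char)*p) || *p == '_') {
        /* Identifiers are greedy too: "inta" is one token. */
        while (isalnum((unsigned char)p[len]) || p[len] == '_')
            len++;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)p[len]))  /* simplified numbers */
            len++;
    } else {
        size_t i;
        for (i = 0; i < sizeof punctuators / sizeof punctuators[0]; i++) {
            size_t n = strlen(punctuators[i]);
            if (strncmp(p, punctuators[i], n) == 0) {
                len = n;
                break;
            }
        }
        if (len == 0)
            len = 1;  /* unknown character: emit it on its own */
    }

    assert(len < cap);
    memcpy(buf, p, len);
    buf[len] = '\0';
    *src = p + len;
    return 1;
}

/* Hypothetical helper: writes the tokens of src into out, separated by
   single spaces, so the token boundaries are easy to see. */
void lex(const char *src, char *out, size_t cap)
{
    char tok[64];
    size_t used = 0;

    out[0] = '\0';
    while (next_token(&src, tok, sizeof tok)) {
        size_t n = strlen(tok);
        assert(used + n + 2 <= cap);
        if (used > 0)
            out[used++] = ' ';
        memcpy(out + used, tok, n + 1);
        used += n;
    }
}
```

Run on i--4 this gives i -- 4, and on i- -4 it gives i - - 4. The classic a+++++b comes out as a ++ ++ + b, which a C compiler then rejects (a++ is not an lvalue, so it cannot be incremented again), even though a ++ + ++ b would have been a valid reading.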

There are several parts to a C compiler; the question is which one you are implementing.

The C preprocessor actually keeps track of whitespace (for example, by generating a token for it) and uses that to make decisions, such as whether a #define creates a function-like or an object-like macro. If you are implementing a combined preprocessor/compiler, you may want to do that tokenizing only once and then throw away the whitespace tokens before handing the token stream off to the compiler proper.
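One place where this whitespace information matters is #define: with no space between the macro name and '(', the macro is function-like; with a space, the '(' becomes the start of the replacement text of an object-like macro. A minimal sketch, where FN, OBJ, x and check_macro_kinds are names made up for illustration:

```c
#include <assert.h>

static int x = 100;  /* file-scope variable, used only through OBJ */

/* No space before '(': function-like macro, x is its parameter. */
#define FN(x) (x) + 1
/* Space before '(': object-like macro whose replacement is "(x) + 1".
   Outer parentheses are omitted only so the two lines match exactly. */
#define OBJ (x) + 1

void check_macro_kinds(void)
{
    assert(FN(1) == 2);   /* expands to (1) + 1 */
    assert(OBJ == 101);   /* expands to (x) + 1, with x == 100 */
    /* OBJ(1) would expand to (x) + 1(1), which does not compile. */
}
```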

C itself mostly treats spaces, tabs, and line breaks as indicators of the end of a token.

Beyond that, it also has the concept of multi-character operators (mostly two characters, such as -- and ->, plus a few three-character ones, such as <<=), and it matches them greedily. That is, - - is turned into MINUS_TOKEN, MINUS_TOKEN, whereas --, no matter where it appears, is always turned into DECREMENT.

That is, your example of i--4 gives a parser error, as there is an extraneous 4 following the postfix-decrement operator.

This shows that operators are matched greedily. Writing i - -4, on the other hand, works, because the greedy matching treats the space as the end of the first - token and starts a new token, which then yields a second minus.
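A quick way to confirm this with a real compiler is to compile the spaced, parenthesized, and commented forms, which all lex the two signs as separate tokens. All function names below are hypothetical; the commented-out line is the form that fails to compile.

```c
#include <assert.h>

/* Each of these lexes the two signs as separate tokens. */
int minus_spaced(int i)  { return i - -4; }    /* i - (-4) */
int minus_parens(int i)  { return i-(-4); }
int minus_comment(int i) { return i-/**/-4; }  /* a comment acts as a space */
int plus_spaced(int i)   { return i+ +4; }     /* i + (+4) */

/* Does not compile: lexed as i -- 4, a postfix decrement followed by a
   stray 4.
   int broken(int i) { return i--4; }
*/

/* a+++b lexes as a ++ + b, i.e. (a++) + b, not a + ++b.
   a is a by-value parameter, so the increment is not visible outside. */
int munch(int a, int b) { return a+++b; }

void check_signs(void)
{
    assert(minus_spaced(10) == 14);
    assert(minus_parens(10) == 14);
    assert(minus_comment(10) == 14);
    assert(plus_spaced(10) == 14);
    assert(munch(1, 2) == 3);  /* a + ++b would have given 4 */
}
```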

In summary, C itself ignores whitespace beyond the tokenizing phase; the preprocessor does not.
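The # (stringizing) operator is one place where the preprocessor's handling of whitespace is directly visible: per the C standard, each run of whitespace between the argument's tokens becomes a single space, leading and trailing whitespace is dropped, and tokens that were adjacent stay adjacent. STR and check_stringizing are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <string.h>

/* Turns its argument's tokens into a string literal. */
#define STR(arg) #arg

void check_stringizing(void)
{
    /* Adjacent tokens stay adjacent. */
    assert(strcmp(STR(i+ +4), "i+ +4") == 0);
    assert(strcmp(STR(i++4), "i++4") == 0);
    /* Runs of whitespace become one space; leading/trailing are dropped. */
    assert(strcmp(STR(  i   +   +4  ), "i + +4") == 0);
}
```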

Comments
  • Yes, for example you cannot write intmain or return0. It has to be int main / return 0 or int(main) / return(0).
  • I agree with these obvious examples.
  • This does not answer the question generally. It gives a single limited instance, and that is not enough to explain for C parsing generally, and it does not explain the instance (as by explaining how it fits into C parsing or is specified by the standard).
  • This is not a useful answer because it uses jargon without explaining it: "Lexers," "regexes," "hungry," "token." Also, it does not give any information that could resolve whether ++ "can be interpreted" as ++.
  • Thanks for the answer, but that is not exactly my question. I wrote that I understand why it is interpreted like that. The question is whether there are other examples that I should be aware of.
  • @EricPostpischil This question is tagged flex-lexer, so saying the term "lexer" needs to be explained is like saying that an answer to a Java question needs to first explain what Java is. I'd argue that the same goes for regular expressions (as they're a concept fundamental to flex). And "hungry" is explained in the same sentence where it's used: "a token is as long as it can be" (though I've never heard that term and think that it's probably a mistranslation of "greedy").