I see letters on a page. The first is the letter a, which could start many different words. The letter after it is an n. This could be an, and, answer, or any number of words, so I continue.
x86 machine code, and other machine code from that era, works this way, in particular the instruction sets x86 was directly derived from.
First off, and most important: if you just take all the bytes of a program and jump into the middle, this will not make any sense. It is very, very easy to get off on the wrong foot. "the quick brown fox" becomes "thequickbrownfox", and if you start in the middle you get "ickbrow". What is that? The processor starts and continues based on the rules of the instruction set. The processor is fairly stupid; it follows the rules as defined, or at least as documented in the processor manuals. So long as the programmer and tools have created a properly constructed program, it will not get lost. If it does, that is the fault of the programmer or tools, not the processor. The processor decodes the opcode byte as the opcode byte. That byte could be a whole instruction or just a fraction of one, depending on the specific byte. If it is a fraction, then the first byte plus the byte that follows it may determine the whole instruction, or may still be only a fraction.
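A minimal Python sketch of the "ickbrow" problem in machine code. It assumes a toy length table that knows only three kinds of 32-bit x86 opcodes (mov r32, imm32; the short conditional jumps 0x70 to 0x7F; xor al, imm8); a real x86 length decoder also has to handle prefixes, ModRM, SIB, displacements and much more. It walks the bytes of the four mov instructions from the disassembly listing further down, once from the real start and once starting one byte in.

```python
def insn_length(op):
    """Length in bytes of a few one-byte-opcode x86 instructions (32-bit mode,
    no prefixes), or None if this toy table does not know the opcode."""
    if 0xB8 <= op <= 0xBF:   # mov r32, imm32: opcode + 4-byte immediate
        return 5
    if 0x70 <= op <= 0x7F:   # jcc rel8 (jo..jg): opcode + 1-byte displacement
        return 2
    if op == 0x34:           # xor al, imm8: opcode + 1-byte immediate
        return 2
    return None


def boundaries(code, start):
    """Offsets where a decoder starting at `start` believes instructions begin."""
    offsets, pc = [], start
    while pc < len(code):
        n = insn_length(code[pc])
        if n is None or pc + n > len(code):
            break                # unknown opcode or truncated instruction
        offsets.append(pc)
        pc += n
    return offsets


# The four mov instructions from the listing, back to back.
code = bytes.fromhex("b878563412" "bb78563412" "b978563412" "ba78563412")
print(boundaries(code, 0))   # [0, 5, 10, 15]
print(boundaries(code, 1))   # [1, 3, 5, 10, 15]
```

Starting one byte in, 78 56 is read as a js and 34 12 as xor al,0x12, and only then does the walk happen to land back on a real boundary. That resynchronization is luck, not something a disassembler can rely on.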
In CISC in particular, the opcodes themselves, and in part the bytes that follow, may or may not contain bits with a consistent meaning. In a RISC like MIPS or ARM or others, 0000 in a specific place means register 0, 0001 means register 1, and so on. But in some, if not many, CISC instructions there isn't a bit field that distinguishes register x from register y, or register a from register b. The whole opcode has to be looked up in a table to know what it means.
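To make the RISC point concrete, a small Python sketch of MIPS32 field extraction, assuming the standard R-type/I-type layout. The register numbers come straight out of fixed bit positions with shifts and masks, no table lookup. The two instruction words are hand-assembled examples (addiu $t0, $zero, 5 and addu $v0, $a0, $a1).

```python
def mips_fields(word):
    """Split a 32-bit MIPS instruction word into its fixed bit fields.
    rs and rt are always in the same place; rd only means something for
    R-type words and imm only for I-type words."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31..26
        "rs": (word >> 21) & 0x1F,       # bits 25..21
        "rt": (word >> 16) & 0x1F,       # bits 20..16
        "rd": (word >> 11) & 0x1F,       # bits 15..11
        "imm": word & 0xFFFF,            # bits 15..0
    }


addiu = mips_fields(0x24080005)   # addiu $t0, $zero, 5
print(addiu["opcode"], addiu["rs"], addiu["rt"], addiu["imm"])   # 9 0 8 5
addu = mips_fields(0x00851021)    # addu $v0, $a0, $a1
print(addu["opcode"], addu["rs"], addu["rt"], addu["rd"])        # 0 4 5 2
```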
x86 is a variable-length instruction set. Some instructions are one byte with no other operands; others need more bytes, and then maybe an immediate after that. Say you want to move the immediate value 0x12345678 into register EAX. Without looking at any documentation, I would guess that is either a 5- or 6-byte instruction: either an opcode that says "load immediate into eax" followed by the four bytes of the immediate, or a byte that says "load immediate", another byte that says "this is eax", and then the four bytes of the immediate.
Disassembly of section .text:
0: b8 78 56 34 12 mov eax,0x12345678
5: bb 78 56 34 12 mov ebx,0x12345678
a: b9 78 56 34 12 mov ecx,0x12345678
f: ba 78 56 34 12 mov edx,0x12345678
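It turns out to be 5 bytes. Here the low three bits of the opcode do happen to select the register (0xB8 plus the register number, where eax is 0, ecx is 1, edx is 2 and ebx is 3), but you cannot count on that kind of regularity across the instruction set; x86 was not designed so that register bits always sit in one fixed place, and each opcode has to be looked up.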
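A minimal Python sketch that decodes the four instructions in the listing, assuming 32-bit mode and no prefixes. It relies on Intel's documented B8+rd form: the low three bits of the opcode are the register number in Intel's order (eax, ecx, edx, ebx, esp, ebp, esi, edi), and the four bytes after the opcode are the immediate, little-endian.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_imm32(b):
    """Decode mov r32, imm32 (opcodes 0xB8..0xBF) in 32-bit mode."""
    if len(b) < 5 or not 0xB8 <= b[0] <= 0xBF:
        raise ValueError("not a complete mov r32, imm32")
    reg = REG32[b[0] & 7]                      # low 3 opcode bits = register number
    imm = int.from_bytes(b[1:5], "little")     # 78 56 34 12 -> 0x12345678
    return f"mov {reg},0x{imm:x}"


for hexbytes in ["b878563412", "bb78563412", "b978563412", "ba78563412"]:
    print(decode_mov_imm32(bytes.fromhex(hexbytes)))
```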
You may be overcomplicating this, and sadly the Intel and other x86 docs are not as good as some other vendors'. But it's really just a flow chart, and fairly easy: the first byte tells you, by its definition, whether you need to look for another byte or not; the next byte indicates whether you need to look further, and so on. You do not decode x86 the way you decode MIPS or ARM or others that are designed differently. All of them have a decoder that says "look at these bits and determine the instruction, or determine whether I need more bits", but x86 does it one way, MIPS another, and ARM another. There are pros and cons to each.
CISC like x86, though, is more of a flow chart: the first byte tells you to go to page X, and that page either has the whole answer or says to get the next byte and, based on it, go to page Y in appendix X.
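The flow chart can be sketched as a table keyed on the first byte, where each entry is a "page" that either finishes the decode or reads more bytes. This Python toy models only nop (0x90), ret (0xC3), mov r32, imm32 (0xB8 to 0xBF) and the register-to-register form of mov r/m32, r32 (0x89); the function names are hypothetical and everything else is printed as a data byte.

```python
# Toy "flow chart" decoder for a few 32-bit x86 opcodes (no prefixes).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def mov_imm32(code, pc):
    # 0xB8..0xBF: this page has the whole answer. The register is in the low
    # three opcode bits, then comes a 4-byte little-endian immediate.
    if pc + 5 > len(code):
        return None
    imm = int.from_bytes(code[pc + 1:pc + 5], "little")
    return f"mov {REG32[code[pc] & 7]},0x{imm:x}", 5


def mov_rm32_r32(code, pc):
    # 0x89: this page says "read the next byte" (ModRM) and decode that.
    if pc + 2 > len(code):
        return None
    modrm = code[pc + 1]
    if modrm >> 6 != 0b11:
        return None  # memory forms lead to further pages; not modeled here
    src = REG32[(modrm >> 3) & 7]
    dst = REG32[modrm & 7]
    return f"mov {dst},{src}", 2


# The first-byte table: which "page" to turn to for each opcode byte.
FIRST_BYTE = {
    0x90: lambda code, pc: ("nop", 1),
    0xC3: lambda code, pc: ("ret", 1),
    0x89: mov_rm32_r32,
}
for opcode in range(0xB8, 0xC0):
    FIRST_BYTE[opcode] = mov_imm32


def decode_one(code, pc):
    """Return (text, length) for the instruction at pc, or None if unknown."""
    page = FIRST_BYTE.get(code[pc])
    return page(code, pc) if page else None


def decode_sequence(code):
    """Decode back to back; only safe for bytes known to be straight-line code."""
    pc, out = 0, []
    while pc < len(code):
        result = decode_one(code, pc)
        if result is None:
            out.append(f"db 0x{code[pc]:02x}")
            pc += 1
        else:
            text, length = result
            out.append(text)
            pc += length
    return out


print(decode_sequence(bytes.fromhex("b878563412" "89d8" "90" "c3")))
```

A real decoder has many more pages (prefix bytes, the 0x0F escape into a second opcode map, ModRM memory forms with SIB bytes and displacements), but the shape is the same: each step tells you whether to stop or which byte to read next.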
Some houses have one occupant; the address takes you to one person. Some have more than one, and once you get to the house based on the address, you need further information to determine which person or pet is of interest to you. The first piece of information, the street address, conforms to a standard, but the information that isolates the person or pet within that house conforms to a standard for that house. The first byte of an instruction is the opcode. If, based on the opcode, there are additional bytes, then what those bytes mean is opcode specific, as we saw above: in b8 78 56 34 12, for 0xB8 the second byte is part of the immediate value. There are many opcodes you can look up where the second byte further decodes the instruction:
0: 89 c0 mov eax,eax
2: 89 d8 mov eax,ebx
4: 89 c8 mov eax,ecx
6: 89 d0 mov eax,edx
For the 0x89 opcode, the second byte in these cases is not data; it further defines the instruction.
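The second byte of those 0x89 instructions is a ModRM byte. A short Python sketch that splits it into its mod (bits 7 to 6), reg (bits 5 to 3) and r/m (bits 2 to 0) fields and reproduces the listing, assuming 32-bit mode and the register-direct case (mod = 3), where r/m is the destination and reg the source for MOV r/m32, r32:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a ModRM byte."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


for modrm in (0xC0, 0xD8, 0xC8, 0xD0):
    mod, reg, rm = split_modrm(modrm)
    # MOV r/m32, r32 with mod == 3: r/m is the destination, reg the source.
    print(f"89 {modrm:02x}: mod={mod} reg={reg} rm={rm} -> mov {REG32[rm]},{REG32[reg]}")
```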
It is true that the decoding of the second byte is not unique to that one opcode. Many instructions share the same decoding of those bits, for example to determine ah, al, ax, eax, bh, bl, bx, and so on. That is documented in the Intel documentation as well as in countless other books and websites.
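The shared decoding in a sketch: the same 3-bit register number maps to a different name depending on the operand size the opcode and prefixes select. This Python toy assumes 32-bit mode, no REX prefix, and register-direct ModRM only; it covers 88 /r (8-bit MOV), 66 89 /r (16-bit via the operand-size prefix) and 89 /r (32-bit), all with the same ModRM byte 0xD8.

```python
# Register names indexed by the 3-bit field, per operand size (no REX prefix).
REGS = {
    8: ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
}


def decode_mov_reg_reg(code):
    """Decode register-to-register MOV in 32-bit mode for three encodings."""
    if code[0] == 0x66 and code[1] == 0x89:
        size, modrm = 16, code[2]        # operand-size prefix: 16-bit registers
    elif code[0] == 0x88:
        size, modrm = 8, code[1]         # MOV r/m8, r8
    elif code[0] == 0x89:
        size, modrm = 32, code[1]        # MOV r/m32, r32
    else:
        raise ValueError("encoding not modeled")
    if modrm >> 6 != 3:
        raise ValueError("only register-direct ModRM (mod == 3) is modeled")
    names = REGS[size]
    return f"mov {names[modrm & 7]},{names[(modrm >> 3) & 7]}"


for hexbytes in ["88d8", "6689d8", "89d8"]:
    print(hexbytes, "->", decode_mov_reg_reg(bytes.fromhex(hexbytes)))
```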
The true documentation is the source code of the chip itself. Since we rarely have access to that, we get documentation, which usually isn't written by the author of the logic and may then be polished by a technical writer; at each step some information may be lost or left confusing. Some vendors are better than others, and some versions of their documentation are better than others.
x86 is pretty much the last instruction set you want to learn. Having one is not a valid reason: for every x86 you have, just inside that box there are many non-x86 processors, and for every x86 you own, you own quite a few, dozens even, non-x86 devices. And if education and learning is the goal, you want to start with a simulator anyway; it greatly improves your chances of success, and crashes don't hurt nearly as much. There are much better instruction sets to start with, like MSP430, and the PDP-11, which clearly influenced it. Then ARM and Thumb, later getting into MIPS and its nuances. Of the 8-bit-era parts, I wouldn't start with x86; I would go with something else, the 6502 or others. Then, if curious, the 8088/8086 using an emulator and the old docs from the Internet Archive's Wayback Machine, and lastly x86 as in the 80386, 80486, and x86-64. Diving into x86-64 first has got to be all about pain, truly for folks into self-abuse. If you still feel you have to do this, the less painful route along this painful path is to start with the 8088/8086 using the old manuals and DOSBox or Bochs or one of a number of other emulators. Once you have the foundation, what was added in the step to 32-bit and then 64-bit may make more sense, and you don't have to be confused by the massive amount of protection added over time; you can start clean and pure.
Disassembly of variable-length instruction sets is a huge problem, and nobody has solved it because it can't be solved completely. It is not possible. I used to learn every new instruction set by starting with a disassembler; these days I would probably write a simulator instead. The only way to have half a chance of success is to start at the valid entry point(s) and decode in execution order, not linearly through the binary. That will only expose some of the code. The rest, if any, is reached through data (computed jumps, jump tables and the like), and you can try to emulate it, but that won't be perfect either. For one thing, the data at disassembly time may change at run time. You could even emulate the program and run it for days or weeks to discover the various data values in various locations that a specific instruction looks at, and still not truly know all the possibilities. So some disassemblers simply get it wrong but show it to you as if it were right, and others correctly say "I don't know what this is".
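A minimal Python sketch of the difference between a linear sweep and decoding in execution order from an entry point (often called recursive descent or recursive traversal). It assumes a made-up byte sequence and a toy decoder that knows only mov r32, imm32, ret and jmp rel8; the three bytes after the jmp are hypothetical fill that happens to start with 0xB8.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code, pc):
    """Return (text, length, successors) for a tiny 32-bit x86 subset, or None."""
    op = code[pc]
    if 0xB8 <= op <= 0xBF and pc + 5 <= len(code):     # mov r32, imm32
        imm = int.from_bytes(code[pc + 1:pc + 5], "little")
        return f"mov {REG32[op & 7]},0x{imm:x}", 5, [pc + 5]
    if op == 0xC3:                                     # ret: no fall-through
        return "ret", 1, []
    if op == 0xEB and pc + 2 <= len(code):             # jmp rel8: target only
        rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
        target = pc + 2 + rel
        return f"jmp 0x{target:x}", 2, [target]
    return None


def linear_sweep(code):
    """Decode front to back, treating every byte as code."""
    pc, out = 0, {}
    while pc < len(code):
        result = decode(code, pc)
        if result is None:
            out[pc] = f"db 0x{code[pc]:02x}"   # give up on this byte, show it as data
            pc += 1
        else:
            out[pc] = result[0]
            pc += result[1]
    return out


def recursive_descent(code, entry):
    """Decode only what is reachable from the entry point, following branches."""
    found, todo = {}, [entry]
    while todo:
        pc = todo.pop()
        if pc in found or not 0 <= pc < len(code):
            continue
        result = decode(code, pc)
        if result is None:
            found[pc] = "(undecodable on a code path)"
            continue
        text, _, successors = result
        found[pc] = text
        todo.extend(successors)
    return dict(sorted(found.items()))


# jmp over three hypothetical fill bytes (b8 90 90), then real code.
code = bytes.fromhex("eb03" "b89090" "b801000000" "c3")
for name, listing in (("linear", linear_sweep(code)),
                      ("recursive", recursive_descent(code, 0))):
    print(name)
    for pc, text in listing.items():
        print(f"  {pc:2x}: {text}")
```

The linear sweep decodes the fill as a mov that swallows the first bytes of the real instruction at offset 5, leaving the rest as junk data bytes; following the jmp from the entry point finds the real instructions at 0, 5 and 0xA. Indirect jumps, which the toy does not model, are exactly where this approach runs out of information.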
Today the vast majority of binaries are compiled, so the data paths are mostly sane and complete. But go get some ROMs from the stand-up arcade video game days, Asteroids for example, and you will see something that looks like this pseudocode:
a = 0
if(a == 0) goto somewhere
b = 7
We can easily see that the conditional branch is actually unconditional, but in disassembly we would still need to treat the instruction after the conditional branch as a possible execution path. What you find in that ROM, though, is that what follows is actually data, then an instruction. In the pseudocode below, a 1 represents the opcode byte, and a 2 and 3 represent additional bytes of that instruction:
1 a = 0;
1 if(a == 0) goto somewhere
1 b = 7.
But when we continue to decode all the supposedly valid execution paths, we find that:
1 b = 7.
3 <--- is a branch destination
That byte is an opcode byte, not one of the later bytes of an instruction, so now there is a conflict; a good disassembler will tell you this. Then the human has to go examine these paths and determine which one was valid, the a = 0 ... path or the b = 7 path. Assuming a = 0 and the conditional branch that follows were part of a valid disassembly, it would appear that the branch is really unconditional, that a couple of data bytes (or fill, or whatever) follow it, and that some code follows later on. This could have been intentional, as it was more common in those days to deliberately throw off a disassembler, or it could have been the result of hand-hacking the binary rather than rebuilding the whole project and burning new ROMs. (Go read up on, I think it was Defender, hacking the binary in the hotel room the night before the trade show.) Those bytes might have been other instructions that were hand-modified to bypass a bug.

The 6502 and a number of those game ROMs are a good starting place if you want to write a disassembler. There are not as many instructions as on, say, a Z80 or 8088/8086, which used second bytes to multiply the original potential of 256 instructions into a longer list. Early PIC or MSP430 would be far easier still for a first disassembler, as they only have a few dozen instructions. MSP430 has a debugged, supported GNU backend (the LLVM one is neither debugged nor supported, so avoid it), so you have easy-to-get tools if learning instruction sets is of interest.
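The conflict check a good disassembler performs can be sketched like this in Python: follow every possible path from the entry point (both sides of each conditional branch), record where each decoded instruction starts and how long it is, then report any start that lands inside another instruction. The byte sequence is a hypothetical x86 rendering of the ROM pseudocode: mov eax,0, a je that is always taken, two fill bytes, then the real code at the branch target. The toy decoder knows only mov r32, imm32, je rel8 and ret.

```python
def decode(code, pc):
    """Return (length, successors) for a tiny 32-bit x86 subset, or None."""
    op = code[pc]
    if 0xB8 <= op <= 0xBF and pc + 5 <= len(code):      # mov r32, imm32
        return 5, [pc + 5]
    if op == 0xC3:                                      # ret: no fall-through
        return 1, []
    if op == 0x74 and pc + 2 <= len(code):              # je rel8: both paths
        rel = int.from_bytes(code[pc + 1:pc + 2], "little", signed=True)
        return 2, [pc + 2, pc + 2 + rel]
    return None


def trace(code, entry):
    """Follow every possible path from entry; map start offset -> length."""
    insns, todo = {}, [entry]
    while todo:
        pc = todo.pop()
        if pc in insns or not 0 <= pc < len(code):
            continue
        result = decode(code, pc)
        if result is None:
            continue   # undecodable byte on some path; a real tool would flag it too
        length, successors = result
        insns[pc] = length
        todo.extend(successors)
    return insns


def find_conflicts(insns):
    """Pairs (a, b) where instruction b starts inside instruction a."""
    return sorted((start, inner)
                  for start, length in insns.items()
                  for inner in range(start + 1, start + length)
                  if inner in insns)


code = bytes.fromhex(
    "b800000000"   # 0: mov eax,0   (a = 0)
    "7402"         # 5: je 9        (always taken)
    "b8ff"         # 7: hypothetical fill bytes
    "bb07000000"   # 9: mov ebx,7   (real code at the branch target)
    "c3"           # e: ret
)
print(find_conflicts(trace(code, 0)))   # [(7, 9)]
```

The result says the instruction decoded at offset 7 (the fall-through path) overlaps the branch target at 9. The tool can only report that; deciding which path is real is still the human's job.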
When you have a fixed instruction length, like MIPS when the 16-bit instruction set is not used, or ARM when 16-bit Thumb is not used (AND the instruction set requires instructions to be aligned, which is not the case for RISC-V, where the compressed extension allows 16-bit alignment), you can linearly disassemble through memory. Some of the "instructions" you find will make no sense or will be undefined, but you just grind through; the human will later see those as data, not instructions, while the ones that really are instructions will make sense. Unfortunately MIPS and ARM have secondary instruction sets that decode completely differently and have different rules, so you cannot simply linearly disassemble an ARM binary either. For something compiler-generated today you need to go in execution order as well. You are far more likely to get most of the instructions decoded, but there will be some jump tables that dead-end your efforts, leaving chunks of code not properly disassembled.
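For the fixed-length case, a Python sketch of a linear sweep over MIPS32 code, assuming little-endian words and a decoder that recognizes only addiu, addu and jr. Every word lands on a 4-byte boundary, so the sweep cannot get out of sync; the word it does not recognize (0xDEADBEEF, a hypothetical data word) is simply shown as .word for the human to sort out.

```python
MIPS_REGS = ["zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
             "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
             "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
             "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra"]


def reg(n):
    return "$" + MIPS_REGS[n]


def decode_word(w):
    """Decode a few MIPS32 instructions; anything else is shown as .word."""
    op, rs, rt, rd, funct = w >> 26, (w >> 21) & 31, (w >> 16) & 31, (w >> 11) & 31, w & 63
    if op == 0x09:                        # addiu rt, rs, simm16
        imm = w & 0xFFFF
        if imm & 0x8000:
            imm -= 0x10000                # sign-extend the 16-bit immediate
        return f"addiu {reg(rt)},{reg(rs)},{imm}"
    if op == 0 and funct == 0x21:         # addu rd, rs, rt
        return f"addu {reg(rd)},{reg(rs)},{reg(rt)}"
    if op == 0 and funct == 0x08:         # jr rs
        return f"jr {reg(rs)}"
    return f".word 0x{w:08x}"             # unknown or data: let the human decide


def linear_disassemble(code, base=0):
    """Step through memory one aligned 4-byte word at a time (little-endian)."""
    return [(base + off, decode_word(int.from_bytes(code[off:off + 4], "little")))
            for off in range(0, len(code) - len(code) % 4, 4)]


words = [0x24080005, 0x00851021, 0x03E00008, 0xDEADBEEF]
code = b"".join(w.to_bytes(4, "little") for w in words)
for addr, text in linear_disassemble(code):
    print(f"{addr:04x}: {text}")
```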
So, while wordy, the short answer is: only trust the disassembler as far as you can throw it. The instructions are pretty easy to decode if you go in execution order from a known-valid entry point and look at the documentation for the processor.
From Advanced Microprocessors And Peripherals: in the 8085, an instruction (opcode and operands) is fetched, decoded and executed; if the opcode byte is not the whole instruction, the bytes that follow are treated as data bytes, depending on the opcode.

On x86, opcode 0x83 can be eight different instructions depending on a field of the ModRM byte called the "reg/opcode" field. Each instruction's own documentation, e.g. the entry for ADD (an HTML extract of Volume 2 of the Intel manual), shows encodings like REX.W + 83 /0 ib for ADD r/m64, imm8.

[Figure: diagram of the ModRM bit fields, bits 7 to 0, from wiki.osdev.org]
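A Python sketch of how the reg/opcode field turns 0x83 into eight instructions (ADD, OR, ADC, SBB, AND, SUB, XOR, CMP for /0 through /7). It assumes 32-bit mode with no prefixes, so unlike the REX.W form mentioned above it operates on 32-bit registers, and it handles only register-direct ModRM (mod = 3). The 8-bit immediate is sign-extended, as the ib encodings of this group specify.

```python
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_83(code):
    """Decode 83 /digit ib with a register operand (32-bit mode, no prefixes).
    The ModRM reg field is an opcode extension here, not a register."""
    if len(code) < 3 or code[0] != 0x83:
        raise ValueError("expected 83 /digit ib")
    modrm = code[1]
    if modrm >> 6 != 3:
        raise ValueError("memory operand forms not modeled")
    op = GROUP1[(modrm >> 3) & 7]                              # /digit picks the operation
    dst = REG32[modrm & 7]                                     # r/m picks the register
    imm = int.from_bytes(code[2:3], "little", signed=True)     # sign-extended imm8
    return f"{op} {dst},{imm}"


for hexbytes in ["83c005", "83e805", "83fbff"]:
    print(hexbytes, "->", decode_83(bytes.fromhex(hexbytes)))
```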