Parsing custom data packets in an object oriented manner

I am currently developing some software in C++ where I am sending and receiving custom data packets. I want to parse and manage these packets in a well structured manner. Obviously I am first receiving the header and after that the body of the data. The main problem is that I don't like creating a Packet-Object with only the header information and later on adding the body data. What is an elegant way of parsing and storing custom data packets?

Here is a rough sketch of what such a custom data packet could look like:

+-------+---------+---------+----------+------+
| Magic | Command | Options | Bodysize | Body |
+-------+---------+---------+----------+------+

(Let's assume Magic is 4 bytes, Command 1 byte, Options 2 bytes and Bodysize 4 bytes, which makes an 11-byte header, while the body itself is variable in length.) How would I parse this without using any third-party libraries?
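One dependency-free way to parse the header is sketched below. It rests on two assumptions the question does not settle: multi-byte fields are sent big-endian (network byte order), and the magic value is the hypothetical `abcd`. The names `decode_header`, `HeaderFields` and `kMagic` are illustrative only. Building each field with shifts, rather than copying raw bytes into integers, makes the result independent of the host's byte order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Layout from the question: Magic(4) | Command(1) | Options(2) | Bodysize(4) = 11 bytes.
constexpr std::size_t kHeaderSize = 4 + 1 + 2 + 4;

// Hypothetical magic value; the question does not specify one.
constexpr std::array<std::uint8_t, 4> kMagic{'a', 'b', 'c', 'd'};

struct HeaderFields {
    std::uint8_t command;
    std::uint16_t options;
    std::uint32_t body_size;
};

// Assumes big-endian fields. Returns std::nullopt if the magic does not match.
std::optional<HeaderFields> decode_header(const std::array<std::uint8_t, kHeaderSize>& b) {
    for (std::size_t i = 0; i < kMagic.size(); ++i) {
        if (b[i] != kMagic[i]) return std::nullopt;  // not one of our packets
    }
    HeaderFields h{};
    h.command = b[4];
    h.options = static_cast<std::uint16_t>((b[5] << 8) | b[6]);
    h.body_size = (std::uint32_t{b[7]} << 24) | (std::uint32_t{b[8]} << 16) |
                  (std::uint32_t{b[9]} << 8) | std::uint32_t{b[10]};
    return h;
}
```

The caller reads exactly `kHeaderSize` bytes, decodes them, and then knows how many body bytes to read next.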

Normally I would store the packet data in something like this:

#include <array>
#include <cstdint>
#include <vector>

class Packet {
public:

    explicit Packet(std::array<char, 11> headerbytes);  // Magic 4 + Command 1 + Options 2 + Bodysize 4

    void set_body(std::vector<char> data);
    std::vector<char> get_body();

    int8_t get_command();

    int16_t get_options();

    bool is_valid();

private:

    bool valid;

    int8_t _command;

    int16_t _options;

    int32_t body_size;

    std::vector<char> _data;

};

The problem is that I provide the header information first and then add the body data in a hacky way later on, so there is a point in time at which the packet object is accessible in an incomplete state.

I first receive the header, and once the header has been received, another receive call is made to read the body. Would it make sense to have a parser instance that populates the packet object and only makes it accessible once it holds all the needed information? Would it make sense to have separate classes for the header and the body? What would be the best design choice?
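A minimal sketch of that parser idea, assuming the header bytes have already been decoded into their fields. `PacketParser` and its `on_header`/`on_body` methods are hypothetical names: the parser keeps the "header received, body pending" state to itself, and a `Packet` only comes into existence once the body has arrived and matches the announced size.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Immutable once constructed: a Packet always holds its header fields and its full body.
class Packet {
public:
    Packet(std::uint8_t command, std::uint16_t options, std::vector<char> body)
        : command_(command), options_(options), body_(std::move(body)) {}

    std::uint8_t command() const { return command_; }
    std::uint16_t options() const { return options_; }
    const std::vector<char>& body() const { return body_; }

private:
    std::uint8_t command_;
    std::uint16_t options_;
    std::vector<char> body_;
};

// Hypothetical parser: the half-finished state lives here, never in a Packet.
class PacketParser {
public:
    // Step 1: remember the decoded header; returns how many body bytes to read next.
    std::size_t on_header(std::uint8_t command, std::uint16_t options, std::uint32_t body_size) {
        pending_ = Pending{command, options, body_size};
        return body_size;
    }

    // Step 2: hand over the body; only now is a complete Packet produced.
    Packet on_body(std::vector<char> body) {
        if (!pending_) throw std::logic_error("body received before header");
        if (body.size() != pending_->body_size) throw std::length_error("body size mismatch");
        Packet p(pending_->command, pending_->options, std::move(body));
        pending_.reset();  // ready for the next header
        return p;
    }

private:
    struct Pending {
        std::uint8_t command;
        std::uint16_t options;
        std::uint32_t body_size;
    };
    std::optional<Pending> pending_;
};
```

With Boost, `on_header` would be called from the completion handler of a first `boost::asio::async_read` of the fixed 11 header bytes, and its return value gives the length for the second read that fetches the body.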

I am developing in C++, and the Boost library is used for sending and receiving data over sockets.

If you don’t want to tie the data reading into one complete constructor (for understandable reasons of separation of concerns), this is a good application for non-polymorphic inheritance:

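#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Header {
  static constexpr std::size_t SIZE=11;  // 4 magic + 1 command + 2 options + 4 body size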
  Header(std::array<char,SIZE>);

  std::int8_t get_command() const {return command;}
  std::int16_t get_options() const {return options;}
  std::int32_t body_size() const {return length;}

private:
  std::int8_t command;
  std::int16_t options;
  std::int32_t length;
};

struct Packet : private Header {
  using Body=std::vector<char>;
  Packet(const Header &h,Body b) : Header(h),body(std::move(b))
  {if(body.size()!=body_size()) throw …;}

  using Header::get_command;
  using Header::get_options;
  const Body& get_body() const {return body;}

private:
  Body body;
};

// For some suitable Stream class:
Header read1(Stream &s)
{return {s.read<Header::SIZE>()};}
Packet read2(const Header &h,Stream &s)
{return {h,s.read(h.body_size())};}
Packet read(Stream &s)
{return read2(read1(s),s);}

Note that the private inheritance prevents undefined behavior from deleting a Packet via a Header*, as well as the surely-unintended

const Packet p=read(s);
const Packet q=read2(p,s);   // same header?!
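To make this design concrete, here is a runnable sketch with a hypothetical in-memory `ByteStream` standing in for the unspecified `Stream` class. It assumes big-endian fields and, for brevity, does not check the magic bytes; the header size is 11 bytes (4 + 1 + 2 + 4). The `Header`/`Packet` shape follows the answer; the byte decoding and error types are illustrative choices.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for "some suitable Stream class".
class ByteStream {
public:
    explicit ByteStream(std::vector<char> data) : data_(std::move(data)) {}

    // Reads exactly N bytes (the fixed-size header).
    template <std::size_t N>
    std::array<char, N> read() {
        require(N);
        std::array<char, N> out{};
        for (std::size_t i = 0; i < N; ++i) out[i] = data_[pos_++];
        return out;
    }

    // Reads exactly n bytes (the variable-size body).
    std::vector<char> read(std::size_t n) {
        require(n);
        auto first = data_.begin() + static_cast<std::ptrdiff_t>(pos_);
        pos_ += n;
        return std::vector<char>(first, first + static_cast<std::ptrdiff_t>(n));
    }

private:
    void require(std::size_t n) const {
        if (data_.size() - pos_ < n) throw std::runtime_error("not enough data");
    }
    std::vector<char> data_;
    std::size_t pos_ = 0;
};

struct Header {
    static constexpr std::size_t SIZE = 11;  // 4 magic + 1 command + 2 options + 4 body size

    // Assumes big-endian fields; the magic bytes b[0..3] are not checked in this sketch.
    Header(const std::array<char, SIZE>& b)
        : command(static_cast<std::int8_t>(b[4])),
          options(static_cast<std::int16_t>((u(b[5]) << 8) | u(b[6]))),
          length(static_cast<std::int32_t>((u(b[7]) << 24) | (u(b[8]) << 16) |
                                           (u(b[9]) << 8) | u(b[10]))) {
        if (length < 0) throw std::runtime_error("invalid body size");
    }

    std::int8_t get_command() const { return command; }
    std::int16_t get_options() const { return options; }
    std::int32_t body_size() const { return length; }

private:
    static std::uint32_t u(char c) { return static_cast<unsigned char>(c); }
    std::int8_t command;
    std::int16_t options;
    std::int32_t length;
};

struct Packet : private Header {
    using Body = std::vector<char>;
    Packet(const Header& h, Body b) : Header(h), body(std::move(b)) {
        if (body.size() != static_cast<std::size_t>(body_size()))
            throw std::runtime_error("body size mismatch");
    }

    using Header::get_command;
    using Header::get_options;
    const Body& get_body() const { return body; }

private:
    Body body;
};

Header read1(ByteStream& s) { return {s.read<Header::SIZE>()}; }
Packet read2(const Header& h, ByteStream& s) {
    return {h, s.read(static_cast<std::size_t>(h.body_size()))};
}
Packet read(ByteStream& s) { return read2(read1(s), s); }
```

Because `Packet` inherits privately, `read2(p, s)` with a `Packet p` does not compile outside the class, which is exactly the protection described above.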

Composition would of course work as well, but might result in more adapter code in a full implementation.
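For comparison, a sketch of the composition variant. The `Header` here is a simplified, hypothetical version built from already-decoded fields; the forwarding getters in `Packet` are the kind of adapter code the answer refers to.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Simplified Header for illustration: constructed from already-decoded fields.
class Header {
public:
    Header(std::int8_t command, std::int16_t options, std::int32_t length)
        : command_(command), options_(options), length_(length) {}
    std::int8_t get_command() const { return command_; }
    std::int16_t get_options() const { return options_; }
    std::int32_t body_size() const { return length_; }

private:
    std::int8_t command_;
    std::int16_t options_;
    std::int32_t length_;
};

// Composition: a Packet *has* a Header instead of inheriting from it.
class Packet {
public:
    using Body = std::vector<char>;
    Packet(const Header& h, Body b) : header_(h), body_(std::move(b)) {
        if (header_.body_size() < 0 ||
            body_.size() != static_cast<std::size_t>(header_.body_size()))
            throw std::invalid_argument("body size does not match header");
    }

    // Adapter code: every header accessor has to be forwarded by hand.
    std::int8_t get_command() const { return header_.get_command(); }
    std::int16_t get_options() const { return header_.get_options(); }
    const Body& get_body() const { return body_; }

private:
    Header header_;
    Body body_;
};
```

There is no `Header` subobject to convert to, so the `read2(p, s)` mistake is ruled out here as well; the cost is one forwarding function per header field.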

If you were really optimizing, you could make a HeaderOnly without the body size and derive Header and Packet from that.
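A sketch of that refinement, using the answer's name `HeaderOnly` for the shared base. The protected non-virtual destructor is a swapped-in choice, not part of the answer: it makes `delete` through a `HeaderOnly*` fail to compile, so `Header` can derive publicly without the undefined-behavior risk mentioned above.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Fields shared by the wire header and the finished packet (no body size).
class HeaderOnly {
public:
    HeaderOnly(std::int8_t command, std::int16_t options) : command_(command), options_(options) {}
    std::int8_t get_command() const { return command_; }
    std::int16_t get_options() const { return options_; }

protected:
    // Non-virtual but protected: deleting through a HeaderOnly* does not compile.
    ~HeaderOnly() = default;

private:
    std::int8_t command_;
    std::int16_t options_;
};

// Header = shared fields + the body size, which is only needed while reading.
class Header : public HeaderOnly {
public:
    Header(std::int8_t command, std::int16_t options, std::uint32_t length)
        : HeaderOnly(command, options), length_(length) {}
    std::uint32_t body_size() const { return length_; }

private:
    std::uint32_t length_;
};

// Packet stores no separate length: body.size() is the single source of truth.
class Packet : private HeaderOnly {
public:
    using Body = std::vector<char>;
    Packet(const Header& h, Body b) : HeaderOnly(h), body_(std::move(b)) {}

    using HeaderOnly::get_command;
    using HeaderOnly::get_options;
    const Body& get_body() const { return body_; }

private:
    Body body_;
};
```

`HeaderOnly(h)` deliberately slices the body size off, since a finished `Packet` can always recover it from its body.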

You can use exceptions to prevent creation of incomplete packet objects.

I'd use char pointers instead of vectors for performance.

#include <cstdint>
#include <cstring>
#include <stdexcept>

// not intended to be inherited
class Packet final {
public:
    Packet(const char* data, unsigned int data_len) {
        if(data_len < header_len) {
            throw std::invalid_argument("data too small");
        }

        const char* dataIter = data;

        if(!check_validity(dataIter)) {
            throw std::invalid_argument("invalid magic word");
        }
        dataIter += sizeof(magic);
        // memcpy avoids the alignment/aliasing problems of casting dataIter to int16_t* etc.;
        // it copies the bytes as-is, so the fields are interpreted in host byte order
        std::memcpy(&command, dataIter, sizeof(command));
        dataIter += sizeof(command);
        std::memcpy(&options, dataIter, sizeof(options));
        dataIter += sizeof(options);
        std::memcpy(&body_size, dataIter, sizeof(body_size));
        dataIter += sizeof(body_size);

        if(body_size < 0) {
            throw std::invalid_argument("negative body size");
        }
        // compare without risking overflow of body_size + header_len
        if(data_len - header_len < static_cast<unsigned int>(body_size)) {
            throw std::invalid_argument("data body too small");
        }

        body = new char[body_size];
        std::memcpy(body, dataIter, body_size);
    }

    // the class owns a raw buffer, so a copy would delete[] it twice
    Packet(const Packet&) = delete;
    Packet& operator=(const Packet&) = delete;

    ~Packet() {
        delete[] body;
    }

    int8_t get_command() const {
        return command;
    }

    int16_t get_options() const {
        return options;
    }

    int32_t get_body_size() const {
        return body_size;
    }

    const char* get_body() const {
        return body;
    }

private:
    // assumes len enough, may add param in_len for robustness
    static bool check_validity(const char* in_magic) {
        return ( 0 == std::memcmp(magic, in_magic, sizeof(magic)) );
    }

    constexpr static char magic[] = {'a','b','c','d'}; // implicitly inline since C++17
    int8_t command;
    int16_t options;
    int32_t body_size;
    char* body;

    constexpr static unsigned int header_len = sizeof(magic) + sizeof(command)
            + sizeof(options) + sizeof(body_size);
};

Note: this is my first post in SO, so please let me know if something's wrong with the post, thanks.

I'm guessing you are trying to do object-oriented networking. If so, a good solution for such parsing would be FlatBuffers or Cap'n Proto, both of which come with a C++ code generator. By defining a schema, you get generated code that parses the packets efficiently and safely.

Comments
  • "I don't like creating a Packet-Object with only the header information and later on adding the body data" - Why not?
  • Have you considered using an existing widely used serialisation system such as Google protobuf, rather than reinventing the wheel?
  • @Jesper I think this would be bad design because the object can be accessed in an incomplete state. I am not sure what is common practice, nor do I know what would be the best design choice for this. I am thinking of some packet processor that performs the parsing and a packet data container that is filled and only accessible once populated with all data.
  • @rici Yes I have, I prefer to do it myself. Not because this is necessarily better but because it helps me to deepen my understanding on how these things work and how they are designed.
  • @JesperJuhl I updated my post and added some more information.
  • Interesting approach. A few questions remain: Wouldn't it be better practice to make a separate struct for the Body as well? Also why did you choose structs over a class? Furthermore I was thinking if it might be conceivable to make the Packet class itself an interface? Maybe something like an IPacket which other classes can inherit? What do you think about this?
  • @Kyu96: What purpose would a struct containing just the Body serve? It doesn’t abstract over anything. As for struct vs. class, they are exactly the same except for default access control. You need an interface iff you need dynamic polymorphism, which your problem statement doesn’t suggest.
  • That sounds like a very interesting approach, maybe you can provide a rough example for my scenario? Would you make header parsing, body parsing two pipeline steps?
  • How does the client learn how much data to send to the Manager?
  • @Davis Herring: The Manager is a state machine parsing the content of packets. It receives at a time an entire packet or a chunk of packet. It deals with fragmented packets.
  • @Kyu96: It's quite simple changing it to have only 2 steps, header and body.
  • @Flaviu: Then how does the client receive those packets if, say, one read is long enough to contain several?
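The last comment raises the framing problem: a single read (for example Boost's `async_read_some`) may return part of a packet or several packets at once. A hypothetical `Framer` that buffers incoming bytes and cuts out complete packets could look like this. It assumes the question's 11-byte header with a big-endian Bodysize at offset 7, and leaves magic validation and field decoding to a later parsing step.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical framer: buffers raw socket bytes and cuts out complete packets.
class Framer {
public:
    static constexpr std::size_t kHeaderSize = 11;  // 4 + 1 + 2 + 4
    static constexpr std::size_t kSizeOffset = 7;   // Bodysize follows Magic, Command, Options

    // Appends one chunk (whatever a single read returned) and returns every packet
    // that is now complete, each as its raw header + body bytes.
    std::vector<std::vector<char>> feed(const char* data, std::size_t len) {
        buffer_.insert(buffer_.end(), data, data + len);
        std::vector<std::vector<char>> packets;
        std::size_t pos = 0;
        while (buffer_.size() - pos >= kHeaderSize) {
            std::size_t body = 0;  // assumed big-endian on the wire
            for (std::size_t i = kSizeOffset; i < kHeaderSize; ++i)
                body = (body << 8) | static_cast<unsigned char>(buffer_[pos + i]);
            const std::size_t total = kHeaderSize + body;
            if (buffer_.size() - pos < total) break;  // body still incomplete
            auto first = buffer_.begin() + static_cast<std::ptrdiff_t>(pos);
            packets.emplace_back(first, first + static_cast<std::ptrdiff_t>(total));
            pos += total;
        }
        // Drop consumed bytes; a trailing partial packet stays buffered for the next call.
        buffer_.erase(buffer_.begin(), buffer_.begin() + static_cast<std::ptrdiff_t>(pos));
        return packets;
    }

private:
    std::vector<char> buffer_;
};
```

Each returned byte vector can then be handed to whichever header/body parser is chosen, so the socket code never needs to know where one packet ends and the next begins.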