How to query nested fields in MongoDB using Presto

I'm setting up a Presto cluster which I'd like to use to query a MongoDB instance. Data in my Mongo instance has the following structure:

{
  _id: <value>
  somefield: <value>
  otherfield: <value>
  nesting_1: {
    nested_field_1_1: <value>
    nested_field_1_2: <value>
    ...
  }
  nesting_2: {
    nesting_2_1: {
      nested_field_2_1_1: <value>
      nested_field_2_1_2: <value>
      ...
    }
    nesting_2_2: {
      nested_field_2_2_1: <value>
      nested_field_2_2_2: <value>
      ...
    }
  }
}

Just by plugging it in, Presto correctly identifies and creates columns for the values in the top level (e.g. somefield, otherfield) and in the first nesting level. That is, it creates a column for nesting_1 whose content is a row(nested_field_1_1 <type>, nested_field_1_2 <type>, ...), and I can query table.nesting_1.nested_field_1_1.

However, fields with an extra nesting layer (e.g. nesting_2 and everything within it) are missing from the table schema. Presto's documentation for the MongoDB connector does mention that:

At startup, this connector tries guessing fields’ types, but it might not be correct for your collection. In that case, you need to modify it manually. CREATE TABLE and CREATE TABLE AS SELECT will create an entry for you.

While that seems to explain my use case, it doesn't say how to "modify it manually": a CREATE TABLE statement doesn't seem appropriate, as the table is already there. The documentation also has a section on how to declare fields and their types, but it doesn't make clear how to handle multiple nesting levels.

My question is: how do I set up Presto's MongoDB connector so that I can query fields in the third nesting layer?

Answers can assume that:

  • all nested fields' names are known;
  • there are only 3 layers;
  • there is no need to preserve the layered table layout (i.e. I don't mind if my resulting Presto table has all nested fields as unique columns like somefield, rather than one field with rows like nesting_1 in the above example);
  • extra points if the solution doesn't require me to explicitly declare the names and types of all columns in the third layer, as I have over 1500 of them -- but this is not a hard requirement.

If the Mongo collection being queried does not have a fixed schema recorded in the _schema collection, Presto is not able to infer the full document structure.

If you prefer, you can declare the schema explicitly: set the mongodb.schema-collection property in the connector configuration, as described in the documentation, to point Presto at a different Mongo collection that stores the same kind of table definitions, and create that collection directly.

Nested fields can be declared using the ROW data type, which is also described in the docs and behaves like a struct (a record with fixed, named fields) in other programming languages. Deeper levels are expressed by nesting ROW types, e.g. row(b row(c varchar)).
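For the question's structure, the nesting_2 column would need a ROW type whose fields are themselves ROWs. Below is a minimal JavaScript sketch that assembles such a type string from a declared map of field names to Presto types. The spec format and the toRowType helper are hypothetical, and the leaf types (varchar, bigint, double) are placeholders for whatever the real fields hold:

```javascript
// Hypothetical helper: turn a spec of known field names and Presto types
// into a nested ROW type string. A nested object in the spec means
// "this field is itself a document"; a string is a Presto type name.
function toRowType(spec) {
  const parts = Object.entries(spec).map(([name, type]) =>
    // Recurse into nested documents; leaf values are used as-is.
    typeof type === "object" ? `${name} ${toRowType(type)}` : `${name} ${type}`
  );
  return `row(${parts.join(", ")})`;
}

// Field names come from the question; the leaf types are assumptions.
const nesting2 = {
  nesting_2_1: { nested_field_2_1_1: "varchar", nested_field_2_1_2: "bigint" },
  nesting_2_2: { nested_field_2_2_1: "varchar", nested_field_2_2_2: "double" },
};

console.log(toRowType(nesting2));
```

The printed string would go in the "type" of a single nesting_2 entry in the table definition. Presto could then address the third layer with the same dot syntax used for nesting_1, e.g. nesting_2.nesting_2_1.nested_field_2_1_1.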

You can create a collection in MongoDB, for example "presto_schema", in your database and insert a sample schema document like this:

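db.presto_schema.insertOne({
  // Name of the Presto table (the MongoDB collection) being described
  "table" : "your_collection",
  "fields" : [
    {
      "name" : "_id",
      "type" : "ObjectId",
      // Hidden columns are left out of DESCRIBE and SELECT * output
      "hidden" : true
    },
    {
      "name" : "last_name",
      "type" : "varchar",
      "hidden" : false
    },
    {
      "name" : "id",
      "type" : "varchar",
      "hidden" : false
    }
  ]
})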
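With 1500+ third-level fields, writing these definitions by hand is tedious. Here is a hedged sketch that builds a definition of the same shape as the presto_schema example above from a single sample document. The inferType and tableDefinition helpers are hypothetical, and the type rules are assumptions rather than the connector's own inference. It also assumes the sample holds only plain JSON values (strings, numbers, booleans, arrays, nested objects) and that field names are plain identifiers; BSON types such as dates or 64-bit integers would need extra rules:

```javascript
// Hypothetical helper: map one plain JSON value to a Presto type name.
// These rules are assumptions, not the connector's own inference logic.
function inferType(value) {
  if (typeof value === "string") return "varchar";
  if (typeof value === "boolean") return "boolean";
  if (typeof value === "number") {
    // JavaScript cannot tell 5 from 5.0, so whole numbers become bigint.
    return Number.isInteger(value) ? "bigint" : "double";
  }
  if (Array.isArray(value)) {
    // Assumes homogeneous arrays: the first element decides the type.
    return value.length > 0 ? `array(${inferType(value[0])})` : "array(varchar)";
  }
  if (value !== null && typeof value === "object") {
    // Nested documents become ROW types, recursively.
    const parts = Object.entries(value).map(([k, v]) => `${k} ${inferType(v)}`);
    return `row(${parts.join(", ")})`;
  }
  return "varchar"; // fallback for null and anything unrecognised
}

// Hypothetical helper: build a { table, fields } definition from a sample document.
function tableDefinition(table, sample) {
  const fields = [{ name: "_id", type: "ObjectId", hidden: true }];
  for (const [name, value] of Object.entries(sample)) {
    if (name === "_id") continue; // already declared above as a hidden ObjectId
    fields.push({ name, type: inferType(value), hidden: false });
  }
  return { table, fields };
}

// Sample shaped like the question's documents; the values are made up.
const sample = {
  _id: "placeholder",
  somefield: "abc",
  nesting_2: {
    nesting_2_1: { nested_field_2_1_1: 1, nested_field_2_1_2: "x" },
    nesting_2_2: { nested_field_2_2_1: 2.5, nested_field_2_2_2: true },
  },
};

console.log(JSON.stringify(tableDefinition("your_collection", sample), null, 2));
```

In mongosh, the resulting object could be passed to db.presto_schema.insertOne(...) in place of the hand-written one. Inferring from a single sample only works if that document contains every field, so the output is worth reviewing before use.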

In your Presto mongodb.properties file, add the property like this:

    mongodb.schema-collection=presto_schema

From then on, Presto will use "presto_schema" instead of the default "_schema" collection to look up table definitions.

Comments
  • Even if the object is nested deeply, it should be mapped to the nested ROW type. Could you try to set nested ROW (e.g. row(b row(c varchar))) to type field?
  • I'm not sure how to do that, since no level of the nested field appears in the presto table. Do you mean setting up a schema manually in the mongodb.properties file?
  • If your mongodb.properties doesn't have mongodb.schema-collection, a _schema collection will be created to hold metadata. We can edit the document directly.
  • I don't see a _schema collection, when you say "will be created" do you mean by presto or does that require action?
  • Do you have an example of how to describe the schema of a nested field? Is the nested field viewed as a separate table in Presto?
  • Does the presto table name match with mongodb collection name?
  • Never mind, I see it is generated as soon as a SELECT * FROM collection is run. It can be updated to define other columns or change types.
  • There's an example in the linked doc, under the "table definition" header. I'll edit my answer to include it.
  • @rvazquezglez I added some details. I think the _schema collection is created by presto itself, but I'm not entirely sure -- I moved companies, so I'm not working with presto anymore, and I don't have it installed in my home PC for testing anymore. If you are facing this problem and want to submit a more detailed answer, I'll be happy to accept yours. I just answered and accepted because I was getting notifications about traffic in this question and didn't want to leave it unanswered.
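The comments above note that the _schema document Presto generates can be edited directly to add columns or change types. Below is a minimal sketch of that edit as a pure JavaScript function, assuming the document has the same { table, fields } shape as the presto_schema example. upsertField is a hypothetical helper, and the nesting_2 type string is illustrative:

```javascript
// Hypothetical helper: add a column to a Presto table-definition document,
// or replace an existing column with the same name. Returns a new document;
// the input is not mutated.
function upsertField(definition, name, type, hidden = false) {
  const entry = { name, type, hidden };
  const index = definition.fields.findIndex((f) => f.name === name);
  const fields =
    index === -1
      ? [...definition.fields, entry] // new column goes at the end
      : definition.fields.map((f, i) => (i === index ? entry : f)); // keep position
  return { ...definition, fields };
}

// Illustrative generated definition that is missing nesting_2.
const generated = {
  table: "your_collection",
  fields: [
    { name: "_id", type: "ObjectId", hidden: true },
    { name: "somefield", type: "varchar", hidden: false },
  ],
};

const updated = upsertField(
  generated,
  "nesting_2",
  "row(nesting_2_1 row(nested_field_2_1_1 varchar))"
);
console.log(JSON.stringify(updated, null, 2));
```

In mongosh, the updated document could be written back with db._schema.replaceOne({ table: "your_collection" }, updated). Replacing an existing entry in place keeps the table's column order unchanged.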