Is it possible to implement ArangoDB sharding by database (rather than collection or shardKey)?


I have a large ArangoDB instance with lots of databases, one for each project. Each project's database has a bunch of collections and a lot of data. The databases look something like this:

project1
project2
project3
...
project500

I'd like to distribute query load by sharding the instance so that each project database runs on a separate server, or by spinning up multiple large hosts and having ArangoDB set things up automatically. However, it seems that ArangoDB sharding only works at the collection level (for instance, by record _key within a collection).

Is there any way to set up sharding by database? If not, are there any best practices for running and orchestrating multiple ArangoDB instances?


No. Sharding is implemented solely to distribute the documents of a collection across multiple database servers. It is a means of balancing both memory usage and load across an ArangoDB cluster.
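To illustrate what collection-level sharding looks like in practice, here is a minimal sketch that builds the JSON body a cluster coordinator accepts on `POST /_api/collection`. The attribute names (`numberOfShards`, `shardKeys`, `replicationFactor`) are ArangoDB's; the helper name `collection_payload`, the collection name `events` and the default values are assumptions chosen for illustration only.

```python
import json


def collection_payload(name, shards=3, shard_keys=("_key",), replication_factor=2):
    """Build the request body for creating a sharded collection.

    Hypothetical helper: it only assembles the JSON, it does not talk to a server.
    """
    return {
        "name": name,
        "numberOfShards": shards,          # how many shards the collection is split into
        "shardKeys": list(shard_keys),     # document attributes used to pick a shard
        "replicationFactor": replication_factor,  # copies kept of each shard
    }


if __name__ == "__main__":
    # "events" is a made-up collection name.
    print(json.dumps(collection_payload("events"), indent=2))
```

Note that every setting here applies to a single collection. There is no equivalent option that pins a whole database to one server.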


ArangoDB can also be deployed on Kubernetes instead of Docker Swarm (probably the better option). You could even run multiple standalone server instances if you really wanted to. Whichever technology you choose, I think what the other answers are getting at is that if you have multiple independent databases, you can run multiple instances of ArangoDB (or of any other database, for that matter). The only time you would want to keep multiple databases in one instance is when they are small enough that they will not compete for the server's resources.
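If you go the multiple-instance route, the application needs some way to know which instance holds which project database. One possible sketch, assuming three standalone instances at made-up endpoints, is a stable hash of the database name. The endpoint names and both function names are hypothetical.

```python
import hashlib

# Hypothetical endpoints of three independent ArangoDB instances.
ENDPOINTS = [
    "tcp://arango-a.example:8529",
    "tcp://arango-b.example:8529",
    "tcp://arango-c.example:8529",
]


def assign_instance(database, endpoints):
    """Pick an endpoint for a database name using a stable hash."""
    digest = hashlib.sha256(database.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(endpoints)
    return endpoints[index]


def build_placement(databases, endpoints):
    """Return a dict mapping every database name to its endpoint."""
    return {name: assign_instance(name, endpoints) for name in databases}


if __name__ == "__main__":
    projects = [f"project{i}" for i in range(1, 501)]
    placement = build_placement(projects, ENDPOINTS)
    print(placement["project1"])
```

Hashing with a modulo reassigns most databases whenever the number of endpoints changes, so in practice you may prefer to compute the placement once and store it as an explicit lookup table that you edit by hand when a project moves.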

Dividing your current instance should be fairly straightforward, since you can back up, restore and manipulate the different databases independently. Sharding and related concepts such as partitioning are meant for cases where all the data has to stay within a single database. There, you need a way to spread the data across multiple servers while still treating it as a single unit. That does not appear to be the case here.
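As a rough sketch of that per-database backup and restore, the snippet below builds `arangodump` and `arangorestore` command lines for moving one project database to another instance. The endpoints, the dump directory and the helper names are assumptions, and authentication options are left out. The commands are only assembled and printed, not executed.

```python
import shlex


def dump_command(database, source_endpoint, out_dir):
    """Command line that dumps a single database from the source instance."""
    return [
        "arangodump",
        "--server.endpoint", source_endpoint,
        "--server.database", database,
        "--output-directory", f"{out_dir}/{database}",
    ]


def restore_command(database, target_endpoint, out_dir):
    """Command line that restores the dump into the target instance."""
    return [
        "arangorestore",
        "--server.endpoint", target_endpoint,
        "--server.database", database,
        "--create-database", "true",  # create the database on the target if missing
        "--input-directory", f"{out_dir}/{database}",
    ]


if __name__ == "__main__":
    # Hypothetical endpoints and dump location.
    src, dst, dumps = "tcp://old-host:8529", "tcp://new-host:8529", "/var/tmp/dumps"
    for cmd in (dump_command("project1", src, dumps),
                restore_command("project1", dst, dumps)):
        print(shlex.join(cmd))
```

To actually run the commands you could pass each list to `subprocess.run(cmd, check=True)`, one database at a time, and verify the restored data before dropping the database from the old instance.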

If you want to find out more about how to use ArangoDB with Kubernetes, the official ArangoDB documentation covers Kubernetes deployments.


Comments
  • I appreciate the idea, but my experience with Docker Swarm has been less than positive.
  • No worries then. However, could you please share more about the problems you faced with Docker Swarm? We've been using it since it came out and are building a platform on top of it, so if there's a dealbreaker that we haven't hit yet, your insight could be a lifesaver. Thank you.
  • We've been using it for our production systems, and we've been seeing odd, unexpected behavior: networks randomly dropping out, containers "leaking" and slowly filling up the hard drive, DNS not working, etc. Most of these could be fixed with a bit of attention, but it's annoying to have to maintain so much. The networking in general can also be a bit arcane, but maybe I'm just new to it.
  • I agree that there are some annoyances with setting everything up and that it's not maintenance-free, but learning, continuous improvement, and maintenance are required by anything you operate. What helped us keep things simple, isolated and maintainable was a dedicated swarm per tenant/project. Everything that needs to communicate is on a shared overlay network and exposed only through an nginx proxy. All configs and secrets live on the swarm. All services are either stateless or run in failover mode, preferably cluster mode.
  • That makes sense. Are there any strategies that you think would work for this situation?