neo4j: Is there a way/how to select random nodes?

I would like to retrieve a specific number of random nodes. The graph consists of 3 000 000 nodes, some of which are sources, some targets, and some both.

The aim is to retrieve random sources. As I don't know how to select nodes at random, the program generates k random numbers from 1 to 3 000 000, treats them as node IDs, and then discards every selected node that is not a source. As this procedure is time-consuming, I wonder whether it is possible to select random sources directly with a Cypher query.

To select all sources, the query would be the following:

START t=node(*) MATCH (a)-[:LEADS_TO]->(t) RETURN a

Does anyone know how to select a limited number of random nodes directly in Cypher or, if that is not possible, can anyone suggest a workaround?

You can limit your query with SKIP/LIMIT, so you could do:

START t=node(*) 
MATCH (a)-[:LEADS_TO]->(t) 
RETURN a
SKIP {randomoffset} LIMIT {randomcount} 
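
As a minimal sketch of how a client might choose the two parameters, assuming the total number of result rows is already known (for example from a separate count query): the function name `random_window` and the use of Python are illustrative assumptions, not part of the original answer.

```python
import random

def random_window(total_rows, count, rng=random):
    """Pick SKIP/LIMIT values so the window stays inside the result set."""
    if count > total_rows:
        raise ValueError("count cannot exceed total_rows")
    # Largest offset that still leaves `count` rows after it.
    offset = rng.randint(0, total_rows - count)
    return {"randomoffset": offset, "randomcount": count}

# Hypothetical numbers: 3 000 000 rows, a window of 100.
params = random_window(total_rows=3_000_000, count=100, rng=random.Random(42))
```

Keep in mind that SKIP/LIMIT returns a contiguous slice of the result order starting at a random position, not independently chosen rows.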

Otherwise, you can also create a set of random node IDs and pass them as a parameter to the Cypher statement.
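
A rough sketch of the random-ID approach, assuming node IDs fall in the range 1 to 3 000 000 as in the question. The query text is only illustrative and is not executed here; it uses the `$ids` parameter syntax of newer Neo4j versions (older versions wrote parameters as `{ids}`), and the helper name `random_node_ids` is hypothetical.

```python
import random

# Illustrative query text (not executed here). The MATCH pattern keeps only
# nodes that have an outgoing LEADS_TO relationship, i.e. sources.
QUERY = (
    "MATCH (a)-[:LEADS_TO]->() "
    "WHERE id(a) IN $ids "
    "RETURN DISTINCT a"
)

def random_node_ids(k, max_id, rng=random):
    """Return k distinct candidate node IDs drawn from 1..max_id."""
    return rng.sample(range(1, max_id + 1), k)

params = {"ids": random_node_ids(k=1000, max_id=3_000_000, rng=random.Random(7))}
```

Because candidates that are not sources are filtered out by the pattern, the query can return fewer than k nodes; the filtering simply moves from the client to the server.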

For selecting random nodes, I usually use rand() with a WHERE clause as a probability predicate:

MATCH (n) WITH n WHERE rand() < 0.3 // 30% chance of inclusion
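
To illustrate what a rand() probability predicate does, here is a small simulation in plain Python. It is a sketch, not Neo4j code: the list `nodes` stands in for the matched nodes, and `bernoulli_sample` is a hypothetical helper name.

```python
import random

def bernoulli_sample(items, p, rng=random):
    """Keep each item independently with probability p (like WHERE rand() < p)."""
    return [x for x in items if rng.random() < p]

nodes = list(range(10_000))  # stand-in for matched nodes
picked = bernoulli_sample(nodes, 0.3, rng=random.Random(0))
```

The size of the result is not fixed: it varies around p times the number of candidates, so this suits downsampling better than fetching an exact count.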

You can use a construction like this:

MATCH (a)-[:LEADS_TO]->(t) 
RETURN a, rand() as r
ORDER BY r

It should return the matched nodes in random order; add a LIMIT clause to keep only as many as you need.

Tested with Neo4j 2.1.3
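
The idea behind ordering by rand() can be sketched outside the database as follows. This is an illustrative Python simulation under the assumption that every row gets an independent random key; `sample_by_random_key` is a hypothetical name.

```python
import random

def sample_by_random_key(items, k, rng=random):
    """Tag each item with a random key, sort by it, keep the first k."""
    keyed = [(rng.random(), i, x) for i, x in enumerate(items)]
    keyed.sort()  # ties on the key fall back to the index, never to the item
    return [x for _, _, x in keyed[:k]]

sources = ["a", "b", "c", "d", "e", "f"]
print(sample_by_random_key(sources, 3, rng=random.Random(1)))
```

Every item is equally likely to land in the first k positions, so the result is a uniform sample without replacement. Written this way, the cost is a full sort over all candidate rows.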

A plain LIMIT 1 always returns the same node. You can also fetch the entire result set and pick a random value on the application side, but with a lot of results that is inefficient over the wire, and it requires you to run any follow-up queries in a separate request.

Another way, in addition to the ones suggested here, for the case where you want random start nodes together with all their connections, is:

MATCH (a)-[:LEADS_TO]->()
WITH a,rand() AS rand
ORDER BY rand LIMIT {YourLimit}
MATCH (a)-[l:LEADS_TO]->(t)
RETURN a,l,t

MATCH (n:Label)
WITH n, rand() AS r
ORDER BY r
RETURN n LIMIT <no. of random nodes>

Comments
  • How many random nodes do you have to collect?
  • Thanks Michael! I have already created a random set of nodes but not all of randomly generated ids correspond to source nodes - some are just end-nodes. I'll apply your suggestion.
  • in this case the offset is random but would the set be contiguous in some way? i.e. if the randomcount was 100 would the 100 records be returned according to node ids or is it a random sorting with every call?
  • as far as I understand, it would. WDYT, Michael?
  • @MonkeyBonkey I just did a small test-set and yes, the results would be contiguous. For example if SKIP 10 LIMIT 3 gives you [10, 11, 12], then SKIP 10 LIMIT 2 will always give you [10, 11]
  • Nice, though for 3,000,000 nodes it might be slow as I think neo4j would load all of the nodes into memory to do the sort
  • Brilliant! This is more compact: MATCH (a)... RETURN a ORDER BY rand()
  • @BrianUnderwood, as of 2019 it seems that the optimiser somehow knows it doesn't need to pre-load all nodes, it takes a few ms on my laptop. I've seen similar syntaxes in SPARQL and SQL, working efficiently too.