There are two questions here. First, how do we read an entire graph from a Neo4j store into a NetworkX graph? Second, how do we extract subgraphs with Cypher and recreate them in NetworkX, potentially saving memory?

## Using a naive query to read all relationships

This approach is based on the cypher-ipython module. It uses a simple query like the following to obtain all the data:

``````
MATCH (n) OPTIONAL MATCH (n)-[r]->() RETURN n, r
``````

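This can be read into a graph using the following code. Note that the rows may repeat both relationships and nodes, but this is harmless: nodes are keyed by their Neo4j IDs, and edges by their start node, end node and relationship type, so re-adding one updates the existing entry rather than creating a duplicate.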

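``````
import networkx


def rs2graph(rs):
    graph = networkx.MultiDiGraph()

    for record in rs:
        node = record['n']
        if node:
            nx_properties = {}
            nx_properties.update(node.properties)
            nx_properties['labels'] = list(node.labels)
            # Nodes are keyed by Neo4j ID, so a node seen in an earlier row
            # is updated rather than duplicated.
            graph.add_node(node.id, **nx_properties)

        relationship = record['r']
        if relationship is not None:   # essential because relationships use hash val
            # Re-adding the same (start, end, key) updates the existing edge.
            graph.add_edge(
                relationship.start, relationship.end, key=relationship.type,
                **relationship.properties
            )

    return graph
``````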
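To see why the duplicated rows are harmless, here is a minimal sketch using NetworkX alone. The node IDs, the `PRECEDES` key and the attribute values are made up for illustration, and no Neo4j connection is involved.

```python
import networkx

graph = networkx.MultiDiGraph()

# Simulate the same nodes and the same relationship arriving in two rows.
for _ in range(2):
    graph.add_node(1, content='a', labels=['Token'])
    graph.add_node(2, content='b', labels=['Token'])
    graph.add_edge(1, 2, key='PRECEDES', weight=1)

node_count = graph.number_of_nodes()
edge_count = graph.number_of_edges()
print(node_count, edge_count)  # 2 1
```

One consequence of keying edges by type is worth keeping in mind: two distinct relationships of the same type between the same pair of nodes would also collapse into a single edge.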

This query is rather inelegant, though, because the result set is essentially 'denormalized': each node is repeated once for every outgoing relationship it has.

## Using aggregation functions

Luckily there's another, more SQL-ish way to do it: `COLLECT` the relationships of each node into a list. Each row then holds one distinct node together with the complete list of its relationships, much like combining `ARRAY_AGG()` with `GROUP BY` in PostgreSQL. This seems much cleaner to me.
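As a rough sketch of what the aggregated form might look like: the query below is an assumption, with the aliases `n2` and `rels` chosen to match what the reader function expects, and the grouping loop only emulates on plain tuples what `COLLECT` does on the server side. The sample rows are made up.

```python
# Hypothetical aggregated query; n2 and rels are the column names that
# the reader function looks up in each record.
AGGREGATED_QUERY = """
MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN n AS n2, COLLECT(r) AS rels
"""

# Made-up denormalized rows of (node_id, relationship), in the shape the
# naive query returns. None marks a node with no outgoing relationship.
naive_rows = [
    (1, 'r10'),
    (1, 'r11'),
    (2, 'r12'),
    (3, None),
]


def collect_by_node(rows):
    """Group relationships per node, skipping nulls as COLLECT does."""
    grouped = {}
    for node_id, rel in rows:
        rels = grouped.setdefault(node_id, [])
        if rel is not None:
            rels.append(rel)
    return list(grouped.items())


aggregated_rows = collect_by_node(naive_rows)
print(aggregated_rows)  # [(1, ['r10', 'r11']), (2, ['r12']), (3, [])]
```

Each node now appears exactly once, and a node without relationships gets an empty list rather than a row containing a null.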

``````
# This version expects a collection of rels in the variable 'rels'.
# However, it doesn't handle dangling references.
def rs2graph_v2(rs):
    graph = networkx.MultiDiGraph()

    for record in rs:
        node = record['n2']
        if not node:
            raise Exception('every row should have a node')

        nx_properties = {}
        nx_properties.update(node.properties)
        nx_properties['labels'] = list(node.labels)
        graph.add_node(node.id, **nx_properties)

        relationship_list = record['rels']

        for relationship in relationship_list:
            graph.add_edge(
                relationship.start, relationship.end, key=relationship.type,
                **relationship.properties
            )

    return graph
``````
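The "dangling references" caveat can be reproduced with NetworkX alone. As a minimal sketch with made-up IDs: `add_edge` implicitly creates any endpoint that is not yet in the graph, so a relationship pointing outside the returned node set leaves behind a node with no properties or labels.

```python
import networkx

graph = networkx.MultiDiGraph()

# Only node 1 was returned by the (hypothetical) query...
graph.add_node(1, content='a', labels=['Token'])

# ...but one of its relationships points at node 99, which was not.
graph.add_edge(1, 99, key='PRECEDES')

# NetworkX created node 99 on the fly, with an empty attribute dict.
dangling = [n for n, attrs in graph.nodes(data=True) if not attrs]
print(dangling)  # [99]
```

This does not matter when the whole graph is read, since every endpoint eventually shows up as a row of its own, but it becomes a problem as soon as only part of the graph is selected.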

## Trying to extend to handle subgraphs

When a relationship type defines subtrees (here, `:PRECEDES`), we can try to materialize in memory just the subgraph reachable from a given root. In the query below, the `Token` node with content `nonesuch` is taken as the root.

The subgraph can be selected with a Cypher query like the following:

``````
MATCH (a:Token {content: "nonesuch"})-[:PRECEDES*]->(t:Token)
WITH COLLECT(a) + COLLECT(DISTINCT t) AS nodes_
UNWIND nodes_ AS n
OPTIONAL MATCH p = (n)-[r]-()
WITH n AS n2, COLLECT(DISTINCT RELATIONSHIPS(p)) AS nestedrel
RETURN n2, REDUCE(output = [], rel in nestedrel | output + rel) AS rels
``````

The Python code to read the result of this query is as follows:

``````
# This version has to materialize the entire node set up front in order
# to check for dangling references.  This may cause memory problems with
# large result sets.
def rs2graph_v3(rs):
    graph = networkx.MultiDiGraph()

    materialized_result_set = list(rs)
    node_id_set = {
        record['n2'].id for record in materialized_result_set
    }

    for record in materialized_result_set:
        node = record['n2']
        if not node:
            raise Exception('every row should have a node')

        nx_properties = {}
        nx_properties.update(node.properties)
        nx_properties['labels'] = list(node.labels)
        graph.add_node(node.id, **nx_properties)

        relationship_list = record['rels']

        for relationship in relationship_list:

            # Bear in mind that when we ask for all relationships on a node,
            # we may find a node that PRECEDES the current node -- i.e. a node
            # whose relationship starts outside the current subgraph returned
            # by this query.
            if relationship.start in node_id_set: