Issue on importing of data #1

Open
andyhegedus opened this issue Sep 1, 2020 · 9 comments

Comments

@andyhegedus

Hi,
Very interested in this data set. My first attempt was to use the provided Cypher import file, but the process never completed.
My second attempt was to copy and paste snippets from the script one at a time. That generally worked until this point (all other steps have worked and completed in short order):

// Add Alternative titles for Occupations and Workrole
:auto USING PERIODIC COMMIT
LOAD CSV WITH HEADERS
FROM 'file:///AlternateTitles.txt' AS line FIELDTERMINATOR '\t'
MATCH (a:Occupation {onet_soc_code: line.`O*NET-SOC Code`})
MERGE (t:AlternateTitles {title: line.`Alternate Title`,
shorttitle: line.`Short Title`, source: line.`Source(s)`})
WITH a, t, line
CALL apoc.create.relationship(a, 'Equivalent_To', {}, t) YIELD rel
RETURN count(rel)

This step never completes, and after repeated tries I get a memory error. Any suggestions for fixes or workarounds?
Andy

@davidmeza1
Owner

First, at this point, any code that starts with ":auto" has to be run by itself. I am currently working on automating the script.
Some sections will require up to an hour to run, depending on your resources.
You should increase the heap size in the config file to a minimum of 1G and a maximum of 3G. If you have sufficient resources, you can set it higher.
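
For reference, the heap settings I mean live in neo4j.conf and look roughly like this (a sketch; the exact values are up to your machine, the 1G/3G figures just match the guidance above):

# Heap settings in neo4j.conf (example values)
dbms.memory.heap.initial_size=1G
dbms.memory.heap.max_size=3G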

@andyhegedus
Author

Hi,
I have sufficient available RAM (40G) on this machine, so I set the heap to 5G. It still does not seem to complete. Somehow I just find it very odd that this particular block is giving issues. All previous blocks executed very quickly and without complaint, especially the block just before it, which is very similar in its commands and imports abilities.txt, which is 4X the size. That completed in <5 sec.
Andy

@davidmeza1
Owner

That is odd. Others have used this as recently as yesterday with no issues. I will review this evening, if I can.

@andyhegedus
Author

One thing I did notice was that there was a node with the label Alternate_Titles (note the underscore) that had been created earlier, while this block was trying to merge to AlternateTitles (no underscore), though I could not figure out where it was created.


@andyhegedus
Author

Hi,

I just ran the next block of your script and it ran fine:
Added 2292 labels, created 2292 nodes, set 6876 properties, started streaming 1 records after 2 ms and completed after 4605 ms. Note: it has created nodes with the label AlternateTitles, which are distinct from the earlier-created Alternate_Titles nodes.

LOAD CSV WITH HEADERS
FROM 'file:///AlternateTitles.txt' AS line FIELDTERMINATOR '\t'
MATCH (a:Workrole {onet_soc_code: line.`O*NET-SOC Code`})
MERGE (t:AlternateTitles {title: line.`Alternate Title`,
shorttitle: line.`Short Title`, source: line.`Source(s)`})
WITH a, t, line
CALL apoc.create.relationship(a, 'Equivalent_To', {}, t) YIELD rel
RETURN count(rel)

I then tried running this block, without success.

LOAD CSV WITH HEADERS
FROM 'file:///AlternateTitles.txt' AS line FIELDTERMINATOR '\t'
MATCH (a:Occupation {onet_soc_code: line.`O*NET-SOC Code`})
MERGE (t:AlternateTitles {title: line.`Alternate Title`,
shorttitle: line.`Short Title`, source: line.`Source(s)`})
WITH a, t, line
CALL apoc.create.relationship(a, 'Equivalent_To', {}, t) YIELD rel
RETURN count(rel)
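
A read-only check I may try next, just to confirm the MATCH side is not the problem (a sketch reusing the same file, tab delimiter, and backticked header; it only counts rows, no writes):

LOAD CSV WITH HEADERS
FROM 'file:///AlternateTitles.txt' AS line FIELDTERMINATOR '\t'
MATCH (a:Occupation {onet_soc_code: line.`O*NET-SOC Code`})
RETURN count(*)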

andyhegedus changed the title from "Issue on importing off data" to "Issue on importing of data" on Sep 2, 2020
@davidmeza1
Owner

I ran all the queries last night with no issues. The name difference between Alternate_Titles and AlternateTitles should not cause an issue: the first is an element in the taxonomy; the second is an actual alternate title for an occupation or workrole. I'll keep looking.
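
If you want to double-check what each label holds, a couple of quick read-only queries (a sketch; run them separately in the Browser):

// List the labels currently in the database
CALL db.labels();

// Count the nodes under each of the two similarly named labels
MATCH (n:Alternate_Titles) RETURN count(n);
MATCH (n:AlternateTitles) RETURN count(n);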

@andyhegedus
Author

Hi David,

I deleted the database and tried a second time with the same result. That step does not complete. I do notice in Activity Monitor that Java swells to 1.5G of memory while it is processing.

One thing I am curious about is the constraints. Your Cypher file lists only these three. Did you create other constraints, by chance?

// TODO need to add constraints, this is example only
CREATE CONSTRAINT ON (occupation:Occupation) ASSERT occupation.onet_soc_code IS UNIQUE;
CREATE CONSTRAINT ON (element:Element) ASSERT element.ElementID IS UNIQUE;
CREATE CONSTRAINT ON (occupation:MajorGroup) ASSERT occupation.onet_soc_code IS UNIQUE;
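
One thing I was wondering is whether an index on the MERGE key would help this step, something along these lines (just a sketch, assuming Neo4j 4.x syntax; the index name is my own):

// Speculative index to speed up the repeated MERGE on AlternateTitles
CREATE INDEX alternate_title_idx FOR (t:AlternateTitles) ON (t.title);

The MERGE matches on three properties, but even an index on title alone should avoid a full label scan on every row.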

@davidmeza1
Owner

I have not been able to recreate your error. I have run the scripts a couple of times, and while it does take some time, it does not fail.
The increase in memory makes sense, as this particular query creates many relationships. I am fine-tuning the model I use, which should, in theory, reduce the number of relationships.
That is all the constraints I have at this time.
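
If you want to see the scale on your side once the import finishes, a quick count (a sketch):

// Count the Equivalent_To relationships created by the alternate-title queries
MATCH ()-[r:Equivalent_To]->() RETURN count(r);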

@andyhegedus
Author

I tried again after doing the update to 4.1.1 that was pushed out today. The problematic step was able to load, albeit that single step took 20+ minutes to complete. The next block (which I actually ran first) took 1 minute to complete.
Also, as a side note, the last part of your script has imports that I think are specific to your organization, and the associated data files are not in the download (which is the correct thing). You may want to delete those steps from the import script.
