File reader #209

Closed
julianhyde opened this issue Dec 9, 2023 · 1 comment

julianhyde commented Dec 9, 2023

This facility adds a type-safe way to browse directories and sub-directories, and to read files as lists of records.

Suppose I am in a directory that has a sub-directory data, which has a sub-directory scott, which has files bonus.csv, dept.csv, emp.csv.gz, salgrade.csv:

$ ls -lR data
data:
total 4
drwxrwxr-x 2 jhyde jhyde 4096 Dec  9 13:04 scott

data/scott:
total 20
-rw-rw-r-- 1 jhyde jhyde  50 Dec  9 13:02 bonus.csv
-rw-rw-r-- 1 jhyde jhyde 130 Dec  9 13:00 dept.csv
-rw-rw-r-- 1 jhyde jhyde 420 Dec  9 13:00 emp.csv.gz
-rw-rw-r-- 1 jhyde jhyde 127 Dec  9 13:03 salgrade.csv

I can access these from Morel using the file object. For example, here are the contents of the file data/scott/dept.csv:

./morel
$ file.data.scott.dept;
val it =
  [{deptno=10,dname="ACCOUNTING",loc="NEW YORK"},
   {deptno=20,dname="RESEARCH",loc="DALLAS"},
   {deptno=30,dname="SALES",loc="CHICAGO"},
   {deptno=40,dname="OPERATIONS",loc="BOSTON"}]
  : {deptno:int, dname:string, loc:string} list

Each file is a list of records (obtained by parsing the CSV format); each directory is a record, and its fields are its constituent files and sub-directories. Here is the directory data/scott:

$ file.data.scott;
val it =
  {bonus=<relation>,dept=<relation>,emp=<relation>,salgrade=<relation>}
  : {bonus:{comm:real, ename:string, job:string, sal:real} list,
    dept:{deptno:int, dname:string, loc:string} list,
    emp:{comm:real, deptno:int, empno:int, ename:string, hiredate:string,
      job:string, mgr:int, sal:real} list,
    salgrade:{grade:int, hisal:real, losal:real} list}
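
The mapping described above (a file becomes a list of records, a directory becomes a record of its files and sub-directories) can be sketched in Python. This is only an illustration, not Morel's implementation. The helper names field_name, read_records and read_directory are hypothetical, and the sketch assumes that every file is a CSV file with a header row of plain column names. It also leaves every value as a string, whereas Morel assigns types such as int and real.

```python
import csv
import gzip
import os

# Assumption for illustration: only these extensions are stripped from file names.
EXTENSIONS = (".csv.gz", ".csv")


def field_name(filename):
    """Strip a known extension, so that emp.csv.gz becomes the field 'emp'."""
    for ext in EXTENSIONS:
        if filename.endswith(ext):
            return filename[: -len(ext)]
    return filename


def read_records(path):
    """Read a CSV file (gzipped or not) as a list of dicts; values stay strings."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as f:
        return list(csv.DictReader(f))


def read_directory(path):
    """Build a dict for a directory: sub-directories become nested dicts,
    files become lists of records."""
    record = {}
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            record[entry] = read_directory(full)
        else:
            record[field_name(entry)] = read_records(full)
    return record
```

Note that read_directory is eager: it walks every sub-directory and parses every file before returning anything.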

In addition, a directory has special fields .., ~, and /, which take you to the parent directory, the user's home directory, and the root directory, respectively. For example, file.data.`..`.data.scott is equivalent to file.data.scott.
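
As a rough model of how the special fields behave, the following Python sketch resolves a sequence of field names to a path. The functions step and navigate are hypothetical names introduced for illustration, and the sketch ignores symbolic links, which Morel may treat differently.

```python
import os
from functools import reduce


def step(directory, field):
    """Resolve one field access starting from the given directory path."""
    if field == "..":
        # Parent directory (purely lexical; symbolic links are not followed).
        return os.path.normpath(os.path.join(directory, os.pardir))
    if field == "~":
        # The user's home directory.
        return os.path.expanduser("~")
    if field == "/":
        # The root directory.
        return os.path.abspath(os.sep)
    # Any other field names a file or sub-directory.
    return os.path.join(directory, field)


def navigate(start, fields):
    """Apply a sequence of field accesses, left to right."""
    return reduce(step, fields, start)
```

With this model, navigate(cwd, ["data", "..", "data", "scott"]) and navigate(cwd, ["data", "scott"]) yield the same path.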

The file value is the starting point for all navigation. It represents the current working directory.

Because Morel is strongly typed, there is a problem that is most noticeable when browsing a large file system: we would have to traverse every directory, and parse every file, in order to report the type of the file value. We solve this by introducing a new kind of type, called a partial record. Partial records work as follows.

When you first ask for the type of file, it reports a partial record:

$ file;
val it = {...}: {...}

Fields of a partial record are progressively discovered, on demand. When you have browsed into the data sub-directory, the type system has learned of a new field:

$ file.data;
val it = {...}: {...}
$ file;
val it =
  {data={...}, ...}
  : {data: {...}, ...}

When we have asked for the type of dept, we know yet more about file and file.data:

$ file.data.scott.dept;
val it =
  [{deptno=10,dname="ACCOUNTING",loc="NEW YORK"},
   {deptno=20,dname="RESEARCH",loc="DALLAS"},
   {deptno=30,dname="SALES",loc="CHICAGO"},
   {deptno=40,dname="OPERATIONS",loc="BOSTON"}]
  : {deptno:int, dname:string, loc:string} list
$ file;
val it =
  {data={scott={dept=<relation>, ...}, ...}, ...}
  : {data: {scott: {dept: {deptno:int, dname:string, loc:string} list, ...}, ...}, ...}

The knowledge of a type increases over time, as fields are discovered, but never decreases. The type system never forgets a field it has seen once.
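
The on-demand, never-forgetting behavior described above can be modeled with a short Python sketch. It is a conceptual illustration under stated assumptions, not Morel's implementation: the class PartialDir and its methods field and show are hypothetical, files are represented only by the placeholder <relation>, and a field is assumed to match either a sub-directory of the same name or a file with a .csv or .csv.gz extension.

```python
import os


class PartialDir:
    """A directory whose fields are discovered on demand and never forgotten."""

    def __init__(self, path):
        self.path = path
        self.known = {}  # field name -> PartialDir, or "<relation>" for a file

    def field(self, name):
        """Return a field, touching the file system only on first access."""
        if name not in self.known:
            self.known[name] = self._discover(name)
        return self.known[name]

    def _discover(self, name):
        child = os.path.join(self.path, name)
        if os.path.isdir(child):
            return PartialDir(child)
        for ext in (".csv", ".csv.gz"):
            if os.path.isfile(child + ext):
                return "<relation>"
        raise KeyError(name)

    def show(self):
        """Render the fields known so far; '...' stands for undiscovered ones."""
        parts = []
        for name in sorted(self.known):
            value = self.known[name]
            text = value.show() if isinstance(value, PartialDir) else value
            parts.append(f"{name}={text}")
        return "{" + ", ".join(parts + ["..."]) + "}"
```

Because known only ever gains entries, what show reports can grow but never shrink, which mirrors the monotonic growth of a partial record's type.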

julianhyde commented

I posted a demo: https://www.youtube.com/watch?v=uybUjCYsBKI
