Draft
104 commits
f9c8ec4
finish the frontend refinement
bobbai00 Apr 28, 2025
1109d72
add computing unit id to the execution request
bobbai00 Apr 29, 2025
7a5a297
incorporate with cuid changes
bobbai00 May 7, 2025
b54e88e
finish the first version
bobbai00 May 9, 2025
c261665
add cuid polling
bobbai00 May 11, 2025
226ae19
recover the image names
bobbai00 May 12, 2025
1e6a605
recover the image names
bobbai00 May 12, 2025
d02f273
add the header
bobbai00 May 12, 2025
de42fc8
recover some unexpected changes
bobbai00 May 13, 2025
c1b81c0
add initial version of gui
bobbai00 Apr 14, 2025
390530b
add initial suggestion service
bobbai00 Apr 14, 2025
695ec24
add tuple and frontend service calling
bobbai00 Apr 14, 2025
70805ff
2nd version
bobbai00 Apr 16, 2025
13e5858
remove redundant providers code
bobbai00 Apr 16, 2025
53431b4
add test data
bobbai00 Apr 17, 2025
2a96b79
fix Texera workflow
bobbai00 Apr 17, 2025
bf7482a
keep improving
bobbai00 Apr 18, 2025
6303b8e
add initial
bobbai00 Apr 18, 2025
8b81eef
refactor the openai agent
bobbai00 Apr 21, 2025
6b4846a
add knowledge TODO
bobbai00 Apr 21, 2025
8c8e436
add config and more knowledge base
bobbai00 Apr 24, 2025
ae48e6b
finish end to end demo, TODO: fix path to add the operator ID
bobbai00 Apr 25, 2025
bf5e752
use pydantic to refactor
bobbai00 Apr 26, 2025
6343db3
reduce the trigger frequency
bobbai00 Apr 26, 2025
d862342
add sanitize request
bobbai00 Apr 27, 2025
a418e7e
finish the sanitization lifecycle
bobbai00 Apr 29, 2025
c88a175
refactor the frontend to align with latest backend
bobbai00 Apr 30, 2025
0e428b5
fix agent
bobbai00 Apr 30, 2025
8f5d7fe
fix frontend
bobbai00 Apr 30, 2025
8fb5630
fix request handling
bobbai00 Apr 30, 2025
823b132
fix request sending
bobbai00 Apr 30, 2025
8bc4335
prevent from requesting too much
bobbai00 Apr 30, 2025
95c7cae
refactor the openai agent to use function call
bobbai00 May 6, 2025
6431414
refactor with more capability
bobbai00 May 7, 2025
46e5f7f
sync the request format with current backend
bobbai00 May 15, 2025
7bb23c9
add the text area for operator property panel
bobbai00 May 15, 2025
dbbab8f
add more interaction to buttons
bobbai00 May 15, 2025
8d82684
make the reload work
bobbai00 May 15, 2025
774e176
make the viewpoint fixed
bobbai00 May 15, 2025
e63c6ab
add link id to the interpretation with more rigid sanitization
bobbai00 May 15, 2025
d977e93
fix the preview display
bobbai00 May 16, 2025
e9e50ce
add table profile related proto
bobbai00 May 17, 2025
881f3c7
add related proto definition in java
bobbai00 May 17, 2025
cd97bbd
generate python side definition
bobbai00 May 18, 2025
7204239
adjust the python side definition
bobbai00 May 18, 2025
78c89bb
add the profiler
bobbai00 May 18, 2025
75b6093
modify the profiler definition
bobbai00 May 18, 2025
6266a59
add profiler to the context
bobbai00 May 18, 2025
9e94367
update the profiler definition
bobbai00 May 18, 2025
c5df685
update the table profile definition
bobbai00 May 18, 2025
9dfbcc1
add the table profiler to the main loop
bobbai00 May 18, 2025
d137c76
add table profiler to the java side loop
bobbai00 May 18, 2025
eeec8e2
merge the controller message with table profiler query
bobbai00 May 19, 2025
3bc2757
add table profile as part of execution statistics
bobbai00 May 19, 2025
5e2a185
add table profile to the websocket event
bobbai00 May 19, 2025
b30463b
add python side proto generated file
bobbai00 May 19, 2025
c9447c3
add tableprofile to frontend handler
bobbai00 May 19, 2025
4b6a448
try to fix the stats handler
bobbai00 May 19, 2025
f34cd9c
fix execution service
bobbai00 May 19, 2025
2266e80
fix the control return
bobbai00 May 19, 2025
8ecebe5
fix the proto script
bobbai00 May 19, 2025
d5a3add
fix the equivalence comparison
bobbai00 May 19, 2025
aa6207c
fix the frontend retrieving
bobbai00 May 19, 2025
9fcf1c9
update the python proto
bobbai00 May 19, 2025
47180df
fix py response
bobbai00 May 19, 2025
a7cfaca
fix the profile generation
bobbai00 May 19, 2025
fa2c4f6
add the requirements.txt
bobbai00 May 20, 2025
6bd2e39
add the frontend
bobbai00 May 20, 2025
28a4886
update the yarn lock
bobbai00 May 20, 2025
a107abf
save the dependency
bobbai00 May 20, 2025
340df39
improve the stats display
bobbai00 May 20, 2025
1984ffd
use the output tuple as the profile
bobbai00 May 20, 2025
1b11775
add the python based scan op
bobbai00 May 20, 2025
1f6e116
fix the frontend display issue
bobbai00 May 20, 2025
baf65f7
make the schema work
bobbai00 May 20, 2025
f84b8fa
make the schema work
bobbai00 May 20, 2025
9e4da59
add suggestion service proto definition
bobbai00 May 20, 2025
a8225f1
add py side data cleaning endpoint
bobbai00 May 20, 2025
3b6a970
add frontend suggestion display framework
bobbai00 May 20, 2025
d002cc6
add initial prototype for suggestion generator
bobbai00 May 21, 2025
3aaae8b
finish initial prototype for data cleaning suggestion
bobbai00 May 21, 2025
efb7183
add the frontend proto ts generated file
bobbai00 May 21, 2025
d6c227a
move the column profile to left panel
bobbai00 May 21, 2025
f19a4d7
fix the style and add the detailed area
bobbai00 May 21, 2025
37970b7
add the schema passing
bobbai00 May 21, 2025
209a41a
extract out the suggestion action service
bobbai00 May 21, 2025
02773cc
add the schema passing to the suggestion service
bobbai00 May 21, 2025
e76f797
make the e2e work
bobbai00 May 22, 2025
c992073
have the e2e work
bobbai00 May 22, 2025
c579393
fix op type and operator icon
bobbai00 May 22, 2025
6e1e460
improve the performance of the csv reader
bobbai00 May 22, 2025
7d26d6f
rename operator file names
bobbai00 May 22, 2025
dae6d46
fix date type convertion issue
bobbai00 May 22, 2025
7ce41a1
make the sanitization on port better
bobbai00 May 22, 2025
f750148
temproraily disable the suggestion panel
bobbai00 May 22, 2025
c5ccf01
fix some issues of table profiling and data loading
bobbai00 May 22, 2025
ba264a2
make the copilot on udf work
bobbai00 May 22, 2025
4298f28
fix the left panel display
bobbai00 May 22, 2025
72ec71f
fix the datetime convertion
bobbai00 May 22, 2025
7a98e61
remove the table stats button
bobbai00 May 22, 2025
c014a53
improve the function call document
bobbai00 May 22, 2025
6a42e39
finishing up the demo frontend
bobbai00 May 24, 2025
b243f47
finish the instruction for agent
bobbai00 May 24, 2025
38270da
fix the frontend
bobbai00 Jun 2, 2025
2 changes: 2 additions & 0 deletions core/amber/requirements.txt
@@ -50,3 +50,5 @@ tenacity==8.5.0
SQLAlchemy==2.0.37
pg8000==1.31.2
pympler==1.1
tensorflow==2.19.0
dataprofiler==0.13.3
@@ -57,6 +57,7 @@ message ControlRequest {
EmptyRequest emptyRequest = 56;
PrepareCheckpointRequest prepareCheckpointRequest = 57;
QueryStatisticsRequest queryStatisticsRequest = 58;
QueryTableProfileRequest queryTableProfileRequest = 59;

// request for testing
Ping ping = 100;
@@ -271,4 +272,8 @@ message PrepareCheckpointRequest{

message QueryStatisticsRequest{
  repeated core.ActorVirtualIdentity filterByWorkers = 1;
}

message QueryTableProfileRequest{

}
@@ -19,6 +19,7 @@ syntax = "proto3";
package edu.uci.ics.amber.engine.architecture.rpc;

import "edu/uci/ics/amber/engine/architecture/worker/statistics.proto";
import "edu/uci/ics/amber/engine/architecture/worker/tableprofile.proto";
import "scalapb/scalapb.proto";

option (scalapb.options) = {
@@ -43,6 +44,7 @@ message ControlReturn {
WorkerStateResponse workerStateResponse = 50;
WorkerMetricsResponse workerMetricsResponse = 51;
FinalizeCheckpointResponse finalizeCheckpointResponse = 52;
TableProfileResponse tableProfileResponse = 53;

// common responses
ControlError controlError = 101;
@@ -137,4 +139,8 @@ message WorkerStateResponse {

message WorkerMetricsResponse {
  worker.WorkerMetrics metrics = 1 [(scalapb.field).no_box = true];
}

message TableProfileResponse {
  worker.TableProfile table_profiles = 1 [(scalapb.field).no_box = true];
}
@@ -40,6 +40,7 @@ service WorkerService {
rpc PauseWorker(EmptyRequest) returns (WorkerStateResponse);
rpc PrepareCheckpoint(PrepareCheckpointRequest) returns (EmptyReturn);
rpc QueryStatistics(EmptyRequest) returns (WorkerMetricsResponse);
rpc QueryTableProfile(EmptyRequest) returns (TableProfileResponse);
rpc ResumeWorker(EmptyRequest) returns (WorkerStateResponse);
rpc RetrieveState(EmptyRequest) returns (EmptyReturn);
rpc RetryCurrentTuple(EmptyRequest) returns (EmptyReturn);
@@ -0,0 +1,142 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

syntax = "proto3";

package edu.uci.ics.amber.engine.architecture.worker;

import "scalapb/scalapb.proto";

option (scalapb.options) = {
  scope: FILE,
  preserve_unknown_fields: false,
  no_default_values_in_constructor: true
};

/* ------------------------------------------------------------------ */
/* GENERIC MATRIX (row-major flat array) */
/* ------------------------------------------------------------------ */

message NumericMatrix {
  repeated double values = 1; // length = rows * cols (row-major)
  uint32 rows = 2;
  uint32 cols = 3;
}

/* ------------------------------------------------------------------ */
/* GLOBAL PROFILE (dataset-wide) */
/* ------------------------------------------------------------------ */

message GlobalProfile {

  // ---- basic counts ----
  uint64 samples_used = 1; // how many rows were sampled
  uint64 column_count = 2;
  uint64 row_count = 3;

  // ---- row null / uniqueness ----
  double row_has_null_ratio = 4;
  double row_is_null_ratio = 5;
  double unique_row_ratio = 6;
  uint64 duplicate_row_count = 7;

  // ---- metadata ----
  string file_type = 8; // “csv”, “parquet”, …
  string encoding = 9; // e.g. “utf-8”

  // ---- pairwise stats ----
  NumericMatrix correlation_matrix = 10;
  NumericMatrix chi2_matrix = 11;

  // ---- schema map: column-name -> indices (to mirror DataProfiler) ---
  map<string, ColumnIndexList> profile_schema = 12;

  // ---- timing ----
  message Times {
    double row_stats_ms = 1;
  }
  Times times = 13;
}

// helper for profile_schema
message ColumnIndexList {
  repeated uint32 indices = 1;
}

message ColumnStatistics {

  // ---- numeric summaries (nullable when not applicable) ----
  double min = 1;
  double max = 2;
  double median = 3;
  double mean = 4;
  double variance = 5;
  double stddev = 6;
  double skewness = 7;
  double kurtosis = 8;
  double sum = 9;

  // ---- distribution ----
  repeated double quantiles = 10; // e.g. [q0, q0.5, q1]
  uint64 num_zeros = 11;
  uint64 num_negatives = 12;

  // ---- uniqueness / cardinality ----
  uint64 unique_count = 13;
  double unique_ratio = 14;

  // ---- categorical helpers ----
  bool categorical = 15;
  map<string, uint64> categorical_count = 16;

  // ---- nulls ----
  uint64 null_count = 17;
  repeated string null_types = 18;

  // ---- data-type representation share (DataProfiler style) ----
  map<string, double> data_type_representation = 19;
}

/* ------------------------------------------------------------------ */
/* FULL COLUMN PROFILE */
/* ------------------------------------------------------------------ */

message ColumnProfile {

  // identity
  string column_name = 1;
  string data_type = 2; // “string”, “int”, “float”, …
  string data_label = 3;

  // quick hints
  bool categorical = 4;
  string order = 5; // “random”, “ascending”, “constant value”

  // examples
  repeated string samples = 6; // a few raw sample strings

  // heavy stats
  ColumnStatistics statistics = 7;
}

/* ------------------------------------------------------------------ */
/* TOP-LEVEL CONTAINER */
/* ------------------------------------------------------------------ */

message TableProfile {
  GlobalProfile global_profile = 1;
  repeated ColumnProfile column_profiles = 2;
}
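
The `NumericMatrix` message above stores a matrix as a flat `values` array plus `rows` and `cols`, with the comment stating a row-major layout. A minimal Python sketch of that layout follows; `flatten_row_major` and `cell` are hypothetical helper names used only for illustration, not functions from this PR, and the sketch works on plain lists rather than the generated message classes.

```python
from typing import List, Sequence, Tuple


def flatten_row_major(
    matrix: Sequence[Sequence[float]],
) -> Tuple[List[float], int, int]:
    """Flatten a rectangular 2-D matrix into (values, rows, cols), row by row."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    if any(len(row) != cols for row in matrix):
        raise ValueError("all rows must have the same length")
    values = [float(v) for row in matrix for v in row]
    return values, rows, cols


def cell(values: Sequence[float], rows: int, cols: int, r: int, c: int) -> float:
    """Read entry (r, c) back out of the flat row-major array."""
    if len(values) != rows * cols:
        raise ValueError("values must hold exactly rows * cols entries")
    if not (0 <= r < rows and 0 <= c < cols):
        raise IndexError("cell index out of range")
    # Row-major: skip r complete rows of length cols, then move c columns in.
    return values[r * cols + c]


# Illustrative data only (hypothetical, not taken from a real profile).
grid = [[1.0, 0.25, -0.5], [0.25, 1.0, 0.0]]
values, rows, cols = flatten_row_major(grid)
```

A consumer that checks `len(values) == rows * cols` before indexing, as `cell` does here, catches a truncated or mis-shaped matrix early.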
@@ -22,6 +22,7 @@ package edu.uci.ics.amber.engine.common;
import "edu/uci/ics/amber/engine/architecture/rpc/controlcommands.proto";
import "edu/uci/ics/amber/engine/architecture/rpc/controlreturns.proto";
import "edu/uci/ics/amber/engine/architecture/worker/statistics.proto";
import "edu/uci/ics/amber/engine/architecture/worker/tableprofile.proto";
import "edu/uci/ics/amber/core/virtualidentity.proto";
import "edu/uci/ics/amber/core/workflowruntimestate.proto";
import "scalapb/scalapb.proto";
@@ -88,7 +89,8 @@ message ExecutionStatsStore {
int64 startTimeStamp = 1;
int64 endTimeStamp = 2;
map<string, OperatorMetrics> operator_info = 3;
- repeated OperatorWorkerMapping operator_worker_mapping = 4;
+ map<string, architecture.worker.TableProfile> operator_table_profile = 4;
+ repeated OperatorWorkerMapping operator_worker_mapping = 5;
}
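
In the `ExecutionStatsStore` hunk above, `operator_worker_mapping` moves from field number 4 to 5 and the new `operator_table_profile` map takes number 4. Protobuf identifies fields on the wire by number, not by name, so bytes written under the old layout would be read as `operator_table_profile` entries under the new one. Whether that matters here depends on whether `ExecutionStatsStore` messages are ever persisted or exchanged between builds, which this diff does not show. The sketch below computes the tag for each layout using the standard protobuf tag formula; `field_tag` is a hypothetical helper name.

```python
# Wire type 2 (length-delimited) covers embedded messages, which is how both
# repeated message fields and map entries are encoded.
LENGTH_DELIMITED = 2


def field_tag(field_number: int, wire_type: int) -> int:
    """Protobuf tag value: (field_number << 3) | wire_type."""
    if field_number < 1:
        raise ValueError("field numbers start at 1")
    if not 0 <= wire_type <= 7:
        raise ValueError("wire type must fit in 3 bits")
    return (field_number << 3) | wire_type


# Tag that preceded each operator_worker_mapping entry before this change.
old_mapping_tag = field_tag(4, LENGTH_DELIMITED)
# Tags after this change.
new_profile_tag = field_tag(4, LENGTH_DELIMITED)
new_mapping_tag = field_tag(5, LENGTH_DELIMITED)

# The old mapping tag and the new profile tag are identical on the wire.
tags_collide = old_mapping_tag == new_profile_tag
```

If stored or cross-version messages do exist, giving the new map an unused field number and leaving `operator_worker_mapping` at 4 would avoid the collision.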


@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from core.architecture.handlers.control.control_handler_base import ControlHandler
from proto.edu.uci.ics.amber.engine.architecture.rpc import (
    EmptyRequest,
    TableProfileResponse,
)


class QueryTableProfileHandler(ControlHandler):
    async def query_table_profile(self, req: EmptyRequest) -> TableProfileResponse:
        # Wrap the profile held by this worker's TableProfileManager in the response.
        return TableProfileResponse(self.context.table_profile_manager.get_table_profile())
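
The handler above calls `self.context.table_profile_manager.get_table_profile()`, but the `TableProfileManager` implementation is not part of the hunks shown here. Below is a minimal sketch of what such a holder could look like, assuming it simply stores the most recent profile. The class name `TableProfileManagerSketch`, the `update_table_profile` method, and the use of a plain `dict` in place of the generated `TableProfile` message are all assumptions for illustration; only the `get_table_profile` name comes from the diff.

```python
from threading import Lock
from typing import Any, Dict, Optional


class TableProfileManagerSketch:
    """Hypothetical stand-in: keeps the latest table profile for one worker."""

    def __init__(self) -> None:
        self._lock = Lock()
        self._profile: Optional[Dict[str, Any]] = None

    def update_table_profile(self, profile: Dict[str, Any]) -> None:
        # Hypothetical setter; store a shallow copy so later caller-side
        # mutation of the top-level dict does not leak into the stored state.
        with self._lock:
            self._profile = dict(profile)

    def get_table_profile(self) -> Dict[str, Any]:
        # Return an empty profile until the first update arrives.
        with self._lock:
            return dict(self._profile) if self._profile is not None else {}


manager = TableProfileManagerSketch()
empty_profile = manager.get_table_profile()
manager.update_table_profile({"global_profile": {"row_count": 3}, "column_profiles": []})
latest_profile = manager.get_table_profile()
```

Returning an empty value before the first update lets a query that arrives early still produce a well-formed response instead of raising.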
@@ -23,6 +23,7 @@
from .debug_manager import DebugManager
from .exception_manager import ExceptionManager
from .state_processing_manager import StateProcessingManager
from .table_profile_manager import TableProfileManager
from .tuple_processing_manager import TupleProcessingManager
from .executor_manager import ExecutorManager
from .pause_manager import PauseManager
@@ -62,6 +63,7 @@ def __init__(self, worker_id, input_queue):
        )

        self.statistics_manager = StatisticsManager()
        self.table_profile_manager = TableProfileManager()
        self.pause_manager = PauseManager(
            self.input_queue, state_manager=self.state_manager
        )