Regression Test Pipeline #120

Merged
merged 39 commits into from
Oct 28, 2024
39 commits
daa68c8
Abhinay1997 Apr 20, 2024
e2e5632
Merge branch 'main' into wer_utils
Abhinay1997 Apr 22, 2024
6af9f5a
Add basic Fraction type to handle Number normalization
Abhinay1997 May 2, 2024
87c230a
Add EnglishNumberNormalizer
Abhinay1997 May 2, 2024
d8cda9f
Merge branch 'main' into wer_utils
Abhinay1997 May 2, 2024
b8c30fe
Adds Basic Fraction type for WER
Abhinay1997 May 4, 2024
06e66e4
Refactor + Add english normalizers
Abhinay1997 May 4, 2024
3334d44
Bug fixes in number normalization. regex, multiplier processing.
Abhinay1997 May 8, 2024
da3a719
wer evaluate function + string optimization
Abhinay1997 May 10, 2024
acb80ff
Add wer test on long audio
Abhinay1997 May 10, 2024
dbbf9bf
Remove Wagner-Fischer, fix normalization bugs.
Abhinay1997 May 28, 2024
16a5525
Hirschberg's LCS Algorithm for edit operations
Abhinay1997 May 28, 2024
70456b3
Remove warnings in Fraction implementation
Abhinay1997 May 28, 2024
a3c94cc
Add tests
Abhinay1997 May 28, 2024
b7e52fa
Merge branch 'main' into wer_utils
Abhinay1997 May 28, 2024
60f8956
Refactoring
Abhinay1997 May 29, 2024
89df136
Refactor regression tests
Abhinay1997 Jun 11, 2024
ad13284
Add WER to regression test results, fix overflow
Abhinay1997 Jun 11, 2024
47be844
clean up files
Abhinay1997 Jun 11, 2024
bf46309
Merge branch 'main' into wer_utils
Abhinay1997 Jun 11, 2024
6296506
patch overflow for now.
Abhinay1997 Jun 11, 2024
6a28fc1
Re-add file needed for tests
Abhinay1997 Jun 12, 2024
26bb7c6
Fix xcode test attachment
ZachNagengast Jul 28, 2024
01baf7b
Fix overflow when using Int.
Abhinay1997 Aug 2, 2024
cca6f50
Add flag to run only on first audio file of the dataset
Abhinay1997 Aug 2, 2024
3fceef3
Abhinay1997 Aug 6, 2024
ad4c7f5
PR Clenup:
Abhinay1997 Aug 6, 2024
74ad9be
Merge branch 'main' into wer_utils
ZachNagengast Aug 6, 2024
525657b
Adds system memory, disk space and battery level tracking.
Abhinay1997 Aug 12, 2024
83ffc3f
Remove sample JSON
Abhinay1997 Aug 12, 2024
a8d6e27
Merge branch 'main' into wer_utils
Abhinay1997 Aug 12, 2024
2f3be51
Fix compilation on non macOS
Abhinay1997 Aug 12, 2024
d9bc43b
Fix battery checks for watchOS
Abhinay1997 Aug 12, 2024
c99bd94
Fix imports
Abhinay1997 Aug 12, 2024
f2e2fac
Merge branch 'main' into regression-test-automations
ZachNagengast Oct 24, 2024
6962d0d
Regression test automations
ZachNagengast Oct 24, 2024
a74a592
Add modelSizeMB to testInfo
ZachNagengast Oct 25, 2024
039003e
Cleanup for merge
ZachNagengast Oct 27, 2024
442eb24
Upgrade example app
ZachNagengast Oct 27, 2024
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,6 +1,7 @@
.DS_Store
/.build
/Packages
.vscode/
xcuserdata/
DerivedData/
.swiftpm/configuration/registries.json
@@ -56,8 +57,11 @@ fastlane/report.xml
fastlane/Preview.html
fastlane/screenshots
fastlane/test_output
fastlane/benchmark_data
fastlane/upload_folder

### Xcode Patch ###
**/*.xcconfig
*.xcodeproj/*
!*.xcodeproj/project.pbxproj
!*.xcodeproj/xcshareddata/
120 changes: 120 additions & 0 deletions BENCHMARKS.md
@@ -0,0 +1,120 @@
# WhisperKit Benchmarks

This document describes how to run the benchmarks for WhisperKit. The benchmarks can be run on a specific device or all connected devices. The results are saved in JSON files and can be uploaded to the [argmaxinc/whisperkit-evals-dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset) dataset on HuggingFace as a pull request. Below are the steps to run the benchmarks locally in order to reproduce the results shown in our [WhisperKit Benchmarks](https://huggingface.co/spaces/argmaxinc/whisperkit-benchmarks) space.

## Download the Source

To download the code to run the test suite, run:

```sh
git clone git@github.com:argmaxinc/WhisperKit.git
```

## Local Environment

Before running the benchmarks, you'll need to set up your local environment with the necessary dependencies. To do this, run:

```sh
make setup
```

See [Contributing](CONTRIBUTING.md) for more information.


## Xcode Environment

When running the tests, the model to test is provided to Xcode from Fastlane as an environment variable. To check that the variable is configured:

1. Open the example project:

```sh
xed Examples/WhisperAX
```

2. At the top, you will see the app icon and `WhisperAX` written next to it. Click on `WhisperAX` and select `Edit Scheme` at the bottom.

3. Under `Environment Variables`, you will see an entry with `MODEL_NAME` as the name and `$(MODEL_NAME)` as the value.

## Devices

> [!IMPORTANT]
> An active developer account is required to run the tests on physical devices.

Before running tests, all external devices need to be connected and paired to your Mac, as well as registered with your developer account. Ensure the devices are in Developer Mode. If nothing appears after connecting the devices via cable, press `Command + Shift + 2` in Xcode to open the list of devices and track their progress.

## Datasets

The datasets for the test suite can be set in a global array called `datasets` in the file [`Tests/WhisperKitTests/RegressionTests.swift`](Tests/WhisperKitTests/RegressionTests.swift). It is prefilled with the datasets that are currently available.

## Models

The models for the test suite can be set in the [`Fastfile`](fastlane/Fastfile). Simply find `BENCHMARK_CONFIGS` and modify the `models` array under the benchmark you want to run.
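
As a rough illustration of the kind of structure being edited, the sketch below models a configuration table in plain Ruby. The hash shape, the helper names `example_benchmark_configs` and `models_for`, and the model names are all assumptions made for this example; only the `full` and `debug` configuration names and the `models` key come from this document, and the real `BENCHMARK_CONFIGS` in the Fastfile may look different.

```ruby
# Hypothetical stand-in for BENCHMARK_CONFIGS; the real Fastfile may differ.
# Model names are placeholders, not actual WhisperKit model identifiers.
def example_benchmark_configs
  {
    full:  { models: ["example-model-large", "example-model-small"] },
    debug: { models: ["example-model-tiny"] }
  }
end

# Look up the models for a named benchmark configuration.
# Raises ArgumentError when the configuration name is unknown.
def models_for(configs, name)
  config = configs.fetch(name.to_sym) do
    raise ArgumentError, "unknown benchmark config: #{name}"
  end
  config[:models]
end

p models_for(example_benchmark_configs, "debug")
```

Editing the `models` array of one entry then changes which models that benchmark run covers, without touching the other configuration.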

## Makefile and Fastlane

The tests are run using [Fastlane](fastlane/Fastfile), which is controlled by a [Makefile](Makefile). The Makefile contains the following commands:

### List Connected Devices

Before running the tests it might be a good idea to list the connected devices to resolve any connection issues. Simply run:

```sh
make list-devices
```

The output will be a list with entries that look something like this:

```ruby
{
:name=>"My Mac",
:type=>"Apple M2 Pro",
:platform=>"macOS",
:os_version=>"15.0.1",
:product=>"Mac14,12",
:id=>"XXXXXXXX-1234-5678-9012-XXXXXXXXXXXX",
:state=>"connected"
}
```

Verify that the devices are connected and the state is `connected`.
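
If the list is long, the same check can be scripted. The following is a minimal sketch that assumes the entries are available as Ruby hashes with the `:name` and `:state` keys shown above; the helper name `disconnected_devices`, the second device entry, and its `unavailable` state are made up for illustration.

```ruby
# Return every device entry whose state is anything other than "connected".
def disconnected_devices(devices)
  devices.reject { |device| device[:state] == "connected" }
end

# Sample entries shaped like the list-devices output.
# "Example iPhone" and the "unavailable" state are hypothetical.
devices = [
  { name: "My Mac", platform: "macOS", state: "connected" },
  { name: "Example iPhone", platform: "iOS", state: "unavailable" }
]

disconnected_devices(devices).each do |device|
  warn "Not ready: #{device[:name]} (state: #{device[:state]})"
end
```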

### Running Benchmarks

After completing the above steps, you can run the tests. Note that there are two different test configurations: one named `full` and the other named `debug`. To check for potential errors, run the `debug` tests:

```sh
make benchmark-devices DEBUG=true
```

Otherwise run the `full` tests:

```sh
make benchmark-devices
```

Optionally, for both tests, you can specify the list of devices for the tests using the `DEVICES` option:

```sh
make benchmark-devices DEVICES="iPhone 15 Pro Max,My Mac"
```

The `DEVICES` option is a comma-separated list of device names. The device names can be found by running `make list-devices` and using the value for the `:name` key.
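
To make the format concrete, here is one way such a comma-separated value could be split and matched against device entries. This is a sketch only: the helper names `parse_devices` and `select_devices` are hypothetical, trimming whitespace around names and falling back to all devices when the option is empty are assumptions, and the Fastfile's actual handling may differ.

```ruby
# Split a comma-separated DEVICES value into a list of device names.
# Trimming surrounding whitespace is an assumption of this sketch.
def parse_devices(option)
  option.to_s.split(",").map(&:strip).reject(&:empty?)
end

# Keep only the entries whose :name was requested.
# An empty or missing option selects every device (assumed behavior).
def select_devices(available, option)
  wanted = parse_devices(option)
  return available if wanted.empty?

  available.select { |device| wanted.include?(device[:name]) }
end

p parse_devices("iPhone 15 Pro Max,My Mac")
```

Because matching is done on the exact `:name` value, a name copied from `make list-devices` is the safest input.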

### Results

After the tests are run, the generated results can be found under `fastlane/benchmark_data`, including the `.xcresult` file with logs and attachments for each device. There will also be a folder called `fastlane/upload_folder/benchmark_data` that contains only the JSON results from `fastlane/benchmark_data`, which can be used for further analysis.
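
For a quick look at what was produced, the JSON files can be collected with a few lines of Ruby. This sketch assumes only that the upload folder contains `.json` files somewhere beneath it; the helper name `load_benchmark_results` is hypothetical, and nothing is assumed about the fields inside each file.

```ruby
require "json"

# Recursively collect and parse every JSON file under the given directory.
# Returns an empty array if the directory does not exist yet.
def load_benchmark_results(dir = "fastlane/upload_folder/benchmark_data")
  Dir.glob(File.join(dir, "**", "*.json")).sort.map do |path|
    { path: path, data: JSON.parse(File.read(path)) }
  end
end

results = load_benchmark_results
puts "Found #{results.length} result file(s)"
results.each { |result| puts result[:path] }
```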

We will periodically run these tests on a range of devices and upload the results to the [argmaxinc/whisperkit-evals-dataset](https://huggingface.co/datasets/argmaxinc/whisperkit-evals-dataset), which will propagate to the [WhisperKit Benchmarks](https://huggingface.co/spaces/argmaxinc/whisperkit-benchmarks) space and be available for comparison.


## Troubleshooting


If you encounter issues while running the tests, here are a few things to try:

1. Open the project in Xcode and run the tests directly from there.
   1. To do this, open the example app (from the command line, run `xed Examples/WhisperAX`) and run the test named `RegressionTests/testModelPerformanceWithDebugConfig` from the test navigator.
   2. If the tests run successfully, you can rule out any issues with the device or the models.
   3. If they don't run successfully, Xcode will provide more detailed error messages.
2. Try specifying a single device to run the tests on. This can be done by running `make list-devices` and then running the tests with the `DEVICES` option set to the name of the device you want to test on. For example, `make benchmark-devices DEVICES="My Mac"`. This will also enable you to see the logs for that specific device.
3. If you are still encountering issues, please reach out to us on the [Discord](https://discord.gg/G5F5GZGecC) or create an [issue](https://github.com/argmaxinc/WhisperKit/issues) on GitHub.
8 changes: 8 additions & 0 deletions Examples/WhisperAX/Debug.xcconfig
@@ -0,0 +1,8 @@
// For licensing see accompanying LICENSE.md file.
// Copyright © 2024 Argmax, Inc. All rights reserved.

// Configuration settings file format documentation can be found at:
// https://help.apple.com/xcode/#/dev745c5c974

CODE_SIGN_STYLE=Automatic
DEVELOPMENT_TEAM=