Voice Note To Midi OpenClaw Skill - ClawHub
What this skill does
Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
Install
npx clawhub@latest install voice-note-to-midi
Full SKILL.md
| name | description | tags |
|---|---|---|
| voice-note-to-midi | Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing | audio, midi, music, transcription, machine-learning |
🎵 Voice Note to MIDI
Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.
What It Does
This skill provides a complete audio-to-MIDI conversion pipeline that:
- Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
- ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
- Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
- Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
- Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
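As a rough illustration of the key detection step, the sketch below correlates a pitch-class histogram against the published Krumhansl-Kessler major and minor profiles and returns the best-matching key. The function names (`pearson`, `detect_key`) and the use of raw note counts rather than duration-weighted counts are assumptions for illustration, not the skill's actual code.

```python
from collections import Counter
from math import sqrt

# Krumhansl-Kessler probe-tone profiles; index 0 is the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y) if dev_x and dev_y else 0.0


def detect_key(midi_pitches):
    """Return the best-matching key name, e.g. 'G major' (hypothetical helper)."""
    counts = Counter(p % 12 for p in midi_pitches)
    histogram = [counts.get(pc, 0) for pc in range(12)]
    best_score, best_key = None, None
    for tonic in range(12):
        # Rotate so the candidate tonic sits at index 0, matching the profiles.
        rotated = histogram[tonic:] + histogram[:tonic]
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            score = pearson(rotated, profile)
            if best_score is None or score > best_score:
                best_score, best_key = score, f"{NOTE_NAMES[tonic]} {mode}"
    return best_key
```

Weighting the histogram by note duration instead of note count is a common refinement and may be closer to what a production pipeline does.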
Pipeline Architecture
Audio Input (WAV/M4A/MP3)
                  ↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS)      │
│ - Isolate harmonic content          │
│ - Remove drums/percussion           │
│ - Noise gating                      │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection             │
│ - Basic Pitch ML model (Spotify)    │
│ - Polyphonic note detection         │
│ - Onset/offset estimation           │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 3: Analysis                    │
│ - Pitch class distribution          │
│ - Key detection                     │
│ - Dominant note identification      │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup      │
│ - Timing grid snap                  │
│ - Key-aware pitch correction        │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning             │
│ - Note merging (legato)             │
│ - Velocity normalization            │
└─────────────────────────────────────┘
                  ↓
MIDI Output (Standard MIDI File)
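The octave pruning in Step 4 could look roughly like the sketch below. It assumes the cutoff is the median pitch plus 12 semitones, which is what the "median+12" wording in the sample log suggests. The `(start_tick, end_tick, midi_pitch)` tuple layout and the name `prune_octave_outliers` are illustrative assumptions.

```python
from statistics import median


def prune_octave_outliers(notes, max_above=12):
    """Drop notes more than `max_above` semitones above the median pitch.

    `notes` is a list of (start_tick, end_tick, midi_pitch) tuples; this
    layout is an assumption made for the sketch.
    """
    if not notes:
        return []
    ceiling = median(pitch for _, _, pitch in notes) + max_above
    # Notes above the ceiling are treated as likely harmonics and removed.
    return [note for note in notes if note[2] <= ceiling]
```

With a median of 55 (G3), the ceiling is 67, so a stray G5 (79) would be dropped while the rest of the melody is kept.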
Setup
Prerequisites
- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip
Installation
Quick Install (Recommended):
cd /path/to/voice-note-to-midi
./setup.sh
This automated script will:
- Check Python 3.11+ is installed
- Create the ~/melody-pipeline directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the hum2midi script
- Add melody-pipeline to your PATH
Manual Install:
If you prefer manual setup:
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
chmod +x ~/melody-pipeline/hum2midi
- Add to your PATH (optional):
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc
Verify Installation
cd ~/melody-pipeline
./hum2midi --help
Usage
Basic Usage
Convert a voice memo to MIDI:
./hum2midi my_humming.wav
This creates my_humming.mid with 16th-note quantization.
Specify Output File
./hum2midi input.wav output.mid
Command-Line Options
| Option | Description | Default |
|---|---|---|
| --grid <value> | Quantization grid: 1/4, 1/8, 1/16, 1/32 | 1/16 |
| --min-note <ms> | Minimum note duration in milliseconds | 50 |
| --no-quantize | Skip quantization (output raw Basic Pitch MIDI) | disabled |
| --key-aware | Enable key-aware pitch correction | disabled |
| --no-analysis | Skip pitch analysis and key detection | disabled |
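To make the --grid values concrete, here is a minimal sketch of how a grid string might map to MIDI ticks and how note times could be snapped to it. The resolution of 960 ticks per quarter note is an assumption inferred from the sample log line "Grid: 240 ticks (1/16)". The helper names and the rule that a snapped note keeps at least one grid step of length are also assumptions.

```python
from fractions import Fraction

PPQ = 960  # assumed ticks per quarter note (so a 1/16 grid is 240 ticks)


def grid_to_ticks(grid, ppq=PPQ):
    """Convert a grid string such as '1/16' (a fraction of a whole note) to ticks."""
    ticks = Fraction(grid) * 4 * ppq  # a whole note spans 4 quarter notes
    if ticks.denominator != 1:
        raise ValueError(f"grid {grid!r} does not divide evenly at {ppq} PPQ")
    return int(ticks)


def snap(tick, grid_ticks):
    """Round a tick position to the nearest grid line (exact halves round up)."""
    return ((tick + grid_ticks // 2) // grid_ticks) * grid_ticks


def quantize_note(start, end, grid_ticks):
    """Snap both note edges, keeping the note at least one grid step long."""
    q_start = snap(start, grid_ticks)
    q_end = snap(end, grid_ticks)
    if q_end <= q_start:
        q_end = q_start + grid_ticks
    return q_start, q_end
```

For example, a note running from tick 130 to tick 250 on a 1/16 grid snaps to 240 at both ends, so the sketch extends it to end at 480 rather than collapsing it to zero length.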
Usage Examples
Quantize to eighth notes
./hum2midi melody.wav --grid 1/8
Key-aware quantization (recommended for tonal music)
./hum2midi song.wav --key-aware
Require longer minimum notes
./hum2midi humming.wav --min-note 100
Skip analysis for faster processing
./hum2midi quick.wav --no-analysis
Combine options
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80
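One plausible reading of what --key-aware does to pitches is sketched below: any note outside the detected scale is moved to the nearest scale tone. The scale tables, the tie-breaking rule (prefer the lower neighbour), and the name `snap_to_scale` are assumptions for illustration. The script's real behaviour may differ.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor


def snap_to_scale(pitch, tonic_pc, steps=MAJOR_STEPS):
    """Move a MIDI pitch to the nearest scale tone (hypothetical helper).

    `tonic_pc` is the tonic pitch class (0 = C, 7 = G). In-scale pitches are
    returned unchanged; otherwise the lower neighbour is tried before the upper.
    """
    scale = {(tonic_pc + step) % 12 for step in steps}
    for offset in (0, -1, 1):
        if (pitch + offset) % 12 in scale:
            return pitch + offset
    return pitch  # not reachable for the seven-note scales above
```

In G major (`tonic_pc=7`), an F4 (65) would be pulled down to E4 (64), while F#4 (66) is already in the scale and stays put.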
Processing MIDI Input
You can also process existing MIDI files through the quantization pipeline:
./hum2midi input.mid output.mid --grid 1/16 --key-aware
This skips the audio processing steps and goes directly to analysis and quantization.
Sample Output
═══════════════════════════════════════════════════════════════
hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
[Key-Aware Mode Enabled]
═══════════════════════════════════════════════════════════════
Input: my_humming.wav
Output: my_humming.mid
→ Step 1: Stem Separation (HPSS)
  Isolating melodic content...
  Loaded: 5.23s @ 44100Hz
  ✓ Melody stem extracted → 5.23s
→ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
  Running Spotify's Basic Pitch ML model on melody stem...
  ✓ Raw MIDI generated (Basic Pitch)
→ Step 3: Pitch Analysis & Key Detection
  Notes detected: 42 total, 7 unique
  Note range: C3 - G4
  Pitch classes: C3, E3, G3, A3, C4, D4, G4
  Dominant note: G3 (23.8% of notes)
  Detected key: G major
→ Step 4: Quantization & Cleanup
  Octave pruning: removed 3 harmonic notes above 67 (median+12)
  Overlap pruning: removed 2 harmonic notes at overlapping positions
  Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
  Grid: 240 ticks (1/16)
  Notes: 38 notes
  Key: G major
  Key-aware: 2 notes corrected to scale
  Tempo: 120 BPM
  ✓ Quantized MIDI saved
═══════════════════════════════════════════════════════════════
✓ Done! Output: my_humming.mid
═══════════════════════════════════════════════════════════════
ANALYSIS SUMMARY
─────────────────────────────────────────────────────────────
Detected Notes: C3, E3, G3, A3, C4, D4, G4
Detected Key: G major
Quantization: Key-aware mode (notes snapped to scale)
MIDI Info: 38 notes, 7 unique pitches, 120 BPM
Pitches: C3, E3, G3, A3, C4, D4, G4
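The "Note merging" line in the log can be pictured with the sketch below, which joins consecutive notes of the same pitch when the silence between them is at most 60 ticks. The threshold comes from the log. The tuple layout, the name `merge_legato`, and the choice to ignore notes of other pitches that fall in between are assumptions.

```python
def merge_legato(notes, max_gap=60):
    """Merge same-pitch notes separated by at most `max_gap` ticks.

    `notes` is a list of (start_tick, end_tick, midi_pitch) tuples
    (an assumed layout). Returns a new list ordered by first appearance.
    """
    merged = []
    last_index_by_pitch = {}  # pitch -> index of its most recent note in `merged`
    for start, end, pitch in sorted(notes):
        index = last_index_by_pitch.get(pitch)
        if index is not None and start - merged[index][1] <= max_gap:
            # Close enough to the previous chunk: extend it instead of adding a note.
            prev_start, prev_end, _ = merged[index]
            merged[index] = (prev_start, max(prev_end, end), pitch)
        else:
            last_index_by_pitch[pitch] = len(merged)
            merged.append((start, end, pitch))
    return merged
```

Two G3 chunks at ticks 0-200 and 240-480 would become a single note from 0 to 480, while a third G3 starting 120 ticks after that stays separate.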
Notes & Limitations
Audio Quality Matters
- Clear, loud melody produces the best results
- Background noise can cause false note detection
- Reverb and effects may confuse pitch detection
- Close-mic'd vocals work significantly better than room recordings
Musical Considerations
- Monophonic sources work best (single melody line)
- Polyphonic audio (chords, multiple instruments) will produce messy results
- Vibrato and pitch bends may be quantized to stepped pitches
- Rapid note passages may be missed or merged
Technical Limitations
- Tempo is fixed at 120 BPM in output (time positions are preserved, but tempo may need adjustment in your DAW)
- Note velocities are normalized but may need manual adjustment
- Very short notes (<50ms) may be filtered out by default
- Extreme pitch ranges may cause octave detection issues
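The fixed-tempo and minimum-duration limitations above come down to simple arithmetic, sketched here. The 960 ticks-per-quarter resolution is an assumption inferred from the sample log, and the helper names are hypothetical. `retime_ticks` shows how a tick position would have to change to keep the same absolute time if you re-export at your recording's real tempo.

```python
PPQ = 960          # assumed ticks per quarter note
OUTPUT_BPM = 120   # the pipeline writes this tempo regardless of the recording


def seconds_to_ticks(seconds, bpm=OUTPUT_BPM, ppq=PPQ):
    """Convert an absolute time in seconds to MIDI ticks at a given tempo."""
    return round(seconds * bpm / 60 * ppq)


def retime_ticks(tick, new_bpm, old_bpm=OUTPUT_BPM):
    """Rescale a tick position so it lands at the same absolute time at `new_bpm`."""
    return round(tick * new_bpm / old_bpm)


def drop_short_notes(notes, min_ms=50):
    """Filter out notes shorter than `min_ms`; notes are (start_s, end_s, pitch)."""
    return [note for note in notes if (note[1] - note[0]) * 1000 >= min_ms]
```

At 120 BPM half a second is one quarter note (960 ticks). The same half second at 90 BPM is only 720 ticks, which is why the note grid can look off in a DAW until the project tempo is adjusted.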
Post-Processing Recommendations
After generating MIDI, you may want to:
- Import into your DAW and adjust tempo to match your original recording
- Quantize further if stricter timing is needed
- Adjust note velocities for dynamics
- Apply swing/groove templates if the rigid grid sounds too mechanical
- Edit individual notes that were misdetected (common with fast runs)
Supported Audio Formats
Input formats supported via FFmpeg:
- WAV, AIFF, FLAC (uncompressed, best quality)
- MP3, M4A, AAC (compressed, acceptable)
- OGG, OPUS (open source formats)
- Most other formats FFmpeg supports
Troubleshooting
No notes detected
- Check that input file isn't silent or corrupted
- Try lowering the --min-note threshold so that short notes are not filtered out
- Verify audio has clear melodic content (not just noise)
Too many notes / messy output
- Octave pruning and overlap pruning are on by default; check the log to confirm they ran
- Use --key-aware to constrain notes to the detected musical scale
- Check for background noise in source audio
Wrong key detected
- Key detection works best with at least 8-10 measures of music
- Chromatic passages may confuse the detector
- Manually review and adjust in your DAW if needed
Notes in wrong octave
- Basic Pitch sometimes detects harmonics instead of fundamentals
- The pipeline includes pruning, but some may slip through
- Use your DAW's transpose function for simple octave shifts
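For intuition about how overlap-based pruning might catch harmonics that were detected alongside the fundamental, the sketch below drops any note that overlaps in time with a lower note sitting exactly one octave, an octave plus a fifth, or two octaves beneath it. These intervals, the tuple layout, and the name `prune_overlapping_harmonics` are all assumptions. The skill's actual rule is not documented here.

```python
def prune_overlapping_harmonics(notes, intervals=(12, 19, 24)):
    """Remove notes that look like overtones of a simultaneous lower note.

    `notes` is a list of (start_tick, end_tick, midi_pitch) tuples (assumed
    layout). A note is dropped when some other note overlaps it in time and
    lies exactly one of `intervals` semitones below it.
    """
    kept = []
    for start, end, pitch in notes:
        is_harmonic = any(
            other_start < end
            and start < other_end
            and (pitch - other_pitch) in intervals
            for other_start, other_end, other_pitch in notes
        )
        if not is_harmonic:
            kept.append((start, end, pitch))
    return kept
```

A G4 sounding at the same time as a G3 would be removed, but a G4 that starts after the G3 has ended is kept as a genuine melody note.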
References
- Basic Pitch - Spotify's polyphonic pitch detection model
- librosa HPSS - Harmonic-Percussive Source Separation
- Krumhansl-Kessler Key Profiles - Key detection algorithm
License
This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under MIT license.