Lossy or lossless: which audio format should I keep my library in?

For long-term archiving choose FLAC (lossless, ~50% the size of WAV). For everyday listening on phones and streaming use MP3 320 kbps or Opus 192 kbps (transparent quality, no audible difference at normal volumes). Avoid converting lossy → lossy → lossy: every step compounds artifacts.

What sample rate and bitrate should I use for podcasts?

Spoken-word podcasts: 44.1 kHz mono, 64–96 kbps Opus or 96 kbps MP3. Music podcasts: 44.1 kHz stereo, 128–192 kbps. Going above wastes bandwidth — speech has limited frequency content above 7 kHz, and listeners on data plans appreciate the smaller file. Apple Podcasts and Spotify both accept up to 48 kHz / 320 kbps.
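
To make the bandwidth argument concrete, file size can be estimated as bitrate times duration. The sketch below assumes a constant bitrate and ignores container overhead and embedded tags; the function name is illustrative, not part of any library.

```python
def estimated_size_mb(bitrate_kbps: float, minutes: float) -> float:
    """Approximate encoded size in megabytes (1 MB = 1,000,000 bytes).

    Assumes constant bitrate; ignores container overhead and tags.
    """
    bits = bitrate_kbps * 1000 * minutes * 60
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    # Compare a one-hour episode at the bitrates mentioned above
    for kbps in (64, 96, 128, 192, 320):
        print(f"{kbps:>3} kbps, 60 min: {estimated_size_mb(kbps, 60):6.1f} MB")
```

By this estimate a one-hour episode is about 28.8 MB at 64 kbps and about 144 MB at 320 kbps.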

Why does my MP3 sound worse after re-encoding from another MP3?

Lossy → lossy compounds quantization noise. Each encode discards perceptual information again, on top of what the previous encode already removed, so artifacts accumulate. Always re-encode from a lossless master if you have one (WAV, FLAC, or the original recording). If only an MP3 is available, do not go below the source bitrate, and do not expect a higher bitrate to "improve quality": it only makes the file larger and cannot restore detail the first encode discarded.

Will conversion preserve ID3 tags (artist, album, cover art)?

Most modern tools (FFmpeg with -map_metadata 0, foobar2000, dBpoweramp) preserve ID3 tags and embedded cover art. Some quick-and-dirty converters strip them silently. KaijuConverter preserves tags in its audio conversions (/convert/flac-to-mp3); if metadata is critical to you, verify the output with ffprobe before deleting the original.
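
One way to do that ffprobe check from Python is sketched below. It assumes ffprobe is on the PATH and only looks at container-level tags; some containers report tags on the audio stream instead, so treat an empty result as a prompt to look closer rather than as proof of loss. The helper names are illustrative.

```python
import json
import subprocess


def parse_tags(ffprobe_json: str) -> dict:
    """Extract container-level tags from ffprobe JSON, with lowercase keys."""
    data = json.loads(ffprobe_json)
    tags = data.get("format", {}).get("tags", {})
    return {key.lower(): value for key, value in tags.items()}


def read_tags(path: str) -> dict:
    """Run ffprobe on a file and return its container-level tags."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return parse_tags(result.stdout)


def missing_tags(source: dict, converted: dict) -> list:
    """Tag names present in the source but absent or changed after conversion."""
    return sorted(k for k, v in source.items() if converted.get(k) != v)
```

Calling missing_tags(read_tags("album.flac"), read_tags("album.mp3")) lists the tags that did not survive; the file names here are placeholders. Fields such as the encoder name are expected to differ between the two files.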

Audio Fingerprinting with Python: acoustid and chromaprint

Audio fingerprinting generates a unique identifier from the sonic content of a file, independent of its format or metadata tags. It's used to identify songs, detect duplicates, sync subtitles, and auto-populate metadata from databases like MusicBrainz.

How Chromaprint Works

Chromaprint (created by Lukáš Lalinský) works as follows:

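1. Converts the audio to a mono signal resampled to 11,025 Hz (the default)
2. Splits it into overlapping frames of 4,096 samples (about 0.37 seconds each)
3. Applies an FFT to each frame to get the frequency spectrum
4. Folds the spectrum into chroma feature vectors (12 bins, one per semitone of the octave)
5. Derives a 32-bit sub-fingerprint from each short run of consecutive chroma vectors; concatenated, these form the final fingerprint

The result is an array of 32-bit integers, which is compressed and encoded as a base64 string.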
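
The chroma step above can be illustrated with a few lines of plain Python. This is a simplified sketch, not Chromaprint's actual implementation: it assumes equal temperament tuned to A = 440 Hz, takes an already computed spectrum as (frequency, magnitude) pairs, and uses function names made up for illustration.

```python
import math

# Index 0 is A because the reference pitch is A = 440 Hz
PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def pitch_class(freq_hz: float, ref_hz: float = 440.0) -> int:
    """Map a frequency to one of 12 pitch classes (0 = A), ignoring the octave."""
    semitones_from_ref = 12 * math.log2(freq_hz / ref_hz)
    return round(semitones_from_ref) % 12


def chroma_vector(spectrum):
    """Fold (frequency_hz, magnitude) pairs into a normalized 12-bin chroma vector."""
    bins = [0.0] * 12
    for freq_hz, magnitude in spectrum:
        if freq_hz > 0:  # skip the DC component, log2(0) is undefined
            bins[pitch_class(freq_hz)] += magnitude
    total = sum(bins)
    return [b / total for b in bins] if total else bins
```

Because the octave is discarded, 440 Hz and 880 Hz land in the same bin. That is part of why a chroma-based fingerprint depends on the musical content rather than on the exact spectral detail that lossy encoders alter.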

Installation

# Install chromaprint binary (fpcalc)
# Windows: download from https://acoustid.org/chromaprint
# Linux:
apt-get install libchromaprint-tools

# macOS:
brew install chromaprint

# Python binding:
pip install pyacoustid mutagen

Generating a Fingerprint

# acoustid falls back to the fpcalc binary if the Chromaprint library is not available
import acoustid

def generate_fingerprint(audio_path):
    """Generate an audio fingerprint with duration."""
    duration, fingerprint = acoustid.fingerprint_file(audio_path)
    print(f"File:        {audio_path}")
    print(f"Duration:    {duration:.1f} seconds")
    print(f"Fingerprint: {fingerprint[:60]}...")
    return duration, fingerprint

duration, fp = generate_fingerprint("song.mp3")

Using fpcalc Directly

import subprocess
import json

def fpcalc(audio_path, length=120):
    """Call fpcalc to get the fingerprint as JSON."""
    result = subprocess.run(
        ['fpcalc', '-json', '-length', str(length), audio_path],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        raise RuntimeError(f"fpcalc error: {result.stderr}")
    return json.loads(result.stdout)

data = fpcalc("song.ogg")
print(f"Duration:    {data['duration']:.1f}s")
print(f"Fingerprint: {data['fingerprint'][:50]}...")

Identifying Songs with AcoustID

AcoustID is the free public service mapping fingerprints to MusicBrainz IDs. Get a free API key at acoustid.org.

import acoustid

API_KEY = "your_api_key_here"

def identify_song(audio_path):
    results = acoustid.match(API_KEY, audio_path)
    for score, recording_id, title, artist in results:
        print(f"Score:     {score:.0%}")
        print(f"Title:     {title}")
        print(f"Artist:    {artist}")
        print(f"MBID:      {recording_id}")
        print("---")

identify_song("unknown_track.mp3")

Typical output:

Score:     98%
Title:     Bohemian Rhapsody
Artist:    Queen
MBID:      b7b38bff-b9c7-3c7e-b82d-1f49a1f0e4f2
---
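
A lookup can return several candidates, some without a title or artist. A small helper for keeping only the most plausible one might look like the sketch below. It works on plain (score, recording_id, title, artist) tuples, the shape iterated over in identify_song; the helper name and the 0.5 default threshold are assumptions to adjust for your own collection.

```python
def best_match(results, min_score=0.5):
    """Pick the highest-scoring result that has a title and meets min_score.

    `results` is any iterable of (score, recording_id, title, artist) tuples.
    Returns None when nothing qualifies.
    """
    candidates = [r for r in results if r[2] is not None and r[0] >= min_score]
    return max(candidates, key=lambda r: r[0], default=None)
```

Usage would be along the lines of best_match(acoustid.match(API_KEY, "unknown_track.mp3")), followed by a check for None before printing.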

Fetching Extended Metadata

import acoustid
import requests

def fetch_full_metadata(audio_path, api_key):
    duration, fingerprint = acoustid.fingerprint_file(audio_path)

    url = "https://api.acoustid.org/v2/lookup"
    params = {
        'client':      api_key,
        'meta':        'recordings releases releasegroups tracks',
        'duration':    int(duration),
        'fingerprint': fingerprint,
    }
    # POST keeps long fingerprints out of the URL; the timeout avoids hanging forever
    resp = requests.post(url, data=params, timeout=10)
    data = resp.json()

    if data['status'] != 'ok' or not data.get('results'):
        print("Not found")
        return None

    best  = data['results'][0]
    score = best.get('score', 0)

    if not best.get('recordings'):
        print(f"Found (score {score:.0%}) but no metadata")
        return None

    rec  = best['recordings'][0]
    info = {
        'title':   rec.get('title', 'Unknown'),
        'artists': [a['name'] for a in rec.get('artists', [])],
        'duration': rec.get('duration', 0),
        'mbid':    rec.get('id', ''),
        'score':   score,
    }

    if rec.get('releases'):
        release = rec['releases'][0]
        info['album'] = release.get('title', '')
        info['year']  = release.get('date', {}).get('year', '')

    return info

meta = fetch_full_metadata("song.flac", API_KEY)
if meta:
    print(f"Title:   {meta['title']}")
    print(f"Artist:  {', '.join(meta['artists'])}")
    print(f"Album:   {meta.get('album', 'N/A')}")
    print(f"Year:    {meta.get('year', 'N/A')}")
    print(f"Score:   {meta['score']:.0%}")

Detecting Duplicates in a Music Collection

import acoustid
import chromaprint
from pathlib import Path

def extract_raw_fingerprint(audio_path):
    """Decode fingerprint to list of int32 for direct comparison."""
    _, fp_b64 = acoustid.fingerprint_file(audio_path)
    return chromaprint.decode_fingerprint(fp_b64)[0]

def fingerprint_similarity(fp1, fp2):
    """Compare two fingerprints using Hamming distance on bit level."""
    if not fp1 or not fp2:
        return 0.0
    length = min(len(fp1), len(fp2))
    # Mask the XOR to 32 bits so signed values cannot skew the bit count
    equal_bits = sum(
        32 - bin((a ^ b) & 0xFFFFFFFF).count('1')
        for a, b in zip(fp1[:length], fp2[:length])
    )
    return equal_bits / (length * 32)

def detect_duplicates(folder, threshold=0.80, extensions=('mp3','flac','ogg','wav','m4a')):
    folder = Path(folder)
    files  = [f for ext in extensions for f in folder.rglob(f'*.{ext}')]

    print(f"Analyzing {len(files)} audio files...")
    fingerprints = {}
    for f in files:
        try:
            fingerprints[f] = extract_raw_fingerprint(str(f))
            print(f"  OK  {f.name}")
        except Exception as e:
            print(f"  ERR {f.name}: {e}")

    files_list = list(fingerprints.keys())
    duplicates  = []
    for i in range(len(files_list)):
        for j in range(i + 1, len(files_list)):
            a, b = files_list[i], files_list[j]
            sim  = fingerprint_similarity(fingerprints[a], fingerprints[b])
            if sim >= threshold:
                duplicates.append((sim, a, b))

    if duplicates:
        print(f"\nDuplicates found (threshold {threshold:.0%}):")
        for sim, a, b in sorted(duplicates, reverse=True):
            print(f"  [{sim:.1%}] {a.name}  ↔  {b.name}")
    else:
        print("\nNo duplicates found")

    return duplicates

detect_duplicates("/Music/collection", threshold=0.85)
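
fingerprint_similarity compares both fingerprints starting at index 0, so two copies of the same track can score poorly if one has extra silence or an intro at the start. A possible refinement, sketched here as an assumption rather than as part of pyacoustid or Chromaprint, is to try a range of small shifts and keep the best score. The function names, the offset range, and the minimum overlap are all illustrative.

```python
def popcount32(value: int) -> int:
    """Count set bits in the low 32 bits of value."""
    return bin(value & 0xFFFFFFFF).count("1")


def similarity_at_offset(fp1, fp2, offset):
    """Bit-level similarity with fp1[offset + i] compared against fp2[i].

    A negative offset skips items at the start of fp2 instead.
    """
    if offset >= 0:
        a, b = fp1[offset:], fp2
    else:
        a, b = fp1, fp2[-offset:]
    length = min(len(a), len(b))
    if length == 0:
        return 0.0
    equal_bits = sum(32 - popcount32(x ^ y) for x, y in zip(a, b))
    return equal_bits / (length * 32)


def best_alignment(fp1, fp2, max_offset=20, min_overlap=4):
    """Return (similarity, offset) for the best alignment within +/- max_offset."""
    best = (0.0, 0)
    for offset in range(-max_offset, max_offset + 1):
        # Skip shifts that leave too few items to compare meaningfully
        overlap = min(len(fp1) - max(offset, 0), len(fp2) - max(-offset, 0))
        if overlap < min_overlap:
            continue
        sim = similarity_at_offset(fp1, fp2, offset)
        if sim > best[0]:
            best = (sim, offset)
    return best
```

The returned offset is also a rough measure of how far the two recordings are shifted against each other, which is the basic idea behind offset detection for tasks like subtitle syncing. The cost grows with max_offset, so keep it small when scanning a large collection.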

Auto-Tagging MP3 Files

import acoustid
from mutagen.id3 import ID3, TIT2, TPE1, TALB, TDRC, ID3NoHeaderError
import time

# fetch_full_metadata() is the function defined under "Fetching Extended Metadata"

def auto_tag(mp3_path, api_key):
    print(f"Processing: {mp3_path}")

    meta = fetch_full_metadata(mp3_path, api_key)
    if not meta or meta['score'] < 0.85:
        print("  → Low confidence, skipping")
        return False

    try:
        tags = ID3(mp3_path)
    except ID3NoHeaderError:
        tags = ID3()

    # tags.add() stores each frame under its own frame ID, replacing any existing one
    tags.add(TIT2(encoding=3, text=meta['title']))
    if meta['artists']:
        tags.add(TPE1(encoding=3, text=', '.join(meta['artists'])))
    if meta.get('album'):
        tags.add(TALB(encoding=3, text=meta['album']))
    if meta.get('year'):
        tags.add(TDRC(encoding=3, text=str(meta['year'])))

    tags.save(mp3_path)
    print(f"  → Tagged: {meta['title']} — {', '.join(meta['artists'])}")
    return True

def tag_folder(folder, api_key, pause=1.0):
    from pathlib import Path
    files   = list(Path(folder).rglob('*.mp3'))
    success = 0
    for i, f in enumerate(files, 1):
        print(f"[{i}/{len(files)}]", end=" ")
        if auto_tag(str(f), api_key):
            success += 1
        time.sleep(pause)  # respect API rate limit

    print(f"\nResult: {success}/{len(files)} files tagged")

tag_folder("/Music/untagged", API_KEY)

Real-World Use Cases

Use Case	Tool	Description
Identify unknown song	acoustid.match()	Queries MusicBrainz/AcoustID
Clean duplicate library	fingerprint_similarity()	Name-independent comparison
Auto-tag MP3 collection	mutagen + acoustid	Fills ID3 tags automatically
Detect copyright in video	chromaprint + own DB	Compare against reference catalog
Sync subtitles	fpcalc + correlation	Detect timing offset automatically

Additional Resource

To convert between audio formats (MP3, FLAC, OGG, WAV, AAC, M4A) without any coding, use KaijuConverter — free and no registration needed.
