DICOM to JPEG/PNG: Privacy-First Medical Image Conversion
Converting DICOM files to web-friendly formats while stripping PHI, applying windowing correctly, and staying HIPAA-aware.
A DICOM file is not really an image. It is a medical record with pixel data attached. The .dcm container holds somewhere between 80 and 400 tagged fields covering everything from the patient's name and MRN to the X-ray tube voltage and the radiopharmaceutical half-life. The image is one field among many.
When a clinician or researcher asks for a "JPEG of that scan," two different things have to happen. Someone has to convert 12- or 16-bit grayscale pixel data into 8-bit grayscale or RGB that a normal viewer can display. And someone has to make sure none of those other fields leak into the output.
Both matter. Getting only the first right is how you end up with a publication-ready image that exposes a patient's name in the EXIF data.
The PHI fields that show up in DICOM
The DICOM standard (PS3.15 Annex E, the attribute confidentiality profiles, 2011, revised through 2024) lists the tags that can contain protected health information (PHI). A short, non-exhaustive list:
| Tag | Name | Contains |
|---|---|---|
| (0010,0010) | PatientName | Full legal name |
| (0010,0020) | PatientID | MRN |
| (0010,0030) | PatientBirthDate | DOB, YYYYMMDD |
| (0010,0040) | PatientSex | M/F/O |
| (0008,0080) | InstitutionName | Hospital |
| (0008,0090) | ReferringPhysicianName | Physician name |
| (0008,1010) | StationName | Scanner workstation |
| (0008,0020) | StudyDate | Exam date |
| (0008,0030) | StudyTime | Exam time |
| (0020,000D) | StudyInstanceUID | Unique study ID, links to RIS |
A de-identification pass needs to handle all of these plus the private tags that manufacturers add (GE, Siemens, and Philips all embed their own proprietary fields, some of which contain PHI). DICOM Supplement 142, now part of PS3.15, defines the Basic Application Level Confidentiality Profile, which assigns each tag an action code: remove (X), replace with a zero-length value (Z), replace with a dummy value (D), keep (K), clean (C), or replace with a new UID (U).
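The action codes can be sketched as a lookup over a parsed header. This is a minimal illustration, not the PS3.15 table: the header is modeled as a plain dict keyed by (group, element), the `ACTIONS` entries and the `deidentify` name are assumptions made for the example, and unlisted tags are removed (an allowlist policy chosen for this sketch). The one rule taken from the standard is that private tags live in odd-numbered groups.

```python
from typing import Dict, Tuple

Tag = Tuple[int, int]  # (group, element)

# Illustrative actions only. The authoritative per-tag actions are in the
# PS3.15 confidentiality profile tables, not in this dict.
ACTIONS: Dict[Tag, str] = {
    (0x0010, 0x0010): "D",  # PatientName: replace with dummy
    (0x0010, 0x0020): "D",  # PatientID: replace with dummy
    (0x0010, 0x0030): "X",  # PatientBirthDate: remove
    (0x0008, 0x0080): "X",  # InstitutionName: remove
    (0x0008, 0x0090): "X",  # ReferringPhysicianName: remove
    (0x0008, 0x1010): "X",  # StationName: remove
    (0x0028, 0x1050): "K",  # WindowCenter: keep
    (0x0028, 0x1051): "K",  # WindowWidth: keep
}

DUMMY_VALUE = "ANONYMIZED"  # hypothetical placeholder for action D


def deidentify(header: Dict[Tag, str]) -> Dict[Tag, str]:
    """Return a new header with the sketch's actions applied."""
    cleaned: Dict[Tag, str] = {}
    for tag, value in header.items():
        group, _element = tag
        if group % 2 == 1:
            continue  # private (odd-group) tags are dropped wholesale
        action = ACTIONS.get(tag, "X")  # unlisted tags are removed
        if action == "K":
            cleaned[tag] = value
        elif action == "D":
            cleaned[tag] = DUMMY_VALUE
        # action "X": tag is simply not copied
    return cleaned
```

Dropping every private tag is the blunt option. It loses vendor fields that are harmless, but it avoids having to know what each manufacturer stores where.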
Just deleting the obvious fields is not enough. Burned-in annotations in the pixel data itself are a major leak source. Ultrasound images often have the patient's name rendered directly into the top-left corner of the frame. No amount of tag cleaning touches those: the text has to be found (typically with OCR) and the region masked or cropped.
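The masking half of that step is simple once the region is known. The sketch below assumes the bounding box has already come from somewhere else (an OCR pass or a known vendor layout) and that the frame is a list of pixel rows; `mask_region` is a name invented for this example.

```python
from typing import List


def mask_region(
    pixels: List[List[int]],
    top: int,
    left: int,
    height: int,
    width: int,
    fill: int = 0,
) -> List[List[int]]:
    """Return a copy of the frame with one rectangle overwritten by `fill`."""
    masked = [row[:] for row in pixels]  # copy so the input frame is untouched
    row_end = min(top + height, len(masked))
    for r in range(max(top, 0), row_end):
        row = masked[r]
        col_end = min(left + width, len(row))
        for c in range(max(left, 0), col_end):
            row[c] = fill
    return masked
```

The rectangle is clipped to the frame, so a box that runs past the edge masks what it can instead of raising.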
Windowing: converting pixel values to what humans see
CT and MRI data is 12 to 16 bits per pixel. A chest CT covers roughly -1000 (air) to +3000 (dense bone) on the Hounsfield scale. A baseline JPEG stores 8 bits per channel, so 256 gray levels. You have to pick a window.
The DICOM header usually specifies a default with tags (0028,1050) WindowCenter and (0028,1051) WindowWidth. For a chest CT, a "lung window" is typically center -600, width 1500. A "mediastinum window" is center 40, width 400. A "bone window" is center 400, width 1800. Same data, three different outputs, each for a different clinical question.
The pseudocode is straightforward:
    low  = center - width / 2
    high = center + width / 2
    for each pixel p:
        if p <= low:    out = 0
        elif p >= high: out = 255
        else:           out = (p - low) / (high - low) * 255
Applying the modality LUT first is also required for CT. The rescale intercept (0028,1052) and rescale slope (0028,1053) map stored values to Hounsfield units: HU = stored value * slope + intercept. Skip it and the pixel values you are windowing are raw stored values, not Hounsfield units, and the window centers above will produce garbage.
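Put together, rescale-then-window looks roughly like the sketch below. It follows the simplified linear formula from the pseudocode, not the exact VOI LUT function in PS3.3 (which offsets the center by 0.5 and the width by 1), and the function and preset names are made up for the example.

```python
from typing import Dict, Iterable, List, Tuple

# (center, width) presets quoted in the text, in Hounsfield units.
CT_WINDOWS: Dict[str, Tuple[float, float]] = {
    "lung": (-600, 1500),
    "mediastinum": (40, 400),
    "bone": (400, 1800),
}


def window_to_8bit(
    stored: Iterable[int],
    slope: float,
    intercept: float,
    center: float,
    width: float,
) -> List[int]:
    """Apply the modality rescale, then a linear window, to get 0..255."""
    if width <= 0:
        raise ValueError("window width must be positive")
    low = center - width / 2
    high = center + width / 2
    out: List[int] = []
    for value in stored:
        hu = value * slope + intercept  # modality LUT: stored value -> HU
        if hu <= low:
            out.append(0)
        elif hu >= high:
            out.append(255)
        else:
            out.append(round((hu - low) / (high - low) * 255))
    return out


# Usage: stored values 24, 1064, 4024 with slope 1 and intercept -1024
# correspond to -1000 HU (air), 40 HU (soft tissue), and 3000 HU (bone).
center, width = CT_WINDOWS["mediastinum"]
mediastinum = window_to_8bit([24, 1064, 4024], 1, -1024, center, width)
```

With the mediastinum preset, air clamps to black and dense bone to white, and only the soft-tissue range gets the 256 levels. Swapping in the lung preset spreads those levels over a much wider range.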
HIPAA-aware local conversion
Under 45 CFR 164.514(b), de-identified data is not PHI and is not subject to the Privacy Rule. The Safe Harbor method requires removal of 18 specific identifier categories. DICOM de-identification profiles in Supplement 142 map closely to Safe Harbor: names, geographic subdivisions smaller than a state, all elements of dates except the year (with ages over 89 aggregated into a single 90-or-older category), phone, fax, email, SSN, MRN, account numbers, biometric identifiers, full-face photos, and the rest.
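The date and age rule is the one most often implemented wrong, so here is a small sketch of it. It assumes DICOM DA strings (YYYYMMDD) for the birth and study dates; the function names and the "90+" label are choices made for this example, not values the regulation or the standard prescribes.

```python
from datetime import date
from typing import Optional, Tuple


def parse_da(da: str) -> date:
    """Parse a DICOM DA value (YYYYMMDD)."""
    if len(da) != 8 or not da.isdigit():
        raise ValueError(f"not a YYYYMMDD date: {da!r}")
    return date(int(da[:4]), int(da[4:6]), int(da[6:8]))


def safe_harbor_birth_and_age(
    birth_da: str, study_da: str
) -> Tuple[Optional[str], str]:
    """Return (birth year or None, age label) in the Safe Harbor spirit."""
    birth = parse_da(birth_da)
    study = parse_da(study_da)
    had_birthday = (study.month, study.day) >= (birth.month, birth.day)
    age = study.year - birth.year - (0 if had_birthday else 1)
    if age > 89:
        # Over 89: drop the birth year too, since it would reveal the age.
        return None, "90+"
    return birth_da[:4], str(age)
```

The study date needs the same treatment: only its year can stay under Safe Harbor.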
The Expert Determination method is the other route, usually used when dates and device identifiers that Safe Harbor would strip have research value and need to be retained.
Whichever route, the conversion environment matters. Uploading a DICOM to a third-party web service to extract a JPEG puts that vendor in a Business Associate role, and if the vendor has not signed a BAA, you have just made an impermissible disclosure. With a browser-based converter that runs everything locally via WebAssembly, the vendor's servers never see the file. Konvrt's DICOM handling at /convert does the header parse, de-identification, windowing, and PNG/JPEG export in the browser. The image bytes do not cross the network. For clinical researchers processing a 2,000-series study, /batch handles the bulk pass.
This also matters for research data under GDPR Article 9 and similar frameworks outside the US, where health data is a special category with a higher protection bar, and where removing direct identifiers does not by itself take the data out of scope.
Practical output notes
PNG is the right default for single images. Lossless, supports 16-bit grayscale if you need diagnostic quality, and every tool on earth reads it. JPEG at quality 95 is fine for presentation slides and papers, but do not archive to JPEG.
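To show that 16-bit grayscale PNG really is a plain, documented format, here is a minimal encoder sketch using only the Python standard library. It writes one unfiltered IDAT chunk and no metadata chunks, which conveniently means there is nowhere for header fields to leak into. `encode_gray16_png` is a name invented for this example, and a real pipeline would more likely use an imaging library.

```python
import struct
import zlib
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray16_png(rows: List[List[int]]) -> bytes:
    """Encode rows of 0..65535 samples as a 16-bit grayscale PNG."""
    if not rows or not rows[0]:
        raise ValueError("image must have at least one pixel")
    height, width = len(rows), len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same length")
    # IHDR: width, height, bit depth 16, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 0, 0, 0, 0)
    raw = bytearray()
    for row in rows:
        raw.append(0)  # per-scanline filter type 0 (None)
        raw += struct.pack(f">{width}H", *row)  # big-endian 16-bit samples
    idat = zlib.compress(bytes(raw))
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )
```

Writing the returned bytes to a `.png` file gives an image that keeps the full 16-bit sample range, so windowing can still be redone downstream.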
For a CT series, exporting as a grid of PNG thumbnails plus a single multi-frame TIFF gives you a browseable overview and a research-usable stack. 8-bit TIFF for presentation, 16-bit for any downstream quantitative work.
A de-identified DICOM with a de-identification profile record (tag (0012,0062) PatientIdentityRemoved set to YES, tag (0012,0063) DeidentificationMethod listing the profile used) is better than a PNG for research sharing, because the recipient can redo the windowing for their clinical question. Convert to PNG only when the destination actually needs PNG.
One final check before any DICOM-derived image leaves your institution: scan the pixel data for burned-in text. A single unmasked identifier in the corner of an ultrasound frame undoes every tag you cleaned.