Alright, so something went wrong in my first run, judging by the results (there are still a bunch of patches outside P3). I'll share everything I've done so far. Graeme Gill suggested I use his icclu tool to convert from P3 RGB to L*a*b* and then to Rec. 2020 RGB, which makes sense.
So I started with the 1005-patch "XXXL verification testchart (video)", which is verify_video_xxxl.ti1.
Opened that up, pulled out the RGB values, and put them in a separate text file called rgbinput.txt.
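For anyone who would rather script the extraction than copy and paste, here is a minimal Python sketch. It assumes the .ti1 is a CGATS-style text file with BEGIN_DATA_FORMAT / BEGIN_DATA sections and RGB_R, RGB_G, RGB_B columns, and it only reads the first table. The function names and the sample values are made up for illustration.

```python
# Hypothetical sample; the numbers are placeholders, not real chart data.
SAMPLE_TI1 = """CTI1
BEGIN_DATA_FORMAT
SAMPLE_ID RGB_R RGB_G RGB_B XYZ_X XYZ_Y XYZ_Z
END_DATA_FORMAT
NUMBER_OF_SETS 2
BEGIN_DATA
1 100.0000 100.0000 100.0000 95.0000 100.0000 108.0000
2 0.0000 50.0000 25.0000 10.0000 20.0000 5.0000
END_DATA
"""


def extract_rgb(ti1_text):
    """Return (R, G, B) float tuples from the first table of .ti1 text."""
    fields = []      # column names listed in BEGIN_DATA_FORMAT
    rows = []
    section = None   # None, "format", or "data"
    for raw in ti1_text.splitlines():
        line = raw.strip()
        if line == "BEGIN_DATA_FORMAT":
            section, fields = "format", []
        elif line == "END_DATA_FORMAT":
            section = None
        elif line == "BEGIN_DATA":
            section = "data"
        elif line == "END_DATA":
            break    # assumption: only the first table holds the patches
        elif section == "format":
            fields.extend(line.split())
        elif section == "data" and line:
            values = dict(zip(fields, line.split()))
            rows.append(tuple(float(values[k])
                              for k in ("RGB_R", "RGB_G", "RGB_B")))
    return rows


def rgb_lines(rows):
    """Format rows as space-delimited lines, one patch per line."""
    return ["%.4f %.4f %.4f" % rgb for rgb in rows]


rows = extract_rgb(SAMPLE_TI1)
print("\n".join(rgb_lines(rows)))
```

Writing "\n".join(rgb_lines(rows)) to rgbinput.txt would give a file in the same shape as the one fed to icclu.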
icclu -pl -ff -s 100 -ir ACES_P3.icm < rgbinput.txt > laboutput.txt
Then opened that up, pulled out the L*a*b* values, and put them in a separate text file called labinput.txt.
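Pulling the converted values out of the icclu output can also be scripted. The sketch below assumes each result line looks like "input values [space] -> method -> output values [space]", which is my reading of the icclu output and should be checked against your own laboutput.txt. The function name and the sample line are hypothetical.

```python
def parse_icclu_line(line):
    """Return the output-side numbers of one result line, or None.

    Assumes the converted values follow the last "->" on the line and are
    terminated by a non-numeric token such as "[Lab]".
    """
    if "->" not in line:
        return None
    tail = line.rsplit("->", 1)[1]
    values = []
    for token in tail.split():
        try:
            values.append(float(token))
        except ValueError:
            break    # stop at the first non-numeric token
    return tuple(values) if values else None


# Made-up example line in the assumed format, not real icclu output.
example = ("50.000000 50.000000 50.000000 [RGB] -> MatrixFwd -> "
           "53.000000 0.100000 -0.200000 [Lab]")
print(parse_icclu_line(example))
```

Lines that do not contain "->" (blank lines or any header text) return None and can simply be skipped.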
icclu -pl -fb -s 100 -ir Rec2020.icm < labinput.txt > rgboutput.txt
Then we also need the XYZ values to populate the testchart file, so I pulled the resulting RGB values out and put them in a separate text file called rgbinput2.txt.
icclu -px -ff -s 100 -or Rec2020.icm < rgbinput2.txt > xyzoutput.txt
Optional: Combined the RGB and XYZ values into a CSV and removed anything with an R, G, or B value above 60 (they are in %) to keep everything at roughly 240 nits or below for now.
Converted the RGB and XYZ values into the space-delimited format used by .ti1 testchart files and put them into the .ti1 file in the appropriate place.
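A rough Python sketch of the formatting step follows. It assumes the data rows use the column order SAMPLE_ID, RGB_R, RGB_G, RGB_B, XYZ_X, XYZ_Y, XYZ_Z, so check it against the BEGIN_DATA_FORMAT section of your own file. The function name and sample row are hypothetical.

```python
def ti1_data_lines(patches):
    """Format (R, G, B, X, Y, Z) rows as space-delimited .ti1 data rows.

    A running sample ID starting at 1 is prepended to each row.
    """
    lines = []
    for sample_id, patch in enumerate(patches, start=1):
        cells = [str(sample_id)] + ["%.4f" % value for value in patch]
        lines.append(" ".join(cells))
    return lines


# Placeholder row, not real chart data.
for line in ti1_data_lines([(60.0, 0.0, 0.0, 1.5, 0.75, 0.0)]):
    print(line)
```

The rows go between BEGIN_DATA and END_DATA, and the NUMBER_OF_SETS value in the file has to match the number of rows.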
Loaded the testchart into DisplayCAL to make sure everything looked normal (it did), then sorted by "Maximize lightness difference" and saved. This is also optional, but I figured it was a good step to make sure DisplayCAL was happy with the result anyway.
Known potential issues:
1. The XXXL video testchart was probably designed with SDR Rec. 709 in mind, which isn't ideal for our HDR purposes. As a proof of concept, though, I wasn't really concerned about whether the patch spacing was optimal (for example by using a P3 preconditioning profile or anything like that). I don't think this will cause any major issues.
2. When viewing the profiles in DisplayCAL, ACES_P3.icm uses a gamma 2.6 tone response curve, DisplayP3.icm uses the sRGB curve, and Rec2020.icm uses the Rec. 709 curve. This seemed like it could cause some big problems, so it's what I'm currently leaning toward as the issue. I had thought about creating synthetic P3 and Rec. 2020 profiles with the SMPTE 2084 (PQ) tone curve before I went through this process. I'll try that now. Hopefully that straightens things out.
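To illustrate why the tone curve mismatch matters, here is a small Python sketch comparing a pure gamma 2.6 curve with the SMPTE ST 2084 (PQ) EOTF for the same code value. The PQ constants are the ones from the standard; the function names are mine, and the 10000 nit scale is the nominal PQ peak.

```python
# ST 2084 (PQ) constants
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(signal):
    """PQ EOTF: non-linear signal in 0..1 -> luminance in nits (cd/m^2)."""
    e = signal ** (1 / M2)
    return 10000.0 * (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1 / M1)


def gamma_eotf(signal, gamma=2.6):
    """Pure power-law curve: signal in 0..1 -> fraction of peak luminance."""
    return signal ** gamma


for percent in (50, 60, 70):
    s = percent / 100
    print("%d%%: gamma 2.6 -> %.4f of peak, PQ -> %.1f nits (%.4f of 10000)"
          % (percent, gamma_eotf(s), pq_eotf(s), pq_eotf(s) / 10000.0))
```

The same RGB percentage decodes to very different linear light under the two curves, so a conversion between profiles with non-PQ curves will not land where a PQ display expects it. Under PQ, a 60% signal comes out at roughly 244 nits, which is where the approximate 240 nit limit for the filtered chart comes from.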
Edit: Just did some quick manual testing with P3 primaries and the synthetic profiles. I think the conversion is working correctly now! I'll post the testchart as soon as I'm done.
Here are all 1005 original patches, transformed to P3 within Rec. 2020, in .csv format. From here they can be filtered down (which is what I'm about to do) and then placed into a testchart file.
https://mega.nz/file/hEkDzAYK#F2Cl1C...C1nyXaA8TfSHNQ
And here's the resulting testchart limited to RGB values of 60% max (369 patches):
https://mega.nz/file/RMcTlDaL#3F4TbH...61NV02dJIvgPxA
And here's the resulting testchart limited to RGB values of 70% max (482 patches):
https://mega.nz/file/xE8GUQRI#hDcYG7...DjmnhwcK7xunss
So if this works, we should be able to generate a testchart preconditioned to the synthetic P3 2084 profile. That should give better patch spacing and also mean we don't have to discard a bunch of generated patches like I did. I'll test it shortly...
Update: the 369-patch filtered chart I shared works really well! Nothing seems to be out of gamut, so the average dE is much better. However, my first attempts to generate a new full testchart using my synthetic P3 profile for preconditioning have resulted in targen hanging.
Update2: We can use the P3 D65 2084 10000 nit profile for preconditioning when we generate testcharts.