Old 30th December 2013, 22:09   #1  |  Link
xinyingho
Any attempt to dump a disc's real raw binary data?

Hi there,

I've recently been studying the ins and outs of how data are laid out on a CD-ROM surface by reading the ECMA standard.
I realized that the binary data you get from a CD drive are not actually the raw binary data burnt onto the physical medium:
sector data first get scrambled, then transformed into F1-Frames, then F2-Frames, then F3-Frames. Finally, the F3-Frames are encoded through Eight-to-Fourteen Modulation (EFM) before being burnt.
So when a CD drive is reading raw data, the firmware runs this whole process in reverse to send unscrambled sector data to the computer.

Has anybody attempted to modify the firmware of a disc drive to get the raw binary data instead of the sector data?
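
Just to make the scrambling step concrete, here is a minimal sketch of how I understand the ECMA-130 Annex B scrambler (a 15-bit LFSR with polynomial x^15 + x + 1, seeded with 1). The exact byte range and bit ordering are my assumptions from reading the standard, and since it's a plain XOR keystream the same routine would also descramble:

Code:
# Minimal sketch of the ECMA-130 Annex B sector scrambler (details assumed, see above).
def ecma130_scramble(sector: bytes) -> bytes:
    assert len(sector) == 2352                    # one raw CD-ROM sector
    lfsr = 0x0001                                 # initial register value
    out = bytearray(sector)
    for i in range(12, 2352):                     # bytes 0..11 (sync pattern) stay untouched
        key = 0
        for bit in range(8):
            key |= (lfsr & 1) << bit              # keystream comes out LSB first
            feedback = (lfsr ^ (lfsr >> 1)) & 1   # taps for x^15 + x + 1
            lfsr = (lfsr >> 1) | (feedback << 14)
        out[i] ^= key                             # plain XOR: applying it twice restores the data
    return bytes(out)

sector = bytes(2352)                              # dummy all-zero sector as a self-test
assert ecma130_scramble(ecma130_scramble(sector)) == sector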
Old 31st December 2013, 02:44   #2  |  Link
nhakobian
While this might be interesting from an educational standpoint, to understand the standards, I really don't see the practicality of it. There is a ton of coding for error resilience, for the ability to seek, and for the prevention of certain bit patterns that may be difficult to read (that's what EFM is for). I did some reading on these techniques a few months back, but didn't run into anything that returns the raw 1's and 0's that are encoded on a disc.

My guess is that the low-level encoding is probably implemented in a hardware chip, not at the firmware level. But you never know, it might be possible.
Old 31st December 2013, 08:47   #3  |  Link
xinyingho
It could be useful to preserve media in their real original form, for educational purposes but also for emulation, or simply for archiving.
There are indeed a lot of encodings for error resilience, seeking and bit-pattern prevention, but I don't see why they would prevent reading the actual raw data. These successive encodings transform the initial sector data into the final form for burning, and what you get at the end is still binary data, in the form of pits and lands.

You're probably right about the encoding/decoding being done in a hardware chip instead of at the firmware level. So after understanding how the data are encoded, the next step is to find out what actually encodes them in a CD drive. Do you know if there is any documentation about the precise schematic layout of any CD drive?
Old 31st December 2013, 13:42   #4  |  Link
LoRd_MuldeR
You also need to be aware that when reading the "raw" bits from the disc, you don't get a sequence of discrete '0' and '1' bits from the sensor, but rather a continuous (analog) signal! That signal remains around the "high" level for a certain amount of time, then remains around the "low" level for a certain amount of time, then again around the "high" level, and so on. Reconstructing the most likely "data" (payload) bits from the input signal has to take both the signal level and the signal timing into account. This process is probably implemented in special DSP chips, with no way to get at the original (analog) input signal via software.

One reason, or maybe the reason, why EFM is used on CD-ROMs and DVD-ROMs is to avoid long sequences of '0' bits in the "raw" signal. And that is required because, on the "physical" layer, '0' bits are encoded by keeping the signal level constant, while '1' bits are encoded by toggling the signal level between high and low (NRZ-M coding). Consequently, if we had very long sequences of '0' bits in the "raw" signal, there would be very long intervals with no signal change at all - which would make reconstructing the signal timing impossible (with the required precision).
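
To make the timing argument concrete, here is a toy sketch (illustrative only, not how any real drive implements it): in NRZ-M terms a '1' channel bit toggles the pit/land level and a '0' keeps it, so the reader's only clock reference is the spacing between transitions. EFM keeps runs of zeros between ones short (between 2 and 10) precisely so that spacing stays measurable:

Code:
# Toy NRZ-M mapping of channel bits to a pit/land waveform (illustrative only).
def nrzm_waveform(channel_bits):
    level = 0
    waveform = []
    for b in channel_bits:
        if b == 1:
            level ^= 1          # a '1' is a transition between pit and land
        waveform.append(level)  # a '0' keeps the level: only its duration carries it
    return waveform

# With EFM-like spacing the transitions stay frequent enough to recover the clock:
print(nrzm_waveform([1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1]))
# A very long run of zeros would give one flat stretch whose exact length is hard to time.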
Old 31st December 2013, 14:56   #5  |  Link
xinyingho
I understand what you mean. I had indeed read that the final EFM encoding is used to avoid having too many successive '0' bits. The reason you give is quite reasonable and must be right.

But anyway, I want to get at a raw binary format, so if I can't get access to the analog signal, it's not that much of an issue. If I could get the signal just after its analog-to-digital conversion, that would actually be perfect.

The thing is, I want to see what can be done with commercially available disc drives connected to a PC. Can we change the firmware, or any other internal program, to manipulate the original signal? Or maybe get access to some buffer containing the original signal before or after its digital conversion?
Old 31st December 2013, 15:10   #6  |  Link
LoRd_MuldeR
Well, who says that the "raw" (continuous) signal that comes from the sensor will be "boiled down" to a sequence of discrete '0' and '1' bits before it goes through the EFM decoder? Quite often, FEC (forward error correction) decoders operate on signal probabilities, i.e. the input is not a sequence of discrete '0' and '1' bits, but a sequence of continuous [-1,+1] values - where -1 means 100% probability that the "raw" bit is a '0' bit, +1 means 100% probability that the "raw" bit is a '1' bit, and ±0 means equal probability that the "raw" bit is a '0' or a '1' bit. The decoder then looks at these signal probabilities and decides which codeword is the most likely one to have been "sent" originally. Finally, it outputs the corresponding "data" bit(s).
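
As a toy sketch of that idea (the codebook below is made up and has nothing to do with the real EFM table): the decoder never makes a hard per-bit decision, it simply picks the codeword whose ideal levels are closest to the soft values it received:

Code:
# Toy soft-decision decoder: pick the codeword closest to the received soft values.
# Values near -1.0 suggest a '0' channel bit, values near +1.0 suggest a '1'.
CODEBOOK = {                       # hypothetical 5-bit codewords for 2 data bits
    (0, 0): [0, 0, 0, 1, 1],
    (0, 1): [0, 1, 1, 0, 1],
    (1, 0): [1, 0, 1, 1, 0],
    (1, 1): [1, 1, 0, 0, 0],
}

def soft_decode(received):
    best, best_dist = None, float("inf")
    for data, code in CODEBOOK.items():
        ideal = [1.0 if b else -1.0 for b in code]
        dist = sum((r - i) ** 2 for r, i in zip(received, ideal))  # squared distance
        if dist < best_dist:
            best, best_dist = data, dist
    return best

# No individual bit is "decided" first; the most likely codeword is chosen as a whole.
print(soft_decode([0.9, -0.2, 0.7, 0.8, -0.6]))   # -> (1, 0)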
Old 31st December 2013, 15:38   #7  |  Link
xinyingho
I see. You're quite knowledgeable in this domain.
So the EFM decoder could also act as an FEC decoder, working directly on the analog signal and then outputting digital F3-Frames.

Do you know whether it is a single DSP chip that does the overall decoding, all the way down to unscrambled sector data? I've heard that it's at least possible to get the still-scrambled sector data.
Old 31st December 2013, 16:55   #8  |  Link
LoRd_MuldeR
Quote:
Originally Posted by xinyingho View Post
So the EFM decoder could also act as an FEC decoder, working directly on the analog signal and then outputting digital F3-Frames.
Well, I don't know for sure, but that's how such things are done in practice, in my experience. It might even be implementation-specific.

Quote:
Originally Posted by xinyingho View Post
Do you know whether it is a single DSP chip that does the overall decoding, all the way down to unscrambled sector data? I've heard that it's at least possible to get the still-scrambled sector data.
Sorry, no idea. And again, this might differ from one device to another...
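
What is standardised, at least, is asking the drive for the complete 2352-byte sector (sync + header + user data + EDC/ECC) via the MMC READ CD command (opcode 0xBE). That data has already been EFM-decoded, error-corrected and descrambled by the drive, so anything closer to the channel bits would need vendor- or firmware-specific tricks. A sketch of the 12-byte CDB as I understand the MMC layout (how you send it is up to whatever SCSI pass-through you use, e.g. SG_IO on Linux):

Code:
# Sketch of an MMC "READ CD" (0xBE) CDB requesting full 2352-byte raw sectors.
# Field layout per my reading of MMC; pass it to a SCSI pass-through interface.
def build_read_cd_cdb(lba: int, num_sectors: int) -> bytes:
    return bytes([
        0xBE,                                    # operation code: READ CD
        0x00,                                    # expected sector type: any
        (lba >> 24) & 0xFF, (lba >> 16) & 0xFF,  # starting LBA, big-endian
        (lba >> 8) & 0xFF, lba & 0xFF,
        (num_sectors >> 16) & 0xFF,              # transfer length in sectors
        (num_sectors >> 8) & 0xFF,
        num_sectors & 0xFF,
        0xF8,                                    # sync + headers + user data + EDC/ECC
        0x00,                                    # no sub-channel data
        0x00,                                    # control byte
    ])

print(build_read_cd_cdb(lba=16, num_sectors=1).hex())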
Old 31st December 2013, 19:14   #9  |  Link
xinyingho
I understand. The standard only dictates an algorithm, not the actual hardware implementation. So from one brand to another, or even from one device to another from the same company, the actual implementations can be quite different.

I'll have to see if I can find any device schematics and work from there as a reference. We always have to start somewhere.
Old 31st December 2013, 20:38   #10  |  Link
Ghitulescu
The algorithm is deterministic. So no matter what the company is, the result is the same.

I really see no point in this adventure, as these details are hidden away in DSPs anyway.
Reading the raw data is also useless from an archival point of view; it's the "cooked" data that is the interesting part.

The only incentive I see to play with raw data is copy protection.
Old 1st January 2014, 15:20   #11  |  Link
xinyingho
We're saying the same thing about the algorithm and the result, just in different ways.

Indeed, one of the reasons to play with raw data is to be able to copy the copy-protection data as well. If you want to do real archiving, it's always better to make a complete copy than a partial copy plus hacks to disable the copy-protection systems. Doing the former should be more beneficial in the long run. If you just want to instantly enjoy the "cooked" data for personal use, it's of course useless, but that's not my purpose at all.

Anyway, all the details may be hidden in DSPs, but it's never a bad thing to have a deeper understanding of the tools you're using every day.
Old 1st January 2014, 15:43   #12  |  Link
LoRd_MuldeR
Quote:
Originally Posted by Ghitulescu View Post
The algorithm is deterministic. So no matter what the company is, the result is the same.
I would disagree here. What is deterministic about EFM encoding is how the sequence of "data" (payload) bytes is converted into 14-bit codewords, resulting in the "raw" bits to be burnt onto the disc. It is also deterministic how a sequence of those "raw" bits is converted into runs of "pits" and "lands" (cf. NRZ-M coding). But when reading the disc, what you get is a signal that is continuous in time and value, not a discrete sequence of '0' and '1' bits. There could be a zillion ways to reconstruct the original data bits from there. And the result of a read operation is not deterministic, because in reality you never have a "perfect" signal, so you can only try to determine which codeword is the most likely one to have been sent. If it were deterministic, read errors would be non-existent. But we all know they exist. Also, the "raw" signal can vary each time you read the same part of the disc - otherwise re-reading (e.g. in case of a read error) would be pointless.
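
To put that distinction into a tiny sketch (the two 14-bit codewords below are made up - the real EFM table has 256 fixed entries): the write side is a pure table lookup, whereas every read of the very same pits and lands comes back slightly different:

Code:
import random

# Toy stand-in for the EFM table (hypothetical codewords, not the real ones).
TOY_EFM = {0x00: "01001000100000", 0x01: "10000100000100"}

def encode(data_byte):
    return TOY_EFM[data_byte]            # deterministic: same byte, same channel bits

def noisy_read(channel_bits, noise=0.2):
    # Not deterministic: the sensor delivers a noisy analog value per channel bit,
    # and the decoder must guess which codeword was most likely written.
    return [round((1.0 if b == "1" else -1.0) + random.gauss(0, noise), 2)
            for b in channel_bits]

bits = encode(0x01)
print(noisy_read(bits))                  # differs on every read of the same data
print(noisy_read(bits))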

Quote:
Originally Posted by xinyingho View Post
If you just want to instantly enjoy the "cooked" data for personal use, it's of course useless, but that's not my purpose at all.
So what exactly is your purpose?
Old 1st January 2014, 16:49   #13  |  Link
xinyingho
Well, I should rather say it's not my main purpose.
The thing is, I've recently been discussing with some people in the video game emulation community why there are so many different teams dedicated to the same goal: dumping old game images correctly in order to achieve 100% complete emulation (emulation being too slow to enjoy is no longer an issue with the computing power available today). It turns out they all disagree on which methods should be used to do the job. To understand why they disagree, I had to get some technical knowledge about how ROMs and discs are dumped and what kinds of images can be produced.
I realized that, for ROMs, they already get the raw binary data, but for discs they are playing around with what commercial solutions can offer, without having the real raw binary data. So those teams are disagreeing on how to get closest to the raw data. But if it's possible to get the actual raw data, their disagreements will become moot.
Old 6th January 2014, 21:58   #14  |  Link
Ghitulescu
Quote:
Originally Posted by LoRd_MuldeR View Post
I would disagree here. What is deterministic about EFM encoding is how the sequence of "data" (payload) bytes is converted into 14-bit codewords, resulting in the "raw" bits to be burnt onto the disc. It is also deterministic how a sequence of those "raw" bits is converted into runs of "pits" and "lands" (cf. NRZ-M coding). But when reading the disc, what you get is a signal that is continuous in time and value, not a discrete sequence of '0' and '1' bits. There could be a zillion ways to reconstruct the original data bits from there. And the result of a read operation is not deterministic, because in reality you never have a "perfect" signal, so you can only try to determine which codeword is the most likely one to have been sent. If it were deterministic, read errors would be non-existent. But we all know they exist. Also, the "raw" signal can vary each time you read the same part of the disc - otherwise re-reading (e.g. in case of a read error) would be pointless.
You're right about this. I was referring to the "mathematical" part, not to the ADC part. Nevertheless, once the reflected laser beam has been read out, a certain determinism is achieved, provided the errors are correctable.

Concerning the games: going beyond "cooked" is useless in my opinion, because this is the data any console needs. It's very rare that a console has full access to the raw data (like a Sinclair ZX Spectrum had with its cassette interface), and even then it's only for copy-protection purposes. Most of the raw access is hidden in the firmware/chip of the reader anyway. It would be easier to circumvent the protection routine than to copy all media "raw".
Old 7th January 2014, 00:52   #15  |  Link
xinyingho
Of course, when trying to emulate machines with analog inputs such as magnetic or optical media, it's more convenient to convert those media into cooked data. And cooked data are readily available with general-purpose products. The thing is that those old copy-protection schemes are also a part of those systems, so a complete emulator should be able to emulate the protection systems as well.

Most people would say it's not worth it. Nevertheless, there are websites describing those protection systems, so why not preserve and emulate them too? They're still part of recent computing history.

Anyway, my question isn't about whether it's worth it; it's about whether we can dump raw data with general-purpose products and some programming skills, or whether some electromechanical plumbing skills are also needed, or whether it's better to start from scratch and build an entirely dedicated device.
I know that, for floppy discs, it's now possible to buy a USB controller, get low-level reads and dump raw data (search for KryoFlux). I can imagine that one day, when optical discs are a thing of the past, similar devices will get developed for them.
Old 9th January 2014, 10:50   #16  |  Link
Ghitulescu
Quote:
Originally Posted by xinyingho View Post
I know that, for floppy discs, it's now possible to buy a USB controller, get low-level reads and dump raw data (search for KryoFlux). I can imagine that one day, when optical discs are a thing of the past, similar devices will get developed for them.
IIRC (it has been more than 20 years), one could also do this with a regular drive and a bit of programming (in assembler), once the interface and specs of the NEC controller (or its equivalent) that every floppy unit had were known. A lot of (now forgotten) software used this trick to duplicate copy-protected discs, or to read Mac, Atari, Commodore etc. discs on a PC.

I think I understand your goal better.
Old 9th January 2014, 13:01   #17  |  Link
xinyingho
Interesting to know that raw dumping of floppy discs was already possible with regular drives and some programming.
I wish it were also possible with optical drives.
Old 9th January 2014, 16:05   #18  |  Link
Ghitulescu
There's no miracle in this: the floppy interface was simple enough to be driven directly by the CPU (like the earliest interfaces, e.g. those of the Sinclair ZX80). Optical media, however, required a lot more CPU power, so, like HDDs (except for the first two generations), they got specialised controllers that hide the details behind the interface and provide only the cooked data.