Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Disassemblers

Old News ;-)


[Nov 25, 2006] freshmeat.net: Project details for Dissy

Dissy is a disassembler for multiple architectures. It is implemented as a graphical frontend to objdump. It allows fast navigation through the disassembled code and easy searching for addresses and symbols.

Release focus: Minor feature enhancements

Changes:
This release adds a text entry box for highlighting patterns in the disassembled code. The PowerPC version now supports visualizing jumps. A few minor bugs have been fixed.

Author:
Simon Kagstrom [contact developer]
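Dissy works by parsing the text output of objdump and indexing it for navigation. As a minimal sketch of the idea (the sample disassembly and symbol names below are illustrative, not taken from Dissy's source), a frontend can scan `objdump -d` output for symbol-header lines of the form `ADDRESS <name>:` and build a name-to-address index for jumping and searching:

```python
import re

# Sample of `objdump -d` style output; symbol names are illustrative.
SAMPLE = """\
08048400 <main>:
 8048400:\t55                   \tpush   %ebp
 8048401:\t89 e5                \tmov    %esp,%ebp
08048420 <helper>:
 8048420:\tc3                   \tret
"""

# Symbol headers start at column 0: hex address, then <name>:
SYMBOL_RE = re.compile(r"^([0-9a-f]+) <(\S+)>:$")

def index_symbols(disasm):
    """Map each symbol name to its start address."""
    symbols = {}
    for line in disasm.splitlines():
        m = SYMBOL_RE.match(line)
        if m:
            symbols[m.group(2)] = int(m.group(1), 16)
    return symbols

print(index_symbols(SAMPLE))
# e.g. {'main': 134513664, 'helper': 134513696}  (0x8048400, 0x8048420)
```

Instruction lines are indented, so the anchored regex skips them and only the symbol headers are indexed; a real frontend would also keep the per-instruction lines for display and jump visualization.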

Is Sony in violation of the LGPL - Programming stuff

Update: Click here

I'm sure you've already heard about the Sony rootkit that was first revealed by Mark Russinovich of Sysinternals. After the Finnish hacker Matti Nikki (aka muzzy) found some revealing strings in one of the files (go.exe) that are part of the copy protection software, the rootkit is also suspected to be in violation of the open-source license LGPL. The strings indicate that code from the open-source project LAME was used in the copy protection software in a way that's not compatible with the LGPL license which is used by LAME.

On Slashdot, muzzy mentioned that he doesn't have access to Sabre BinDiff, a tool that can be used to compare binary files. I was in the opposite position, as I have BinDiff but didn't have the file in question (go.exe). I mailed muzzy and he hooked me up with the file.

I compared go.exe with a VC++-compiled version of lame_enc.dll but unfortunately BinDiff didn't find a single relevant matched function. A quick manual check didn't reveal any LAME functions in go.exe either.

Even though go.exe apparently does not contain any LAME code, a considerable number of tables and constants from the LAME source files can be found in the go.exe file. Here's a list of the LAME tables I've been able to locate. The first column shows the hex address where the table can be found in the go.exe file, the second column shows the name of the table as it appears in the LAME source code, and the third column shows the LAME source file where the table can be found.

I have to add, though, that not a single table actually seems to be used by the go.exe code. What does that mean? I've asked random people and I've heard speculation ranging between "accidentally linked" and "encrypted code in go.exe that uses the tables and can't be found in the disassembler". Further analysis needs to be made, but at this point I'm leaning towards more or less accidental inclusion.

Posted by sp in Misc at 11:38
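The table-hunting technique described above boils down to serializing the constants from a known source table and scanning the target binary for that byte sequence. A minimal sketch (the table values, format, and offsets below are hypothetical stand-ins, not actual LAME data):

```python
import struct

# Hypothetical stand-ins for constants from a known source table;
# a real analysis would pack the actual LAME tables (e.g. from tables.c).
KNOWN_TABLE = [0x3F800000, 0x3F000000, 0x3E800000]

def find_table(blob, table, fmt="<I"):
    """Return every offset where the packed table occurs in blob."""
    needle = b"".join(struct.pack(fmt, v) for v in table)
    hits, start = [], 0
    while (pos := blob.find(needle, start)) != -1:
        hits.append(pos)
        start = pos + 1
    return hits

# Toy "executable image" with the table embedded at offset 4:
blob = (b"\x00" * 4
        + b"".join(struct.pack("<I", v) for v in KNOWN_TABLE)
        + b"\xff" * 8)
print(find_table(blob, KNOWN_TABLE))  # [4]
```

In practice you would try several element widths and endiannesses (the `fmt` parameter), since the compiler controls how the source-level table is laid out in the binary.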

Comments

The code in GO.EXE could be compiled with another compiler. In that case your comparison would probably not find a match, but it may still be there. #1 Rhialto on Nov 14 2005, 12:41

This idea is absolutely right and I've thought about it. go.exe was apparently compiled using VC++ 7 (debug build) while lame_enc.dll was compiled using VC++ 6 (release build). That's what PEiD ( http://peid.has.it/ ) says, at least.

In the past I've successfully used BinDiff to match functions from files compiled with gcc to functions from files compiled with VC++, though. The question is now whether VC++ 7 is so much different from VC++ 6 that BinDiff is less likely (or even unable) to match them, even though code produced by VC++ 6 and gcc seems to be similar enough for BinDiff to work.

Furthermore I think the main point of importance is that the tables in go.exe are not referenced by any code (at least not in a way that a static disassembler can detect). I think the reason for this might be the solution to the entire violation question. #1.1 sp on Nov 14 2005, 12:52

They are very different; the VC7 compiler was a complete rewrite of the VC6 compiler and has many improvements. #1.1.1 Anonymous on Nov 15 2005, 17:03

Heya,

I'm another reverser, and I'd be interested in taking a look at this myself (I also have IDA and BinDiff)... could you possibly send me a copy of the exe?

Cheers,

Will #2 Will Whistler on Nov 14 2005, 13:06

This raises an interesting question: are tables originating from LGPLed code enough to make the LGPL apply to the final executable, even though it might not actually use the data?

After all, the tables also have been written and are part of the source code covered by the license. I don't think copyright law would make a difference between the source for executable code and that for the data needed by that code. #3 Arend Lammertink (http://plone.vrijschrift.org) on Nov 14 2005, 14:38

Good observation. That's actually exactly why I didn't even make an attempt to answer the question posed in the topic. I don't know enough about license and copyright issues to make an educated guess. #3.1 sp on Nov 14 2005, 15:27

Come to think of it, it's not surprising at all you can't find any code if you compare a DLL and a statically linked executable on Windows.

Windows' DLLs are designed in such a way that function calls between DLLs are completely different from their static equivalents. Function calls are addressed using an offset table in the DLL. The caller uses special access code. That's why DLLs are accompanied by "import" libraries. Every function that can be used from outside of a DLL has to be "exported" using some declspec macros. I'm sure these will also influence name mangling, etc.

To make a long story short: try comparing the executable with a static LAME library... #4 Arend Lammertink (http://plone.vrijschrift.org) on Nov 14 2005, 15:18

I assumed that this wouldn't matter because of the level of abstraction BinDiff uses to determine whether code from two files is equal or not. The calling convention shouldn't really matter here.

But alas, assumption is the mother of all fuck-ups. So I went back to check. As I expected I don't get any results from a statically linked LAME either.

I also want to draw attention to another issue. LAME is an application that uses a lot of FPU instructions. Go.exe barely uses any.

I've created an opcode distribution list for the files lame_enc.dll and go.exe. The former uses tens of thousands of FPU instructions, with fld being the 2nd most used instruction (only mov is used more often). The latter file, on the other hand, uses only a few hundred FPU instructions, and there are 26 more frequently used CPU instructions before the 1st FPU instruction comes in the list. #4.1 sp on Nov 14 2005, 20:11

What relevant parts of the LGPL would be infringed if it does contain this? The LGPL doesn't require that things that link to it also be LGPL, unlike the GPL. #5 Nick Johnson on Nov 14 2005, 21:04

They still have to offer the source code for any LGPL code they distribute, or modify and distribute, and they still have to include an LGPL license notice. They can link to LGPL code, but they can't hide it. #5.1 Rodrin on Nov 15 2005, 16:22

Another, perhaps more logical explanation, given the lack of substantial similarity: perhaps the Sony software includes LAME signatures so it can detect whether a user is running LAME to encode MP3s. #6 Ansel (http://www.anselsbrain.com/) on Nov 14 2005, 21:22

Perhaps the tables in question aren't used to execute anything, but merely to detect LAME and/or programs that use it? #7 HyperHacker (http://hypernova.amarok-shadow.com) on Nov 15 2005, 07:19

Let me quote some comment on a slashdot story (http://yro.slashdot.org/yro/05/11/15/1250229.shtml?tid=117&tid=188&tid=17) from muzzy:

That only concerns GO.EXE, and while the analysis is correct for that executable, I checked for LAME references against every binary in the compressed XCP.DAT file after I managed to unpack it (thanks to the freedom-to-tinker.com guys for providing a description of the format). Turns out, there's more binaries including references to LAME, and this time there's actually code that uses the data as well. And not just LAME, there's also id3lib included in one DLL, and BladeEnc and mpglib distributed along with the DRM. All of this is LGPL, it's code, and it's being used. #8 Cone on Nov 15 2005, 15:06

Yes, this is correct. We're right now working on the new files and we've already matched code manually. We're now in the process of developing a few tools to match code automatically because there's a lot of code to match. #8.1 sp on Nov 15 2005, 15:08

Congratulations, guys! #8.1.1 Arend Lammertink on Nov 15 2005, 15:26

What if the tables from LAME are there to be used to detect a LAME encoder being used on the system? I.e., if you try to rip the tracks, it will see that LAME is running, and perhaps corrupt the resulting ripped file? #9 Ed Felton on Nov 15 2005, 16:30

Wonders if go.exe makes any system calls to register itself. #10 hawkeyeaz1 on Nov 15 2005, 16:44
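The opcode-distribution comparison mentioned in the thread above (counting how often each mnemonic appears in two binaries) is easy to reproduce from disassembler output. A minimal sketch, assuming `objdump -d` style text as input (the sample lines below are illustrative):

```python
from collections import Counter
import re

# Instruction lines in `objdump -d` style (addresses and bytes shortened;
# the disassembly itself is illustrative).
LINES = """\
 401000:\t8b 45 08\tmov    0x8(%ebp),%eax
 401003:\td9 45 0c\tfld    0xc(%ebp)
 401006:\t8b 55 10\tmov    0x10(%ebp),%edx
 401009:\tc3      \tret
"""

# Address column, raw byte column, then the mnemonic as the first token.
INSN_RE = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t(\S+)")

def opcode_distribution(disasm):
    """Count how often each mnemonic appears, most frequent first."""
    counts = Counter()
    for line in disasm.splitlines():
        m = INSN_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common()

print(opcode_distribution(LINES))
# [('mov', 2), ('fld', 1), ('ret', 1)]
```

Running this over the disassembly of each binary and comparing where the first FPU mnemonic (fld, fmul, ...) ranks in the two lists gives exactly the kind of fingerprint sp describes: an MP3 encoder is FPU-heavy, so a binary with almost no FPU instructions is unlikely to contain the encoder's code.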

Recommended Links

See also Decompilation Page and Reverse Engineering Links

The REAP project at InterGlossa is developing tools to support maintenance and reverse engineering of assembly language programs, concentrating on well-engineered hand-coded programs. Abstraction of assembly programs takes place in the context of a selected "engineering model", which includes the definition of the instruction set semantics as well as constraints on the programs similar to those found in ABIs. The process of translation takes the form of a large-scale inductive demonstration that the program meets the constraints of the "engineering model" as the translated abstraction is produced. An engineer's interface makes this manifest to the engineer supervising the translation.

This approach can in principle handle programs whose models include a disciplined use of code self-modification or dynamic register bank switching. As the intermediate language for the major analyses involved, we use a representation based on the XANDF X/Open standard, originating from the UK Defence Research Agency. XANDF is a standard for architecture-neutral program representation which will permit support for analyses of portability. Concurrency is not yet covered, but recent advances show how XANDF can be extended to encompass concurrency and distribution. We illustrate the effectiveness of the tools with examples taken from live Intel 8051 and Zilog Z80 systems.

My understanding of the parent post was that this is exactly what he was saying. I don't think he was claiming that programs written in assembly were better, but that programmers who knew assembly were better programmers.

I think you were agreeing with him.

Re:Actually, they DON'T. (Score:2)
by be-fan (61476) on Thursday February 05, @09:41PM (#8197542)
There was an interesting study done comparing the performance and productivity of C++ vs Lisp vs Java programmers. Results are here. [flownet.com]

One very interesting thing they found was that while the best C++ programs were faster, the average Lisp program was faster*. Programmer experience could not account for this.

In retrospect, it's easy to see why. When you write clean, straightforward code like you would in a production environment, it's much easier for the compiler to optimize high-level code than low-level code. Compilers for languages like Lisp/Scheme/Haskell/etc. do all sorts of optimizations that existing C/C++/Java compilers either don't do (forgotten technology) or can't do (pointers cause lots of problems).

My point is that programming at a higher level, in general, allows the compiler to do more optimization than programming at a lower level. Given infinite time, an asm programmer will always be able to crank out faster code than a C++ programmer, who will always be able to crank out faster code than a Lisp programmer. However, in the real world, it may very well be the case that giving the optimizer more meat to work on will result in a program that is ultimately faster overall.

Re:Actually, they DON'T. (Score:1)
by shmat (124756) <david.dguy@net> on Friday February 06, @10:47AM (#8201667)
(http://www.dguy.net/)
I agree completely. I started my career coding in assembly language (yes, I'm old). When I started using C I thought I had died and gone to heaven because I was 10 TIMES more productive with C.

Like most assembly language programmers, I went through the compiler generated assembly for my first couple of C programs because I wanted to see how bad a job the compiler did. I found that the assembly was hard to understand but very efficient. There were very few places where I could have done better.

As to learning computer science, I think the only value in using assembly language as a teaching tool is that assembly language requires extremely careful attention to detail and patience. So maybe it serves as a screening process, because good developers need lots of both. However, algorithms, data structures, OO, patterns, etc. are far more important to learn than assembler.

PDP11, VAX, 68K mislabeled (Score:3, Informative)
by snStarter (212765) on Friday February 06, @01:04AM (#8198707)
No one would really call the PDP-11 a CISC machine. You might call it a RISC VAX however (pause for audience laughter).

Also, many PDP-11's were random logic and not micro-coded. The later 11's were microcoded, of course, the 11/60 being the extreme because it had a writeable control store that let you define your own micro-coded instructions.

It's important to remember that the entire RT-11 operating system was written entirely in MACRO-11 by some amazing software engineers who knew the PDP-11 instruction set inside and out. The result was an operating system that ran very nicely in a 4K word footprint.

The VAX had a terrific compiler, BLISS-32, which created amazingly efficient code; code no human being would ever create but fantastic nonetheless.

Forgetting the Most Important Point (Score:4, Funny)
by duck_prime (585628) on Thursday February 05, @08:45PM (#8197142)

For learning, we don't have to learn assembly first anymore, you can start with any language. I think it is good to take a two pronged approach. Learn C first, and at the same time, start learning digital logic. [...] When one is comfortable with both, I think learning assembly is much easier.

You are missing the One True Purpose of assembly language, and the One True Reason everyone should learn assembly first:

Nothing else in the Universe can make students grateful -- grateful! -- to be allowed to use C.

While Don Knuth's assembly language MIX runs on a theoretical processor, all of the examples in The Art of Computer Programming (TAOCP) are based on it. Even as he has revised the editions, he has updated the language to be based on RISC (search Google for MMIX [google.com]), but he chose not to update the examples to a higher-level language. Here is his reasoning from his web page [stanford.edu]:

Many readers are no doubt thinking, "Why does Knuth replace MIX by another machine instead of just sticking to a high-level programming language? Hardly anybody uses assemblers these days."

Such people are entitled to their opinions, and they need not bother reading the machine-language parts of my books. But the reasons for machine language that I gave in the preface to Volume 1, written in the early 1960s, remain valid today:


Moreover, if I did use a high-level language, what language should it be? In the 1960s I would probably have chosen Algol W; in the 1970s, I would then have had to rewrite my books using Pascal; in the 1980s, I would surely have changed everything to C; in the 1990s, I would have had to switch to C++ and then probably to Java. In the 2000s, yet another language will no doubt be de rigueur. I cannot afford the time to rewrite my books as languages go in and out of fashion; languages aren't the point of my books, the point is rather what you can do in your favorite language. My books focus on timeless truths.

Therefore I will continue to use English as the high-level language in TAOCP, and I will continue to use a low-level language to indicate how machines actually compute. Readers who only want to see algorithms that are already packaged in a plug-in way, using a trendy language, should buy other people's books.

The good news is that programming for RISC machines is pleasant and simple, when the RISC machine has a nice clean design. So I need not dwell on arcane, fiddly little details that distract from the main points. In this respect MMIX will be significantly better than MIX.

YALE PATT ALREADY DID THIS (Score:2)
by Prof. Pi (199260) on Friday February 06, @02:26PM (#8204582)
One of the leaders in computer architecture, Yale Patt, has already written a book [mhhe.com] based on this concept. He gives enough of an overview of logic design to understand things at the RTL (register-transfer) level, and distills CPU design to its essentials. He doesn't get to C until halfway through the book.

His observation is that CS students have a MUCH easier time comprehending things like recursion when they understand what's really going on inside.

(My efforts to get this book introduced at my old university were unsuccessful, as the department chairman was afraid that teaching assembly language would drive students away. He wanted to teach them Java instead.)

MIXAL (Score:3, Informative)
by texchanchan (471739) <[email protected]> on Thursday February 05, @07:03PM (#8196116)
(http://www.chanchan.net/)
MIXAL, MIX assembly language. MIX was the virtual machine I learned assembly on in 1975. Googling reveals that MIX was, in fact, the Knuth virtual computer. The book came with a little cue card with a picture of Tom Mix [old-time.com] on it. MIX has 1 K of memory. Amazing what can be done in 1 K.

Re:Knuth (Score:1)
by d_p (63654) on Thursday February 05, @08:25PM (#8196980)
Knuth uses MIX, a machine with 6-bit bytes, as well as a form of assembly for his simulated computer.

It may have been updated to 8 bits in the addenda to the book.

Re:Somewhere in the middle... (Score:5, Insightful)
by Saven Marek (739395) on Thursday February 05, @07:09PM (#8196209)
I learned LOGO and BASIC as a kid, then grew into Cobol and C, and learned a little assembly in the process. I now use C++, Perl, and (shudder) Visual Basic (when the need arises). My introduction to programming at a young age through very simple languages really helped to whet my appetite, but I think that my intermediate experiences with low level languages helps me to write code that is a lot tighter than some of my peers.

I'm with you there. I learned C, C++ and assembler while at university, and came out with the ability to jump into anything. Give me any language and I can guarantee I'll be churning out useful code in a VERY short amount of time.

Compare this to my brother, 12 years younger than me, who has just completed the same comp.sci course at the same uni, and knows only one language: Java. Things change, not always for the better. I know many courses haven't gone to the dogs as much as that, but many have. I'm not surprised the idea of teaching coders how the computer works is considered 'novel'.

I can see a great benefit for humanity the closer computers move to 'thinking' like people, for people. But that's just not done at the hardware level, it's done higher. The people who can bring that to the world are coders, and as far as I'm concerned thinking in the same way as the hardware works is absolutely essential for comp.sci. Less so for IT.

Re:Good idea, Bad Idea (Score:5, Insightful)
by RAMMS+EIN (578166) on Thursday February 05, @07:10PM (#8196217)
(http://www.inglorion.net/)
``Bad Idea: Teaching CS by starting with one of the most cryptic languages around, and then trying to teach basic CS fundamentals.''

I completely disagree. Assembly is actually one of the simplest languages around. There is little syntax, and hardly any magic words that have to be memorized. Assembly makes an excellent tool for learning basic CS fundamentals; you get a very direct feeling for how CPUs work, how data structures can be implemented, and why they behave the way they do. I wouldn't recommend assembly for serious programming, but for getting an understanding of the fundamentals, it's hard to beat.

Re:Good idea, Bad Idea (Score:4, Insightful)
by pla (258480) on Thursday February 05, @07:17PM (#8196302)
(Last Journal: Sunday December 08, @05:40PM)
Then, confuse the hell out of a student with assembly

I disagree. Personally, I learned Basic, then x86 asm, then C (then quite a few more, but irrelevant to my point). Although I considered assembly radically different from the Basic I started with, it made the entire concept of "how the hell does that Hello World program actually work?" make a whole lot more sense.

From the complexity aspect, yeah, optimizing your code for a modern CPU takes a hell of a lot of time, effort and research into the behavior of the CPU itself. But to learn the fundamental skill of coding in assembler, I would consider it far less complex than any high-level language. You have a few hundred instructions (of which under a dozen make up 99% of your code). Compare that to C, where you have literally thousands of standard library functions, a good portion of which you need to understand to write any non-trivial program.


There are already problems with people interested in CS getting turned off by intro/intermediate programming classes.

You write that as though you consider it a bad idea...

We have quite enough mediocre high-level hacks (which I don't mean in the good sense, here) flooding the market. If they decide to switch to English or Art History in their first semester, all the better for those of us who can deal with the physical reality of a modern computer. I don't say that as an "elitist" - I fully support those with the mindset to become "good" programmers (hint: If you consider "CS" to have an "S" in it, you've already missed the boat) in their efforts to learn. But it has grown increasingly common for IT-centric companies to have a handful of gods, with dozens or even hundreds of complete wastes-of-budget who those gods need to spend most of their time cleaning up after. We would do better to get rid of the driftwood. Unfortunately, most HR departments consider the highly-paid gods as the driftwood, then wonder why they can't produce anything decent.

Hmm, okay, rant over.
Re:Whatever (Score:2, Insightful)
by jhoger (519683) on Thursday February 05, @07:01PM (#8196088)
(http://hogerhuis.net/)
A lot of software work is at a smaller scale. If 60% or so of software's lifecycle is maintenance, and there's a lot of software out there, and also since many software projects are very small, I'd venture to say that process is almost irrelevant for plenty of work.

Being knowledgeable about low level operation of the machine will take you farther, since you won't have the fear of getting down to the bare metal to figure out a problem. And assembly language is important there... but also things like debuggers, protocol sniffers, etc. Anything that lets you get to the bare metal to figure out a problem will get you to a solution quicker.

Process and modern design concepts are important for large projects and at the architectural level.

Great concept. (Score:5, Insightful)
by shaitand (626655) on Thursday February 05, @06:59PM (#8196060)
I started out learning to code in asm on my c64 and I'd have to say it was a very rewarding experience.

Anyone who disagrees with this probably doesn't have much experience coding in assembler to begin with. Asm really is fairly easy; the trick is that most who teach asm actually spend too much time on those computer concepts and not enough time on actual real coding. It's wonderful understanding how the machine works, and necessary to write good assembler, but you should start with the 2 pages of understanding that are needed to "get" asm at all.

Then teach language basics and THEN teach about the machine using actual programs (a text editor, other simple things), explaining the reason they are coded the way they are in small chunks. Instead of handing over a chart of BIOS calls and a tutorial on basic assembler, introduce BIOS calls in actual use in a program; most of them are simple enough that when shown in use they are quite clear and anyone can understand them.

After all, assembler (pretty much any assembler) is composed of VERY simple pieces. It's understanding how those pieces can be fit together to form a simple construct, how those simple constructs combine to create a simple function, and how those simple functions combine to create a simple yet powerful program that teaches someone programming. Learning to program this way keeps things easy, but still yields a wealth of knowledge about the system.

It also means that when you write code for the rest of your life you'll have an understanding of what this and that form of loop do in C (insert language here) and why this one is going to be faster since simply looking at the C (insert language here) concepts doesn't show any benefit to one over the other.



Last modified: March 12, 2019