View Full Version : End of 32 bit computing?


arcticpenguin
18th November 2003, 01:50 PM
http://www.infoworld.com/article/03/11/18/HNamddays_1.html

AMD predicts it will stop producing 32 bit processors in 2005, as prices of its 64 bit processors drop. Are we really looking at the end of 32 bit desktop computing? I presume 32 bit and smaller processors will continue in embedded devices.

It seems like there will be a lot of devices out there that do not need a 64 bit address space any time soon, and embedded information appliances (PDAs, cell phones, etc) may play a larger role in the future than they do today.

Of course, AMD doesn't have much of the embedded market, so maybe that's why they don't pay it heed...

a_unique_person
18th November 2003, 03:04 PM
Not quite the end. Intel obviously wants to draw a line between the desktop at 32 bit and servers at 64 bit with its new line of CPUs. AMD appears to be doing its best to mess with that proposal. If they don't go broke first.

†= Crap!
18th November 2003, 06:21 PM
Intel has already created the 64 bit Pentium 4 EE (extreme edition)

LuxFerum
19th November 2003, 02:15 AM
Even consoles have more bits than that.
They should go straight to 256 bits.:D

Astroglide
19th November 2003, 02:30 AM
The P4EE is a 32 bit chip. Nice try though.

Peskanov
19th November 2003, 02:40 AM
IMO, sooner or later we will have to adopt parallel computing. It's the only way to break the "double speed every year" routine.
We badly need massive processing capabilities, and this need is only going to rise, driven by both entertainment (videogames) and industry (especially in the upcoming medicine revolution).

If you are wondering what this has to do with 32/64 bits, it's because I see 64 bit processors as bad for the parallel computing world. Using the same power and space, you can easily fit 3 or 4 32 bit CPUs instead of one 64 bit CPU.
On the other hand, 64 bits makes the software engineer's life much easier on parallel computers, due to its address space.

However, I still think that parallel computing is a software problem, not a hardware one. Anything above a 20,000-transistor ARM CPU is a bit of a waste... Okay, maybe a basic PowerPC because of the FP unit, but that's all ;)

In any case, I don't buy this "fatter, more complex" philosophy currently used to force processors to keep the same efficiency at higher clock rates.
Parallelize or die! :)

BTW, the next Sony and Nintendo consoles will probably feature parallel processing on a small scale; however, I don't have info about which kind of CPU cells will be used. Probably 32 bits with FP, I guess.

egslim
19th November 2003, 05:36 AM
AMD predicts it will stop producing 32 bit processors in 2005, as prices of its 64 bit processors drop. Are we really looking at the end of 32 bit desktop computing? I presume 32 bit and smaller processors will continue in embedded devices.
K8 isn't just a 64 bit CPU, it runs both 32 and 64 bit natively. So AMD is simply saying they will have replaced the entire K7 line with K8 in 2005. They are not saying anything about the software those chips will run.

If you are wondering what this has to do with 32/64 bits, it's because I see 64 bit processors as bad for the parallel computing world. Using the same power and space, you can easily fit 3 or 4 32 bit CPUs instead of one 64 bit CPU.
On the other hand, 64 bits makes the software engineer's life much easier on parallel computers, due to its address space.
AMD claims that the added 64 bit capabilities use about 5% extra chip-space. The added power requirements are similarly small.
If you have 4 1P 32 bit CPU systems, they can use a total of 8GB physical memory. (4*2GB) Putting them in the same box means your available physical memory is decreased to 2GB total, or 512MB per CPU. With a 64 bit CPU this isn't an issue. For some tasks, clusters just don't work very well.
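The arithmetic in this post can be sketched in a few lines. The 2GB-per-system figure is the post's own; the 2**32 and 2**64 address-space sizes are the usual flat limits, and the variable names are purely illustrative.

```python
GB = 2**30
MB = 2**20

systems = 4
per_system = 2 * GB   # the post's figure for memory usable by one 32 bit system

# Four separate 1P boxes: each one brings its own memory.
cluster_total = systems * per_system

# The same four CPUs in one box, sharing the post's 2GB limit.
shared_total = per_system
per_cpu_shared = shared_total // systems

# Flat address-space sizes, for comparison.
addr32 = 2**32
addr64 = 2**64

print(cluster_total // GB, "GB across the four separate systems")
print(per_cpu_shared // MB, "MB per CPU when they share one box")
print(addr32 // GB, "GB addressable with 32 bit pointers")
print(addr64 // addr32, "times more with 64 bit pointers")
```

This only reproduces the post's numbers; it says nothing about how a particular OS or chipset actually splits the address space.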

Peskanov
19th November 2003, 07:49 AM
AMD claims that the added 64 bit capabilities use about 5% extra chip-space. The added power requirements are similarly small.


Interesting; I don't have reliable info about this, but I observed that in the G5 IBM only uses 2 integer units (unlike older designs like the PPC604, which had 3), and I thought it was due to the extra resources needed for a 64 bit design.

Anyway, I am convinced this ratio (5%) does not hold for more balanced CPUs like the old MIPS, ARM and PPC.
Remember the quantity of trash a current CPU carries: x86 translator, branch prediction, vector units, big L1 cache, on-chip L2 cache controller, etc...

My argument is that I would prefer to see hundreds or thousands of simple CPUs in a desktop computer, instead of just 2 or 3 64 bit CPUs.
Take a look, for example, at the papers about FLEX RAM, to see one example of a possible cheap and powerful parallel system based on simple 32 bit CPUs.



If you have 4 1P 32 bit CPU systems, they can use a total of 8GB physical memory. (4*2GB) Putting them in the same box means your available physical memory is decreased to 2GB total, or 512MB per CPU. With a 64 bit CPU this isn't an issue. For some tasks, clusters just don't work very well.


And why would you want zillions of bytes of memory if you don't have the processing power to use it in a reasonable time?
It's not a bad concept to have a number of processors associated with the quantity of RAM you need or use.
In other words, I see nothing wrong with using 1 CPU for every GB, or much better, 1 CPU for every MB like in the FLEX system.
Of course, fine grain multiprocessing is very much a software problem today.
With Linux and some 64 bit boxes anybody can build a big multiprocessor. What I see is that it's a bit like Victorian technology, very inefficient.

jimlintott
19th November 2003, 08:05 AM
They can have my 32 bit machines when they pry them from my cold dead hands.

:D

egslim
19th November 2003, 12:09 PM
Interesting; I don't have reliable info about this, but I observed that in the G5 IBM only uses 2 integer units (unlike older designs like the PPC604, which had 3), and I thought it was due to the extra resources needed for a 64 bit design.
G5 is a derivative of the IBM Power4 core, so it's an entirely new design compared to previous Apple CPUs. They didn't just "bolt on" 64 bit support, they designed an entirely new CPU with completely different specs. So you can't compare them as simply a 64 bit CPU vs a 32 bit one.

The problem I have with your view is the fact that many algorithms just can't be threaded very well. If you have lots of dependencies, threading doesn't work. And that even ignores how hard it is to create multithreaded programs in general.
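One common way to put numbers on this objection is Amdahl's law, which nobody in the thread names. The sketch below assumes a program where some fraction of the work can be spread over any number of CPUs and the rest must run serially; the function name and the sample fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Speedup over one CPU when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)

# How much does adding CPUs help for different amounts of parallel work?
for p in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(p, n), 1) for n in (4, 64, 1024)]
    print(f"parallel fraction {p}: {speedups}")
```

The serial fraction sets a ceiling: with 10% serial work, no number of CPUs gets past a 10x speedup.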

I believe the future of general purpose computing is multi-core and multithreaded CPUs, with high single thread performance as well. AMD and Intel seem to agree with me: Intel has HT, while AMD designed Opteron for multi-core.
The type of CPUs you are advocating is useful for certain types of niche computing, but not much else. Certainly not for desktop applications.

I can understand, though not agree with, your desire for elegant computing solutions. But it's not 64 bit computing that is opposing it, it's the impossibility of multithreading several algorithms.

rockoon
19th November 2003, 05:42 PM
As an experienced programmer let me detail my desire for MORE MEMORY and that MORE MEMORY equals COMPUTATIONAL POWER.

This can be seen in many avenues of software development. Chess engines use endgame lookup tables, and the performance boost for these engines is significant. These tables reduce what would amount to hours of computation into a single lookup that can be done a billion times per second. Parallel computing doesn't even come close to this sort of performance boost.

What it amounts to is with more memory you can store more results of complex calculations. Once they have been calculated you no longer need to calculate them as long as you have someplace to store the results. So even if the table you wish to create takes a year to compute on a single cpu system, it would only have to be done once. It could then be shared with everyone who owns a computer and has a need for this data.
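The trade described here, memory for computation, can be sketched with a tiny precomputed table. `slow_score` is a hypothetical stand-in for an expensive calculation (it is not a real endgame search), and the table size of 256 is arbitrary.

```python
def slow_score(position: int) -> int:
    """Hypothetical stand-in for an expensive calculation."""
    total = 0
    for i in range(1, 2_000):
        total = (total + position * i) % 1_000_003
    return total

# Pay the cost once: compute every answer up front and keep it in memory.
TABLE = [slow_score(p) for p in range(256)]

def fast_score(position: int) -> int:
    """Same answer as slow_score, but now it is a single memory lookup."""
    return TABLE[position]

print(fast_score(17) == slow_score(17))
```

The table could just as well be written to disk and shared, which is the point the post makes about computing it only once.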

There are for sure certain areas where parallel computing is the only way to go. N-Body algorithms (cosmological simulations and the like) rely heavily on being able to perform a large number of computations that can't be made simpler or turned into a lookup. But these types of problems are relatively rare and the day-to-day end user does not run these sorts of programs. But even here, these algorithms could be made better with more memory. Especially the Particle-Mesh methods.

Peskanov
20th November 2003, 03:58 AM
G5 is a derivative of the IBM Power4 core, so it's an entirely new design compared to previous Apple CPUs. They didn't just "bolt on" 64 bit support, they designed an entirely new CPU with completely different specs. So you can't compare them as simply a 64 bit CPU vs a 32 bit one.


True, Power4 and its son, Power970 (G5), rely on deep pipelines and also do some microcoding (I mean they decode some complex instructions into smaller ones in the first stages of the pipeline).
The goal of this architecture is to reach higher clock rates, using simpler pipeline stages.
As more stages are needed, pipeline locks get more expensive in time. This is compensated for by putting more instructions in process, using more functional units than usual (more pipelines under the issue stage).
In other words, the current design benefits from having more units, unlike previous designs, which got little benefit. In fact the G5 is the first PPC to have 2 FPUs, as far as I know.
Why did they put only 2 integer units (a criticised decision btw)? I have no idea, but my guess is the complexity added by the 64-bit overhead.


The problem I have with your view is the fact that many algorithms just can't be threaded very well. If you have lots of dependencies, threading doesn't work. And that even ignores how hard it is to create multithreaded programs in general.


I agree; as you say, explicit threading is a horrible way of doing parallel computing. Horrible for the programmer, I mean.
That's the reason I said that parallel computing is a "software problem": the metal is here and it's cheap, but putting it to work is hard for coders.
IMO the solution lies in using more expressive languages, because C/C++/Java are too low level. A language like SQL, for example, while not being general-purpose, provides a way to tell WHAT you want, but not HOW to do it. This would allow the computer to run the task in parallel without programmer intervention.
The multiprocessor industry (like Cray) seems to work mostly on compiler technology today, trying to extract the info needed for parallelization from normal C code.
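A minimal sketch of the "WHAT, not HOW" idea, using Python's standard `concurrent.futures` rather than SQL. The `brightness` function and the pixel data are made up for illustration. In CPython, threads will not actually speed up CPU-bound work like this, so treat it as showing the shape of the idea only.

```python
from concurrent.futures import ThreadPoolExecutor

def brightness(pixel):
    """Independent per-item work: no item depends on any other."""
    r, g, b = pixel
    return (r + g + b) // 3

pixels = [(10, 20, 30), (200, 100, 0), (255, 255, 255), (0, 0, 0)]

# "WHAT": apply brightness to every pixel. Nothing here says in which order
# or on how many workers, so the runtime is free to choose.
serial = list(map(brightness, pixels))

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(brightness, pixels))

print(serial == parallel)
```

Because the request is expressed as a map over independent items, swapping the serial version for the pooled one needs no change to `brightness` itself.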
It will be interesting to see which model Sony and Nintendo choose for their upcoming projects.


I believe the future of general purpose computing is multi-core and multithreaded CPUs, with high single thread performance as well. AMD and Intel seem to agree with me: Intel has HT, while AMD designed Opteron for multi-core.
The type of CPUs you are advocating is useful for certain types of niche computing, but not much else. Certainly not for desktop applications.


I don't agree at all. Tell me which kind of CPU intensive applications you use, and I will show you multiprocessing implementations. There are very, very few non-parallelizable algorithms.
Emulation of other computers comes to mind, maybe calculating series (calculating digits of PI)...
Please note that if you have a gfx accelerator in your computer, you have in fact a multiprocessor machine. Only that it's not general purpose multiprocessing.


I can understand, though not agree with, your desire for elegant computing solutions. But it's not 64 bit computing that is opposing it, it's the impossibility of multithreading several algorithms.


It's not a question of elegance. While I dislike bizarre CPUs like x86, it's a question of FLOPs per dollar.
A monster CPU today has more than 40 million transistors, and all they do is execute 2 or 3 instructions every cycle, 4 when they are lucky.
Instead of frying eggs over the surface of these monsters, we could own much more efficient devices based on a COMA architecture, executing hundreds or thousands of instructions per cycle, for the same price. And simpler CPUs = more units per dollar.

Zep
20th November 2003, 04:20 AM
Scan down through here (http://www.pattosoft.com.au/jason/Articles/HistoryOfComputers/1990s.html), and as you do so, note the technology that was CURRENT when the Digital Alpha arrived...

64 bits? YAWN. Been there, done that, we've got some in our COMPUTER MUSEUM already!

a_unique_person
20th November 2003, 04:37 AM
Originally posted by Peskanov

It's not a question of elegance. While I dislike bizarre CPUs like x86, it's a question of FLOPs per dollar.
A monster CPU today has more than 40 million transistors, and all they do is execute 2 or 3 instructions every cycle, 4 when they are lucky.


The x86 is going 64bit, and hiding deep in the bowels, is a little 8080 trying to get out. It is really a tribute to the importance of software. The HP/Intel 64 bit architecture is really a demonstration of how you just can't design an architecture and then leave the software to others to build as an exercise.

Software is king. Look at how many perfectly good architectures have died. Intel has made billions from selling ugly architectures that have software written for them. The hard part is getting an architecture and software working together. One that does is gold.

Peskanov
20th November 2003, 07:01 AM
Scan down through here, and as you do so, note the technology that was CURRENT when the Digital Alpha arrived...

64 bits? YAWN. Been there, done that, we've got some in our COMPUTER MUSEUM already!


Alpha? That was yesterday, pal.
64 bit designs started in the eighties, I think. VAX maybe? I am not sure.
One of the first Cray supercomputers, the celebrated CDC6600, was a 60 bit design, and was released in 1964.

http://ed-thelen.org/comp-hist/cdc6600.html

The problem is not viability of 64 bit. These things are doable, of course. The question is efficiency.
Are 64 bit CPUs the future, or ill-conceived dinosaurs ready to be discarded by evolution?

jimlintott
20th November 2003, 09:08 AM
The hard part is getting an architecture and software working together. One that does is gold.

Something like NetBSD. (http://www.netbsd.org/Ports/) I was able to boot it on my electric toothbrush. :p

arcticpenguin
20th November 2003, 09:28 AM
VAX was 32 bit.

Peskanov
20th November 2003, 10:39 AM
AUP,


The x86 is going 64bit, and hiding deep in the bowels, is a little 8080 trying to get out. It is really a tribute to the importance of software. The HP/Intel 64 bit architecture is really a demonstration of how you just can't design an architecture and then leave the software to others to build as an exercise.


I strongly disagree. In the eighties, there were easily more than 30 different architectures in the 8/16 bit market, plus dozens of CP/M machines derived from the same hacked code. Creating computers and developing software for them was never a problem; but conquering a market or finding a niche for them was a problem!
Have you seen the state of the world of embedded computing? You can find all flavours of CPUs, compilers, real time monitors and debuggers, assemblers, OSes, etc... Making an architecture "run" correctly is a piece of cake today, and was not so hard back then.

Do you think the IBM PC is the standard because its software "worked", and the competition's products did not? You really didn't know the old versions of DOS and Windows, did you?
I can assure you that the solutions provided by the competition were usually of higher quality than those famous Microsoft "weekend hacks". That's no secret, nor even seriously debated.


Software is king. Look at how many perfectly good architectures have died. Intel has made billions from selling ugly architectures that have software written for them. The hard part is getting an architecture and software working together. One that does is gold.


Working architectures are as normal as brands of cars, for example.
Marketing is king. The forces of the market will throw a good product into oblivion easily, and that happens everyday.

I think I am not really understanding what you mean by "getting an architecture and software working together". I am not sure what you mean, but I am quite sure the original IBM PC did not, because at the time of its triumph it was poor on all fronts.

Andonyx
20th November 2003, 11:34 AM
Originally posted by LuxFerum
Even consoles have more bits than that.
They should go straight to 256 bits.:D

That's not exactly the case...

In many cases the graphics chip itself, such as the Emotion Engine, is processing internally at 256 bits or 128 bits or whatever. However, the CPU itself and the data pipelines between the different chips are still running at 32 bits.

jj
20th November 2003, 12:03 PM
Originally posted by arcticpenguin
VAX was 32 bit.

How about UNIVAC 1108?

:D

egslim
20th November 2003, 12:54 PM
On a CPU, the ALUs don't take much die space. This goes for RISC as well. There are schedulers, branch prediction, FPU, caches and more eating way more space. If you claim adding 64 bit support blows up the die size, show us some clear proof first. AMD has already shown it doesn't.

As for algorithms, I'm not a programmer. However, I'm very curious about how you could split the general calculation of a Taylor series. With Taylor you calculate each new term using the previous one.
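The dependency being described looks roughly like this when the series for e**x is evaluated with the usual term recurrence. This is one common way of summing such a series, not the only one, and the function name and default term count are illustrative.

```python
import math

def exp_taylor(x: float, terms: int = 20) -> float:
    """Sum the first `terms` terms of the Taylor series of e**x around 0."""
    total = 0.0
    term = 1.0                      # x**0 / 0!
    for n in range(terms):
        total += term
        term = term * x / (n + 1)   # the next term needs the one just computed
    return total

print(exp_taylor(1.0), math.e)
```

Each pass through the loop needs the `term` left behind by the previous pass, which is the chain that makes this particular formulation awkward to split across CPUs.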

But anyway, the technology you advocate is still at the very least years away. We need 64 bit much sooner.

As more stages are needed, pipeline locks get more expensive in time. This is compensated for by putting more instructions in process, using more functional units than usual (more pipelines under the issue stage).
Not correct. The problem you describe is compensated for by using better branch prediction and more and better Out of Order (OoO) execution in general.
Adding more execution units doesn't help much, if at all. The problem is to supply them with enough instructions.
Second, G5 is actually wider than its predecessor. The G4e has two non-identical ALUs. G5 has two much more similar ones, so they are more likely to be used at the same time. There is also a third pipe which off-loads certain instructions from the regular ALUs.

edited for typo's

Peskanov
20th November 2003, 03:51 PM
On a CPU, the ALUs don't take much die space. This goes for RISC as well. There are schedulers, branch prediction, FPU, caches and more eating way more space. If you claim adding 64 bit support blows up the die size, show us some clear proof first. AMD has already shown it doesn't.


I think you are misinterpreting me. I already acknowledged that for a modern CPU this could be the case because of big caches, vector units, FPUs, etc. (you can read the laundry list some posts before this one). My assertions about the G5 are only speculation on my part.
My point is about simpler CPUs, which in my opinion would form a better base for massively parallel computing.
In the case of good old, clean RISC processors, like the MIPS R2000 or the original ARM made by Acorn, they used 30,000 and 20,000 transistors respectively (I am recalling from memory, so I'm not sure).
In these CPUs there were no caches or FPUs, so the processors were basically a short pipeline with an ALU in the middle.
Most of the transistors in these machines were spent on the register set alone. For example, the MIPS sported 32 registers of 32 bits, and I guess they were implemented with flip-flops (anybody correct me if I am wrong), which probably means that 7,000 transistors were used for that alone.
Double this quantity for a 64 bit design with the same number of registers.
Fast adders, on the other hand, need much more than 2x the transistors when doubling the bit width.
I would say that the only part that would remain intact is instruction decode.
The old 60 bit CDC sported 100,000 transistors, I think. I know, it's a very old example but I lack info about early 64 bit designs.
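The register-file estimate above can be written out as a quick calculation. The per-bit transistor count is an assumption, picked only so that the 32 x 32 case lands near the post's figure of roughly 7,000; a real flip-flop design could well need more per bit.

```python
def register_file_transistors(registers: int, width_bits: int, per_bit: int) -> int:
    """Rough storage-only estimate: ignores decoders, ports and wiring."""
    return registers * width_bits * per_bit

PER_BIT = 7  # assumption: back-solved from the post's ~7,000 for 32 x 32 bits

r32 = register_file_transistors(32, 32, PER_BIT)
r64 = register_file_transistors(32, 64, PER_BIT)

print(r32, "transistors for 32 registers of 32 bits")
print(r64, "transistors for 32 registers of 64 bits")
```

Under this model the register storage scales linearly with width, which matches the "double this quantity" step in the post; it says nothing about adders or the rest of the datapath.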


As for algorithms, I'm not a programmer. However, I'm very curious about how you could split the general calculation of a Taylor series. With Taylor you calculate each new term using the previous one.


It seems you didn't read my previous reply to you correctly, because I already said that:

There are very, very few non-parallelizable algorithms.
Emulation of other computers comes to mind, maybe calculating series


Btw, calculating series is not a common need. It's a very specific scientific need, and even there most of the time lots of different series are requested, and that can be parallelized (and it is most times!).
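A sketch of that last point: each series is still summed term by term, but many independent series can be handed out to separate workers. The function, the inputs and the use of a thread pool are illustrative assumptions; in CPython a thread pool shows the structure rather than a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def exp_taylor(x: float, terms: int = 30) -> float:
    """Sequential inside: each term is built from the previous one."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term = term * x / (n + 1)
    return total

inputs = [0.0, 0.5, 1.0, 2.0]

# Parallel outside: the four evaluations do not depend on each other.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(exp_taylor, inputs))

for x, r in zip(inputs, results):
    print(x, r, math.exp(x))
```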


But anyway, the technology you advocate is still at the very least years away. We need 64 bit much sooner.


Well, by 2005 the next generation consoles will hit the market. Both Sony and Nintendo are working on multiprocessor designs.
Console companies don't have to worry too much about backwards compatibility. If they are successful, the technology could reach other markets quite soon. This is not uncommon.
I am curious about the cores that form the base of their systems. Some info about "CELL" leaked out, but it was denied by Sony. Also, some info from IBM was released recently, and it's quite possible that PowerPCs will form the base of Nintendo's multiprocessor. The question is: which kind of PPC, a fat one (970) or a small one (603 or 750)?

About what we need, that's easier to know:
We need cheaper, more powerful processing. That includes both memory and processor.
There will always be a balance between cost, time to market, legacy code, prejudices of the professionals, etc... So yes, 64 bit processors are the best solution for now. But I am not sure about the long term.


Not correct. The problem you describe is compensated for by using better branch prediction and more and better Out of Order (OoO) execution in general.
Adding more execution units doesn't help much, if at all. The problem is to supply them with enough instructions.
Second, G5 is actually wider than its predecessor. The G4e has two non-identical ALUs. G5 has two much more similar ones, so they are more likely to be used at the same time. There is also a third pipe which off-loads certain instructions from the regular ALUs.


My understanding of deep pipeline design (which is quite limited, I admit) is that ideally you want to maximize the number of instructions that are "on the fly", so that in case some get locked there will still be flow in another path.
The G5 document from IBM happily announces that a 970 CPU can have more than 100 instructions on the fly simultaneously.
About branch prediction and OoO, of course I know, but I don't think that's the whole solution for deep pipelines.
I could be wrong of course, I will have to consult my local guru about that.

About G4/G5, the integer units of G3 and G4 were quite similar, yes, and they only featured 2 integer units; but previous designs oriented to the server market, the 604 and the 620 (which was a 64 bit design), featured 3 units.
In fact I was perplexed about that simplification when G3 hit the road.

Peskanov
21st November 2003, 03:12 PM
ERRATA

It seems my memory is quite useless lately. All the numbers I gave before are wrong!

The 60 bit CDC 6600 packed 400,000 transistors. But as it had multithreading and out of order execution, it can't be labelled as a "pure design". Not to mention its age! Probably most of the parts (like adders or multipliers) are better and simpler today.

The old MIPS R2000 had 110,000 transistors. It seems I confused it with the ARM. I don't know how they managed to waste all these transistors because it is a really simple CPU, especially compared to the ARM.

The ARM2 had 30,000 transistors. A really nice design, especially the opcode set, which is really powerful and simple at the same time.

And yes, VAX was 32 bits. The oldest 64 bit design I found is the R4000 from 1991. Even 64 bit SPARCs were done in the nineties.

Sorry for the confusion ;)

egslim
24th November 2003, 11:17 AM
Btw, calculating series is not a common need. It's a very specific scientific need, and even there most of the time lots of different series are requested, and that can be parallelized (and it is most times!).
Trouble is, for general purpose computing you need to be able to do anything at a reasonable speed. I've just shown you an example of an algorithm that cannot be split. There are more of those, so massive parallel computing simply isn't suited for general purpose computing. It does have its uses for certain specific tasks, however.

Don't forget that you can't just put as many simple cores in one die as you like, there needs to be communication. And I think timings will be a major issue, vastly decreasing clockspeeds.

Probably most of the parts (like adders or multipliers) are better and simpler today.
They probably use more transistors now, to increase clockspeed.

At 90nm Intel can fit 100 million transistors on just over 1 square centimeter. Nobody needs to save on transistors, there are plenty of them.

Perhaps you could check out Sun's Niagara (I think), it is pretty much what you are proposing. However, it is only targeted at servers, not suited for workstations.

Paul C. Anagnostopoulos
30th November 2003, 11:28 AM
Thirty-two bits were fun, but I say good riddance. If you can't store the number of pennies in the world economy in one word, what the hell good is it?

Of course, the burning question is: When do I get a machine with a quantum coprocessor?

~~ Paul

xouper
2nd December 2003, 06:03 PM
Why not just skip 64 bit cpus and go straight to 256 bits?

epepke
7th December 2003, 04:16 PM
Originally posted by Peskanov
I strongly disagree. In the eighties, there were easily more than 30 different architectures in the 8/16 bit market, plus dozens of CP/M machines derived from the same hacked code. Creating computers and developing software for them was never a problem; but conquering a market or finding a niche for them was a problem!

Have you seen the state of the world of embedded computing? You can find all flavours of CPUs, compilers, real time monitors and debuggers, assemblers, OSes, etc... Making an architecture "run" correctly is a piece of cake today, and was not so hard back then.

Do you think the IBM PC is the standard because its software "worked", and the competition's products did not?

I think you're misinterpreting the word "worked" here. Certainly there were plenty of nice architectures in the 80's. Including SIMD. The Connection Machine software was a wonder to behold. SIMD may be making a comeback, but only because of graphics cards. But most of these great architectures are dead.

80xxx/Pentium/Whatever survived and trounced just about everything else because of the low incremental cost of getting legacy software working on the next release. RISC chips are only hanging on, really, because of Apple.

corplinx
7th December 2003, 08:45 PM
Lessee, tons of 8 and 16 bit chips still in the embedded market. The embedded market is just now moving towards 32 bit.

Someone tell them they will be left behind!

Peskanov
8th December 2003, 09:15 AM
epepke,


I think you're misinterpreting the word "worked" here. Certainly there were plenty of nice architectures in the 80's. Including SIMD. The Connection Machine software was a wonder to behold. SIMD may be making a comeback, but only because of graphics cards. But most of these great architectures are dead.


The fact that nearly all 70's and 80's architectures died is not a clue about which ones were better or worse, IMO.
You can find defects in all of them, but the IBM PC XT and AT were a compendium of disasters, and that was not a problem.


80xxx/Pentium/Whatever survived and trounced just about everything else because of the low incremental cost of getting legacy software working on the next release. RISC chips are only hanging on, really, because of Apple.


I don't buy that. The IBM PC was a success from day one, especially with the XT models. There was no legacy software to use on these little monsters.
In the 80's these machines were expensive for people in the US, but in Europe they had absurd prices. Very, very expensive!
However, they sold. CP/M machines were more robust, cheaper, and had better business software (I have seen CP/M machines working in the 90's), but it did not matter.
The marketing muscle belonged to IBM, and they used it. I remember their huge ads in all the big newspapers. Nobody in the business could compete in prestige and marketing power. The rest is history.