View Full Version : Old Computer Architectures


a_unique_person
20th September 2006, 04:59 AM
There was one design that I thought was really radical, but before its time, as the technology of the day would not have been able to do it justice. These days, however, I think it would be perfect.

That is the idea behind the TMS 9900. Instead of screwing around with registers, just do away with them altogether. Registers were really just a manual cache, where you could hold data for working on, rather than having to perform a very slow read/write to memory.

With the 9900, everything is a memory/memory operation. The only on-chip registers are a program counter, a workspace pointer and a status register. At the time, it must have slowed things down a lot. These days, with virtual registers and cache, you could have what is effectively a 'level 0' cache that holds the results of the last few operations. Problem solved about registers (although current technology almost makes them virtual anyway).

MRC_Hans
20th September 2006, 05:12 AM
The main purpose of registers is speed. Registers, being on the CPU chip, running at CPU clock are way faster than memory. And, of course, the control of them is micro-coded, another speed advantage. Declaring the right variables "register" in C can really speed up your program.

Another architecture of the past (I forget its type name) had the stack as CPU registers. That would make for some speedy branching. Of course, this sets a limit to stack depth that will seriously cramp the style of most compilers.

Hans

a_unique_person
20th September 2006, 05:56 AM
That's the whole point, 'register' was a clumsy hack, at best. No registers is the way to go.

MRC_Hans
20th September 2006, 05:58 AM
Anytime external memory gets fast enough, yes.

Hans

a_unique_person
20th September 2006, 07:24 AM
Which is what you can do with the technology available these days. Use registers, but use them as 'virtual' memory (not virtual in the traditional sense of the term.) Registers will always be faster, but add a lot of complexity to compiler optimisation, etc.

Paul C. Anagnostopoulos
20th September 2006, 07:52 AM
Another purpose of registers is to have a small address space that can be encoded in the instruction in only a few bits.

~~ Paul

a_unique_person
21st September 2006, 01:41 AM
That would be where a stack would be useful. Short-term data would all be addressed on the stack; everything else would use the full address.

BobK
21st September 2006, 05:43 AM
I'm no expert, but here's the way I understand it.

Access of data stored in registers on the CPU which operate at full CPU speed is faster than retrieval of that data by way of the bus which operates at a much slower speed. Typically several times faster.

Ducky
21st September 2006, 06:27 AM
hey betamax was pretty good too...:rolleyes:

kevin
21st September 2006, 09:22 AM
It's been a long time since I looked at assembler (6502 Assembly baby!) but I thought a lot of instructions had particular registers implicit in the instruction. Something like add register A to register B. With this, wouldn't you need to specify the memory locations?

I guess you could go the way of Forth where there are no variables, just the stack (virtual in the case of this processor) and have assembler instructions that work on the top X items of the stack.
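To make the Forth idea concrete, here is a minimal sketch of a machine whose arithmetic instructions name no operands at all and simply work on the top two items of the stack. The `run` function and the tiny instruction set are made up for illustration; this is not real Forth.

```python
import operator

# Hypothetical mini instruction set: each operator pops two items and pushes one.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run(program: str) -> list[int]:
    """Execute a whitespace-separated postfix program and return the final stack."""
    stack: list[int] = []
    for token in program.split():
        if token in OPS:
            b = stack.pop()  # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(OPS[token](a, b))
        else:
            stack.append(int(token))  # anything else is a literal to push
    return stack

print(run("2 3 + 4 *"))  # (2 + 3) * 4
```

Note that none of the arithmetic instructions carries an address; the operands are implicit, which is the same trick the 6502 plays with its accumulator.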

Almo
21st September 2006, 09:30 AM
6502! Boo-ya!

rockoon
21st September 2006, 01:14 PM
Modern L0 caches *are* essentially registers in that L0 cache memory performs no worse than registers with regard to fetching and storing machine words.

Paul C. Anagnostopoulos has hit the correct issue on the head.

Encoding a reference to (for instance) 1 of 16 registers requires 4 bits, so an instruction that references two registers requires 2*4 = 8 bits of overhead for the referencing.

Encoding a reference to (for instance) 1 of 4294967296 memory locations (4 gigabytes) requires 32 bits, so an instruction that references two memory locations requires 2*32 = 64 bits of overhead for referencing.

This 'overhead' comes into play in several ways - Program code would simply be larger if everything were treated as general purpose memory, and this affects the efficiency of the instruction decode pipeline when the decode pipeline needs to be flushed (due to a branch misprediction) and has to start from scratch.
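The arithmetic above can be checked with a few lines. This is only a sketch; `operand_bits` is a made-up helper name, and it counts nothing but the bits needed to name the operands, ignoring opcodes and any real instruction format.

```python
def operand_bits(num_locations: int, operands: int = 2) -> int:
    """Bits needed to name `operands` operands, each one of `num_locations` places."""
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1, without floating point.
    bits_per_operand = (num_locations - 1).bit_length()
    return operands * bits_per_operand

print(operand_bits(16))       # two references into a 16-register file
print(operand_bits(2 ** 32))  # two references into a 4 GB address space
```

With two operands the register form needs 8 bits and the flat-memory form needs 64, matching the figures in the post.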

snooziums
21st September 2006, 01:29 PM
hey betamax was pretty good too.

Actually, it was better quality. However, the tapes had a shorter recording time. But, the thing that killed it off was that Sony refused to license Beta without licensing fees. If Sony never imposed those fees, VHS would have never been created in response.

(6502 Assembly baby!) ... Something like add register A to register B.

Actually, the 6502 had only one general-purpose register, register "A." However, one could use the Index X and Index Y registers as other registers (giving up some memory addressing modes).

6502! Boo-ya!

Actually, the 6502 was one of the most popular processors ever made. But the most popular device that used it... the Nintendo video game system.

But back to the subject, registers are still much faster than memory. Internal registers operate at the CPU speed, whereas memory just about never does.

Zombified
21st September 2006, 02:09 PM
Even ignoring video game systems, the 6502 was probably the most popular 8-bit processor in personal computers; it appeared in Apples, Commodores, Ataris and a bunch of lesser known systems. In some cases minor variations were used such as the 6510 in the Commodore 64, but that's a trivial variation.

The Z80 would give it a good run for its money, though, appearing in almost all the remaining systems: TRS-80s and most CP/M machines, although the latter tended to be business computers. The original 8080 and variants were used a lot less than the Z80.

A Z80 variant was used in some Nintendo systems such as the original Gameboy, so I'm not sure that if you include game systems the 6502 necessarily comes out on top, though.

Arguably, the 6502 treated memory partly as a register array, as a number of direct addressing modes used one-byte addresses for the lowest 256 bytes of memory.

Some microcontrollers like the 8051 family do something similar, with a 64KB address space where the first 256 bytes have special significance, and in the case of the 8051 those bytes of memory are implemented in the processor and don't require an external bus access.
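As a rough illustration of the 6502 zero-page point, the sketch below encodes an LDA (load accumulator) instruction and picks the shorter zero-page form when the address fits in one byte. The opcodes 0xA5 (zero page) and 0xAD (absolute) are the real 6502 ones; the `encode_lda` helper itself is hypothetical, and a real assembler handles far more cases.

```python
LDA_ZERO_PAGE = 0xA5  # opcode followed by a one-byte address
LDA_ABSOLUTE = 0xAD   # opcode followed by a two-byte little-endian address

def encode_lda(address: int) -> bytes:
    """Encode 'LDA address', using the zero-page form for the lowest 256 bytes."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("6502 addresses are 16 bits")
    if address <= 0xFF:
        return bytes([LDA_ZERO_PAGE, address])
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])

print(encode_lda(0x0010).hex())  # 2-byte instruction
print(encode_lda(0x1234).hex())  # 3-byte instruction
```

The zero-page form saves a byte per instruction, which is the same short-address benefit that a small register file gives.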

Paul C. Anagnostopoulos
21st September 2006, 03:43 PM
Holy cow! Look what I found today:

http://www.bitsavers.org/pdf/Index.txt

~~ Paul

Zombified
21st September 2006, 04:02 PM
Holy cow! Look what I found today:

http://www.bitsavers.org/pdf/Index.txt

~~ Paul
That's awesome.

The fact that it's an unhotlinked text file, requiring one to type the filename into one's browser, is the oldschool icing on the cake. ;)

Paul C. Anagnostopoulos
21st September 2006, 07:16 PM
Yes, I liked that, too.

I noticed a decided lack of GE computer manuals.

~~ Paul

Ducky
21st September 2006, 07:25 PM
Actually, it was better quality. However, the tapes had a shorter recording time. But, the thing that killed it off was that Sony refused to license Beta without licensing fees. If Sony never imposed those fees, VHS would have never been created in response.



Yes I am aware of this.

There was a subtle comment on the current discussion in that joke that I think you may have missed.

a_unique_person
21st September 2006, 08:00 PM
Modern L0 caches *are* essentially registers in that L0 cache memory performs no worse than registers with regard to fetching and storing machine words.

Paul C. Anagnostopoulos has hit the correct issue on the head.

Encoding a reference to (for instance) 1 of 16 registers requires 4 bits, so an instruction that references two registers requires 2*4 = 8 bits of overhead for the referencing.

Encoding a reference to (for instance) 1 of 4294967296 memory locations (4 gigabytes) requires 32 bits, so an instruction that references two memory locations requires 2*32 = 64 bits of overhead for referencing.

This 'overhead' comes into play in several ways - Program code would simply be larger if everything were treated as general purpose memory, and this affects the efficiency of the instruction decode pipeline when the decode pipeline needs to be flushed (due to a branch misprediction) and has to start from scratch.

Registers are only used for short term memory addressing, hence the use of a stack for any quick memory addressing. Might work.

rockoon
21st September 2006, 09:46 PM
Registers are only used for short term memory addressing, hence the use of a stack for any quick memory addressing. Might work.

Of course, the top of the stack on desktops (Intel/AMD) is almost always in the L0 cache, and on those rare times that it isn't, it is almost always in the L1/L2 cache.

Further, the stack already has special addressing modes as a displacement from the current stack pointer as an 8-bit, 16-bit, or 32-bit signed value. Also, a special addressing mode dealing with a SECOND (implicit) displacement (the EBP register) from the stack pointer (the ESP register) is there.

And the most important point is that on the x86 derivatives, the stack has absolutely no implicit size. It's up to the software to decide that sort of thing (i.e., the be-all-end-all generic solution).

Off the top of my head, the current x86 has these basic addressing schemes:

[REGISTER]
[ABSOLUTE ADDRESS]

[REGISTER + REGISTER]
[REGISTER + 8-BIT OFFSET]
[REGISTER + 16-BIT OFFSET]
[REGISTER + 32-BIT OFFSET]

[REGISTER + REGISTER + 8-BIT OFFSET]
[REGISTER + REGISTER + 16-BIT OFFSET]
[REGISTER + REGISTER + 32-BIT OFFSET]

[REGISTER * 2-BIT SCALE]
[REGISTER * 2-BIT SCALE + REGISTER]
[REGISTER * 2-BIT SCALE + 8-BIT OFFSET]
[REGISTER * 2-BIT SCALE + 16-BIT OFFSET]
[REGISTER * 2-BIT SCALE + 32-BIT OFFSET]

[STACK POINTER]
[STACK POINTER + 8-BIT OFFSET]
[STACK POINTER + 16-BIT OFFSET]
[STACK POINTER + 32-BIT OFFSET]

[STACK POINTER + BASE POINTER]
[STACK POINTER + BASE POINTER + 8-BIT OFFSET]
[STACK POINTER + BASE POINTER + 16-BIT OFFSET]
[STACK POINTER + BASE POINTER + 32-BIT OFFSET]

I might have added a few that don't really exist (for example, in a few cases there might be a full prefix byte before the instruction to select 16-bit when in 32-bit mode .. I can't remember if that applies to addressing or not) .. also I might have left out a few (I sort of have a life)

And I know I left out the fact that all of these can be prefixed by a segment/selector override (most instructions presume the DS segment/selector, while the ES, FS, GS, and SS selectors are also an option with the proper override prefix).

I don't see how a registerless system can mimic all these addressing modes without A) using more than one instruction to do what currently can be done with one instruction, or B) being just as complex as the current system, in which case it's just a transformation of the current situation. :eye-poppi
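Most of the schemes in that list collapse into one formula, base + index * scale + displacement. A minimal sketch follows, assuming 32-bit wraparound and ignoring segments and the encoding restrictions real x86 imposes; `effective_address` is an illustrative name, not anything from a real toolchain.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      disp: int = 0, bits: int = 32) -> int:
    """Compute base + index*scale + disp, wrapped to `bits` bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factors are 1, 2, 4 or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)

# [REGISTER + REGISTER * SCALE + 8-BIT OFFSET], e.g. the fourth 4-byte element
# of an array that starts 8 bytes below the address held in the base register.
print(hex(effective_address(base=0x1000, index=3, scale=4, disp=-8)))
```

A registerless design would need some other place to keep `base` and `index`, which is roughly the objection being made here.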

Edited to add:

On top of it all, under the hood a modern CISC processor is really a RISC processor. The transformation level is virtually 'free' in terms of performance because the transformations take place on separate circuitry in parallel while previously transformed instructions are being executed. The only time it matters is when the instruction pipeline has to be flushed, and in those cases the penalty is already so large (think of an 8-man bucket brigade) that the extra clock cycle (using 8 instead of 7 men) to do the transform is only a minor issue. Ideally you never flush the instruction pipeline, and that's the real solution.

Rob Lister
21st September 2006, 10:16 PM
Gosh, this thread brought back memories.

My first machine language was 8088. I liked it (I used to be weird that way) but I always thought it sad that there were not more registers with which to work.

From (very long term) memory, which oft fails

Code segment, Data Segment, Extra Segment, ...there's another I forget
AX, BX, CX (doubled as a counter) and DX. (I thought an additional four would have made my life easier)
Oh, and the Instruction Pointer

I know I'm forgetting several. It was fun as I remember it. I never got very good at it but I had a good time trying.

a_unique_person
21st September 2006, 10:40 PM
Of course, the top of the stack on desktops (Intel/AMD) is almost always in the L0 cache, and on those rare times that it isn't, it is almost always in the L1/L2 cache.

Further, the stack already has special addressing modes as a displacement from the current stack pointer as an 8-bit, 16-bit, or 32-bit signed value. Also, a special addressing mode dealing with a SECOND (implicit) displacement (the EBP register) from the stack pointer (the ESP register) is there.

And the most important point is that on the x86 derivatives, the stack has absolutely no implicit size. It's up to the software to decide that sort of thing (i.e., the be-all-end-all generic solution).

Off the top of my head, the current x86 has these basic addressing schemes:

....
And I know I left out the fact that all of these can be prefixed by a segment/selector override (most instructions presume the DS segment/selector, while the ES, FS, GS, and SS selectors are also an option with the proper override prefix).

I don't see how a registerless system can mimic all these addressing modes without A) using more than one instruction to do what currently can be done with one instruction, or B) being just as complex as the current system, in which case it's just a transformation of the current situation. :eye-poppi

Edited to add:

On top of it all, under the hood a modern CISC processor is really a RISC processor. The transformation level is virtually 'free' in terms of performance because the transformations take place on separate circuitry in parallel while previously transformed instructions are being executed. The only time it matters is when the instruction pipeline has to be flushed, and in those cases the penalty is already so large (think of an 8-man bucket brigade) that the extra clock cycle (using 8 instead of 7 men) to do the transform is only a minor issue. Ideally you never flush the instruction pipeline, and that's the real solution.

The big RISC thing was that all addressing was just direct to memory, as the many addressing modes of an architecture like the x86 and 68xxx are just not used often enough to make the overhead of having them and the chip real estate worthwhile. RISC then went to the other extreme, of having lots of registers, which were just as hard to manage as redundant addressing modes. The register-free architecture would actually have registers, or a level 0 cache, but these would be purely the top of the stack, say. Addressing to the stack could be made quite flexible, with offsets hard-coded in the instruction, say 8 bits, which would give you quick access to the top 256 words of the stack, using 256 registers.

Zombified
21st September 2006, 10:51 PM
Gosh, this thread brought back memories.

My first machine language was 8088. I liked it (I used to be weird that way) but I always thought it sad that there were not more registers with which to work.

From (very long term) memory, which oft fails

Code segment, Data Segment, Extra Segment, ...there's another I forget
AX, BX, CX (doubled as a counter) and DX. (I thought an additional four would have made my life easier)
Oh, and the Instruction Pointer

I know I'm forgetting several. It was fun as I remember it. I never got very good at it but I had a good time trying.
There were also SS (stack segment), SP (stack pointer), BP (base pointer), SI (source index) and DI (destination index).

AX-DX were 16-bit compositions of 8-bit pairs (e.g. AH & AL), the others were 16-bit only.
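For anyone who never met the AH/AL arrangement, it amounts to viewing one 16-bit value as a high byte and a low byte. A small sketch of that packing follows; the function names `split16` and `join16` are made up for the example.

```python
def split16(ax: int) -> tuple[int, int]:
    """Return (AH, AL): the high and low bytes of a 16-bit value."""
    return (ax >> 8) & 0xFF, ax & 0xFF

def join16(ah: int, al: int) -> int:
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)

ah, al = split16(0x1234)
print(hex(ah), hex(al), hex(join16(ah, al)))
```

On the real chip no shifting happens, of course; AH and AL are just names for the two halves of the same storage.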

In compiler generated code and some assembly-language, BP usually pointed to a function's call frame, so regardless of how much gunk you pushed on the stack, you could always find parameters and local variables relative to BP. Also tremendously useful for debugging when trying to figure out where the hell you were. SI and DI were used for indirect addressing modes in the data segment (as was BX; BP defaulted to the stack segment) as well as "string" operations like movs and scas.

I did a lot of x86 assembly language, and it's stuck in there for good unless I get some serious brain damage.

a_unique_person
22nd September 2006, 12:48 AM
I read Adam Osborne's books on microprocessor architecture from cover to cover, and the only one that stood out in my mind as being insanely stupid was the 8086. Just about everything else had some sort of logic or elegance to it. 68000, TMS9900, PDP-11, 8080, 6502, etc.

rockoon
22nd September 2006, 03:00 AM
The register-free architecture would actually have registers, or a level 0 cache, but these would be purely the top of the stack, say. Addressing to the stack could be made quite flexible, with offsets hard-coded in the instruction, say 8 bits, which would give you quick access to the top 256 words of the stack, using 256 registers.

I disagree that an 8-bit offset should give access to the top 256 machine words, because that's too restrictive and it is a built-in backward compatibility bomb. It should give access to the top 256 items of the fundamental memory unit, which is the machine byte (8-bit on most processors, although there are exceptions such as the PDP).

Machine words have changed size three times (8 -> 16 -> 32 -> 64) in the last 30 years and there is no reason to believe that this trend won't continue.
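To put numbers on the byte-versus-word question, here is a small sketch of how many word-sized stack slots an 8-bit offset can reach under each interpretation. The helper `reachable_words` is hypothetical and assumes naturally aligned words.

```python
def reachable_words(offset_bits: int, word_bytes: int, offset_unit: str = "byte") -> int:
    """Number of whole machine words an instruction's stack offset can name."""
    slots = 1 << offset_bits
    if offset_unit == "word":
        return slots            # each offset value names one word
    return slots // word_bytes  # each offset value names one byte

for word_bytes in (1, 2, 4, 8):
    print(word_bytes * 8, "bit words:",
          reachable_words(8, word_bytes), "by byte offset,",
          reachable_words(8, word_bytes, "word"), "by word offset")
```

Byte offsets keep their meaning when the word size changes, but the number of reachable words halves with each doubling; word offsets always reach 256 words, but their meaning is tied to one word size.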

Now consider this situation:

The processor, due to its deep pipeline, could have 7 or 8 instructions "in transit" all somewhere between being fetched to being finalized.

Now imagine that one of those instructions makes a change to the stack pointer!!

Is your head spinning yet?

Edited to add:

Intel is pushing hard to move away from the stack-based FPU on its machines, recommending that compilers use the SSE registers and instructions for floating point work (SSE isn't just a SIMD extension).

a_unique_person
22nd September 2006, 05:18 AM
I was basing the 8 bit idea on the number of registers in RISC type computers.

As AMD says, and Intel now agrees, wide, not deep.

My head has been spinning ever since I started reading Adam Osborne's books.

Zombified
22nd September 2006, 02:24 PM
Although 8 and 16 bits are pretty small for integer sizes, the main motivation to go 16->32->64 bits is pointer size, not integer size. 4GB of memory is too small for some applications, as was 64KB (or various other hacks) in previous generations.

Paul C. Anagnostopoulos
22nd September 2006, 05:49 PM
Although 64 bits are nice when you want to represent an arbitrary amount of money.

~~ Paul

rockoon
22nd September 2006, 09:30 PM
Although 8 and 16 bits are pretty small for integer sizes, the main motivation to go 16->32->64 bits is pointer size, not integer size. 4GB of memory is too small for some applications, as was 64KB (or various other hacks) in previous generations.

There are plenty of other advantages to 64-bit integers, and by my estimation those advantages are more widely leveraged than addressing beyond 4GB. For example, one 64-bit integer can hold eight 8-bit integers and these can be manipulated in parallel (similar to SIMD, except not explicit). On top of that, modern machines can execute multiple integer instructions per cycle so you can actually manipulate sixteen or even twenty-four 8-bit integers simultaneously. Completely blows MMX out of the water.
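A minimal sketch of that trick: adding eight packed 8-bit integers with ordinary 64-bit integer operations, each lane wrapping modulo 256. It uses one well-known masking technique, not necessarily what any particular compiler or programmer does, and the names `pack`, `unpack` and `swar_add_bytes` are made up. Python integers are unbounded, so the result is masked to 64 bits to imitate a machine register.

```python
MASK64 = (1 << 64) - 1
HIGH_BITS = 0x8080808080808080  # top bit of every byte lane
LOW_BITS = 0x7F7F7F7F7F7F7F7F   # low seven bits of every byte lane

def pack(lanes: list[int]) -> int:
    """Pack eight byte values into one 64-bit integer, lane 0 lowest."""
    return int.from_bytes(bytes(lanes), "little")

def unpack(word: int) -> list[int]:
    """Split a 64-bit integer back into its eight byte lanes."""
    return list(word.to_bytes(8, "little"))

def swar_add_bytes(x: int, y: int) -> int:
    """Add eight byte lanes at once; each lane wraps modulo 256."""
    # Add the low 7 bits of each lane; the carry lands in that lane's own top bit.
    low = (x & LOW_BITS) + (y & LOW_BITS)
    # Fold in the top bits with XOR so nothing carries into the neighbouring lane.
    return (low ^ ((x ^ y) & HIGH_BITS)) & MASK64

a = pack([1, 2, 3, 4, 250, 6, 7, 255])
b = pack([1, 1, 1, 1, 10, 1, 1, 1])
print(unpack(swar_add_bytes(a, b)))
```

The masking is the price of doing this without explicit SIMD instructions: a plain 64-bit add would let carries leak from one lane into the next.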

Now jump up to 128-bit integers... 32 to 48 bytes processed per clock cycle! I drool for that day and I suspect so do many low level programmers.