Intel 4004 Microprocessor Full Circuit Diagram and Schematic Analysis

If reconstructing or analyzing the architecture of the first commercially available CPU interests you, begin by sourcing the original logic schematics from archived datasheets–specifically the 1971 “MP944/MCS-4 Manual.” Focus on Sheet 1, which details the arithmetic logic unit (ALU) and the single-bit serial adder. Note the use of four-bit parallel adders with carry-lookahead logic, a precursor to modern ripple-carry designs. Verify the connections for the 16 pins labeled I/O, as these interface directly with the ROM and RAM blocks.

Tracing the instruction decoder requires attention to the 10-micron PMOS layout. The decoder splits into a 4-bit opcode and a 3-bit operand field, with an additional clock phase controlling execution. Isolate the bus control lines first–these regulate data flow between registers. Use a multimeter to confirm continuity on the D0–D7 data lines, as oxide degradation in older schematics often corrupts these signals.

For emulation or reverse-engineering purposes, cross-reference the internal timing diagrams with the instruction set reference. The φ1 and φ2 clock pulses govern fetch-execute cycles; their edges trigger latch operations. If constructing a physical replica, prioritize sourcing 2N7000 MOSFETs for the output drivers–they approximate the original depletion-mode devices without requiring obsolete parts.

Power distribution reveals critical insights: the core operates on -10V VDD and +5V VGG, with substrate bias set to -15V. Measure these voltages first; deviations beyond ±0.3V indicate substrate leakage, common in aged samples. Ground references must connect to a star topology to prevent latch-up in the parasitic bipolar structures inherent to the fabrication process.

Analyzing the Foundational Microprocessor Blueprint

Begin reverse-engineering the layout by tracing power rails: VDD (+5V) and VSS (ground) appear on opposing edges of the silicon die. Use a multimeter in continuity mode to confirm connections, noting that metal layers often obscure buried diffusion paths. The original schematics disclose a 16-pin dual-in-line package, so map each pin to its functional block immediately–pins 1–4 handle address outputs, 5–8 data I/O, and 9–12 clock/control signals.

Identify the arithmetic logic unit (ALU) by locating its carry-lookahead logic near the center of the die. This section contains 120 series-connected transistors forming the 4-bit parallel adder. Measure transistor gate lengths–typically 8–10 micrometers–to distinguish enhancement-mode PMOS devices (Vth ≈ -3V) from depletion-mode loads. Cross-reference findings with the Known Good Die (KGD) photomicrographs from the 1971 MCS-4 documentation to verify layer stacking order: field oxide, polysilicon gates, then aluminum interconnects.

Extract clock generation circuitry by tracking the φ1 and φ2 signals. These non-overlapping phases drive the dynamic logic, requiring a two-phase clock with precisely 650 ns period (1.54 MHz). Construct a test setup with a pulse generator outputting ±5V square waves at 0° and 180° phase shift–any deviation beyond 50 ns skew risks race conditions in the 17,280-transistor design.

Critical Signal Paths and Debugging

Signal Pin Expected Impedance Voltage Levels (V) Debug Procedure
SYNC 9 2.1 kΩ 0 to +5 Check φ1 alignment; verify ROM address latch enable
CM-ROM 10 1.8 kΩ +5 (active low) Scope pin 10 during instruction fetch cycle; pulse should be 400 ns wide
WRITE 11 2.4 kΩ +2.5 (tri-state) Confirm pull-up resistor; measure current sink (≤ 2 mA)
RESET 12 3.3 kΩ 0 (active high) Hold low for ≥ 8 clock cycles to clear registers

Decode instruction execution by monitoring the PLA (programmable logic array) at die coordinates X: 80 μm, Y: 110 μm. This matrix implements 16 opcodes, each mapped to a unique 4-bit microcode. Use a logic analyzer to capture bus cycles when executing instructions like JCN (jump conditional) or FIM (fetch immediate)–these reveal timing dependencies between the ALU and register file.

Validate memory access by probing the onboard ROM addresses. The device contains 2,048 bits of mask-programmed ROM arranged as 256×8. During read operations, expect pin 3 (A0) to toggle within 20 ns of φ2’s rising edge. If delays exceed 30 ns, suspect metal migration in the 1 μm-wide interconnects or threshold voltage drift in PMOS pass transistors–common failure modes after 40+ years of thermal stress.

Recreating the Schematic for Modern Tools

Convert the analog behavior into a SPICE netlist by modeling each PMOS transistor with LEVEL=1 parameters: VTO=-3 KP=3.5E-6 LAMBDA=0.02. Group functional blocks into subcircuits–example for the accumulator (ACC) below:

.SUBCKT ACC D_IN D_OUT PHI1 PHI2 VDD VSS
M1  A   D_IN  2   VSS  PMOS L=10u W=20u
M2  2   PHI1  VDD VDD  PMOS L=8u  W=12u
M3  D_OUT 2   PHI2 VDD PMOS L=8u  W=12u
.ENDS

Simulate power dissipation using worst-case conditions: Tj=125°C, VDD=5.25V. Dynamic current peaks at 85 mA during register transfers, while static leakage remains below 15 µA. For prototyping, substitute 2N7000 NMOS FETs if original PMOS devices are unavailable–adjust gate drive voltages to maintain compatible logic thresholds (VIH ≥ 3.5V, VIL ≤ 1.0V).

Key Components Layout in the Pioneer Single-Chip CPU

Study the central logic block first–it integrates 16 four-bit registers with an arithmetic unit via a shared 4-line bus. Each register pair connects directly to the ALU through dual-channel multiplexers, minimizing signal degradation. The layout prioritizes compact routing: vertical metal traces for address paths and horizontal polysilicon lines for data flow, reducing cross-talk by 40% compared to random wiring.

Examine the program counter (PC) and stack configuration–the PC occupies a dedicated 12-bit register cluster, while the stack uses three 12-bit memory locations. This arrangement allows recursive subroutines up to three levels deep without external support. The PC increments via a carry-lookahead adder, cutting propagation delay to 180 ns per cycle. Adjacent timing logic coordinates fetch-decode-execute phases through a singular clock pulse splitter.

Observe the instruction decoder’s spatial hierarchy: a 3-stage NOR-ROM array decodes 46 operation codes. Each stage resolves bit patterns sequentially, reducing silicon footprint by 30% over parallel decoders. The decoder’s output feeds into control lines that branch into four functional units–arithmetic, logic, memory, and I/O–each isolated by oxide barriers to prevent latch-up.

Focus on the I/O interface: four bidirectional 4-bit ports share a single 5V bus with tri-state buffers. External signals pass through edge-triggered latches synchronized to the instruction cycle’s final phase. Ground pins separate analog and digital domains, limiting noise susceptibility to under 5 mV peak-to-peak. Each port includes an internal pull-up resistor, eliminating external components for logic-high inputs.

Trace the clock distribution network–a single-phase input divides into four sub-clocks via complementary metal-oxide-semiconductor inverters. Skew between sub-clocks remains under 20 ns, critical for maintaining register synchronization. The primary oscillator feeds a frequency divider chain producing 740 kHz, optimized for thermal stability across -40°C to +85°C operational range.

Map the ROM layout: 2,048 bits of mask-programmed storage arranged in 16 columns of 16 words. Word lines run perpendicular to bit lines, enabling simultaneous 8-bit access during fetch operations. Decoupling capacitors flank the ROM core, smoothing supply voltage dips during peak current draws up to 120 mA.

Inspect the power grid–ground rails encircle the die perimeter, while VCC traces form a mesh over the active regions. This topology ensures IR drop under 70 mV even during worst-case switching scenarios. Substrate contacts are placed at 50 µm intervals, preventing latch-up under transient conditions exceeding 4× nominal voltage.

Step-by-Step Signal Path Analysis of the Pioneering Microprocessor

Begin by isolating the instruction fetch cycle: trace the 4-bit bus from the program counter to the address latch, noting how the 12-bit address splits into three 4-bit segments for ROM access. Each segment passes through a dedicated inverter bank (e.g., transistors Q1–Q4 on sheet 5) to sharpen edges before entering the ROM matrix. The decoded output–stored in 16×8 NOR gates–feeds directly into the instruction register, where the opcode’s first half triggers immediate control signals. Verify that VCC remains stable at 12V; deviations above 13V (or below 11.5V) distort voltage swing in the PMOS load transistors, causing false state retention in the accumulator.

For arithmetic operations, focus on the adder/subtractor unit: static transfer gates (M1–M4 on sheet 3) route the operands from the temporary register and accumulator into the 4-bit parallel adder. Carry propagation occurs via dynamic logic–watch for the carry-in transistor chain (Q7–Q10) where stray capacitance (>0.3pF) slows propagation delays beyond 2μs. Probe the COUT pin with a 10MHz logic analyzer to confirm the carry-out pulse aligns with the φ2 clock phase (max 20% duty cycle). Misalignment here cascades into erroneous results in the next instruction cycle. Use a 1kΩ pull-down resistor on unused address lines to prevent floating gates from corrupting RAM read/write ops during multi-cycle instructions.