A one page CPU in SpinalHDL

FPGA’s are fun and interesting devices, sadly the usual ways of programming them, Verilog and VHDL, are tedious and annoying to work with. Sadly most FPGA tools support only these languages. Luckily a few projects exist to make a nicer and more fun language to work with, which are translated to either Verilog or VHDL so they can be used with all normal FPGA tools. One of these languages is SpinalHDL, a Scala based language that promises more abstraction, less boilerplate code and also has a good amount of already existing libraries. The only downside is the long name, SpinalHDL, so I’ll short it to Spinal for now :

To try out and learn Spinal a bit, I decided to make a CPU, an always popular project for FPGA’s. To make it a bit more interesting, I wanted to make a CPU which code fits on a single page of paper, inspired by the fun CPU’s made by Revaldinho, which can be found here. Those CPU’s also have the added rule of having a emulator and assembler also fit in a single page of paper, which I choose not to stick to.

Continue Reading

Getting an IBM AS/400 Midrange computer on the internet

Recently I’ve gotten a hold of an old IBM mid-range computer, an AS/400 150.
This is an 1997 server very much aimed at businesses, pay-rolling, inventory management and such. It can be used as a multi user system, with users logging in via a terminal.
The operating system it runs is OS/400 and that is also the only OS it can run, no Linux available for this system.
Of course it comes with all the fun programming languages like COBOL and RPG, all the business classics :)
It’s compatible with the IBM system/36, so any programs made for an 80’s S/36 machine run without problems on the AS/400 machines. It also looks very much 90s, though I personally like the cover at the back, hiding all ports.

Continue Reading

Hello World on the Lichee Tang RISC-V/FPGA board

A Chinese company recently launched a small dev board with some interesting features, it’s one of the very few boards with a RISC-V CPU, like the SiFive HiFive1, at least, I though. Unlike the HiFive1 board, the Lichee Tang doesn’t use a custom RISC-V microcontroller, instead it has a FPGA IC with a RISC-V core pre-loaded. This also means it’s possible to use it as an FPGA board, add custom things in the FPGA next to the CPU and so on. And this is just where the interesting tidbits start. Sadly, the documentation currently is mostly in Chinese. So let’s change that and make a hello world project. Meaning, get a toolchain up and running, recompile the FPGA project and load it all on the dev board. Don’t worry, I also created a virtual machine with all the work done already.

Continue Reading


Using GAls in the 21st century

A long time ago, 7400 logic was all that was available, but having only 7400 logic means designs can get big and difficult, but making a custom logic IC was expensive and hardly a possibility. A few companies started making programmable logic devices like the Programmable Logic Array (PLA) and Programmable Array Logic (PAL). These devices started being available in the 1970’s and where generally one time programmable devices, with the capability to contain a handful of simple logic. Much later, in 1985, the Generic Array Logic (GAL) came into existence, a reprogrammable version of the PAL. Later on the CPLD’s and FPGA’s came and are still used today.When reading about this I discovered there are still GAL’s made and sold, and getting old ones from Ebay is easy enough. They looked like weird odd devices, so I decided to have a closer look, eventually making a dual BCD to 7 segment decoder with them that even handled multiplexing.

Continue Reading


Programming and debugging the STM8S microcontrollers using open source tools

ST makes the very popular STM32 line of ARM Cortex M microcontrollers, which have a good amount of hobby and professional users, but ST also makes a lesser known family of microcontrollers, at least, lesser known except in China. The STM8S series are 8 bit microcontrollers with their own CPU core and peripherals that are fairly similar to the ones on STM32 devices. And they are used on a lot of cheap Chinese gadgets and DIY boards. Slowly some hobbyist interest also is starting to form and there is an Arduino like library available and good support in the interesting Forth language. Sadly, an C IDE with debugging support, like the STM32 system workbench is not available, at least, not for Linux. There is ST Visual Develop but it’s a bit crusty and Windows only.

The STM8 devices are cheap, really really cheap. But apart from being very cheap and in a lot of Chinese gadgets, for hobby use just grabbing an Arduino, STM32 or other mode popular board is a lot easier to get running and to get support. But if you want to try a new architecture, hack some Chinese gadget or need a microcontroller that is less then 0.25usd in bulk. please read on.

I already made a few makefile projects for the STM8S, but that still lacks debugging, so time to do it a bit better. And why not make everything work in Eclipse while we’re at it. This is all tested on Debian Stretch, I imagine it will work on any Debian based distro like Ubuntu or Mint and without too much changes on any other distro, but your mileage may vary.

As the SDCC compiler is unable to optimize unused functions, the makefile splits every C file from the peripheral drivers into a separate file and then compiles a library. This way the code size remains small. The downside is that stepping into a function like GPIO_WriteReverse does not work when debugging. As these drivers are provided by ST and probably tested, I don’t find this a problem, but good to keep this in mind.

Look at that fancy debugging support :D

Continue Reading


StupidCPU, the RAM

a.k.a. the most energy inefficient memory possible.
A computer needs RAM to function, the StupidCPU is no different.
I decided to design the RAM first as it’s a relative straitforward element and the first board also means designing an interface to hookup multiple boars together.
And multiple boards it will be. The design idea is to make everything on 10*10cm PCB’s as those are the cheapest to order from China and one single boards means more things to go wrong.
A backplane like structure will be used with multiple boards plugging in to the backplane to form a computer.
This blog post will discribe the backplane interface and the first board, the RAM board.

Continue Reading


Using FPGA’s for audio processing

FPGA’s are cool, Digital Signal Processing is cool and audio is a nice way to show it.

To get a bit better at working with FPGA’s and see if I remembered anything from DSP classes I started working on a project to combine those two. I had a board laying around with an I2S ADC/DAC that doesn’t need any configuration so the plan was to read in the audio data, process it and spit it back out. For the processing I choose to use the digital biquad filter, a fairly simple filter that can be used for a multitude of purposes. It can be used as a lowpass, highpass, bandpass, notch filter and even more. In the end I the result is that the FPGA reads in the audio data, can apply a maximum of 11 biquad filters (more are possible) and spits it back out. A computer program is made to calculate the filters:

A RIAA filter, using 3 of the 11 possible biquad filters.

Continue Reading


More ECL logic ramblings

In the previous post I talked about the quirks and weirdness of ECL gates, Now it’s time to design some more logic with it.

With ECL it is easy to make a NOR/OR gate or D latch, making an AND/NAND gate is a lot more difficult, and usually those are made up from a few NOR/OR gates. This makes logic design a bit interesting, as everything has to be made using NOR/OR gates. As an added bonus, all signals are differential and all gates have an differential output. As an example, the classic 2 input MUX is usually designed like this:

Now lets turn it into an ECL version, using as few gates as possible. The first thing to know is that every AND can be turned into an OR with an inverter added to all in and outputs. Vice versa, an OR can be turned into an AND with inverters at all in and outputs. Optimizing designs like this is called bubble pushing, as it adds inverters, which are indicated by a circle, or bubble, on all in/outputs. So let’s get rid of the ANDs first.

Time to get rid of as many inverters, or bubbles, as possible. First of all, the NOT gate and bubble on the input it’s going to is unnecessary and can be removed. As ECL has differential signals, there also is an inverted version of S available, removing the bubble on the top NOR. The schematic now looks like this:

Just 2 bubbles left on the inputs for A and B. Luckily, these are also easy to solve. Removing both the bubbles would invert the output of each NOR, in term inverting the output of the MUX. But with ECL, every gate has an inverted output anyways, solving the problem. The final schematic looks like this:

There we go, an NOR based inverter, using just 3 gates, assuming a differential signal is used for the S input. For completion, a 4 input MUX would like like this:

 

A D FLIP-FLOP

Well, that’s a MUX solved, but flip-flops are also very handy. Luckily turning an D latch into a D flip-flop is simple, just use two and an inverter, or with ECL latches, turn the differential clock signal around.


ECL logic ramblings

ECL logic, or emitter coupled logic is a family of very high speed logic. with ECL logic, a transistor is never completely on or off, meaning the transistor is never saturated, making it very fast. It also makes it weird and quirky, interesting enough to have a look at. First of all, ECL logic generally works on -5V instead of 5V, though later it was also available using a more normal 5V power. Second of all, the logic level is not 0V and -5V, but roughly -0.8V and -1.6V. Third, most ECL logic has 2 outputs, a normal and an inverting one. This can be used to make logic design easier or to have an differential pair output for noise immunity. ECL was used in multiple well known designs, for example the Cray-1 supercomputer used ECL.

All in all, interesting devices. The easiest way to experiment with them is by buying some logic devices. ON semiconductors still makes them, the NC10EP01 for example is a single gate OR/NOR that can be used with 5V or -5V as power supply. It’s a very fast device, reaching over 3Ghz switching speeds. The datasheet can be viewed here.

But where is the fun in that, it’s much more fun to make a few gates. The schematic for a NOR gate is easy to find, it’s even listed on the Wikipedia page about ECL. I changed the schematic a bit to make a 4 input NOR. I also made an SR latch based on the design found here. The schematic for the NOR gate is:

Continue Reading


The StupidCPU, an 8 BIT 7400 computer, part 1, the architecture.

Another day, another silly DIY computer.
Making a computer from scratch is a popular project, whether it’s done with an FPGA, an older CPU like the 6502, discrete logic like the 7400 series or even with just a big bag of transistors, so time to give it a go as well.
A good couple of years ago I designed a very simple ALU as a school project, but never did anything with it, so time to change it.
As it’s a horribly inefficient 8 bit CPU, I’ll call it the StupidCPU.

The StupidCPU is an 8 bit CPU, with 16 bit data buses, one data bus for executable code and one for RAM, making it a Harvard architecture CPU.
All peripherals are memory mapped as RAM.
As an 8 bit CPU has, well, 8 bits, some trickery is needed for having 16 bit buses. the RAM is accessed using bank switching and the program counter is 16 bits for accessing the executable code.
Every instruction is 12 bits wide and there is a maximum of 16 instructions, they all follow the following format:

With a maximum of 16 instructions, 4 bits are needed to select the instruction. The other 8 form an immediate value or an address for the RAM.
To keep everything simple, the CPU has 1 usable register, called the ACC register. As it can directly read and write to 256 RAM bytes, those are used as a kind of registers in the way a normal RISC CPU uses registers.
The StupidCPU also has 2 status bits, carry and zero. Carry is high when an overflow occurs and zero when the ACC register is zero.
The high level architecture looks like this:

Instructions:

At the moment the instruction set is as follows:

NOR:
The NOR instruction executes a logic NOR on the ACC and the RAM value at the selected address, the result is stored in ACC.

NORI:
The NORI instruction executes a logic NOR on the ACC and the immediate value, the result is stored in ACC.

AND:
The AND instruction executes a logic AND on the ACC and the RAM value at the selected address, the result is stored in ACC.

ANDI:
The ANDI instruction executes a logic AND on the ACC and the immediate value, the result is stored in ACC.

ADD:
The ADD instruction adds ACC and the RAM value at the selected address, the result is stored in ACC.

ADDI:
The ADDI instruction adds ACC and the immediate value, the result is stored in ACC.

SHR
Logical shift ACC right 1 bit, store the result in ACC.

SHL
Logical shift ACC left 1 bit, store the result in ACC.

STA:
Store the ACC into RAM at the selected address.

JCC:
If carry is 0 or zero is 1, set the program counters lowest 8 bits to the immediate value.

JPCC:
If carry is 0 or zero is 1, set the program counter lowest 8 bits to the value in RAM at the selected address.

LPC:
Set the program counters lowest 8 bits to the immediate value

LTPC:
Set the program counters highest 8 bits to the immediate value, the next time the lowest 8 bits are changed. This means that after executing the LTPC instruction, the PC changed the next time an LPC instruction or a JCC/JPCC instruction with the carry at 0 or zero flag at 1 is executed.

STRAM:
Select the top RAM bank to use

This brings the total to 14 instructions out of the maximum of 16, so there is a bit of space left.
As this is a very limited set, some common macro’s are available, the assembler will translate them to the correct machine code:

NOP:
No operation.
Implementation:
Addi 0

NOT:
Invert ACC.
Implementation:
NORI 0

SUB:
Subtract ACC from the RAM value at the selected address, the result is stored in ACC.
Implementation:
NOT
ADD memaddr
ANDI 255
ADDI 1

SUBI:
Subtract ACC from the immediate value, the result is stored in ACC.
Implementation:
NOT
ADDI immediate
ANDI 255
ADDI 1

CLR:
Clear ACC
Implementation:
NORI 255

LDA:
Load an immediate value into ACC.
Implementation:
CLR
ADDI immediate

LDM:
Load a value from RAM into ACC.
Implementation:
CLR
ADD memaddr

MOVE:
Move a value from RAM to a different place in RAM
Implementation:
CLR
ADD srcaddr
STA destaddr

STORE:
Store an immediate value to RAM
Implementation:
CLR
ADDI immediate
STA destaddr

RAM banks:

The way the RAM works required some extra explanation, as it’s a bit unusual.
As the StupidCPU is an 8 bit CPU, it can access a maximum of 256 bytes of RAM, which is a bit low. To give it access to more RAM, bank switching is used.
With bank switching there are multiple banks of RAM, for example 4 banks of 256 bytes of RAM. The CPU can access one bank at a time, being able to select the accessible bank using a special instruction.

There is one problem for the StupidCPU, it has only 1 register. When the bank is switched, all previously accessible variables are gone. To counteract this, the StupicCPU has banks of 128 bytes, and only the highest 128 bytes of RAM are switched.
This way, the lowest 128 bytes of RAM are always accessible, the only downside is that with an 16 bit RAM bus the maximum amount of accessible RAM is 32K.

All peripherals are memory mapped, for example, an GPIO peripheral would be selected as a top bank and can then be written to/read from like RAM.

 

That’s it for now, the next blog post will go into some more design details.

Credit where credit is due:

The images made for the instructions where made with bitfield and the other images with yEd


Pages:12345