RISC-V 双周简报 (2018-09-01)

要点新闻：

darkriscv: 仅用一晚上就实现的RISC-V处理器
中天微发布全球首款支持物联网安全的RISC-V处理器

RV新闻

darkriscv: 仅用一晚上就实现的RISC-V处理器

高人Marcelo Samsoniuk在github上以BSD License发布了其一个晚上设计出的RISC-V处理器，足以说明了标准本身的精简。

The main problem around the picorv32 is that most instructions requires 3 or 4 clocks per instruction, which resembles the 68020 in some ways, but running at 150MHz. Anyway, with 3 clocks per instruction, the peak performance is around 50MIPS only. As long I had some good experience with experimental RISC cores, I started code the darkriscv only to check the level of complexity. For my surprise, in the first night I mapped almost all instructions of the RV32I specification and the darkriscv started to execute the first instructions correctly at 75MHz and with one clock per instruction, which resembles a fast and nice 68040! wow! :)

Github Repo: darklife/darkriscv

中天微发布全球首款支持物联网安全的RISC-V处理器

目前是Alibaba旗下子公司的杭州中天微，近日发布了其面向物联网安全的RISC-V处理器，这款处理器可选的支持TEE环境，核心很小很精简。

技术特征：

RISC-V全兼容，RV32ECM指令集
16个32-bit GPR (E-ext)
只支持M-Mode
2级精简流水线，IF/EX
哈佛总线架构
紧耦合IP，包括计时器、矢量中断控制器组件
极简调试模块，支持片上硬件调试
可配置指令cache
可配置TEE引擎

技术优势：

成熟的扩展指令集
成熟精简的架构
支持TEE安全引擎
成熟稳定的工具链及开发工具
软件自主生态支持
强大的国内自主研发团队
高效的技术支持服务
具有大规模量产经验
DesignKit SoC参考设计平台
成熟可靠的仿真验证环境

Link: 中天微发布全球首款支持物联网安全的RISC-V处理器

谷歌计划在其下一代的开源硬件安全模块中采用RISC-V处理器

微软和谷歌最近都开始规划从芯片级别提高其服务的安全性。在最近的Hot Chips大会上，Spectre/Meltdown依然是关注的热点，而且目前并没有非常好的解决方案。

谷歌之前公布了其Titan安全芯片，以此提升其服务的安全性；同时谷歌也宣布在明年的Titan芯片中将会采用RISC-V处理器，并且可能会是一个开源实现。

Google is forming an industry group to launch, probably next year, work on an open source implementation of Titan, possibly based on a 32-bit version of a RISC-V core. The open variant is geared for broad use in any embedded or consumer product.

小编提示：目前来看，Titan解决的并不是Spectre/Meltdown这类的问题，而解决可信启动和安全认证的解决方案，代替的是过去不开源的TPM；而Titan Security Key则是类似Yubikey的方案，这里比较容易混淆。

Link: Microsoft and Google Planning Silicon-Level Security

技术讨论

为什么RISC-V需要mscratch寄存器

Pierre G.在sw-dev上提问为什么在压栈和出栈都需要借助mscratch寄存器，是不是强制需要的? 在Priviledged ISA specification中mscratch寄存器在描述的作用是：

The mscratch register is an XLEN-bit read/write register dedicated for use by machine mode. Typically, it is used to hold a pointer to a machine-mode hart-local context space and swapped with a user register upon entry to an M-mode trap handler.

Samuel Falvo II和Michael Clark做了回答

Samuel Falvo II 的回答:

The problem is that the stack pointer represents the value of SP as it was just before the trap occurred. So if you’re implementing an interrupt handler, and interrupts are vectored, then sure, you can make the reasonable argument that SP is valid, and you can probably use it.

However, if you’re handling interrupts in a non-vectored manner, OR, if you’re handling a trap such as a page fault or other such thing, can you really trust the value of SP to be valid? Consider, maybe the SP register got corrupted and now contains an odd address; or, maybe it points into ROM; etc. OR, maybe the SP register is completely valid, but now points into a page which is not mapped (e.g., you’ve exceeded the OS-allocated stack space, and now a page fault handler has to expand the stack by mapping a new page). In pretty much any of these circumstances, a reference through SP will incur another trap, and will do so at precisely the worst possible time – between the trap having been taken and the time when user-state has been preserved for later restoration.

The use of the scratch register is important because it’s the only way you can statically guarantee a buffer large enough to hold user state while taking a trap and figuring out what to do about it. Maybe, after having decided it was a plain interrupt, you switch over to using the user-mode stack. Or, maybe for a page fault, you need to put the thread to sleep while the fault handler waits on I/O to page in a needed block of memory. Etc.

On other processor architectures, like ARM, 680x0, or x86, each privilege mode has its own stack pointer, which the supervisor-mode code can depend on to always be correct. The SP register gets reloaded with a constant every time that privilege mode is entered. RISC-V doesn’t have this mechanism in hardware; relying on the scratch register is the closest analog it has.

总结起来：在ARM, 680x0, or x86架构的处理器中，他们每个特权模式都有单独的栈指针，所以每次进入supervisor-modes时，SP都能读取正确的值, 从而保证正确运行。但是RISC-V在硬件没有这样的机制，只能通过mscratch寄存器来模拟实现类似的功能。

Michael Clark的回答：

t is needed when you implement U mode so that user mode and machine mode can have separate stacks. Arguably it could be made optional for M-mode only systems. A separate interrupt stack doesn’t seem like it should necessarily be mandatory.

当系统中user模式和machine模式需要单独的栈时，你需要用到mscratch 如果系统只有一个中断栈是可以不需要

sw-dev上的讨论

为什么RISC-V GCC (Linux)默认使用medlow代码模型

RISC-V现在有两个代码模型：medlow和medany。

medlow: 对全局数据的寻址使用lui/ld的指令组合，程序被限制在[-2GB,2GB]内的连续2GB虚拟地址空间范围。
medany: 对全局数据的寻址使用auipc/ld指令码组合，程序被限制在任意连续的2GB虚拟地址空间范围。

在这两种代码模型中，lui和auipc都是为了设置全局数据的[31:8]位的20位高址。在medlow模型中，由于lui/ld的指令组合使用0x00000000为基址做32比特地址寻址，当局部出现的多个全局数据寻址，他们可共享同一个lui设定的高址。在medany模型中，lui被替换为auipc，使用pc为基址。由于所有的全局寻址为基于pc的相对寻址，medany代码模型可以支持对任意地址空间的寻址，突破medlow的地址空间限制。同时，medany可以支持加载时地址重定向，即地址无关代码(position independent code, PIC)的生成。

在Linux环境中，所有的程序都是运行在虚拟内存中，独占一个虚拟内存空间。当不需要加载时地址重定向的需求时(非动态加载库)，使用medlow的lui/ld组合，GCC能够生成性能较优的代码，因而Linux上的RISC-V GCC默认使用medlow代码模型。在编译动态加载库时，则需要手动使用medany代码模型。

为什么medany生成的代码性能比medlow差一些呢？由于pc不是一个常量，对局部的多个全局数据寻址共享同一个auipc设定的高址可能会造成错误。所以auipc/ld的指令组合并不能很好地被GCC优化，会产生大量的auipc指令。此外，现有GCC编译器在-O -mcmodel=medany -mexplicit-relocs的参数组合下甚至会生成错误的代码。在没有必要时，应当使用medlow模型，如果需要生成可重定向代码，现在暂时不要使用-O -mcmodel=medany -mexplicit-relocs的参数组合。

sw-dev上的讨论 [1] [2] [3]

GNU 工具链 medany 和 explicit-relocs 选项造成的问题

int array[995] = { [10] 10, [99] 99 };
long long ll = 100;

long long
sub (void)
{
  return ll;
}

int
main (void)
{
  return sub ();
}

这段代码用 rv32i newlib 配置的 riscv gun 工具链加上 -O -mcmodel=medany -mexplicit-relocs 选项会产生下列汇编：

sub:
.LA0: auipc a5,%pcrel_hi(ll)
lw a0,%pcrel_lo(.LA0)(a5)
lw a1,%pcrel_lo(.LA0+4)(a5)
ret

这看起来合理，尽管也许应该是 %pcrel_lo(.LA0)+4 ，因为 +4 应该是放在 ll 地址后面，而不是 .LA0 。但是当反汇编 a.out 时发现：

000101ac <sub>:
   101ac: 00002797          auipc a5,0x2
   101b0: 7fc7a503          lw a0,2044(a5) # 129a8 <ll>
   101b4: 8007a583          lw a1,-2048(a5)
   101b8: 00008067          ret

+4 的偏移量溢出了，没有警告生成了出错的代码。

我小心选择数组的长度来强制复现该错误。

这里出错的原因是，尽管变量 ll 是 8 字节对齐，但 auipc 不是，而 medany 用的是 auipc 指令到 ll 变量之间的偏移量，所以这个偏移量不是 8 的整倍数。 auipc 只保证 4 按字节对齐（如果没有指定压缩指令集 C 扩展），指定了的话是按 2 字节对齐。GCC 假定任意小于变量对齐的偏移量是安全的，显然实际上（这个例子里）并不是。

同样的错误会发生在使用 long doubles 和 int128_t 的 rv32 he rv64 ，因为需要 16 字节对齐。

不幸的是，目前没有很好的解决方案。如果禁止 pcrel_lo 的偏移量，但会需要额外产生地址的指令使得效率上不如 medlow 。如果强制 auipc 按正确的方式对齐，就会潜在地在 auipc 之前增加多个空指令，对代码量的多少和性能产生不良影响。

只使用 -mcmodel=medany 是安全可靠的，但同时使用 -mcmodel=medany 和 -mexplicit-relocs 就会造成问题。

目前来说，可能让 gcc 的该选项无效是最好的选择，因为更改名字或者给出警告会让构建产生错误，招来大家的抱怨。如果只是让该选项无效，很少会有人注意到。

sw-dev上的讨论: sw-dev

社区关于RISC-V的工具链维护状况的一些抱怨(续)

Jim Wilson 回复之前 Tommy Murphy 的信息，说明了以下内容：首先对于 gdb 和 binutils 的编译参考如下

Binutils 可以使用主线代码。 gdb 可以使用主线代码及其 gdb-8.2-branch （尚未正式发布），或者使用 riscv-gnu-toolchain 仓库的 gdb 分支。

在 riscv-gnu-toolchain 仓库创建了分离的 riscv-binutils 和 riscv-gdb 树，其中：

riscv-binutils 是 FSF（主线） binutils，并可能添加一些 backported 的补丁
riscv-gdb 依然使用一个与主线不同的本地分支

对于 gcc 测试套件的运行环境搭建，参考如下：

针对基于 qemu 的测试, 你可以简单的在 riscv-gnu-toolchain 中运行 “make check”
针对基于 gdb 模拟器的测试, 也很简单。
针对板级测试, 你需要创建一个描述如何同目标板通信的 dejagnu/expect 文件（详情请参考 dejagnu 文档），具体情况具体分析。

以下是不同 FSF 工具的测试参考文档：

对于 riscv-binutils-gdb 和 riscv-binutils-gdb 的解释如下：

它们共享同一开发树： binutils 和 gdb 的发布分支在同一个 git 仓库中。
riscv-gnu-toolchain 会 checkout 出两个 riscv-binutils-gdb 的副本：riscv-binutils（binutils 发布分支）和 riscv-gdb（gdb 发布分支）; 理论上我们使用主线 gdb，但我们本地仓库还没来得及更新到主线。

Karsten Merker 回复之前 Tommy Murphy 对于 toolchain 组件版本的疑问，说明了以下内容：

对于 Linux/glibc，binutils 2.31.1 和 gcc 8.2.0 可以配合使用。（Debian 用 gcc 8.2 作为默认编译器）
对于某些软件包，在所用 gcc 8 时会有问题，而 gcc 7 则没有问题。但这些问题并非 RISC-V 特有的，其他构架上也会有同样的问题。

代码更新

rocket-chip在repo中移除riscv-tools

过去由于RISC-V的工具链处于不断的变化之中，所以在rocket-chip中不得不包含对某个特定版本的riscv-tools的git submoudle链接。而最近的Pull Request显示，开发者正在将riscv-tools submodule移除，因为工具链已经足够稳定，无需再指定特定工具链了。

The software ecosystem is now mostly stable. There are work-arounds to avoid bringing the giant riscv-tools into the tree for most projects using rocket. This PR makes their life easier.

Github PR: https://git.io/fA3h0

实用资料

SiFive TileLink Specification 中文化资料

刘鹏同学正在翻译整理 SiFive TileLink Specification 文档，目前翻译了部分内容，后续内容会陆续放出。待翻译完毕会重新进行整理校对，该文档主要是对于 TileLink 协议进行翻译与解释，重点讲述了 TileLink 中各通道内信号传输规则，适合于初学者了解 TileLink 协议的特点，设计思想。翻译内容链接如下，英文原文档在下方。

Fedora编译RISC-V Linux的脚本

Richard W.M. Jones在GitHub上完整公开了他编译Fedora RISC-V Linux的所有脚本。对编译Linux平台bootstrap过程感兴趣的同学不要错过这个好工程。

fedora-riscv-bootstrap

行业视角

Rambus: 用RISC-V降低风险

Rambus最近发表博客，阐述了他们对于采用开放标准ISA的态度。

With companies like Apple, Facebook, Google, and Samsung building their own processors instead of relying on Intel, Qualcomm, or others, there is major interest in RISC-V. The open source, free approach could potentially lower risks associated with building custom chips. Because of the low cost and the open source nature of the architecture, manufacturers are free to design a chip without expending the amount of resources usually associated with designing a chip. Companies like Nvidia and Western Digital have signed on to use RISC-V in their own silicon. The former is using RISC-V for a governing microcontroller that manages its graphics cards while the latter plans to unveil a new RISC-V processor for its cores in its hard drives for 2019-2020.

Link: Lowering Risks with RISC-V