Splitting a module into multiple files

Hi guys, in all honesty, I find the Rust module system the most complicated and confusing I have seen in about 25 years of programming (I also googled around and it looks like I am not the only one). Anyway, I do not want to rant; I just want a little help.

Can somebody please explain how to split a module into multiple files? I do not want to create artificial submodules. I know how to do it that way, but my module files simply grow too big with all the different impls for structs and the number of unit tests going in. All I want is to keep various parts of the same module in different files so I can keep track of them more easily. The only solution I have found so far is include!. Does anyone know of something else (please, no submodules)?

Regards, Romulus

What exactly is it you don't like about creating submodules? They're the normal tool for code organization in Rust, so there's not really a way to avoid them.

If your issue is that you don't like how long the paths are, you can use 're-exports':

// You can imagine that all the modules below are in separate files...

pub mod top_level {
   pub struct MyTopLevelStruct;

   // This is a re-export:
   pub use nested::MyNestedStruct;

   // Note that this is a private module!
   mod nested {
       pub struct MyNestedStruct;
   }
}

// To the outside world, it looks like both structs are in top_level:
use top_level::MyTopLevelStruct;
use top_level::MyNestedStruct;

// Because the nested module was private,
// the outside world can't tell it's there:
use top_level::nested::MyNestedStruct; // COMPILE ERROR

It is not about long paths. Maybe it is just me, but I believe it is plain wrong engineering to put a struct in a submodule just because the module file is too large. Hiding that also looks to me like a plain hack. Basically, you are forced to structure your code in an unnatural way that has nothing to do with the relations between the entities defined in these modules.

IMHO a submodule should imply some kind of dependency on the parent one, not the fact that the parent module file was too large for a regular developer to handle.

For example, in my case I feel uncomfortable working with files longer than 1000 lines, but I believe that is a very poor argument for creating a submodule, even a hidden one.

There is no way to do this. The only way is the “artificial sub-modules” way.

Yeah, an ugly hack for such a simple problem. And we all think that Rust is such an elegant language...

I'm not clear why it's considered such an ugly hack. Surely you have to organize your code in some way, and to me it seems elegant that the way one refers to code enables you to locate said code in the filesystem.

Surely it wouldn't be better to require recursive grepping just to find the implementation of a function or data structure? Of course, with * imports you can reproduce that feature of undiscoverable implementations even in Rust, but at least it's on an opt-in basis.
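To illustrate the point about * imports, here is a minimal sketch. The names `shapes`, `area`, `explicit` and `globbed` are hypothetical and chosen only for this example. With an explicit `use`, the path tells you which module (and so which file) defines `area`; with a glob import, the call site no longer says where the name comes from.

```rust
// Hypothetical module standing in for a separate file, e.g. `shapes.rs`.
mod shapes {
    pub fn area(width: u32, height: u32) -> u32 {
        width * height
    }
}

mod explicit {
    // The path names the module that defines `area`.
    use super::shapes::area;

    pub fn demo() -> u32 {
        area(2, 3)
    }
}

mod globbed {
    // Opt-in glob import: every public item of `shapes` comes into scope,
    // so `area` below gives no hint about where it is defined.
    use super::shapes::*;

    pub fn demo() -> u32 {
        area(2, 3)
    }
}

fn main() {
    // Both modules call the same function; only discoverability differs.
    println!("{} {}", explicit::demo(), globbed::demo());
}
```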

@motoras I'm curious which language you think does it well? Python, Java, C#, OCaml, etc. all have module systems similar to Rust's. C has no concept of modules at all; everything is flat. C++ is different in that it has 'namespaces' that are constantly open (which creates a whole host of issues related to namespace pollution); otherwise everything is flat like C. Lisp is... different as well, in a multitude of ways.

I have no idea which one does it well. But I am pretty sure the Rust way is ugly at least, if not plain wrong. Can you tell me which of the languages you mentioned does not allow splitting a module/package into multiple files?

Rust allows us to split the package into different files; it's just that these files always contain at least one module each.

I guess you completely missed my point. I do believe that the option to split a module into multiple files is a simple requirement which any kind of module system should fulfill. I also believe that, by any software engineering standard, a sub-module must have some relation of logical dependency with its parent, and should not be created because of some artificially imposed language constraint.
Being forced to use sub-modules simply to split your code into multiple files is at best a very unintuitive workaround or, as I said before, a straight hack.

Not sure what you meant by "package" in this context. I was thinking of Java packages when I mentioned that word. And thank God, I do have the option to put as many files as I want in a Java package, which I thought was the closest Java concept to Rust modules.

I'd love it if you could point to a language that can do this. Other than Lisp or so, I can't think of any offhand. It would make compilation much more difficult, as each module needs to be built as a standalone translation unit.

No, a Java module is a class (the old adage that everything in Java is a class, lol). You can have one or more Java modules/classes in a file, but you cannot split up a module/class among multiple files. A package is a namespace.

From what I know, you can define a Java package in multiple files, you can have a C++ namespace in multiple files, and if I remember well you can declare a module multiple times in Ruby. There are always restrictions, corner cases and weak points with any solution, but in general it is practical to have such a feature, and IMHO most languages provide a simple and intuitive solution for this problem.

Regarding your second observation, it is the first time in my life that I have heard that a Java class is a kind of module. While it is true that a class can have inner classes, so sometimes (but rarely) it can also act as a namespace, every Java program I have ever seen organizes its code in packages. In my opinion the java.util package would be roughly the Java equivalent of std::collections, while, let's say, the LinkedList class from that package is the Java equivalent of the LinkedList struct of std::collections.

C++ namespaces != C++ modules (which, for the record, are now an actual concept), but neither is like Rust modules, which seem more like a translation unit in C++ terms.

An issue with making something like a C++ namespace is you can't have strong references across them, they are inherently 'fuzzy', only existing as much as things you've specifically brought in, essentially just like renaming the things within it. It's not a 'unit' of things, but really just a space of names. Java's is similar although it has some scoping within them as well, mostly thanks to the runtime being able to combine them on load instead of at compile-time.

In Rust, as in Python, modules form more of a proper tree. If you were able to split a single 'module' across multiple files, then all of those files would need to be compiled as a single unit so that things that access them can have their dependencies working correctly. As it is now, the filesystem defines the tree very well, so the compiler knows how to work its way down through it more efficiently.

I think it's because I've worked a lot on 'other' JVM languages including having to mess with the bytecode itself quite excessively. Packages in java are really nothing more than renamed 'scopes' of things, internally the JVM is very 'flat', like C. Classes are used to hold functionality together, and static classes (non-instanceable classes with only static methods) are extremely common on the JVM, and those themselves are used like Rust modules to hold static functions.

Take Kotlin or Scala: their extension methods live purely within static classes. Haskell on the JVM uses almost exclusively static classes for its modules. Groovy... is a mess; it conflates it all together far worse than you could imagine. ^.^;

Except you can't have code in packages; the JVM just does not work that way. All code absolutely always must exist within a class, whether an instanceable class or not. Packages are nothing but a string prepended onto the class name. A class named Blah in a package of vwoop.blorp goes by the name vwoop.blorp.Blah almost exclusively in the JVM. That even leaks into Java the language, in that you must use that name unless you set up an alias for it first (via import).

Though it's not. The java.util package contains no code; it's nothing but a namespace for classes, extending their names to make ambiguity less likely. The LinkedList struct of std::collections is a type, and code can live either directly within collections or within implementations of the struct or of a trait. In Java there is only the class as the unit of code organization. Whether you want to instantiate it into an object or not is tangential, but all code lives within classes, not packages.

One module per file is more practical, anyway....

Yes, if you like to work with large files. For me, a file of more than a few hundred lines is hard to work with. I've seen people working with files of 1000+ lines, but for my old processor (I mean brain) that's too much to handle.

I too find it easier to work with smaller modules, and every time that I have worked on a project I have found a sensible way to reduce a module down to under 300-400 lines each. This while maintaining structure in the module scheme.

Usually I do this by splitting implementation blocks into their own private module, each module handles a different aspect of the project. For example, in vec-utils I split up the implementation of the three parts of my api like so:

  • the core api (this handles the simple case)
  • box handling (this was patched on later, so it doesn't strictly follow the name vec-utils, but it follows the spirit of this crate, which is removing unnecessary allocations)
  • generalized api (this handles the general case, and is a little less optimal for the common case)
    • I could split the general api into two parts, one to handle the variadic workaround and another to apply the workaround, but I didn't really think about that at the time

This is just one small example from the projects that I have done. I didn't plan out this layout from the start. While I was working on the modules, once a module reached a critical size of about 500 lines, I would try and find places where there was a substructure that could be extracted. Almost always, there was at least 1 substructure that played a large role in the current module, but could be extracted.

So I disagree that it can be hard to find ways to sensibly split your modules into sub-modules. Usually once a module gets to a certain size, there are multiple separate aspects that could be split.
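A minimal sketch of splitting implementation blocks into private modules, as described above. The names `Counter`, `arithmetic` and `display` are hypothetical and not taken from vec-utils, and each inline `mod` stands in for a separate file. It relies on the fact that an `impl` block for a type may live in any module of the crate that defines the type, and that child modules can see private items of their ancestors.

```rust
use std::fmt;

// The type itself stays in the parent module.
pub struct Counter {
    value: i64,
}

impl Counter {
    pub fn new(value: i64) -> Self {
        Counter { value }
    }
}

// Private module holding one aspect of the implementation.
// In a real project this could be its own file, e.g. `arithmetic.rs`.
mod arithmetic {
    use super::Counter;

    impl Counter {
        pub fn add(&mut self, n: i64) {
            // The private field `value` is visible here because this
            // module is a descendant of the module that defines `Counter`.
            self.value += n;
        }

        pub fn get(&self) -> i64 {
            self.value
        }
    }
}

// Another private module for a different aspect (formatting).
mod display {
    use super::{fmt, Counter};

    impl fmt::Display for Counter {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            write!(f, "Counter({})", self.value)
        }
    }
}

fn main() {
    let mut c = Counter::new(1);
    c.add(2);
    // Callers never name `arithmetic` or `display`; the methods simply
    // belong to `Counter`.
    println!("{} holds {}", c, c.get()); // prints "Counter(3) holds 3"
}
```

Since the modules are private and only contain `impl` blocks, nothing about the split shows up in the public paths of the crate.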

Another way to reduce the size of modules is to split off a group of related types into a sub-module. This way you can use the privacy rules to better reason about your code, as the parent module can't access private members of the sub-module. And if you keep a clear API, even between modules of a crate, this can lead to a more maintainable project.
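As a rough sketch of that privacy point (the names `temperature` and `Celsius` are hypothetical): the field of `Celsius` is private to the sub-module, so the parent can only go through the constructor, and the range check cannot be bypassed from outside.

```rust
// Hypothetical sub-module; in a real project this could be `temperature.rs`.
mod temperature {
    pub struct Celsius {
        // Private field: only code inside `temperature` can read or set it.
        degrees: f64,
    }

    impl Celsius {
        /// Returns `None` for values below absolute zero (and for NaN).
        pub fn new(degrees: f64) -> Option<Self> {
            if degrees >= -273.15 {
                Some(Celsius { degrees })
            } else {
                None
            }
        }

        pub fn degrees(&self) -> f64 {
            self.degrees
        }
    }
}

use temperature::Celsius;

fn main() {
    let t = Celsius::new(21.5).expect("valid temperature");
    println!("{}", t.degrees());

    // The parent module cannot build an invalid value directly.
    // The following line would not compile, because `degrees` is private:
    // let bad = Celsius { degrees: -500.0 };
    assert!(Celsius::new(-500.0).is_none());
}
```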

I believe we could actually create a procedural macro that automatically "glues" "partial modules" together at compile time, if it's worth it.

Wouldn't the listings and debug information be relative to the unified module? That seems to partially defeat the OP's purpose.