Crates.io: search by traits, and safety?


#1

Q1: Is it possible, and would there be demand, for a feature to search crates by the traits they implement?
E.g. we can already search for the keyword ‘collections’, but we could be more specific (“find me all the crates that contain impls of trait Index”).

Just as the type/trait graph might be a useful way of navigating Rust source, it might be a useful way of searching crates too.

I see there is already ‘keyword’-based search; I suppose we could just extract trait names from ‘impl … for …’ and automatically generate keywords for those, but separating them off in the UI so they can be browsed on their own, without polluting the keywords themselves, might be nice.
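To make the ‘extract trait names from impl … for …’ idea concrete, here is a rough sketch of what such an extraction might look like. It is a naive line-by-line text scan, not a parser: the function names (implemented_traits, skip_generics) are made up for illustration, and it assumes each impl header sits on one line. It will be fooled by macros, comments, and bounds like `Fn() -> T` inside the generic parameters, so a real tool would more likely work from the compiler's own view of the crate.

```rust
use std::collections::BTreeSet;

/// If `s` starts with a generic parameter list `<...>`, return the text
/// after its matching `>`; otherwise return `s` unchanged.
/// (Naive: a `->` inside the list would confuse the depth counter.)
fn skip_generics(s: &str) -> &str {
    if !s.starts_with('<') {
        return s;
    }
    let mut depth = 0usize;
    for (i, c) in s.char_indices() {
        match c {
            '<' => depth += 1,
            '>' => {
                depth -= 1;
                if depth == 0 {
                    return &s[i + 1..];
                }
            }
            _ => {}
        }
    }
    s
}

/// Collect the names of traits that appear in `impl Trait for Type` headers.
/// Inherent impls (`impl Type { ... }`) are skipped because they have no ` for `.
fn implemented_traits(source: &str) -> BTreeSet<String> {
    let mut traits = BTreeSet::new();
    for line in source.lines() {
        let line = line.trim_start();
        // Accept both `impl ...` and `unsafe impl ...`.
        let line = line.strip_prefix("unsafe ").unwrap_or(line);
        let rest = match line.strip_prefix("impl") {
            Some(rest) => rest,
            None => continue,
        };
        // Reject identifiers that merely start with "impl", e.g. `impl_foo()`.
        if !(rest.starts_with('<') || rest.starts_with(char::is_whitespace)) {
            continue;
        }
        let header = skip_generics(rest).trim_start();
        let trait_part = match header.split_once(" for ") {
            Some((trait_part, _self_ty)) => trait_part,
            None => continue,
        };
        // Drop generic arguments (`Index<usize>` -> `Index`) and any module
        // path (`std::fmt::Display` -> `Display`).
        let path = trait_part.split('<').next().unwrap_or("").trim();
        let name = path.rsplit("::").next().unwrap_or("");
        if !name.is_empty() {
            traits.insert(name.to_string());
        }
    }
    traits
}
```

The resulting set could be stored next to a crate's hand-written keywords rather than mixed into them, which is the separation suggested above.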

Q2: Would it also be an idea to filter by safety?
I.e. any crate that contains no unsafe blocks would be considered ‘especially safe’, whilst crates which use unsafe blocks internally (even if presenting a safe interface) are only safe if they’ve been through empirical testing.

This leads me to ask if this is already handled: the idea of unsafe is “if it crashes, you know where to look”… but if that’s in library code, what happens?

Does that mean a project really needs to be able to specify, on bringing a crate in, ‘expect this to be extra safe, no unsafe blocks here’?

We assume the standard library’s use of unsafe will have been through more testing, but is that really the case for anything else thrown up on crates.io?

I would personally also like an option for crate-wide unsafety (turn the borrow checker into a warning)… but the reverse, ‘really safe crates’, seems like it would have more utility for the Rust ethos. If you could be more specific in both directions that might be nice.


#2

I reckon being able to search by traits implemented or type names could be quite useful. For example, imagine you are wanting to know what serde serializers exist (e.g. serde_json, serde_xml). You could search for any crates which implement the serde::Serializer trait.

I’d say that filtering by safety is less useful. Usually unsafe is just an implementation detail, I don’t particularly mind if my dependency needs to use unsafe to do something. Things like doubly linked lists and graphs almost always require unsafe in some form or another (if you want to avoid Rc<RefCell<T>> and have decent performance anyway) because of the multiple mutable references thing.

So I wouldn’t say it’s beneficial to penalize a crate for using unsafe. As long as they are providing a library that is well tested and explains its assumptions in the documentation (e.g. under a # Safety section), it shouldn’t matter and unsafe would just be an implementation detail.


#3

Things like doubly linked lists and graphs almost always require unsafe in some form or another

That’s true, but what I would see is some sliding scale in the ecosystem as crates mature. Those kinds of structures might eventually want to find their way into the standard library, or be required to be ‘tested’ to the same extent.

You have implicit faith in the ‘std’ library… but would there be a way to quantify why? Is it because widespread use uncovers any errors, or do you just expect it’s already been tested by the core team?

So I wouldn’t say it’s beneficial to penalize a crate for using unsafe.

You could look at it the other way: reward a crate for being 100% safe (and then you know that the other side of testing isn’t as much of an issue)… and give its users extra peace of mind.


#4

I’d say that I have implicit faith in std because they have very high standards and are very thorough in their testing. I trust that crates like regex and bindgen are going to be reliable and safe for similar reasons, I can look through their repo, see whether they have tests, inspect code coverage, etc.

So you could say that even though there’s a large amount of unsafe usage, I trust it because it’s open and I can inspect it if I have any doubts. Plus the green “passing” badge from travis/appveyor is kinda reassuring.

You raise a good point here. I’d be a bit suspicious if some random web API crate was using unsafe, however something which provides FFI bindings would be expected to have bits of unsafe all over the place.

As someone with a lot of C/C++ experience you probably appreciate more than most the value that a “100% safe” tag might have. Maybe I’m a bit naive, but I tend to assume that if you are using Rust you’ll almost always have 100% safe code and not be doing any funny unsafe stuff.


#5

Back to the original point on making crates.io’s search index more powerful though, I reckon it’d be really cool if crates.io could inspect libraries and give you more information about a crate. For example, indexing traits used/defined, statically determining API breakage, etc.

The tools are definitely out there to allow this extra analysis, and I’ve heard of at least one language’s package manager which will compare a package’s versions to determine if it’s backwards compatible, so it wouldn’t be overly difficult to do. The crates.io search index is overly simplistic at the moment, so some of the ideas proposed by @dobkeratops could make it a lot easier to use.


#6

Right now, crates.io does not analyze any of the files in the crates. I’m not sure crates.io is the right place for this feature; there’s no real reason this has to be part of crates.io, at least to start with. Perhaps docs.rs or a separate tool? https://github.com/onur/docs.rs/issues/134

As far as unsafe code, you may be interested in this pre-RFC for an idea called Cargo Safety Rails.


#7

OK, that might make more sense. I note that they link directly to each other, and docs.rs does itself have an overview of the crate, so that would indeed be a logical place to show how crates relate to each other through traits.

As far as unsafe code, you may be interested in this pre-RFC for an idea called Cargo Safety Rails.

That does seem very closely related, covering what I had in mind. Given Rust’s security focus, you’d almost ask if you’d want that as the default, with most code relying on the std lib and unsafe allowed only as an opt-in (but that might be excessive).


#8

I made a post on the internal forum asking whether the cargo tool is able to tell you things about a library (presumably by printing a massive JSON blob).

I’m kinda curious to see what it would take to create a more powerful search index tool which is able to inspect crates and give you metrics on them (e.g. number of lines of unsafe, percentage of public API which is documented). I’ll probably also look into what @steveklabnik is doing with his rustdoc redux and if I can reuse some of the things he’s doing.


#9

Ages ago I wrote a (pre-1.0) Rust source -> HTML view; it needed to generate what I called a “cross crate map” to locate such links. As the Rust ecosystem is moving on, I also wonder if that’s already covered by some other mechanism.


#10

Search for common traits like Index is going to be very noisy. It doesn’t necessarily mean a collection; it could be implemented on some newtype, or just for convenience when working with some unimportant type internally.

Filtering by presence of unsafe may create perverse incentives for crate authors. For example, if it applies to the crate but not its dependencies, authors may make crate “foo” depend on a “foo-unsafe-stuff-ha-ha” crate. Also, badging a crate for using the “unsafe” keyword would imply the crate is not safe, but that is not necessarily true.
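One way to picture the “foo” / “foo-unsafe-stuff-ha-ha” loophole: a per-crate flag only closes it if it is computed over the whole dependency graph. The sketch below is purely illustrative. The function name, the shape of the dependency map, and the set of crates that directly contain unsafe are all assumptions, not anything crates.io or cargo provides in this form.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if `krate`, or anything it depends on (directly or
/// indirectly), is in `direct_unsafe`.
///
/// `deps` maps a crate name to the names of its direct dependencies.
fn uses_unsafe_transitively(
    krate: &str,
    deps: &HashMap<&str, Vec<&str>>,
    direct_unsafe: &HashSet<&str>,
) -> bool {
    // Depth-first walk over the dependency graph.
    let mut stack = vec![krate];
    let mut seen = HashSet::new();
    while let Some(current) = stack.pop() {
        if !seen.insert(current) {
            continue; // already visited; also guards against cycles
        }
        if direct_unsafe.contains(current) {
            return true;
        }
        if let Some(children) = deps.get(current) {
            stack.extend(children.iter().copied());
        }
    }
    false
}
```

With a transitive check like this, moving the unsafe code into a helper crate no longer changes the answer for “foo”, though by the same logic almost everything becomes “unsafe” once the standard library is counted as a dependency.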


#11

Search for common traits like Index is going to be very noisy.

What about combinations of traits? E.g. if you have Index, but also methods to push/pop at the end, or explicitly the ability to get a ‘random access iterator’, that would narrow it down further.
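As a sketch of how a combined-trait query could work, assume the site kept a map from trait name to the set of crates implementing it (the TraitIndex type and both function names here are hypothetical). Searching for several traits at once is then just a set intersection:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical search index: trait name -> crates that implement it.
type TraitIndex = BTreeMap<String, BTreeSet<String>>;

/// Record that `crate_name` contains an impl of `trait_name`.
fn record_impl(index: &mut TraitIndex, trait_name: &str, crate_name: &str) {
    index
        .entry(trait_name.to_string())
        .or_default()
        .insert(crate_name.to_string());
}

/// Crates that implement every trait in `wanted`.
/// An empty query, or a trait nobody implements, yields an empty set.
fn crates_implementing_all(index: &TraitIndex, wanted: &[&str]) -> BTreeSet<String> {
    let mut sets = wanted
        .iter()
        .map(|t| index.get(*t).cloned().unwrap_or_default());
    let first = match sets.next() {
        Some(set) => set,
        None => return BTreeSet::new(),
    };
    // Intersect the first set with each of the remaining ones.
    sets.fold(first, |acc, set| acc.intersection(&set).cloned().collect())
}
```

A query for Index alone would return every newtype that happens to implement it, while Index plus, say, Extend already filters most of those out.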

Filtering by presence of unsafe may create perverse incentives for crate authors.

Metric: consider the number of users of a crate to estimate how safe it is;
e.g. std has a huge number of users, so the chance of unknown errors is small,
however in the contrived ‘foo, foo_unsafe’ example foo_unsafe is only used by ‘foo’, so the chance of errors is much higher…

I wonder if such a split might actually be ok, as given the inconvenience, it would create a stronger incentive to have a big separation between the safe and unsafe parts…


#12

Unsafe is normally not that hard to check. If you can’t immediately figure out what invariants are required for it to be safe, and there’s no documentation explaining it, maybe hold off from using that library. I try to document my invariants at the unsafe site.

If you do grep -R -n -H "unsafe" src you’ll get all the unsafe code in a crate’s source tree, and if you just want to count the matching lines do grep -R "unsafe" src | wc -l (modulo multiple on the same line).
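For the “modulo multiple on same line” caveat, a rough Rust equivalent can count keyword occurrences instead of lines. This is only a text scan under the same assumptions as grep (the function name is made up, and `unsafe` inside comments or string literals is still counted), but it does skip identifiers that merely contain the word:

```rust
/// Count whole-word occurrences of `unsafe` in Rust source text.
/// Splits on anything that cannot be part of an identifier, so
/// `is_unsafe` or `unsafe_cell` are not counted.
fn count_unsafe_keywords(source: &str) -> usize {
    source
        .split(|c: char| !(c.is_alphanumeric() || c == '_'))
        .filter(|word| *word == "unsafe")
        .count()
}
```

Calling it on each file's contents and summing gives a per-crate number that is a little less noisy than the line count from wc -l.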


#13

As it’s so easy to determine, that might be one of many automated insights that the website could present :slight_smile:


#14

If it were that easy the compiler would check it :stuck_out_tongue:


#15

I was thinking of creating some sort of cargo metrics tool which hooks into rustc and applies various analysers to inspect the HIR representation of a crate, before printing the analysis results to the screen as JSON. That way you get proper span information (grep only shows the line unsafe occurs on, not how many lines are unsafe) as well as more useful things like the ability to inspect types, check for doc comments, etc.
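To show the difference between “the line unsafe occurs on” and “how many lines are unsafe”, here is a text-only approximation. It is emphatically not the HIR-based approach described above: it does not hook into rustc at all, every name in it (UnsafeSpan, unsafe_spans, metrics_json) and the JSON field names are invented for illustration, and it assumes the next `{` after an `unsafe` keyword opens the body in question, which is wrong for bodiless declarations and for braces inside strings or comments.

```rust
/// 1-based, inclusive line range covered by one `unsafe` block or item body.
#[derive(Debug, PartialEq)]
struct UnsafeSpan {
    start_line: usize,
    end_line: usize,
}

fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// 1-based line number of the byte at `index`.
fn line_of(source: &str, index: usize) -> usize {
    source[..index].matches('\n').count() + 1
}

/// Find each `unsafe` keyword and the brace-delimited body that follows it.
/// Nested unsafe blocks are counted as part of the outer span.
fn unsafe_spans(source: &str) -> Vec<UnsafeSpan> {
    let bytes = source.as_bytes();
    let mut spans = Vec::new();
    let mut from = 0;
    while let Some(offset) = source[from..].find("unsafe") {
        let kw = from + offset;
        from = kw + "unsafe".len();
        // Skip identifiers that merely contain the word, e.g. `not_unsafe`.
        let before_ok = kw == 0 || !is_ident_byte(bytes[kw - 1]);
        let after_ok = bytes.get(from).map_or(true, |b| !is_ident_byte(*b));
        if !(before_ok && after_ok) {
            continue;
        }
        // Assume the next `{` opens the unsafe body.
        let open = match source[from..].find('{') {
            Some(rel) => from + rel,
            None => break,
        };
        // Walk forward to the matching `}`.
        let mut depth = 0usize;
        let mut close = None;
        for (i, &b) in bytes[open..].iter().enumerate() {
            match b {
                b'{' => depth += 1,
                b'}' => {
                    depth -= 1;
                    if depth == 0 {
                        close = Some(open + i);
                        break;
                    }
                }
                _ => {}
            }
        }
        let close = match close {
            Some(c) => c,
            None => break, // unbalanced braces: give up
        };
        spans.push(UnsafeSpan {
            start_line: line_of(source, kw),
            end_line: line_of(source, close),
        });
        from = close + 1;
    }
    spans
}

/// Summarise the spans as a small JSON object (hand-formatted, no escaping needed).
fn metrics_json(source: &str) -> String {
    let spans = unsafe_spans(source);
    let lines: usize = spans.iter().map(|s| s.end_line - s.start_line + 1).sum();
    format!("{{\"unsafe_blocks\":{},\"unsafe_lines\":{}}}", spans.len(), lines)
}
```

A compiler-backed version would get the same kind of span data without the guesswork, plus the type and doc-comment information mentioned above.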

Then it should be fairly easy to create a program which will monitor crates.io, download any new crates, run cargo metrics on them, then store the analysis results in some sort of search index that a web app can query. I imagine that part would be quite similar to what docs.rs does, so you can have a skim through their source code for inspiration.


#16

If it were that easy the compiler would check it

I’m talking about information that should be useful during the search: ‘what is the crate I need?’ This is a 1-bit insight that could be quite useful. ‘I’m after bindings to some C library…’ -> I’m probably looking for an ‘unsafe crate’; ‘I’m looking for something higher level’ -> I’m probably looking for a ‘safe crate’.

The general metrics suggested above would be best… one of several automated insights.


#17

Are there any existing services which let you inspect/search various aspects of the packages on a package manager? I feel like there would be something for NPM or ruby. It may be a good idea to look for existing services to see how popular something like this would be or what metrics would be most useful.

Of course, I’m going to play around with my cargo metrics thing regardless because it’s a nice way to learn how rustc works under the hood. If it actually goes anywhere I may even consider publishing it to crates.io.


#18

That sounds like a solution in search of a problem. For C bindings there’s already a *-sys naming convention and a crates.io category.


#19

“solution in search of a problem”

contrary to what most people think, a lot of great innovation happens that way :slight_smile:

"*-sys naming convention "

surely something computable is better than a naming convention


#20

It’s a while off!