
Tutorial: how to deal with strings #65


certik opened this issue Jul 24, 2020 · 17 comments
Labels
section: learn Relevant for the learn section on the webpage

Comments

@certik
Member

certik commented Jul 24, 2020

I am to this day struggling with how to deal with strings in modern Fortran. I would be happy to contribute a tutorial once I learn what the best practice is.

Function accepting a string

integer function f(s)
character(*), intent(in) :: s
f = len(s)
end function

Note: the first positional argument in character(...) is len, so the above is equivalent to character(len=*). I think it is OK to omit len=, as it keeps declarations shorter.
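A minimal sketch of the function above placed in a module (the module name assumed_len_demo is made up) so that callers get an explicit interface. The dummy takes its length from the actual argument, so literals, padded variables, and substrings can all be passed:

```fortran
module assumed_len_demo
  implicit none
contains
  ! s takes its length from the actual argument at each call
  integer function f(s)
    character(*), intent(in) :: s
    f = len(s)
  end function f
end module assumed_len_demo
```

For a character(len=10) variable holding "abc", f returns 10, not 3, because the trailing blanks are part of the value; pass trim(buf) or use len_trim when the padding should not count.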

Subroutine returning a string

subroutine f(s)
character(:), allocatable, intent(out) :: s
s = "Some text"
end subroutine

Note: This automatically allocates the LHS, so s is allocated to exactly the length of the string, with no whitespace padding.
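A sketch (the names alloc_out_demo and get_text, and the extra long argument, are hypothetical) showing that the same variable can be passed again and end up with a different length, because intent(out) deallocates s on entry and the assignment allocates it to the new length:

```fortran
module alloc_out_demo
  implicit none
contains
  ! intent(out) deallocates s on entry; each assignment then
  ! allocates it to exactly the length of the right-hand side
  subroutine get_text(s, long)
    character(:), allocatable, intent(out) :: s
    logical, intent(in) :: long
    if (long) then
      s = "Some longer text"
    else
      s = "Some text"
    end if
  end subroutine get_text
end module alloc_out_demo
```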

Question 1

In fpm, the following code:

subroutine cmd_build()
type(string_t), allocatable :: files(:)
character(:), allocatable :: basename, pkg_name, linking
integer :: i, n    
print *, "# Building project"
call list_files("src", files)
linking = ""
do i = 1, size(files)
    if (str_ends_with(files(i)%s, ".f90")) then
        n = len(files(i)%s)
        basename = files(i)%s(1:n-4)
        call run("gfortran -c src/" // basename // ".f90 -o " // basename // ".o")    
        linking = linking // " " // basename // ".o"
    end if    
end do
call run("gfortran -c app/main.f90 -o main.o")
call package_name(pkg_name)
call run("gfortran main.o " // linking // " -o " // pkg_name)
end subroutine

Gives a warning:

# gfortran (for build/gfortran_debug/fpm/fpm.o build/gfortran_debug/fpm/fpm.mod)
src/fpm.f90:163:0:

 linking = ""
 
Warning: ‘.linking’ may be used uninitialized in this function [-Wmaybe-uninitialized]

What am I doing wrong? How do I initialize an empty string?

Question 2

How do you return a string from a function as a return value?


I will probably have more questions. These are the most pressing.

@LKedward
Member

This would be a good tutorial to have as a reference 👍

I cannot find a reference right now, but I believe the maybe-uninitialized warning in gfortran is spurious for allocatable strings. I get the same warning from gfortran when I use allocatable strings, but not from ifort or the new flang.
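On how to initialize an empty string: linking = "" is a common idiom, and it gives an allocated string of length zero that later concatenations can grow. A sketch modeled on the fpm loop (join_objects is a hypothetical name, and building the link line is simplified):

```fortran
module empty_string_demo
  implicit none
contains
  ! Build " a.o bb.o ..." from a list of base names, starting from ""
  subroutine join_objects(names, linking)
    character(*), intent(in) :: names(:)
    character(:), allocatable, intent(out) :: linking
    integer :: i
    linking = ""   ! allocated, with length zero
    do i = 1, size(names)
      ! Each assignment reallocates linking to the longer length
      linking = linking // " " // trim(names(i)) // ".o"
    end do
  end subroutine join_objects
end module empty_string_demo
```

gfortran may still emit the same -Wmaybe-uninitialized warning for code like this; per the replies here, the warning is spurious.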

@milancurcic
Member

I do exactly the same thing as your examples of function accepting and subroutine returning strings. I imagine this is common use.

Question 1: You're doing nothing wrong. Gfortran is warning about correct Fortran.

Question 2:

module mod_str
contains
  pure function str()
    character(:), allocatable :: str
    str = 'hello'
  end function str
end module mod_str


program test_str
  use mod_str, only: str
  print *, str()
end program test_str
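When the result length can be computed from the arguments, another option (not raised in this thread) is an explicit-length function result, which needs no allocatable result at all. A sketch with a hypothetical, ASCII-only to_upper:

```fortran
module upper_demo
  implicit none
contains
  ! The result length is a specification expression based on the
  ! argument, so the result does not need to be allocatable
  pure function to_upper(s) result(r)
    character(*), intent(in) :: s
    character(len=len(s)) :: r
    integer :: i, code
    do i = 1, len(s)
      code = iachar(s(i:i))
      if (code >= iachar('a') .and. code <= iachar('z')) then
        r(i:i) = achar(code - 32)   ! ASCII lower -> upper case
      else
        r(i:i) = s(i:i)
      end if
    end do
  end function to_upper
end module upper_demo
```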

@LKedward
Member

Continuing Milan's example:

program test_str
  use mod_str, only: str
  character(:), allocatable :: my_string
  my_string = str()
end program test_str

My understanding of this usage is that two allocations on assignment happen: one in the function, for the function result, and one for the assignment at program level.
So, compared with a subroutine implementation, functions returning allocatables incur an extra allocation and, in this example, an extra copy during assignment.
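The two forms being compared can be sketched side by side (greeting and get_greeting are hypothetical names). Both produce the same value; the difference discussed here is how many allocations and copies a given compiler performs, which this sketch does not measure:

```fortran
module fn_vs_sub
  implicit none
contains
  ! Function form: r is a separate allocatable result, which the
  ! caller's assignment then copies into its own variable (unless
  ! the compiler manages to avoid the copy)
  pure function greeting(name) result(r)
    character(*), intent(in) :: name
    character(:), allocatable :: r
    r = "Hello, " // name
  end function greeting

  ! Subroutine form: the caller's variable is allocated directly
  pure subroutine get_greeting(name, r)
    character(*), intent(in) :: name
    character(:), allocatable, intent(out) :: r
    r = "Hello, " // name
  end subroutine get_greeting
end module fn_vs_sub
```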

LKedward reopened this Jul 24, 2020
@LKedward
Member

(sorry, closed by mistake)

@certik
Member Author

certik commented Jul 24, 2020

@LKedward that is precisely why I asked about this. If that is the case, that seems like a big downside and our string routines in stdlib should return the strings via arguments as subroutines, not as return values from functions.

@LKedward
Member

If that is the case, that seems like a big downside and our string routines in stdlib should return the strings via arguments as subroutines, not as return values from functions.

Yep, I haven't benchmarked it but this is why I generally avoid functions for returning non-scalars. You can use pointers to return allocated arrays from functions more efficiently, but I also avoid using pointers.

NB: Allocation on assignment

Another useful thing to note, which I only learned recently, is that allocation-on-assignment doesn't occur for colon subscripts ((:)).

So this doesn't work, because my_string is never allocated and the substring assignment does not allocate it:

program test_str
  use mod_str, only: str
  character(:), allocatable :: my_string
  my_string(:) = str()
end program test_str

Based on this, I would consider it good practice to use the colon subscript to explicitly indicate where there is assignment only and to avoid accidental reallocation.
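A small sketch of the difference (the names are hypothetical). Note that sliced(:) = ... is only valid because sliced was already allocated, which is exactly what the non-working example above lacks:

```fortran
module colon_assign_demo
  implicit none
contains
  subroutine compare_assignments(plain, sliced)
    character(:), allocatable, intent(out) :: plain, sliced
    plain = "hello"
    sliced = "hello"
    ! Whole-variable assignment: plain is reallocated to length 3
    plain = "abc"
    ! Substring assignment: no reallocation; sliced keeps length 5
    ! and the value is padded with blanks to "abc  "
    sliced(:) = "abc"
  end subroutine compare_assignments
end module colon_assign_demo
```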

Question: filling a character string

I have my own related question for strings: Is there a one-liner for filling a character(*) with a non-space character(1)?
Example case is for filling a string with all zeros.

@milancurcic
Member

My understanding about this usage is that there are two allocation-on-assignments happening: one in the function for the function result; and one for the assignment at program level.

Yes, I think this is true for any function returning anything allocatable. It's especially penalizing for large arrays. Don't do it if you care about high performance.

I have a toy wave physics project that did this for everything, including large arrays. I was optimizing for functional API and UI, although at the time I didn't understand the implications of functions returning allocatable arrays. Later I heard from a person who found the code to do exactly what they needed but it was too inefficient so they rewrote everything to subroutines to make it fast :).

@certik
Member Author

certik commented Jul 24, 2020

Regarding functions returning allocatable --- is this mandated by the Fortran Standard to allocate twice, or are compilers permitted to make it as efficient as intent(out) for subroutines? (It's just that some or most compilers currently don't optimize it out, but they could in the future.)

@LKedward
Member

Regarding functions returning allocatable --- is this mandated by the Fortran Standard to allocate twice, or are compilers permitted to make it as efficient as intent(out) for subroutines? (It's just that some or most compilers currently don't optimize it out, but they could in the future.)

It would make sense that if the function is able to be inlined, then one allocation could be optimized out, but I'm no expert here.

I think that in general, the function result needs to be a distinct memory location because it may be used subsequently in an expression; i.e. there is a fundamental difference between a function result and a subroutine intent(out) dummy arg - the former is returned by value whereas the latter is essentially a pointer.

Note 1, section 15.6.2.2 from the interpretation doc:

The function result is similar to any other entity (variable or procedure pointer) local to a function subprogram. Its existence begins when execution of the function is initiated and ends when execution of the function is terminated. However, because the final value of this entity is used subsequently in the evaluation of the expression that invoked the function, an implementation might defer releasing the storage occupied by that entity until after its value has been used in expression evaluation.

@certik
Member Author

certik commented Jul 24, 2020

My understanding of the text you posted is that the Standard allows the result of the function to be as efficient as an intent(out) dummy argument if the compiler chooses to do that.

@LKedward
Member

Would such an optimization be prevented by the requirement that the RHS is evaluated before the assignment occurs?

From 10.2.1.3:

The execution of the assignment shall have the same effect as if the evaluation of expr and the evaluation of all expressions in variable occurred before any portion of the variable is defined by the assignment.

for

variable = expr
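This rule is what makes self-referential assignments safe. In the sketch below (hypothetical procedure names), each right-hand side reads s while s is being redefined, and in double_up s is also reallocated to a new length:

```fortran
module rhs_first_demo
  implicit none
contains
  ! Rotate left by one character: the right-hand side is fully
  ! evaluated from the old value of s before s is redefined
  subroutine rotate_left(s)
    character(:), allocatable, intent(inout) :: s
    if (len(s) > 1) s = s(2:) // s(1:1)
  end subroutine rotate_left

  ! s is reallocated to twice its length, yet the old contents
  ! are still read correctly on the right-hand side
  subroutine double_up(s)
    character(:), allocatable, intent(inout) :: s
    s = s // s
  end subroutine double_up
end module rhs_first_demo
```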

@certik
Member Author

certik commented Jul 24, 2020

I don't know. We might need to ask the committee. My understanding is that the key phrase is "shall have the same effect": it does not actually have to happen that way, it only has to have the same effect. So the question becomes whether double allocation has the same effect as single allocation. For a string, the logic of the code would seem to be the same. For user-defined derived types, perhaps the user requires the finalizer to be called twice.

@smeskos
Contributor

smeskos commented Jul 24, 2020

Regarding Question 1:
Ignore the warning. -Wmaybe-uninitialized is one of the flags I suppress (with -Wno-maybe-uninitialized), for exactly this use case, and, if you recall, it was one of the reasons I raised an issue here in fpm. Also, take a look at Steve Kargl's post on our Discourse here. Finally, another similar discussion can be found here.
Regarding Question 2:
I personally follow the approach presented by @milancurcic above.

However, since we are on this topic, I also have something to add about the behavior of allocatable characters that may be relevant.
The following compiles with no warnings or errors but aborts at runtime with a segmentation fault:

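character(len=:), allocatable :: str   ! declared in the caller

subroutine init_string(filename, str)
    character(len=*), intent(in) :: filename
    character(len=:), allocatable, intent(out) :: str
    integer :: unit
    open(newunit=unit, file=filename, status='old', action='read')
    read(unit, *) str   ! str is unallocated here, and READ does not allocate it
    close(unit)
end subroutine init_string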

while this is correct:

character(len=:), allocatable :: str   ! declared in the caller

subroutine init_string(filename, str)
    character(len=*), intent(in) :: filename
    character(len=:), allocatable, intent(out) :: str
    character(len=50) :: temp   ! 50 is an arbitrary buffer size for demonstration purposes
    integer :: unit
    open(newunit=unit, file=filename, status='old', action='read')
    read(unit, *) temp
    close(unit)
    str = trim(temp)   ! allocation on assignment sizes str to the trimmed text
end subroutine init_string
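The fixed 50-character buffer above silently truncates longer input. One common alternative, sketched here with a hypothetical read_line and an arbitrary 64-character chunk size, reads a whole record in chunks with non-advancing I/O and the SIZE= specifier, growing a deferred-length string until the end of the record. Unlike the list-directed read above, it takes the line verbatim, including any blanks, commas, or quotes:

```fortran
module line_reader
  implicit none
contains
  ! Read one full record from an already-open formatted sequential
  ! unit into a deferred-length string of whatever length it needs
  subroutine read_line(unit, line, iostat)
    use, intrinsic :: iso_fortran_env, only: iostat_eor
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: iostat
    character(len=64) :: buffer   ! arbitrary chunk size
    integer :: nread
    line = ''
    do
      read(unit, '(a)', advance='no', iostat=iostat, size=nread) buffer
      if (iostat /= 0 .and. iostat /= iostat_eor) exit   ! end of file or error
      ! nread counts only the characters actually transferred
      line = line // buffer(1:nread)
      if (iostat == iostat_eor) then
        iostat = 0   ! reaching the end of the record is the normal exit
        exit
      end if
    end do
  end subroutine read_line
end module line_reader
```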

Another interesting behavior occurs when the allocatable character in the above example is a component of a derived type, e.g.:

type :: t_gas
    character(len=:), allocatable :: name
    double precision :: mass
    ! ... other components
end type t_gas

Now assume we declared type(t_gas) :: gas and tried to read gas%name as in the first (non-working) example. The program then runs without any error, but gas%name in fact remains unallocated: printing it just gives a blank, with NO error!
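The buffer-then-assign workaround applies to a component as well. A sketch using an internal list-directed read from a character variable; parse_gas and the "name mass" input format are made up for illustration:

```fortran
module gas_demo
  implicit none
  type :: t_gas
    character(len=:), allocatable :: name
    double precision :: mass
  end type t_gas
contains
  ! Read into a fixed-length buffer first, then let allocation on
  ! assignment size the deferred-length component
  subroutine parse_gas(text, gas)
    character(*), intent(in) :: text
    type(t_gas), intent(out) :: gas
    character(len=50) :: buffer   ! arbitrary buffer size
    read(text, *) buffer, gas%mass
    gas%name = trim(buffer)
  end subroutine parse_gas
end module gas_demo
```

Checking allocated(gas%name) before using the component is a cheap way to catch the silent failure described above.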

@certik
Member Author

certik commented Jul 24, 2020

@smeskos I think you cannot read into an allocatable character type. I vaguely remember this being discussed in the standards committee how to improve the standard to allow this. Until then I think it is not allowed.

@everythingfunctional
Member

I've generally just resorted to using a string type for everything, and then for intent(in) arguments just using an interface to allow people to also pass in character literals (or just character variables).
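A minimal sketch of that approach: a hypothetical wrapper type str_t (not fpm's string_t or stdlib's string_type) and a generic interface str_length whose character version wraps its argument and forwards to the str_t version:

```fortran
module str_wrapper_demo
  implicit none
  private
  public :: str_t, str_length

  ! Hypothetical minimal string type
  type :: str_t
    character(len=:), allocatable :: s
  end type str_t

  ! One generic name; the compiler picks the specific procedure
  ! from the type of the actual argument
  interface str_length
    module procedure str_length_str, str_length_char
  end interface str_length

contains

  integer function str_length_str(str) result(n)
    type(str_t), intent(in) :: str
    n = len(str%s)
  end function str_length_str

  ! Character literals and variables are wrapped and forwarded
  integer function str_length_char(chars) result(n)
    character(len=*), intent(in) :: chars
    n = str_length_str(str_t(chars))
  end function str_length_char
end module str_wrapper_demo
```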

@ivan-pi
Member

ivan-pi commented Jul 25, 2020

Question: filling a character string

I have my own related question for strings: Is there a one-liner for filling a character(*) with a non-space character(1)?
Example case is for filling a string with all zeros.

character(len=:), allocatable :: s
s = repeat('0',10)
write(*,*) s

will output 0000000000
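For the original case of filling an existing character(*) dummy in place, the same intrinsic works with len(s) as the count. A minimal sketch (fill is a hypothetical name):

```fortran
module fill_demo
  implicit none
contains
  ! Overwrite every character of s, whatever its length, with c
  pure subroutine fill(s, c)
    character(*), intent(out) :: s
    character, intent(in) :: c
    s = repeat(c, len(s))
  end subroutine fill
end module fill_demo
```

Because the length comes from the actual argument, this also works on a substring, e.g. call fill(buf(2:4), 'x').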

@LKedward
Member

Perfect, thank you @ivan-pi!

awvwgk added the learn label Feb 14, 2021
awvwgk transferred this issue from fortran-lang/fortran-lang.org Aug 19, 2022
awvwgk added section: learn Relevant for the learn section on the webpage and removed learn labels Aug 19, 2022

7 participants